Date: Fri, 31 Jan 2025 17:14:18 +1100
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure

On Fri, Jan 31, 2025 at 06:36:55AM +0100, Stefano Brivio wrote:
> On Thu, 30 Jan 2025 09:32:36 +0100
> Stefano Brivio wrote:
>
> > I would like to quickly complete the whole flow first, because I think
> > we can inform design and implementation decisions much better at that
> > point
>
> So, there seems to be a problem with (testing?) this. I couldn't quite
> understand the root cause yet, and it doesn't happen with the reference
> source.c and target.c implementations I shared.
>
> Let's assume I have a connection in the source guest to 127.0.0.1:9091,
> from 127.0.0.1:56350.
> After the migration, in the target, I get:
>
> ---
> socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
> setsockopt(79, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(79, {sa_family=AF_INET, sin_port=htons(56350), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> sendmsg(72, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[79]}], msg_controllen=24, msg_flags=0}, 0) = 1
> recvfrom(72, "\1", 1, 0, NULL, NULL) = 1
> setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [1788468535], 4) = 0
> write(2, "77.6923: ", 977.6923: ) = 9
> write(2, "Set send queue sequence for sock"..., 51Set send queue sequence for socket 79 to 1788468535) = 51
> write(2, "\n", 1
> ) = 1
> setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [115288604], 4) = 0
> write(2, "77.6924: ", 977.6924: ) = 9
> write(2, "Set receive queue sequence for s"..., 53Set receive queue sequence for socket 79 to 115288604) = 53
> write(2, "\n", 1
> ) = 1
> connect(79, {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> ---
>
> EADDRNOTAVAIL, according to the documentation, which seems to be
> consistent with a glance at the implementation (that is, I must be
> missing some issue in the kernel), should be returned on connect() if:
>
>     EADDRNOTAVAIL
>         (Internet domain sockets) The socket referred to by
>         sockfd had not previously been bound to an address
>         and, upon attempting to bind it to an ephemeral
>         port, it was determined that all port numbers in the
>         ephemeral port range are currently in use. See the
>         discussion of /proc/sys/net/ipv4/ip_local_port_range
>         in ip(7).
>
> but well, of course it was bound.
>
> To a port, indeed, not a full address, that is, any (0.0.0.0) and
> address port, but I think for the purposes of this description that
> bind() call is enough.

So, I was wondering if binding to 0.0.0.0 is sufficient for a repaired
socket. Usually, of course, that 0.0.0.0 would be resolved to a real
address at connect() time. But TCP_REPAIR's version of connect()
bypasses a bunch of the usual connect logic, so maybe we need an
explicit address here.

...but that doesn't explain the difference between passt and your test
implementation.

> Is this related to SO_REUSEADDR? I need it (on both source and target)
> because, at least in my tests, source and target are on the same
> machine, in the same namespace. If I drop it:

Again, I can think of various problems that not having the same
address available on source and dest might cause, but not any which
explain the difference between passt and the experimental impl.

> ---
> bind(79, {sa_family=AF_INET, sin_port=htons(46280), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> ---
>
> as expected.
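
For the explicit-address idea above, here is a minimal sketch of what
the target could try instead of binding to 0.0.0.0. The helper name
bind_explicit() is made up for illustration and is not taken from passt
or from the reference target.c. It only covers the socket(),
SO_REUSEADDR and bind() steps; the TCP_REPAIR setup and connect() would
follow as in the traces. Whether this makes EADDRNOTAVAIL go away is
untested.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from passt): create a TCP socket bound to an
 * explicit local IPv4 address and port, rather than to 0.0.0.0.
 * Returns the socket on success, -1 on failure.
 */
int bind_explicit(const char *addr, uint16_t port)
{
	struct sockaddr_in sa;
	int one = 1, s;

	assert(addr != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);

	/* inet_pton() returns 1 only if addr is a valid IPv4 literal */
	if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
		return -1;

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s < 0)
		return -1;

	/* Same option as in the traces: source and target share a
	 * namespace in this test setup, so the port is already in use.
	 */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) ||
	    bind(s, (struct sockaddr *)&sa, sizeof(sa))) {
		close(s);
		return -1;
	}

	return s;
}
```

For the failing case that would be bind_explicit("127.0.0.1", 56350),
called before switching the socket to repair mode, so that the local
address is already fixed by the time the repair-mode connect() runs.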
>
> However, in my reference implementation, with a connection from
> 127.0.0.1:9998 to 127.0.0.1:9091, this is what the target does:
>
> ---
> socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
> setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(3, {sa_family=AF_INET, sin_port=htons(9998), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> socket(AF_UNIX, SOCK_STREAM, 0) = 4
> unlink("/tmp/repair.sock") = 0
> bind(4, {sa_family=AF_UNIX, sun_path="/tmp/repair.sock"}, 110) = 0
> listen(4, 1) = 0
> accept(4, NULL, NULL) = 5
> sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[3]}], msg_controllen=24, msg_flags=0}, 0) = 1
> recvfrom(5, "\1", 1, 0, NULL, NULL) = 1
> setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [1612504019], 4) = 0
> setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [1756508956], 4) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> ---
>
> The only obvious difference is that, here, I'm not binding to an
> ephemeral port: the source port (in both source and target "guests") is
> 9998.
>
> Fine, so I tried forcing a lower port in passt (source) as well, and
> this is what I get in the target now:
>
> ---
> socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
> setsockopt(79, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(79, {sa_family=AF_INET, sin_port=htons(9000), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> sendmsg(72, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[79]}], msg_controllen=24, msg_flags=0}, 0) = 1
> recvfrom(72, "\1", 1, 0, NULL, NULL) = 1
> setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [-348109334], 4) = 0
> write(2, "46.9751: ", 946.9751: ) = 9
> write(2, "Set send queue sequence for sock"..., 51Set send queue sequence for socket 79 to 3946857962) = 51
> write(2, "\n", 1
> ) = 1
> setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [-1820322671], 4) = 0
> write(2, "46.9752: ", 946.9752: ) = 9
> write(2, "Set receive queue sequence for s"..., 54Set receive queue sequence for socket 79 to 2474644625) = 54
> write(2, "\n", 1
> ) = 1
> connect(79, {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> ---
>
> no obvious difference. I'll try binding to an explicit address, next,
> but I have no idea why 1. we get EADDRNOTAVAIL after a bind() and 2. it
> works with the reference implementation.

I have no ideas yet :(.

> Yes, I explicitly close() the socket in the source passt now, but that
> doesn't change things.
>
> This is presumably just an issue with testing, because in real use
> cases source and target guests would be on different machines. Another
> idea could be separating the namespaces.

Well, if that's relevant to the problem, which isn't clear yet.
I mean, I guess it's worth trying with source and dest in different
namespaces.

> I can't just run source and target passt in two instances of pasta
> --config-net, because pasta would run into the same issue,

Uh.. which same issue? pasta's not trying to do any TCP_REPAIR stuff
or migration.

> but I could
> isolate one namespace with it, then add two network namespaces inside
> that, and connect them with veth pairs.

Two pasta instances actually sounds like a better bet to me, because
the two "hosts" will have the same address, which is what we'd expect
for a "real" migration - and it kind of has to be the case for the
host side connections to work afterwards.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson