Date: Mon, 3 Feb 2025 20:06:28 +1100
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure
References: <20250129083350.220a7ab0@elisabeth> <20250130055522.39acb265@elisabeth> <20250130093236.117c3fd0@elisabeth> <20250131063655.41a5861b@elisabeth> <20250131100919.0950ec1e@elisabeth> <20250203070928.54561e7e@elisabeth>
In-Reply-To: <20250203070928.54561e7e@elisabeth>
List-Id: Development discussion and patches for passt

On Mon, Feb 03, 2025 at 07:09:28AM +0100, Stefano Brivio wrote:
> On Mon, 3 Feb 2025 11:46:13 +1100
> David Gibson wrote:
>
> > On Fri, Jan 31, 2025 at 10:09:19AM +0100, Stefano Brivio wrote:
> > > Fixed, finally. Some answers:
> > >
> > > On Fri, 31 Jan 2025 17:14:18 +1100
> > > David Gibson wrote:
> > >
> > > > On Fri, Jan 31, 2025 at 06:36:55AM +0100, Stefano Brivio wrote:
> > > > > On Thu, 30 Jan 2025 09:32:36 +0100
> > > > > Stefano Brivio wrote:
> > > > >
> > > > > > I would like to quickly complete the whole flow first, because I think
> > > > > > we can inform design and implementation decisions much better at that
> > > > > > point
> > > > >
> > > > > So, there seems to be a problem with (testing?) this. I couldn't quite
> > > > > understand the root cause yet, and it doesn't happen with the reference
> > > > > source.c and target.c implementations I shared.
> > > > >
> > > > > Let's assume I have a connection in the source guest to 127.0.0.1:9091,
> > > > > from 127.0.0.1:56350.
> > > > > After the migration, in the target, I get:
> > > > >
> > > > > ---
> > > > > socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 79
> > > > > setsockopt(79, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > > > > bind(79, {sa_family=AF_INET, sin_port=htons(56350), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> > > > > sendmsg(72, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[79]}], msg_controllen=24, msg_flags=0}, 0) = 1
> > > > > recvfrom(72, "\1", 1, 0, NULL, NULL) = 1
> > > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [2], 4) = 0
> > > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [1788468535], 4) = 0
> > > > > write(2, "77.6923: ", 977.6923: ) = 9
> > > > > write(2, "Set send queue sequence for sock"..., 51Set send queue sequence for socket 79 to 1788468535) = 51
> > > > > write(2, "\n", 1
> > > > > ) = 1
> > > > > setsockopt(79, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
> > > > > setsockopt(79, SOL_TCP, TCP_QUEUE_SEQ, [115288604], 4) = 0
> > > > > write(2, "77.6924: ", 977.6924: ) = 9
> > > > > write(2, "Set receive queue sequence for s"..., 53Set receive queue sequence for socket 79 to 115288604) = 53
> > > > > write(2, "\n", 1
> > > > > ) = 1
> > > > > connect(79, {sa_family=AF_INET, sin_port=htons(9091), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
> > > > > ---
> > > > >
> > > > > EADDRNOTAVAIL, according to the documentation, which seems to be
> > > > > consistent with a glance at the implementation (that is, I must be
> > > > > missing some issue in the kernel), should be returned on connect() if:
> > > > >
> > > > >        EADDRNOTAVAIL
> > > > >               (Internet domain sockets) The socket referred to by
> > > > >               sockfd had not previously been bound to an address
> > > > >               and, upon attempting to bind it to an ephemeral
> > > > >               port, it was determined that all port numbers in the
> > > > >               ephemeral port range are currently in use. See the
> > > > >               discussion of /proc/sys/net/ipv4/ip_local_port_range
> > > > >               in ip(7).
> > > > >
> > > > > but well, of course it was bound.
> > > > >
> > > > > To a port, indeed, not a full address, that is, any (0.0.0.0) and
> > > > > address port, but I think for the purposes of this description that
> > > > > bind() call is enough.
> > > >
> > > > So, I was wondering if binding to 0.0.0.0 is sufficient for a repaired
> > > > socket.
> > >
> > > It is.
> > >
> > > > Usually, of course, that 0.0.0.0 would be resolved to a real
> > > > address at connect() time. But TCP_REPAIR's version of connect()
> > > > bypasses a bunch of the usual connect logic, so maybe we need an
> > > > explicit address here.
> > >
> > > No need.
> >
> > Ok.
> >
> > > > ...but that doesn't explain the difference between passt and your test
> > > > implementation.
> > >
> > > The difference that actually matters is that the test implementation
> > > terminates, and that has the equivalent effect of switching off repair
> > > mode for the closed sockets, which frees up all the associated context,
> > > including the port.
> > >
> > > Usually, there are no valid operations on closed sockets (not even
> > > close()). This is the first exception I ever met: you can set
> > > TCP_REPAIR_OFF.
> >
> > I'm still confused by the specific sequence of events that's causing
> > the problem. If a socket is closed with close(2) it should no longer
> > exist, so I don't see how you could even attempt to do anything with
> > it.
> >
> > Do you mean that the socket is shutdown(RD|WR)? Or that it's been
> > closed by passt, but not by passt-repair? Or the other way around?
> >
> > I'd kind of assume that you _must_ close the socket while still in
> > repair mode, since we want it to go away on the source without
> > attempting to FIN or RST or anything.
>
> While the explanation for the issue is what you gave as comment to 8/20
> (I need to close() the socket from passt-repair), let me answer here:
> sure, I must close() it, and it was close()d by passt but not
> passt-repair.

Right, I realised the problem with the missing close in passt-repair
after I wrote this.

> > > But there's a catch: you can't pass a closed socket in repair mode via
> > > SCM_RIGHTS (well, I'm fairly sure nobody approached this level of
> > > insanity before): you get EBADF (which is an understatement).
> > >
> > > And there's another catch: if you actually try to do that, even if it
> > > fails, that has the same effect of clearing the socket entirely: you
> > > free up the port.
> >
> > !?! this is even more baffling. Passing what's now an unrelated,
> > unassigned integer as an fd is having some effect on a socket that was
> > around!? If so that's a horrifying kernel bug.
>
> Nah, most likely not. The EBADF on a close()d socket is a bit
> questionable (it should be EINVAL? Or a -1 socket in the
> recipient?),

You're not "passing a closed socket", that's nonsensical. You're
trying to pass a stale fd that no longer refers to your socket.
EBADF is _exactly_ what should happen, regardless of whether or not
the underlying socket is really closed, or if it's held open by
another fd somewhere (a dup() or something passed to another process
like in this case).

> but other than that, the explanation is that passing that closed socket
> caused EOF in passt-repair, and passt-repair would quit, solving the
> issue.

Passing a bad fd caused an error on the sendmsg(), which caused an EOF
on the other end.
Which is a little odd, but again nothing to do with
"passing a closed socket"; that's impossible - if the socket is closed
there's no way to refer to it and so no way to even attempt sending
it.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
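
For reference, the stale fd behaviour described above can be reproduced
without TCP_REPAIR or any privileges. What follows is only a minimal
sketch, assuming Linux: send_fd() and stale_fd_demo() are hypothetical
helper names made up for this illustration, not code from passt or
passt-repair. The demo opens a TCP socket, dup()s it (standing in for the
copy held by another process), close()s the original descriptor, and then
tries to pass the old number via SCM_RIGHTS.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to pass descriptor number 'fd' over the Unix
 * domain socket 'chan' with SCM_RIGHTS.
 * Returns 0 on success, or the errno value left by sendmsg() on failure.
 */
static int send_fd(int chan, int fd)
{
	char data = 1;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* Forces correct alignment of buf */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	if (sendmsg(chan, &msg, 0) < 0)
		return errno;

	return 0;
}

/* Hypothetical demo: the socket stays open through a dup()ed descriptor,
 * while the original number becomes stale after close().
 * Returns the error from trying to send the stale number, or -1 if the
 * setup itself failed. *still_open is set to 1 if the socket can still be
 * queried through the duplicate descriptor.
 */
static int stale_fd_demo(int *still_open)
{
	int chan[2], s, copy, type = 0, err;
	socklen_t len = sizeof(type);

	*still_open = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0)
		return -1;

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	copy = s < 0 ? -1 : dup(s);
	if (copy < 0) {
		if (s >= 0)
			close(s);
		close(chan[0]);
		close(chan[1]);
		return -1;
	}

	close(s);			/* 's' is now just a stale integer */
	err = send_fd(chan[0], s);	/* Rejected before anything is sent */

	/* The underlying socket is still alive, reachable through 'copy' */
	if (!getsockopt(copy, SOL_SOCKET, SO_TYPE, &type, &len) &&
	    type == SOCK_STREAM)
		*still_open = 1;

	close(copy);
	close(chan[0]);
	close(chan[1]);

	return err;
}
```

Calling stale_fd_demo() from a small main() should show the sendmsg()
attempt failing with EBADF while the socket itself remains open through
the duplicate, so the error says something about the descriptor number,
not about the state of the socket.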