Date: Mon, 2 Feb 2026 10:24:14 +1000
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/1] migrate: Use forward table information to close() listening sockets
References: <20260130055811.2408284-1-david@gibson.dropbear.id.au> <20260130055811.2408284-2-david@gibson.dropbear.id.au> <20260131104727.2fbdfaff@elisabeth>
In-Reply-To: <20260131104727.2fbdfaff@elisabeth>
List-Id: Development discussion and patches for passt

On Sat, Jan 31, 2026 at 10:47:28AM +0100, Stefano Brivio wrote:
> On Fri, 30 Jan 2026 16:58:11 +1100
> David Gibson wrote:
>
> > On incoming migrations we need to bind() reconstructed sockets to their
> > correct local address. We can't do this if the origin passt instance is
> > in the same namespace and still has those addresses bound. Arguably that's
> > a bug in bind()s operation during repair mode, but for now we have to work
> > around it.
> >
> > So, to allow local-to-local migrations we close() sockets on the outgoing
> > side as we process them. In addition to closing the connected socket we
> > also have to close the associated listen()ing socket, because that can also
> > cause an address conflict.
> >
> > To do that, we introduced the listening_sock field in the connection
> > state, because we had no other way to find the right listening sockets.
> > Now that we have the forwarding table, we have a complete list of
> > listening sockets elsewhere. We can use that instead, to close all
> > listening sockets on outbound migration, rather than just the ones that
> > might conflict.
> >
> > This is cleaner and, importantly, saves a valuable 32-bits in the flow
> > state structure. It does mean that there is a longer window where a peer
> > attempting to connect during migration might get a Connection Refused.
> > I think this is an acceptable trade-off for now: arguably we should not
> > allow local-to-local migrations in any case, since the socket closes make
> > it impossible to safely roll back migration as per the qemu model.
> >
> > Signed-off-by: David Gibson
> > ---
> >  flow.c     | 12 ++++++++++++
> >  fwd.c      | 21 +++++++++++++++++++++
> >  fwd.h      |  1 +
> >  tcp.c      |  9 ---------
> >  tcp_conn.h |  3 ---
> >  5 files changed, 34 insertions(+), 12 deletions(-)
> >
> > diff --git a/flow.c b/flow.c
> > index fd4d5f38..5207143d 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> >
> >  	debug("...roll back migration");
> >
> > +	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
> > +		die("Failed to re-establish listening sockets");
> > +
> >  	foreach_established_tcp_flow(flow) {
> >  		if (FLOW_IDX(flow) >= bound)
> >  			break;
> > @@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
>
> Nit: the comment to this function currently says "Send data (flow
> table) for flow, close listening". I fixed that up (dropped ", close listening").

Good point, thanks.

> >  		return flow_migrate_source_rollback(c, FLOW_MAX, rc);
> >  	}
> >
> > +	/* HACK: A local to local migrate will fail if the origin passt has the
> > +	 * listening sockets still open when the destination passt tries to bind
> > +	 * them. This does mean there's a window where we lost our listen()s,
> > +	 * even if the migration is rolled back later. The only way to really
> > +	 * fix that is to not allow local to local migration, which arguably we
> > +	 * should (use namespaces for testing instead). */
>
> Actually, we already use namespaces in the current tests,

Oh, nice.

> but we didn't
> (always) do that during development, and it might be convenient in
> general to have the possibility to test *a part* of the implementation
> using the same namespace as long as it's reasonably cheap (it seems to
> be).

Depends what cost you're talking about. It's cheap in terms of
computational complexity, and code complexity. It means, however,
that we can't necessarily roll back failed migrations - i.e. resume on
the origin system. That isn't really correct for the qemu migration
model, which is why I think allowing local migrations probably isn't
the best idea, at least by default.

> That's just a part because anyway bind() and connect() will conflict,
> if we're in the same namespace, which is a kernel issue you already
> noted:

Well, it's a kernel issue that the bound listen()ing sockets conflict
with the half-constructed flow sockets. Having the listening sockets
of the origin passt conflict with the listening sockets of the
destination passt is pretty much expected, and would still be an
impediment to local migration.

>   https://pad.passt.top/p/TcpRepairTodo#L3
>   Repair mode sockets should not have address conflicts with non-repair
>   sockets (both bind() and connect())
>
> but even that part is convenient to have, I think,

I'm not really sure what you mean by that.

> so I'm a bit worried
> that somebody might take this comment as a to-do item, while I don't
> think it should be.
>
> Patch applied anyway, to give this as much testing time and exposure as
> possible.
>
> -- 
> Stefano
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
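
The conflict between the origin's and the destination's listening sockets
described above can be reproduced outside passt. What follows is a minimal
sketch, not code from the patch: listen_on(), bound_port() and
listen_conflict_demo() are hypothetical helpers, and the loopback address and
kernel-chosen port are assumptions made for illustration. It only covers the
"expected" case (listening socket against listening socket in one namespace),
not the repair-mode behaviour.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1:port and listen on it.
 * Port 0 lets the kernel pick a free port.
 * Returns the socket on success, negative errno on failure. */
int listen_on(in_port_t port)
{
	struct sockaddr_in a = {
		.sin_family = AF_INET,
		.sin_port = htons(port),
		.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
	};
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -errno;

	if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0 || listen(s, 1) < 0) {
		int err = errno;

		close(s);
		return -err;
	}
	return s;
}

/* Hypothetical helper: local port a socket is bound to, 0 on error */
in_port_t bound_port(int s)
{
	struct sockaddr_in a;
	socklen_t len = sizeof(a);

	if (getsockname(s, (struct sockaddr *)&a, &len) < 0)
		return 0;
	return ntohs(a.sin_port);
}

/* "origin" and "target" stand in for two instances in the same namespace */
void listen_conflict_demo(void)
{
	int origin = listen_on(0);
	in_port_t port;
	int target;

	assert(origin >= 0);
	port = bound_port(origin);
	assert(port != 0);

	/* Origin still listening: the same address can't be bound again */
	target = listen_on(port);
	assert(target == -EADDRINUSE);

	/* Once the origin closes its listening socket, the bind succeeds */
	close(origin);
	target = listen_on(port);
	assert(target >= 0);
	close(target);
}
```

Calling listen_conflict_demo() from a main() returns silently if the kernel
behaves as described. No socket options are set on either socket.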