From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/1] migrate: Use forward table information to close() listening sockets
Date: Thu, 05 Feb 2026 01:17:52 +0100 (CET)
Message-ID: <20260205011752.65ba41c6@elisabeth>
References: <20260130055811.2408284-1-david@gibson.dropbear.id.au>
	<20260130055811.2408284-2-david@gibson.dropbear.id.au>
	<20260131104727.2fbdfaff@elisabeth>
	<20260202231159.44c251bd@elisabeth>
Organization: Red Hat

On Wed, 4 Feb 2026 21:57:03 +1000
David Gibson wrote:

> On Mon, Feb 02, 2026 at 11:12:08PM +0100, Stefano Brivio wrote:
> > On Mon, 2 Feb 2026 10:24:14 +1000
> > David Gibson wrote:
> >
> > > On Sat, Jan 31, 2026 at 10:47:28AM +0100, Stefano Brivio wrote:
> > > > On Fri, 30 Jan 2026 16:58:11 +1100
> > > > David Gibson wrote:
> > > >
> > > > > On incoming migrations we need to bind() reconstructed sockets to their
> > > > > correct local address. We can't do this if the origin passt instance is
> > > > > in the same namespace and still has those addresses bound. Arguably that's
> > > > > a bug in bind()'s operation during repair mode, but for now we have to
> > > > > work around it.
> > > > >
> > > > > So, to allow local-to-local migrations we close() sockets on the outgoing
> > > > > side as we process them. In addition to closing the connected socket we
> > > > > also have to close the associated listen()ing socket, because that can
> > > > > also cause an address conflict.
> > > > >
> > > > > To do that, we introduced the listening_sock field in the connection
> > > > > state, because we had no other way to find the right listening sockets.
> > > > > Now that we have the forwarding table, we have a complete list of
> > > > > listening sockets elsewhere. We can use that instead, to close all
> > > > > listening sockets on outbound migration, rather than just the ones that
> > > > > might conflict.
> > > > >
> > > > > This is cleaner and, importantly, saves a valuable 32 bits in the flow
> > > > > state structure. It does mean that there is a longer window where a peer
> > > > > attempting to connect during migration might get a Connection Refused.
> > > > > I think this is an acceptable trade-off for now: arguably we should not
> > > > > allow local-to-local migrations in any case, since the socket closes make
> > > > > it impossible to safely roll back migration as per the qemu model.
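For anybody reading along in the archive: the effect of this change on
the outgoing side is, roughly, a single sweep over the forwarding
table. Just as an illustration (names and types below are made up, the
actual implementation is the fwd.c hunk in this patch):

	/* Hypothetical sketch, not passt code: with a complete list of
	 * listening sockets in the forwarding table, the source can
	 * close them all in one pass instead of tracking a
	 * listening_sock reference in every flow. */
	#include <unistd.h>

	#define NUM_PORTS (1U << 16)

	struct fwd_listeners {
		int fd[NUM_PORTS];	/* listening socket, or -1 */
	};

	static void fwd_close_listeners(struct fwd_listeners *f)
	{
		unsigned i;

		for (i = 0; i < NUM_PORTS; i++) {
			if (f->fd[i] < 0)
				continue;

			/* Frees the address for the target instance */
			close(f->fd[i]);
			f->fd[i] = -1;
		}
	}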
> > > > >
> > > > > Signed-off-by: David Gibson
> > > > > ---
> > > > >  flow.c     | 12 ++++++++++++
> > > > >  fwd.c      | 21 +++++++++++++++++++++
> > > > >  fwd.h      |  1 +
> > > > >  tcp.c      |  9 ---------
> > > > >  tcp_conn.h |  3 ---
> > > > >  5 files changed, 34 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/flow.c b/flow.c
> > > > > index fd4d5f38..5207143d 100644
> > > > > --- a/flow.c
> > > > > +++ b/flow.c
> > > > > @@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> > > > >
> > > > >  	debug("...roll back migration");
> > > > >
> > > > > +	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
> > > > > +		die("Failed to re-establish listening sockets");
> > > > > +
> > > > >  	foreach_established_tcp_flow(flow) {
> > > > >  		if (FLOW_IDX(flow) >= bound)
> > > > >  			break;
> > > > > @@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
> > > >
> > > > Nit: the comment to this function currently says "Send data (flow
> > > > table) for flow, close listening". I fixed that up (dropped ", close
> > > > listening").
> > >
> > > Good point, thanks.
> > >
> > > > >  	return flow_migrate_source_rollback(c, FLOW_MAX, rc);
> > > > >  }
> > > > >
> > > > > +	/* HACK: A local to local migrate will fail if the origin passt has the
> > > > > +	 * listening sockets still open when the destination passt tries to bind
> > > > > +	 * them. This does mean there's a window where we've lost our listen()s,
> > > > > +	 * even if the migration is rolled back later. The only way to really
> > > > > +	 * fix that is to not allow local to local migration, which arguably we
> > > > > +	 * should (use namespaces for testing instead). */
> > > >
> > > > Actually, we already use namespaces in the current tests,
> > >
> > > Oh, nice.
> > >
> > > > but we didn't
> > > > (always) do that during development, and it might be convenient in
> > > > general to have the possibility to test *a part* of the implementation
> > > > using the same namespace as long as it's reasonably cheap (it seems to
> > > > be).
> > >
> > > Depends what cost you're talking about. It's cheap in terms of
> > > computational complexity, and code complexity. It means, however, that
> > > we can't necessarily roll back failed migrations - i.e. resume on the
> > > origin system. That isn't really correct for the qemu migration
> > > model, which is why I think allowing local migrations probably isn't
> > > the best idea, at least by default.
> >
> > Well, it's not entirely true that we can't roll back failed migrations.
> >
> > Depending on the stage we're at, we might close connections, but other
> > than that we can always continue, somehow. Losing some connections
> > looks relatively cheap as well.
>
> I guess that's true. I'm not sure we currently handle this even as
> well as is possible within the constraints.

I actually checked a while ago: nothing unexpected happened.

> > But sure, it's not ideal. Maybe yes, we shouldn't have that as the
> > default: add a new option and update QEMU's documentation (see below)
> > with it.
>
> There are two layers to it. Dropping connections on what's otherwise a
> no-op migration failure is ugly. Opening the possibility that we
> might not be able to rebind ports we were already listening on is
> worse.

But that would only happen if somebody starts an unrelated process
binding the same ports, right? I'm not sure it's something we really
need to take care of.
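To spell the failure mode out, a contrived illustration (plain sockets
API, not passt code): once the source close()s a listener, re-binding
the same port on rollback can fail with EADDRINUSE if an unrelated
process grabbed the port in between.

	#include <arpa/inet.h>
	#include <errno.h>
	#include <netinet/in.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <unistd.h>

	/* Try to (re-)listen on a TCP port, as a rollback would */
	static int listen_again(in_port_t port)
	{
		struct sockaddr_in a = {
			.sin_family	= AF_INET,
			.sin_port	= htons(port),
			.sin_addr	= { .s_addr = htonl(INADDR_ANY) },
		};
		int s = socket(AF_INET, SOCK_STREAM, 0);

		if (s < 0)
			return -1;

		if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0 ||
		    listen(s, SOMAXCONN) < 0) {
			/* EADDRINUSE if somebody else won the race */
			fprintf(stderr, "port %u: %s\n", port,
				strerror(errno));
			close(s);
			return -1;
		}
		return s;
	}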
> > > > That's just a part, because anyway bind() and connect() will conflict,
> > > > if we're in the same namespace, which is a kernel issue you already
> > > > noted:
> > >
> > > Well, it's a kernel issue that the bound listen()ing sockets conflict
> > > with the half-constructed flow sockets. Having the listening sockets
> > > of the origin passt conflict with the listening sockets of the
> > > destination passt is pretty much expected, and would still be an
> > > impediment to local migration.
> >
> > Ah, right, we don't set listening sockets to repair mode... should we,
> > by the way?
>
> Uh.. I don't think so? I'm not really sure how repair mode operates
> for listening sockets, or if it should even allow that. The relevant
> state of a listening socket is pretty much limited to its bound
> address, so I don't think we need any additional mechanism to extract
> state.

I wasn't suggesting that for any functional purpose. Repair mode has
no meaning there. But it would have the advantage, if we fix/change
this in the kernel:

  https://pad.passt.top/p/TcpRepairTodo#L3
  "Repair mode sockets should not have address conflicts with
   non-repair sockets (both bind() and connect())"

that bind() calls wouldn't conflict with bound sockets in repair mode,
and it would let us test a bit more of local migration.

On the other hand, I'm not sure what should happen once you bring a
socket, which was originally bound to a given port, out of repair
mode, once there's a new socket bound to that port.

-- 
Stefano
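P.S. Since repair mode came up: for reference, entering and leaving it
is just a socket option, along these lines (minimal sketch, not the
passt implementation; Linux-specific, needs CAP_NET_ADMIN; the
constant comes from linux/tcp.h in case <netinet/tcp.h> lacks it):

	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <sys/socket.h>

	#ifndef TCP_REPAIR
	#define TCP_REPAIR 19
	#endif

	/* on: 1 enters repair mode, 0 leaves it. While in repair mode,
	 * connection state (sequence numbers, queues, ...) can be set
	 * on the socket instead of being negotiated with the peer. */
	static int set_repair(int s, int on)
	{
		return setsockopt(s, IPPROTO_TCP, TCP_REPAIR,
				  &on, sizeof(on));
	}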