From: Stefano Brivio
To: David Gibson
CC: passt-dev@passt.top
Subject: Re: [PATCH 1/1] migrate: Use forward table information to close() listening sockets
Date: Mon, 02 Feb 2026 23:12:08 +0100 (CET)
Message-ID: <20260202231159.44c251bd@elisabeth>
References: <20260130055811.2408284-1-david@gibson.dropbear.id.au>
 <20260130055811.2408284-2-david@gibson.dropbear.id.au>
 <20260131104727.2fbdfaff@elisabeth>
Organization: Red Hat

On Mon, 2 Feb 2026 10:24:14 +1000
David Gibson wrote:

> On Sat, Jan 31, 2026 at 10:47:28AM +0100, Stefano Brivio wrote:
> > On Fri, 30 Jan 2026 16:58:11 +1100
> > David Gibson wrote:
> > 
> > > On incoming migrations we need to bind() reconstructed sockets to their
> > > correct local address. We can't do this if the origin passt instance is
> > > in the same namespace and still has those addresses bound. Arguably that's
> > > a bug in bind()'s operation during repair mode, but for now we have to work
> > > around it.
> > > 
> > > So, to allow local-to-local migrations we close() sockets on the outgoing
> > > side as we process them. In addition to closing the connected socket we
> > > also have to close the associated listen()ing socket, because that can also
> > > cause an address conflict.
> > > 
> > > To do that, we introduced the listening_sock field in the connection
> > > state, because we had no other way to find the right listening sockets.
> > > Now that we have the forwarding table, we have a complete list of
> > > listening sockets elsewhere. We can use that instead, to close all
> > > listening sockets on outbound migration, rather than just the ones that
> > > might conflict.
> > > 
> > > This is cleaner and, importantly, saves a valuable 32 bits in the flow
> > > state structure. It does mean that there is a longer window where a peer
> > > attempting to connect during migration might get a Connection Refused.
> > > I think this is an acceptable trade-off for now: arguably we should not
> > > allow local-to-local migrations in any case, since the socket closes make
> > > it impossible to safely roll back migration as per the qemu model.
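
To make the trade-off above concrete, here is a minimal sketch with plain Linux sockets, assuming a single forwarded TCP port on 127.0.0.1 and origin and destination in the same network namespace. It is not passt code: struct hyp_fwd_table and the hyp_*() helpers are hypothetical stand-ins for the forwarding table, for closing its listening sockets on outbound migration, and for what the patch uses fwd_listen_sync() for on rollback. It shows that the destination's bind() fails with EADDRINUSE while the origin still listens, that it succeeds once the origin closes every socket listed in the table, and that connect() is refused in between.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical, simplified forwarding table: one listening socket per
 * forwarded TCP port. This is not passt's actual table layout. */
#define HYP_FWD_MAX	8

struct hyp_fwd_table {
	size_t count;
	in_port_t port[HYP_FWD_MAX];	/* forwarded port, host byte order */
	int fd[HYP_FWD_MAX];		/* listening socket, or -1 if closed */
};

/* Build the IPv4 loopback address for @port */
static struct sockaddr_in hyp_addr(in_port_t port)
{
	struct sockaddr_in a = { .sin_family = AF_INET,
				 .sin_port = htons(port),
				 .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
	return a;
}

/* Listen on 127.0.0.1:@port (0: ephemeral), return fd or -errno */
static int hyp_listen(in_port_t port)
{
	struct sockaddr_in a = hyp_addr(port);
	int s = socket(AF_INET, SOCK_STREAM, 0), err;

	if (s < 0)
		return -errno;
	if (bind(s, (struct sockaddr *)&a, sizeof(a)) == 0 &&
	    listen(s, 1) == 0)
		return s;
	err = errno;		/* save errno before close() can clobber it */
	close(s);
	return -err;
}

/* Outbound migration: close every listening socket found in the table, so
 * that a destination instance in the same namespace can bind() the ports */
static void hyp_fwd_close_all(struct hyp_fwd_table *t)
{
	assert(t->count <= HYP_FWD_MAX);
	for (size_t i = 0; i < t->count; i++) {
		if (t->fd[i] >= 0)
			close(t->fd[i]);
		t->fd[i] = -1;
	}
}

/* Rollback: re-open any listening socket missing from the table,
 * return 0, or -errno for the first port that can't be bound */
static int hyp_fwd_reopen_all(struct hyp_fwd_table *t)
{
	assert(t->count <= HYP_FWD_MAX);
	for (size_t i = 0; i < t->count; i++) {
		int s;

		if (t->fd[i] >= 0)
			continue;
		if ((s = hyp_listen(t->port[i])) < 0)
			return s;
		t->fd[i] = s;
	}
	return 0;
}

/* Try to connect to 127.0.0.1:@port, return 0 or -errno */
static int hyp_try_connect(in_port_t port)
{
	struct sockaddr_in a = hyp_addr(port);
	int s = socket(AF_INET, SOCK_STREAM, 0), rc = 0;

	if (s < 0)
		return -errno;
	if (connect(s, (struct sockaddr *)&a, sizeof(a)) < 0)
		rc = -errno;
	close(s);
	return rc;
}
```

A possible sequence mirroring the patch: the origin calls hyp_fwd_close_all() on outbound migration, the destination binds the same ports, and if migration fails the origin calls hyp_fwd_reopen_all(). A peer connecting while the table is closed gets ECONNREFUSED; closing everything listed in the table, instead of only the sockets that might conflict, is what makes that window longer.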
> > > 
> > > Signed-off-by: David Gibson
> > > ---
> > >  flow.c     | 12 ++++++++++++
> > >  fwd.c      | 21 +++++++++++++++++++++
> > >  fwd.h      |  1 +
> > >  tcp.c      |  9 ---------
> > >  tcp_conn.h |  3 ---
> > >  5 files changed, 34 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/flow.c b/flow.c
> > > index fd4d5f38..5207143d 100644
> > > --- a/flow.c
> > > +++ b/flow.c
> > > @@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> > >  
> > >  	debug("...roll back migration");
> > >  
> > > +	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
> > > +		die("Failed to re-establish listening sockets");
> > > +
> > >  	foreach_established_tcp_flow(flow) {
> > >  		if (FLOW_IDX(flow) >= bound)
> > >  			break;
> > > @@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
> > 
> > Nit: the comment to this function currently says "Send data (flow
> > table) for flow, close listening". I fixed that up (dropped ", close
> > listening").
> 
> Good point, thanks.
> 
> > >  		return flow_migrate_source_rollback(c, FLOW_MAX, rc);
> > >  	}
> > >  
> > > +	/* HACK: A local to local migrate will fail if the origin passt has the
> > > +	 * listening sockets still open when the destination passt tries to bind
> > > +	 * them. This does mean there's a window where we lost our listen()s,
> > > +	 * even if the migration is rolled back later. The only way to really
> > > +	 * fix that is to not allow local to local migration, which arguably we
> > > +	 * should (use namespaces for testing instead). */
> > 
> > Actually, we already use namespaces in the current tests,
> 
> Oh, nice.
> 
> > but we didn't
> > (always) do that during development, and it might be convenient in
> > general to have the possibility to test *a part* of the implementation
> > using the same namespace as long as it's reasonably cheap (it seems to
> > be).
> 
> Depends what cost you're talking about. It's cheap in terms of
> computational complexity, and code complexity. It means, however, that
> we can't necessarily roll back failed migrations - i.e. resume on the
> origin system. That isn't really correct for the qemu migration
> model, which is why I think allowing local migrations probably isn't
> the best idea, at least by default.

Well, it's not entirely true that we can't roll back failed migrations.
Depending on the stage we're at, we might close connections, but other
than that we can always continue, somehow. Losing some connections looks
relatively cheap as well. But sure, it's not ideal.

Maybe, yes: we shouldn't have that as the default. We could add a new
option for it, and update QEMU's documentation (see below) accordingly.

> > That's just a part because anyway bind() and connect() will conflict,
> > if we're in the same namespace, which is a kernel issue you already
> > noted:
> 
> Well, it's a kernel issue that the bound listen()ing sockets conflict
> with the half-constructed flow sockets. Having the listening sockets
> of the origin passt conflict with the listening sockets of the
> destination passt is pretty much expected, and would still be an
> impediment to local migration.

Ah, right, we don't set listening sockets to repair mode... should we, by
the way?

With the fix for freezing incoming TCP queues I'm working on, as a side
effect, that would also mean that the kernel ignores SYN segments
altogether, which I think is desirable. Maybe it already happens for
some other reason...?

But anyway, without forwarded ports you don't have listening sockets.

> > https://pad.passt.top/p/TcpRepairTodo#L3
> > Repair mode sockets should not have address conflicts with non-repair
> > sockets (both bind() and connect())
> > 
> > but even that part is convenient to have, I think,
> 
> I'm not really sure what you mean by that.

That we can still *partially* test local migration, without forwarded
ports. We can check the interface, including passt-repair and all the
vhost-user bits, without actual connections, more simply than with
namespaces. See:

  https://www.qemu.org/docs/master/system/devices/net.html#example-of-migration-of-a-guest-on-the-same-host

and:

  https://lore.kernel.org/qemu-devel/20260131032700.12f27487@elisabeth/

It won't work for any practical case, but one can still test a big chunk
of functionality that way.

-- 
Stefano