From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 1/1] migrate: Use forward table information to close() listening sockets
Message-ID: <20260131104727.2fbdfaff@elisabeth>
In-Reply-To: <20260130055811.2408284-2-david@gibson.dropbear.id.au>
References: <20260130055811.2408284-1-david@gibson.dropbear.id.au>
 <20260130055811.2408284-2-david@gibson.dropbear.id.au>
Organization: Red Hat
Date: Sat, 31 Jan 2026 10:47:28 +0100 (CET)
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Fri, 30 Jan 2026 16:58:11 +1100
David Gibson wrote:

> On incoming migrations we need to bind() reconstructed sockets to their
> correct local address. We can't do this if the origin passt instance is
> in the same namespace and still has those addresses bound. Arguably that's
> a bug in bind()s operation during repair mode, but for now we have to work
> around it.
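
For illustration only, here is a minimal sketch of the kind of address
conflict described above. It uses plain sockets rather than TCP_REPAIR
ones (repair mode needs CAP_NET_ADMIN), so it only shows the general
same-namespace conflict and how close() on the origin side resolves it,
not the repair-mode behaviour itself. bind_wildcard() and
demo_listen_conflict() are made-up names, not passt functions, and the
wildcard address is an arbitrary choice for the example.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to 0.0.0.0:port (port 0
 * lets the kernel pick one). No SO_REUSEADDR is set, on purpose.
 * Returns the socket on success, -errno on failure. */
static int bind_wildcard(in_port_t port)
{
	struct sockaddr_in a = { .sin_family = AF_INET,
				 .sin_port = htons(port),
				 .sin_addr = { .s_addr = htonl(INADDR_ANY) } };
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -errno;

	if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0) {
		int err = errno;

		close(s);
		return -err;
	}

	return s;
}

/* Returns 0 if a second bind() to the same port fails with EADDRINUSE
 * while the "origin" socket is listening, and succeeds once the origin
 * has closed it; -1 on any other outcome. */
int demo_listen_conflict(void)
{
	struct sockaddr_in a;
	socklen_t sl = sizeof(a);
	int origin, target;

	origin = bind_wildcard(0);
	if (origin < 0)
		return -1;

	if (listen(origin, 1) < 0 ||
	    getsockname(origin, (struct sockaddr *)&a, &sl) < 0) {
		close(origin);
		return -1;
	}
	assert(a.sin_family == AF_INET);

	/* "Target" tries the same port while "origin" still listens */
	target = bind_wildcard(ntohs(a.sin_port));
	if (target != -EADDRINUSE) {
		if (target >= 0)
			close(target);
		close(origin);
		return -1;
	}

	/* Origin closes its listening socket: the port is free again */
	close(origin);
	target = bind_wildcard(ntohs(a.sin_port));
	if (target < 0)
		return -1;

	close(target);
	return 0;
}
```

Calling demo_listen_conflict() from a small main() and checking for a
zero return is enough to try it out. Between the close() and the second
bind() nothing is listening on the port, which is the same kind of window
the commit message mentions.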
>
> So, to allow local-to-local migrations we close() sockets on the outgoing
> side as we process them. In addition to closing the connected socket we
> also have to close the associated listen()ing socket, because that can also
> cause an address conflict.
>
> To do that, we introduced the listening_sock field in the connection
> state, because we had no other way to find the right listening sockets.
> Now that we have the forwarding table, we have a complete list of
> listening sockets elsewhere. We can use that instead, to close all
> listening sockets on outbound migration, rather than just the ones that
> might conflict.
>
> This is cleaner and, importantly, saves a valuable 32-bits in the flow
> state structure. It does mean that there is a longer window where a peer
> attempting to connect during migration might get a Connection Refused.
> I think this is an acceptable trade-off for now: arguably we should not
> allow local-to-local migrations in any case, since the socket closes make
> it impossible to safely roll back migration as per the qemu model.
>
> Signed-off-by: David Gibson
> ---
>  flow.c     | 12 ++++++++++++
>  fwd.c      | 21 +++++++++++++++++++++
>  fwd.h      |  1 +
>  tcp.c      |  9 ---------
>  tcp_conn.h |  3 ---
>  5 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/flow.c b/flow.c
> index fd4d5f38..5207143d 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
>  
>  	debug("...roll back migration");
>  
> +	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
> +		die("Failed to re-establish listening sockets");
> +
>  	foreach_established_tcp_flow(flow) {
>  		if (FLOW_IDX(flow) >= bound)
>  			break;
> @@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,

Nit: the comment to this function currently says "Send data (flow table)
for flow, close listening". I fixed that up (dropped ", close
listening").

>  		return flow_migrate_source_rollback(c, FLOW_MAX, rc);
>  	}
>  
> +	/* HACK: A local to local migrate will fail if the origin passt has the
> +	 * listening sockets still open when the destination passt tries to bind
> +	 * them. This does mean there's a window where we lost our listen()s,
> +	 * even if the migration is rolled back later. The only way to really
> +	 * fix that is to not allow local to local migration, which arguably we
> +	 * should (use namespaces for testing instead). */

Actually, we already use namespaces in the current tests, but we didn't
(always) do that during development, and it might be convenient in
general to have the possibility to test *a part* of the implementation
using the same namespace as long as it's reasonably cheap (it seems to
be).

That's just a part because anyway bind() and connect() will conflict,
if we're in the same namespace, which is a kernel issue you already
noted:

  https://pad.passt.top/p/TcpRepairTodo#L3

  Repair mode sockets should not have address conflicts with non-repair
  sockets (both bind() and connect())

but even that part is convenient to have, I think, so I'm a bit worried
that somebody might take this comment as a to-do item, while I don't
think it should be.

Patch applied anyway, to give this as much testing time and exposure as
possible.

-- 
Stefano