public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/1] migrate: Use forward table information to close() listening sockets
Date: Wed, 4 Feb 2026 21:57:03 +1000	[thread overview]
Message-ID: <aYM0D7lpl567ZNwN@zatzit> (raw)
In-Reply-To: <20260202231159.44c251bd@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 7522 bytes --]

On Mon, Feb 02, 2026 at 11:12:08PM +0100, Stefano Brivio wrote:
> On Mon, 2 Feb 2026 10:24:14 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Sat, Jan 31, 2026 at 10:47:28AM +0100, Stefano Brivio wrote:
> > > On Fri, 30 Jan 2026 16:58:11 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On incoming migrations we need to bind() reconstructed sockets to their
> > > > correct local address.  We can't do this if the origin passt instance is
> > > > in the same namespace and still has those addresses bound.  Arguably that's
> > > > a bug in bind()s operation during repair mode, but for now we have to work
> > > > around it.
> > > > 
> > > > So, to allow local-to-local migrations we close() sockets on the outgoing
> > > > side as we process them.  In addition to closing the connected socket we
> > > > also have to close the associated listen()ing socket, because that can also
> > > > cause an address conflict.
> > > > 
> > > > To do that, we introduced the listening_sock field in the connection
> > > > state, because we had no other way to find the right listening sockets.
> > > > Now that we have the forwarding table, we have a complete list of
> > > > listening sockets elsewhere.  We can use that instead, to close all
> > > > listening sockets on outbound migration, rather than just the ones that
> > > > might conflict.
> > > > 
> > > > This is cleaner and, importantly, saves a valuable 32-bits in the flow
> > > > state structure.  It does mean that there is a longer window where a peer
> > > > attempting to connect during migration might get a Connection Refused.
> > > > I think this is an acceptable trade-off for now: arguably we should not
> > > > allow local-to-local migrations in any case, since the socket closes make
> > > > it impossible to safely roll back migration as per the qemu model.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  flow.c     | 12 ++++++++++++
> > > >  fwd.c      | 21 +++++++++++++++++++++
> > > >  fwd.h      |  1 +
> > > >  tcp.c      |  9 ---------
> > > >  tcp_conn.h |  3 ---
> > > >  5 files changed, 34 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/flow.c b/flow.c
> > > > index fd4d5f38..5207143d 100644
> > > > --- a/flow.c
> > > > +++ b/flow.c
> > > > @@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
> > > >  
> > > >  	debug("...roll back migration");
> > > >  
> > > > +	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
> > > > +		die("Failed to re-establish listening sockets");
> > > > +
> > > >  	foreach_established_tcp_flow(flow) {
> > > >  		if (FLOW_IDX(flow) >= bound)
> > > >  			break;
> > > > @@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,  
> > > 
> > > Nit: the comment to this function currently says "Send data (flow
> > > table) for flow, close listening". I fixed that up (dropped ", close listening").  
> > 
> > Good point, thanks.
> > 
> > > >  		return flow_migrate_source_rollback(c, FLOW_MAX, rc);
> > > >  	}
> > > >  
> > > > +	/* HACK: A local to local migrate will fail if the origin passt has the
> > > > +	 * listening sockets still open when the destination passt tries to bind
> > > > +	 * them.  This does mean there's a window where we lost our listen()s,
> > > > +	 * even if the migration is rolled back later.  The only way to really
> > > > +	 * fix that is to not allow local to local migration, which arguably we
> > > > +	 * should (use namespaces for testing instead). */  
> > > 
> > > Actually, we already use namespaces in the current tests,  
> > 
> > Oh, nice.
> > 
> > > but we didn't
> > > (always) do that during development, and it might be convenient in
> > > general to have the possibility to test *a part* of the implementation
> > > using the same namespace as long as it's reasonably cheap (it seems to
> > > be).  
> > 
> > Depends what cost you're talking about.  It's cheap in terms of
> > computational complexity, and code compexity.  It means, however, that
> > we can't necessarily roll back failed migrations - i.e. resume on the
> > origin system.  That isn't really correct for the qemu migration
> > model, which is why I think allowing local migrations probably isn't
> > the best idea, at least by default.
> 
> Well it's not entirely true that we can't roll back failed migrations.
> 
> Depending on the stage we're at, we might close connections, but other
> than that we can always continue, somehow. Losing some connections
> looks relatively cheap as well.

I guess that's true. I'm not sure we currently handle this even as
well as is possible within the constraints.

> But sure, it's not ideal. Maybe yes, we shouldn't have that as default,
> add a new option and update QEMU's documentation (see below) with
> it.

There's two layers to it.  Dropping connections on what's otherwise a
no-op migration failure is ugly.  Opening the possibility that we
might not be able to rebind ports we were already listening on is
worse.

> > > That's just a part because anyway bind() and connect() will conflict,
> > > if we're in the same namespace, which is a kernel issue you already
> > > noted:  
> > 
> > Well, it's a kernel issue that the bound listen()ing sockets conflict
> > with the half-constructed flow sockets.  Having the listening sockets
> > of the origin passt conflict with the listening sockets of the
> > destination passt is pretty much expected, and would still be an
> > impediment to local migration.
> 
> Ah, right, we don't set listening sockets to repair mode... should we,
> by the way?

Uh.. I don't think so?  I'm not really sure how repair mode operates
for listening sockets, or if it should even allow that.  The relevant
state of a listening socket is pretty much limited to its bound
address, so I don't think we need any additional mechanism to extract
state.

> With the fix for freezing incoming TCP queues I'm working
> on, as a side effect, that would also mean that the kernel will ignore
> SYN segments altogether, which is desirable I think. Maybe it already
> happens for some other reason...?
> 
> But anyway, without forwarded ports you don't have listening sockets.

True.

> > >   https://pad.passt.top/p/TcpRepairTodo#L3
> > >   Repair mode sockets should not have address conflicts with non-repair
> > >   sockets (both bind() and connect())
> > > 
> > > but even that part is convenient to have, I think,  
> > 
> > I'm not really sure what you mean by that.
> 
> That we can still *partially* test local migration, without forwarded
> ports.

Ah, ok.

> We can check the interface, including passt-repair and all the
> vhost-user bits without actual connections, in a simpler way compared
> to namespaces.
> 
> See:
> 
>   https://www.qemu.org/docs/master/system/devices/net.html#example-of-migration-of-a-guest-on-the-same-host
> 
> and:
> 
>   https://lore.kernel.org/qemu-devel/20260131032700.12f27487@elisabeth/
> 
> it won't work for any practical case, but one can still test a big
> chunk of functionality that way.

Good point.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2026-02-04 12:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-30  5:58 [PATCH 0/1] migrate: Eliminate listening_sock field from TCP connection state David Gibson
2026-01-30  5:58 ` [PATCH 1/1] migrate: Use forward table information to close() listening sockets David Gibson
2026-01-31  9:47   ` Stefano Brivio
2026-02-02  0:24     ` David Gibson
2026-02-02 22:12       ` Stefano Brivio
2026-02-04 11:57         ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYM0D7lpl567ZNwN@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).