public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Nir Dothan <ndothan@redhat.com>
Subject: Re: [PATCH] treewide: By default, don't quit source after migration, keep sockets open
Date: Wed, 23 Jul 2025 10:27:30 +1000	[thread overview]
Message-ID: <aIAscoA5yMLtBbTf@zatzit> (raw)
In-Reply-To: <20250722231249.22e09340@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 6540 bytes --]

On Tue, Jul 22, 2025 at 11:12:49PM +0200, Stefano Brivio wrote:
> On Tue, 22 Jul 2025 10:33:00 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Jul 22, 2025 at 12:12:58AM +0200, Stefano Brivio wrote:
> > > We are hitting an issue in the KubeVirt integration where some data is
> > > still sent to the source instance even after migration is complete. As
> > > we exit, the kernel closes our sockets and resets connections. The
> > > resulting RST segments are sent to peers, effectively terminating
> > > connections that were meanwhile migrated.
> > > 
> > > At the moment, this is not done intentionally, but in the future
> > > KubeVirt might enable OVN-Kubernetes features where source and
> > > destination nodes are explicitly getting mirrored traffic for a while,
> > > in order to decrease migration downtime.
> > > 
> > > By default, don't quit after migration is completed on the source: the
> > > previous behaviour can be enabled with the new --migrate-exit
> > > option.  
> > 
> > Might be worth explicitly mentioning that the reason this helps is
> > that having the sockets still around in repair mode prevents the
> > source node from responding to any traffic for them.
> 
> Added as separate paragraph, below.

Thanks.

[snip]
> > > diff --git a/conf.c b/conf.c
> > > index 6c747aa..5e69014 100644
> > > --- a/conf.c
> > > +++ b/conf.c
> > > @@ -864,6 +864,12 @@ static void usage(const char *name, FILE *f, int status)
> > >  		FPRINTF(f,
> > >  			"  --repair-path PATH	path for passt-repair(1)\n"
> > >  			"    default: append '.repair' to UNIX domain path\n");
> > > +		FPRINTF(f,
> > > +			"  --migrate-exit	source quits after migration\n"
> > > +			"    default: source keeps running after migration\n");
> > > +		FPRINTF(f,
> > > +			"  --migrate-no-linger	close sockets on migration\n"
> > > +			"    default: keep sockets open, ignore data events\n");  
> > 
> > I'm a bit uncomfortable adding this as regular, supported options.
> > I'm hoping we can eliminate them in the not too distant future.  In
> > the medium term we can do that by improving our test code to simulate
> > different hosts with namespaces.  Longer term we can allow local to
> > local migration without these options by fixing the kernel: I don't
> > believe bind() should fail if the address conflict is with a socket in
> > repair mode.
> 
> I see the point about --migrate-no-linger, but --migrate-exit looks
> like something that depends on the migration workflow, not necessarily
> something to cover up issues in our tests or in the kernel. In any
> case...

It does depend on the workflow, but I think this needs to be managed
by qemu (or whatever upper level component is managing the overall
migration).  --migrate-exit doesn't make us exit once the migration is
complete - it makes us exit once *our part of* the migration is
complete, the migration could still fail after that.  So in the happy
path, yes, we'll exit, but it should be because qemu or libvirt or
whatever is satisfied the migration is finished and kills us off

> > Further, as user accessible options these are kinda weird - what they
> > do is defined in terms of some pretty obscure internals, not in terms
> > of what the upshot is.  So, I think it would be better to introduce
> > these as deprecated / debug-only / undocumented options.
> 
> ...I marked both as deprecated in v2, to signal to any hypothetical
> (I'm fairly sure not real) user that they shouldn't assume it's safe to
> rely on them.
> 
> That's the only other category of options we have so far, we don't have
> undocumented ones, and I think it's a feature. Especially as we use
> them in the tests.

Works for me.

> By the way, I would argue that testing is an important type of usage,
> and changes in the test setup itself or in the kernel are non-trivial
> effort, and it's rather uncertain whether we'll ever manage to commit
> to that, so I don't quite feel that marking them as deprecated is
> entirely warranted, but I see the point about possibly dropping that
> migration model at some point in the future.
> 
> Regardless, the fact we *need* them at the moment should prove that
> they are at least convenient, even if just for testing.

Fair enough.

[snip]
> > > diff --git a/passt.c b/passt.c
> > > index 388d10f..f4c108f 100644
> > > --- a/passt.c
> > > +++ b/passt.c
> > > @@ -308,6 +308,9 @@ loop:
> > >  		      c.mode == MODE_PASTA ? "pasta" : "passt",
> > >  		      EPOLL_TYPE_STR(ref.type), ref.fd, eventmask);
> > >  
> > > +		if (c.ignore_data_events && ref.type <= EPOLL_TYPE_DATA_MAX)
> > > +			continue;
> > > +  
> > 
> > This worries me slightly.  I think it will be fine if passt is killed
> > not long after the migration.  However, if passt lingers, we're
> > ignoring, but not clearing events.  For those events which are
> > level-triggered this could make us chew CPU on a tight epoll_wait()
> > loop.
> 
> Ah, yes, I had that concern as well, but I conveniently forgot about it
> at some point.
> 
> > I think a preferable approach would be to EPOLL_CTL_DEL each socket at
> > the point we would close() them with --migrate-no-linger.  I'd be fine
> > with that as a follow up improvement, though.
> 
> I was pondering about adding this on top of the ignore_data_events
> trick, but, actually, the whole thing as of this patch is somewhat
> bogus because
> 
> - we're ignoring events on TCP sockets (intentional),
> 
> - we're ignoring events on the tap device (who cares, migration is only
>   supported with vhost-user)
> 
> - but *not* ignoring events on the vhost-user kick descriptor
>   (oversight).
> 
> On a second thought, it doesn't look safe to ignore events on the kick
> descriptor, and in any case, with this change, we don't want to prevent
> the guest to send out further packets. It's not expected anyway.
> 
> So I just replaced the whole thing with EPOLL_CTL_DEL (epoll_del()) as
> we go through the sockets. It's simpler and arguably safer.

Yes, that's what I had in mind as well (I thought I put that in the
mail, but it looks like I didn't).  Just one additional concern that I
don't think need hold up merge: do we also need to epoll_del() our
listening sockets?

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-07-23  0:27 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-21 22:12 [PATCH] treewide: By default, don't quit source after migration, keep sockets open Stefano Brivio
2025-07-22  0:33 ` David Gibson
2025-07-22 21:12   ` Stefano Brivio
2025-07-23  0:27     ` David Gibson [this message]
2025-07-23  9:17       ` Stefano Brivio
2025-07-24  1:48         ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIAscoA5yMLtBbTf@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=ndothan@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).