public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair
Date: Thu, 27 Feb 2025 05:32:19 +0100	[thread overview]
Message-ID: <20250227053219.01e76539@elisabeth> (raw)
In-Reply-To: <Z7_DTVnEZFYlRMJl@zatzit>

On Thu, 27 Feb 2025 12:43:41 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 26, 2025 at 12:24:12PM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2025 19:51:11 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Feb 26, 2025 at 09:09:48AM +0100, Stefano Brivio wrote:  
> > > > On Wed, 26 Feb 2025 11:27:32 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:    
> > > > > > On Tue, 25 Feb 2025 16:51:30 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > From Red Hat internal testing we've had some reports that if
> > > > > > > attempting to migrate without passt-repair, the failure mode is uglier
> > > > > > > than we'd like.  The migration fails, which is somewhat expected, but
> > > > > > > we don't correctly roll things back on the source, so it breaks
> > > > > > > networking there as well.
> > > > > > > 
> > > > > > > Handle this more gracefully, allowing the migration to proceed in
> > > > > > > this case but letting TCP connections break.
> > > > > > > 
> > > > > > > I've now tested this reasonably:
> > > > > > >  * I get a clean migration if there are no active flows
> > > > > > >  * Migration completes, although connections are broken if
> > > > > > >    passt-repair isn't connected
> > > > > > >  * Basic test suite (minus perf)
> > > > > > > 
> > > > > > > I didn't manage to test with libvirt yet, but I'm pretty convinced the
> > > > > > > behaviour should be better than it was.      
> > > > > > 
> > > > > > I did, and it is. The series looks good to me and I would apply it as
> > > > > > it is, but I'm waiting a bit longer in case you want to try out some
> > > > > > variations based on my tests as well. Here's what I did.      
> > > > > 
> > > > > [snip]
> > > > > 
> > > > > Thanks for the detailed instructions.  More complex than I might have
> > > > > liked, but oh well.
> > > > >     
> > > > > >   $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > > > > >   Migration: [97.59 %]error: End of file while reading data: : Input/output error
> > > > > > 
> > > > > > ...despite --verbose the error doesn't tell much (perhaps I need
> > > > > > LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
> > > > > > this series (I just used 'make install' from the local build), migration
> > > > > > succeeds instead:
> > > > > > 
> > > > > >   $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > > > > >   Migration: [100.00 %]
> > > > > > 
> > > > > > Now, on the target, I still have to figure out how to tell libvirt
> > > > > > to start QEMU and prepare for the migration (equivalent of '-incoming'
> > > > > > as we use in our tests), instead of just starting a new instance like
> > > > > > it does. Otherwise, I have no chance to start passt-repair there.
> > > > > > Perhaps it has something to do with persistent mode described here:      
> > > > > 
> > > > > Ah.  So I'm pretty sure virsh migrate will automatically start qemu
> > > > > with --incoming on the target.    
> > > > 
> > > > ("-incoming"), yes, see src/qemu/qemu_migration.c,
> > > > qemuMigrationDstPrepare().
> > > >     
> > > > > IIUC the problem here is more about
> > > > > timing: we want it to start it early, so that we have a chance to
> > > > > start passt-repair and let it connect before the migration actually
> > > > > happens.    
> > > > 
> > > > For the timing itself, we could actually wait for passt-repair to be
> > > > there, with a timeout (say, 100ms).    
> > > 
> > > I guess.  That still requires some way for KubeVirt (or whatever) to
> > > know at least roughly when it needs to launch passt-repair, and I'm
> > > not sure if that's something we can currently get from libvirt.  
> > 
> > KubeVirt sets up the target pod, and that's when it should be done (if
> > we have an inotify mechanism or similar). I can't point to an exact code
> > path yet but there's something like that.  
> 
> Right, but that approach does require inotify and starting
> passt-repair before passt, which might be fine, but I have the concern
> noted below.  To avoid that we'd need notification after passt & qemu
> are started on the target, but before the migration is actually
> initiated, which I don't think libvirt provides.
> 
> > > > We could also modify passt-repair to set up an inotify watcher if the
> > > > socket isn't there yet.    
> > > 
> > > Maybe, yes.  This kind of breaks our "passt starts first, passt-repair
> > > connects to it" model though, and I wonder if we need to revisit the
> > > security implications of that.  
> > 
> > I don't think it actually breaks that model for security purposes,
> > because the guest has no chance anyway to cause a connection to
> > passt-repair. The guest is still suspended (or missing) at that point.  
> 
> I wasn't thinking of threat models coming from the guest, but an
> attack from an unrelated process impersonating passt in order to
> access passt-repair's superpowers.

Then an inotify watch shouldn't substantially change things. The
attacker could create the socket earlier and obtain the same outcome.

> [...]
>
> > We could even think of deferring switching repair mode off until the
> > right address is there, by the way. That would make a difference to
> > us.  
> 
> Do you mean by blocking?  Or by returning to normal operation with the
> flow flagged somehow to be "woken up" by a netlink monitor?

The latter. I don't think we should block connectivity (with new
addresses) meanwhile.
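A rough sketch of such a netlink monitor: subscribe to rtnetlink
address notifications and invoke a callback for each new address, from
which a flow flagged as waiting could be woken up to finally switch
repair mode off. addr_monitor_open(), addr_monitor_recv() and the
callback interface are hypothetical, not passt's real flow handling:

```c
/*
 * Sketch only: an rtnetlink monitor for newly configured addresses.
 * Helper names and the callback interface are hypothetical.
 */
#include <arpa/inet.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a netlink socket subscribed to IPv4/IPv6 address notifications */
static int addr_monitor_open(void)
{
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR,
	};
	int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (s < 0)
		return -1;

	if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(s);
		return -1;
	}

	return s;
}

/* Read one batch of messages, calling cb() for each new address */
static void addr_monitor_recv(int s, void (*cb)(const char *addr))
{
	char buf[8192], pres[INET6_ADDRSTRLEN];
	int n = recv(s, buf, sizeof(buf), 0);
	struct nlmsghdr *nh;

	for (nh = (struct nlmsghdr *)buf; n > 0 && NLMSG_OK(nh, n);
	     nh = NLMSG_NEXT(nh, n)) {
		struct ifaddrmsg *ifa = NLMSG_DATA(nh);
		struct rtattr *rta;
		size_t len;

		if (nh->nlmsg_type != RTM_NEWADDR)
			continue;

		for (rta = IFA_RTA(ifa), len = IFA_PAYLOAD(nh);
		     RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
			if (rta->rta_type != IFA_ADDRESS)
				continue;

			if (inet_ntop(ifa->ifa_family, RTA_DATA(rta),
				      pres, sizeof(pres)))
				cb(pres);	/* new address: wake flows */
		}
	}
}
```

The monitor fd would sit in the main epoll loop alongside everything
else, so connectivity with new addresses keeps working while waiting.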

-- 
Stefano



Thread overview: 10+ messages
2025-02-25  5:51 [PATCH v2 0/2] More graceful handling of migration without passt-repair David Gibson
2025-02-25  5:51 ` [PATCH v2 1/2] migrate, flow: Trivially succeed if migrating with no flows David Gibson
2025-02-25  5:51 ` [PATCH v2 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair David Gibson
2025-02-25 17:43 ` [PATCH v2 0/2] More graceful handling of migration " Stefano Brivio
2025-02-26  0:27   ` David Gibson
2025-02-26  8:09     ` Stefano Brivio
2025-02-26  8:51       ` David Gibson
2025-02-26 11:24         ` Stefano Brivio
2025-02-27  1:43           ` David Gibson
2025-02-27  4:32             ` Stefano Brivio [this message]

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
