From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair
Date: Thu, 27 Feb 2025 05:32:19 +0100 [thread overview]
Message-ID: <20250227053219.01e76539@elisabeth> (raw)
In-Reply-To: <Z7_DTVnEZFYlRMJl@zatzit>
On Thu, 27 Feb 2025 12:43:41 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Wed, Feb 26, 2025 at 12:24:12PM +0100, Stefano Brivio wrote:
> > On Wed, 26 Feb 2025 19:51:11 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > On Wed, Feb 26, 2025 at 09:09:48AM +0100, Stefano Brivio wrote:
> > > > On Wed, 26 Feb 2025 11:27:32 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > > On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:
> > > > > > On Tue, 25 Feb 2025 16:51:30 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >
> > > > > > > From Red Hat internal testing we've had some reports that if
> > > > > > > attempting to migrate without passt-repair, the failure mode is uglier
> > > > > > > than we'd like. The migration fails, which is somewhat expected, but
> > > > > > > we don't correctly roll things back on the source, so it breaks
> > > > > > > network there as well.
> > > > > > >
> > > > > > > Handle this more gracefully allowing the migration to proceed in this
> > > > > > > case, but allow TCP connections to break
> > > > > > >
> > > > > > > I've now tested this reasonably:
> > > > > > > * I get a clean migration if there are now active flows
> > > > > > > * Migration completes, although connections are broken if
> > > > > > > passt-repair isn't connected
> > > > > > > * Basic test suite (minus perf)
> > > > > > >
> > > > > > > I didn't manage to test with libvirt yet, but I'm pretty convinced the
> > > > > > > behaviour should be better than it was.
> > > > > >
> > > > > > I did, and it is. The series looks good to me and I would apply it as
> > > > > > it is, but I'm waiting a bit longer in case you want to try out some
> > > > > > variations based on my tests as well. Here's what I did.
> > > > >
> > > > > [snip]
> > > > >
> > > > > Thanks for the detailed instructions. More complex than I might have
> > > > > liked, but oh well.
> > > > >
> > > > > > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > > > > > Migration: [97.59 %]error: End of file while reading data: : Input/output error
> > > > > >
> > > > > > ...despite --verbose the error doesn't tell much (perhaps I need
> > > > > > LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
> > > > > > this series (I just used 'make install' from the local build), migration
> > > > > > succeeds instead:
> > > > > >
> > > > > > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > > > > > Migration: [100.00 %]
> > > > > >
> > > > > > Now, on the target, I still have to figure out how to tell libvirt
> > > > > > to start QEMU and prepare for the migration (equivalent of '-incoming'
> > > > > > as we use in our tests), instead of just starting a new instance like
> > > > > > it does. Otherwise, I have no chance to start passt-repair there.
> > > > > > Perhaps it has something to do with persistent mode described here:
> > > > >
> > > > > Ah. So I'm pretty sure virsh migrate will automatically start qemu
> > > > > with --incoming on the target.
> > > >
> > > > ("-incoming"), yes, see src/qemu/qemu_migration.c,
> > > > qemuMigrationDstPrepare().
> > > >
> > > > > IIUC the problem here is more about
> > > > > timing: we want it to start it early, so that we have a chance to
> > > > > start passt-repair and let it connect before the migration actually
> > > > > happens.
> > > >
> > > > For the timing itself, we could actually wait for passt-repair to be
> > > > there, with a timeout (say, 100ms).
> > >
> > > I guess. That still requires some way for KubeVirt (or whatever) to
> > > know at least roughly when it needs to launch passt-repair, and I'm
> > > not sure if that's something we can currently get from libvirt.
> >
> > KubeVirt sets up the target pod, and that's when it should be done (if
> > we have an inotify mechanism or similar). I can't point to an exact code
> > path yet but there's something like that.
>
> Right, but that approach does require inotify and starting
> passt-repair before passt, which might be fine, but I have the concern
> noted below. To avoid that we'd need notification after passt & qemu
> are started on the target, but before the migration is actually
> initiated which I don't think libvirt provides.
>
> > > > We could also modify passt-repair to set up an inotify watcher if the
> > > > socket isn't there yet.
> > >
> > > Maybe, yes. This kind of breaks our "passt starts first, passt-repair
> > > connects to it" model though, and I wonder if we need to revisit the
> > > security implications of that.
> >
> > I don't think it actually breaks that model for security purposes,
> > because the guest doesn't have anyway a chance to cause a connection to
> > passt-repair. The guest is still suspended (or missing) at that point.
>
> I wasn't thinking of threat models coming from the guest, but an
> attack from an unrelated process impersonating passt in order to
> access passt-repair's superpowers.
Then an inotify watch shouldn't substantially change things. The
attacker could create the socket earlier and obtain the same outcome.
> [...]
>
> > We could even think of deferring switching repair mode off until the
> > right address is there, by the way. That would make a difference to
> > us.
>
> Do you mean by blocking? Or by returning to normal operation with the
> flow flagged somehow to be "woken up" by a netlink monitor?
The latter. I don't think we should block connectivity (with new
addresses) meanwhile.
--
Stefano
prev parent reply other threads:[~2025-02-27 4:32 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 5:51 [PATCH v2 0/2] More graceful handling of migration without passt-repair David Gibson
2025-02-25 5:51 ` [PATCH v2 1/2] migrate, flow: Trivially succeed if migrating with no flows David Gibson
2025-02-25 5:51 ` [PATCH v2 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair David Gibson
2025-02-25 17:43 ` [PATCH v2 0/2] More graceful handling of migration " Stefano Brivio
2025-02-26 0:27 ` David Gibson
2025-02-26 8:09 ` Stefano Brivio
2025-02-26 8:51 ` David Gibson
2025-02-26 11:24 ` Stefano Brivio
2025-02-27 1:43 ` David Gibson
2025-02-27 4:32 ` Stefano Brivio [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250227053219.01e76539@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).