From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair
Date: Wed, 26 Feb 2025 09:09:48 +0100 [thread overview]
Message-ID: <20250226090948.3d1fff91@elisabeth> (raw)
In-Reply-To: <Z75f9IDhnLS7UmDW@zatzit>
On Wed, 26 Feb 2025 11:27:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2025 16:51:30 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > From Red Hat internal testing we've had some reports that if
> > > attempting to migrate without passt-repair, the failure mode is uglier
> > > than we'd like. The migration fails, which is somewhat expected, but
> > > we don't correctly roll things back on the source, so it breaks
> > > network there as well.
> > >
> > > Handle this more gracefully allowing the migration to proceed in this
> > > case, but allow TCP connections to break
> > >
> > > I've now tested this reasonably:
> > > * I get a clean migration if there are now active flows
> > > * Migration completes, although connections are broken if
> > > passt-repair isn't connected
> > > * Basic test suite (minus perf)
> > >
> > > I didn't manage to test with libvirt yet, but I'm pretty convinced the
> > > behaviour should be better than it was.
> >
> > I did, and it is. The series looks good to me and I would apply it as
> > it is, but I'm waiting a bit longer in case you want to try out some
> > variations based on my tests as well. Here's what I did.
>
> [snip]
>
> Thanks for the detailed instructions. More complex than I might have
> liked, but oh well.
>
> > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > Migration: [97.59 %]error: End of file while reading data: : Input/output error
> >
> > ...despite --verbose the error doesn't tell much (perhaps I need
> > LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
> > this series (I just used 'make install' from the local build), migration
> > succeeds instead:
> >
> > $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> > Migration: [100.00 %]
> >
> > Now, on the target, I still have to figure out how to tell libvirt
> > to start QEMU and prepare for the migration (equivalent of '-incoming'
> > as we use in our tests), instead of just starting a new instance like
> > it does. Otherwise, I have no chance to start passt-repair there.
> > Perhaps it has something to do with persistent mode described here:
>
> Ah. So I'm pretty sure virsh migrate will automatically start qemu
> with --incoming on the target.
("-incoming"), yes, see src/qemu/qemu_migration.c,
qemuMigrationDstPrepare().
> IIUC the problem here is more about
> timing: we want it to start it early, so that we have a chance to
> start passt-repair and let it connect before the migration actually
> happens.
For the timing itself, we could actually wait for passt-repair to be
there, with a timeout (say, 100ms).
We could also modify passt-repair to set up an inotify watcher if the
socket isn't there yet.
> Crud... I didn't think of this before. I don't know that there's any
> sensible way to do this without having libvirt managing passt-repair
> as well.
But we can't really use it as we're assuming that passt-repair will run
with capabilities virtqemud doesn't want/need.
> I mean it's not impossible there's some option to do this,
> but I doubt there's been any reason before for something outside of
> libvirt to control the timing of the target qemu's creation. I think
> we need to ask libvirt people about this.
I'm looking into it (and perhaps virtiofsd had similar troubles?).
> > https://libvirt.org/migration.html#configuration-file-handling
>
> Yeah.. I don't think this is relevant.
>
> > and --listen-address, but I'm not quite sure yet.
> >
> > That is, I could only test different failures (early one on source, or
> > later one on target) with this, not a complete successful migration.
> >
> > > There are more fragile cases that I'm looking to fix, particularly the
> > > die()s in flow_migrate_source_rollback() and elsewhere, however I ran
> > > into various complications that I didn't manage to sort out today.
> > > I'll continue looking at those tomorrow. I'm now pretty confident
> > > that those additional fixes won't entirely supersede the changes in
> > > this series, so it should be fine to apply these on their own.
> >
> > By the way, I think the somewhat less fragile/more obvious case where
> > we fail clumsily is when the target doesn't have the same address as
> > the source (among other possible addresses). In that case, we fail (and
> > terminate) with a rather awkward:
>
> Ah, yes, that is a higher priority fragile case.
>
> > 93.7217: ERROR: Failed to bind socket for migrated flow: Cannot assign requested address
> > 93.7218: ERROR: Flow 0 (TCP connection): Can't set up socket: (null), drop
> > 93.7331: ERROR: Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket
> > 93.7333: ERROR: Unexpected reply from TCP_REPAIR helper: -100
> >
> > that's because, oops, I only took care of socket() failures in
> > tcp_flow_repair_socket(), but not bind() failures (!). Sorry.
>
> No, you check for errors on both.
Well, "check", yes, but I'm not even setting an error code. I haven't
tried your 3/3 yet but look at "(null)" resulting from:
flow_err(flow, "Can't set up socket: %s, drop", strerror_(rc));
...rc is 0.
> The problem is that in
> tcp_flow_migrate_target() we cancel the flow allocation and carry on -
> but the source will still send information for this flow, putting us
> out of sync with the stream.
That, too, yes.
> > Once that's fixed, flow_migrate_target() should also take care of
> > decreasing 'count' accordingly. I just had a glimpse but didn't
> > really try to sketch a fix.
>
> Adjusting count won't do the job. Instead we'd need to keep the flow
> around, but marked as "dead" somehow, so that we read but discard the
> incoming information for it. The MIGRATING state I added in one of my
> drafts was supposed to help with this sort of thing. But that's quite
> a complex change.
I think it's great that you could (practically) solve it with three
lines...
> Hrm... at least in the near term, I think it might actually be easier
> to set IP_FREEBIND when we create sockets for in-migrating flows.
> That way we can process them normally, they just won't do much without
> the address set. It has the additional advantage that it should work
> if the higher layers only move the IP just after the migration,
> instead of in advance.
Perhaps we want it anyway, but I wonder: what happens if we turn repair
mode off and we bound to a non-local address? I suppose we won't send
out anything, but I'm not sure. If we send out the first keep-alive
segment with a wrong address, we probably ruined the connection.
Once I find a solution for the target libvirt/passt-repair thing (and
the remaining SELinux issues), I'll try to have a look at this too. I
haven't tried yet a migration with a mismatching address on the target
and passt-repair available.
--
Stefano
next prev parent reply other threads:[~2025-02-26 8:09 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 5:51 [PATCH v2 0/2] More graceful handling of migration without passt-repair David Gibson
2025-02-25 5:51 ` [PATCH v2 1/2] migrate, flow: Trivially succeed if migrating with no flows David Gibson
2025-02-25 5:51 ` [PATCH v2 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair David Gibson
2025-02-25 17:43 ` [PATCH v2 0/2] More graceful handling of migration " Stefano Brivio
2025-02-26 0:27 ` David Gibson
2025-02-26 8:09 ` Stefano Brivio [this message]
2025-02-26 8:51 ` David Gibson
2025-02-26 11:24 ` Stefano Brivio
2025-02-27 1:43 ` David Gibson
2025-02-27 4:32 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250226090948.3d1fff91@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).