Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair
Date: Wed, 26 Feb 2025 09:09:48 +0100	[thread overview]
Message-ID: <20250226090948.3d1fff91@elisabeth> (raw)
In-Reply-To: <Z75f9IDhnLS7UmDW@zatzit>

On Wed, 26 Feb 2025 11:27:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:
> > On Tue, 25 Feb 2025 16:51:30 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > From Red Hat internal testing we've had some reports that if
> > > attempting to migrate without passt-repair, the failure mode is uglier
> > > than we'd like.  The migration fails, which is somewhat expected, but
> > > we don't correctly roll things back on the source, so it breaks
> > > network there as well.
> > > 
> > > Handle this more gracefully allowing the migration to proceed in this
> > > case, but allow TCP connections to break
> > > 
> > > I've now tested this reasonably:
> > >  * I get a clean migration if there are now active flows
> > >  * Migration completes, although connections are broken if
> > >    passt-repair isn't connected
> > >  * Basic test suite (minus perf)
> > > 
> > > I didn't manage to test with libvirt yet, but I'm pretty convinced the
> > > behaviour should be better than it was.  
> > 
> > I did, and it is. The series looks good to me and I would apply it as
> > it is, but I'm waiting a bit longer in case you want to try out some
> > variations based on my tests as well. Here's what I did.  
> 
> [snip]
> 
> Thanks for the detailed instructions.  More complex than I might have
> liked, but oh well.
> 
> >   $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> >   Migration: [97.59 %]error: End of file while reading data: : Input/output error
> > 
> > ...despite --verbose the error doesn't tell much (perhaps I need
> > LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
> > this series (I just used 'make install' from the local build), migration
> > succeeds instead:
> > 
> >   $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
> >   Migration: [100.00 %]
> > 
> > Now, on the target, I still have to figure out how to tell libvirt
> > to start QEMU and prepare for the migration (equivalent of '-incoming'
> > as we use in our tests), instead of just starting a new instance like
> > it does. Otherwise, I have no chance to start passt-repair there.
> > Perhaps it has something to do with persistent mode described here:  
> 
> Ah.  So I'm pretty sure virsh migrate will automatically start qemu
> with --incoming on the target.

("-incoming"), yes, see src/qemu/qemu_migration.c,
qemuMigrationDstPrepare().

> IIUC the problem here is more about
> timing: we want it to start it early, so that we have a chance to
> start passt-repair and let it connect before the migration actually
> happens.

For the timing itself, we could actually wait for passt-repair to be
there, with a timeout (say, 100ms).

We could also modify passt-repair to set up an inotify watcher if the
socket isn't there yet.

> Crud... I didn't think of this before.  I don't know that there's any
> sensible way to do this without having libvirt managing passt-repair
> as well.

But we can't really use it as we're assuming that passt-repair will run
with capabilities virtqemud doesn't want/need.

> I mean it's not impossible there's some option to do this,
> but I doubt there's been any reason before for something outside of
> libvirt to control the timing of the target qemu's creation.  I think
> we need to ask libvirt people about this.

I'm looking into it (and perhaps virtiofsd had similar troubles?).

> >   https://libvirt.org/migration.html#configuration-file-handling  
> 
> Yeah.. I don't think this is relevant.
> 
> > and --listen-address, but I'm not quite sure yet.
> > 
> > That is, I could only test different failures (early one on source, or
> > later one on target) with this, not a complete successful migration.
> >   
> > > There are more fragile cases that I'm looking to fix, particularly the
> > > die()s in flow_migrate_source_rollback() and elsewhere, however I ran
> > > into various complications that I didn't manage to sort out today.
> > > I'll continue looking at those tomorrow.  I'm now pretty confident
> > > that those additional fixes won't entirely supersede the changes in
> > > this series, so it should be fine to apply these on their own.  
> > 
> > By the way, I think the somewhat less fragile/more obvious case where
> > we fail clumsily is when the target doesn't have the same address as
> > the source (among other possible addresses). In that case, we fail (and
> > terminate) with a rather awkward:  
> 
> Ah, yes, that is a higher priority fragile case.
> 
> >   93.7217: ERROR:   Failed to bind socket for migrated flow: Cannot assign requested address
> >   93.7218: ERROR:   Flow 0 (TCP connection): Can't set up socket: (null), drop
> >   93.7331: ERROR:   Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket
> >   93.7333: ERROR:   Unexpected reply from TCP_REPAIR helper: -100
> > 
> > that's because, oops, I only took care of socket() failures in
> > tcp_flow_repair_socket(), but not bind() failures (!). Sorry.  
> 
> No, you check for errors on both.

Well, "check", yes, but I'm not even setting an error code. I haven't
tried your 3/3 yet but look at "(null)" resulting from:

 		flow_err(flow, "Can't set up socket: %s, drop", strerror_(rc));

...rc is 0.

> The problem is that in
> tcp_flow_migrate_target() we cancel the flow allocation and carry on -
> but the source will still send information for this flow, putting us
> out of sync with the stream.

That, too, yes.

> > Once that's fixed, flow_migrate_target() should also take care of
> > decreasing 'count' accordingly. I just had a glimpse but didn't
> > really try to sketch a fix.  
> 
> Adjusting count won't do the job.  Instead we'd need to keep the flow
> around, but marked as "dead" somehow, so that we read but discard the
> incoming information for it.  The MIGRATING state I added in one of my
> drafts was supposed to help with this sort of thing.  But that's quite
> a complex change.

I think it's great that you could (practically) solve it with three
lines...

> Hrm... at least in the near term, I think it might actually be easier
> to set IP_FREEBIND when we create sockets for in-migrating flows.
> That way we can process them normally, they just won't do much without
> the address set.  It has the additional advantage that it should work
> if the higher layers only move the IP just after the migration,
> instead of in advance.

Perhaps we want it anyway, but I wonder: what happens if we turn repair
mode off and we bound to a non-local address? I suppose we won't send
out anything, but I'm not sure. If we send out the first keep-alive
segment with a wrong address, we probably ruined the connection.

Once I find a solution for the target libvirt/passt-repair thing (and
the remaining SELinux issues), I'll try to have a look at this too. I
haven't tried yet a migration with a mismatching address on the target
and passt-repair available.

-- 
Stefano

next prev parent reply	other threads:[~2025-02-26  8:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-25  5:51 David Gibson
2025-02-25  5:51 ` [PATCH v2 1/2] migrate, flow: Trivially succeed if migrating with no flows David Gibson
2025-02-25  5:51 ` [PATCH v2 2/2] migrate, flow: Don't attempt to migrate TCP flows without passt-repair David Gibson
2025-02-25 17:43 ` [PATCH v2 0/2] More graceful handling of migration " Stefano Brivio
2025-02-26  0:27   ` David Gibson
2025-02-26  8:09     ` Stefano Brivio [this message]
2025-02-26  8:51       ` David Gibson
2025-02-26 11:24         ` Stefano Brivio
2025-02-27  1:43           ` David Gibson
2025-02-27  4:32             ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250226090948.3d1fff91@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).