From: Hanna Czenczek <hreitz@redhat.com>
To: Stefano Brivio <sbrivio@redhat.com>,
German Maglione <gmaglione@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
passt-dev@passt.top, Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH v5 5/6] vhost_user: Make source quit after reporting migration state
Date: Wed, 5 Feb 2025 09:58:11 +0100
Message-ID: <9aa66ede-51f5-46b5-ac15-4dbe93ce8903@redhat.com>
In-Reply-To: <20250205064708.1af3ef28@elisabeth>
On 05.02.25 06:47, Stefano Brivio wrote:
> To: German whom I feel okay to To: now that
> https://github.com/kubevirt/kubevirt/pull/13756 is merged, and Hanna who
> knows a thing or two about vhost-user based migration.
>
> This is Stefano from your neighbouring mailing list, passt-dev. David
> is wondering:
>
> On Wed, 5 Feb 2025 13:09:42 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
>> On Wed, Feb 05, 2025 at 01:39:03AM +0100, Stefano Brivio wrote:
>>> On migration, the source process asks passt-helper to set TCP sockets
>>> in repair mode, dumps the information we need to migrate connections,
>>> and closes them.
>>>
>>> At this point, we can't pass them back to passt-helper using
>>> SCM_RIGHTS, because they are closed, from that perspective, and
>>> sendmsg() will give us EBADF. But if we don't clear repair mode, the
>>> port they are bound to will not be available for binding in the
>>> target.
>>>
>>> Terminate once we're done with the migration and we reported the
>>> state. This is equivalent to clearing repair mode on the sockets we
>>> just closed.
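(As an aside, since “repair mode” might not be familiar to everyone on
Cc: here is a minimal sketch, with a made-up dump_seq() helper, of what
setting repair mode and dumping sequence numbers looks like at the
socket level. This is not passt code; as the description above says,
passt itself has to ask the passt-repair helper to flip repair mode,
because it lacks CAP_NET_ADMIN.)

#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>	/* TCP_REPAIR, TCP_REPAIR_QUEUE, TCP_QUEUE_SEQ */

/* Made-up helper: freeze a connected TCP socket and read the sequence
 * numbers needed to re-create the connection on the target side. */
int dump_seq(int s, uint32_t *snd_seq, uint32_t *rcv_seq)
{
	int on = 1, q;
	socklen_t len = sizeof(uint32_t);

	/* Needs CAP_NET_ADMIN -- hence the separate helper in passt */
	if (setsockopt(s, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)))
		return -1;

	q = TCP_SEND_QUEUE;
	if (setsockopt(s, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) ||
	    getsockopt(s, IPPROTO_TCP, TCP_QUEUE_SEQ, snd_seq, &len))
		return -1;

	q = TCP_RECV_QUEUE;
	if (setsockopt(s, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) ||
	    getsockopt(s, IPPROTO_TCP, TCP_QUEUE_SEQ, rcv_seq, &len))
		return -1;

	/* close(s) now silently discards the connection (no FIN/RST),
	 * which is what the source does once the state is dumped */
	return 0;
}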
>> As we've discussed, quitting still makes sense, but the description
>> above is not really accurate. Perhaps,
>>
>> ===
>> Once we've passed the migration's "point of no return", there's no way
>> to resume the guest on the source side, because we no longer own the
>> connections. There's not really anything we can do except exit.
>> ===
>>
>> Except.. thinking about it, I'm not sure that's technically true.
>> After migration, the source qemu enters a kind of limbo state. I
>> suppose for the case of to-disk migration (savevm) the guest can
>> actually be resumed. Which for us is not really compatible with
>> completing at least a local migration properly. Not really sure what
>> to do about that.
I wouldn’t call it a limbo state; AFAIU, the source side just continues
to be the VM that it was, only paused. So in case of migration failure
on the destination (which is where virtiofsd migration errors are
reported), you can (and should, I believe?) resume the source VM. But
even in case of success, you could let it continue, and then you’d have
cloned the VM (which I think has always been a bit of a problem for
networking specifically).
>> I think it's also technically possible to use monitor commands to boot
>> up essentially an entirely new guest instance in the original qemu,
>> in which case for us it would make sense to basically reset ourselves
>> (flush the flow table).
>>
>> Hrm.. we really need to know the sequence of events in a bit more
>> detail to get this right (not that this stops improving the guts of
>> the logic in the meantime).
>>
>> I'm asking around to see if I can find who did the migration stuff for
>> virtiofsd, so we can compare notes.
> ...what we should be doing in the source passt at different stages of
> moving our TCP connections (or failing to move them) over to the target,
> which might be inspired by what you're doing with your... filesystem
> things in virtiofsd.
As far as I understand, with passt, there’s ownership that needs to be…
passet. (ha ha) Which is what leads to that “point of no return”. We
don’t have such a thing in virtiofsd, because ownership can be, and is,
shared between the source and destination instances once the migration
state starts being generated (and we know that the guest is stopped
during vhost-user migration).
So how it works in virtiofsd is (a rough sketch in C follows below):
- Depending on how it’s done, collecting our state can take too much
time to fit into the acceptable VM downtime. So once we see F_LOG_ALL
set, we begin collecting it. At this point, the guest is still running,
so we need to be able to change the collected state when the guest does
something that’d affect it.
- Once we receive SET_DEVICE_STATE, we serialize our state.*
- Errors are returned through CHECK_DEVICE_STATE.
- The source instance remains open and operational. If the (source)
guest is unpaused, it can continue to use the filesystem.
The destination instance meanwhile receives that state (while the guest
is stopped), deserializes and applies it, and is then ready to begin
operation whenever requests start coming in from the guest.
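In rough C-style pseudocode, the source (save) side of that flow amounts
to something like the following -- the helper names state_collect() and
state_serialize() are made up, this isn’t actual virtiofsd code:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define VHOST_F_LOG_ALL 26	/* feature bit from the vhost(-user) spec */

static bool state_collected;
static int transfer_failed;

static void state_collect(void)     { state_collected = true; }	/* stub */
static int  state_serialize(int fd) { (void)fd; return 0; }	/* stub */

/* SET_FEATURES with VHOST_F_LOG_ALL: the guest is still running, so we
 * start collecting state early rather than eating into downtime later */
void on_set_features(uint64_t features)
{
	if (features & (1ULL << VHOST_F_LOG_ALL))
		state_collect();
}

/* SET_DEVICE_STATE_FD, save direction: the guest is stopped now; if
 * F_LOG_ALL never arrived (background snapshot, see footnote below),
 * collect first, then serialize into the channel fd we were handed */
void on_set_device_state_fd(int fd)
{
	if (!state_collected)
		state_collect();
	if (state_serialize(fd) < 0)
		transfer_failed = 1;
	close(fd);
}

/* CHECK_DEVICE_STATE: where errors (if any) are reported back to qemu */
uint64_t on_check_device_state(void)
{
	return transfer_failed;	/* 0 means success */
}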
There is no point of no return for us, because the source instance
remains operational throughout; nothing is moved, just shared. Until
execution continues on the destination (which I think you can only
detect by seeing the virtqueue being started?), anything can still
report an error and thus fail migration, in which case execution
continues on the source.
Besides waiting to see which instance eventually continues execution,
the only thing that comes to mind that might be helpful is the
return-path feature; with it, the source instance will at least know
whether the destination migrated successfully or not. It isn’t
reflected in vhost-user (yet), though.
* Note that in case of snapshotting (migration to file) with the
background-snapshot migration flag set, F_LOG_ALL won’t be set. So we
must also prepare for receiving SET_DEVICE_STATE without F_LOG_ALL set,
in which case we must then first collect our state before we can
serialize it.
Hanna
> We're taking for granted that as long as we have a chance to detect
> failure (e.g. we can't dump sequence numbers from a TCP socket in the
> source), we should use that to abort the whole thing.
>
> Once we're past that point, we have several options. And actually,
> that's regardless of failure: because we also have the question of what
> to do if we see that nothing went wrong.
>
> We can exit in the source, for example (this is what the patch implements):
> wait for VHOST_USER_CHECK_DEVICE_STATE, report that, and quit.
>
> Or we can just clear up all our connections and resume (start from a
> blank state). Or do that, only if we didn't smell failure.
>
> Would you have some pointers, general guidelines, ideas? I know that
> the topic is a bit broad, but I'm hopeful that you have a lot of
> clear answers for us. :) Thanks.
>
>>> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
>>> ---
>>> vhost_user.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/vhost_user.c b/vhost_user.c
>>> index b107d0f..70773d6 100644
>>> --- a/vhost_user.c
>>> +++ b/vhost_user.c
>>> @@ -997,6 +997,8 @@ static bool vu_send_rarp_exec(struct vu_dev *vdev,
>>> return false;
>>> }
>>>
>>> +static bool quit_on_device_state = false;
>>> +
>>> /**
>>> * vu_set_device_state_fd_exec() - Set the device state migration channel
>>> * @vdev: vhost-user device
>>> @@ -1024,6 +1026,9 @@ static bool vu_set_device_state_fd_exec(struct vu_dev *vdev,
>>> migrate_request(vdev->context, msg->fds[0],
>>> direction == VHOST_USER_TRANSFER_STATE_DIRECTION_LOAD);
>>>
>>> + if (direction == VHOST_USER_TRANSFER_STATE_DIRECTION_SAVE)
>>> + quit_on_device_state = true;
>>> +
>>> /* We don't provide a new fd for the data transfer */
>>> vmsg_set_reply_u64(msg, VHOST_USER_VRING_NOFD_MASK);
>>>
>>> @@ -1201,4 +1206,10 @@ void vu_control_handler(struct vu_dev *vdev, int fd, uint32_t events)
>>>
>>> if (reply_requested)
>>> vu_send_reply(fd, &msg);
>>> +
>>> + if (quit_on_device_state &&
>>> + msg.hdr.request == VHOST_USER_CHECK_DEVICE_STATE) {
>>> + info("Migration complete, exiting");
>>> + exit(EXIT_SUCCESS);
>>> + }
>>> }