public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Laurent Vivier <lvivier@redhat.com>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 8/9] vhost-user: add VHOST_USER_SET_DEVICE_STATE_FD command
Date: Fri, 20 Dec 2024 08:56:27 +0100	[thread overview]
Message-ID: <61d7f502-6f4d-40bc-b3a5-feed4229585d@redhat.com> (raw)
In-Reply-To: <20241219204759.65bc5353@elisabeth>

On 19/12/2024 20:47, Stefano Brivio wrote:
> On Thu, 19 Dec 2024 12:13:59 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> +/**
>> + * vu_migrate() -- Send/receive passt insternal state to/from QEMU
> 
> Magic!
> 
>> + * @vdev:	vhost-user device
>> + * @events:	epoll events
>> + */
>> +void vu_migrate(struct vu_dev *vdev, uint32_t events)
>> +{
>> +	int ret;
>> +
>> +	/* TODO: collect/set passt internal state
>> +         *       and use vdev->device_state_fd to send/receive it
>> +         */
>> +	debug("vu_migrate fd %d events %x", vdev->device_state_fd, events);
>> +	if (events & EPOLLOUT) {
> 
> I haven't really reviewed the series yet, but I have a preliminary
> question: how does the hypervisor tell us that we're writing too much?

The hypervisor is not aware of anything. It's only a bridge between the two passt instances.
But normally the destination side can report an error, this aborts the migration on both 
sides.

In QEMU, the code is:

     while (true) {
         ssize_t read_ret;

	/* read the data from the backend */
         read_ret = RETRY_ON_EINTR(read(read_fd, transfer_buf, chunk_size));
         if (read_ret < 0) {
             ret = -errno;
             error_setg_errno(errp, -ret, "Failed to receive state");
             goto fail;
         }

         assert(read_ret <= chunk_size);
         qemu_put_be32(f, read_ret);

         if (read_ret == 0) {
             /* EOF */
             break;
         }

	/* send to destination QEMU */
         qemu_put_buffer(f, transfer_buf, read_ret);
     }

     /*
      * Back-end will not really care, but be clean and close our end of the pipe
      * before inquiring the back-end about whether transfer was successful
      */
     close(read_fd);

On the other side:

     while (true) {
         size_t this_chunk_size = qemu_get_be32(f);
         ssize_t write_ret;
         const uint8_t *transfer_pointer;

         if (this_chunk_size == 0) {
             /* End of state */
             break;
         }

         if (transfer_buf_size < this_chunk_size) {
             transfer_buf = g_realloc(transfer_buf, this_chunk_size);
             transfer_buf_size = this_chunk_size;
         }

         if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
                 this_chunk_size)
         {
             error_setg(errp, "Failed to read state");
             ret = -EINVAL;
             goto fail;
         }

         transfer_pointer = transfer_buf;
         while (this_chunk_size > 0) {
             write_ret = RETRY_ON_EINTR(
                 write(write_fd, transfer_pointer, this_chunk_size)
             );
             if (write_ret < 0) {
                 ret = -errno;
                 error_setg_errno(errp, -ret, "Failed to send state");
                 goto fail;
             } else if (write_ret == 0) {
                 error_setg(errp, "Failed to send state: Connection is closed");
                 ret = -ECONNRESET;
                 goto fail;
             }

             assert(write_ret <= this_chunk_size);
             this_chunk_size -= write_ret;
             transfer_pointer += write_ret;
         }
     }

     /*
      * Close our end, thus ending transfer, before inquiring the back-end about
      * whether transfer was successful
      */
     close(write_fd);


Moreover, I think it's important to know, the source side is stopped at the end of 
migration but it is in a state it can be restarted.

> I guess we'll do a short write and we'll need to go back to EPOLLOUT?
> There's no minimum chunk size we can write, correct?

Correct. It works even if there is no write() in this code.
The hypervisor reads until we close the file descriptor (it's a pipe in fact).

> 
>> +		debug("Saving backend state");
>> +
>> +		/* send some stuff */
>> +		ret = write(vdev->device_state_fd, "PASST", 6);
>> +		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
>> +		vdev->device_state_result = ret == -1 ? -1 : 0;
>> +		/* Closing the file descriptor signals the end of transfer */
>> +		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
>> +			  vdev->device_state_fd, NULL);
>> +		close(vdev->device_state_fd);
>> +		vdev->device_state_fd = -1;
>> +	} else if (events & EPOLLIN) {
> 
> ...and similarly here, I guess we'll get a short read?

We must read until we get a close().

> 
>> +		char buf[6];
>> +
>> +		debug("Loading backend state");
>> +		/* read some stuff */
>> +		ret = read(vdev->device_state_fd, buf, sizeof(buf));
>> +		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
>> +		if (ret != sizeof(buf)) {
>> +			vdev->device_state_result = -1;
>> +		} else {
>> +			ret = strncmp(buf, "PASST", sizeof(buf));
>> +			vdev->device_state_result = ret == 0 ? 0 : -1;
>> +		}
>> +	} else if (events & EPOLLHUP) {
>> +		debug("Closing migration channel");
>> +
>> +		/* The end of file signals the end of the transfer. */
>> +		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
>> +			  vdev->device_state_fd, NULL);
>> +		close(vdev->device_state_fd);
>> +		vdev->device_state_fd = -1;
>> +	}
>> +}
> 

Thanks,
Laurent


  reply	other threads:[~2024-12-20  7:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-19 11:13 [PATCH 0/9] vhost-user: Migration support Laurent Vivier
2024-12-19 11:13 ` [PATCH 1/9] virtio: Use const pointer for vu_dev Laurent Vivier
2024-12-20  0:24   ` David Gibson
2024-12-19 11:13 ` [PATCH 2/9] vhost-user: update protocol features and commands list Laurent Vivier
2024-12-19 11:13 ` [PATCH 3/9] vhost-user: add VHOST_USER_SET_LOG_FD command Laurent Vivier
2024-12-19 11:13 ` [PATCH 4/9] vhost-user: Pass vu_dev to more virtio functions Laurent Vivier
2024-12-19 11:13 ` [PATCH 5/9] vhost-user: add VHOST_USER_SET_LOG_BASE command Laurent Vivier
2024-12-19 11:13 ` [PATCH 6/9] vhost-user: Report to front-end we support VHOST_USER_PROTOCOL_F_LOG_SHMFD Laurent Vivier
2024-12-19 11:13 ` [PATCH 7/9] vhost-user: add VHOST_USER_CHECK_DEVICE_STATE command Laurent Vivier
2024-12-19 11:13 ` [PATCH 8/9] vhost-user: add VHOST_USER_SET_DEVICE_STATE_FD command Laurent Vivier
2024-12-19 19:47   ` Stefano Brivio
2024-12-20  7:56     ` Laurent Vivier [this message]
2024-12-20 13:28       ` Stefano Brivio
2024-12-19 11:14 ` [PATCH 9/9] vhost-user: Report to front-end we support VHOST_USER_PROTOCOL_F_DEVICE_STATE Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61d7f502-6f4d-40bc-b3a5-feed4229585d@redhat.com \
    --to=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).