Date: Wed, 29 Jan 2025 12:16:58 +1100
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Laurent Vivier
Subject: Re: [PATCH 6/7] Introduce facilities for guest migration on top of vhost-user infrastructure
References: <20250127231532.672363-1-sbrivio@redhat.com> <20250127231532.672363-7-sbrivio@redhat.com> <20250128075001.3557d398@elisabeth>
In-Reply-To: <20250128075001.3557d398@elisabeth>

On Tue, Jan 28, 2025 at 07:50:01AM +0100, Stefano Brivio wrote:
> On Tue, 28 Jan 2025 12:40:12 +1100
> David Gibson wrote:
>
> > On Tue, Jan 28, 2025 at 12:15:31AM +0100, Stefano Brivio wrote:
> > > Add two sets (source or target) of three functions each for passt in
> > > vhost-user mode, triggered by activity on the file descriptor passed
> > > via VHOST_USER_PROTOCOL_F_DEVICE_STATE:
> > >
> > > - migrate_source_pre() and migrate_target_pre() are called to prepare
> > >   for migration, before data is transferred
> > >
> > > - migrate_source() sends, and migrate_target() receives migration data
> > >
> > > - migrate_source_post() and migrate_target_post() are responsible for
> > >   any post-migration task
> > >
> > > Callbacks are added to these functions with arrays of function
> > > pointers in migrate.c. Migration handlers are versioned.
> > >
> > > Versioned descriptions of data sections will be added to the
> > > data_versions array, which points to versioned iovec arrays. Version
> > > 1 is currently empty and will be filled in in subsequent patches.
> > >
> > > The source announces the data version to be used and informs the peer
> > > about endianness, and the size of void *, time_t, flow entries and
> > > flow hash table entries.
> > >
> > > The target checks if the version of the source is still supported. If
> > > it's not, it aborts the migration.
> > >
> > > Signed-off-by: Stefano Brivio
> > > ---
> > >  Makefile    |  12 +--
> > >  migrate.c   | 259 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  migrate.h   |  90 ++++++++++++++++++
> > >  passt.c     |   2 +-
> > >  vu_common.c | 122 ++++++++++++++++---------
> > >  vu_common.h |   2 +-
> > >  6 files changed, 438 insertions(+), 49 deletions(-)
> > >  create mode 100644 migrate.c
> > >  create mode 100644 migrate.h
> > >
> > > diff --git a/Makefile b/Makefile
> > > index 464eef1..1383875 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -38,8 +38,8 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
> > >
> > >  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
> > >  	icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
> > > -	ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \
> > > -	tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> > > +	ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c tap.c \
> > > +	tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> > >  	vhost_user.c virtio.c vu_common.c
> > >  QRAP_SRCS = qrap.c
> > >  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> > > @@ -48,10 +48,10 @@ MANPAGES = passt.1 pasta.1 qrap.1
> > >
> > >  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
> > >  	flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
> > > -	lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \
> > > -	siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \
> > > -	tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h vhost_user.h \
> > > -	virtio.h vu_common.h
> > > +	lineread.h log.h migrate.h ndp.h netlink.h packet.h passt.h pasta.h \
> > > +	pcap.h pif.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h \
> > > +	tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h \
> > > +	vhost_user.h virtio.h vu_common.h
> > >  HEADERS = $(PASST_HEADERS) seccomp.h
> > >
> > >  C := \#include \nint main(){int a=getrandom(0, 0, 0);}
> > > diff --git a/migrate.c b/migrate.c
> > > new file mode 100644
> > > index 0000000..bee9653
> > > --- /dev/null
> > > +++ b/migrate.c
> > > @@ -0,0 +1,259 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +
> > > +/* PASST - Plug A Simple Socket Transport
> > > + *  for qemu/UNIX domain socket mode
> > > + *
> > > + * PASTA - Pack A Subtle Tap Abstraction
> > > + *  for network namespace/tap device mode
> > > + *
> > > + * migrate.c - Migration sections, layout, and routines
> > > + *
> > > + * Copyright (c) 2025 Red Hat GmbH
> > > + * Author: Stefano Brivio
> > > + */
> > > +
> > > +#include
> > > +#include
> > > +
> > > +#include "util.h"
> > > +#include "ip.h"
> > > +#include "passt.h"
> > > +#include "inany.h"
> > > +#include "flow.h"
> > > +#include "flow_table.h"
> > > +
> > > +#include "migrate.h"
> > > +
> > > +/* Current version of migration data */
> > > +#define MIGRATE_VERSION		1
> > > +
> > > +/* Magic as we see it and as seen with reverse endianness */
> > > +#define MIGRATE_MAGIC		0xB1BB1D1B0BB1D1B0
> > > +#define MIGRATE_MAGIC_SWAPPED	0xB0D1B10B1B1DBBB1
> >
> > As noted, I'm hoping we can get rid of "either endian" migration. But
> > if this stays, we should define it using __bswap_constant_32() to
> > avoid embarrassing mistakes.
>
> Those always give me issues on musl,

What sort of issues? We're already using them, and have fallback
versions defined in util.h

> so I'd rather test things on
> big-endian and realise it's actually 0xB0D1B1B01B1DBBB1 (0x0b bitswap).
>
> Feel free to post a different proposal if tested.
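
A minimal sketch of deriving the swapped magic from MIGRATE_MAGIC at
compile time, so the two values cannot drift apart. BSWAP64_CONST() is
a hypothetical stand-in written out for illustration, not the util.h
fallback mentioned above; it swaps 64 bits because the magic is 64 bits
wide. The compile-time check passes for the literal quoted from the
patch, 0xB0D1B10B1B1DBBB1:

```c
#include <stdint.h>

/* Hypothetical helper (an assumption, not from the patch or util.h):
 * reverse the byte order of a 64-bit constant at compile time.
 */
#define BSWAP64_CONST(x)					\
	((((uint64_t)(x) & 0x00000000000000ffULL) << 56) |	\
	 (((uint64_t)(x) & 0x000000000000ff00ULL) << 40) |	\
	 (((uint64_t)(x) & 0x0000000000ff0000ULL) << 24) |	\
	 (((uint64_t)(x) & 0x00000000ff000000ULL) <<  8) |	\
	 (((uint64_t)(x) & 0x000000ff00000000ULL) >>  8) |	\
	 (((uint64_t)(x) & 0x0000ff0000000000ULL) >> 24) |	\
	 (((uint64_t)(x) & 0x00ff000000000000ULL) >> 40) |	\
	 (((uint64_t)(x) & 0xff00000000000000ULL) >> 56))

/* Magic as we see it, and as seen with reverse endianness */
#define MIGRATE_MAGIC		0xB1BB1D1B0BB1D1B0ULL
#define MIGRATE_MAGIC_SWAPPED	BSWAP64_CONST(MIGRATE_MAGIC)

/* Fails the build if the derived value ever differs from the literal */
_Static_assert(MIGRATE_MAGIC_SWAPPED == 0xB0D1B10B1B1DBBB1ULL,
	       "swapped magic must be the byte-reversed magic");
```

Deriving one constant from the other removes the need to verify the
second literal by hand or on a big-endian machine.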
>
> > > +
> > > +/* Migration header to send from source */
> > > +static union migrate_header header = {
> > > +	.magic = MIGRATE_MAGIC,
> > > +	.version = htonl_constant(MIGRATE_VERSION),
> > > +	.time_t_size = htonl_constant(sizeof(time_t)),
> > > +	.flow_size = htonl_constant(sizeof(union flow)),
> > > +	.flow_sidx_size = htonl_constant(sizeof(struct flow_sidx)),
> > > +	.voidp_size = htonl_constant(sizeof(void *)),
> > > +};
> > > +
> > > +/* Data sections for version 1 */
> > > +static struct iovec sections_v1[] = {
> > > +	{ &header, sizeof(header) },
> > > +};
> > > +
> > > +/* Set of data versions */
> > > +static struct migrate_data data_versions[] = {
> > > +	{
> > > +		1, sections_v1,
> > > +	},
> > > +	{ 0 },
> > > +};
> > > +
> > > +/* Handlers to call in source before sending data */
> > > +struct migrate_handler handlers_source_pre[] = {
> > > +	{ 0 },
> > > +};
> > > +
> > > +/* Handlers to call in source after sending data */
> > > +struct migrate_handler handlers_source_post[] = {
> > > +	{ 0 },
> > > +};
> > > +
> > > +/* Handlers to call in target before receiving data with version 1 */
> > > +struct migrate_handler handlers_target_pre_v1[] = {
> > > +	{ 0 },
> > > +};
> > > +
> > > +/* Handlers to call in target after receiving data with version 1 */
> > > +struct migrate_handler handlers_target_post_v1[] = {
> > > +	{ 0 },
> > > +};
> > > +
> > > +/* Versioned sets of migration handlers */
> > > +struct migrate_target_handlers target_handlers[] = {
> > > +	{
> > > +		1,
> > > +		handlers_target_pre_v1,
> > > +		handlers_target_post_v1,
> > > +	},
> > > +	{ 0 },
> > > +};
> > > +
> > > +/**
> > > + * migrate_source_pre() - Pre-migration tasks as source
> > > + * @m:		Migration metadata
> > > + *
> > > + * Return: 0 on success, error code on failure
> > > + */
> > > +int migrate_source_pre(struct migrate_meta *m)
> > > +{
> > > +	struct migrate_handler *h;
> > > +
> > > +	for (h = handlers_source_pre; h->fn; h++) {
> > > +		int rc;
> > > +
> > > +		if ((rc = h->fn(m, h->data)))
> > > +			return rc;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * migrate_source() - Perform migration as source: send state to hypervisor
> > > + * @fd:		Descriptor for state transfer
> > > + * @m:		Migration metadata
> > > + *
> > > + * Return: 0 on success, error code on failure
> > > + */
> > > +int migrate_source(int fd, const struct migrate_meta *m)
> > > +{
> > > +	static struct migrate_data *d;
> > > +	unsigned count;
> > > +	int rc;
> > > +
> > > +	for (d = data_versions; d->v != MIGRATE_VERSION; d++);
> >
> > Should ASSERT() if we don't find the version within the array.
>
> This looks a bit unnecessary, MIGRATE_VERSION is defined just above...
> it's just a readability killer to me.
>
> > > +	for (count = 0; d->sections[count].iov_len; count++);
> > > +
> > > +	debug("Writing %u migration sections", count - 1 /* minus header */);
> > > +	rc = write_remainder(fd, d->sections, count, 0);
> > > +	if (rc < 0)
> > > +		return errno;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * migrate_source_post() - Post-migration tasks as source
> > > + * @m:		Migration metadata
> > > + *
> > > + * Return: 0 on success, error code on failure
> > > + */
> > > +void migrate_source_post(struct migrate_meta *m)
> > > +{
> > > +	struct migrate_handler *h;
> > > +
> > > +	for (h = handlers_source_post; h->fn; h++)
> > > +		h->fn(m, h->data);
> >
> > Is there actually anything we might need to do on the source after a
> > successful migration, other than exit?
>
> We might want to log a couple of things, which would warrant these
> handlers.
>
> But let's say we need to do something *similar* to "updating the
> network" such as the RARP announcement that QEMU is requesting (this is

IIUC, that's on the target end, not the source end...

> intended for OVN-Kubernetes, so go figure), or that we need a
> workaround for a kernel issue with implicit close() with TCP_REPAIR
> on... I would leave this in for completeness.

...but sure, point taken.

> > > +}
> > > +
> > > +/**
> > > + * migrate_target_read_header() - Set metadata in target from source header
> > > + * @fd:		Descriptor for state transfer
> > > + * @m:		Migration metadata, filled on return
> > > + *
> > > + * Return: 0 on success, error code on failure
> >
> > We nearly always use negative error codes. Why not here?
>
> Because the reply to VHOST_USER_SET_DEVICE_STATE_FD is unsigned:
>
>   https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#front-end-message-types
>
> and I want to keep this consistent/untranslated.

Ok.

> > > + */
> > > +int migrate_target_read_header(int fd, struct migrate_meta *m)
> > > +{
> > > +	static struct migrate_data *d;
> > > +	union migrate_header h;
> > > +
> > > +	if (read_all_buf(fd, &h, sizeof(h)))
> > > +		return errno;
> > > +
> > > +	debug("Source magic: 0x%016" PRIx64 ", sizeof(void *): %u, version: %u",
> > > +	      h.magic, ntohl(h.voidp_size), ntohl(h.version));
> > > +
> > > +	for (d = data_versions; d->v != ntohl(h.version); d++);
> > > +	if (!d->v)
> > > +		return ENOTSUP;
> >
> > This is too late. The loop doesn't check it, so you've already
> > overrun the data_versions table if the version wasn't in there.
>
> Ah, yes, I forgot the '&& d->v' part (see migrate_target()).
>
> > Easier to use an ARRAY_SIZE() limit in the loop, I think.
>
> I'd rather keep that as a one-liner, and NULL-terminate the arrays.
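
A minimal sketch of the sentinel-terminated lookup with the '&& d->v'
style check in place, so that the walk stops at the zero entry instead
of running past the end of the table. migrate_data_find() is a
hypothetical helper name, and the table below is a stand-in with NULL
sections, not the one from the patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Same layout as struct migrate_data in the patch */
struct migrate_data {
	uint32_t v;
	struct iovec *sections;
};

/* Stand-in table for illustration: version 1 only, zero-terminated */
static struct migrate_data data_versions[] = {
	{ 1, NULL },
	{ 0 },
};

/* Hypothetical helper: return the entry for version @v, or NULL if the
 * terminating zero entry is reached first.
 */
static const struct migrate_data *migrate_data_find(uint32_t v)
{
	const struct migrate_data *d;

	/* Test the sentinel before comparing, so the loop never steps
	 * beyond the terminating entry.
	 */
	for (d = data_versions; d->v && d->v != v; d++)
		;

	return d->v ? d : NULL;
}
```

With this shape an unknown version yields NULL, which a caller such as
migrate_target_read_header() could map to ENOTSUP.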
>
> > > +	m->v = d->v;
> > > +
> > > +	if (h.magic == MIGRATE_MAGIC)
> > > +		m->bswap = false;
> > > +	else if (h.magic == MIGRATE_MAGIC_SWAPPED)
> > > +		m->bswap = true;
> > > +	else
> > > +		return ENOTSUP;
> > > +
> > > +	if (ntohl(h.voidp_size) == 4)
> > > +		m->source_64b = false;
> > > +	else if (ntohl(h.voidp_size) == 8)
> > > +		m->source_64b = true;
> > > +	else
> > > +		return ENOTSUP;
> > > +
> > > +	if (ntohl(h.time_t_size) == 4)
> > > +		m->time_64b = false;
> > > +	else if (ntohl(h.time_t_size) == 8)
> > > +		m->time_64b = true;
> > > +	else
> > > +		return ENOTSUP;
> > > +
> > > +	m->flow_size = ntohl(h.flow_size);
> > > +	m->flow_sidx_size = ntohl(h.flow_sidx_size);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * migrate_target_pre() - Pre-migration tasks as target
> > > + * @m:		Migration metadata
> > > + *
> > > + * Return: 0 on success, error code on failure
> > > + */
> > > +int migrate_target_pre(struct migrate_meta *m)
> > > +{
> > > +	struct migrate_target_handlers *th;
> > > +	struct migrate_handler *h;
> > > +
> > > +	for (th = target_handlers; th->v != m->v && th->v; th++);
> > > +
> > > +	for (h = th->pre; h->fn; h++) {
> > > +		int rc;
> > > +
> > > +		if ((rc = h->fn(m, h->data)))
> > > +			return rc;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * migrate_target() - Perform migration as target: receive state from hypervisor
> > > + * @fd:		Descriptor for state transfer
> > > + * @m:		Migration metadata
> > > + *
> > > + * Return: 0 on success, error code on failure
> > > + *
> > > + * #syscalls:vu readv
> > > + */
> > > +int migrate_target(int fd, const struct migrate_meta *m)
> > > +{
> > > +	static struct migrate_data *d;
> > > +	unsigned cnt;
> > > +	int rc;
> > > +
> > > +	for (d = data_versions; d->v != m->v && d->v; d++);
> > > +
> > > +	for (cnt = 0; d->sections[cnt + 1 /* skip header */].iov_len; cnt++);
> > > +
> > > +	debug("Reading %u migration sections", cnt);
> > > +	rc = read_remainder(fd, d->sections + 1, cnt, 0);
> > > +	if (rc < 0)
> > > +		return errno;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * migrate_target_post() - Post-migration tasks as target
> > > + * @m:		Migration metadata
> > > + */
> > > +void migrate_target_post(struct migrate_meta *m)
> > > +{
> > > +	struct migrate_target_handlers *th;
> > > +	struct migrate_handler *h;
> > > +
> > > +	for (th = target_handlers; th->v != m->v && th->v; th++);
> > > +
> > > +	for (h = th->post; h->fn; h++)
> > > +		h->fn(m, h->data);
> > > +}
> > > diff --git a/migrate.h b/migrate.h
> > > new file mode 100644
> > > index 0000000..5582f75
> > > --- /dev/null
> > > +++ b/migrate.h
> > > @@ -0,0 +1,90 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-or-later
> > > + * Copyright (c) 2025 Red Hat GmbH
> > > + * Author: Stefano Brivio
> > > + */
> > > +
> > > +#ifndef MIGRATE_H
> > > +#define MIGRATE_H
> > > +
> > > +/**
> > > + * struct migrate_meta - Migration metadata
> > > + * @v:			Chosen migration data version, host order
> > > + * @bswap:		Source has opposite endianness
> > > + * @peer_64b:		Source uses 64-bit void *
> > > + * @time_64b:		Source uses 64-bit time_t
> > > + * @flow_size:		Size of union flow in source
> > > + * @flow_sidx_size:	Size of struct flow_sidx in source
> > > + */
> > > +struct migrate_meta {
> > > +	uint32_t v;
> > > +	bool bswap;
> > > +	bool source_64b;
> > > +	bool time_64b;
> > > +	size_t flow_size;
> > > +	size_t flow_sidx_size;
> > > +};
> > > +
> > > +/**
> > > + * union migrate_header - Migration header from source
> > > + * @magic:		0xB1BB1D1B0BB1D1B0, host order
> > > + * @version:		Source sends highest known, target aborts if unsupported
> > > + * @voidp_size:		sizeof(void *), network order
> > > + * @time_t_size:	sizeof(time_t), network order
> > > + * @flow_size:		sizeof(union flow), network order
> > > + * @flow_sidx_size:	sizeof(struct flow_sidx_t), network order
> > > + * @unused:		Go figure
> > > + */
> > > +union migrate_header {
> > > +	struct {
> > > +		uint64_t magic;
> > > +		uint32_t version;
> > > +		uint32_t voidp_size;
> > > +		uint32_t time_t_size;
> > > +		uint32_t flow_size;
> > > +		uint32_t flow_sidx_size;
> > > +	};
> > > +	uint8_t unused[65536];
> >
> > So, having looked at this, I no longer think padding the header to 64kiB
> > is a good idea. The structure means we're basically stuck always
> > having that chunky header. Instead, I think the header should be
> > absolutely minimal: basically magic and version only. v1 (and maybe
> > others) can add a "metadata" or whatever section for additional
> > information like this they need.
>
> The header is processed by the target in a separate, preliminary step,
> though.
>
> That's why I added metadata right in the header: if the target needs to
> abort the migration because, say, the size of a flow entry is too big
> to handle for a particular version, then we should know that before
> migrate_target_pre().

Ah, yes, I missed that, we'd need a more complex design to do
additional transfers and checks before making the target_pre
callbacks.

> As long as we check the version first, we can always shrink the header
> later on.

*thinks*.. I guess so, though it's kind of awkward; a future version
would have to read the "header of the header", check the version, then
if it's the old one, read the remainder of the 64kiB block. I still
think we should clearly separate the part that we're committing to
being in every future version (which I think should just be magic and
version), from the stuff that's just v1.

> But having 64 KiB reserved looks more robust because it's a
> safe place to add this kind of metadata.
>
> Note that 64 KiB is typically transferred in a single read/write
> from/to the vhost-user back-end.

Ok, but it also has to go over the qemu migration channel, which will
often be a physical link, not a super-fast local/virtual one, and may
be bandwidth capped as well. I'm not actually certain if 64kiB is
likely to be a problem there, but it *is* large compared to the state
blobs of most qemu devices (usually only a few hundred bytes).

> > > +};
> > > +
> > > +/**
> > > + * struct migrate_data - Data sections for given source version
> > > + * @v:			Source version this applies to, host order
> > > + * @sections:		Array of data sections, NULL-terminated
> > > + */
> > > +struct migrate_data {
> > > +	uint32_t v;
> > > +	struct iovec *sections;
> > > +};
> > > +
> > > +/**
> > > + * struct migrate_handler - Function to handle a specific data section
> > > + * @fn:			Function pointer taking pointer to data section
> > > + * @data:		Associated data section
> > > + */
> > > +struct migrate_handler {
> > > +	int (*fn)(struct migrate_meta *m, void *data);
> > > +	void *data;
> > > +};
> > > +
> > > +/**
> > > + * struct migrate_target_handlers - Versioned sets of migration target handlers
> > > + * @v:			Source version this applies to, host order
> > > + * @pre:		Set of functions to execute in target before data copy
> > > + * @post:		Set of functions to execute in target after data copy
> > > + */
> > > +struct migrate_target_handlers {
> > > +	uint32_t v;
> > > +	struct migrate_handler *pre;
> > > +	struct migrate_handler *post;
> > > +};
> > > +
> > > +int migrate_source_pre(struct migrate_meta *m);
> > > +int migrate_source(int fd, const struct migrate_meta *m);
> > > +void migrate_source_post(struct migrate_meta *m);
> > > +
> > > +int migrate_target_read_header(int fd, struct migrate_meta *m);
> > > +int migrate_target_pre(struct migrate_meta *m);
> > > +int migrate_target(int fd, const struct migrate_meta *m);
> > > +void migrate_target_post(struct migrate_meta *m);
> > > +
> > > +#endif /* MIGRATE_H */
> > > diff --git a/passt.c b/passt.c
> > > index b1c8ab6..184d4e5 100644
> > > --- a/passt.c
> > > +++ b/passt.c
> > > @@ -358,7 +358,7 @@ loop:
> > >  			vu_kick_cb(c.vdev, ref, &now);
> > >  			break;
> > >  		case EPOLL_TYPE_VHOST_MIGRATION:
> > > -			vu_migrate(c.vdev, eventmask);
> > > +			vu_migrate(&c, eventmask);
> > >  			break;
> > >  		default:
> > >  			/* Can't happen */
> > > diff --git a/vu_common.c b/vu_common.c
> > > index f43d8ac..0c67bd0 100644
> > > --- a/vu_common.c
> > > +++ b/vu_common.c
> > > @@ -5,6 +5,7 @@
> > >   * common_vu.c - vhost-user common UDP and TCP functions
> > >   */
> > >
> > > +#include
> > >  #include
> > >  #include
> > >  #include
> > > @@ -17,6 +18,7 @@
> > >  #include "vhost_user.h"
> > >  #include "pcap.h"
> > >  #include "vu_common.h"
> > > +#include "migrate.h"
> > >
> > >  #define VU_MAX_TX_BUFFER_NB 2
> > >
> > > @@ -305,50 +307,88 @@ err:
> > >  }
> > >
> > >  /**
> > > - * vu_migrate() - Send/receive passt insternal state to/from QEMU
> > > - * @vdev:	vhost-user device
> > > + * vu_migrate_source() - Migration as source, send state to hypervisor
> > > + * @fd:		File descriptor for state transfer
> > > + *
> > > + * Return: 0 on success, positive error code on failure
> > > + */
> > > +static int vu_migrate_source(int fd)
> > > +{
> > > +	struct migrate_meta m;
> > > +	int rc;
> > > +
> > > +	if ((rc = migrate_source_pre(&m))) {
> > > +		err("Source pre-migration failed: %s, abort", strerror_(rc));
> > > +		return rc;
> > > +	}
> > > +
> > > +	debug("Saving backend state");
> > > +
> > > +	rc = migrate_source(fd, &m);
> > > +	if (rc)
> > > +		err("Source migration failed: %s", strerror_(rc));
> > > +	else
> > > +		migrate_source_post(&m);
> > > +
> > > +	return rc;
> >
> > After a successful source migration shouldn't we exit, or at least
> > quiesce ourselves so we don't accidentally mess with anything the
> > target is now doing?
>
> Maybe, yes. Pending TCP connections should be safe because with
> TCP_REPAIR they're already quiesced, but we don't close listening
> sockets (yet).
>
> Perhaps a reasonable approach for the moment would be to declare a
> single migrate_source_post handler logging a info() message and
> exiting.

Seems sensible for now.
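
A minimal sketch of such a single source post-migration handler,
assuming a hypothetical migrate_source_done() name and using plain
fprintf() in place of passt's info() so that the fragment stands alone.
struct migrate_handler mirrors the definition quoted from migrate.h:

```c
#include <stdio.h>
#include <stdlib.h>

struct migrate_meta;	/* opaque here; defined in migrate.h in the patch */

/* Same layout as struct migrate_handler in the patch */
struct migrate_handler {
	int (*fn)(struct migrate_meta *m, void *data);
	void *data;
};

/* Hypothetical handler (an assumption, not from the patch): report
 * completion and terminate the source process.
 */
static int migrate_source_done(struct migrate_meta *m, void *data)
{
	(void)m;
	(void)data;

	fprintf(stderr, "Migration complete, source exiting\n");
	exit(EXIT_SUCCESS);
}

/* Handlers to call in source after sending data, zero-terminated */
struct migrate_handler handlers_source_post[] = {
	{ migrate_source_done, NULL },
	{ 0 },
};
```

One caveat under these assumptions: in the patch as posted,
vu_migrate() sets device_state_result and closes device_state_fd only
after vu_migrate_source() returns, so a handler that exits from inside
migrate_source_post() would skip both steps.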

> > > +}
> > > +
> > > +/**
> > > + * vu_migrate_target() - Migration as target, receive state from hypervisor
> > > + * @fd:		File descriptor for state transfer
> > > + *
> > > + * Return: 0 on success, positive error code on failure
> > > + */
> > > +static int vu_migrate_target(int fd)
> > > +{
> > > +	struct migrate_meta m;
> > > +	int rc;
> > > +
> > > +	rc = migrate_target_read_header(fd, &m);
> > > +	if (rc) {
> > > +		err("Migration header check failed: %s, abort", strerror_(rc));
> > > +		return rc;
> > > +	}
> > > +
> > > +	if ((rc = migrate_target_pre(&m))) {
> > > +		err("Target pre-migration failed: %s, abort", strerror_(rc));
> > > +		return rc;
> > > +	}
> > > +
> > > +	debug("Loading backend state");
> > > +
> > > +	rc = migrate_target(fd, &m);
> > > +	if (rc)
> > > +		err("Target migration failed: %s", strerror_(rc));
> > > +	else
> > > +		migrate_target_post(&m);
> > > +
> > > +	return rc;
> > > +}
> > > +
> > > +/**
> > > + * vu_migrate() - Send/receive passt internal state to/from QEMU
> > > + * @c:		Execution context
> > >   * @events:	epoll events
> > >   */
> > > -void vu_migrate(struct vu_dev *vdev, uint32_t events)
> > > +void vu_migrate(struct ctx *c, uint32_t events)
> > >  {
> > > -	int ret;
> > > +	struct vu_dev *vdev = c->vdev;
> > > +	int rc = EIO;
> > >
> > > -	/* TODO: collect/set passt internal state
> > > -	 * and use vdev->device_state_fd to send/receive it
> > > -	 */
> > >  	debug("vu_migrate fd %d events %x", vdev->device_state_fd, events);
> > > -	if (events & EPOLLOUT) {
> > > -		debug("Saving backend state");
> > > -
> > > -		/* send some stuff */
> > > -		ret = write(vdev->device_state_fd, "PASST", 6);
> > > -		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
> > > -		vdev->device_state_result = ret == -1 ? -1 : 0;
> > > -		/* Closing the file descriptor signals the end of transfer */
> > > -		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
> > > -			  vdev->device_state_fd, NULL);
> > > -		close(vdev->device_state_fd);
> > > -		vdev->device_state_fd = -1;
> > > -	} else if (events & EPOLLIN) {
> > > -		char buf[6];
> > > -
> > > -		debug("Loading backend state");
> > > -		/* read some stuff */
> > > -		ret = read(vdev->device_state_fd, buf, sizeof(buf));
> > > -		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
> > > -		if (ret != sizeof(buf)) {
> > > -			vdev->device_state_result = -1;
> > > -		} else {
> > > -			ret = strncmp(buf, "PASST", sizeof(buf));
> > > -			vdev->device_state_result = ret == 0 ? 0 : -1;
> > > -		}
> > > -	} else if (events & EPOLLHUP) {
> > > -		debug("Closing migration channel");
> > > -
> > > -		/* The end of file signals the end of the transfer. */
> > > -		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
> > > -			  vdev->device_state_fd, NULL);
> > > -		close(vdev->device_state_fd);
> > > -		vdev->device_state_fd = -1;
> > > -	}
> > > +
> > > +	if (events & EPOLLOUT)
> > > +		rc = vu_migrate_source(vdev->device_state_fd);
> > > +	else if (events & EPOLLIN)
> > > +		rc = vu_migrate_target(vdev->device_state_fd);
> > > +
> > > +	/* EPOLLHUP without EPOLLIN/EPOLLOUT, or EPOLLERR? Migration failed */
> > > +
> > > +	vdev->device_state_result = rc;
> > > +
> > > +	epoll_ctl(c->epollfd, EPOLL_CTL_DEL, vdev->device_state_fd, NULL);
> > > +	debug("Closing migration channel");
> > > +	close(vdev->device_state_fd);
> > > +	vdev->device_state_fd = -1;
> > >  }
> > > diff --git a/vu_common.h b/vu_common.h
> > > index d56c021..69c4006 100644
> > > --- a/vu_common.h
> > > +++ b/vu_common.h
> > > @@ -57,5 +57,5 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
> > >  void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
> > >  		const struct timespec *now);
> > >  int vu_send_single(const struct ctx *c, const void *buf, size_t size);
> > > -void vu_migrate(struct vu_dev *vdev, uint32_t events);
> > > +void vu_migrate(struct ctx *c, uint32_t events);
> > >  #endif /* VU_COMMON_H */
> >
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson