Date: Wed, 5 Feb 2025 12:44:40 +1100
From: David Gibson
To: Stefano Brivio
Subject: Re: [PATCH v5 1/6] Introduce facilities for guest migration on top of vhost-user infrastructure
In-Reply-To: <20250205003904.2797491-2-sbrivio@redhat.com>
CC: passt-dev@passt.top, Laurent Vivier

On Wed, Feb 05, 2025 at 01:38:59AM +0100, Stefano Brivio wrote:
> Add migration facilities based on top of the current vhost-user
> infrastructure, moving vu_migrate() to migrate.c.
>
> Versioned migration stages define function pointers to be called on
> source or target, or data sections that need to be transferred.
>
> The migration header consists of a magic number and a version
> identifier.
>
> Co-authored-by: David Gibson

Given this, it should also have my S-o-b,

Signed-off-by: David Gibson

And, given that we already have an awkward co-authorship situation, it
probably makes sense to fold patches 2 & 3 into this one.
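For reference, a minimal sketch of the header layout the commit message describes, assuming the packed struct migrate_header from migrate.h in this patch: an 8-byte magic (0xB1BB1D1B0BB1D1B0) followed by a 4-byte version, both in network order. The helpers header_encode() and header_decode() are hypothetical, not passt functions; they serialise byte by byte, so the result does not depend on host endianness or struct padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the quoted migrate.c; helper names are hypothetical */
#define MIGRATE_MAGIC	0xB1BB1D1B0BB1D1B0ULL
#define MIGRATE_VERSION	1U
#define HEADER_LEN	12	/* 8-byte magic + 4-byte version, packed */

/* Store the low @len bytes of @v at @p, most significant byte first */
static void put_be(uint8_t *p, uint64_t v, size_t len)
{
	for (size_t i = 0; i < len; i++)
		p[i] = (uint8_t)(v >> (8 * (len - 1 - i)));
}

/* Load @len bytes at @p as a big-endian (network order) value */
static uint64_t get_be(const uint8_t *p, size_t len)
{
	uint64_t v = 0;

	for (size_t i = 0; i < len; i++)
		v = (v << 8) | p[i];

	return v;
}

/* Source side: write magic, then version, both in network order */
static void header_encode(uint8_t buf[HEADER_LEN], uint32_t version)
{
	assert(version);	/* 0 is reserved: the target rejects it */
	put_be(buf, MIGRATE_MAGIC, 8);
	put_be(buf + 8, version, 4);
}

/* Target side: return the version, or 0 on bad magic or zero version */
static uint32_t header_decode(const uint8_t buf[HEADER_LEN])
{
	if (get_be(buf, 8) != MIGRATE_MAGIC)
		return 0;

	return (uint32_t)get_be(buf + 8, 4);
}
```

A zero return from header_decode() mirrors the quoted migrate_target_read_header(), which treats both a wrong magic and a zero version as failure.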
> Signed-off-by: Stefano Brivio
> ---
>  Makefile    |  12 +--
>  migrate.c   | 210 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  migrate.h   |  51 +++++++++++++
>  passt.c     |   2 +-
>  util.h      |  26 +++++++
>  vu_common.c |  58 +++++----------
>  vu_common.h |   2 +-
>  7 files changed, 315 insertions(+), 46 deletions(-)
>  create mode 100644 migrate.c
>  create mode 100644 migrate.h
>
> diff --git a/Makefile b/Makefile
> index d3d4b78..be89b07 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -38,8 +38,8 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \
>  	icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \
> -	ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \
> -	tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> +	ndp.c netlink.c migrate.c packet.c passt.c pasta.c pcap.c pif.c tap.c \
> +	tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
>  	vhost_user.c virtio.c vu_common.c
>  QRAP_SRCS = qrap.c
>  PASST_REPAIR_SRCS = passt-repair.c
> @@ -49,10 +49,10 @@ MANPAGES = passt.1 pasta.1 qrap.1 passt-repair.1
>
>  PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \
>  	flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \
> -	lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \
> -	siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \
> -	tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h vhost_user.h \
> -	virtio.h vu_common.h
> +	lineread.h log.h migrate.h ndp.h netlink.h packet.h passt.h pasta.h \
> +	pcap.h pif.h siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h \
> +	tcp_splice.h tcp_vu.h udp.h udp_flow.h udp_internal.h udp_vu.h util.h \
> +	vhost_user.h virtio.h vu_common.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>
>  C := \#include \nint main(){int a=getrandom(0, 0, 0);}
> diff --git a/migrate.c b/migrate.c
> new file mode 100644
> index 0000000..a7031f9
> --- /dev/null
> +++ b/migrate.c
> @@ -0,0 +1,210 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + * for qemu/UNIX domain socket mode
> + *
> + * PASTA - Pack A Subtle Tap Abstraction
> + * for network namespace/tap device mode
> + *
> + * migrate.c - Migration sections, layout, and routines
> + *
> + * Copyright (c) 2025 Red Hat GmbH
> + * Author: Stefano Brivio
> + */
> +
> +#include
> +#include
> +
> +#include "util.h"
> +#include "ip.h"
> +#include "passt.h"
> +#include "inany.h"
> +#include "flow.h"
> +#include "flow_table.h"
> +
> +#include "migrate.h"
> +
> +/* Current version of migration data */
> +#define MIGRATE_VERSION	1
> +
> +/* Magic identifier for migration data */
> +#define MIGRATE_MAGIC	0xB1BB1D1B0BB1D1B0
> +
> +/* Migration header to send from source */
> +static struct migrate_header header = {
> +	.magic		= htonll_constant(MIGRATE_MAGIC),
> +	.version	= htonl_constant(MIGRATE_VERSION),
> +};
> +
> +/**
> + * migrate_send_block() - Migration stage handler to send verbatim data
> + * @c:		Execution context
> + * @stage:	Migration stage
> + * @fd:		Migration fd
> + *
> + * Sends the buffer in @stage->iov over the migration channel.
> + */
> +__attribute__((__unused__))
> +static int migrate_send_block(struct ctx *c,
> +			      const struct migrate_stage *stage, int fd)
> +{
> +	(void)c;
> +
> +	if (write_remainder(fd, &stage->iov, 1, 0) < 0)
> +		return errno;
> +
> +	return 0;
> +}
> +
> +/**
> + * migrate_recv_block() - Migration stage handler to receive verbatim data
> + * @c:		Execution context
> + * @stage:	Migration stage
> + * @fd:		Migration fd
> + *
> + * Reads the buffer in @stage->iov from the migration channel.
> + *
> + * #syscalls:vu readv
> + */
> +__attribute__((__unused__))
> +static int migrate_recv_block(struct ctx *c,
> +			      const struct migrate_stage *stage, int fd)
> +{
> +	(void)c;
> +
> +	if (read_remainder(fd, &stage->iov, 1, 0) < 0)
> +		return errno;
> +
> +	return 0;
> +}
> +
> +#define DATA_STAGE(v)						\
> +	{							\
> +		.name = #v,					\
> +		.source = migrate_send_block,			\
> +		.target = migrate_recv_block,			\
> +		.iov = { &(v), sizeof(v) },			\
> +	}
> +
> +/* Stages for version 1 */
> +static const struct migrate_stage stages_v1[] = {
> +	{
> +		.name = "flow pre",
> +		.target = NULL,
> +	},
> +	{
> +		.name = "flow post",
> +		.source = NULL,
> +	},
> +	{ 0 },
> +};
> +
> +/* Set of data versions */
> +static const struct migrate_version versions[] = {
> +	{
> +		1, stages_v1,
> +	},
> +	{ 0 },
> +};
> +
> +/**
> + * migrate_source() - Migration as source, send state to hypervisor
> + * @c:		Execution context
> + * @fd:		File descriptor for state transfer
> + *
> + * Return: 0 on success, positive error code on failure
> + */
> +int migrate_source(struct ctx *c, int fd)
> +{
> +	const struct migrate_version *v = versions + ARRAY_SIZE(versions) - 1;
> +	const struct migrate_stage *s;
> +	int ret;
> +
> +	ret = write_all_buf(fd, &header, sizeof(header));
> +	if (ret) {
> +		err("Can't send migration header: %s, abort", strerror_(ret));
> +		return ret;
> +	}
> +
> +	for (s = v->s; *s->name; s++) {
> +		if (!s->source)
> +			continue;
> +
> +		debug("Source side migration: %s", s->name);
> +
> +		if ((ret = s->source(c, s, fd))) {
> +			err("Source migration stage %s: %s, abort", s->name,
> +			    strerror_(ret));
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * migrate_target_read_header() - Read header in target
> + * @fd:		Descriptor for state transfer
> + *
> + * Return: version number on success, 0 on failure with errno set
> + */
> +static uint32_t migrate_target_read_header(int fd)
> +{
> +	struct migrate_header h;
> +
> +	if (read_all_buf(fd, &h, sizeof(h)))
> +		return 0;
> +
> +	debug("Source magic: 0x%016" PRIx64 ", version: %u",
> +	      be64toh(h.magic), ntohl_constant(h.version));
> +
> +	if (ntohll_constant(h.magic) != MIGRATE_MAGIC || !ntohl(h.version)) {
> +		errno = EINVAL;
> +		return 0;
> +	}
> +
> +	return ntohl(h.version);
> +}
> +
> +/**
> + * migrate_target() - Migration as target, receive state from hypervisor
> + * @c:		Execution context
> + * @fd:		File descriptor for state transfer
> + *
> + * Return: 0 on success, positive error code on failure
> + */
> +int migrate_target(struct ctx *c, int fd)
> +{
> +	const struct migrate_version *v;
> +	const struct migrate_stage *s;
> +	uint32_t id;
> +	int ret;
> +
> +	id = migrate_target_read_header(fd);
> +	if (!id) {
> +		ret = errno;
> +		err("Migration header check failed: %s, abort", strerror_(ret));
> +		return ret;
> +	}
> +
> +	for (v = versions; v->id && v->id == id; v++);
> +	if (!v->id) {
> +		err("Unsupported version: %u", id);
> +		return -ENOTSUP;
> +	}
> +
> +	for (s = v->s; *s->name; s++) {
> +		if (!s->target)
> +			continue;
> +
> +		debug("Target side migration: %s", s->name);
> +
> +		if ((ret = s->target(c, s, fd))) {
> +			err("Target migration stage %s: %s, abort", s->name,
> +			    strerror_(ret));
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/migrate.h b/migrate.h
> new file mode 100644
> index 0000000..3093b6e
> --- /dev/null
> +++ b/migrate.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright (c) 2025 Red Hat GmbH
> + * Author: Stefano Brivio
> + */
> +
> +#ifndef MIGRATE_H
> +#define MIGRATE_H
> +
> +/**
> + * struct migrate_header - Migration header from source
> + * @magic:	0xB1BB1D1B0BB1D1B0, network order
> + * @version:	Highest known, target aborts if too old, network order
> + */
> +struct migrate_header {
> +	uint64_t magic;
> +	uint32_t version;
> +} __attribute__((packed));
> +
> +/**
> + * struct migrate_stage - Callbacks and parameters for one stage of migration
> + * @name:	Stage name (for debugging)
> + * @source:	Callback to implement this stage on the source
> + * @target:	Callback to implement this stage on the target
> + * @iov:	Optional data section to transfer
> + */
> +struct migrate_stage {
> +	const char *name;
> +	int (*source)(struct ctx *c,
> +		      const struct migrate_stage *stage, int fd);
> +	int (*target)(struct ctx *c,
> +		      const struct migrate_stage *stage, int fd);
> +
> +	/* FIXME: rollback callbacks? */
> +
> +	struct iovec iov;
> +};
> +
> +/**
> + * struct migrate_version - Stages for a particular protocol version
> + * @id:		Version number, host order
> + * @s:		Ordered array of stages, NULL-terminated
> + */
> +struct migrate_version {
> +	uint32_t id;
> +	const struct migrate_stage *s;
> +};
> +
> +int migrate_source(struct ctx *c, int fd);
> +int migrate_target(struct ctx *c, int fd);
> +
> +#endif /* MIGRATE_H */
> diff --git a/passt.c b/passt.c
> index b1c8ab6..184d4e5 100644
> --- a/passt.c
> +++ b/passt.c
> @@ -358,7 +358,7 @@ loop:
>  			vu_kick_cb(c.vdev, ref, &now);
>  			break;
>  		case EPOLL_TYPE_VHOST_MIGRATION:
> -			vu_migrate(c.vdev, eventmask);
> +			vu_migrate(&c, eventmask);
>  			break;
>  		default:
>  			/* Can't happen */
> diff --git a/util.h b/util.h
> index 23b165c..1aed629 100644
> --- a/util.h
> +++ b/util.h
> @@ -122,12 +122,38 @@
>  	 (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24))
>  #endif
>
> +#ifndef __bswap_constant_32
> +#define __bswap_constant_32(x)						\
> +	((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) |	\
> +	 (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24))
> +#endif
> +
> +#ifndef __bswap_constant_64
> +#define __bswap_constant_64(x) \
> +	((((x) & 0xff00000000000000ULL) >> 56) |	\
> +	 (((x) & 0x00ff000000000000ULL) >> 40) |	\
> +	 (((x) & 0x0000ff0000000000ULL) >> 24) |	\
> +	 (((x) & 0x000000ff00000000ULL) >> 8) |		\
> +	 (((x) & 0x00000000ff000000ULL) << 8) |		\
> +	 (((x) & 0x0000000000ff0000ULL) << 24) |	\
> +	 (((x) & 0x000000000000ff00ULL) << 40) |	\
> +	 (((x) & 0x00000000000000ffULL) << 56))
> +#endif
> +
>  #if __BYTE_ORDER == __BIG_ENDIAN
>  #define htons_constant(x)	(x)
>  #define htonl_constant(x)	(x)
> +#define htonll_constant(x)	(x)
> +#define ntohs_constant(x)	(x)
> +#define ntohl_constant(x)	(x)
> +#define ntohll_constant(x)	(x)
>  #else
>  #define htons_constant(x)	(__bswap_constant_16(x))
>  #define htonl_constant(x)	(__bswap_constant_32(x))
> +#define htonll_constant(x)	(__bswap_constant_64(x))
> +#define ntohs_constant(x)	(__bswap_constant_16(x))
> +#define ntohl_constant(x)	(__bswap_constant_32(x))
> +#define ntohll_constant(x)	(__bswap_constant_64(x))
>  #endif
>
>  /**
> diff --git a/vu_common.c b/vu_common.c
> index ab04d31..3d41824 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -5,6 +5,7 @@
>   * common_vu.c - vhost-user common UDP and TCP functions
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -17,6 +18,7 @@
>  #include "vhost_user.h"
>  #include "pcap.h"
>  #include "vu_common.h"
> +#include "migrate.h"
>
>  #define VU_MAX_TX_BUFFER_NB	2
>
> @@ -305,48 +307,28 @@ err:
>  }
>
>  /**
> - * vu_migrate() - Send/receive passt insternal state to/from QEMU
> - * @vdev:	vhost-user device
> + * vu_migrate() - Send/receive passt internal state to/from QEMU
> + * @c:		Execution context
>   * @events:	epoll events
>   */
> -void vu_migrate(struct vu_dev *vdev, uint32_t events)
> +void vu_migrate(struct ctx *c, uint32_t events)
>  {
> -	int ret;
> +	struct vu_dev *vdev = c->vdev;
> +	int rc = EIO;
>
> -	/* TODO: collect/set passt internal state
> -	 * and use vdev->device_state_fd to send/receive it
> -	 */
>  	debug("vu_migrate fd %d events %x", vdev->device_state_fd, events);
> -	if (events & EPOLLOUT) {
> -		debug("Saving backend state");
> -
> -		/* send some stuff */
> -		ret = write(vdev->device_state_fd, "PASST", 6);
> -		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
> -		vdev->device_state_result = ret == -1 ? -1 : 0;
> -		/* Closing the file descriptor signals the end of transfer */
> -		epoll_del(vdev->context, vdev->device_state_fd);
> -		close(vdev->device_state_fd);
> -		vdev->device_state_fd = -1;
> -	} else if (events & EPOLLIN) {
> -		char buf[6];
> -
> -		debug("Loading backend state");
> -		/* read some stuff */
> -		ret = read(vdev->device_state_fd, buf, sizeof(buf));
> -		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
> -		if (ret != sizeof(buf)) {
> -			vdev->device_state_result = -1;
> -		} else {
> -			ret = strncmp(buf, "PASST", sizeof(buf));
> -			vdev->device_state_result = ret == 0 ? 0 : -1;
> -		}
> -	} else if (events & EPOLLHUP) {
> -		debug("Closing migration channel");
>
> -		/* The end of file signals the end of the transfer. */
> -		epoll_del(vdev->context, vdev->device_state_fd);
> -		close(vdev->device_state_fd);
> -		vdev->device_state_fd = -1;
> -	}
> +	if (events & EPOLLOUT)
> +		rc = migrate_source(c, vdev->device_state_fd);
> +	else if (events & EPOLLIN)
> +		rc = migrate_target(c, vdev->device_state_fd);
> +
> +	/* EPOLLHUP without EPOLLIN/EPOLLOUT, or EPOLLERR? Migration failed */
> +
> +	vdev->device_state_result = rc;
> +
> +	epoll_ctl(c->epollfd, EPOLL_CTL_DEL, vdev->device_state_fd, NULL);
> +	debug("Closing migration channel");
> +	close(vdev->device_state_fd);
> +	vdev->device_state_fd = -1;
>  }
> diff --git a/vu_common.h b/vu_common.h
> index d56c021..69c4006 100644
> --- a/vu_common.h
> +++ b/vu_common.h
> @@ -57,5 +57,5 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
>  void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>  		const struct timespec *now);
>  int vu_send_single(const struct ctx *c, const void *buf, size_t size);
> -void vu_migrate(struct vu_dev *vdev, uint32_t events);
> +void vu_migrate(struct ctx *c, uint32_t events);
>  #endif /* VU_COMMON_H */

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
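A minimal, self-contained sketch of the versioned stage tables in migrate.c, using hypothetical demo_* types in place of struct ctx, struct migrate_stage and struct migrate_version, with the fd argument dropped and counting callbacks standing in for real stages. Both tables end with a zeroed entry, so a stage walk can stop on a NULL name and a version lookup on a zero id. As quoted, migrate_source() starts from versions + ARRAY_SIZE(versions) - 1, which is that { 0 } terminator; both stage loops test *s->name, which dereferences the terminator's NULL name; and the lookup in migrate_target() keeps advancing while v->id == id. The sketch below instead picks the last non-zero entry, tests s->name, and stops on the first matching id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for passt's struct ctx and migration types */
struct demo_ctx {
	int steps;		/* Number of stage callbacks run so far */
};

struct demo_stage {
	const char *name;	/* NULL in the zeroed terminator */
	int (*source)(struct demo_ctx *c, const struct demo_stage *s);
	int (*target)(struct demo_ctx *c, const struct demo_stage *s);
};

struct demo_version {
	uint32_t id;		/* 0 in the zeroed terminator */
	const struct demo_stage *s;
};

/* Placeholder callback: record that a stage ran, report success */
static int count_step(struct demo_ctx *c, const struct demo_stage *s)
{
	(void)s;
	c->steps++;
	return 0;
}

static const struct demo_stage stages_v1[] = {
	{ .name = "flow pre",  .source = count_step },
	{ .name = "flow post", .target = count_step },
	{ 0 },
};

static const struct demo_version versions[] = {
	{ 1, stages_v1 },
	{ 0 },
};

/* Source side: newest version is the last entry before the terminator */
static const struct demo_version *demo_latest(void)
{
	const struct demo_version *v = versions;

	assert(v->id);		/* At least one real version must exist */
	while ((v + 1)->id)
		v++;

	return v;
}

/* Target side: first entry matching @id, or NULL if unsupported */
static const struct demo_version *demo_find(uint32_t id)
{
	const struct demo_version *v;

	for (v = versions; v->id && v->id != id; v++)
		;

	return v->id ? v : NULL;
}

/* Run source (@target == 0) or target callbacks, stop at first error */
static int demo_run(struct demo_ctx *c, const struct demo_version *v,
		    int target)
{
	const struct demo_stage *s;

	for (s = v->s; s->name; s++) {
		int (*fn)(struct demo_ctx *, const struct demo_stage *) =
			target ? s->target : s->source;
		int ret;

		if (!fn)
			continue;

		if ((ret = fn(c, s)))
			return ret;
	}

	return 0;
}
```

With a single version, demo_latest() and demo_find(1) both return the first entry, while demo_find(2) returns NULL, which a caller could map to a positive ENOTSUP.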
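The *_constant helpers added to util.h expand to plain integer arithmetic, so they are constant expressions and can initialise static data such as header in migrate.c. A small check of the same 64-bit swap expression, under a hypothetical DEMO_ name so it cannot collide with a C library definition of __bswap_constant_64; the static_assert shows it is evaluated at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the quoted __bswap_constant_64 fallback, renamed
 * (hypothetical name) so it cannot clash with the C library's macro */
#define DEMO_BSWAP_CONSTANT_64(x)			\
	((((x) & 0xff00000000000000ULL) >> 56) |	\
	 (((x) & 0x00ff000000000000ULL) >> 40) |	\
	 (((x) & 0x0000ff0000000000ULL) >> 24) |	\
	 (((x) & 0x000000ff00000000ULL) >>  8) |	\
	 (((x) & 0x00000000ff000000ULL) <<  8) |	\
	 (((x) & 0x0000000000ff0000ULL) << 24) |	\
	 (((x) & 0x000000000000ff00ULL) << 40) |	\
	 (((x) & 0x00000000000000ffULL) << 56))

/* Checked by the compiler: no function call is involved */
static_assert(DEMO_BSWAP_CONSTANT_64(0x0102030405060708ULL) ==
	      0x0807060504030201ULL, "bytes must come out reversed");

/* Usable in a static initialiser, like htonll_constant(MIGRATE_MAGIC) */
static const uint64_t swapped_magic =
	DEMO_BSWAP_CONSTANT_64(0xB1BB1D1B0BB1D1B0ULL);
```

On a little-endian host, htonll_constant(MIGRATE_MAGIC) evaluates to the same value as swapped_magic here; on a big-endian host it is the identity.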