From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 58EC95A0274 for ; Mon, 12 Feb 2024 04:00:57 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1707706854; bh=r5R4UQYp+5Zx5UufukRPHr5Qfo1vWrn4bATr6jnAWn8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=WO46RJ1Im1hpv1CeyvFPFlvJASeqJxfpATecr8XieAvnlvhGqaF4zk3tuelY+xJul ltgD7FtF1vjlP0TDGFKAarogoBmk1wzFlRrfFHkY+ewGECBK2ysJ0ZtOqRAsuLS5QL a00AdXLHCRnx4VbYUg6WM4dr1yMHemf9OsGTVCNu0a4jo14NL01kykYDXyUFiuSB36 4aNG1PHlTMgSU1lYdMgWLr82iUqE2GaIzpGD8uwQ6WFvJ6d8qWQpqFAYl6HD0ERA5Y 5WcH5rck72WOGtgO1CIesUgYzAjCNtzKYQ8t7v3FqSbD7L2fnv18y6Os9ixw0kfH0l 8d33rrpOjaDzA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TY8NQ0hKmz4wyY; Mon, 12 Feb 2024 14:00:54 +1100 (AEDT) Date: Mon, 12 Feb 2024 13:49:52 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 20/24] vhost-user: add vhost-user Message-ID: References: <20240202141151.3762941-1-lvivier@redhat.com> <20240202141151.3762941-21-lvivier@redhat.com> <20240212001901.66ada100@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="W6CzWx2n32cCTQ1k" Content-Disposition: inline In-Reply-To: <20240212001901.66ada100@elisabeth> Message-ID-Hash: US3LPTLS7CMF53Y3LKYIU3GUQ6P4PVE5 X-Message-ID-Hash: US3LPTLS7CMF53Y3LKYIU3GUQ6P4PVE5 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Laurent Vivier , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --W6CzWx2n32cCTQ1k Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 12, 2024 at 12:19:01AM +0100, Stefano Brivio wrote: > On Fri, 2 Feb 2024 15:11:47 +0100 > Laurent Vivier wrote: >=20 > > add virtio and vhost-user functions to connect with QEMU. > >=20 > > $ ./passt --vhost-user > >=20 > > and > >=20 > > # qemu-system-x86_64 ... -m 4G \ > > -object memory-backend-memfd,id=3Dmemfd0,share=3Don,size=3D4G \ > > -numa node,memdev=3Dmemfd0 \ > > -chardev socket,id=3Dchr0,path=3D/tmp/passt_1.socket \ > > -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0 \ > > -device virtio-net,mac=3D9a:2b:2c:2d:2e:2f,netdev=3Dnetdev0 \ > > ... > >=20 > > Signed-off-by: Laurent Vivier > > --- > > conf.c | 20 ++++++++++++++-- > > passt.c | 7 ++++++ > > passt.h | 1 + > > tap.c | 73 ++++++++++++++++++++++++++++++++++++++++++--------------- > > tcp.c | 8 +++++-- > > udp.c | 6 +++-- > > 6 files changed, 90 insertions(+), 25 deletions(-) >=20 > This would need a matching change in the man page, passt.1, at least > documenting the --vhost-user option and adjusting descriptions about > the guest communication interface (look for "UNIX domain" there). >=20 > > diff --git a/conf.c b/conf.c > > index b6a2a1f0fdc3..40aa9519f8a6 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -44,6 +44,7 @@ > > #include "lineread.h" > > #include "isolation.h" > > #include "log.h" > > +#include "vhost_user.h" > > =20 > > /** > > * next_chunk - Return the next piece of a string delimited by a chara= cter > > @@ -735,9 +736,12 @@ static void print_usage(const char *name, int stat= us) > > info( " -I, --ns-ifname NAME namespace interface name"); > > info( " default: same interface name as external one"); > > } else { > > - info( " -s, --socket PATH UNIX domain socket path"); > > + info( " -s, --socket, --socket-path PATH UNIX domain socket path"= ); >=20 > I don't get the point of --socket-path. It's handled just like -s > anyway, right? Why can't it just be -s / --socket? I believe the issue is that there's an expected command line interface for the vhost server, which uses --socket-path for, well, the socket path. Hence adding an alias to the existing passt option. > > info( " default: probe free path starting from " > > UNIX_SOCK_PATH, 1); > > + info( " --vhost-user Enable vhost-user mode"); > > + info( " UNIX domain socket is provided by -s option"); > > + info( " --print-capabilities print back-end capabilities in JSON = format"); >=20 > Instead of introducing a new option, couldn't we have these printed > unconditionally with debug()? I guess it's debug-level stuff anyway. Likewise, I think this option is expected by the thing which starts the vhost server. > > } > > =20 > > info( " -F, --fd FD Use FD as pre-opened connected socket"); > > @@ -1123,6 +1127,7 @@ void conf(struct ctx *c, int argc, char **argv) > > {"help", no_argument, NULL, 'h' }, > > {"socket", required_argument, NULL, 's' }, > > {"fd", required_argument, NULL, 'F' }, > > + {"socket-path", required_argument, NULL, 's' }, /* vhost-user manda= tory */ > > {"ns-ifname", required_argument, NULL, 'I' }, > > {"pcap", required_argument, NULL, 'p' }, > > {"pid", required_argument, NULL, 'P' }, > > @@ -1169,6 +1174,8 @@ void conf(struct ctx *c, int argc, char **argv) > > {"config-net", no_argument, NULL, 17 }, > > {"no-copy-routes", no_argument, NULL, 18 }, > > {"no-copy-addrs", no_argument, NULL, 19 }, > > + {"vhost-user", no_argument, NULL, 20 }, > > + {"print-capabilities", no_argument, NULL, 21 }, /* vhost-user manda= tory */ > > { 0 }, > > }; > > char userns[PATH_MAX] =3D { 0 }, netns[PATH_MAX] =3D { 0 }; > > @@ -1328,7 +1335,6 @@ void conf(struct ctx *c, int argc, char **argv) > > sizeof(c->ip6.ifname_out), "%s", optarg); > > if (ret <=3D 0 || ret >=3D (int)sizeof(c->ip6.ifname_out)) > > die("Invalid interface name: %s", optarg); > > - > > break; > > case 17: > > if (c->mode !=3D MODE_PASTA) > > @@ -1350,6 +1356,16 @@ void conf(struct ctx *c, int argc, char **argv) > > warn("--no-copy-addrs will be dropped soon"); > > c->no_copy_addrs =3D copy_addrs_opt =3D true; > > break; > > + case 20: > > + if (c->mode =3D=3D MODE_PASTA) { >=20 > And, if c->mode =3D=3D MODE_VU, the option is redundant (see e.g. the han= dling > for --one-off). >=20 > > + err("--vhost-user is for passt mode only"); > > + usage(argv[0]); > > + } > > + c->mode =3D MODE_VU; > > + break; > > + case 21: > > + vu_print_capabilities(); > > + break; > > case 'd': > > if (c->debug) > > die("Multiple --debug options given"); > > diff --git a/passt.c b/passt.c > > index 95034d73381f..952aded12848 100644 > > --- a/passt.c > > +++ b/passt.c > > @@ -282,6 +282,7 @@ int main(int argc, char **argv) > > quit_fd =3D pasta_netns_quit_init(&c); > > =20 > > tap_sock_init(&c); > > + vu_init(&c); > > =20 > > secret_init(&c); > > =20 > > @@ -399,6 +400,12 @@ loop: > > case EPOLL_TYPE_ICMPV6: > > icmp_sock_handler(&c, AF_INET6, ref); > > break; > > + case EPOLL_TYPE_VHOST_CMD: > > + tap_handler_vu(&c, eventmask); > > + break; > > + case EPOLL_TYPE_VHOST_KICK: > > + vu_kick_cb(&c, ref); > > + break; > > default: > > /* Can't happen */ > > ASSERT(0); > > diff --git a/passt.h b/passt.h > > index 6ed1d0b19e82..4e0100d51a4d 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -141,6 +141,7 @@ struct fqdn { > > enum passt_modes { > > MODE_PASST, > > MODE_PASTA, > > + MODE_VU, > > }; > > =20 > > /** > > diff --git a/tap.c b/tap.c > > index 936206e53637..c2a917bc00ca 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -57,6 +57,7 @@ > > #include "packet.h" > > #include "tap.h" > > #include "log.h" > > +#include "vhost_user.h" > > =20 > > /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handl= ers */ > > static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf); > > @@ -75,19 +76,22 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_= buf); > > */ > > int tap_send(const struct ctx *c, const void *data, size_t len) > > { > > - pcap(data, len); > > + int flags =3D MSG_NOSIGNAL | MSG_DONTWAIT; > > + uint32_t vnet_len =3D htonl(len); > > =20 > > - if (c->mode =3D=3D MODE_PASST) { > > - int flags =3D MSG_NOSIGNAL | MSG_DONTWAIT; > > - uint32_t vnet_len =3D htonl(len); > > + pcap(data, len); > > =20 > > + switch (c->mode) { > > + case MODE_PASST: > > if (send(c->fd_tap, &vnet_len, 4, flags) < 0) > > return -1; > > - >=20 > Unrelated change. >=20 > > return send(c->fd_tap, data, len, flags); > > + case MODE_PASTA: > > + return write(c->fd_tap, (char *)data, len); > > + case MODE_VU: > > + return vu_send(c, data, len); > > } > > - > > - return write(c->fd_tap, (char *)data, len); > > + return 0; > > } > > =20 > > /** > > @@ -428,10 +432,20 @@ size_t tap_send_frames(const struct ctx *c, const= struct iovec *iov, size_t n) > > if (!n) > > return 0; > > =20 > > - if (c->mode =3D=3D MODE_PASTA) > > + switch (c->mode) { > > + case MODE_PASTA: > > m =3D tap_send_frames_pasta(c, iov, n); > > - else > > + break; > > + case MODE_PASST: > > m =3D tap_send_frames_passt(c, iov, n); > > + break; > > + case MODE_VU: > > + m =3D tap_send_frames_vu(c, iov, n); > > + break; > > + default: > > + m =3D 0; > > + break; > > + } > > =20 > > if (m < n) > > debug("tap: failed to send %zu frames of %zu", n - m, n); > > @@ -1149,11 +1163,17 @@ static void tap_sock_unix_init(struct ctx *c) > > ev.data.u64 =3D ref.u64; > > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap_listen, &ev); > > =20 > > - info("You can now start qemu (>=3D 7.2, with commit 13c6be96618c):"); > > - info(" kvm ... -device virtio-net-pci,netdev=3Ds -netdev stream,id= =3Ds,server=3Doff,addr.type=3Dunix,addr.path=3D%s", > > - addr.sun_path); > > - info("or qrap, for earlier qemu versions:"); > > - info(" ./qrap 5 kvm ... -net socket,fd=3D5 -net nic,model=3Dvirtio= "); > > + if (c->mode =3D=3D MODE_VU) { > > + info("You can start qemu with:"); > > + info(" kvm ... -chardev socket,id=3Dchr0,path=3D%s -netdev vhost-= user,id=3Dnetdev0,chardev=3Dchr0 -device virtio-net,netdev=3Dnetdev0 -objec= t memory-backend-memfd,id=3Dmemfd0,share=3Don,size=3D$RAMSIZE -numa node,me= mdev=3Dmemfd0\n", >=20 > We'll never make it nice, but at least we can make it shorter by using > single characters for device names (see "id=3Ds" for the socket) and > perhaps select something reasonable, say 1G, for $RAMSIZE? I haven't > tried other ways yet. >=20 > > + addr.sun_path); > > + } else { > > + info("You can now start qemu (>=3D 7.2, with commit 13c6be96618c):"); > > + info(" kvm ... -device virtio-net-pci,netdev=3Ds -netdev stream,i= d=3Ds,server=3Doff,addr.type=3Dunix,addr.path=3D%s", > > + addr.sun_path); > > + info("or qrap, for earlier qemu versions:"); > > + info(" ./qrap 5 kvm ... -net socket,fd=3D5 -net nic,model=3Dvirti= o"); > > + } > > } > > =20 > > /** > > @@ -1163,7 +1183,7 @@ static void tap_sock_unix_init(struct ctx *c) > > */ > > void tap_listen_handler(struct ctx *c, uint32_t events) > > { > > - union epoll_ref ref =3D { .type =3D EPOLL_TYPE_TAP_PASST }; > > + union epoll_ref ref; >=20 > ...then it should go after 'int v'. >=20 > > struct epoll_event ev =3D { 0 }; > > int v =3D INT_MAX / 2; > > struct ucred ucred; > > @@ -1204,7 +1224,13 @@ void tap_listen_handler(struct ctx *c, uint32_t = events) > > trace("tap: failed to set SO_SNDBUF to %i", v); > > =20 > > ref.fd =3D c->fd_tap; > > - ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > > + if (c->mode =3D=3D MODE_VU) { > > + ref.type =3D EPOLL_TYPE_VHOST_CMD; > > + ev.events =3D EPOLLIN | EPOLLRDHUP; > > + } else { > > + ref.type =3D EPOLL_TYPE_TAP_PASST; > > + ev.events =3D EPOLLIN | EPOLLRDHUP | EPOLLET; > > + } > > ev.data.u64 =3D ref.u64; > > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > > } > > @@ -1288,12 +1314,21 @@ void tap_sock_init(struct ctx *c) > > =20 > > ASSERT(c->one_off); > > ref.fd =3D c->fd_tap; > > - if (c->mode =3D=3D MODE_PASST) > > + switch (c->mode) { > > + case MODE_PASST: > > ref.type =3D EPOLL_TYPE_TAP_PASST; > > - else > > + ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > > + break; > > + case MODE_PASTA: > > ref.type =3D EPOLL_TYPE_TAP_PASTA; > > + ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > > + break; > > + case MODE_VU: > > + ref.type =3D EPOLL_TYPE_VHOST_CMD; > > + ev.events =3D EPOLLIN | EPOLLRDHUP; > > + break; > > + } > > =20 > > - ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > > ev.data.u64 =3D ref.u64; > > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > > return; > > diff --git a/tcp.c b/tcp.c > > index 54c15087d678..b6aca9f37f19 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -1033,7 +1033,9 @@ size_t ipv4_fill_headers(const struct ctx *c, > > =20 > > tcp_set_tcp_header(th, conn, seq); > > =20 > > - th->check =3D tcp_update_check_tcp4(iph); > > + th->check =3D 0; > > + if (c->mode !=3D MODE_VU || *c->pcap) > > + th->check =3D tcp_update_check_tcp4(iph); > > =20 > > return ip_len; > > } > > @@ -1069,7 +1071,9 @@ size_t ipv6_fill_headers(const struct ctx *c, > > =20 > > tcp_set_tcp_header(th, conn, seq); > > =20 > > - th->check =3D tcp_update_check_tcp6(ip6h); > > + th->check =3D 0; > > + if (c->mode !=3D MODE_VU || *c->pcap) > > + th->check =3D tcp_update_check_tcp6(ip6h); > > =20 > > ip6h->hop_limit =3D 255; > > ip6h->version =3D 6; > > diff --git a/udp.c b/udp.c > > index a189c2e0b5a2..799a10989a91 100644 > > --- a/udp.c > > +++ b/udp.c > > @@ -671,8 +671,10 @@ static size_t udp_update_hdr6(const struct ctx *c,= struct ipv6hdr *ip6h, > > uh->source =3D s_in6->sin6_port; > > uh->dest =3D htons(dstport); > > uh->len =3D ip6h->payload_len; > > - uh->check =3D csum(uh, ntohs(ip6h->payload_len), > > - proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > > + uh->check =3D 0; > > + if (c->mode !=3D MODE_VU || *c->pcap) > > + uh->check =3D csum(uh, ntohs(ip6h->payload_len), > > + proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > > ip6h->version =3D 6; > > ip6h->nexthdr =3D IPPROTO_UDP; > > ip6h->hop_limit =3D 255; >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --W6CzWx2n32cCTQ1k Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXJh08ACgkQzQJF27ox 2GeIgw//YxHH/LWlTOfWQmIGuSX/Kp1EfnNx5xgxlMSSqfxK9lyCQvZWlLmmgxHL Ha7Z8x34cdeeIP4vVWJUGV1ypcwko++2kivajTEIoq9IKQb/aDps1k4uPl+PW6jS vL1ATjpZhXqK7bwr6BWx0/VGhCGZ50H/BWrsGbHnM05UJmZtb+ajRg+2TVWIzonM 0MmMe5O2ru0nF/RxM0aA6dT0AzRk0pISlD6GoqChM3IdJmqZUkOzXQTeS0/uHSyR K0ZW6/Gw8oF41KVbm3iI+0uFEC7Kv/0bAwwW3bADdRdFJmjpbd2z4xyE55vvKKVs MyN5SunSjpe9rub6HIpbT8UGGZe/n+vps5KJNUKX3XBTlDT4H13uxfS2bnzuS8ql KiXer6g6rd0KAa4HmRZhtQLcPla2g7D8KIbrUew+9uGnd2qSK/5xGKg+dqKCF+qr QV2sIEH9Y1dwGyiETXSctExZD/D4sWyuQfShRpGP9cKooTGMEqFAHgiGoKc/c3Yk 6KCMQ82adh94nK+S03APmLp4oA5KxKDw4sgXW+P+myAmsWR+cxJDRO9+6RVt8e01 652R9GYbagxnXB0I1q9bv1fseZMLNLA6DK1w3GAJqod5Nm2/gy6WGuCRNxCOfcDQ 1NFxlhr1mKE82u3rvJ4OGxQrEtpb0FQuSkM4V54EMm0G9K8t+aY= =6fVj -----END PGP SIGNATURE----- --W6CzWx2n32cCTQ1k--