From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 7A6795A0271 for ; Wed, 7 Feb 2024 03:40:41 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1707273637; bh=+/LIuSRMIjH/ds2sjdDJl2z5verXbSHrKWta0Iub9CI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=PM7tsPlXNCV452eds6kCRCp8j8lgpVqt6EBKwEXfSxjphAsXv/mBmYpkG69vifuo/ B0ocAIfQZp7V4Nlt/tlwljDC4aLpacGzkg0SfY7x0pIrCXmtt4XIoblAH8ZFoHgbte RT//fx40HCGzp5zohMB8kgRoguJZCRrMpSdDIQE2tdJ5AAQOCG6YkECkDQP8e+ZBt/ WWcWJjjjicNrz6q8AqzdHp/maTr95NgpzqYX0bOaTgj4MPpdK1PPuqRmUrdi2bcf4K eVb1bVXI9G96i2LSChuqLRS8J8S58410CbvShbVJUaCuk8k/3to5jTR+LoysdMSHAe NgwV8XAMlZW4w== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TV49K4B7vz4wd0; Wed, 7 Feb 2024 13:40:37 +1100 (AEDT) Date: Wed, 7 Feb 2024 13:40:33 +1100 From: David Gibson To: Laurent Vivier Subject: Re: [PATCH 20/24] vhost-user: add vhost-user Message-ID: References: <20240202141151.3762941-1-lvivier@redhat.com> <20240202141151.3762941-21-lvivier@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="g4VCaB7zpe6umDa2" Content-Disposition: inline In-Reply-To: <20240202141151.3762941-21-lvivier@redhat.com> Message-ID-Hash: P65SDRAYWKE6H2BP6OBVL4AKBNS7ZYUD X-Message-ID-Hash: P65SDRAYWKE6H2BP6OBVL4AKBNS7ZYUD X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --g4VCaB7zpe6umDa2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 02, 2024 at 03:11:47PM +0100, Laurent Vivier wrote: > add virtio and vhost-user functions to connect with QEMU. >=20 > $ ./passt --vhost-user >=20 > and >=20 > # qemu-system-x86_64 ... -m 4G \ > -object memory-backend-memfd,id=3Dmemfd0,share=3Don,size=3D4G \ > -numa node,memdev=3Dmemfd0 \ > -chardev socket,id=3Dchr0,path=3D/tmp/passt_1.socket \ I think it would be wise to use different default socket names for vhost-user than for the qemu socket protocol. Or even to require --socket-path: the reasons we have these rather weird default probed paths don't apply here, AFAICT. > -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0 \ > -device virtio-net,mac=3D9a:2b:2c:2d:2e:2f,netdev=3Dnetdev0 \ > ... >=20 > Signed-off-by: Laurent Vivier > --- > conf.c | 20 ++++++++++++++-- > passt.c | 7 ++++++ > passt.h | 1 + > tap.c | 73 ++++++++++++++++++++++++++++++++++++++++++--------------- > tcp.c | 8 +++++-- > udp.c | 6 +++-- > 6 files changed, 90 insertions(+), 25 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index b6a2a1f0fdc3..40aa9519f8a6 100644 > --- a/conf.c > +++ b/conf.c > @@ -44,6 +44,7 @@ > #include "lineread.h" > #include "isolation.h" > #include "log.h" > +#include "vhost_user.h" > =20 > /** > * next_chunk - Return the next piece of a string delimited by a charact= er > @@ -735,9 +736,12 @@ static void print_usage(const char *name, int status) > info( " -I, --ns-ifname NAME namespace interface name"); > info( " default: same interface name as external one"); > } else { > - info( " -s, --socket PATH UNIX domain socket path"); > + info( " -s, --socket, --socket-path PATH UNIX domain socket path"); > info( " default: probe free path starting from " > UNIX_SOCK_PATH, 1); > + info( " --vhost-user Enable vhost-user mode"); > + info( " UNIX domain socket is provided by -s option"); > + info( " --print-capabilities print back-end capabilities in JSON fo= rmat"); > } > =20 > info( " -F, --fd FD Use FD as pre-opened connected socket"); > @@ -1123,6 +1127,7 @@ void conf(struct ctx *c, int argc, char **argv) > {"help", no_argument, NULL, 'h' }, > {"socket", required_argument, NULL, 's' }, > {"fd", required_argument, NULL, 'F' }, > + {"socket-path", required_argument, NULL, 's' }, /* vhost-user mandato= ry */ > {"ns-ifname", required_argument, NULL, 'I' }, > {"pcap", required_argument, NULL, 'p' }, > {"pid", required_argument, NULL, 'P' }, > @@ -1169,6 +1174,8 @@ void conf(struct ctx *c, int argc, char **argv) > {"config-net", no_argument, NULL, 17 }, > {"no-copy-routes", no_argument, NULL, 18 }, > {"no-copy-addrs", no_argument, NULL, 19 }, > + {"vhost-user", no_argument, NULL, 20 }, > + {"print-capabilities", no_argument, NULL, 21 }, /* vhost-user mandato= ry */ > { 0 }, > }; > char userns[PATH_MAX] =3D { 0 }, netns[PATH_MAX] =3D { 0 }; > @@ -1328,7 +1335,6 @@ void conf(struct ctx *c, int argc, char **argv) > sizeof(c->ip6.ifname_out), "%s", optarg); > if (ret <=3D 0 || ret >=3D (int)sizeof(c->ip6.ifname_out)) > die("Invalid interface name: %s", optarg); > - Unrelated change > break; > case 17: > if (c->mode !=3D MODE_PASTA) > @@ -1350,6 +1356,16 @@ void conf(struct ctx *c, int argc, char **argv) > warn("--no-copy-addrs will be dropped soon"); > c->no_copy_addrs =3D copy_addrs_opt =3D true; > break; > + case 20: > + if (c->mode =3D=3D MODE_PASTA) { > + err("--vhost-user is for passt mode only"); > + usage(argv[0]); > + } > + c->mode =3D MODE_VU; > + break; > + case 21: > + vu_print_capabilities(); > + break; > case 'd': > if (c->debug) > die("Multiple --debug options given"); > diff --git a/passt.c b/passt.c > index 95034d73381f..952aded12848 100644 > --- a/passt.c > +++ b/passt.c > @@ -282,6 +282,7 @@ int main(int argc, char **argv) > quit_fd =3D pasta_netns_quit_init(&c); > =20 > tap_sock_init(&c); > + vu_init(&c); > =20 > secret_init(&c); > =20 > @@ -399,6 +400,12 @@ loop: > case EPOLL_TYPE_ICMPV6: > icmp_sock_handler(&c, AF_INET6, ref); > break; > + case EPOLL_TYPE_VHOST_CMD: > + tap_handler_vu(&c, eventmask); > + break; > + case EPOLL_TYPE_VHOST_KICK: > + vu_kick_cb(&c, ref); > + break; > default: > /* Can't happen */ > ASSERT(0); > diff --git a/passt.h b/passt.h > index 6ed1d0b19e82..4e0100d51a4d 100644 > --- a/passt.h > +++ b/passt.h > @@ -141,6 +141,7 @@ struct fqdn { > enum passt_modes { > MODE_PASST, > MODE_PASTA, > + MODE_VU, > }; > =20 > /** > diff --git a/tap.c b/tap.c > index 936206e53637..c2a917bc00ca 100644 > --- a/tap.c > +++ b/tap.c > @@ -57,6 +57,7 @@ > #include "packet.h" > #include "tap.h" > #include "log.h" > +#include "vhost_user.h" > =20 > /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handler= s */ > static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf); > @@ -75,19 +76,22 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_bu= f); > */ > int tap_send(const struct ctx *c, const void *data, size_t len) > { > - pcap(data, len); > + int flags =3D MSG_NOSIGNAL | MSG_DONTWAIT; > + uint32_t vnet_len =3D htonl(len); > =20 > - if (c->mode =3D=3D MODE_PASST) { > - int flags =3D MSG_NOSIGNAL | MSG_DONTWAIT; > - uint32_t vnet_len =3D htonl(len); > + pcap(data, len); > =20 > + switch (c->mode) { > + case MODE_PASST: > if (send(c->fd_tap, &vnet_len, 4, flags) < 0) > return -1; > - > return send(c->fd_tap, data, len, flags); > + case MODE_PASTA: > + return write(c->fd_tap, (char *)data, len); > + case MODE_VU: > + return vu_send(c, data, len); > } > - > - return write(c->fd_tap, (char *)data, len); > + return 0; > } > =20 > /** > @@ -428,10 +432,20 @@ size_t tap_send_frames(const struct ctx *c, const s= truct iovec *iov, size_t n) > if (!n) > return 0; > =20 > - if (c->mode =3D=3D MODE_PASTA) > + switch (c->mode) { > + case MODE_PASTA: > m =3D tap_send_frames_pasta(c, iov, n); > - else > + break; > + case MODE_PASST: > m =3D tap_send_frames_passt(c, iov, n); > + break; > + case MODE_VU: > + m =3D tap_send_frames_vu(c, iov, n); > + break; > + default: > + m =3D 0; > + break; > + } > =20 > if (m < n) > debug("tap: failed to send %zu frames of %zu", n - m, n); > @@ -1149,11 +1163,17 @@ static void tap_sock_unix_init(struct ctx *c) > ev.data.u64 =3D ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap_listen, &ev); > =20 > - info("You can now start qemu (>=3D 7.2, with commit 13c6be96618c):"); > - info(" kvm ... -device virtio-net-pci,netdev=3Ds -netdev stream,id= =3Ds,server=3Doff,addr.type=3Dunix,addr.path=3D%s", > - addr.sun_path); > - info("or qrap, for earlier qemu versions:"); > - info(" ./qrap 5 kvm ... -net socket,fd=3D5 -net nic,model=3Dvirtio"); > + if (c->mode =3D=3D MODE_VU) { > + info("You can start qemu with:"); > + info(" kvm ... -chardev socket,id=3Dchr0,path=3D%s -netdev vhost-us= er,id=3Dnetdev0,chardev=3Dchr0 -device virtio-net,netdev=3Dnetdev0 -object = memory-backend-memfd,id=3Dmemfd0,share=3Don,size=3D$RAMSIZE -numa node,memd= ev=3Dmemfd0\n", > + addr.sun_path); > + } else { > + info("You can now start qemu (>=3D 7.2, with commit 13c6be96618c):"); > + info(" kvm ... -device virtio-net-pci,netdev=3Ds -netdev stream,id= =3Ds,server=3Doff,addr.type=3Dunix,addr.path=3D%s", > + addr.sun_path); > + info("or qrap, for earlier qemu versions:"); > + info(" ./qrap 5 kvm ... -net socket,fd=3D5 -net nic,model=3Dvirtio"= ); > + } > } > =20 > /** > @@ -1163,7 +1183,7 @@ static void tap_sock_unix_init(struct ctx *c) > */ > void tap_listen_handler(struct ctx *c, uint32_t events) > { > - union epoll_ref ref =3D { .type =3D EPOLL_TYPE_TAP_PASST }; > + union epoll_ref ref; > struct epoll_event ev =3D { 0 }; > int v =3D INT_MAX / 2; > struct ucred ucred; > @@ -1204,7 +1224,13 @@ void tap_listen_handler(struct ctx *c, uint32_t ev= ents) > trace("tap: failed to set SO_SNDBUF to %i", v); > =20 > ref.fd =3D c->fd_tap; > - ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > + if (c->mode =3D=3D MODE_VU) { I'd prefer to see different epoll types for listening "old-style" unix sockets and listening vhost-user sockets, rather than having a switch on the global mode here. > + ref.type =3D EPOLL_TYPE_VHOST_CMD; > + ev.events =3D EPOLLIN | EPOLLRDHUP; > + } else { > + ref.type =3D EPOLL_TYPE_TAP_PASST; > + ev.events =3D EPOLLIN | EPOLLRDHUP | EPOLLET; > + } > ev.data.u64 =3D ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > } > @@ -1288,12 +1314,21 @@ void tap_sock_init(struct ctx *c) > =20 > ASSERT(c->one_off); > ref.fd =3D c->fd_tap; > - if (c->mode =3D=3D MODE_PASST) > + switch (c->mode) { > + case MODE_PASST: > ref.type =3D EPOLL_TYPE_TAP_PASST; > - else > + ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > + break; > + case MODE_PASTA: > ref.type =3D EPOLL_TYPE_TAP_PASTA; > + ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > + break; > + case MODE_VU: > + ref.type =3D EPOLL_TYPE_VHOST_CMD; > + ev.events =3D EPOLLIN | EPOLLRDHUP; > + break; Please add a default: case, even if it's just ASSERT(0). Leaving it out tends to make static checkers unhappy. > + } > =20 > - ev.events =3D EPOLLIN | EPOLLET | EPOLLRDHUP; > ev.data.u64 =3D ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > return; > diff --git a/tcp.c b/tcp.c > index 54c15087d678..b6aca9f37f19 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1033,7 +1033,9 @@ size_t ipv4_fill_headers(const struct ctx *c, > =20 > tcp_set_tcp_header(th, conn, seq); > =20 > - th->check =3D tcp_update_check_tcp4(iph); > + th->check =3D 0; > + if (c->mode !=3D MODE_VU || *c->pcap) > + th->check =3D tcp_update_check_tcp4(iph); > =20 > return ip_len; > } > @@ -1069,7 +1071,9 @@ size_t ipv6_fill_headers(const struct ctx *c, > =20 > tcp_set_tcp_header(th, conn, seq); > =20 > - th->check =3D tcp_update_check_tcp6(ip6h); > + th->check =3D 0; > + if (c->mode !=3D MODE_VU || *c->pcap) > + th->check =3D tcp_update_check_tcp6(ip6h); > =20 > ip6h->hop_limit =3D 255; > ip6h->version =3D 6; > diff --git a/udp.c b/udp.c > index a189c2e0b5a2..799a10989a91 100644 > --- a/udp.c > +++ b/udp.c > @@ -671,8 +671,10 @@ static size_t udp_update_hdr6(const struct ctx *c, s= truct ipv6hdr *ip6h, > uh->source =3D s_in6->sin6_port; > uh->dest =3D htons(dstport); > uh->len =3D ip6h->payload_len; > - uh->check =3D csum(uh, ntohs(ip6h->payload_len), > - proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > + uh->check =3D 0; > + if (c->mode !=3D MODE_VU || *c->pcap) > + uh->check =3D csum(uh, ntohs(ip6h->payload_len), > + proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > ip6h->version =3D 6; > ip6h->nexthdr =3D IPPROTO_UDP; > ip6h->hop_limit =3D 255; --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --g4VCaB7zpe6umDa2 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXC7aAACgkQzQJF27ox 2Gecuw/+JSgTxg5Nn+NPyZvi5ioEAno/6F8dyvszkJUtI06UGYQ1nn0AAsDpYLjM +1NO15InZVQbsTs129Hz9rC0fVpKVOONne5eqhEJB8j3J2ayOpFsO4d6ytqHX9V+ oq1VxznnXaa58vpvstISMeUXtgRfyktnc0BMO07//kZ7BlqY8/iX/kF7ONnpAJJL WfFwy6bk1yuRviy+IumsWzGQWHY05wv8NO2ybmn7Jwfiy30klKg1qZc0bbNB69V2 MOu9CaOlW6JIzJI0CypWSe62FoffZGA8kjJVSaxAe+PzhXdn6KkyHQ5rtBSRc5h/ RgifD0W5csdpnWYRSpVUxWkZmKnOF0666wkMkMxPvaZt2bC9aE3Z4B6bR5cJdaVt 4SF/CksgYI3eRuyS2Ia/ool+R9R4aZizVIxBnIy5G87imVZHpOC/ugzoLGQZGDdO BL/EV+65y4YYtcGvlVW0JhNrF1X2bzD8nf+JPQSMJ14ekDfPjBe1yLwFbrSVG0gI tV2rpYZfg28jmYQu3TPucVtOLSYSkM9Xb3/OehYnpGO3KYv7rwZ4rKy13s1UBHex pIBnV5hn5uYSjj8pJGTqjJIo3X4aKrpAqElqKFNjAvv/89nnwT/8CKBrzkK3QJJH VIAYBw6aYBnjoY44v9vpZIUJ07oG9YORFtjDIbNpxAudIQ5fsdU= =TmYM -----END PGP SIGNATURE----- --g4VCaB7zpe6umDa2--