From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 6122D5A0274 for ; Mon, 12 Feb 2024 00:19:40 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707693579; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HYUrGFdJZ/+Z+pRysR4ILRvP+SCgrlMHGMQKw+RGfUk=; b=g/H9j9cwWt8k1RcQZ0OU3ludL6ewu+hacWQtCs8mwxxvuXPueHrK8KuJ5N+P1qeCKFwKDp 34xBXGcdADGHGFTaaWmJzrfL5A3SaU3oalIZeTdHoOm3AvG5Wc3iGIbAv+F9RIYcAIbh0s OdbPwA5WB80pk2mHOafrSQWGdmbJmE8= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-486-cTAtGyrzO3K2GRUDPnfWDw-1; Sun, 11 Feb 2024 18:19:38 -0500 X-MC-Unique: cTAtGyrzO3K2GRUDPnfWDw-1 Received: by mail-ej1-f71.google.com with SMTP id a640c23a62f3a-a3c2a65d99eso49041666b.0 for ; Sun, 11 Feb 2024 15:19:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707693576; x=1708298376; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=HYUrGFdJZ/+Z+pRysR4ILRvP+SCgrlMHGMQKw+RGfUk=; b=Yqn2Z8p+I1a7TqIdd4MdgKNCBn7Qpw3aGlh8c4ERshL6rXQWVshOyslwKVNudsql5Z SHlax65jglt5gk7YxJRdGD6PCcsfoh6NhGVFXoEH+fmgrC59+dNdhEN6erjAP+tC/SVD f7nffhV2Ga8QlXAFFZK2npxfjhFjGtO4bfj1cQ3k64BcmcLgzmnZ8mxaUeiWdfAxI2B0 /OcOuk70NkU16Paqb+zI5nMBuu0d36T9NYblPe/NKiIKT4GvHVica/oDsQ3xv3djOlMM 7VClX6gLJnT8R7QWOQKi5ae8NVZPyx7vGROj/TCefZIV1/2WicJLe3tq5kr1gDrH+fN/ mggA== X-Gm-Message-State: AOJu0YyJvMNVy2kARC8VGaRvwGPmDSuK2ktmVxjHmWRKVcgX77DsATC8 jMdbcPkwe+y0aGpAksF8OOEwteA3qqdxOjIf6ruSBpLerfZI8GaWDfvP+90/CB48T4ed2LH6Na7 glZu7G28uLXpI9nu0VmRtKDyENVDvncRnfVK7zIyEIwUcYNz6w51GdlP12SuAoebbN7ozcAqx3U IKlEsq4FMmOB2v3bX+oPWINsEVXbwSI0e8I50= X-Received: by 2002:a17:906:6852:b0:a3c:82ef:b63c with SMTP id a18-20020a170906685200b00a3c82efb63cmr1486128ejs.1.1707693576388; Sun, 11 Feb 2024 15:19:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IEuzqKwf52/q/EZPbzlcvxTtQY+XcVtKr9A6mJ9/4bt25OOBM6Ovnl/H61BeSmcC6pRwweN1g== X-Received: by 2002:a17:906:6852:b0:a3c:82ef:b63c with SMTP id a18-20020a170906685200b00a3c82efb63cmr1486118ejs.1.1707693575888; Sun, 11 Feb 2024 15:19:35 -0800 (PST) Received: from maya.cloud.tilaa.com (maya.cloud.tilaa.com. [164.138.29.33]) by smtp.gmail.com with ESMTPSA id u24-20020a1709064ad800b00a3c598d6b74sm1486429ejt.54.2024.02.11.15.19.35 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 11 Feb 2024 15:19:35 -0800 (PST) Date: Mon, 12 Feb 2024 00:19:01 +0100 From: Stefano Brivio To: Laurent Vivier Subject: Re: [PATCH 20/24] vhost-user: add vhost-user Message-ID: <20240212001901.66ada100@elisabeth> In-Reply-To: <20240202141151.3762941-21-lvivier@redhat.com> References: <20240202141151.3762941-1-lvivier@redhat.com> <20240202141151.3762941-21-lvivier@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.1.1 (GTK 3.24.36; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: WBKNJM5IKKIMUYZJWB73GARTSUEOBEOE X-Message-ID-Hash: WBKNJM5IKKIMUYZJWB73GARTSUEOBEOE X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 2 Feb 2024 15:11:47 +0100 Laurent Vivier wrote: > add virtio and vhost-user functions to connect with QEMU. > > $ ./passt --vhost-user > > and > > # qemu-system-x86_64 ... -m 4G \ > -object memory-backend-memfd,id=memfd0,share=on,size=4G \ > -numa node,memdev=memfd0 \ > -chardev socket,id=chr0,path=/tmp/passt_1.socket \ > -netdev vhost-user,id=netdev0,chardev=chr0 \ > -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \ > ... > > Signed-off-by: Laurent Vivier > --- > conf.c | 20 ++++++++++++++-- > passt.c | 7 ++++++ > passt.h | 1 + > tap.c | 73 ++++++++++++++++++++++++++++++++++++++++++--------------- > tcp.c | 8 +++++-- > udp.c | 6 +++-- > 6 files changed, 90 insertions(+), 25 deletions(-) This would need a matching change in the man page, passt.1, at least documenting the --vhost-user option and adjusting descriptions about the guest communication interface (look for "UNIX domain" there). > diff --git a/conf.c b/conf.c > index b6a2a1f0fdc3..40aa9519f8a6 100644 > --- a/conf.c > +++ b/conf.c > @@ -44,6 +44,7 @@ > #include "lineread.h" > #include "isolation.h" > #include "log.h" > +#include "vhost_user.h" > > /** > * next_chunk - Return the next piece of a string delimited by a character > @@ -735,9 +736,12 @@ static void print_usage(const char *name, int status) > info( " -I, --ns-ifname NAME namespace interface name"); > info( " default: same interface name as external one"); > } else { > - info( " -s, --socket PATH UNIX domain socket path"); > + info( " -s, --socket, --socket-path PATH UNIX domain socket path"); I don't get the point of --socket-path. It's handled just like -s anyway, right? Why can't it just be -s / --socket? > info( " default: probe free path starting from " > UNIX_SOCK_PATH, 1); > + info( " --vhost-user Enable vhost-user mode"); > + info( " UNIX domain socket is provided by -s option"); > + info( " --print-capabilities print back-end capabilities in JSON format"); Instead of introducing a new option, couldn't we have these printed unconditionally with debug()? I guess it's debug-level stuff anyway. > } > > info( " -F, --fd FD Use FD as pre-opened connected socket"); > @@ -1123,6 +1127,7 @@ void conf(struct ctx *c, int argc, char **argv) > {"help", no_argument, NULL, 'h' }, > {"socket", required_argument, NULL, 's' }, > {"fd", required_argument, NULL, 'F' }, > + {"socket-path", required_argument, NULL, 's' }, /* vhost-user mandatory */ > {"ns-ifname", required_argument, NULL, 'I' }, > {"pcap", required_argument, NULL, 'p' }, > {"pid", required_argument, NULL, 'P' }, > @@ -1169,6 +1174,8 @@ void conf(struct ctx *c, int argc, char **argv) > {"config-net", no_argument, NULL, 17 }, > {"no-copy-routes", no_argument, NULL, 18 }, > {"no-copy-addrs", no_argument, NULL, 19 }, > + {"vhost-user", no_argument, NULL, 20 }, > + {"print-capabilities", no_argument, NULL, 21 }, /* vhost-user mandatory */ > { 0 }, > }; > char userns[PATH_MAX] = { 0 }, netns[PATH_MAX] = { 0 }; > @@ -1328,7 +1335,6 @@ void conf(struct ctx *c, int argc, char **argv) > sizeof(c->ip6.ifname_out), "%s", optarg); > if (ret <= 0 || ret >= (int)sizeof(c->ip6.ifname_out)) > die("Invalid interface name: %s", optarg); > - > break; > case 17: > if (c->mode != MODE_PASTA) > @@ -1350,6 +1356,16 @@ void conf(struct ctx *c, int argc, char **argv) > warn("--no-copy-addrs will be dropped soon"); > c->no_copy_addrs = copy_addrs_opt = true; > break; > + case 20: > + if (c->mode == MODE_PASTA) { And, if c->mode == MODE_VU, the option is redundant (see e.g. the handling for --one-off). > + err("--vhost-user is for passt mode only"); > + usage(argv[0]); > + } > + c->mode = MODE_VU; > + break; > + case 21: > + vu_print_capabilities(); > + break; > case 'd': > if (c->debug) > die("Multiple --debug options given"); > diff --git a/passt.c b/passt.c > index 95034d73381f..952aded12848 100644 > --- a/passt.c > +++ b/passt.c > @@ -282,6 +282,7 @@ int main(int argc, char **argv) > quit_fd = pasta_netns_quit_init(&c); > > tap_sock_init(&c); > + vu_init(&c); > > secret_init(&c); > > @@ -399,6 +400,12 @@ loop: > case EPOLL_TYPE_ICMPV6: > icmp_sock_handler(&c, AF_INET6, ref); > break; > + case EPOLL_TYPE_VHOST_CMD: > + tap_handler_vu(&c, eventmask); > + break; > + case EPOLL_TYPE_VHOST_KICK: > + vu_kick_cb(&c, ref); > + break; > default: > /* Can't happen */ > ASSERT(0); > diff --git a/passt.h b/passt.h > index 6ed1d0b19e82..4e0100d51a4d 100644 > --- a/passt.h > +++ b/passt.h > @@ -141,6 +141,7 @@ struct fqdn { > enum passt_modes { > MODE_PASST, > MODE_PASTA, > + MODE_VU, > }; > > /** > diff --git a/tap.c b/tap.c > index 936206e53637..c2a917bc00ca 100644 > --- a/tap.c > +++ b/tap.c > @@ -57,6 +57,7 @@ > #include "packet.h" > #include "tap.h" > #include "log.h" > +#include "vhost_user.h" > > /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */ > static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf); > @@ -75,19 +76,22 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf); > */ > int tap_send(const struct ctx *c, const void *data, size_t len) > { > - pcap(data, len); > + int flags = MSG_NOSIGNAL | MSG_DONTWAIT; > + uint32_t vnet_len = htonl(len); > > - if (c->mode == MODE_PASST) { > - int flags = MSG_NOSIGNAL | MSG_DONTWAIT; > - uint32_t vnet_len = htonl(len); > + pcap(data, len); > > + switch (c->mode) { > + case MODE_PASST: > if (send(c->fd_tap, &vnet_len, 4, flags) < 0) > return -1; > - Unrelated change. > return send(c->fd_tap, data, len, flags); > + case MODE_PASTA: > + return write(c->fd_tap, (char *)data, len); > + case MODE_VU: > + return vu_send(c, data, len); > } > - > - return write(c->fd_tap, (char *)data, len); > + return 0; > } > > /** > @@ -428,10 +432,20 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n) > if (!n) > return 0; > > - if (c->mode == MODE_PASTA) > + switch (c->mode) { > + case MODE_PASTA: > m = tap_send_frames_pasta(c, iov, n); > - else > + break; > + case MODE_PASST: > m = tap_send_frames_passt(c, iov, n); > + break; > + case MODE_VU: > + m = tap_send_frames_vu(c, iov, n); > + break; > + default: > + m = 0; > + break; > + } > > if (m < n) > debug("tap: failed to send %zu frames of %zu", n - m, n); > @@ -1149,11 +1163,17 @@ static void tap_sock_unix_init(struct ctx *c) > ev.data.u64 = ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap_listen, &ev); > > - info("You can now start qemu (>= 7.2, with commit 13c6be96618c):"); > - info(" kvm ... -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=%s", > - addr.sun_path); > - info("or qrap, for earlier qemu versions:"); > - info(" ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio"); > + if (c->mode == MODE_VU) { > + info("You can start qemu with:"); > + info(" kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n", We'll never make it nice, but at least we can make it shorter by using single characters for device names (see "id=s" for the socket) and perhaps select something reasonable, say 1G, for $RAMSIZE? I haven't tried other ways yet. > + addr.sun_path); > + } else { > + info("You can now start qemu (>= 7.2, with commit 13c6be96618c):"); > + info(" kvm ... -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=%s", > + addr.sun_path); > + info("or qrap, for earlier qemu versions:"); > + info(" ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio"); > + } > } > > /** > @@ -1163,7 +1183,7 @@ static void tap_sock_unix_init(struct ctx *c) > */ > void tap_listen_handler(struct ctx *c, uint32_t events) > { > - union epoll_ref ref = { .type = EPOLL_TYPE_TAP_PASST }; > + union epoll_ref ref; ...then it should go after 'int v'. > struct epoll_event ev = { 0 }; > int v = INT_MAX / 2; > struct ucred ucred; > @@ -1204,7 +1224,13 @@ void tap_listen_handler(struct ctx *c, uint32_t events) > trace("tap: failed to set SO_SNDBUF to %i", v); > > ref.fd = c->fd_tap; > - ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; > + if (c->mode == MODE_VU) { > + ref.type = EPOLL_TYPE_VHOST_CMD; > + ev.events = EPOLLIN | EPOLLRDHUP; > + } else { > + ref.type = EPOLL_TYPE_TAP_PASST; > + ev.events = EPOLLIN | EPOLLRDHUP | EPOLLET; > + } > ev.data.u64 = ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > } > @@ -1288,12 +1314,21 @@ void tap_sock_init(struct ctx *c) > > ASSERT(c->one_off); > ref.fd = c->fd_tap; > - if (c->mode == MODE_PASST) > + switch (c->mode) { > + case MODE_PASST: > ref.type = EPOLL_TYPE_TAP_PASST; > - else > + ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; > + break; > + case MODE_PASTA: > ref.type = EPOLL_TYPE_TAP_PASTA; > + ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; > + break; > + case MODE_VU: > + ref.type = EPOLL_TYPE_VHOST_CMD; > + ev.events = EPOLLIN | EPOLLRDHUP; > + break; > + } > > - ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; > ev.data.u64 = ref.u64; > epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); > return; > diff --git a/tcp.c b/tcp.c > index 54c15087d678..b6aca9f37f19 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1033,7 +1033,9 @@ size_t ipv4_fill_headers(const struct ctx *c, > > tcp_set_tcp_header(th, conn, seq); > > - th->check = tcp_update_check_tcp4(iph); > + th->check = 0; > + if (c->mode != MODE_VU || *c->pcap) > + th->check = tcp_update_check_tcp4(iph); > > return ip_len; > } > @@ -1069,7 +1071,9 @@ size_t ipv6_fill_headers(const struct ctx *c, > > tcp_set_tcp_header(th, conn, seq); > > - th->check = tcp_update_check_tcp6(ip6h); > + th->check = 0; > + if (c->mode != MODE_VU || *c->pcap) > + th->check = tcp_update_check_tcp6(ip6h); > > ip6h->hop_limit = 255; > ip6h->version = 6; > diff --git a/udp.c b/udp.c > index a189c2e0b5a2..799a10989a91 100644 > --- a/udp.c > +++ b/udp.c > @@ -671,8 +671,10 @@ static size_t udp_update_hdr6(const struct ctx *c, struct ipv6hdr *ip6h, > uh->source = s_in6->sin6_port; > uh->dest = htons(dstport); > uh->len = ip6h->payload_len; > - uh->check = csum(uh, ntohs(ip6h->payload_len), > - proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > + uh->check = 0; > + if (c->mode != MODE_VU || *c->pcap) > + uh->check = csum(uh, ntohs(ip6h->payload_len), > + proto_ipv6_header_checksum(ip6h, IPPROTO_UDP)); > ip6h->version = 6; > ip6h->nexthdr = IPPROTO_UDP; > ip6h->hop_limit = 255; -- Stefano