From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202408 header.b=D8odVFld; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 81F195A004E for ; Mon, 23 Sep 2024 06:11:52 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202408; t=1727064705; bh=peGX9HzM5Po9k5um0sYo/MUtTM/Fo8dMlJsMd84oNPs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=D8odVFldVj94hnPv5bfTMLNcZmLnb/ikBcK5rMDrey7xRbSHP/Zvgt+AYWk9IeQN/ NOMN2vXR7WJByrEmdxGEN33Iy5j3TRj189C2BxPsTXu9IEK/OxZiPTCsEIrHwqyguh Eo9saqMvtxgwjbAzF6xq6G+SkdY0qQfwv8bDtaV06Xg4OlNUUv9J+81FU3sj/kHqHY GCM4RsYzUv9SPRJSihGSmGy3vnkmFbM+Nt7lqYOB47Z2BA0G9gZO0ztQTELYFGdGVk sDbAUo+57kc3IOsZt+mX0dqn4wlinqOyITTmmqh/yZP76ikmA9HurX9//QzkhU8rep 34KDQk3c4uIfw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4XBqLn0pQTz4wyk; Mon, 23 Sep 2024 14:11:45 +1000 (AEST) Date: Mon, 23 Sep 2024 14:11:34 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v5 4/4] vhost-user: add vhost-user Message-ID: References: <20240913162036.3635783-1-lvivier@redhat.com> <20240913162036.3635783-5-lvivier@redhat.com> <20240919155143.437ef709@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="O25hLRtSOPoftLs9" Content-Disposition: inline In-Reply-To: <20240919155143.437ef709@elisabeth> Message-ID-Hash: QADYT27Q5XLEVXDJLK3XIRQJ2FZCQ326 X-Message-ID-Hash: QADYT27Q5XLEVXDJLK3XIRQJ2FZCQ326 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Laurent Vivier , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --O25hLRtSOPoftLs9 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 19, 2024 at 03:51:43PM +0200, Stefano Brivio wrote: > Sorry for the delay, I wanted first to finish extending tests to run > also functional ones (not just throughput and latency) with vhost-user, > but it's taking me a bit longer than expected, so here comes the review. >=20 > By the way, by mistake I let passt run in non-vhost-user mode while > QEMU was configured to use it. This results in a loop: >=20 > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: Faile= d to read msg header. Read 0 instead of 12. Original request 1. > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: vhost= _backend_init failed: Protocol error > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: faile= d to init vhost_net for queue 0 > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: Faile= d to read msg header. Read 0 instead of 12. Original request 1. > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: vhost= _backend_init failed: Protocol error > qemu-system-x86_64: -netdev vhost-user,id=3Dnetdev0,chardev=3Dchr0: faile= d to init vhost_net for queue 0 > ... >=20 > and passt says: >=20 > accepted connection from PID 4807 > Bad frame size from guest, resetting connection > Client connection closed > accepted connection from PID 4807 > Bad frame size from guest, resetting connection > Client connection closed > ... >=20 > while happily flooding system logs. I guess it should be fixed in QEMU > at some point: if the vhost_net initialisation fails, I don't see the > point in retrying. This is without the "reconnect" option by the way: Fwiw, I tend to agree. [snip] > > +.TP > > +.BR \-\-vhost-user >=20 > I think we should introduce this option as deprecated right away, so > that we can switch to vhost-user mode by default soon (checking if the > hypervisor sends us a vhost-user command) without having to keep this > option around. At that point, we can add --no-vhost-user instead. If we introduced this as deprecated, then we should also introduce --no-vhost-user immediately (as a no-op). So that people can make future proof scripts that _don't_ want VU right away. Or... Maybe we should start deprecating in terms of a a more general option, say: --backend qemu-stream --backend tuntap --backend vhost-user So we don't need to proliferate even more options, when say we want to add: --backend vduse Or whatever. > If it makes sense, you could copy the text from --stderr: >=20 > Note that this configuration option is \fBdeprecated\fR and will be remov= ed in a > future version. >=20 > > +Enable vhost-user. The vhost-user command socket is provided by \fB--s= ocket\fR. > > + > > +.TP > > +.BR \-\-print-capabilities > > +Print back-end capabilities in JSON format, only meaningful for vhost-= user mode. > > + > > .TP > > .BR \-F ", " \-\-fd " " \fIFD > > Pass a pre-opened, connected socket to \fBpasst\fR. Usually the socket= is opened > > diff --git a/passt.c b/passt.c > > index ad6f0bc32df6..b64efeaf346c 100644 > > --- a/passt.c > > +++ b/passt.c > > @@ -74,6 +74,8 @@ char *epoll_type_str[] =3D { > > [EPOLL_TYPE_TAP_PASTA] =3D "/dev/net/tun device", > > [EPOLL_TYPE_TAP_PASST] =3D "connected qemu socket", > > [EPOLL_TYPE_TAP_LISTEN] =3D "listening qemu socket", > > + [EPOLL_TYPE_VHOST_CMD] =3D "vhost-user command socket", > > + [EPOLL_TYPE_VHOST_KICK] =3D "vhost-user kick socket", > > }; > > static_assert(ARRAY_SIZE(epoll_type_str) =3D=3D EPOLL_NUM_TYPES, > > "epoll_type_str[] doesn't match enum epoll_type"); > > @@ -206,6 +208,7 @@ int main(int argc, char **argv) > > struct rlimit limit; > > struct timespec now; > > struct sigaction sa; > > + struct vu_dev vdev; > > =20 > > clock_gettime(CLOCK_MONOTONIC, &log_start); > > =20 > > @@ -262,6 +265,8 @@ int main(int argc, char **argv) > > pasta_netns_quit_init(&c); > > =20 > > tap_sock_init(&c); > > + if (c.mode =3D=3D MODE_VU) > > + vu_init(&c, &vdev); > > =20 > > secret_init(&c); > > =20 > > @@ -352,14 +357,31 @@ loop: > > tcp_timer_handler(&c, ref); > > break; > > case EPOLL_TYPE_UDP_LISTEN: > > - udp_listen_sock_handler(&c, ref, eventmask, &now); > > + if (c.mode =3D=3D MODE_VU) { >=20 > Eventually, we'll probably want to make passt more generic and to > support multiple guests, so at that point this might become > EPOLL_TYPE_UDP_VU_LISTEN if it's a socket we opened for a guest using > vhost-user. Or maybe we'll have to unify the receive paths, so this > will remain EPOLL_TYPE_UDP_LISTEN. Yeah, I don't think having different epoll types would be particularly helpful here. If we have multiple guests we won't necessarily know which one we'll be forwarding to until we've been through the "forward / NAT" logic. >=20 > Either way, _if it's more convenient for you right now_, I wouldn't see > any issue in defining new EPOLL_TYPE_UDP_VU_{LISTEN,REPLY} values. >=20 > > + udp_vu_listen_sock_handler(&c, ref, eventmask, > > + &now); > > + } else { > > + udp_buf_listen_sock_handler(&c, ref, eventmask, > > + &now); > > + } > > break; > > case EPOLL_TYPE_UDP_REPLY: It could potentially be used for this case, though, since this is associated with a single flow. > > - udp_reply_sock_handler(&c, ref, eventmask, &now); > > + if (c.mode =3D=3D MODE_VU) > > + udp_vu_reply_sock_handler(&c, ref, eventmask, > > + &now); > > + else > > + udp_buf_reply_sock_handler(&c, ref, eventmask, > > + &now); > > break; [snip] > > +/** > > + * tcp_vu_pcap() - Capture a single frame to pcap file (TCP) > > + * @c: Execution context > > + * @tapside: Address information for one side of the flow > > + * @iov: Pointer to the array of IO vectors > > + * @iov_used: Length of the array > > + * @l4len: IPv4 Payload length > > + */ > > +static void tcp_vu_pcap(const struct ctx *c, const struct flowside *ta= pside, >=20 > 'c' should be const (unless you modify data pointed by it, but I don't > see where), otherwise gcc complains: I'm guessing this was meant to go on the next function down. This might be a content conflict with one of my recent changes: the "TCP cleanups" added "const" to a lot of context pointers through the TCP code, I'm guessing this function was calling one of them, so couldn't be const when Laurent first wrote it. >=20 > tcp.c: In function =E2=80=98tcp_send_flag=E2=80=99: > tcp.c:1249:41: warning: passing argument 1 of =E2=80=98tcp_vu_send_flag= =E2=80=99 discards =E2=80=98const=E2=80=99 qualifier from pointer target ty= pe [-Wdiscarded-qualifiers] > 1249 | return tcp_vu_send_flag(c, conn, flags); > | ^ > In file included from tcp.c:307: > tcp_vu.h:9:34: note: expected =E2=80=98struct ctx *=E2=80=99 but argument= is of type =E2=80=98const struct ctx *=E2=80=99 > 9 | int tcp_vu_send_flag(struct ctx *c, struct tcp_tap_conn *conn, in= t flags); > | ~~~~~~~~~~~~^ --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --O25hLRtSOPoftLs9 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmbw6nYACgkQzQJF27ox 2Gc8eRAAor2S9Sj031GByF/PvsFvGLt0itxxhE8MKNz/e9NfYJm709cf5WG4sqCs OafCCe9A6m70JRez13BsE5cVokUa9E3dzEN2gkPjFnwD5UvCYJkkZ1iwZNoxdDP0 115IPf3qZaApnhGZXIcN9TyChIh5OvEs9MPAc9r8melgeVpkLnxYWVtcF+HGSpD+ BsrbORvkDb87E2q+8cWIxSIcErzrlnpZD6szlb0dnblfjBQGkCPIK5rbTNo3GIwQ 8IHhZ627vLfyC31XbtN+JOS5xgwtpJcuhbeOKFRT2MNM5+xFr9htI7vfNyO5FW90 vhXksv7vKI2d2QKWrX2IYbkTDe9Nj4b0/5bKxU5V6jxd0Nh/LWzKC3hm5TjZf7ui h2X2ieHm7vqIFHUmXymHPdqqkiYY9jZOsqjcWakMsW/LWkqKQx2wV/akgYHxSpxu 7coRM+lzyDrK+0pCFAwRhEvDSo91lUfyZSY3AnPobfqWf/0VDx3Qa3do9fC/cLJP G7MtUWMxpnQIRR0jQsy912AeVxfJluNZS56n7Ukz+IZMun1A6zA4BVNySA/Z3gAj C+nUuZ2qq5WGooNd0s6JOrdASYVBjf1r3mNeV4rNYDkJV/e9af9UDQmg1IL8ZgTw kTcjQmz638bvS8Uj5UPDglKRiHi6l3DsfLxIFVzZ6+9K+GkQjJg= =llnM -----END PGP SIGNATURE----- --O25hLRtSOPoftLs9--