From: David Gibson
To: Laurent Vivier
CC: passt-dev@passt.top
Date: Fri, 9 Feb 2024 16:00:40 +1100
Subject: Re: [PATCH 23/24] udp: vhost-user RX nocopy
References: <20240202141151.3762941-1-lvivier@redhat.com> <20240202141151.3762941-24-lvivier@redhat.com>
In-Reply-To: <20240202141151.3762941-24-lvivier@redhat.com>

On Fri, Feb 02, 2024 at 03:11:50PM +0100, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier
> ---
>  Makefile       |   4 +-
>  passt.c        |   5 +-
>  passt.h        |   1 +
>  udp.c          |  23 +++---
>  udp_internal.h |  21 +++++
>  udp_vu.c       | 215 +++++++++++++++++++++++++++++++++++++++++++++++++
>  udp_vu.h       |   8 ++
>  7 files changed, 262 insertions(+), 15 deletions(-)
>  create mode 100644 udp_internal.h
>  create mode 100644 udp_vu.c
>  create mode 100644 udp_vu.h
>
> diff --git a/Makefile b/Makefile
> index f7a403d19b61..1d2b5dbfe085 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -47,7 +47,7 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS)
>  PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \
>  	igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \
>  	passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c \
> -	tcp_buf.c tcp_vu.c udp.c util.c iov.c ip.c virtio.c vhost_user.c
> +	tcp_buf.c tcp_vu.c udp.c udp_vu.c util.c iov.c ip.c virtio.c vhost_user.c
>  QRAP_SRCS = qrap.c
>  SRCS = $(PASST_SRCS) $(QRAP_SRCS)
>
> @@ -57,7 +57,7 @@ PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \
>  	flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \
>  	netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \
>  	tap.h tcp.h tcp_conn.h tcp_splice.h tcp_buf.h tcp_vu.h tcp_internal.h \
> -	udp.h util.h iov.h ip.h virtio.h vhost_user.h
> +	udp.h udp_internal.h udp_vu.h util.h iov.h ip.h virtio.h vhost_user.h
>  HEADERS = $(PASST_HEADERS) seccomp.h
>
>  C := \#include \nstruct tcp_info x = { .tcpi_snd_wnd = 0 };
> diff --git a/passt.c b/passt.c
> index 952aded12848..a5abd5c4fc03 100644
> --- a/passt.c
> +++ b/passt.c
> @@ -392,7 +392,10 @@ loop:
>  			tcp_timer_handler(&c, ref);
>  			break;
>  		case EPOLL_TYPE_UDP:
> -			udp_buf_sock_handler(&c, ref, eventmask, &now);
> +			if (c.mode == MODE_VU)
> +				udp_vu_sock_handler(&c, ref, eventmask, &now);
> +			else
> +				udp_buf_sock_handler(&c, ref, eventmask, &now);
>  			break;
>  		case EPOLL_TYPE_ICMP:
>  			icmp_sock_handler(&c, AF_INET, ref);
> diff --git a/passt.h b/passt.h
> index 4e0100d51a4d..04f4af8fd72e 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -42,6 +42,7 @@ union epoll_ref;
>  #include "port_fwd.h"
>  #include "tcp.h"
>  #include "udp.h"
> +#include "udp_vu.h"
>  #include "vhost_user.h"
>
>  /**
> diff --git a/udp.c b/udp.c
> index 799a10989a91..da67d0cfa46b 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -117,9 +117,7 @@
>  #include "tap.h"
>  #include "pcap.h"
>  #include "log.h"
> -
> -#define UDP_CONN_TIMEOUT 180 /* s, timeout for ephemeral or local bind */
> -#define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */
> +#include "udp_internal.h"
>
>  /**
>   * struct udp_tap_port - Port tracking based on tap-facing source port
> @@ -227,11 +225,11 @@ static struct mmsghdr udp6_l2_mh_sock [UDP_MAX_FRAMES];
>  static struct iovec udp4_iov_splice [UDP_MAX_FRAMES];
>  static struct iovec udp6_iov_splice [UDP_MAX_FRAMES];
>
> -static struct sockaddr_in udp4_localname = {
> +struct sockaddr_in udp4_localname = {
>  	.sin_family = AF_INET,
>  	.sin_addr = IN4ADDR_LOOPBACK_INIT,
>  };
> -static struct sockaddr_in6 udp6_localname = {
> +struct sockaddr_in6 udp6_localname = {
>  	.sin6_family = AF_INET6,
>  	.sin6_addr = IN6ADDR_LOOPBACK_INIT,
>  };
> @@ -562,9 +560,9 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n,
>   *
>   * Return: size of tap frame with headers
>   */
> -static size_t udp_update_hdr4(const struct ctx *c, struct iphdr *iph,
> -			      size_t data_len, struct sockaddr_in *s_in,
> -			      in_port_t dstport, const struct timespec *now)
> +size_t udp_update_hdr4(const struct ctx *c, struct iphdr *iph,
> +		       size_t data_len, struct sockaddr_in *s_in,
> +		       in_port_t dstport, const struct timespec *now)
>  {
>  	struct udphdr *uh = (struct udphdr *)(iph + 1);
>  	in_port_t src_port;
> @@ -602,6 +600,7 @@ static size_t udp_update_hdr4(const struct ctx *c, struct iphdr *iph,
>  	uh->source = s_in->sin_port;
>  	uh->dest = htons(dstport);
>  	uh->len= htons(data_len + sizeof(struct udphdr));
> +	uh->check = 0;
>
>  	return ip_len;
>  }
> @@ -615,9 +614,9 @@ static size_t udp_update_hdr4(const struct ctx *c, struct iphdr *iph,
>   *
>   * Return: size of tap frame with headers
>   */
> -static size_t udp_update_hdr6(const struct ctx *c, struct ipv6hdr *ip6h,
> -			      size_t data_len, struct sockaddr_in6 *s_in6,
> -			      in_port_t dstport, const struct timespec *now)
> +size_t udp_update_hdr6(const struct ctx *c, struct ipv6hdr *ip6h,
> +		       size_t data_len, struct sockaddr_in6 *s_in6,
> +		       in_port_t dstport, const struct timespec *now)
>  {
>  	struct udphdr *uh = (struct udphdr *)(ip6h + 1);
>  	struct in6_addr *src;
> @@ -672,7 +671,7 @@ static size_t udp_update_hdr6(const struct ctx *c, struct ipv6hdr *ip6h,
>  	uh->dest = htons(dstport);
>  	uh->len = ip6h->payload_len;
>  	uh->check = 0;
> -	if (c->mode != MODE_VU || *c->pcap)
> +	if (c->mode != MODE_VU)
>  		uh->check = csum(uh, ntohs(ip6h->payload_len),
>  				 proto_ipv6_header_checksum(ip6h, IPPROTO_UDP));
>  	ip6h->version = 6;
> diff --git a/udp_internal.h b/udp_internal.h
> new file mode 100644
> index 000000000000..a09f3c69da42
> --- /dev/null
> +++ b/udp_internal.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + * Copyright (c) 2021 Red Hat GmbH
> + * Author: Stefano Brivio
> + */
> +
> +#ifndef UDP_INTERNAL_H
> +#define UDP_INTERNAL_H
> +
> +#define UDP_CONN_TIMEOUT 180 /* s, timeout for ephemeral or local bind */
> +#define UDP_MAX_FRAMES 32 /* max # of frames to receive at once */
> +
> +extern struct sockaddr_in udp4_localname;
> +extern struct sockaddr_in6 udp6_localname;
> +
> +size_t udp_update_hdr4(const struct ctx *c, struct iphdr *iph,
> +		       size_t data_len, struct sockaddr_in *s_in,
> +		       in_port_t dstport, const struct timespec *now);
> +size_t udp_update_hdr6(const struct ctx *c, struct ipv6hdr *ip6h,
> +		       size_t data_len, struct sockaddr_in6 *s_in6,
> +		       in_port_t dstport, const struct timespec *now);
> +#endif /* UDP_INTERNAL_H */
> diff --git a/udp_vu.c b/udp_vu.c
> new file mode 100644
> index 000000000000..c0f4cb90abd2
> --- /dev/null
> +++ b/udp_vu.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "checksum.h"
> +#include "util.h"
> +#include "ip.h"
> +#include "passt.h"
> +#include "pcap.h"
> +#include "log.h"
> +#include "vhost_user.h"
> +#include "udp_internal.h"
> +#include "udp_vu.h"
> +
> +/* vhost-user */
> +static const struct virtio_net_hdr vu_header = {
> +	.flags = VIRTIO_NET_HDR_F_DATA_VALID,
> +	.gso_type = VIRTIO_NET_HDR_GSO_NONE,
> +};
> +
> +static unsigned char buffer[65536];
> +static struct iovec iov_vu [VIRTQUEUE_MAX_SIZE];
> +static unsigned int indexes [VIRTQUEUE_MAX_SIZE];
> +
> +void udp_vu_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
> +			 const struct timespec *now)

It's not *as* big a deal as for TCP, but I'm really hoping we can
abstract things to avoid more code duplication between the vu and
non-vu paths here as well.
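
As a rough illustration of the kind of piece that might be shared, here is a minimal sketch of one step the vu handler above performs: receiving a datagram into a set of buffers while leaving room for the headers at the start of the first one, then trimming the buffer list to the received length. The helper name recv_with_headroom() and its signature are hypothetical, not part of passt; the sketch assumes plain struct iovec arrays and one datagram per call, and whether the buffer-based path could use it unchanged is also an assumption.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of passt: receive one datagram from @fd
 * into @iov[0..n), keeping the first @headroom bytes of iov[0] free for
 * headers (vnet, Ethernet, IP, UDP), then trim the array so it covers
 * exactly the received payload.
 *
 * On success, iov[0].iov_base points at the payload (the headers start at
 * iov[0].iov_base - headroom), *payload_len holds the datagram size, and
 * the return value is the number of iovec entries holding payload.
 * Returns -1 on error; iov[0] may already have been adjusted in that case.
 */
static int recv_with_headroom(int fd, struct iovec *iov, int n,
			      size_t headroom, size_t *payload_len)
{
	struct msghdr msg = { 0 };
	ssize_t len;
	size_t left;
	int used = 0;

	if (n < 1 || iov[0].iov_len < headroom)
		return -1;

	/* Reserve room for the headers at the start of the first buffer */
	iov[0].iov_base = (char *)iov[0].iov_base + headroom;
	iov[0].iov_len -= headroom;

	msg.msg_iov = iov;
	msg.msg_iovlen = n;

	len = recvmsg(fd, &msg, 0);
	if (len < 0)
		return -1;

	/* Count the entries that hold payload; shorten the last one */
	left = (size_t)len;
	while (left && used < n) {
		if (iov[used].iov_len > left)
			iov[used].iov_len = left;
		left -= iov[used].iov_len;
		used++;
	}

	*payload_len = (size_t)len;
	return used;
}
```

A zero-length datagram makes this sketch report 0 entries used, so a caller that still needs to emit headers for it would have to handle that case itself. Keeping the headroom adjustment inside the helper means a caller finds the header area by subtracting headroom from iov[0].iov_base, as the patch does with l2_hdrlen.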

> +{
> +	VuDev *vdev = (VuDev *)&c->vdev;
> +	VuVirtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> +	size_t l2_hdrlen, vnet_hdrlen, fillsize;
> +	ssize_t data_len;
> +	in_port_t dstport = ref.udp.port;
> +	bool has_mrg_rxbuf, v6 = ref.udp.v6;
> +	struct msghdr msg;
> +	int i, iov_count, iov_used, virtqueue_max;
> +
> +	if (c->no_udp || !(events & EPOLLIN))
> +		return;
> +
> +	has_mrg_rxbuf = vu_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF);
> +	if (has_mrg_rxbuf) {
> +		vnet_hdrlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +		virtqueue_max = VIRTQUEUE_MAX_SIZE;
> +	} else {
> +		vnet_hdrlen = sizeof(struct virtio_net_hdr);
> +		virtqueue_max = 1;
> +	}
> +	l2_hdrlen = vnet_hdrlen + sizeof(struct ethhdr) + sizeof(struct udphdr);
> +
> +	if (v6) {
> +		l2_hdrlen += sizeof(struct ipv6hdr);
> +
> +		udp6_localname.sin6_port = htons(dstport);
> +		msg.msg_name = &udp6_localname;
> +		msg.msg_namelen = sizeof(udp6_localname);
> +	} else {
> +		l2_hdrlen += sizeof(struct iphdr);
> +
> +		udp4_localname.sin_port = htons(dstport);
> +		msg.msg_name = &udp4_localname;
> +		msg.msg_namelen = sizeof(udp4_localname);
> +	}
> +
> +	msg.msg_control = NULL;
> +	msg.msg_controllen = 0;
> +	msg.msg_flags = 0;
> +
> +	for (i = 0; i < UDP_MAX_FRAMES; i++) {
> +		struct virtio_net_hdr_mrg_rxbuf *vh;
> +		struct ethhdr *eh;
> +		char *base;
> +		size_t size;
> +
> +		fillsize = USHRT_MAX;
> +		iov_count = 0;
> +		while (fillsize && iov_count < virtqueue_max) {
> +			VuVirtqElement *elem;
> +
> +			elem = vu_queue_pop(vdev, vq, sizeof(VuVirtqElement), buffer);
> +			if (!elem)
> +				break;
> +
> +			if (elem->in_num < 1) {
> +				err("virtio-net receive queue contains no in buffers");
> +				vu_queue_rewind(vdev, vq, iov_count);
> +				return;
> +			}
> +			ASSERT(elem->in_num == 1);
> +			ASSERT(elem->in_sg[0].iov_len >= l2_hdrlen);
> +
> +			indexes[iov_count] = elem->index;
> +			if (iov_count == 0) {
> +				iov_vu[0].iov_base = (char *)elem->in_sg[0].iov_base + l2_hdrlen;
> +				iov_vu[0].iov_len = elem->in_sg[0].iov_len - l2_hdrlen;
> +			} else {
> +				iov_vu[iov_count].iov_base = elem->in_sg[0].iov_base;
> +				iov_vu[iov_count].iov_len = elem->in_sg[0].iov_len;
> +			}
> +
> +			if (iov_vu[iov_count].iov_len > fillsize)
> +				iov_vu[iov_count].iov_len = fillsize;
> +
> +			fillsize -= iov_vu[iov_count].iov_len;
> +
> +			iov_count++;
> +		}
> +		if (iov_count == 0)
> +			break;
> +
> +		msg.msg_iov = iov_vu;
> +		msg.msg_iovlen = iov_count;
> +
> +		data_len = recvmsg(ref.fd, &msg, 0);
> +		if (data_len < 0) {
> +			vu_queue_rewind(vdev, vq, iov_count);
> +			return;
> +		}
> +
> +		iov_used = 0;
> +		size = data_len;
> +		while (size) {
> +			if (iov_vu[iov_used].iov_len > size)
> +				iov_vu[iov_used].iov_len = size;
> +
> +			size -= iov_vu[iov_used].iov_len;
> +			iov_used++;
> +		}
> +
> +		base = (char *)iov_vu[0].iov_base - l2_hdrlen;
> +		size = iov_vu[0].iov_len + l2_hdrlen;
> +
> +		/* release unused buffers */
> +		vu_queue_rewind(vdev, vq, iov_count - iov_used);
> +
> +		/* vnet_header */
> +		vh = (struct virtio_net_hdr_mrg_rxbuf *)base;
> +		vh->hdr = vu_header;
> +		if (has_mrg_rxbuf)
> +			vh->num_buffers = htole16(iov_used);
> +
> +		/* ethernet header */
> +		eh = (struct ethhdr *)(base + vnet_hdrlen);
> +
> +		memcpy(eh->h_dest, c->mac_guest, sizeof(eh->h_dest));
> +		memcpy(eh->h_source, c->mac, sizeof(eh->h_source));
> +
> +		/* initialize header */
> +		if (v6) {
> +			struct ipv6hdr *ip6h = (struct ipv6hdr *)(eh + 1);
> +			struct udphdr *uh = (struct udphdr *)(ip6h + 1);
> +			uint32_t sum;
> +
> +			eh->h_proto = htons(ETH_P_IPV6);
> +
> +			*ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
> +
> +			udp_update_hdr6(c, ip6h, data_len, &udp6_localname,
> +					dstport, now);
> +			if (*c->pcap) {
> +				sum = proto_ipv6_header_checksum(ip6h, IPPROTO_UDP);
> +
> +				iov_vu[0].iov_base = uh;
> +				iov_vu[0].iov_len = size - l2_hdrlen + sizeof(*uh);
> +				uh->check = csum_iov(iov_vu, iov_used, sum);
> +			} else {
> +				/* 0 checksum is invalid with IPv6/UDP */
> +				uh->check = 0xFFFF;
> +			}
> +		} else {
> +			struct iphdr *iph = (struct iphdr *)(eh + 1);
> +			struct udphdr *uh = (struct udphdr *)(iph + 1);
> +			uint32_t sum;
> +
> +			eh->h_proto = htons(ETH_P_IP);
> +
> +			*iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
> +
> +			udp_update_hdr4(c, iph, data_len, &udp4_localname,
> +					dstport, now);
> +			if (*c->pcap) {
> +				sum = proto_ipv4_header_checksum(iph, IPPROTO_UDP);
> +
> +				iov_vu[0].iov_base = uh;
> +				iov_vu[0].iov_len = size - l2_hdrlen + sizeof(*uh);
> +				uh->check = csum_iov(iov_vu, iov_used, sum);
> +			}
> +		}
> +
> +		/* set iov for pcap logging */
> +		iov_vu[0].iov_base = base + vnet_hdrlen;
> +		iov_vu[0].iov_len = size - vnet_hdrlen;
> +		pcap_iov(iov_vu, iov_used);
> +
> +		/* set iov_len for vu_queue_fill_by_index(); */
> +		iov_vu[0].iov_base = base;
> +		iov_vu[0].iov_len = size;
> +
> +		/* send packets */
> +		for (i = 0; i < iov_used; i++)
> +			vu_queue_fill_by_index(vdev, vq, indexes[i],
> +					       iov_vu[i].iov_len, i);
> +
> +		vu_queue_flush(vdev, vq, iov_used);
> +		vu_queue_notify(vdev, vq);
> +	}
> +}
> diff --git a/udp_vu.h b/udp_vu.h
> new file mode 100644
> index 000000000000..e01ce047ee0a
> --- /dev/null
> +++ b/udp_vu.h
> @@ -0,0 +1,8 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#ifndef UDP_VU_H
> +#define UDP_VU_H
> +
> +void udp_vu_sock_handler(const struct ctx *c, union epoll_ref ref,
> +			 uint32_t events, const struct timespec *now);
> +#endif /* UDP_VU_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson