From: David Gibson <david@gibson.dropbear.id.au>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 03/12] udp_vu: Use iov_tail to manage virtqueue buffers
Date: Mon, 2 Mar 2026 11:03:29 +1100
Message-ID: <aaTT0R7xCNpxP3R4@zatzit>
In-Reply-To: <20260227140330.2216753-4-lvivier@redhat.com>
On Fri, Feb 27, 2026 at 03:03:21PM +0100, Laurent Vivier wrote:
> Replace direct iovec pointer arithmetic in UDP vhost-user handling with
> iov_tail operations introduced in the previous commit.
>
> udp_vu_sock_recv() now takes a struct iov_tail and returns the received
> data length rather than the number of iov entries used. It uses
> iov_drop_header() to skip past L2/L3/L4 headers before receiving socket
> data, iov_tail_truncate() to trim unused buffer space, and
> iov_tail_zero_end() to zero-pad short frames instead of vu_pad().
>
> udp_vu_prepare() and udp_vu_csum() take a const struct iov_tail instead
> of referencing the file-scoped iov_vu array directly, making data flow
> explicit.
>
> udp_vu_csum() uses iov_drop_header() and IOV_REMOVE_HEADER() to locate
> the UDP header and payload, replacing manual offset calculations via
> vu_payloadv4()/vu_payloadv6().
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> iov.c | 2 -
> udp_vu.c | 121 +++++++++++++++++++++++++++----------------------------
> 2 files changed, 60 insertions(+), 63 deletions(-)
>
> diff --git a/iov.c b/iov.c
> index cb4d6fef5567..8836305fb701 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -175,7 +175,6 @@ bool iov_tail_prune(struct iov_tail *tail)
> * @tail: IO vector tail (modified in place, including backing iovecs)
> * @size: Maximum number of bytes to keep, relative to current tail offset
> */
> -/* cppcheck-suppress unusedFunction */
> void iov_tail_truncate(struct iov_tail *tail, size_t size)
> {
> size_t i, off;
> @@ -195,7 +194,6 @@ void iov_tail_truncate(struct iov_tail *tail, size_t size)
> * @tail: IO vector tail (backing buffers modified in place)
> * @size: Number of leading bytes to preserve
> */
> -/* cppcheck-suppress unusedFunction */
> void iov_tail_zero_end(struct iov_tail *tail, size_t size)
> {
> size_t i, off;
> diff --git a/udp_vu.c b/udp_vu.c
> index 6f6477f7d046..8f4d0aedac10 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -59,21 +59,23 @@ static size_t udp_vu_hdrlen(bool v6)
> /**
> * udp_vu_sock_recv() - Receive datagrams from socket into vhost-user buffers
> * @c: Execution context
> + * @data: IO vector tail for the frame (modified on output)
> * @vq: virtqueue to use to receive data
> * @s: Socket to receive from
> * @v6: Set for IPv6 connections
> - * @dlen: Size of received data (output)
> *
> - * Return: number of iov entries used to store the datagram, 0 if the datagram
> + * Return: size of received data, 0 if the datagram
> * was discarded because the virtqueue is not ready, -1 on error
> */
> -static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> - bool v6, ssize_t *dlen)
> +static ssize_t udp_vu_sock_recv(const struct ctx *c, struct iov_tail *data,
> + struct vu_virtq *vq, int s, bool v6)
> {
> const struct vu_dev *vdev = c->vdev;
> - int iov_cnt, idx, iov_used;
> - size_t off, hdrlen, l2len;
> struct msghdr msg = { 0 };
> + struct iov_tail payload;
> + size_t hdrlen;
> + ssize_t dlen;
> + int iov_cnt;
>
> ASSERT(!c->no_udp);
>
> @@ -83,82 +85,77 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> if (recvmsg(s, &msg, MSG_DONTWAIT) < 0)
> debug_perror("Failed to discard datagram");
>
> + data->cnt = 0;
> return 0;
> }
>
> /* compute L2 header length */
> hdrlen = udp_vu_hdrlen(v6);
>
> - vu_init_elem(elem, iov_vu, ARRAY_SIZE(elem));
> + vu_init_elem(elem, (struct iovec *)data->iov, data->cnt);
>
> iov_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem),
> IP_MAX_MTU + ETH_HLEN + VNET_HLEN, NULL);
> if (iov_cnt == 0)
> return -1;
>
> + data->cnt = iov_cnt;
Does anything limit iov_cnt to be <= data->cnt? If so, it's not very
obvious from here. If not, then the assignment above is clearly dangerous,
since data->cnt could end up larger than the array behind data->iov.
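To make the concern concrete, here is a minimal sketch of a checked update.
It is not passt code: struct iov_tail_sketch is a stand-in whose field names
are assumed, and tail_set_cnt() is a hypothetical helper that only lets the
entry count shrink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Stand-in for an iov_tail-like structure; field names are assumed */
struct iov_tail_sketch {
	const struct iovec *iov;
	size_t cnt, off;
};

/* Hypothetical helper: set the number of entries in use to @used, but only
 * if that does not exceed what the caller originally supplied in @tail
 */
static bool tail_set_cnt(struct iov_tail_sketch *tail, size_t used)
{
	assert(tail != NULL);

	if (used > tail->cnt)
		return false;	/* more entries collected than available */

	tail->cnt = used;
	return true;
}
```

A caller would treat a false return as an error (or assert on it) rather than
carrying on with a count that exceeds the backing array.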
> +
> /* reserve space for the headers */
> - ASSERT(iov_vu[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
> - iov_vu[0].iov_base = (char *)iov_vu[0].iov_base + hdrlen;
> - iov_vu[0].iov_len -= hdrlen;
> + ASSERT(iov_tail_size(data) >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
>
> - /* read data from the socket */
> - msg.msg_iov = iov_vu;
> - msg.msg_iovlen = iov_cnt;
> + payload = *data;
> + iov_drop_header(&payload, hdrlen);
> +
> + struct iovec msg_iov[payload.cnt];
> + msg.msg_iov = msg_iov;
> + msg.msg_iovlen = iov_tail_clone(msg.msg_iov, payload.cnt, &payload);
>
> - *dlen = recvmsg(s, &msg, 0);
> - if (*dlen < 0) {
> + /* read data from the socket */
> + dlen = recvmsg(s, &msg, 0);
> + if (dlen < 0) {
> vu_queue_rewind(vq, iov_cnt);
> return -1;
> }
>
> - /* restore the pointer to the headers address */
> - iov_vu[0].iov_base = (char *)iov_vu[0].iov_base - hdrlen;
> - iov_vu[0].iov_len += hdrlen;
> -
> - /* count the numbers of buffer filled by recvmsg() */
> - idx = iov_skip_bytes(iov_vu, iov_cnt, *dlen + hdrlen, &off);
> -
> - /* adjust last iov length */
> - if (idx < iov_cnt)
> - iov_vu[idx].iov_len = off;
> - iov_used = idx + !!off;
> -
> - /* pad frame to 60 bytes: first buffer is at least ETH_ZLEN long */
> - l2len = *dlen + hdrlen - VNET_HLEN;
> - vu_pad(&iov_vu[0], l2len);
> + iov_tail_truncate(data, MAX(dlen + hdrlen, ETH_ZLEN + VNET_HLEN));
> + iov_tail_zero_end(data, dlen + hdrlen);
> + iov_tail_truncate(data, dlen + hdrlen);
Zeroing the tail, then truncating it seems kind of weird: the final
truncate to dlen + hdrlen drops the very bytes that were just zeroed.
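For comparison, a sketch of padding done in one step: zero only the gap
between the end of the data and the minimum length, and report the resulting
frame length once. Both helpers are hypothetical and work on a plain iovec
array rather than passt's iov_tail. Whether the padding should remain part of
the frame afterwards is a question for the patch, not something this sketch
settles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: zero bytes [from, to) of the logical byte stream
 * described by @iov, returning the number of bytes actually zeroed
 */
static size_t iov_zero_range(const struct iovec *iov, size_t cnt,
			     size_t from, size_t to)
{
	size_t pos = 0, zeroed = 0, i;

	assert(from <= to);

	for (i = 0; i < cnt && pos < to; i++) {
		size_t start = pos, end = pos + iov[i].iov_len;
		size_t lo = from > start ? from : start;
		size_t hi = to < end ? to : end;

		/* Zero the part of this buffer that overlaps the range */
		if (lo < hi) {
			memset((char *)iov[i].iov_base + (lo - start), 0,
			       hi - lo);
			zeroed += hi - lo;
		}
		pos = end;
	}

	return zeroed;
}

/* Hypothetical helper: pad a frame of @len bytes up to @min_len with zeroes,
 * returning the resulting frame length
 */
static size_t frame_pad(const struct iovec *iov, size_t cnt,
			size_t len, size_t min_len)
{
	if (len >= min_len)
		return len;

	return len + iov_zero_range(iov, cnt, len, min_len);
}
```

With the padded length returned explicitly, a single truncate to that length
would be enough.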
> - vu_set_vnethdr(vdev, iov_vu[0].iov_base, iov_used);
> + vu_set_vnethdr(vdev, data->iov[0].iov_base, data->cnt);
>
> /* release unused buffers */
> - vu_queue_rewind(vq, iov_cnt - iov_used);
> + vu_queue_rewind(vq, iov_cnt - data->cnt);
>
> - return iov_used;
> + return dlen;
> }
>
> /**
> * udp_vu_prepare() - Prepare the packet header
> * @c: Execution context
> + * @data: IO vector tail for the frame
> * @toside: Address information for one side of the flow
> * @dlen: Packet data length
> *
> * Return: Layer-4 length
> */
> -static size_t udp_vu_prepare(const struct ctx *c,
> +static size_t udp_vu_prepare(const struct ctx *c, const struct iov_tail *data,
> const struct flowside *toside, ssize_t dlen)
> {
> + const struct iovec *iov = data->iov;
> struct ethhdr *eh;
> size_t l4len;
>
> /* ethernet header */
> - eh = vu_eth(iov_vu[0].iov_base);
> + eh = vu_eth(iov[0].iov_base);
Now that you have an iov_tail, could you clone it and use
IOV_{PEEK,REMOVE}_HEADER() instead of the more specific vu_eth(),
vu_ip() etc?
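Roughly what I have in mind, as a self-contained sketch. struct tail_sketch
and tail_remove_contig() are hypothetical stand-ins, not the real iov_tail or
IOV_REMOVE_HEADER(). In particular this version only handles headers that sit
contiguously in one buffer, and says nothing about how the real macros treat
split headers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Stand-in for an iov_tail-like structure; field names are assumed */
struct tail_sketch {
	const struct iovec *iov;
	size_t cnt, off;
};

/* Hypothetical helper: return a pointer to the next @len bytes of @t and
 * advance past them, or NULL if they are missing or span several buffers
 */
static void *tail_remove_contig(struct tail_sketch *t, size_t len)
{
	size_t off, i;

	assert(t != NULL);
	off = t->off;

	/* Skip whole buffers that lie before the current offset */
	for (i = 0; i < t->cnt && off >= t->iov[i].iov_len; i++)
		off -= t->iov[i].iov_len;

	if (i == t->cnt || t->iov[i].iov_len - off < len)
		return NULL;

	t->off += len;
	return (char *)t->iov[i].iov_base + off;
}
```

The caller copies the tail by value (the clone), then pulls the Ethernet, IP
and UDP headers off the copy in order, leaving the original tail untouched
for later use.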
>
> memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
> memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
>
> /* initialize header */
> if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
> - struct iphdr *iph = vu_ip(iov_vu[0].iov_base);
> - struct udp_payload_t *bp = vu_payloadv4(iov_vu[0].iov_base);
> + struct iphdr *iph = vu_ip(iov[0].iov_base);
> + struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
>
> eh->h_proto = htons(ETH_P_IP);
>
> @@ -166,8 +163,8 @@ static size_t udp_vu_prepare(const struct ctx *c,
>
> l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
> } else {
> - struct ipv6hdr *ip6h = vu_ip(iov_vu[0].iov_base);
> - struct udp_payload_t *bp = vu_payloadv6(iov_vu[0].iov_base);
> + struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
> + struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
>
> eh->h_proto = htons(ETH_P_IPV6);
>
> @@ -182,25 +179,25 @@ static size_t udp_vu_prepare(const struct ctx *c,
> /**
> * udp_vu_csum() - Calculate and set checksum for a UDP packet
> * @toside: Address information for one side of the flow
> - * @iov_used: Number of used iov_vu items
> + * @data: IO vector tail for the frame
> */
> -static void udp_vu_csum(const struct flowside *toside, int iov_used)
> +static void udp_vu_csum(const struct flowside *toside,
> + const struct iov_tail *data)
> {
> const struct in_addr *src4 = inany_v4(&toside->oaddr);
> const struct in_addr *dst4 = inany_v4(&toside->eaddr);
> - char *base = iov_vu[0].iov_base;
> - struct udp_payload_t *bp;
> - struct iov_tail data;
> + struct iov_tail payload = *data;
> + struct udphdr *uh, uh_storage;
> + bool ipv4 = src4 && dst4;
>
> - if (src4 && dst4) {
> - bp = vu_payloadv4(base);
> - data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base);
> - csum_udp4(&bp->uh, *src4, *dst4, &data);
> - } else {
> - bp = vu_payloadv6(base);
> - data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base);
> - csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data);
> - }
> + iov_drop_header(&payload,
> + udp_vu_hdrlen(!ipv4) - sizeof(struct udphdr));
> + uh = IOV_REMOVE_HEADER(&payload, uh_storage);
> +
> + if (ipv4)
> + csum_udp4(uh, *src4, *dst4, &payload);
> + else
> + csum_udp6(uh, &toside->oaddr.a6, &toside->eaddr.a6, &payload);
> }
>
> /**
> @@ -216,23 +213,25 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
> bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr));
> struct vu_dev *vdev = c->vdev;
> struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
> + struct iov_tail data;
> int i;
>
> for (i = 0; i < n; i++) {
> ssize_t dlen;
> - int iov_used;
>
> - iov_used = udp_vu_sock_recv(c, vq, s, v6, &dlen);
> - if (iov_used < 0)
> + data = IOV_TAIL(iov_vu, VIRTQUEUE_MAX_SIZE, 0);
> +
> + dlen = udp_vu_sock_recv(c, &data, vq, s, v6);
> + if (dlen < 0)
> break;
>
> - if (iov_used > 0) {
> - udp_vu_prepare(c, toside, dlen);
> + if (data.cnt > 0) {
> + udp_vu_prepare(c, &data, toside, dlen);
> if (*c->pcap) {
> - udp_vu_csum(toside, iov_used);
> - pcap_iov(iov_vu, iov_used, VNET_HLEN);
> + udp_vu_csum(toside, &data);
> + pcap_iov(data.iov, data.cnt, VNET_HLEN);
> }
> - vu_flush(vdev, vq, elem, iov_used);
> + vu_flush(vdev, vq, elem, data.cnt);
> }
> }
> }
> --
> 2.53.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson