From: David Gibson <david@gibson.dropbear.id.au>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top, Jon Maloy <jmaloy@redhat.com>
Subject: Re: [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
Date: Thu, 14 May 2026 11:24:51 +1000 [thread overview]
Message-ID: <agUkY8CVMaZ4iHGi@zatzit> (raw)
In-Reply-To: <20260513115218.1662850-11-lvivier@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5580 bytes --]
On Wed, May 13, 2026 at 01:52:18PM +0200, Laurent Vivier wrote:
> The previous per-protocol padding done by vu_pad() in tcp_vu.c and
> udp_vu.c was only correct for single-buffer frames: it assumed the
> padding area always fell within the first iov, writing past its end
> with a plain memset().
>
> It also required each caller to compute MAX(..., ETH_ZLEN + VNET_HLEN)
> for vu_collect() and to call vu_pad() at the right point, duplicating
> the minimum-size logic across protocols.
>
> Move the Ethernet minimum size enforcement into vu_collect() itself, so
> that enough buffer space is always reserved for padding regardless of
> the requested frame size.
>
> Rewrite vu_pad() to take a full iovec array and use iov_memset(),
> making it safe for multi-buffer (mergeable rx buffer) frames.
>
> In tcp_vu_sock_recv(), replace iov_truncate() with iov_skip_bytes():
> now that all consumers receive explicit data lengths, truncating the
> iovecs is no longer needed. In tcp_vu_data_from_sock(), cap each
> frame's data length against the remaining bytes actually received from
> the socket, so that the last partial frame gets correct headers and
> sequence number advancement.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
But following on from my comments on v3, a couple of clarity nits for
possible follow up:
[snip]
> diff --git a/vu_common.c b/vu_common.c
> index 704e908aa02c..d07f584f228a 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -74,6 +74,7 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
> size_t current_iov = 0;
> int elem_cnt = 0;
>
> + size = MAX(size, ETH_ZLEN /* Ethernet minimum size */ + VNET_HLEN);
Here I think "size" is a reasonable name, since it's the size of the
buffer we're obtaining, i.e. a bound, but not otherwise related to the
length of the frame.
> while (current_size < size && elem_cnt < max_elem &&
> current_iov < max_in_sg) {
> int ret;
> @@ -261,29 +262,27 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
> return -1;
> }
>
> - size += VNET_HLEN;
> elem_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem), in_sg,
> - ARRAY_SIZE(in_sg), &in_total, size, &total);
> - if (elem_cnt == 0 || total < size) {
> + ARRAY_SIZE(in_sg), &in_total, VNET_HLEN + size, &total);
> + if (elem_cnt == 0 || total < VNET_HLEN + size) {
Here, "l2len" would be a much better name than "size".
> debug("vu_send_single: no space to send the data "
> "elem_cnt %d size %zu", elem_cnt, total);
> goto err;
> }
>
> - total -= VNET_HLEN;
> -
> /* copy data from the buffer to the iovec */
> - iov_from_buf(in_sg, in_total, VNET_HLEN, buf, total);
> + iov_from_buf(in_sg, in_total, VNET_HLEN, buf, size);
>
> if (*c->pcap)
> pcap_iov(in_sg, in_total, VNET_HLEN, size);
>
> + vu_pad(in_sg, in_total, VNET_HLEN + size);
> vu_flush(vdev, vq, elem, elem_cnt, VNET_HLEN + size);
> vu_queue_notify(vdev, vq);
>
> - trace("vhost-user sent %zu", total);
> + trace("vhost-user sent %zu", size);
>
> - return total;
> + return size;
> err:
> for (i = 0; i < elem_cnt; i++)
> vu_queue_detach_element(vq);
> @@ -292,15 +291,15 @@ err:
> }
>
> /**
> - * vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
> - * @iov: Buffer in iovec array where end of 802.3 frame is stored
> - * @l2len: Layer-2 length already filled in frame
> + * vu_pad() - Pad short frames to minimum Ethernet length and truncate iovec
> + * @iov: Pointer to iovec array
> + * @cnt: Number of entries in @iov
> + * @frame_len: Data length in @iov (including virtio-net header)
> */
> -void vu_pad(struct iovec *iov, size_t l2len)
> +void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len)
Here we have the actual frame length, including device header, but not
padding. "frame_len" is different from the other standard names we
use, so it's not terrible, but "frame" often refers to the L2 object
so it's not great either.
Not sure if 'l1len' or 'l0len' would be getting too cutesy with what
"physical" layer means in a virtual network. Something like
"device_len" maybe? But that should probably include padding as well.
Or alternatively, vu_pad() could be updated to take l2len, and add
VNET_HLEN inside.
> {
> - if (l2len >= ETH_ZLEN)
> - return;
> + size_t min_frame_len = ETH_ZLEN + VNET_HLEN;
>
> - memset((char *)iov->iov_base + iov->iov_len, 0, ETH_ZLEN - l2len);
> - iov->iov_len += ETH_ZLEN - l2len;
> + if (frame_len < min_frame_len)
> + iov_memset(iov, cnt, frame_len, 0, min_frame_len - frame_len);
> }
> diff --git a/vu_common.h b/vu_common.h
> index 77d1849e6115..51f70084a7cb 100644
> --- a/vu_common.h
> +++ b/vu_common.h
> @@ -44,6 +44,6 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
> void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
> const struct timespec *now);
> int vu_send_single(const struct ctx *c, const void *buf, size_t size);
> -void vu_pad(struct iovec *iov, size_t l2len);
> +void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len);
>
> #endif /* VU_COMMON_H */
> --
> 2.54.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
prev parent reply other threads:[~2026-05-14 1:25 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 01/10] iov: Introduce iov_memset() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 02/10] iov: Add iov_memcpy() to copy data between iovec arrays Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 03/10] vu_common: Move vnethdr setup into vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 04/10] udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 05/10] udp_vu: Pass iov explicitly to helpers instead of using file-scoped array Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 06/10] checksum: Pass explicit L4 length to checksum functions Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 07/10] pcap: Pass explicit L2 length to pcap_iov() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
2026-05-14 1:24 ` David Gibson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agUkY8CVMaZ4iHGi@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=jmaloy@redhat.com \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).