* [PATCH v6 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element
@ 2026-04-01 19:23 Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Laurent Vivier @ 2026-04-01 19:23 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Some virtio-net drivers (notably iPXE) provide descriptors where the
vnet header and the frame payload are in separate buffers, resulting in
two iovec entries per virtqueue element. Currently, the RX (host to
guest) path assumes a single iovec per element, which triggers:
ASSERTION FAILED in virtqueue_map_desc (virtio.c:403):
num_sg < max_num_sg
This series reworks the UDP vhost-user receive path to support multiple
iovec entries per element, fixing the iPXE crash.
This series only addresses the UDP path. TCP vhost-user will be
updated to use multi-iov elements in a subsequent series.
Based-on: 20260401191826.1782394-1-lvivier@redhat.com
v6:
- Rebased on top of
[PATCH 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
v5:
- This version doesn't change the padding system regarding v4,
it's a complex task that will be addressed in another version
- reorder patches and add new patches
- remove IOV_PUT_HEADER()/with_header() and introduce IOV_PUSH_HEADER()
- don't use the iov_tail to provide the headers to the functions
- move vu_set_vnethdr() to vu_flush(), extract vu_queue_notify()
- move vu_flush() inside loop in tcp_vu_data_from_sock() to flush data
by frame and not by full data length
v4:
- rebase
- replace ASSERT() by assert()
v3:
- include the series "Decouple iovec management from virtqueues elements"
- because of this series, drop:
"vu_common: Accept explicit iovec counts in vu_set_element()"
"vu_common: Accept explicit iovec count per element in vu_init_elem()"
"vu_common: Prepare to use multibuffer with guest RX"
"vhost-user,udp: Use 2 iovec entries per element"
- drop "vu_common: Pass iov_tail to vu_set_vnethdr()"
as the specs insures a buffer is big enough to contain vnet header
- introduce "with_header()" and merge
"udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()" and
"udp_vu: Use iov_tail in udp_vu_prepare()"
to use it
v2:
- add iov_truncate(), iov_memset()
- remove iov_tail_truncate() and iov_tail_zero_end()
- manage 802.3 minimum frame size
Laurent Vivier (3):
udp_vu: Allow virtqueue elements with multiple iovec entries
iov: Introduce IOV_PUSH_HEADER() macro
udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
iov.c | 22 ++++++++++
iov.h | 11 +++++
udp.c | 70 +++++++++++++++++--------------
udp_internal.h | 4 +-
udp_vu.c | 110 +++++++++++++++++++++++++------------------------
5 files changed, 131 insertions(+), 86 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries
2026-04-01 19:23 [PATCH v6 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element Laurent Vivier
@ 2026-04-01 19:23 ` Laurent Vivier
2026-04-03 11:53 ` Stefano Brivio
2026-04-01 19:23 ` [PATCH v6 2/3] iov: Introduce IOV_PUSH_HEADER() macro Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6() Laurent Vivier
2 siblings, 1 reply; 9+ messages in thread
From: Laurent Vivier @ 2026-04-01 19:23 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
The previous code assumed a 1:1 mapping between virtqueue elements and
iovec entries (enforced by an assert). Drop that assumption to allow
elements that span multiple iovecs: track elem_used separately by
walking the element list against the iov count returned after padding.
This also fixes vu_queue_rewind() and vu_flush() to use the element
count rather than the iov count.
Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
replacing the manual base/len adjustment and restore pattern.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
udp_vu.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/udp_vu.c b/udp_vu.c
index 30af64034516..5608a3a96ff5 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -64,30 +64,25 @@ static size_t udp_vu_hdrlen(bool v6)
*/
static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
{
+ struct iovec msg_iov[*cnt];
struct msghdr msg = { 0 };
+ struct iov_tail payload;
size_t hdrlen, iov_used;
ssize_t dlen;
/* compute L2 header length */
hdrlen = udp_vu_hdrlen(v6);
- /* reserve space for the headers */
- assert(iov[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
- iov[0].iov_base = (char *)iov[0].iov_base + hdrlen;
- iov[0].iov_len -= hdrlen;
+ payload = IOV_TAIL(iov, *cnt, hdrlen);
- /* read data from the socket */
- msg.msg_iov = iov;
- msg.msg_iovlen = *cnt;
+ msg.msg_iov = msg_iov;
+ msg.msg_iovlen = iov_tail_clone(msg.msg_iov, payload.cnt, &payload);
+ /* read data from the socket */
dlen = recvmsg(s, &msg, 0);
if (dlen < 0)
return -1;
- /* restore the pointer to the headers address */
- iov[0].iov_base = (char *)iov[0].iov_base - hdrlen;
- iov[0].iov_len += hdrlen;
-
iov_used = iov_skip_bytes(iov, *cnt,
MAX(dlen + hdrlen, VNET_HLEN + ETH_ZLEN),
NULL);
@@ -205,7 +200,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
}
for (i = 0; i < n; i++) {
- unsigned elem_cnt, elem_used;
+ unsigned elem_cnt, elem_used, j, k;
size_t iov_cnt;
ssize_t dlen;
@@ -215,15 +210,19 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
if (elem_cnt == 0)
break;
- assert((size_t)elem_cnt == iov_cnt); /* one iovec per element */
-
dlen = udp_vu_sock_recv(iov_vu, &iov_cnt, s, v6);
if (dlen < 0) {
vu_queue_rewind(vq, elem_cnt);
break;
}
- elem_used = iov_cnt; /* one iovec per element */
+ elem_used = 0;
+ for (j = 0, k = 0; k < iov_cnt && j < elem_cnt; j++) {
+ if (k + elem[j].in_num > iov_cnt)
+ elem[j].in_num = iov_cnt - k;
+ k += elem[j].in_num;
+ elem_used++;
+ }
/* release unused buffers */
vu_queue_rewind(vq, elem_cnt - elem_used);
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 2/3] iov: Introduce IOV_PUSH_HEADER() macro
2026-04-01 19:23 [PATCH v6 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
@ 2026-04-01 19:23 ` Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6() Laurent Vivier
2 siblings, 0 replies; 9+ messages in thread
From: Laurent Vivier @ 2026-04-01 19:23 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Add iov_push_header_() and its typed wrapper IOV_PUSH_HEADER() to write
a header into an iov_tail at the current offset and advance past it.
This is the write counterpart to IOV_PEEK_HEADER() / IOV_REMOVE_HEADER(),
using iov_from_buf() to copy the header data across iovec boundaries.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
iov.c | 23 +++++++++++++++++++++++
iov.h | 11 +++++++++++
2 files changed, 34 insertions(+)
diff --git a/iov.c b/iov.c
index 2289b425529e..6357c477bea6 100644
--- a/iov.c
+++ b/iov.c
@@ -360,6 +360,29 @@ void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
return v;
}
+/**
+ * iov_push_header_() - Write a new header to an IOV tail
+ * @tail: IOV tail to write header to
+ * @v: Pointer to header data to write
+ * @len: Length of header to write, in bytes
+ *
+ * Return: number of bytes written
+ */
+/* cppcheck-suppress unusedFunction */
+size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len)
+{
+ size_t l;
+
+ if (!iov_tail_prune(tail))
+ return 0; /* No space */
+
+ l = iov_from_buf(tail->iov, tail->cnt, tail->off, v, len);
+
+ tail->off = tail->off + l;
+
+ return l;
+}
+
/**
* iov_remove_header_() - Remove a header from an IOV tail
* @tail: IOV tail to remove header from (modified)
diff --git a/iov.h b/iov.h
index 074266e127ef..8e4d5f54b9e4 100644
--- a/iov.h
+++ b/iov.h
@@ -93,6 +93,7 @@ bool iov_tail_prune(struct iov_tail *tail);
size_t iov_tail_size(struct iov_tail *tail);
bool iov_drop_header(struct iov_tail *tail, size_t len);
void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
+size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
struct iov_tail *tail);
@@ -115,6 +116,16 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
sizeof(var_), \
__alignof__(var_))))
+/**
+ * IOV_PUSH_HEADER() - Write a new header to an IOV tail
+ * @tail_: IOV tail to write header to
+ * @var_: A variable containing the header data to write
+ *
+ * Return: number of bytes written
+ */
+#define IOV_PUSH_HEADER(tail_, var_) \
+ (iov_push_header_((tail_), &(var_), sizeof(var_)))
+
/**
* IOV_REMOVE_HEADER() - Remove and return typed header from an IOV tail
* @tail_: IOV tail to remove header from (modified)
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
2026-04-01 19:23 [PATCH v6 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 2/3] iov: Introduce IOV_PUSH_HEADER() macro Laurent Vivier
@ 2026-04-01 19:23 ` Laurent Vivier
2026-04-03 11:53 ` Stefano Brivio
2 siblings, 1 reply; 9+ messages in thread
From: Laurent Vivier @ 2026-04-01 19:23 UTC (permalink / raw)
To: passt-dev; +Cc: Laurent Vivier
Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
pointing at the UDP frame instead of a contiguous udp_payload_t buffer
and explicit data length. This lets vhost-user pass scatter-gather
virtqueue buffers directly without an intermediate copy.
The UDP header is built into a local struct udphdr and written back with
IOV_PUSH_HEADER(). On the tap side, udp_tap_prepare() wraps the
existing udp_payload_t in a two-element iov to match the new interface.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
iov.c | 1 -
udp.c | 70 ++++++++++++++++++++++++-------------------
udp_internal.h | 4 +--
udp_vu.c | 81 +++++++++++++++++++++++++++-----------------------
4 files changed, 84 insertions(+), 72 deletions(-)
diff --git a/iov.c b/iov.c
index 6357c477bea6..d5fb4e81a502 100644
--- a/iov.c
+++ b/iov.c
@@ -368,7 +368,6 @@ void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
*
* Return: number of bytes written
*/
-/* cppcheck-suppress unusedFunction */
size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len)
{
size_t l;
diff --git a/udp.c b/udp.c
index e113b26bc726..0035a645ded5 100644
--- a/udp.c
+++ b/udp.c
@@ -255,21 +255,22 @@ static void udp_iov_init(const struct ctx *c)
/**
* udp_update_hdr4() - Update headers for one IPv4 datagram
* @ip4h: Pre-filled IPv4 header (except for tot_len and saddr)
- * @bp: Pointer to udp_payload_t to update
+ * @payload: iov_tail with UDP payload to update
* @toside: Flowside for destination side
* @dlen: Length of UDP payload
* @no_udp_csum: Do not set UDP checksum
*
* Return: size of IPv4 payload (UDP header + data)
*/
-size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
+size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
const struct flowside *toside, size_t dlen,
bool no_udp_csum)
{
const struct in_addr *src = inany_v4(&toside->oaddr);
const struct in_addr *dst = inany_v4(&toside->eaddr);
- size_t l4len = dlen + sizeof(bp->uh);
+ size_t l4len = dlen + sizeof(struct udphdr);
size_t l3len = l4len + sizeof(*ip4h);
+ struct udphdr uh;
assert(src && dst);
@@ -278,19 +279,18 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
ip4h->saddr = src->s_addr;
ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
- bp->uh.source = htons(toside->oport);
- bp->uh.dest = htons(toside->eport);
- bp->uh.len = htons(l4len);
+ uh.source = htons(toside->oport);
+ uh.dest = htons(toside->eport);
+ uh.len = htons(l4len);
if (no_udp_csum) {
- bp->uh.check = 0;
+ uh.check = 0;
} else {
- const struct iovec iov = {
- .iov_base = bp->data,
- .iov_len = dlen
- };
- struct iov_tail data = IOV_TAIL(&iov, 1, 0);
- csum_udp4(&bp->uh, *src, *dst, &data, l4len);
+ struct iov_tail data = *payload;
+
+ IOV_DROP_HEADER(&data, struct udphdr);
+ csum_udp4(&uh, *src, *dst, &data, l4len);
}
+ IOV_PUSH_HEADER(payload, uh);
return l4len;
}
@@ -299,18 +299,19 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
* udp_update_hdr6() - Update headers for one IPv6 datagram
* @ip6h: Pre-filled IPv6 header (except for payload_len and
* addresses)
- * @bp: Pointer to udp_payload_t to update
+ * @payload: iov_tail with UDP payload to update
* @toside: Flowside for destination side
* @dlen: Length of UDP payload
* @no_udp_csum: Do not set UDP checksum
*
* Return: size of IPv6 payload (UDP header + data)
*/
-size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
+size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
const struct flowside *toside, size_t dlen,
bool no_udp_csum)
{
- uint16_t l4len = dlen + sizeof(bp->uh);
+ uint16_t l4len = dlen + sizeof(struct udphdr);
+ struct udphdr uh;
ip6h->payload_len = htons(l4len);
ip6h->daddr = toside->eaddr.a6;
@@ -319,24 +320,24 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
ip6h->nexthdr = IPPROTO_UDP;
ip6h->hop_limit = 255;
- bp->uh.source = htons(toside->oport);
- bp->uh.dest = htons(toside->eport);
- bp->uh.len = ip6h->payload_len;
+ uh.source = htons(toside->oport);
+ uh.dest = htons(toside->eport);
+ uh.len = htons(l4len);
if (no_udp_csum) {
/* 0 is an invalid checksum for UDP IPv6 and dropped by
- * the kernel stack, even if the checksum is disabled by virtio
- * flags. We need to put any non-zero value here.
+ * the kernel stack, even if the checksum is disabled
+ * by virtio flags. We need to put any non-zero value
+ * here.
*/
- bp->uh.check = 0xffff;
+ uh.check = 0xffff;
} else {
- const struct iovec iov = {
- .iov_base = bp->data,
- .iov_len = dlen
- };
- struct iov_tail data = IOV_TAIL(&iov, 1, 0);
- csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
- l4len);
+ struct iov_tail data = *payload;
+
+ IOV_DROP_HEADER(&data, struct udphdr);
+ csum_udp6(&uh, &toside->oaddr.a6, &toside->eaddr.a6,
+ &data, l4len);
}
+ IOV_PUSH_HEADER(payload, uh);
return l4len;
}
@@ -375,11 +376,18 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
struct ethhdr *eh = (*tap_iov)[UDP_IOV_ETH].iov_base;
struct udp_payload_t *bp = &udp_payload[idx];
struct udp_meta_t *bm = &udp_meta[idx];
+ struct iovec iov[2];
+ struct iov_tail payload = IOV_TAIL(iov, ARRAY_SIZE(iov), 0);
size_t l4len, l2len;
+ iov[0].iov_base = &bp->uh;
+ iov[0].iov_len = sizeof(bp->uh);
+ iov[1].iov_base = bp->data;
+ iov[1].iov_len = mmh[idx].msg_len;
+
eth_update_mac(eh, NULL, tap_omac);
if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
- l4len = udp_update_hdr6(&bm->ip6h, bp, toside,
+ l4len = udp_update_hdr6(&bm->ip6h, &payload, toside,
mmh[idx].msg_len, no_udp_csum);
l2len = MAX(l4len + sizeof(bm->ip6h) + ETH_HLEN, ETH_ZLEN);
@@ -388,7 +396,7 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
eh->h_proto = htons_constant(ETH_P_IPV6);
(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip6h);
} else {
- l4len = udp_update_hdr4(&bm->ip4h, bp, toside,
+ l4len = udp_update_hdr4(&bm->ip4h, &payload, toside,
mmh[idx].msg_len, no_udp_csum);
l2len = MAX(l4len + sizeof(bm->ip4h) + ETH_HLEN, ETH_ZLEN);
diff --git a/udp_internal.h b/udp_internal.h
index 64e457748324..e6cbaab79519 100644
--- a/udp_internal.h
+++ b/udp_internal.h
@@ -25,10 +25,10 @@ struct udp_payload_t {
} __attribute__ ((packed, aligned(__alignof__(unsigned int))));
#endif
-size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
+size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
const struct flowside *toside, size_t dlen,
bool no_udp_csum);
-size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
+size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
const struct flowside *toside, size_t dlen,
bool no_udp_csum);
void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
diff --git a/udp_vu.c b/udp_vu.c
index 5608a3a96ff5..5bc9509a1b98 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -96,43 +96,53 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
/**
* udp_vu_prepare() - Prepare the packet header
* @c: Execution context
- * @iov: IO vector for the frame (including vnet header)
+ * @data: IO vector tail for the frame,
+ * on return, points to the L3 frame
* @toside: Address information for one side of the flow
* @dlen: Packet data length
*
* Return: Layer-4 length
*/
-static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
- const struct flowside *toside, ssize_t dlen)
+static size_t udp_vu_prepare(const struct ctx *c, struct iov_tail *data,
+ const struct flowside *toside, size_t dlen)
{
- struct ethhdr *eh;
+ bool ipv4 = inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr);
+ struct ethhdr eh;
size_t l4len;
/* ethernet header */
- eh = vu_eth(iov[0].iov_base);
+ memcpy(eh.h_dest, c->guest_mac, sizeof(eh.h_dest));
+ memcpy(eh.h_source, c->our_tap_mac, sizeof(eh.h_source));
- memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
- memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
+ if (ipv4)
+ eh.h_proto = htons(ETH_P_IP);
+ else
+ eh.h_proto = htons(ETH_P_IPV6);
+ IOV_PUSH_HEADER(data, eh);
/* initialize header */
- if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
- struct iphdr *iph = vu_ip(iov[0].iov_base);
- struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
+ if (ipv4) {
+ struct iov_tail udp_frame;
+ struct iphdr iph;
- eh->h_proto = htons(ETH_P_IP);
+ iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
- *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
+ udp_frame = *data;
+ IOV_DROP_HEADER(&udp_frame, struct iphdr);
+ l4len = udp_update_hdr4(&iph, &udp_frame, toside, dlen, true);
- l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
+ IOV_PUSH_HEADER(data, iph);
} else {
- struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
- struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
+ struct iov_tail udp_frame;
+ struct ipv6hdr ip6h;
- eh->h_proto = htons(ETH_P_IPV6);
+ ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
- *ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
+ udp_frame = *data;
+ IOV_DROP_HEADER(&udp_frame, struct ipv6hdr);
+ l4len = udp_update_hdr6(&ip6h, &udp_frame, toside, dlen, true);
- l4len = udp_update_hdr6(ip6h, bp, toside, dlen, true);
+ IOV_PUSH_HEADER(data, ip6h);
}
return l4len;
@@ -141,29 +151,23 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
/**
* udp_vu_csum() - Calculate and set checksum for a UDP packet
* @toside: Address information for one side of the flow
- * @iov: IO vector for the frame
- * @cnt: Number of IO vector entries
+ * @data: IO vector tail for the L3 frame
* @l4len: L4 length
*/
-static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
- size_t cnt, size_t l4len)
+static void udp_vu_csum(const struct flowside *toside, struct iov_tail *data,
+ size_t l4len)
{
const struct in_addr *src4 = inany_v4(&toside->oaddr);
const struct in_addr *dst4 = inany_v4(&toside->eaddr);
- char *base = iov[0].iov_base;
- struct udp_payload_t *bp;
- struct iov_tail data;
-
- if (src4 && dst4) {
- bp = vu_payloadv4(base);
- data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
- csum_udp4(&bp->uh, *src4, *dst4, &data, l4len);
- } else {
- bp = vu_payloadv6(base);
- data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
- csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
- l4len);
- }
+ struct udphdr *uh, uh_storage;
+ bool ipv4 = src4 && dst4;
+
+ uh = IOV_REMOVE_HEADER(data, uh_storage);
+
+ if (ipv4)
+ csum_udp4(uh, *src4, *dst4, data, l4len);
+ else
+ csum_udp6(uh, &toside->oaddr.a6, &toside->eaddr.a6, data, l4len);
}
/**
@@ -228,9 +232,10 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
vu_queue_rewind(vq, elem_cnt - elem_used);
if (iov_cnt > 0) {
- size_t l4len = udp_vu_prepare(c, iov_vu, toside, dlen);
+ struct iov_tail data = IOV_TAIL(iov_vu, iov_cnt, VNET_HLEN);
+ size_t l4len = udp_vu_prepare(c, &data, toside, dlen);
if (*c->pcap) {
- udp_vu_csum(toside, iov_vu, iov_cnt, l4len);
+ udp_vu_csum(toside, &data, l4len);
pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
hdrlen + dlen - VNET_HLEN);
}
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries
2026-04-01 19:23 ` [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
@ 2026-04-03 11:53 ` Stefano Brivio
2026-04-03 15:18 ` Laurent Vivier
0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2026-04-03 11:53 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Wed, 1 Apr 2026 21:23:24 +0200
Laurent Vivier <lvivier@redhat.com> wrote:
> The previous code assumed a 1:1 mapping between virtqueue elements and
> iovec entries (enforced by an assert). Drop that assumption to allow
> elements that span multiple iovecs: track elem_used separately by
> walking the element list against the iov count returned after padding.
> This also fixes vu_queue_rewind() and vu_flush() to use the element
> count rather than the iov count.
>
> Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
> replacing the manual base/len adjustment and restore pattern.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> udp_vu.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/udp_vu.c b/udp_vu.c
> index 30af64034516..5608a3a96ff5 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -64,30 +64,25 @@ static size_t udp_vu_hdrlen(bool v6)
> */
> static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
> {
> + struct iovec msg_iov[*cnt];
Variable-length Arrays (VLAs) are allowed starting from C99 but we
should really really avoid them. If 'cnt' is big enough, we risk
writing all over the place. That's the main reason why they were more
or less banned from the Linux kernel some years ago and eventually
eradicated:
https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/
https://lore.kernel.org/lkml/20181028172401.GA41102@beast/
Can we use VIRTQUEUE_MAX_SIZE as upper bound like udp_vu_sock_to_tap()
does?
We should probably add -Wvla by the way.
> struct msghdr msg = { 0 };
> + struct iov_tail payload;
> size_t hdrlen, iov_used;
> ssize_t dlen;
>
> /* compute L2 header length */
> hdrlen = udp_vu_hdrlen(v6);
>
> - /* reserve space for the headers */
> - assert(iov[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
> - iov[0].iov_base = (char *)iov[0].iov_base + hdrlen;
> - iov[0].iov_len -= hdrlen;
> + payload = IOV_TAIL(iov, *cnt, hdrlen);
>
> - /* read data from the socket */
> - msg.msg_iov = iov;
> - msg.msg_iovlen = *cnt;
> + msg.msg_iov = msg_iov;
> + msg.msg_iovlen = iov_tail_clone(msg.msg_iov, payload.cnt, &payload);
>
> + /* read data from the socket */
> dlen = recvmsg(s, &msg, 0);
> if (dlen < 0)
> return -1;
>
> - /* restore the pointer to the headers address */
> - iov[0].iov_base = (char *)iov[0].iov_base - hdrlen;
> - iov[0].iov_len += hdrlen;
> -
> iov_used = iov_skip_bytes(iov, *cnt,
> MAX(dlen + hdrlen, VNET_HLEN + ETH_ZLEN),
> NULL);
> @@ -205,7 +200,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
> }
>
> for (i = 0; i < n; i++) {
> - unsigned elem_cnt, elem_used;
> + unsigned elem_cnt, elem_used, j, k;
> size_t iov_cnt;
> ssize_t dlen;
>
> @@ -215,15 +210,19 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
> if (elem_cnt == 0)
> break;
>
> - assert((size_t)elem_cnt == iov_cnt); /* one iovec per element */
> -
> dlen = udp_vu_sock_recv(iov_vu, &iov_cnt, s, v6);
> if (dlen < 0) {
> vu_queue_rewind(vq, elem_cnt);
> break;
> }
>
> - elem_used = iov_cnt; /* one iovec per element */
> + elem_used = 0;
> + for (j = 0, k = 0; k < iov_cnt && j < elem_cnt; j++) {
> + if (k + elem[j].in_num > iov_cnt)
> + elem[j].in_num = iov_cnt - k;
I think it would be more intuitive to write it like this:
size_t iov_still_needed = iov_cnt - k;
if (elem[j].in_num > iov_still_needed)
elem[j].in_num = iov_still_needed;
...otherwise it looks like 'k + elem[j].in_num' needs to satisfy some
kind of bound, but that's not the case.
> + k += elem[j].in_num;
> + elem_used++;
> + }
>
> /* release unused buffers */
> vu_queue_rewind(vq, elem_cnt - elem_used);
--
Stefano
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
2026-04-01 19:23 ` [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6() Laurent Vivier
@ 2026-04-03 11:53 ` Stefano Brivio
0 siblings, 0 replies; 9+ messages in thread
From: Stefano Brivio @ 2026-04-03 11:53 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Wed, 1 Apr 2026 21:23:26 +0200
Laurent Vivier <lvivier@redhat.com> wrote:
> Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
> pointing at the UDP frame instead of a contiguous udp_payload_t buffer
> and explicit data length. This lets vhost-user pass scatter-gather
> virtqueue buffers directly without an intermediate copy.
>
> The UDP header is built into a local struct udphdr and written back with
> IOV_PUSH_HEADER(). On the tap side, udp_tap_prepare() wraps the
> existing udp_payload_t in a two-element iov to match the new interface.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
> iov.c | 1 -
> udp.c | 70 ++++++++++++++++++++++++-------------------
> udp_internal.h | 4 +--
> udp_vu.c | 81 +++++++++++++++++++++++++++-----------------------
> 4 files changed, 84 insertions(+), 72 deletions(-)
>
> diff --git a/iov.c b/iov.c
> index 6357c477bea6..d5fb4e81a502 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -368,7 +368,6 @@ void *iov_peek_header_(struct iov_tail *tail, void *v, size_t len, size_t align)
> *
> * Return: number of bytes written
> */
> -/* cppcheck-suppress unusedFunction */
> size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len)
> {
> size_t l;
> diff --git a/udp.c b/udp.c
> index e113b26bc726..0035a645ded5 100644
> --- a/udp.c
> +++ b/udp.c
> @@ -255,21 +255,22 @@ static void udp_iov_init(const struct ctx *c)
> /**
> * udp_update_hdr4() - Update headers for one IPv4 datagram
> * @ip4h: Pre-filled IPv4 header (except for tot_len and saddr)
> - * @bp: Pointer to udp_payload_t to update
> + * @payload: iov_tail with UDP payload to update
> * @toside: Flowside for destination side
> * @dlen: Length of UDP payload
> * @no_udp_csum: Do not set UDP checksum
> *
> * Return: size of IPv4 payload (UDP header + data)
> */
> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> +size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
> const struct flowside *toside, size_t dlen,
> bool no_udp_csum)
> {
> const struct in_addr *src = inany_v4(&toside->oaddr);
> const struct in_addr *dst = inany_v4(&toside->eaddr);
> - size_t l4len = dlen + sizeof(bp->uh);
> + size_t l4len = dlen + sizeof(struct udphdr);
> size_t l3len = l4len + sizeof(*ip4h);
> + struct udphdr uh;
>
> assert(src && dst);
>
> @@ -278,19 +279,18 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> ip4h->saddr = src->s_addr;
> ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
>
> - bp->uh.source = htons(toside->oport);
> - bp->uh.dest = htons(toside->eport);
> - bp->uh.len = htons(l4len);
> + uh.source = htons(toside->oport);
> + uh.dest = htons(toside->eport);
> + uh.len = htons(l4len);
> if (no_udp_csum) {
> - bp->uh.check = 0;
> + uh.check = 0;
> } else {
> - const struct iovec iov = {
> - .iov_base = bp->data,
> - .iov_len = dlen
> - };
> - struct iov_tail data = IOV_TAIL(&iov, 1, 0);
> - csum_udp4(&bp->uh, *src, *dst, &data, l4len);
> + struct iov_tail data = *payload;
> +
> + IOV_DROP_HEADER(&data, struct udphdr);
> + csum_udp4(&uh, *src, *dst, &data, l4len);
> }
> + IOV_PUSH_HEADER(payload, uh);
>
> return l4len;
> }
> @@ -299,18 +299,19 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> * udp_update_hdr6() - Update headers for one IPv6 datagram
> * @ip6h: Pre-filled IPv6 header (except for payload_len and
> * addresses)
> - * @bp: Pointer to udp_payload_t to update
> + * @payload: iov_tail with UDP payload to update
> * @toside: Flowside for destination side
> * @dlen: Length of UDP payload
> * @no_udp_csum: Do not set UDP checksum
> *
> * Return: size of IPv6 payload (UDP header + data)
> */
> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> +size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
> const struct flowside *toside, size_t dlen,
> bool no_udp_csum)
> {
> - uint16_t l4len = dlen + sizeof(bp->uh);
> + uint16_t l4len = dlen + sizeof(struct udphdr);
> + struct udphdr uh;
>
> ip6h->payload_len = htons(l4len);
> ip6h->daddr = toside->eaddr.a6;
> @@ -319,24 +320,24 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> ip6h->nexthdr = IPPROTO_UDP;
> ip6h->hop_limit = 255;
>
> - bp->uh.source = htons(toside->oport);
> - bp->uh.dest = htons(toside->eport);
> - bp->uh.len = ip6h->payload_len;
> + uh.source = htons(toside->oport);
> + uh.dest = htons(toside->eport);
> + uh.len = htons(l4len);
> if (no_udp_csum) {
> /* 0 is an invalid checksum for UDP IPv6 and dropped by
> - * the kernel stack, even if the checksum is disabled by virtio
> - * flags. We need to put any non-zero value here.
> + * the kernel stack, even if the checksum is disabled
> + * by virtio flags. We need to put any non-zero value
> + * here.
This part looks unrelated (and the old comment was fitting nicely).
> */
> - bp->uh.check = 0xffff;
> + uh.check = 0xffff;
> } else {
> - const struct iovec iov = {
> - .iov_base = bp->data,
> - .iov_len = dlen
> - };
> - struct iov_tail data = IOV_TAIL(&iov, 1, 0);
> - csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
> - l4len);
> + struct iov_tail data = *payload;
> +
> + IOV_DROP_HEADER(&data, struct udphdr);
> + csum_udp6(&uh, &toside->oaddr.a6, &toside->eaddr.a6,
> + &data, l4len);
> }
> + IOV_PUSH_HEADER(payload, uh);
>
> return l4len;
> }
> @@ -375,11 +376,18 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
> struct ethhdr *eh = (*tap_iov)[UDP_IOV_ETH].iov_base;
> struct udp_payload_t *bp = &udp_payload[idx];
> struct udp_meta_t *bm = &udp_meta[idx];
> + struct iovec iov[2];
> + struct iov_tail payload = IOV_TAIL(iov, ARRAY_SIZE(iov), 0);
> size_t l4len, l2len;
>
> + iov[0].iov_base = &bp->uh;
> + iov[0].iov_len = sizeof(bp->uh);
> + iov[1].iov_base = bp->data;
> + iov[1].iov_len = mmh[idx].msg_len;
> +
> eth_update_mac(eh, NULL, tap_omac);
> if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
> - l4len = udp_update_hdr6(&bm->ip6h, bp, toside,
> + l4len = udp_update_hdr6(&bm->ip6h, &payload, toside,
> mmh[idx].msg_len, no_udp_csum);
>
> l2len = MAX(l4len + sizeof(bm->ip6h) + ETH_HLEN, ETH_ZLEN);
> @@ -388,7 +396,7 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
> eh->h_proto = htons_constant(ETH_P_IPV6);
> (*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip6h);
> } else {
> - l4len = udp_update_hdr4(&bm->ip4h, bp, toside,
> + l4len = udp_update_hdr4(&bm->ip4h, &payload, toside,
> mmh[idx].msg_len, no_udp_csum);
>
> l2len = MAX(l4len + sizeof(bm->ip4h) + ETH_HLEN, ETH_ZLEN);
> diff --git a/udp_internal.h b/udp_internal.h
> index 64e457748324..e6cbaab79519 100644
> --- a/udp_internal.h
> +++ b/udp_internal.h
> @@ -25,10 +25,10 @@ struct udp_payload_t {
> } __attribute__ ((packed, aligned(__alignof__(unsigned int))));
> #endif
>
> -size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
> +size_t udp_update_hdr4(struct iphdr *ip4h, struct iov_tail *payload,
> const struct flowside *toside, size_t dlen,
> bool no_udp_csum);
> -size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
> +size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct iov_tail *payload,
> const struct flowside *toside, size_t dlen,
> bool no_udp_csum);
> void udp_sock_fwd(const struct ctx *c, int s, int rule_hint,
> diff --git a/udp_vu.c b/udp_vu.c
> index 5608a3a96ff5..5bc9509a1b98 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -96,43 +96,53 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
> /**
> * udp_vu_prepare() - Prepare the packet header
> * @c: Execution context
> - * @iov: IO vector for the frame (including vnet header)
> + * @data: IO vector tail for the frame,
> + * on return, points to the L3 frame
I think "L3 header" or "IP header" is clearer than "L3 frame" (also
because... it's not a frame). By the way it fits in the usual way we
write this stuff:
* @data: IO vector tail for the frame, points to the L3 header on return
> * @toside: Address information for one side of the flow
> * @dlen: Packet data length
> *
> * Return: Layer-4 length
> */
> -static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
> - const struct flowside *toside, ssize_t dlen)
> +static size_t udp_vu_prepare(const struct ctx *c, struct iov_tail *data,
> + const struct flowside *toside, size_t dlen)
> {
> - struct ethhdr *eh;
> + bool ipv4 = inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr);
> + struct ethhdr eh;
> size_t l4len;
>
> /* ethernet header */
> - eh = vu_eth(iov[0].iov_base);
> + memcpy(eh.h_dest, c->guest_mac, sizeof(eh.h_dest));
> + memcpy(eh.h_source, c->our_tap_mac, sizeof(eh.h_source));
>
> - memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
> - memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
> + if (ipv4)
> + eh.h_proto = htons(ETH_P_IP);
> + else
> + eh.h_proto = htons(ETH_P_IPV6);
> + IOV_PUSH_HEADER(data, eh);
>
> /* initialize header */
> - if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
> - struct iphdr *iph = vu_ip(iov[0].iov_base);
> - struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
> + if (ipv4) {
> + struct iov_tail udp_frame;
> + struct iphdr iph;
>
> - eh->h_proto = htons(ETH_P_IP);
> + iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
>
> - *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
> + udp_frame = *data;
> + IOV_DROP_HEADER(&udp_frame, struct iphdr);
> + l4len = udp_update_hdr4(&iph, &udp_frame, toside, dlen, true);
>
> - l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
> + IOV_PUSH_HEADER(data, iph);
> } else {
> - struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
> - struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
> + struct iov_tail udp_frame;
> + struct ipv6hdr ip6h;
>
> - eh->h_proto = htons(ETH_P_IPV6);
> + ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
>
> - *ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
> + udp_frame = *data;
> + IOV_DROP_HEADER(&udp_frame, struct ipv6hdr);
> + l4len = udp_update_hdr6(&ip6h, &udp_frame, toside, dlen, true);
>
> - l4len = udp_update_hdr6(ip6h, bp, toside, dlen, true);
> + IOV_PUSH_HEADER(data, ip6h);
> }
>
> return l4len;
> @@ -141,29 +151,23 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
> /**
> * udp_vu_csum() - Calculate and set checksum for a UDP packet
> * @toside: Address information for one side of the flow
> - * @iov: IO vector for the frame
> - * @cnt: Number of IO vector entries
> + * @data: IO vector tail for the L3 frame
...meaning for the... packet, including IP headers?
> * @l4len: L4 length
> */
> -static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
> - size_t cnt, size_t l4len)
> +static void udp_vu_csum(const struct flowside *toside, struct iov_tail *data,
> + size_t l4len)
> {
> const struct in_addr *src4 = inany_v4(&toside->oaddr);
> const struct in_addr *dst4 = inany_v4(&toside->eaddr);
> - char *base = iov[0].iov_base;
> - struct udp_payload_t *bp;
> - struct iov_tail data;
> -
> - if (src4 && dst4) {
> - bp = vu_payloadv4(base);
> - data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
> - csum_udp4(&bp->uh, *src4, *dst4, &data, l4len);
> - } else {
> - bp = vu_payloadv6(base);
> - data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
> - csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
> - l4len);
> - }
> + struct udphdr *uh, uh_storage;
> + bool ipv4 = src4 && dst4;
> +
> + uh = IOV_REMOVE_HEADER(data, uh_storage);
> +
> + if (ipv4)
> + csum_udp4(uh, *src4, *dst4, data, l4len);
> + else
> + csum_udp6(uh, &toside->oaddr.a6, &toside->eaddr.a6, data, l4len);
Coverity Scan doesn't like this:
/home/sbrivio/passt/udp_vu.c:168:3:
Type: Dereference null return value (NULL_RETURNS)
/home/sbrivio/passt/udp_vu.c:165:2: Call to null-returning function
1. returned_null: "iov_remove_header_" returns "NULL" (checked 4 out of 5 times).
/home/sbrivio/passt/iov.c:405:2: Call to null-returning function
1.1. path: Condition "!p", taking true branch.
/home/sbrivio/passt/iov.c:406:3:
1.2. return_null: Explicitly returning null.
/home/sbrivio/passt/udp_vu.c:165:2:
2. var_assigned: Assigning: "uh" = "NULL" return value from "iov_remove_header_".
/home/sbrivio/passt/udp_vu.c:167:2:
3. path: Condition "ipv4", taking true branch.
/home/sbrivio/passt/udp_vu.c:168:3:
4. dereference: Passing null pointer "uh" to "csum_udp4", which dereferences it.
/home/sbrivio/passt/checksum.c:192:2: Call to null-returning function
4.1. dereference: Dereferencing pointer "udp4hr".
/home/sbrivio/passt/arp.c:86:2: Examples where return value was checked for null
5. example_assign: Example 1: Assigning: "ah" = return value from "iov_remove_header_(data, &ah_storage, 8UL, 2UL)".
/home/sbrivio/passt/arp.c:88:2:
6. example_checked: Example 1 (cont.): "ah" has its value checked in "ah".
/home/sbrivio/passt/dhcp.c:327:2: Examples where return value was checked for null
7. example_assign: Example 2: Assigning: "uh" = return value from "iov_remove_header_(data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/dhcp.c:328:2:
8. example_checked: Example 2 (cont.): "uh" has its value checked in "uh".
/home/sbrivio/passt/dhcpv6.c:576:2: Examples where return value was checked for null
9. example_assign: Example 3: Assigning: "uh" = return value from "iov_remove_header_(data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/dhcpv6.c:577:2:
10. example_checked: Example 3 (cont.): "uh" has its value checked in "uh".
/home/sbrivio/passt/udp.c:1083:3: Examples where return value was checked for null
11. example_assign: Example 4: Assigning: "uh_send" = return value from "iov_remove_header_(&data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/udp.c:1084:3:
12. example_checked: Example 4 (cont.): "uh_send" has its value checked in "uh_send".
/home/sbrivio/passt/udp_vu.c:170:3:
Type: Dereference null return value (NULL_RETURNS)
/home/sbrivio/passt/udp_vu.c:165:2: Call to null-returning function
1. returned_null: "iov_remove_header_" returns "NULL" (checked 4 out of 5 times).
/home/sbrivio/passt/iov.c:405:2: Call to null-returning function
1.1. path: Condition "!p", taking true branch.
/home/sbrivio/passt/iov.c:406:3:
1.2. return_null: Explicitly returning null.
/home/sbrivio/passt/udp_vu.c:165:2:
2. var_assigned: Assigning: "uh" = "NULL" return value from "iov_remove_header_".
/home/sbrivio/passt/udp_vu.c:167:2:
3. path: Condition "ipv4", taking false branch.
/home/sbrivio/passt/udp_vu.c:170:3:
4. dereference: Passing null pointer "uh" to "csum_udp6", which dereferences it.
/home/sbrivio/passt/checksum.c:258:2: Call to null-returning function
4.1. dereference: Dereferencing pointer "udp6hr".
/home/sbrivio/passt/arp.c:86:2: Examples where return value was checked for null
5. example_assign: Example 1: Assigning: "ah" = return value from "iov_remove_header_(data, &ah_storage, 8UL, 2UL)".
/home/sbrivio/passt/arp.c:88:2:
6. example_checked: Example 1 (cont.): "ah" has its value checked in "ah".
/home/sbrivio/passt/dhcp.c:327:2: Examples where return value was checked for null
7. example_assign: Example 2: Assigning: "uh" = return value from "iov_remove_header_(data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/dhcp.c:328:2:
8. example_checked: Example 2 (cont.): "uh" has its value checked in "uh".
/home/sbrivio/passt/dhcpv6.c:576:2: Examples where return value was checked for null
9. example_assign: Example 3: Assigning: "uh" = return value from "iov_remove_header_(data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/dhcpv6.c:577:2:
10. example_checked: Example 3 (cont.): "uh" has its value checked in "uh".
/home/sbrivio/passt/udp.c:1083:3: Examples where return value was checked for null
11. example_assign: Example 4: Assigning: "uh_send" = return value from "iov_remove_header_(&data, &uh_storage, 8UL, 2UL)".
/home/sbrivio/passt/udp.c:1084:3:
12. example_checked: Example 4 (cont.): "uh_send" has its value checked in "uh_send".
I'm not sure if there's a path where we could *really* make iov_remove_header_()
return NULL (we just built the header and we should always have space for it at
this point, I guess), but maybe an explicit check would make this more robust
anyway.
> }
>
> /**
> @@ -228,9 +232,10 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
> vu_queue_rewind(vq, elem_cnt - elem_used);
>
> if (iov_cnt > 0) {
> - size_t l4len = udp_vu_prepare(c, iov_vu, toside, dlen);
> + struct iov_tail data = IOV_TAIL(iov_vu, iov_cnt, VNET_HLEN);
> + size_t l4len = udp_vu_prepare(c, &data, toside, dlen);
> if (*c->pcap) {
> - udp_vu_csum(toside, iov_vu, iov_cnt, l4len);
> + udp_vu_csum(toside, &data, l4len);
> pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
> hdrlen + dlen - VNET_HLEN);
> }
The rest of this series looks good to me, the TCP one will probably
take me a bit longer to review (unlikely that I'll finish today).
--
Stefano
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries
2026-04-03 11:53 ` Stefano Brivio
@ 2026-04-03 15:18 ` Laurent Vivier
2026-04-03 16:59 ` Stefano Brivio
0 siblings, 1 reply; 9+ messages in thread
From: Laurent Vivier @ 2026-04-03 15:18 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On 4/3/26 13:53, Stefano Brivio wrote:
> On Wed, 1 Apr 2026 21:23:24 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
>
>> The previous code assumed a 1:1 mapping between virtqueue elements and
>> iovec entries (enforced by an assert). Drop that assumption to allow
>> elements that span multiple iovecs: track elem_used separately by
>> walking the element list against the iov count returned after padding.
>> This also fixes vu_queue_rewind() and vu_flush() to use the element
>> count rather than the iov count.
>>
>> Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
>> replacing the manual base/len adjustment and restore pattern.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>> udp_vu.c | 29 ++++++++++++++---------------
>> 1 file changed, 14 insertions(+), 15 deletions(-)
>>
>> diff --git a/udp_vu.c b/udp_vu.c
>> index 30af64034516..5608a3a96ff5 100644
>> --- a/udp_vu.c
>> +++ b/udp_vu.c
>> @@ -64,30 +64,25 @@ static size_t udp_vu_hdrlen(bool v6)
>> */
>> static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
>> {
>> + struct iovec msg_iov[*cnt];
>
> Variable-length Arrays (VLAs) are allowed starting from C99 but we
> should really really avoid them. If 'cnt' is big enough, we risk
> writing all over the place. That's the main reason why they were more
> or less banned from the Linux kernel some years ago and eventually
> eradicated:
I can use alloca() if you prefer ;)
>
> https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/
>
> https://lore.kernel.org/lkml/20181028172401.GA41102@beast/
>
> Can we use VIRTQUEUE_MAX_SIZE as upper bound like udp_vu_sock_to_tap()
> does?
Yes, but the idea here is we have always *cnt < VIRTQUEUE_MAX_SIZE
(because the value comes from vu_collect() and in vu_collect():
*in_sg (&iov_cnt or *cnt) < max_in_sg (ARRAY_SIZE(iov_vu) or VIRTQUEUE_MAX_SIZE or 1024)
And vu_collect(), in this case, sets generally *in_sg to a value lower than 44:
we want to create a frame of ETH_MAX_MTU by coalescing kernel buffers of size ETH_FRAME_LEN)
For me 16 kB on the stack is a lot of memory (but I started programming on a 48 kB RAM
computer...).
Thanks,
Laurent
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries
2026-04-03 15:18 ` Laurent Vivier
@ 2026-04-03 16:59 ` Stefano Brivio
2026-04-03 17:14 ` Laurent Vivier
0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2026-04-03 16:59 UTC (permalink / raw)
To: Laurent Vivier; +Cc: passt-dev
On Fri, 3 Apr 2026 17:18:23 +0200
Laurent Vivier <lvivier@redhat.com> wrote:
> On 4/3/26 13:53, Stefano Brivio wrote:
> > On Wed, 1 Apr 2026 21:23:24 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >
> >> The previous code assumed a 1:1 mapping between virtqueue elements and
> >> iovec entries (enforced by an assert). Drop that assumption to allow
> >> elements that span multiple iovecs: track elem_used separately by
> >> walking the element list against the iov count returned after padding.
> >> This also fixes vu_queue_rewind() and vu_flush() to use the element
> >> count rather than the iov count.
> >>
> >> Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
> >> replacing the manual base/len adjustment and restore pattern.
> >>
> >> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >> ---
> >> udp_vu.c | 29 ++++++++++++++---------------
> >> 1 file changed, 14 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/udp_vu.c b/udp_vu.c
> >> index 30af64034516..5608a3a96ff5 100644
> >> --- a/udp_vu.c
> >> +++ b/udp_vu.c
> >> @@ -64,30 +64,25 @@ static size_t udp_vu_hdrlen(bool v6)
> >> */
> >> static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
> >> {
> >> + struct iovec msg_iov[*cnt];
> >
> > Variable-length Arrays (VLAs) are allowed starting from C99 but we
> > should really really avoid them. If 'cnt' is big enough, we risk
> > writing all over the place. That's the main reason why they were more
> > or less banned from the Linux kernel some years ago and eventually
> > eradicated:
>
> I can use alloca() if you prefer ;)
Claude, is this you? ;)
I guess if you come up with a sufficient convoluted maze of "elem" /
"iov" / "head" macros using concatenation to strategically place calls
to strndupa(), with an abstraction based on IOV "tails" called "strain"
indicated by "strn" for brevity... one day I might miss that, yes.
But I'll try to remember that, the next time we discuss whether it's
really needed to duplicate strain A or if a single copy of it is enough.
> > https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/
> >
> > https://lore.kernel.org/lkml/20181028172401.GA41102@beast/
> >
> > Can we use VIRTQUEUE_MAX_SIZE as upper bound like udp_vu_sock_to_tap()
> > does?
>
> Yes, but the idea here is we have always *cnt < VIRTQUEUE_MAX_SIZE
> (because the value comes from vu_collect() and in vu_collect():
> *in_sg (&iov_cnt or *cnt) < max_in_sg (ARRAY_SIZE(iov_vu) or VIRTQUEUE_MAX_SIZE or 1024)
...until somebody, running this somewhere where we don't have gcc's
stack protector stuff (or equivalent), without having quite obtained
full arbitrary code execution yet, finds a way to manipulate *cnt...
> And vu_collect(), in this case, sets generally *in_sg to a value lower than 44:
> we want to create a frame of ETH_MAX_MTU by coalescing kernel buffers of size ETH_FRAME_LEN)
If it _can_ be 16 KiB, then I would suggest it's better to _just_ have
16 KiB. It's more auditable, and it's not like we "allocate" it anyway.
On top of it, udp_vu_sock_to_tap() already does that, and other
functions (with potentially deeper call trees) do worse. I'm not
claiming it's a good idea to do it "because it's bad anyway", in
general, but in this case the maximum is what matters.
> For me 16 kB on the stack is a lot of memory (but I started programming on a 48 kB RAM
> computer...).
I guess I started with around ten times that but I tend to agree. What
I'm suggesting is that if it can be a lot, better just make it that lot.
An alternative I'm pondering about is whether we can make things
recoverably / gracefully fail if that's > 64 or something like that. At
this point we still have that data on the socket and we could dequeue
it in a later pass I suppose. But maybe it gets very complicated...
--
Stefano
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries
2026-04-03 16:59 ` Stefano Brivio
@ 2026-04-03 17:14 ` Laurent Vivier
0 siblings, 0 replies; 9+ messages in thread
From: Laurent Vivier @ 2026-04-03 17:14 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On 4/3/26 18:59, Stefano Brivio wrote:
> On Fri, 3 Apr 2026 17:18:23 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
>
>> On 4/3/26 13:53, Stefano Brivio wrote:
>>> On Wed, 1 Apr 2026 21:23:24 +0200
>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>
>>>> The previous code assumed a 1:1 mapping between virtqueue elements and
>>>> iovec entries (enforced by an assert). Drop that assumption to allow
>>>> elements that span multiple iovecs: track elem_used separately by
>>>> walking the element list against the iov count returned after padding.
>>>> This also fixes vu_queue_rewind() and vu_flush() to use the element
>>>> count rather than the iov count.
>>>>
>>>> Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
>>>> replacing the manual base/len adjustment and restore pattern.
>>>>
>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>> ---
>>>> udp_vu.c | 29 ++++++++++++++---------------
>>>> 1 file changed, 14 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/udp_vu.c b/udp_vu.c
>>>> index 30af64034516..5608a3a96ff5 100644
>>>> --- a/udp_vu.c
>>>> +++ b/udp_vu.c
>>>> @@ -64,30 +64,25 @@ static size_t udp_vu_hdrlen(bool v6)
>>>> */
>>>> static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
>>>> {
>>>> + struct iovec msg_iov[*cnt];
>>>
>>> Variable-length Arrays (VLAs) are allowed starting from C99 but we
>>> should really really avoid them. If 'cnt' is big enough, we risk
>>> writing all over the place. That's the main reason why they were more
>>> or less banned from the Linux kernel some years ago and eventually
>>> eradicated:
>>
>> I can use alloca() if you prefer ;)
>
> Claude, is this you? ;)
>
> I guess if you come up with a sufficient convoluted maze of "elem" /
> "iov" / "head" macros using concatenation to strategically place calls
> to strndupa(), with an abstraction based on IOV "tails" called "strain"
> indicated by "strn" for brevity... one day I might miss that, yes.
>
> But I'll try to remember that, the next time we discuss whether it's
> really needed to duplicate strain A or if a single copy of it is enough.
>
>>> https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/
>>>
>>> https://lore.kernel.org/lkml/20181028172401.GA41102@beast/
>>>
>>> Can we use VIRTQUEUE_MAX_SIZE as upper bound like udp_vu_sock_to_tap()
>>> does?
>>
>> Yes, but the idea here is we have always *cnt < VIRTQUEUE_MAX_SIZE
>> (because the value comes from vu_collect() and in vu_collect():
>> *in_sg (&iov_cnt or *cnt) < max_in_sg (ARRAY_SIZE(iov_vu) or VIRTQUEUE_MAX_SIZE or 1024)
>
> ...until somebody, running this somewhere where we don't have gcc's
> stack protector stuff (or equivalent), without having quite obtained
> full arbitrary code execution yet, finds a way to manipulate *cnt...
>
>> And vu_collect(), in this case, sets generally *in_sg to a value lower than 44:
>> we want to create a frame of ETH_MAX_MTU by coalescing kernel buffers of size ETH_FRAME_LEN)
>
> If it _can_ be 16 KiB, then I would suggest it's better to _just_ have
> 16 KiB. It's more auditable, and it's not like we "allocate" it anyway.
>
> On top of it, udp_vu_sock_to_tap() already does that, and other
> functions (with potentially deeper call trees) do worse. I'm not
> claiming it's a good idea to do it "because it's bad anyway", in
> general, but in this case the maximum is what matters.
>
>> For me 16 kB on the stack is a lot of memory (but I started programming on a 48 kB RAM
>> computer...).
>
> I guess I started with around ten times that but I tend to agree. What
> I'm suggesting is that if it can be a lot, better just make it that lot.
>
> An alternative I'm pondering about is whether we can make things
> recoverably / gracefully fail if that's > 64 or something like that. At
> this point we still have that data on the socket and we could dequeue
> it in a later pass I suppose. But maybe it gets very complicated...
>
I understand.
I already sent a v6 without this change but for the v7 I will remove the VLA (without
using alloca()).
Thanks,
Laurent
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-03 17:14 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-01 19:23 [PATCH v6 0/3] vhost-user,udp: Handle multiple iovec entries per virtqueue element Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 1/3] udp_vu: Allow virtqueue elements with multiple iovec entries Laurent Vivier
2026-04-03 11:53 ` Stefano Brivio
2026-04-03 15:18 ` Laurent Vivier
2026-04-03 16:59 ` Stefano Brivio
2026-04-03 17:14 ` Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 2/3] iov: Introduce IOV_PUSH_HEADER() macro Laurent Vivier
2026-04-01 19:23 ` [PATCH v6 3/3] udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6() Laurent Vivier
2026-04-03 11:53 ` Stefano Brivio
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).