public inbox for passt-dev@passt.top
* [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier

Currently, the vhost-user path assumes that each virtqueue element
contains exactly one iovec entry covering the entire frame.  This
assumption does not hold for some virtio-net drivers (notably iPXE),
which provide descriptors where the vnet header and the frame payload
are in separate buffers, resulting in two iovec entries per virtqueue
element.

This series refactors the vhost-user data path so that frame lengths,
header sizes, and padding are tracked and passed explicitly rather than
being derived from iovec sizes.  This decoupling is a prerequisite for
correctly handling padding of multi-buffer frames.
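With a split layout like the iPXE one, a fixed byte offset such as the start of the Ethernet header can land in a different iovec entry than the single-buffer code expects. The following is a minimal illustration, not code from this series: it assumes a 12-byte vnet header (struct virtio_net_hdr_mrg_rxbuf) and uses a hypothetical iov_locate() helper modelled on passt's iov_skip_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumption: size of struct virtio_net_hdr_mrg_rxbuf */
#define VNET_HLEN	12

/* Hypothetical helper, modelled on passt's iov_skip_bytes(): return the
 * index of the entry holding byte @skip of @iov, and store that byte's
 * offset within the entry in @offset.  Returns @cnt if @skip is past the
 * end of the vector. */
static size_t iov_locate(const struct iovec *iov, size_t cnt, size_t skip,
			 size_t *offset)
{
	size_t i;

	assert(offset != NULL);

	for (i = 0; i < cnt && skip >= iov[i].iov_len; i++)
		skip -= iov[i].iov_len;

	*offset = skip;
	return i;
}
```

In the single-buffer case, the Ethernet header sits in iov[0] at offset VNET_HLEN. In the split case, it starts at offset 0 of iov[1], so any pointer or length derived from iov[0] alone is wrong.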

The changes in this series fall into three groups:

- New iov helpers (patches 1-2):

   iov_memset() and iov_memcpy() operate across iovec boundaries.
   These are needed by the final patch to pad and copy frame data
   when a frame spans multiple iovec entries.

- Structural refactoring (patches 3-5):

   Move vnethdr setup into vu_flush(), separate virtqueue management
   from socket I/O in the UDP path, and pass iov arrays explicitly
   instead of using file-scoped state.  These changes make it possible
   to pass explicit frame lengths through the stack, which is required
   to pad frames independently of iovec layout.

- Explicit length passing throughout the stack (patches 6-10):

   Thread explicit L4, L2, frame, and data lengths through checksum,
   pcap, vu_flush(), and tcp_fill_headers(), replacing lengths that
   were previously derived from iovec sizes.  With lengths tracked
   explicitly, the final patch can centralise Ethernet frame padding
   into vu_collect() and a new vu_pad() helper that correctly pads
   frames spanning multiple iovec entries.
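The padding problem addressed by the last group can be sketched as follows. pad_frame() is a hypothetical helper and not the series' vu_pad(). It assumes that the frame follows a VNET_HLEN-byte header (12 bytes assumed) and is exactly l2len bytes long. It then zeroes bytes up to ETH_ZLEN (60) wherever they fall in the iovec array, which works only because l2len is passed explicitly rather than derived from iovec sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define VNET_HLEN	12	/* assumption: sizeof(struct virtio_net_hdr_mrg_rxbuf) */
#define ETH_ZLEN	60	/* minimum Ethernet frame length, without FCS */

/* Hypothetical sketch: zero-pad a frame of @l2len bytes, stored after a
 * VNET_HLEN-byte header in @iov, up to ETH_ZLEN bytes.
 * Return: resulting L2 length (unchanged if no padding was needed) */
static size_t pad_frame(const struct iovec *iov, size_t cnt, size_t l2len)
{
	size_t offset = VNET_HLEN + l2len, i;
	size_t length;

	if (l2len >= ETH_ZLEN)
		return l2len;
	length = ETH_ZLEN - l2len;

	/* skip to the first byte after the frame, possibly in a later entry */
	for (i = 0; i < cnt && offset >= iov[i].iov_len; i++)
		offset -= iov[i].iov_len;

	/* zero the padding, continuing into following entries as needed */
	for (; i < cnt && length; i++) {
		size_t n = iov[i].iov_len - offset;

		if (n > length)
			n = length;
		memset((char *)iov[i].iov_base + offset, 0, n);
		offset = 0;
		length -= n;
	}

	assert(!length);	/* buffers must hold at least ETH_ZLEN bytes */
	return ETH_ZLEN;
}
```

With a 42-byte frame spread over a 30-byte and a 100-byte buffer, the 18 padding bytes land in the middle of the third entry, which a "pad the first buffer" approach cannot reach.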

v4:
- rebase
- iov_memcpy: use size_t for loop indices i and j
- udp_vu: reorder elem[] declaration to follow reverse Christmas tree order
- pcap: wrap pcap_iov() declaration and definition to respect line length
- write_remainder(): update length parameter description
- Add Reviewed-by tags from Jon and David

v3:
- csum_udp4(), csum_udp6() and udp_vu_csum() receive the payload length (dlen) rather than l4len
- Add a length parameter to write_remainder() and use it in pcap_frame()

v2:
- Rename iov_memcopy() to iov_memcpy() and use clearer parameter names
- Use clearer code in pcap_frame()
- Add braces around bodies in pcap.c and tcp_vu.c for style consistency
- Extract l2len variable in tap_add_packet() and tcp_vu_send_flag()
  to avoid repeating the same expression
- Fix indentation alignment of iov_skip_bytes() arguments in tcp_vu.c
- Introduce fill_size variable in vu_flush()
- Reposition comment for ETH_ZLEN in vu_collect()

Laurent Vivier (10):
  iov: Introduce iov_memset()
  iov: Add iov_memcpy() to copy data between iovec arrays
  vu_common: Move vnethdr setup into vu_flush()
  udp_vu: Move virtqueue management from udp_vu_sock_recv() to its
    caller
  udp_vu: Pass iov explicitly to helpers instead of using file-scoped
    array
  checksum: Pass explicit L4 length to checksum functions
  pcap: Pass explicit L2 length to pcap_iov()
  vu_common: Pass explicit frame length to vu_flush()
  tcp: Pass explicit data length to tcp_fill_headers()
  vhost-user: Centralise Ethernet frame padding in vu_collect() and
    vu_pad()

 checksum.c     |  43 +++++++-----
 checksum.h     |   6 +-
 iov.c          |  77 ++++++++++++++++++++++
 iov.h          |   5 ++
 pcap.c         |  29 ++++++---
 pcap.h         |   3 +-
 tap.c          |  10 +--
 tcp.c          |  14 ++--
 tcp_buf.c      |   3 +-
 tcp_internal.h |   2 +-
 tcp_vu.c       |  66 ++++++++++---------
 udp.c          |   5 +-
 udp_vu.c       | 173 +++++++++++++++++++++++++------------------------
 util.c         |  31 +++++++--
 util.h         |   3 +-
 vu_common.c    |  58 ++++++++++-------
 vu_common.h    |   5 +-
 17 files changed, 339 insertions(+), 194 deletions(-)

-- 
2.54.0

* [PATCH v4 01/10] iov: Introduce iov_memset()
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

Add a helper to set a range of bytes across an IO vector to a given
value, similar to memset() but operating over scatter-gather buffers.
It skips to the given offset and fills across iovec entries up to the
requested length.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 iov.c | 27 +++++++++++++++++++++++++++
 iov.h |  2 ++
 2 files changed, 29 insertions(+)

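One way to exercise the helper added here outside passt is shown below. The iov_memset() body is copied from this patch. iov_skip_bytes() and MIN() are minimal stand-ins written only so that the sketch compiles on its own. They are assumed to match passt's semantics: return the index of the entry containing byte @skip and its offset within that entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Stand-in for passt's iov_skip_bytes(): return the index of the entry
 * containing byte @skip, and the offset of that byte within it */
static size_t iov_skip_bytes(const struct iovec *iov, size_t n, size_t skip,
			     size_t *offset)
{
	size_t off = skip, i;

	for (i = 0; i < n; i++) {
		if (off < iov[i].iov_len)
			break;
		off -= iov[i].iov_len;
	}

	if (offset)
		*offset = off;

	return i;
}

/* iov_memset() as added by this patch */
static void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset,
		       int c, size_t length)
{
	size_t i;

	i = iov_skip_bytes(iov, iov_cnt, offset, &offset);

	for ( ; i < iov_cnt && length; i++) {
		size_t n = MIN(iov[i].iov_len - offset, length);

		memset((char *)iov[i].iov_base + offset, c, n);
		offset = 0;
		length -= n;
	}
}
```

Setting 6 bytes from offset 2 over entries of 4, 3 and 5 bytes fills the last two bytes of the first entry, all of the second and the first byte of the third. A request running past the end of the vector stops quietly at the last byte.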
diff --git a/iov.c b/iov.c
index ae0743931d18..0188acdf5eba 100644
--- a/iov.c
+++ b/iov.c
@@ -170,6 +170,33 @@ size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size)
 	return i;
 }
 
+/**
+ * iov_memset() - Set bytes of an IO vector to a given value
+ * @iov:	IO vector
+ * @iov_cnt:	Number of elements in @iov
+ * @offset:	Byte offset in the iovec at which to start
+ * @c:		Byte value to fill with
+ * @length:	Number of bytes to set
+ * 		Will write less than @length bytes if it runs out of space in
+ * 		the iov
+ */
+/* cppcheck-suppress unusedFunction */
+void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
+		size_t length)
+{
+	size_t i;
+
+	i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
+
+	for ( ; i < iov_cnt && length; i++) {
+		size_t n = MIN(iov[i].iov_len - offset, length);
+
+		memset((char *)iov[i].iov_base + offset, c, n);
+		offset = 0;
+		length -= n;
+	}
+}
+
 /**
  * iov_tail_prune() - Remove any unneeded buffers from an IOV tail
  * @tail:	IO vector tail (modified)
diff --git a/iov.h b/iov.h
index b4e50b0fca5a..d295d05b3bab 100644
--- a/iov.h
+++ b/iov.h
@@ -30,6 +30,8 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt,
 		  size_t offset, void *buf, size_t bytes);
 size_t iov_size(const struct iovec *iov, size_t iov_cnt);
 size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size);
+void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
+		size_t length);
 
 /*
  * DOC: Theory of Operation, struct iov_tail
-- 
2.54.0

* [PATCH v4 02/10] iov: Add iov_memcpy() to copy data between iovec arrays
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

Add a helper to copy data from a source iovec array to a destination
iovec array, each starting at an arbitrary byte offset.  It iterates
through both arrays simultaneously, copying in chunks matching the
smaller of the two current segments, and stops after the requested
length or when either array is exhausted.  It returns the number of
bytes actually copied.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 iov.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 iov.h |  3 +++
 2 files changed, 54 insertions(+)

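The following exercises the helper added here with source and destination vectors that are segmented differently. The iov_memcpy() body is taken from this patch. iov_skip_bytes() and MIN() are stand-ins written for this sketch only, assumed to behave like passt's versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Stand-in for passt's iov_skip_bytes() */
static size_t iov_skip_bytes(const struct iovec *iov, size_t n, size_t skip,
			     size_t *offset)
{
	size_t off = skip, i;

	for (i = 0; i < n && off >= iov[i].iov_len; i++)
		off -= iov[i].iov_len;

	if (offset)
		*offset = off;

	return i;
}

/* iov_memcpy() as added by this patch */
static size_t iov_memcpy(struct iovec *dst_iov, size_t dst_iov_cnt,
			 size_t dst_offset, const struct iovec *src_iov,
			 size_t src_iov_cnt, size_t src_offset, size_t length)
{
	size_t i, j, total = 0;

	i = iov_skip_bytes(src_iov, src_iov_cnt, src_offset, &src_offset);
	j = iov_skip_bytes(dst_iov, dst_iov_cnt, dst_offset, &dst_offset);

	while (length && i < src_iov_cnt && j < dst_iov_cnt) {
		size_t n = MIN(dst_iov[j].iov_len - dst_offset,
			       src_iov[i].iov_len - src_offset);

		if (n > length)
			n = length;

		memcpy((char *)dst_iov[j].iov_base + dst_offset,
		       (const char *)src_iov[i].iov_base + src_offset, n);

		dst_offset += n;
		src_offset += n;
		total += n;
		length -= n;

		/* advance whichever segment (or both) was used up */
		if (dst_offset == dst_iov[j].iov_len) {
			dst_offset = 0;
			j++;
		}
		if (src_offset == src_iov[i].iov_len) {
			src_offset = 0;
			i++;
		}
	}

	return total;
}
```

With the source "hel" + "lo wo" + "rld" and a destination of 4 + 8 bytes, copying 9 bytes from source offset 1 to destination offset 2 writes "el" into the first destination entry and "lo worl" into the second. An oversized length is clamped to the 11 bytes actually available.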
diff --git a/iov.c b/iov.c
index 0188acdf5eba..d2b06093d11f 100644
--- a/iov.c
+++ b/iov.c
@@ -197,6 +197,57 @@ void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
 	}
 }
 
+/**
+ * iov_memcpy() - Copy data between two iovec arrays
+ * @dst_iov:		Destination iovec array
+ * @dst_iov_cnt:	Number of elements in destination iovec array
+ * @dst_offset:		Destination offset
+ * @src_iov:		Source iovec array
+ * @src_iov_cnt:	Number of elements in source iovec array
+ * @src_offset:		Source offset
+ * @length:		Number of bytes to copy
+ *
+ * Return: total number of bytes copied
+ */
+/* cppcheck-suppress unusedFunction */
+size_t iov_memcpy(struct iovec *dst_iov, size_t dst_iov_cnt, size_t dst_offset,
+		  const struct iovec *src_iov, size_t src_iov_cnt,
+		  size_t src_offset, size_t length)
+{
+	size_t i, j, total = 0;
+
+	i = iov_skip_bytes(src_iov, src_iov_cnt, src_offset, &src_offset);
+	j = iov_skip_bytes(dst_iov, dst_iov_cnt, dst_offset, &dst_offset);
+
+	/* copying data */
+	while (length && i < src_iov_cnt && j < dst_iov_cnt) {
+		size_t n = MIN(dst_iov[j].iov_len - dst_offset,
+			       src_iov[i].iov_len - src_offset);
+
+		if (n > length)
+			n = length;
+
+		memcpy((char *)dst_iov[j].iov_base + dst_offset,
+		       (const char *)src_iov[i].iov_base + src_offset, n);
+
+		dst_offset += n;
+		src_offset += n;
+		total += n;
+		length -= n;
+
+		if (dst_offset == dst_iov[j].iov_len) {
+			dst_offset = 0;
+			j++;
+		}
+		if (src_offset == src_iov[i].iov_len) {
+			src_offset = 0;
+			i++;
+		}
+	}
+
+	return total;
+}
+
 /**
  * iov_tail_prune() - Remove any unneeded buffers from an IOV tail
  * @tail:	IO vector tail (modified)
diff --git a/iov.h b/iov.h
index d295d05b3bab..3c63308e554f 100644
--- a/iov.h
+++ b/iov.h
@@ -32,6 +32,9 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt);
 size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size);
 void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
 		size_t length);
+size_t iov_memcpy(struct iovec *dst_iov, size_t dst_iov_cnt, size_t dst_offset,
+		  const struct iovec *src_iov, size_t src_iov_cnt,
+		  size_t src_offset, size_t length);
 
 /*
  * DOC: Theory of Operation, struct iov_tail
-- 
2.54.0

* [PATCH v4 03/10] vu_common: Move vnethdr setup into vu_flush()
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

Every caller of vu_flush() was calling vu_set_vnethdr() beforehand with
the same pattern.  Move that call into vu_flush() and make
vu_set_vnethdr() static.

Remove vu_queue_notify() from vu_flush() and let callers invoke it
explicitly.  This allows paths that perform multiple flushes, such as
tcp_vu_send_flag() and tcp_vu_data_from_sock(), to issue a single guest
notification at the end.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 tcp_vu.c    | 19 ++++++++-----------
 udp_vu.c    |  3 +--
 vu_common.c |  9 +++++----
 vu_common.h |  1 -
 4 files changed, 14 insertions(+), 18 deletions(-)

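The new call pattern can be modelled with a toy virtqueue. Everything here is hypothetical: mock_vq, mock_elem, mock_flush() and send_frames() only mirror the shape of tcp_vu_send_flag() after this patch. Each frame gets its own flush, which records num_buffers in that frame's first element as vu_set_vnethdr() now does inside vu_flush(). A single notification is then issued for the whole batch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, keeping only the fields this model needs */
struct mock_hdr {
	uint16_t num_buffers;
};

struct mock_elem {
	struct mock_hdr hdr;	/* header at the start of the element's buffer */
};

struct mock_vq {
	unsigned int used;		/* elements returned to the guest */
	unsigned int notifications;	/* guest notifications sent */
};

/* Like vu_flush() after this patch: set num_buffers in the frame's first
 * element and return the elements, without notifying the guest */
static void mock_flush(struct mock_vq *vq, struct mock_elem *elem, int cnt)
{
	assert(cnt > 0);
	elem[0].hdr.num_buffers = (uint16_t)cnt;
	vq->used += (unsigned int)cnt;
}

static void mock_notify(struct mock_vq *vq)
{
	vq->notifications++;
}

/* Send @nframes frames, frame k using @buf_cnt[k] consecutive elements:
 * one flush per frame, one notification for the whole batch */
static void send_frames(struct mock_vq *vq, struct mock_elem *elem,
			const int *buf_cnt, int nframes)
{
	int k, head = 0;

	for (k = 0; k < nframes; k++) {
		mock_flush(vq, &elem[head], buf_cnt[k]);
		head += buf_cnt[k];
	}
	mock_notify(vq);
}
```

Because num_buffers is set per flush, each frame's header reflects its own buffer count, whereas a single flush over all elements would have stamped the total on the first header only.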
diff --git a/tcp_vu.c b/tcp_vu.c
index 9d63a30df4e9..95084cb4763c 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -84,7 +84,6 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	struct ethhdr *eh;
 	uint32_t seq;
 	int elem_cnt;
-	int nb_ack;
 	int ret;
 
 	hdrlen = tcp_vu_hdrlen(CONN_V6(conn));
@@ -99,8 +98,6 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	assert(flags_elem[0].in_sg[0].iov_len >=
 	       MAX(hdrlen + sizeof(*opts), ETH_ZLEN + VNET_HLEN));
 
-	vu_set_vnethdr(flags_elem[0].in_sg[0].iov_base, 1);
-
 	eh = vu_eth(flags_elem[0].in_sg[0].iov_base);
 
 	memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
@@ -145,9 +142,10 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	l2len = optlen + hdrlen - VNET_HLEN;
 	vu_pad(&flags_elem[0].in_sg[0], l2len);
 
+	vu_flush(vdev, vq, flags_elem, 1);
+
 	if (*c->pcap)
 		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN);
-	nb_ack = 1;
 
 	if (flags & DUP_ACK) {
 		elem_cnt = vu_collect(vdev, vq, &flags_elem[1], 1,
@@ -159,14 +157,14 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			memcpy(flags_elem[1].in_sg[0].iov_base,
 			       flags_elem[0].in_sg[0].iov_base,
 			       flags_elem[0].in_sg[0].iov_len);
-			nb_ack++;
+
+			vu_flush(vdev, vq, &flags_elem[1], 1);
 
 			if (*c->pcap)
 				pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN);
 		}
 	}
-
-	vu_flush(vdev, vq, flags_elem, nb_ack);
+	vu_queue_notify(vdev, vq);
 
 	return 0;
 }
@@ -453,7 +451,6 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		assert(frame_size >= hdrlen);
 
 		dlen = frame_size - hdrlen;
-		vu_set_vnethdr(iov->iov_base, buf_cnt);
 
 		/* The IPv4 header checksum varies only with dlen */
 		if (previous_dlen != dlen)
@@ -466,14 +463,14 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		l2len = dlen + hdrlen - VNET_HLEN;
 		vu_pad(iov, l2len);
 
+		vu_flush(vdev, vq, &elem[head[i]], buf_cnt);
+
 		if (*c->pcap)
 			pcap_iov(iov, buf_cnt, VNET_HLEN);
 
 		conn->seq_to_tap += dlen;
 	}
-
-	/* send packets */
-	vu_flush(vdev, vq, elem, iov_cnt);
+	vu_queue_notify(vdev, vq);
 
 	conn_flag(c, conn, ACK_FROM_TAP_DUE);
 
diff --git a/udp_vu.c b/udp_vu.c
index cc69654398f0..f8629af58ab5 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -124,8 +124,6 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
 	l2len = *dlen + hdrlen - VNET_HLEN;
 	vu_pad(&iov_vu[0], l2len);
 
-	vu_set_vnethdr(iov_vu[0].iov_base, elem_used);
-
 	/* release unused buffers */
 	vu_queue_rewind(vq, elem_cnt - elem_used);
 
@@ -230,6 +228,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 				pcap_iov(iov_vu, iov_used, VNET_HLEN);
 			}
 			vu_flush(vdev, vq, elem, iov_used);
+			vu_queue_notify(vdev, vq);
 		}
 	}
 }
diff --git a/vu_common.c b/vu_common.c
index 13b1e51001d4..57949ca32309 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -118,7 +118,8 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
  * @vnethdr:		Address of the header to set
  * @num_buffers:	Number of guest buffers of the frame
  */
-void vu_set_vnethdr(struct virtio_net_hdr_mrg_rxbuf *vnethdr, int num_buffers)
+static void vu_set_vnethdr(struct virtio_net_hdr_mrg_rxbuf *vnethdr,
+			   int num_buffers)
 {
 	vnethdr->hdr = VU_HEADER;
 	/* Note: if VIRTIO_NET_F_MRG_RXBUF is not negotiated,
@@ -139,6 +140,8 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
 {
 	int i;
 
+	vu_set_vnethdr(elem[0].in_sg[0].iov_base, elem_cnt);
+
 	for (i = 0; i < elem_cnt; i++) {
 		size_t elem_size = iov_size(elem[i].in_sg, elem[i].in_num);
 
@@ -146,7 +149,6 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
 	}
 
 	vu_queue_flush(vdev, vq, elem_cnt);
-	vu_queue_notify(vdev, vq);
 }
 
 /**
@@ -260,8 +262,6 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
 		goto err;
 	}
 
-	vu_set_vnethdr(in_sg[0].iov_base, elem_cnt);
-
 	total -= VNET_HLEN;
 
 	/* copy data from the buffer to the iovec */
@@ -271,6 +271,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
 		pcap_iov(in_sg, in_total, VNET_HLEN);
 
 	vu_flush(vdev, vq, elem, elem_cnt);
+	vu_queue_notify(vdev, vq);
 
 	trace("vhost-user sent %zu", total);
 
diff --git a/vu_common.h b/vu_common.h
index 7b060eb6184f..4037ab765b7d 100644
--- a/vu_common.h
+++ b/vu_common.h
@@ -39,7 +39,6 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
 	       struct vu_virtq_element *elem, int max_elem,
 	       struct iovec *in_sg, size_t max_in_sg, size_t *in_total,
 	       size_t size, size_t *collected);
-void vu_set_vnethdr(struct virtio_net_hdr_mrg_rxbuf *vnethdr, int num_buffers);
 void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
 	      struct vu_virtq_element *elem, int elem_cnt);
 void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
-- 
2.54.0

* [PATCH v4 04/10] udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

udp_vu_sock_recv() currently mixes two concerns: receiving data from the
socket and managing virtqueue buffers (collecting, rewinding, releasing).
This makes the function harder to reason about and couples socket I/O
with virtqueue state.

Move all virtqueue operations (vu_collect(), vu_queue_rewind(), and the
queue-readiness check) into udp_vu_sock_to_tap(), which is the only
caller.  This turns udp_vu_sock_recv() into a pure socket receive
function that reads into the provided iov array and adjusts the number
of entries used.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 udp_vu.c | 98 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 50 insertions(+), 48 deletions(-)

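After this change, the receive step no longer touches the virtqueue. The caller collects buffers, the helper reads into them, and the caller rewinds whatever was not used. The following is a minimal sketch of such a pure receive step. recv_frame() and iov_trunc() are hypothetical (the latter stands in for passt's iov_truncate()), and a Unix datagram socketpair replaces the UDP socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for passt's iov_truncate(): shrink @iov to @size bytes and
 * return the number of entries still in use */
static size_t iov_trunc(struct iovec *iov, size_t cnt, size_t size)
{
	size_t i;

	for (i = 0; i < cnt && size; i++) {
		if (iov[i].iov_len > size)
			iov[i].iov_len = size;
		size -= iov[i].iov_len;
	}
	return i;
}

/* Hypothetical pure receive helper: reads one datagram after @hdrlen bytes
 * of header space, knows nothing about virtqueues.  On success, *cnt is
 * set to the number of entries holding headers plus data. */
static ssize_t recv_frame(int s, struct iovec *iov, size_t *cnt, size_t hdrlen)
{
	struct msghdr msg = { 0 };
	ssize_t dlen;

	assert(*cnt && iov[0].iov_len >= hdrlen);

	/* reserve room for the headers in the first buffer */
	iov[0].iov_base = (char *)iov[0].iov_base + hdrlen;
	iov[0].iov_len -= hdrlen;

	msg.msg_iov = iov;
	msg.msg_iovlen = *cnt;
	dlen = recvmsg(s, &msg, 0);

	/* restore the first buffer so that it covers the headers again */
	iov[0].iov_base = (char *)iov[0].iov_base - hdrlen;
	iov[0].iov_len += hdrlen;

	if (dlen < 0)
		return -1;	/* *cnt unchanged: the caller rewinds everything */

	*cnt = iov_trunc(iov, *cnt, (size_t)dlen + hdrlen);
	return dlen;
}
```

The caller then compares the returned entry count with what it collected and returns the difference to the queue, as udp_vu_sock_to_tap() does with vu_queue_rewind().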
diff --git a/udp_vu.c b/udp_vu.c
index f8629af58ab5..bd9fd5abb971 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -58,46 +58,23 @@ static size_t udp_vu_hdrlen(bool v6)
 
 /**
  * udp_vu_sock_recv() - Receive datagrams from socket into vhost-user buffers
- * @c:		Execution context
- * @vq:		virtqueue to use to receive data
  * @s:		Socket to receive from
  * @v6:		Set for IPv6 connections
- * @dlen:	Size of received data (output)
+ * @iov_cnt:	Number of collected iov in iov_vu (input)
+ * 		Number of iov entries used to store the datagram (output)
+ * 		Unchanged on failure
  *
- * Return: number of iov entries used to store the datagram, 0 if the datagram
- *         was discarded because the virtqueue is not ready, -1 on error
+ * Return: size of received data, -1 on error
  */
-static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
-			    bool v6, ssize_t *dlen)
+static ssize_t udp_vu_sock_recv(int s, bool v6, size_t *iov_cnt)
 {
-	const struct vu_dev *vdev = c->vdev;
-	int elem_cnt, elem_used, iov_used;
 	struct msghdr msg  = { 0 };
 	size_t hdrlen, l2len;
-	size_t iov_cnt;
-
-	assert(!c->no_udp);
-
-	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
-		debug("Got UDP packet, but RX virtqueue not usable yet");
-
-		if (recvmsg(s, &msg, MSG_DONTWAIT) < 0)
-			debug_perror("Failed to discard datagram");
-
-		return 0;
-	}
+	ssize_t dlen;
 
 	/* compute L2 header length */
 	hdrlen = udp_vu_hdrlen(v6);
 
-	elem_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem),
-			      iov_vu, ARRAY_SIZE(iov_vu), &iov_cnt,
-			      IP_MAX_MTU + ETH_HLEN + VNET_HLEN, NULL);
-	if (elem_cnt == 0)
-		return -1;
-
-	assert((size_t)elem_cnt == iov_cnt);	/* one iovec per element */
-
 	/* reserve space for the headers */
 	assert(iov_vu[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
 	iov_vu[0].iov_base = (char *)iov_vu[0].iov_base + hdrlen;
@@ -105,29 +82,23 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
 
 	/* read data from the socket */
 	msg.msg_iov = iov_vu;
-	msg.msg_iovlen = iov_cnt;
+	msg.msg_iovlen = *iov_cnt;
 
-	*dlen = recvmsg(s, &msg, 0);
-	if (*dlen < 0) {
-		vu_queue_rewind(vq, elem_cnt);
+	dlen = recvmsg(s, &msg, 0);
+	if (dlen < 0)
 		return -1;
-	}
 
 	/* restore the pointer to the headers address */
 	iov_vu[0].iov_base = (char *)iov_vu[0].iov_base - hdrlen;
 	iov_vu[0].iov_len += hdrlen;
 
-	iov_used = iov_truncate(iov_vu, iov_cnt, *dlen + hdrlen);
-	elem_used = iov_used; /* one iovec per element */
+	*iov_cnt = iov_truncate(iov_vu, *iov_cnt, dlen + hdrlen);
 
 	/* pad frame to 60 bytes: first buffer is at least ETH_ZLEN long */
-	l2len = *dlen + hdrlen - VNET_HLEN;
+	l2len = dlen + hdrlen - VNET_HLEN;
 	vu_pad(&iov_vu[0], l2len);
 
-	/* release unused buffers */
-	vu_queue_rewind(vq, elem_cnt - elem_used);
-
-	return iov_used;
+	return dlen;
 }
 
 /**
@@ -213,21 +184,52 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	int i;
 
+	assert(!c->no_udp);
+
+	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
+		struct msghdr msg = { 0 };
+
+		debug("Got UDP packet, but RX virtqueue not usable yet");
+
+		for (i = 0; i < n; i++) {
+			if (recvmsg(s, &msg, MSG_DONTWAIT) < 0)
+				debug_perror("Failed to discard datagram");
+		}
+
+		return;
+	}
+
 	for (i = 0; i < n; i++) {
+		unsigned elem_cnt, elem_used;
+		size_t iov_cnt;
 		ssize_t dlen;
-		int iov_used;
 
-		iov_used = udp_vu_sock_recv(c, vq, s, v6, &dlen);
-		if (iov_used < 0)
+		elem_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem),
+				      iov_vu, ARRAY_SIZE(iov_vu), &iov_cnt,
+				      IP_MAX_MTU + ETH_HLEN + VNET_HLEN, NULL);
+		if (elem_cnt == 0)
+			break;
+
+		assert((size_t)elem_cnt == iov_cnt);	/* one iovec per element */
+
+		dlen = udp_vu_sock_recv(s, v6, &iov_cnt);
+		if (dlen < 0) {
+			vu_queue_rewind(vq, iov_cnt);
 			break;
+		}
+
+		elem_used = iov_cnt; /* one iovec per element */
+
+		/* release unused buffers */
+		vu_queue_rewind(vq, elem_cnt - elem_used);
 
-		if (iov_used > 0) {
+		if (iov_cnt > 0) {
 			udp_vu_prepare(c, toside, dlen);
 			if (*c->pcap) {
-				udp_vu_csum(toside, iov_used);
-				pcap_iov(iov_vu, iov_used, VNET_HLEN);
+				udp_vu_csum(toside, iov_cnt);
+				pcap_iov(iov_vu, iov_cnt, VNET_HLEN);
 			}
-			vu_flush(vdev, vq, elem, iov_used);
+			vu_flush(vdev, vq, elem, iov_cnt);
 			vu_queue_notify(vdev, vq);
 		}
 	}
-- 
2.54.0

* [PATCH v4 05/10] udp_vu: Pass iov explicitly to helpers instead of using file-scoped array
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

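udp_vu_sock_recv(), udp_vu_prepare(), and udp_vu_csum() all operated on
the file-scoped iov_vu[] array directly.  Pass the iov array (and, where
needed, its entry count) as explicit parameters instead, and move
iov_vu[] and elem[] to function-local statics in udp_vu_sock_to_tap(),
the only function that needs them.

While at it, rewind elem_cnt elements on receive failure and flush
elem_used elements, so that virtqueue functions are given element
counts rather than iov counts.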

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 udp_vu.c | 67 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/udp_vu.c b/udp_vu.c
index bd9fd5abb971..96c65803a0ea 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -33,9 +33,6 @@
 #include "udp_vu.h"
 #include "vu_common.h"
 
-static struct iovec     iov_vu		[VIRTQUEUE_MAX_SIZE];
-static struct vu_virtq_element	elem		[VIRTQUEUE_MAX_SIZE];
-
 /**
  * udp_vu_hdrlen() - Sum size of all headers, from UDP to virtio-net
  * @v6:		Set for IPv6 packet
@@ -58,15 +55,16 @@ static size_t udp_vu_hdrlen(bool v6)
 
 /**
  * udp_vu_sock_recv() - Receive datagrams from socket into vhost-user buffers
+ * @iov:	IO vector for the frame (in/out)
+ * @cnt:	Number of available entries in @iov (input)
+ * 		Number of used entries in @iov to store the datagram (output)
+ * 		Unchanged on failure
  * @s:		Socket to receive from
  * @v6:		Set for IPv6 connections
- * @iov_cnt:	Number of collected iov in iov_vu (input)
- * 		Number of iov entries used to store the datagram (output)
- * 		Unchanged on failure
  *
  * Return: size of received data, -1 on error
  */
-static ssize_t udp_vu_sock_recv(int s, bool v6, size_t *iov_cnt)
+static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
 {
 	struct msghdr msg  = { 0 };
 	size_t hdrlen, l2len;
@@ -76,27 +74,27 @@ static ssize_t udp_vu_sock_recv(int s, bool v6, size_t *iov_cnt)
 	hdrlen = udp_vu_hdrlen(v6);
 
 	/* reserve space for the headers */
-	assert(iov_vu[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
-	iov_vu[0].iov_base = (char *)iov_vu[0].iov_base + hdrlen;
-	iov_vu[0].iov_len -= hdrlen;
+	assert(iov[0].iov_len >= MAX(hdrlen, ETH_ZLEN + VNET_HLEN));
+	iov[0].iov_base = (char *)iov[0].iov_base + hdrlen;
+	iov[0].iov_len -= hdrlen;
 
 	/* read data from the socket */
-	msg.msg_iov = iov_vu;
-	msg.msg_iovlen = *iov_cnt;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = *cnt;
 
 	dlen = recvmsg(s, &msg, 0);
 	if (dlen < 0)
 		return -1;
 
 	/* restore the pointer to the headers address */
-	iov_vu[0].iov_base = (char *)iov_vu[0].iov_base - hdrlen;
-	iov_vu[0].iov_len += hdrlen;
+	iov[0].iov_base = (char *)iov[0].iov_base - hdrlen;
+	iov[0].iov_len += hdrlen;
 
-	*iov_cnt = iov_truncate(iov_vu, *iov_cnt, dlen + hdrlen);
+	*cnt = iov_truncate(iov, *cnt, dlen + hdrlen);
 
 	/* pad frame to 60 bytes: first buffer is at least ETH_ZLEN long */
 	l2len = dlen + hdrlen - VNET_HLEN;
-	vu_pad(&iov_vu[0], l2len);
+	vu_pad(&iov[0], l2len);
 
 	return dlen;
 }
@@ -104,27 +102,28 @@ static ssize_t udp_vu_sock_recv(int s, bool v6, size_t *iov_cnt)
 /**
  * udp_vu_prepare() - Prepare the packet header
  * @c:		Execution context
+ * @iov:	IO vector for the frame (including vnet header)
  * @toside:	Address information for one side of the flow
  * @dlen:	Packet data length
  *
  * Return: Layer-4 length
  */
-static size_t udp_vu_prepare(const struct ctx *c,
+static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
 			     const struct flowside *toside, ssize_t dlen)
 {
 	struct ethhdr *eh;
 	size_t l4len;
 
 	/* ethernet header */
-	eh = vu_eth(iov_vu[0].iov_base);
+	eh = vu_eth(iov[0].iov_base);
 
 	memcpy(eh->h_dest, c->guest_mac, sizeof(eh->h_dest));
 	memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source));
 
 	/* initialize header */
 	if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) {
-		struct iphdr *iph = vu_ip(iov_vu[0].iov_base);
-		struct udp_payload_t *bp = vu_payloadv4(iov_vu[0].iov_base);
+		struct iphdr *iph = vu_ip(iov[0].iov_base);
+		struct udp_payload_t *bp = vu_payloadv4(iov[0].iov_base);
 
 		eh->h_proto = htons(ETH_P_IP);
 
@@ -132,8 +131,8 @@ static size_t udp_vu_prepare(const struct ctx *c,
 
 		l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
 	} else {
-		struct ipv6hdr *ip6h = vu_ip(iov_vu[0].iov_base);
-		struct udp_payload_t *bp = vu_payloadv6(iov_vu[0].iov_base);
+		struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
+		struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
 
 		eh->h_proto = htons(ETH_P_IPV6);
 
@@ -148,23 +147,25 @@ static size_t udp_vu_prepare(const struct ctx *c,
 /**
  * udp_vu_csum() - Calculate and set checksum for a UDP packet
  * @toside:	Address information for one side of the flow
- * @iov_used:	Number of used iov_vu items
+ * @iov:	IO vector for the frame
+ * @cnt:	Number of IO vector entries
  */
-static void udp_vu_csum(const struct flowside *toside, int iov_used)
+static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
+			size_t cnt)
 {
 	const struct in_addr *src4 = inany_v4(&toside->oaddr);
 	const struct in_addr *dst4 = inany_v4(&toside->eaddr);
-	char *base = iov_vu[0].iov_base;
+	char *base = iov[0].iov_base;
 	struct udp_payload_t *bp;
 	struct iov_tail data;
 
 	if (src4 && dst4) {
 		bp = vu_payloadv4(base);
-		data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base);
+		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
 		csum_udp4(&bp->uh, *src4, *dst4, &data);
 	} else {
 		bp = vu_payloadv6(base);
-		data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base);
+		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
 		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data);
 	}
 }
@@ -180,6 +181,8 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 {
 	const struct flowside *toside = flowside_at_sidx(tosidx);
 	bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr));
+	static struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
+	static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE];
 	struct vu_dev *vdev = c->vdev;
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	int i;
@@ -212,9 +215,9 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 
 		assert((size_t)elem_cnt == iov_cnt);	/* one iovec per element */
 
-		dlen = udp_vu_sock_recv(s, v6, &iov_cnt);
+		dlen = udp_vu_sock_recv(iov_vu, &iov_cnt, s, v6);
 		if (dlen < 0) {
-			vu_queue_rewind(vq, iov_cnt);
+			vu_queue_rewind(vq, elem_cnt);
 			break;
 		}
 
@@ -224,12 +227,12 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 		vu_queue_rewind(vq, elem_cnt - elem_used);
 
 		if (iov_cnt > 0) {
-			udp_vu_prepare(c, toside, dlen);
+			udp_vu_prepare(c, iov_vu, toside, dlen);
 			if (*c->pcap) {
-				udp_vu_csum(toside, iov_cnt);
+				udp_vu_csum(toside, iov_vu, iov_cnt);
 				pcap_iov(iov_vu, iov_cnt, VNET_HLEN);
 			}
-			vu_flush(vdev, vq, elem, iov_cnt);
+			vu_flush(vdev, vq, elem, elem_used);
 			vu_queue_notify(vdev, vq);
 		}
 	}
-- 
2.54.0

* [PATCH v4 06/10] checksum: Pass explicit L4 length to checksum functions
From: Laurent Vivier @ 2026-05-13 11:52 UTC
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

The iov_tail passed to csum_iov_tail() may contain padding or trailing
data beyond the actual L4 payload.  Rather than relying on
iov_tail_size() to determine how many bytes to checksum, pass the
length explicitly so that only the relevant payload bytes are included
in the checksum computation.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 checksum.c | 43 +++++++++++++++++++++++++------------------
 checksum.h |  6 +++---
 tap.c      |  4 ++--
 tcp.c      | 12 +++++++-----
 udp.c      |  5 +++--
 udp_vu.c   | 21 +++++++++------------
 6 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/checksum.c b/checksum.c
index 828f9ecc9c02..7c62e42d6d4c 100644
--- a/checksum.c
+++ b/checksum.c
@@ -182,21 +182,22 @@ static uint16_t csum(const void *buf, size_t len, uint32_t init)
  * @saddr:	IPv4 source address
  * @daddr:	IPv4 destination address
  * @data:	UDP payload (as IO vector tail)
+ * @dlen:	UDP payload length
  */
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
-	       struct iov_tail *data)
+	       struct iov_tail *data, size_t dlen)
 {
 	/* UDP checksums are optional, so don't bother */
 	udp4hr->check = 0;
 
 	if (UDP4_REAL_CHECKSUMS) {
-		uint16_t l4len = iov_tail_size(data) + sizeof(struct udphdr);
-		uint32_t psum = proto_ipv4_header_psum(l4len, IPPROTO_UDP,
-						       saddr, daddr);
+		uint32_t psum = proto_ipv4_header_psum(sizeof(*udp4hr) + dlen,
+						       IPPROTO_UDP, saddr,
+						       daddr);
 
-		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
-		udp4hr->check = csum_iov_tail(data, psum);
+		psum = csum_unfolded(udp4hr, sizeof(*udp4hr), psum);
+		udp4hr->check = csum_iov_tail(data, psum, dlen);
 	}
 }
 
@@ -245,19 +246,19 @@ uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
  * @saddr:	Source address
  * @daddr:	Destination address
  * @data:	UDP payload (as IO vector tail)
+ * @dlen:	UDP payload length
  */
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       struct iov_tail *data)
+	       struct iov_tail *data, size_t dlen)
 {
-	uint16_t l4len = iov_tail_size(data) + sizeof(struct udphdr);
-	uint32_t psum = proto_ipv6_header_psum(l4len, IPPROTO_UDP,
-					       saddr, daddr);
+	uint32_t psum = proto_ipv6_header_psum(dlen + sizeof(*udp6hr),
+					       IPPROTO_UDP, saddr, daddr);
 
 	udp6hr->check = 0;
 
-	psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum);
-	udp6hr->check = csum_iov_tail(data, psum);
+	psum = csum_unfolded(udp6hr, sizeof(*udp6hr), psum);
+	udp6hr->check = csum_iov_tail(data, psum, dlen);
 }
 
 /**
@@ -604,20 +605,26 @@ uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init)
 /**
  * csum_iov_tail() - Calculate unfolded checksum for the tail of an IO vector
  * @tail:	IO vector tail to checksum
- * @init	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * @init:	Initial 32-bit checksum, 0 for no pre-computed checksum
+ * @len:	Number of bytes to checksum from @tail
  *
  * Return: 16-bit folded, complemented checksum
  */
-uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init)
+uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init, size_t len)
 {
 	if (iov_tail_prune(tail)) {
-		size_t i;
+		size_t i, n;
 
+		n = MIN(len, tail->iov[0].iov_len - tail->off);
 		init = csum_unfolded((char *)tail->iov[0].iov_base + tail->off,
-				     tail->iov[0].iov_len - tail->off, init);
-		for (i = 1; i < tail->cnt; i++) {
+				     n, init);
+		len -= n;
+
+		for (i = 1; len && i < tail->cnt; i++) {
 			const struct iovec *iov = &tail->iov[i];
-			init = csum_unfolded(iov->iov_base, iov->iov_len, init);
+			n = MIN(len, iov->iov_len);
+			init = csum_unfolded(iov->iov_base, n, init);
+			len -= n;
 		}
 	}
 	return (uint16_t)~csum_fold(init);
diff --git a/checksum.h b/checksum.h
index 4e3b098db072..6270f1457a73 100644
--- a/checksum.h
+++ b/checksum.h
@@ -21,18 +21,18 @@ uint32_t proto_ipv4_header_psum(uint16_t l4len, uint8_t protocol,
 				struct in_addr saddr, struct in_addr daddr);
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
-	       struct iov_tail *data);
+	       struct iov_tail *data, size_t dlen);
 void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t dlen);
 uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
 				const struct in6_addr *saddr,
 				const struct in6_addr *daddr);
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       struct iov_tail *data);
+	       struct iov_tail *data, size_t dlen);
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t dlen);
 uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
-uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init);
+uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init, size_t len);
 
 #endif /* CHECKSUM_H */
diff --git a/tap.c b/tap.c
index 0920a325980e..412766c7c762 100644
--- a/tap.c
+++ b/tap.c
@@ -251,7 +251,7 @@ void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport,
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
 	uh->len = htons(l4len);
-	csum_udp4(uh, src, dst, &payload);
+	csum_udp4(uh, src, dst, &payload, dlen);
 	return (char *)uh + sizeof(*uh);
 }
 
@@ -356,7 +356,7 @@ void *tap_push_uh6(struct udphdr *uh,
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
 	uh->len = htons(l4len);
-	csum_udp6(uh, src, dst, &payload);
+	csum_udp6(uh, src, dst, &payload, dlen);
 	return (char *)uh + sizeof(*uh);
 }
 
diff --git a/tcp.c b/tcp.c
index d6a9ba28a531..9ab67e5a69ec 100644
--- a/tcp.c
+++ b/tcp.c
@@ -809,13 +809,14 @@ static void tcp_sock_set_nodelay(int s)
  * @psum:	Unfolded partial checksum of the IPv4 or IPv6 pseudo-header
  * @th:		TCP header (updated)
  * @payload:	TCP payload
+ * @dlen:	TCP payload length
  */
 static void tcp_update_csum(uint32_t psum, struct tcphdr *th,
-			    struct iov_tail *payload)
+			    struct iov_tail *payload, size_t dlen)
 {
 	th->check = 0;
 	psum = csum_unfolded(th, sizeof(*th), psum);
-	th->check = csum_iov_tail(payload, psum);
+	th->check = csum_iov_tail(payload, psum, dlen);
 }
 
 /**
@@ -952,7 +953,8 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
 			bool no_tcp_csum)
 {
 	const struct flowside *tapside = TAPFLOW(conn);
-	size_t l4len = iov_tail_size(payload) + sizeof(*th);
+	size_t dlen = iov_tail_size(payload);
+	size_t l4len = dlen + sizeof(*th);
 	uint8_t *omac = conn->f.tap_omac;
 	size_t l3len = l4len;
 	uint32_t psum = 0;
@@ -1013,7 +1015,7 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
 	if (no_tcp_csum)
 		th->check = 0;
 	else
-		tcp_update_csum(psum, th, payload);
+		tcp_update_csum(psum, th, payload, dlen);
 
 	return MAX(l3len + sizeof(struct ethhdr), ETH_ZLEN);
 }
@@ -2225,7 +2227,7 @@ static void tcp_rst_no_conn(const struct ctx *c, int af,
 		rsth->ack = 1;
 	}
 
-	tcp_update_csum(psum, rsth, &payload);
+	tcp_update_csum(psum, rsth, &payload, 0);
 	rst_l2len = ((char *)rsth - buf) + sizeof(*rsth);
 	tap_send_single(c, buf, rst_l2len);
 }
diff --git a/udp.c b/udp.c
index 52260b934e60..66dc7766868c 100644
--- a/udp.c
+++ b/udp.c
@@ -289,7 +289,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
 			.iov_len = dlen
 		};
 		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
-		csum_udp4(&bp->uh, *src, *dst, &data);
+		csum_udp4(&bp->uh, *src, *dst, &data, dlen);
 	}
 
 	return l4len;
@@ -334,7 +334,8 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
 			.iov_len = dlen
 		};
 		struct iov_tail data = IOV_TAIL(&iov, 1, 0);
-		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data);
+		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
+			  dlen);
 	}
 
 	return l4len;
diff --git a/udp_vu.c b/udp_vu.c
index 96c65803a0ea..523447a81fae 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -105,14 +105,11 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
  * @iov:	IO vector for the frame (including vnet header)
  * @toside:	Address information for one side of the flow
  * @dlen:	Packet data length
- *
- * Return: Layer-4 length
  */
-static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
+static void udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
 			     const struct flowside *toside, ssize_t dlen)
 {
 	struct ethhdr *eh;
-	size_t l4len;
 
 	/* ethernet header */
 	eh = vu_eth(iov[0].iov_base);
@@ -129,7 +126,7 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
 
 		*iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_UDP);
 
-		l4len = udp_update_hdr4(iph, bp, toside, dlen, true);
+		udp_update_hdr4(iph, bp, toside, dlen, true);
 	} else {
 		struct ipv6hdr *ip6h = vu_ip(iov[0].iov_base);
 		struct udp_payload_t *bp = vu_payloadv6(iov[0].iov_base);
@@ -138,10 +135,8 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
 
 		*ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_UDP);
 
-		l4len = udp_update_hdr6(ip6h, bp, toside, dlen, true);
+		udp_update_hdr6(ip6h, bp, toside, dlen, true);
 	}
-
-	return l4len;
 }
 
 /**
@@ -149,9 +144,10 @@ static size_t udp_vu_prepare(const struct ctx *c, const struct iovec *iov,
  * @toside:	Address information for one side of the flow
  * @iov:	IO vector for the frame
  * @cnt:	Number of IO vector entries
+ * @dlen:	Data length
  */
 static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
-			size_t cnt)
+			size_t cnt, size_t dlen)
 {
 	const struct in_addr *src4 = inany_v4(&toside->oaddr);
 	const struct in_addr *dst4 = inany_v4(&toside->eaddr);
@@ -162,11 +158,12 @@ static void udp_vu_csum(const struct flowside *toside, const struct iovec *iov,
 	if (src4 && dst4) {
 		bp = vu_payloadv4(base);
 		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
-		csum_udp4(&bp->uh, *src4, *dst4, &data);
+		csum_udp4(&bp->uh, *src4, *dst4, &data, dlen);
 	} else {
 		bp = vu_payloadv6(base);
 		data = IOV_TAIL(iov, cnt, (char *)&bp->data - base);
-		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data);
+		csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data,
+			  dlen);
 	}
 }
 
@@ -229,7 +226,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 		if (iov_cnt > 0) {
 			udp_vu_prepare(c, iov_vu, toside, dlen);
 			if (*c->pcap) {
-				udp_vu_csum(toside, iov_vu, iov_cnt);
+				udp_vu_csum(toside, iov_vu, iov_cnt, dlen);
 				pcap_iov(iov_vu, iov_cnt, VNET_HLEN);
 			}
 			vu_flush(vdev, vq, elem, elem_used);
-- 
2.54.0



* [PATCH v4 07/10] pcap: Pass explicit L2 length to pcap_iov()
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (5 preceding siblings ...)
  2026-05-13 11:52 ` [PATCH v4 06/10] checksum: Pass explicit L4 length to checksum functions Laurent Vivier
@ 2026-05-13 11:52 ` Laurent Vivier
  2026-05-13 11:52 ` [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush() Laurent Vivier
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Laurent Vivier @ 2026-05-13 11:52 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, Jon Maloy, David Gibson

With vhost-user multibuffer frames, the iov can be larger than the
actual L2 frame. The previous approach of computing L2 length as
iov_size() - offset would overcount and write extra bytes into the
pcap file.

Pass the L2 frame length explicitly to pcap_frame() and pcap_iov(),
and write exactly that many bytes instead of the full iov remainder.

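A simplified sketch of the bounded write this relies on: skip @skip bytes of the vector, then write at most @length bytes. write_all() and write_bounded() are hypothetical names. Unlike the write_remainder() change below, this sketch writes one buffer at a time instead of batching whole buffers with writev(), which keeps the length accounting easy to follow.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Write exactly @len bytes from @buf, retrying on short writes and EINTR */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len) {
		ssize_t rc = write(fd, buf, len);

		if (rc < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += rc;
		len -= (size_t)rc;
	}
	return 0;
}

/* Skip the first @skip bytes of @iov, then write at most @length bytes */
int write_bounded(int fd, const struct iovec *iov, size_t iovcnt,
		  size_t skip, size_t length)
{
	size_t i;

	for (i = 0; length && i < iovcnt; i++) {
		size_t n;

		if (skip >= iov[i].iov_len) {	/* buffer entirely skipped */
			skip -= iov[i].iov_len;
			continue;
		}
		n = MIN(length, iov[i].iov_len - skip);
		if (write_all(fd, (const char *)iov[i].iov_base + skip, n) < 0)
			return -1;
		length -= n;
		skip = 0;
	}
	return 0;
}
```

Passing SIZE_MAX as @length writes the whole remainder, which matches how the tap path calls write_remainder() in this patch.
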
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 pcap.c      | 29 ++++++++++++++++++++---------
 pcap.h      |  3 ++-
 tap.c       |  6 ++++--
 tcp_vu.c    | 14 ++++++++------
 udp_vu.c    |  4 +++-
 util.c      | 31 +++++++++++++++++++++++++++----
 util.h      |  3 ++-
 vu_common.c |  2 +-
 8 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/pcap.c b/pcap.c
index a026f17e7974..e4b47ce747e7 100644
--- a/pcap.c
+++ b/pcap.c
@@ -52,12 +52,12 @@ struct pcap_pkthdr {
  * @iov:	IO vector containing frame (with L2 headers and tap headers)
  * @iovcnt:	Number of buffers (@iov entries) in frame
  * @offset:	Byte offset of the L2 headers within @iov
+ * @l2len:	Length of L2 frame data to capture
  * @now:	Timestamp
  */
 static void pcap_frame(const struct iovec *iov, size_t iovcnt,
-		       size_t offset, const struct timespec *now)
+		       size_t offset, size_t l2len, const struct timespec *now)
 {
-	size_t l2len = iov_size(iov, iovcnt) - offset;
 	struct pcap_pkthdr h = {
 		.tv_sec = now->tv_sec,
 		.tv_usec = DIV_ROUND_CLOSEST(now->tv_nsec, 1000),
@@ -65,9 +65,15 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt,
 		.len = l2len
 	};
 
-	if (write_all_buf(pcap_fd, &h, sizeof(h)) < 0 ||
-	    write_remainder(pcap_fd, iov, iovcnt, offset) < 0)
+	if (write_all_buf(pcap_fd, &h, sizeof(h)) < 0) {
+		debug_perror("Cannot log packet, packet header error");
+		return;
+	}
+
+	if (write_remainder(pcap_fd, iov, iovcnt, offset, l2len) < 0) {
 		debug_perror("Cannot log packet, length %zu", l2len);
+		return;
+	}
 }
 
 /**
@@ -87,7 +93,7 @@ void pcap(const char *pkt, size_t l2len)
 	if (clock_gettime(CLOCK_REALTIME, &now))
 		err_perror("Failed to get CLOCK_REALTIME time");
 
-	pcap_frame(&iov, 1, 0, &now);
+	pcap_frame(&iov, 1, 0, l2len, &now);
 }
 
 /**
@@ -109,8 +115,11 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n,
 	if (clock_gettime(CLOCK_REALTIME, &now))
 		err_perror("Failed to get CLOCK_REALTIME time");
 
-	for (i = 0; i < n; i++)
-		pcap_frame(iov + i * frame_parts, frame_parts, offset, &now);
+	for (i = 0; i < n; i++) {
+		pcap_frame(iov + i * frame_parts, frame_parts, offset,
+			   iov_size(iov + i * frame_parts, frame_parts) - offset,
+			   &now);
+	}
 }
 
 /**
@@ -120,8 +129,10 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n,
  *		containing packet data to write, including L2 header
  * @iovcnt:	Number of buffers (@iov entries)
  * @offset:	Offset of the L2 frame within the full data length
+ * @l2len:	Length of L2 frame data to capture
  */
-void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset)
+void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset,
+	      size_t l2len)
 {
 	struct timespec now = { 0 };
 
@@ -131,7 +142,7 @@ void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset)
 	if (clock_gettime(CLOCK_REALTIME, &now))
 		err_perror("Failed to get CLOCK_REALTIME time");
 
-	pcap_frame(iov, iovcnt, offset, &now);
+	pcap_frame(iov, iovcnt, offset, l2len, &now);
 }
 
 /**
diff --git a/pcap.h b/pcap.h
index dface5df4ee6..6c9d5c82146b 100644
--- a/pcap.h
+++ b/pcap.h
@@ -13,7 +13,8 @@ extern int pcap_fd;
 void pcap(const char *pkt, size_t l2len);
 void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n,
 		   size_t offset);
-void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset);
+void pcap_iov(const struct iovec *iov, size_t iovcnt, size_t offset,
+	      size_t l2len);
 void pcap_init(struct ctx *c);
 
 #endif /* PCAP_H */
diff --git a/tap.c b/tap.c
index 412766c7c762..bf0904f9bc8e 100644
--- a/tap.c
+++ b/tap.c
@@ -499,7 +499,8 @@ static size_t tap_send_frames_passt(const struct ctx *c,
 		/* Number of unsent or partially sent buffers for the frame */
 		size_t rembufs = bufs_per_frame - (i % bufs_per_frame);
 
-		if (write_remainder(c->fd_tap, &iov[i], rembufs, buf_offset) < 0) {
+		if (write_remainder(c->fd_tap, &iov[i], rembufs, buf_offset,
+				    SIZE_MAX) < 0) {
 			err_perror("tap: partial frame send");
 			return i;
 		}
@@ -1157,10 +1158,11 @@ void tap_handler(struct ctx *c, const struct timespec *now)
 void tap_add_packet(struct ctx *c, struct iov_tail *data,
 		    const struct timespec *now)
 {
+	size_t l2len = iov_tail_size(data);
 	struct ethhdr eh_storage;
 	const struct ethhdr *eh;
 
-	pcap_iov(data->iov, data->cnt, data->off);
+	pcap_iov(data->iov, data->cnt, data->off, l2len);
 
 	eh = IOV_PEEK_HEADER(data, eh_storage);
 	if (!eh)
diff --git a/tcp_vu.c b/tcp_vu.c
index 95084cb4763c..d6f38754859c 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -130,7 +130,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		return ret;
 	}
 
-	iov_truncate(&flags_iov[0], 1, hdrlen + optlen);
+	l2len = hdrlen + optlen - VNET_HLEN;
+	iov_truncate(&flags_iov[0], 1, l2len + VNET_HLEN);
 	payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen);
 
 	if (flags & KEEPALIVE)
@@ -139,13 +140,12 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload,
 			 NULL, seq, !*c->pcap);
 
-	l2len = optlen + hdrlen - VNET_HLEN;
 	vu_pad(&flags_elem[0].in_sg[0], l2len);
 
 	vu_flush(vdev, vq, flags_elem, 1);
 
 	if (*c->pcap)
-		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN);
+		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN, l2len);
 
 	if (flags & DUP_ACK) {
 		elem_cnt = vu_collect(vdev, vq, &flags_elem[1], 1,
@@ -160,8 +160,10 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 
 			vu_flush(vdev, vq, &flags_elem[1], 1);
 
-			if (*c->pcap)
-				pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN);
+			if (*c->pcap) {
+				pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN,
+					 l2len);
+			}
 		}
 	}
 	vu_queue_notify(vdev, vq);
@@ -466,7 +468,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		vu_flush(vdev, vq, &elem[head[i]], buf_cnt);
 
 		if (*c->pcap)
-			pcap_iov(iov, buf_cnt, VNET_HLEN);
+			pcap_iov(iov, buf_cnt, VNET_HLEN, l2len);
 
 		conn->seq_to_tap += dlen;
 	}
diff --git a/udp_vu.c b/udp_vu.c
index 523447a81fae..3ff643478616 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -182,6 +182,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 	static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE];
 	struct vu_dev *vdev = c->vdev;
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
+	size_t hdrlen = udp_vu_hdrlen(v6);
 	int i;
 
 	assert(!c->no_udp);
@@ -227,7 +228,8 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 			udp_vu_prepare(c, iov_vu, toside, dlen);
 			if (*c->pcap) {
 				udp_vu_csum(toside, iov_vu, iov_cnt, dlen);
-				pcap_iov(iov_vu, iov_cnt, VNET_HLEN);
+				pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
+					 hdrlen + dlen - VNET_HLEN);
 			}
 			vu_flush(vdev, vq, elem, elem_used);
 			vu_queue_notify(vdev, vq);
diff --git a/util.c b/util.c
index 73c9d51d7b4a..b64c29ed7723 100644
--- a/util.c
+++ b/util.c
@@ -722,31 +722,54 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
  * @iov:	IO vector
  * @iovcnt:	Number of entries in @iov
  * @skip:	Number of bytes of the vector to skip writing
+ * @length:	Maximum number of bytes of the vector to write
  *
  * Return: 0 on success, -1 on error (with errno set)
  *
  * #syscalls writev
  */
-int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip)
+int write_remainder(int fd, const struct iovec *iov, size_t iovcnt,
+		    size_t skip, size_t length)
 {
 	size_t i = 0, offset;
 
-	while ((i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) {
+	while (length &&
+	       (i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) {
 		ssize_t rc;
+		size_t end;
 
 		if (offset) {
+			size_t len = MIN(length, iov[i].iov_len - offset);
+
 			/* Write the remainder of the partially written buffer */
 			if (write_all_buf(fd, (char *)iov[i].iov_base + offset,
-					  iov[i].iov_len - offset) < 0)
+					  len) < 0)
 				return -1;
+
+			length -= len;
 			i++;
+
+			if (!length || i >= iovcnt)
+				break;
+		}
+
+		end = iov_skip_bytes(iov + i, iovcnt - i, length, NULL);
+
+		/* Write a trailing partial buffer */
+		if (!end) {
+			size_t len = MIN(length, iov[i].iov_len);
+
+			if (write_all_buf(fd, iov[i].iov_base, len) < 0)
+				return -1;
+			break;
 		}
 
 		/* Write as much of the remaining whole buffers as we can */
-		rc = writev(fd, &iov[i], iovcnt - i);
+		rc = writev(fd, &iov[i], end);
 		if (rc < 0)
 			return -1;
 
+		length -= rc;
 		skip = rc;
 	}
 	return 0;
diff --git a/util.h b/util.h
index 70aadebaa085..a4fe2527575f 100644
--- a/util.h
+++ b/util.h
@@ -164,7 +164,8 @@ int fls(unsigned long x);
 int ilog2(unsigned long x);
 int write_file(const char *path, const char *buf);
 intmax_t read_file_integer(const char *path, intmax_t fallback);
-int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
+int write_remainder(int fd, const struct iovec *iov, size_t iovcnt,
+		    size_t skip, size_t length);
 int read_remainder(int fd, const struct iovec *iov, size_t cnt, size_t skip);
 void close_open_files(int argc, char **argv);
 bool snprintf_check(char *str, size_t size, const char *format, ...);
diff --git a/vu_common.c b/vu_common.c
index 57949ca32309..f254cb67ec78 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -268,7 +268,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
 	iov_from_buf(in_sg, in_total, VNET_HLEN, buf, total);
 
 	if (*c->pcap)
-		pcap_iov(in_sg, in_total, VNET_HLEN);
+		pcap_iov(in_sg, in_total, VNET_HLEN, size);
 
 	vu_flush(vdev, vq, elem, elem_cnt);
 	vu_queue_notify(vdev, vq);
-- 
2.54.0



* [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush()
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (6 preceding siblings ...)
  2026-05-13 11:52 ` [PATCH v4 07/10] pcap: Pass explicit L2 length to pcap_iov() Laurent Vivier
@ 2026-05-13 11:52 ` Laurent Vivier
  2026-05-13 11:52 ` [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers() Laurent Vivier
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Laurent Vivier @ 2026-05-13 11:52 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

Currently, vu_flush() derives the frame size from the iov. In
preparation for iov arrays that may be larger than the actual frame,
pass the total length (including the vnet header) explicitly so that
only the relevant portion is reported to the virtqueue.

Ensure a minimum frame size of ETH_ZLEN + VNET_HLEN to handle short
frames. All elements are still flushed to avoid descriptor leaks,
but trailing elements beyond frame_len will report a zero length.

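A minimal sketch of how the reported length is distributed over the used elements, mirroring the loop added to vu_flush() below. fill_sizes() is a hypothetical helper. VNET_HLEN is assumed to be 12 bytes, the size of struct virtio_net_hdr_mrg_rxbuf, and ETH_ZLEN is the 60-byte Ethernet minimum without FCS.

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define ETH_ZLEN	60	/* minimum Ethernet frame length, no FCS */
#define VNET_HLEN	12	/* assumption: mergeable rx buffer header size */

/* Hypothetical helper: compute the used length reported for each element */
void fill_sizes(const size_t *elem_size, size_t *fill, int elem_cnt,
		size_t frame_len)
{
	/* Short frames are reported at least at the padded minimum */
	size_t len = MAX(ETH_ZLEN + VNET_HLEN, frame_len);
	int i;

	for (i = 0; i < elem_cnt; i++) {
		fill[i] = MIN(elem_size[i], len);
		len -= fill[i];	/* trailing elements end up with 0 */
	}
}
```

For three 64-byte elements and a 100-byte frame, this reports 64, 36 and 0: every element is still returned to the virtqueue, but only the bytes that belong to the frame are counted.
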
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 tcp_vu.c    |  6 +++---
 udp_vu.c    |  2 +-
 vu_common.c | 16 ++++++++++++----
 vu_common.h |  2 +-
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tcp_vu.c b/tcp_vu.c
index d6f38754859c..d744ec705911 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -142,7 +142,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 
 	vu_pad(&flags_elem[0].in_sg[0], l2len);
 
-	vu_flush(vdev, vq, flags_elem, 1);
+	vu_flush(vdev, vq, flags_elem, 1, hdrlen + optlen);
 
 	if (*c->pcap)
 		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN, l2len);
@@ -158,7 +158,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 			       flags_elem[0].in_sg[0].iov_base,
 			       flags_elem[0].in_sg[0].iov_len);
 
-			vu_flush(vdev, vq, &flags_elem[1], 1);
+			vu_flush(vdev, vq, &flags_elem[1], 1, hdrlen + optlen);
 
 			if (*c->pcap) {
 				pcap_iov(&flags_elem[1].in_sg[0], 1, VNET_HLEN,
@@ -465,7 +465,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		l2len = dlen + hdrlen - VNET_HLEN;
 		vu_pad(iov, l2len);
 
-		vu_flush(vdev, vq, &elem[head[i]], buf_cnt);
+		vu_flush(vdev, vq, &elem[head[i]], buf_cnt, dlen + hdrlen);
 
 		if (*c->pcap)
 			pcap_iov(iov, buf_cnt, VNET_HLEN, l2len);
diff --git a/udp_vu.c b/udp_vu.c
index 3ff643478616..3c9fff53324c 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -231,7 +231,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 				pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
 					 hdrlen + dlen - VNET_HLEN);
 			}
-			vu_flush(vdev, vq, elem, elem_used);
+			vu_flush(vdev, vq, elem, elem_used, hdrlen + dlen);
 			vu_queue_notify(vdev, vq);
 		}
 	}
diff --git a/vu_common.c b/vu_common.c
index f254cb67ec78..704e908aa02c 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -134,18 +134,26 @@ static void vu_set_vnethdr(struct virtio_net_hdr_mrg_rxbuf *vnethdr,
  * @vq:		vhost-user virtqueue
  * @elem:	virtqueue elements array to send back to the virtqueue
  * @elem_cnt:	Length of the array
+ * @frame_len:	Total frame length including vnet header
  */
 void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
-	      struct vu_virtq_element *elem, int elem_cnt)
+	      struct vu_virtq_element *elem, int elem_cnt, size_t frame_len)
 {
+	size_t len;
 	int i;
 
 	vu_set_vnethdr(elem[0].in_sg[0].iov_base, elem_cnt);
 
+	len = MAX(ETH_ZLEN + VNET_HLEN, frame_len);
 	for (i = 0; i < elem_cnt; i++) {
-		size_t elem_size = iov_size(elem[i].in_sg, elem[i].in_num);
+		size_t elem_size, fill_size;
 
-		vu_queue_fill(vdev, vq, &elem[i], elem_size, i);
+		elem_size = iov_size(elem[i].in_sg, elem[i].in_num);
+		fill_size = MIN(elem_size, len);
+
+		vu_queue_fill(vdev, vq, &elem[i], fill_size, i);
+
+		len -= fill_size;
 	}
 
 	vu_queue_flush(vdev, vq, elem_cnt);
@@ -270,7 +278,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
 	if (*c->pcap)
 		pcap_iov(in_sg, in_total, VNET_HLEN, size);
 
-	vu_flush(vdev, vq, elem, elem_cnt);
+	vu_flush(vdev, vq, elem, elem_cnt, VNET_HLEN + size);
 	vu_queue_notify(vdev, vq);
 
 	trace("vhost-user sent %zu", total);
diff --git a/vu_common.h b/vu_common.h
index 4037ab765b7d..77d1849e6115 100644
--- a/vu_common.h
+++ b/vu_common.h
@@ -40,7 +40,7 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
 	       struct iovec *in_sg, size_t max_in_sg, size_t *in_total,
 	       size_t size, size_t *collected);
 void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
-	      struct vu_virtq_element *elem, int elem_cnt);
+	      struct vu_virtq_element *elem, int elem_cnt, size_t frame_len);
 void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
 		const struct timespec *now);
 int vu_send_single(const struct ctx *c, const void *buf, size_t size);
-- 
2.54.0



* [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers()
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (7 preceding siblings ...)
  2026-05-13 11:52 ` [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush() Laurent Vivier
@ 2026-05-13 11:52 ` Laurent Vivier
  2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Laurent Vivier @ 2026-05-13 11:52 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, David Gibson, Jon Maloy

tcp_fill_headers() computed the TCP payload length from iov_tail_size(),
but with vhost-user multibuffer frames, the iov_tail can be larger than
the actual data.  Pass the data length explicitly so that the IP total
length, pseudo-header, and checksum computations use the correct value.

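A rough sketch of how the returned L2 length follows from an explicit data length, based on the `MAX(l3len + sizeof(struct ethhdr), ETH_ZLEN)` return value of tcp_fill_headers(). tcp_frame_l2len() is a hypothetical name. It assumes 20-byte TCP and IPv4 headers and a 40-byte IPv6 header, with TCP options counted as part of the data, as tcp_vu_send_flag() does when it passes optlen.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define ETH_HLEN	14	/* Ethernet header */
#define ETH_ZLEN	60	/* minimum Ethernet frame, no FCS */
#define TCP_HLEN	20	/* assumption: TCP header without options */
#define IPV4_HLEN	20	/* assumption: IPv4 header without options */
#define IPV6_HLEN	40	/* fixed IPv6 header */

/* Hypothetical: L2 frame length for @dlen bytes of TCP payload */
size_t tcp_frame_l2len(size_t dlen, bool v6)
{
	size_t l4len = dlen + TCP_HLEN;
	size_t l3len = l4len + (v6 ? IPV6_HLEN : IPV4_HLEN);

	return MAX(l3len + ETH_HLEN, ETH_ZLEN);
}
```

If @dlen were instead taken from an oversized iov_tail, every length derived from it would grow by the same amount, which is why the explicit value is needed.
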
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 tcp.c          | 4 ++--
 tcp_buf.c      | 3 ++-
 tcp_internal.h | 2 +-
 tcp_vu.c       | 9 +++++----
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/tcp.c b/tcp.c
index 9ab67e5a69ec..34fdd83c52b4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -939,6 +939,7 @@ static void tcp_fill_header(struct tcphdr *th,
  * @ip6h:		Pointer to IPv6 header, or NULL
  * @th:			Pointer to TCP header
  * @payload:		TCP payload
+ * @dlen:		TCP payload length
  * @ip4_check:		IPv4 checksum, if already known
  * @seq:		Sequence number for this segment
  * @no_tcp_csum:	Do not set TCP checksum
@@ -949,11 +950,10 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
 			struct ethhdr *eh,
 			struct iphdr *ip4h, struct ipv6hdr *ip6h,
 			struct tcphdr *th, struct iov_tail *payload,
-			const uint16_t *ip4_check, uint32_t seq,
+			size_t dlen, const uint16_t *ip4_check, uint32_t seq,
 			bool no_tcp_csum)
 {
 	const struct flowside *tapside = TAPFLOW(conn);
-	size_t dlen = iov_tail_size(payload);
 	size_t l4len = dlen + sizeof(*th);
 	uint8_t *omac = conn->f.tap_omac;
 	size_t l3len = l4len;
diff --git a/tcp_buf.c b/tcp_buf.c
index a092cb37fe9b..efdd42558e15 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -190,7 +190,8 @@ static void tcp_l2_buf_fill_headers(const struct ctx *c,
 	else
 		ip6h = iov[TCP_IOV_IP].iov_base;
 
-	l2len = tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &tail, check, seq,
+	l2len = tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &tail,
+				 iov_tail_size(&tail), check, seq,
 				 no_tcp_csum);
 	tap_hdr_update(taph, l2len);
 }
diff --git a/tcp_internal.h b/tcp_internal.h
index d9408852571f..a0fa19f4ed11 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -187,7 +187,7 @@ size_t tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
 			struct ethhdr *eh,
 			struct iphdr *ip4h, struct ipv6hdr *ip6h,
 			struct tcphdr *th, struct iov_tail *payload,
-			const uint16_t *ip4_check, uint32_t seq,
+			size_t dlen, const uint16_t *ip4_check, uint32_t seq,
 			bool no_tcp_csum);
 
 int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
diff --git a/tcp_vu.c b/tcp_vu.c
index d744ec705911..b879435e348a 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -138,7 +138,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		seq--;
 
 	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload,
-			 NULL, seq, !*c->pcap);
+			 optlen, NULL, seq, !*c->pcap);
 
 	vu_pad(&flags_elem[0].in_sg[0], l2len);
 
@@ -282,12 +282,13 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
  * @conn:		Connection pointer
  * @iov:		Pointer to the array of IO vectors
  * @iov_cnt:		Number of entries in @iov
+ * @dlen:		Data length
  * @check:		Checksum, if already known
  * @no_tcp_csum:	Do not set TCP checksum
  * @push:		Set PSH flag, last segment in a batch
  */
 static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn,
-			   struct iovec *iov, size_t iov_cnt,
+			   struct iovec *iov, size_t iov_cnt, size_t dlen,
 			   const uint16_t **check, bool no_tcp_csum, bool push)
 {
 	const struct flowside *toside = TAPFLOW(conn);
@@ -331,7 +332,7 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn,
 	th->ack = 1;
 	th->psh = push;
 
-	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload,
+	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload, dlen,
 			 *check, conn->seq_to_tap, no_tcp_csum);
 	if (ip4h)
 		*check = &ip4h->check;
@@ -459,7 +460,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 			check = NULL;
 		previous_dlen = dlen;
 
-		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
+		tcp_vu_prepare(c, conn, iov, buf_cnt, dlen, &check, !*c->pcap, push);
 
 		/* Pad first/single buffer only, it's at least ETH_ZLEN long */
 		l2len = dlen + hdrlen - VNET_HLEN;
-- 
2.54.0



* [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (8 preceding siblings ...)
  2026-05-13 11:52 ` [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers() Laurent Vivier
@ 2026-05-13 11:52 ` Laurent Vivier
  2026-05-14  1:24   ` David Gibson
  2026-05-20  0:52 ` [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Stefano Brivio
  2026-05-20 15:34 ` Stefano Brivio
  11 siblings, 1 reply; 26+ messages in thread
From: Laurent Vivier @ 2026-05-13 11:52 UTC (permalink / raw)
  To: passt-dev; +Cc: Laurent Vivier, Jon Maloy

The previous per-protocol padding done by vu_pad() in tcp_vu.c and
udp_vu.c was only correct for single-buffer frames: it assumed the
padding area always fell within the first iov, writing past its end
with a plain memset().

It also required each caller to compute MAX(..., ETH_ZLEN + VNET_HLEN)
for vu_collect() and to call vu_pad() at the right point, duplicating
the minimum-size logic across protocols.

Move the Ethernet minimum size enforcement into vu_collect() itself, so
that enough buffer space is always reserved for padding regardless of
the requested frame size.
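To illustrate the clamp, here is a sketch of the size computation alone. It uses a hypothetical helper name and assumes ETH_ZLEN = 60 and VNET_HLEN = 12. For example, a request of 66 bytes (a virtio-net header plus Ethernet, IPv4 and TCP headers without options) is raised to 72, so the later pad always has room.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ETH_ZLEN		60	/* assumed: Ethernet minimum, no FCS */
#define SKETCH_VNET_HLEN	12	/* assumed: virtio-net header length */
#define SKETCH_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Hypothetical: buffer space to collect for a request of @size bytes
 * (virtio-net header included).  The result is never less than a
 * minimum-size frame, so a later pad to ETH_ZLEN always has room. */
static size_t sketch_collect_size(size_t size)
{
	return SKETCH_MAX(size, SKETCH_ETH_ZLEN + SKETCH_VNET_HLEN);
}
```

Callers can then request exactly what they need, and the minimum-size logic lives in one place.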

Rewrite vu_pad() to take a full iovec array and use iov_memset(),
making it safe for multi-buffer (mergeable rx buffer) frames.
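The following is a minimal sketch of the padding approach described above, for illustration only. `sketch_iov_memset()` and `sketch_vu_pad()` are hypothetical stand-ins rather than passt's iov_memset() and vu_pad(). The sketch assumes ETH_ZLEN = 60 and VNET_HLEN = 12 (the size of struct virtio_net_hdr_mrg_rxbuf). The zero fill starts at the frame length and may continue into later iovec entries, instead of extending the first entry past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_ETH_ZLEN		60	/* assumed: Ethernet minimum, no FCS */
#define SKETCH_VNET_HLEN	12	/* assumed: virtio-net header length */

/* Hypothetical helper: set @length bytes to @c, starting @offset bytes into
 * the iovec array and crossing entry boundaries.  It writes less if the
 * array runs out of space, and no iov_len is modified. */
static void sketch_iov_memset(const struct iovec *iov, size_t cnt,
			      size_t offset, int c, size_t length)
{
	size_t i;

	/* Skip entries lying entirely before @offset */
	for (i = 0; i < cnt && offset >= iov[i].iov_len; i++)
		offset -= iov[i].iov_len;

	/* Fill from @offset in entry i, then from the start of later entries */
	for (; i < cnt && length > 0; i++) {
		size_t n = iov[i].iov_len - offset;

		if (n > length)
			n = length;
		memset((char *)iov[i].iov_base + offset, c, n);
		length -= n;
		offset = 0;
	}
}

/* Hypothetical equivalent of the new vu_pad(): @frame_len includes the
 * virtio-net header, and the zero padding may land in any entry. */
static void sketch_vu_pad(const struct iovec *iov, size_t cnt,
			  size_t frame_len)
{
	size_t min_frame_len = SKETCH_ETH_ZLEN + SKETCH_VNET_HLEN;

	if (frame_len < min_frame_len)
		sketch_iov_memset(iov, cnt, frame_len, 0,
				  min_frame_len - frame_len);
}
```

Consider a 12-byte header buffer followed by a data buffer. For a 54-byte frame (header included), the sketch writes 18 zero bytes at offsets 42 to 59 of the data buffer and leaves the header buffer untouched.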

In tcp_vu_sock_recv(), replace iov_truncate() with iov_skip_bytes():
now that all consumers receive explicit data lengths, truncating the
iovecs is no longer needed.  In tcp_vu_data_from_sock(), cap each
frame's data length against the remaining bytes actually received from
the socket, so that the last partial frame gets correct headers and
sequence number advancement.
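The per-frame capping can be modelled with a hypothetical sketch that abstracts away the iovec handling. It assumes each collected frame offers `frame_size[i] - hdrlen` bytes of payload space. Only the last frame is cut short, and the sequence number advances by exactly the number of bytes read from the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-frame loop.  @frame_size[i] is the buffer
 * space collected for frame i (headers included), and @len is the number
 * of bytes actually read from the socket.  Each frame carries at most what
 * is left, and @seq advances by the data length of every frame. */
static uint32_t sketch_fill_frames(uint32_t seq, const size_t *frame_size,
				   size_t nframes, size_t hdrlen, size_t len,
				   size_t *dlen_out)
{
	size_t i;

	for (i = 0; i < nframes; i++) {
		size_t dlen;

		assert(frame_size[i] >= hdrlen);
		dlen = frame_size[i] - hdrlen;
		if (dlen > len)		/* last, partial frame */
			dlen = len;
		len -= dlen;

		dlen_out[i] = dlen;
		seq += (uint32_t)dlen;	/* wraps modulo 2^32 */
	}

	return seq;
}
```

With three 1460-byte payload slots and 3000 bytes received, the frames carry 1460, 1460 and 80 bytes.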

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Jon Maloy <jmaloy@redhat.com>
---
 iov.c       |  1 -
 tcp_vu.c    | 34 ++++++++++++++++++----------------
 udp_vu.c    | 14 ++++++++------
 vu_common.c | 31 +++++++++++++++----------------
 vu_common.h |  2 +-
 5 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/iov.c b/iov.c
index d2b06093d11f..7c18941c0395 100644
--- a/iov.c
+++ b/iov.c
@@ -180,7 +180,6 @@ size_t iov_truncate(struct iovec *iov, size_t iov_cnt, size_t size)
  * 		Will write less than @length bytes if it runs out of space in
  * 		the iov
  */
-/* cppcheck-suppress unusedFunction */
 void iov_memset(const struct iovec *iov, size_t iov_cnt, size_t offset, int c,
 		size_t length)
 {
diff --git a/tcp_vu.c b/tcp_vu.c
index b879435e348a..f6ac76e52438 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -90,7 +90,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 
 	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
 			      &flags_iov[0], 1, NULL,
-			      MAX(hdrlen + sizeof(*opts), ETH_ZLEN + VNET_HLEN), NULL);
+			      hdrlen + sizeof(*opts), NULL);
 	if (elem_cnt != 1)
 		return -EAGAIN;
 
@@ -130,8 +130,6 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		return ret;
 	}
 
-	l2len = hdrlen + optlen - VNET_HLEN;
-	iov_truncate(&flags_iov[0], 1, l2len + VNET_HLEN);
 	payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen);
 
 	if (flags & KEEPALIVE)
@@ -140,17 +138,17 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	tcp_fill_headers(c, conn, eh, ip4h, ip6h, th, &payload,
 			 optlen, NULL, seq, !*c->pcap);
 
-	vu_pad(&flags_elem[0].in_sg[0], l2len);
-
+	vu_pad(flags_elem[0].in_sg, 1, hdrlen + optlen);
 	vu_flush(vdev, vq, flags_elem, 1, hdrlen + optlen);
 
+	l2len = hdrlen + optlen - VNET_HLEN;
 	if (*c->pcap)
 		pcap_iov(&flags_elem[0].in_sg[0], 1, VNET_HLEN, l2len);
 
 	if (flags & DUP_ACK) {
 		elem_cnt = vu_collect(vdev, vq, &flags_elem[1], 1,
 				      &flags_iov[1], 1, NULL,
-				      flags_elem[0].in_sg[0].iov_len, NULL);
+				      hdrlen + optlen, NULL);
 		if (elem_cnt == 1 &&
 		    flags_elem[1].in_sg[0].iov_len >=
 		    flags_elem[0].in_sg[0].iov_len) {
@@ -215,7 +213,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
 				 ARRAY_SIZE(elem) - elem_cnt,
 				 &iov_vu[DISCARD_IOV_NUM + iov_used],
 				 VIRTQUEUE_MAX_SIZE - iov_used, &in_total,
-				 MAX(MIN(mss, fillsize) + hdrlen, ETH_ZLEN + VNET_HLEN),
+				 MIN(mss, fillsize) + hdrlen,
 				 &frame_size);
 		if (cnt == 0)
 			break;
@@ -251,8 +249,11 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
 	if (!peek_offset_cap)
 		ret -= already_sent;
 
-	/* adjust iov number and length of the last iov */
-	i = iov_truncate(&iov_vu[DISCARD_IOV_NUM], iov_used, ret);
+	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used,
+			   MAX(hdrlen + ret, VNET_HLEN + ETH_ZLEN),
+			   NULL);
+	if ((size_t)i < iov_used)
+		i++;
 
 	/* adjust head count */
 	while (*head_cnt > 0 && head[*head_cnt - 1] >= i)
@@ -449,11 +450,13 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		size_t frame_size = iov_size(iov, buf_cnt);
 		bool push = i == head_cnt - 1;
 		ssize_t dlen;
-		size_t l2len;
 
 		assert(frame_size >= hdrlen);
 
 		dlen = frame_size - hdrlen;
+		if (dlen > len)
+			dlen = len;
+		len -= dlen;
 
 		/* The IPv4 header checksum varies only with dlen */
 		if (previous_dlen != dlen)
@@ -462,14 +465,13 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 
 		tcp_vu_prepare(c, conn, iov, buf_cnt, dlen, &check, !*c->pcap, push);
 
-		/* Pad first/single buffer only, it's at least ETH_ZLEN long */
-		l2len = dlen + hdrlen - VNET_HLEN;
-		vu_pad(iov, l2len);
-
+		vu_pad(elem[head[i]].in_sg, buf_cnt, dlen + hdrlen);
 		vu_flush(vdev, vq, &elem[head[i]], buf_cnt, dlen + hdrlen);
 
-		if (*c->pcap)
-			pcap_iov(iov, buf_cnt, VNET_HLEN, l2len);
+		if (*c->pcap) {
+			pcap_iov(iov, buf_cnt, VNET_HLEN,
+				 dlen + hdrlen - VNET_HLEN);
+		}
 
 		conn->seq_to_tap += dlen;
 	}
diff --git a/udp_vu.c b/udp_vu.c
index 3c9fff53324c..dfff7bb1cd53 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -67,7 +67,7 @@ static size_t udp_vu_hdrlen(bool v6)
 static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
 {
 	struct msghdr msg  = { 0 };
-	size_t hdrlen, l2len;
+	size_t hdrlen, iov_used;
 	ssize_t dlen;
 
 	/* compute L2 header length */
@@ -90,11 +90,12 @@ static ssize_t udp_vu_sock_recv(struct iovec *iov, size_t *cnt, int s, bool v6)
 	iov[0].iov_base = (char *)iov[0].iov_base - hdrlen;
 	iov[0].iov_len += hdrlen;
 
-	*cnt = iov_truncate(iov, *cnt, dlen + hdrlen);
-
-	/* pad frame to 60 bytes: first buffer is at least ETH_ZLEN long */
-	l2len = dlen + hdrlen - VNET_HLEN;
-	vu_pad(&iov[0], l2len);
+	iov_used = iov_skip_bytes(iov, *cnt,
+				  MAX(dlen + hdrlen, VNET_HLEN + ETH_ZLEN),
+				  NULL);
+	if (iov_used < *cnt)
+		iov_used++;
+	*cnt = iov_used; /* one iovec per element */
 
 	return dlen;
 }
@@ -231,6 +232,7 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 				pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
 					 hdrlen + dlen - VNET_HLEN);
 			}
+			vu_pad(iov_vu, iov_cnt, hdrlen + dlen);
 			vu_flush(vdev, vq, elem, elem_used, hdrlen + dlen);
 			vu_queue_notify(vdev, vq);
 		}
diff --git a/vu_common.c b/vu_common.c
index 704e908aa02c..d07f584f228a 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -74,6 +74,7 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
 	size_t current_iov = 0;
 	int elem_cnt = 0;
 
+	size = MAX(size, ETH_ZLEN /* Ethernet minimum size */ + VNET_HLEN);
 	while (current_size < size && elem_cnt < max_elem &&
 	       current_iov < max_in_sg) {
 		int ret;
@@ -261,29 +262,27 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
 		return -1;
 	}
 
-	size += VNET_HLEN;
 	elem_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem), in_sg,
-			      ARRAY_SIZE(in_sg), &in_total, size, &total);
-	if (elem_cnt == 0 || total < size) {
+			      ARRAY_SIZE(in_sg), &in_total, VNET_HLEN + size, &total);
+	if (elem_cnt == 0 || total < VNET_HLEN + size) {
 		debug("vu_send_single: no space to send the data "
 		      "elem_cnt %d size %zu", elem_cnt, total);
 		goto err;
 	}
 
-	total -= VNET_HLEN;
-
 	/* copy data from the buffer to the iovec */
-	iov_from_buf(in_sg, in_total, VNET_HLEN, buf, total);
+	iov_from_buf(in_sg, in_total, VNET_HLEN, buf, size);
 
 	if (*c->pcap)
 		pcap_iov(in_sg, in_total, VNET_HLEN, size);
 
+	vu_pad(in_sg, in_total, VNET_HLEN + size);
 	vu_flush(vdev, vq, elem, elem_cnt, VNET_HLEN + size);
 	vu_queue_notify(vdev, vq);
 
-	trace("vhost-user sent %zu", total);
+	trace("vhost-user sent %zu", size);
 
-	return total;
+	return size;
 err:
 	for (i = 0; i < elem_cnt; i++)
 		vu_queue_detach_element(vq);
@@ -292,15 +291,15 @@ err:
 }
 
 /**
- * vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
- * @iov:	Buffer in iovec array where end of 802.3 frame is stored
- * @l2len:	Layer-2 length already filled in frame
+ * vu_pad() - Pad short frames to minimum Ethernet length and truncate iovec
+ * @iov:	Pointer to iovec array
+ * @cnt:	Number of entries in @iov
+ * @frame_len:	Data length in @iov (including virtio-net header)
  */
-void vu_pad(struct iovec *iov, size_t l2len)
+void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len)
 {
-	if (l2len >= ETH_ZLEN)
-		return;
+	size_t min_frame_len = ETH_ZLEN + VNET_HLEN;
 
-	memset((char *)iov->iov_base + iov->iov_len, 0, ETH_ZLEN - l2len);
-	iov->iov_len += ETH_ZLEN - l2len;
+	if (frame_len < min_frame_len)
+		iov_memset(iov, cnt, frame_len, 0, min_frame_len - frame_len);
 }
diff --git a/vu_common.h b/vu_common.h
index 77d1849e6115..51f70084a7cb 100644
--- a/vu_common.h
+++ b/vu_common.h
@@ -44,6 +44,6 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
 void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
 		const struct timespec *now);
 int vu_send_single(const struct ctx *c, const void *buf, size_t size);
-void vu_pad(struct iovec *iov, size_t l2len);
+void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len);
 
 #endif /* VU_COMMON_H */
-- 
2.54.0



* Re: [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
  2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
@ 2026-05-14  1:24   ` David Gibson
  0 siblings, 0 replies; 26+ messages in thread
From: David Gibson @ 2026-05-14  1:24 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy

[-- Attachment #1: Type: text/plain, Size: 5580 bytes --]

On Wed, May 13, 2026 at 01:52:18PM +0200, Laurent Vivier wrote:
> The previous per-protocol padding done by vu_pad() in tcp_vu.c and
> udp_vu.c was only correct for single-buffer frames: it assumed the
> padding area always fell within the first iov, writing past its end
> with a plain memset().
> 
> It also required each caller to compute MAX(..., ETH_ZLEN + VNET_HLEN)
> for vu_collect() and to call vu_pad() at the right point, duplicating
> the minimum-size logic across protocols.
> 
> Move the Ethernet minimum size enforcement into vu_collect() itself, so
> that enough buffer space is always reserved for padding regardless of
> the requested frame size.
> 
> Rewrite vu_pad() to take a full iovec array and use iov_memset(),
> making it safe for multi-buffer (mergeable rx buffer) frames.
> 
> In tcp_vu_sock_recv(), replace iov_truncate() with iov_skip_bytes():
> now that all consumers receive explicit data lengths, truncating the
> iovecs is no longer needed.  In tcp_vu_data_from_sock(), cap each
> frame's data length against the remaining bytes actually received from
> the socket, so that the last partial frame gets correct headers and
> sequence number advancement.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: Jon Maloy <jmaloy@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

But following on from my comments on v3, a couple of clarity nits for
possible follow-up:

[snip]
> diff --git a/vu_common.c b/vu_common.c
> index 704e908aa02c..d07f584f228a 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -74,6 +74,7 @@ int vu_collect(const struct vu_dev *vdev, struct vu_virtq *vq,
>  	size_t current_iov = 0;
>  	int elem_cnt = 0;
>  
> +	size = MAX(size, ETH_ZLEN /* Ethernet minimum size */ + VNET_HLEN);

Here I think "size" is a reasonable name, since it's the size of the
buffer we're obtaining, i.e. a bound, but not otherwise related to the
length of the frame.

>  	while (current_size < size && elem_cnt < max_elem &&
>  	       current_iov < max_in_sg) {
>  		int ret;
> @@ -261,29 +262,27 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
>  		return -1;
>  	}
>  
> -	size += VNET_HLEN;
>  	elem_cnt = vu_collect(vdev, vq, elem, ARRAY_SIZE(elem), in_sg,
> -			      ARRAY_SIZE(in_sg), &in_total, size, &total);
> -	if (elem_cnt == 0 || total < size) {
> +			      ARRAY_SIZE(in_sg), &in_total, VNET_HLEN + size, &total);
> +	if (elem_cnt == 0 || total < VNET_HLEN + size) {

Here, "l2len" would be a much better name than "size".

>  		debug("vu_send_single: no space to send the data "
>  		      "elem_cnt %d size %zu", elem_cnt, total);
>  		goto err;
>  	}
>  
> -	total -= VNET_HLEN;
> -
>  	/* copy data from the buffer to the iovec */
> -	iov_from_buf(in_sg, in_total, VNET_HLEN, buf, total);
> +	iov_from_buf(in_sg, in_total, VNET_HLEN, buf, size);
>  
>  	if (*c->pcap)
>  		pcap_iov(in_sg, in_total, VNET_HLEN, size);
>  
> +	vu_pad(in_sg, in_total, VNET_HLEN + size);
>  	vu_flush(vdev, vq, elem, elem_cnt, VNET_HLEN + size);
>  	vu_queue_notify(vdev, vq);
>  
> -	trace("vhost-user sent %zu", total);
> +	trace("vhost-user sent %zu", size);
>  
> -	return total;
> +	return size;
>  err:
>  	for (i = 0; i < elem_cnt; i++)
>  		vu_queue_detach_element(vq);
> @@ -292,15 +291,15 @@ err:
>  }
>  
>  /**
> - * vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
> - * @iov:	Buffer in iovec array where end of 802.3 frame is stored
> - * @l2len:	Layer-2 length already filled in frame
> + * vu_pad() - Pad short frames to minimum Ethernet length and truncate iovec
> + * @iov:	Pointer to iovec array
> + * @cnt:	Number of entries in @iov
> + * @frame_len:	Data length in @iov (including virtio-net header)
>   */
> -void vu_pad(struct iovec *iov, size_t l2len)
> +void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len)

Here we have the actual frame length, including device header, but not
padding.  "frame_len" is different from the other standard names we
use, so it's not terrible, but "frame" often refers to the L2 object
so it's not great either.

Not sure if 'l1len' or 'l0len' would be getting too cutesy with what
"physical" layer means in a virtual network.  Something like
"device_len" maybe?  But that should probably include padding as well.

Or alternatively, vu_pad() could be updated to take l2len, and add
VNET_HLEN inside.
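To show that the alternative suggested above would compute the same padding, here is a sketch with two hypothetical helpers that return the padding offset and length. The first works from the frame length including the virtio-net header, as the applied vu_pad() takes it. The second works from the L2 length and adds VNET_HLEN internally. ETH_ZLEN = 60 and VNET_HLEN = 12 are assumed values.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ETH_ZLEN		60	/* assumed: Ethernet minimum, no FCS */
#define SKETCH_VNET_HLEN	12	/* assumed: virtio-net header length */

/* Padding range from the frame length including the virtio-net header.
 * Returns the number of bytes to zero and stores in *offset where they
 * start in the iovec array. */
static size_t sketch_pad_from_frame_len(size_t frame_len, size_t *offset)
{
	size_t min_frame_len = SKETCH_ETH_ZLEN + SKETCH_VNET_HLEN;

	*offset = frame_len;
	return frame_len < min_frame_len ? min_frame_len - frame_len : 0;
}

/* Same range from the L2 length, adding VNET_HLEN internally */
static size_t sketch_pad_from_l2len(size_t l2len, size_t *offset)
{
	*offset = SKETCH_VNET_HLEN + l2len;
	return l2len < SKETCH_ETH_ZLEN ? SKETCH_ETH_ZLEN - l2len : 0;
}
```

Both forms yield the same range for every length, so the choice between them is purely about which name reads better at the call sites.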

>  {
> -	if (l2len >= ETH_ZLEN)
> -		return;
> +	size_t min_frame_len = ETH_ZLEN + VNET_HLEN;
>  
> -	memset((char *)iov->iov_base + iov->iov_len, 0, ETH_ZLEN - l2len);
> -	iov->iov_len += ETH_ZLEN - l2len;
> +	if (frame_len < min_frame_len)
> +		iov_memset(iov, cnt, frame_len, 0, min_frame_len - frame_len);
>  }
> diff --git a/vu_common.h b/vu_common.h
> index 77d1849e6115..51f70084a7cb 100644
> --- a/vu_common.h
> +++ b/vu_common.h
> @@ -44,6 +44,6 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq,
>  void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref,
>  		const struct timespec *now);
>  int vu_send_single(const struct ctx *c, const void *buf, size_t size);
> -void vu_pad(struct iovec *iov, size_t l2len);
> +void vu_pad(const struct iovec *iov, size_t cnt, size_t frame_len);
>  
>  #endif /* VU_COMMON_H */
> -- 
> 2.54.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (9 preceding siblings ...)
  2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
@ 2026-05-20  0:52 ` Stefano Brivio
  2026-05-20 15:34 ` Stefano Brivio
  11 siblings, 0 replies; 26+ messages in thread
From: Stefano Brivio @ 2026-05-20  0:52 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David Gibson

On Wed, 13 May 2026 13:52:08 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> Currently, the vhost-user path assumes each virtqueue element contains
> exactly one iovec entry covering the entire frame.  This assumption
> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> vnet header and the frame payload are in separate buffers, resulting in
> two iovec entries per virtqueue element.
> 
> This series refactors the vhost-user data path so that frame lengths,
> header sizes, and padding are tracked and passed explicitly rather than
> being derived from iovec sizes.  This decoupling is a prerequisite for
> correctly handling padding of multi-buffer frames.
> 
> The changes in this series can be split into 3 groups:
> 
> - New iov helpers (patches 1-2):
> 
>    iov_memset() and iov_memcpy() operate across iovec boundaries.
>    These are needed by the final patch to pad and copy frame data
>    when a frame spans multiple iovec entries.
> 
> - Structural refactoring (patches 3-5):
> 
>    Move vnethdr setup into vu_flush(), separate virtqueue management
>    from socket I/O in the UDP path, and pass iov arrays explicitly
>    instead of using file-scoped state.  These changes make it possible
>    to pass explicit frame lengths through the stack, which is required
>    to pad frames independently of iovec layout.
> 
> - Explicit length passing throughout the stack (patches 6-10):
> 
>    Thread explicit L4, L2, frame, and data lengths through checksum,
>    pcap, vu_flush(), and tcp_fill_headers(), replacing lengths that
>    were previously derived from iovec sizes.  With lengths tracked
>    explicitly, the final patch can centralise Ethernet frame padding
>    into vu_collect() and a new vu_pad() helper that correctly pads
>    frames spanning multiple iovec entries.
> 
> v4:
> - rebase
> - iov_memcpy: use size_t for loop indices i and j
> - udp_vu: reorder elem[] declaration for inverted christmas tree style
> - pcap: wrap pcap_iov() declaration and definition to respect line length
> - write_remainder(): update length parameter description
> - Add Reviewed-by tags from Jon and David

Applied, sorry for the delay.

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
                   ` (10 preceding siblings ...)
  2026-05-20  0:52 ` [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Stefano Brivio
@ 2026-05-20 15:34 ` Stefano Brivio
  2026-05-20 16:07   ` Stefano Brivio
  11 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-20 15:34 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Wed, 13 May 2026 13:52:08 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> Currently, the vhost-user path assumes each virtqueue element contains
> exactly one iovec entry covering the entire frame.  This assumption
> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> vnet header and the frame payload are in separate buffers, resulting in
> two iovec entries per virtqueue element.
> 
> This series refactors the vhost-user data path so that frame lengths,
> header sizes, and padding are tracked and passed explicitly rather than
> being derived from iovec sizes.  This decoupling is a prerequisite for
> correctly handling padding of multi-buffer frames.

Sorry to bring (likely) bad news, but this series seems to introduce a
regression: the migration/rampstream_in tests failed twice in a row for
me, which I've never seen happen before (I think I saw a single failure
a long time ago when the machine had a high CPU load, but nothing else).

I'm currently bisecting and the bisect seems to point towards the end
of the series (probably 10/10), but I haven't finished yet. I'll keep
you posted. I haven't spotted anything that might cause issues there.

It's probably worth mentioning that after migration we send pretty
small TCP frames (window probes), but I have no idea yet whether that
has anything to do with it.
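For reference, here is the length arithmetic for small TCP frames, as a sketch with hypothetical helper names that assumes no IP or TCP options. An IPv4 segment with no payload is 14 + 20 + 20 = 54 bytes at L2. That is below the 60-byte Ethernet minimum, so it goes through the padding path. The IPv6 equivalent is 74 bytes and does not.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ETH_HLEN		14	/* Ethernet header */
#define SKETCH_ETH_ZLEN		60	/* Ethernet minimum, without FCS */
#define SKETCH_IPV4_HLEN	20	/* IPv4 header, no options */
#define SKETCH_IPV6_HLEN	40	/* fixed IPv6 header */
#define SKETCH_TCP_HLEN		20	/* TCP header, no options */

/* L2 length of a TCP segment with @dlen payload bytes, assuming no options */
static size_t sketch_tcp_l2len(int v6, size_t dlen)
{
	return SKETCH_ETH_HLEN + (v6 ? SKETCH_IPV6_HLEN : SKETCH_IPV4_HLEN) +
	       SKETCH_TCP_HLEN + dlen;
}

/* Bytes of padding needed to reach the Ethernet minimum frame size */
static size_t sketch_pad_bytes(size_t l2len)
{
	return l2len < SKETCH_ETH_ZLEN ? SKETCH_ETH_ZLEN - l2len : 0;
}
```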

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-20 15:34 ` Stefano Brivio
@ 2026-05-20 16:07   ` Stefano Brivio
  2026-05-20 16:18     ` Stefano Brivio
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-20 16:07 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Wed, 20 May 2026 17:34:45 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 13 May 2026 13:52:08 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
> > Currently, the vhost-user path assumes each virtqueue element contains
> > exactly one iovec entry covering the entire frame.  This assumption
> > breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > vnet header and the frame payload are in separate buffers, resulting in
> > two iovec entries per virtqueue element.
> > 
> > This series refactors the vhost-user data path so that frame lengths,
> > header sizes, and padding are tracked and passed explicitly rather than
> > being derived from iovec sizes.  This decoupling is a prerequisite for
> > correctly handling padding of multi-buffer frames.  
> 
> Sorry to bring (likely) bad news, but this series seems to introduce a
> regression: the migration/rampstream_in tests failed twice in a row for
> me, which I've never seen happen before (I think I saw a single failure
> a long time ago when the machine had a high CPU load, but nothing else).
> 
> I'm currently bisecting and the bisect seems to point towards the end
> of the series (probably 10/10), but I haven't finished yet. I'll keep
> you posted. I haven't spotted anything that might cause issues there.

Yeah, that's the one :(

$ git bisect bad
db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
commit db798fc60f4c5869cb53168354e068fb4dabd91a
Author: Laurent Vivier <lvivier@redhat.com>
Date:   Wed May 13 13:52:18 2026 +0200

    vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()

The "TCP/IPv4: sequence check, ramps, inbound" test in rampstream_in
gets stuck once the source is done with the migration, and passt on the
destination only printed:

Accepted TCP_REPAIR helper, PID 13
accepted connection from PID 16

I'll get captures and logs next. It seems to fail most of the time: I
had two failures in a row again.

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-20 16:07   ` Stefano Brivio
@ 2026-05-20 16:18     ` Stefano Brivio
  2026-05-20 20:53       ` Stefano Brivio
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-20 16:18 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

[-- Attachment #1: Type: text/plain, Size: 3013 bytes --]

On Wed, 20 May 2026 18:07:08 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 20 May 2026 17:34:45 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Wed, 13 May 2026 13:52:08 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> > > Currently, the vhost-user path assumes each virtqueue element contains
> > > exactly one iovec entry covering the entire frame.  This assumption
> > > breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > vnet header and the frame payload are in separate buffers, resulting in
> > > two iovec entries per virtqueue element.
> > > 
> > > This series refactors the vhost-user data path so that frame lengths,
> > > header sizes, and padding are tracked and passed explicitly rather than
> > > being derived from iovec sizes.  This decoupling is a prerequisite for
> > > correctly handling padding of multi-buffer frames.    
> > 
> > Sorry to bring (likely) bad news, but this series seems to introduce a
> > regression: the migration/rampstream_in tests failed twice in a row for
> > me, which I've never seen happen before (I think I saw a single failure
> > a long time ago when the machine had a high CPU load, but nothing else).
> > 
> > I'm currently bisecting and the bisect seems to point towards the end
> > of the series (probably 10/10), but I haven't finished yet. I'll keep
> > you posted. I haven't spotted anything that might cause issues there.  
> 
> Yeah, that's the one :(
> 
> $ git bisect bad
> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> Author: Laurent Vivier <lvivier@redhat.com>
> Date:   Wed May 13 13:52:18 2026 +0200
> 
>     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> 
> The "TCP/IPv4: sequence check, ramps, inbound" test in rampstream_in
> gets stuck once the source is done with the migration, and passt on the
> destination only printed:
> 
> Accepted TCP_REPAIR helper, PID 13
> accepted connection from PID 16
> 
> I'll get captures and logs next. It seems to fail most of the time: I
> had two failures in a row again.

Log from passt --debug attached. Likely highlight:

---
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_VRING_ADDR (9)
13.2853: Flags:   0x1
13.2853: Size:    40
13.2853: vhost_vring_addr:
13.2853:     index:  0
13.2853:     flags:  0
13.2853:     desc_user_addr:   0x00007f0943f41000
13.2853:     used_user_addr:   0x00007f0943f42240
13.2854:     avail_user_addr:  0x00007f0943f42000
13.2854:     log_guest_addr:   0x000000001ff43240
13.2854: Setting virtq addresses:
13.2854:     vring_desc  at 0x7f2e2e2ca000
13.2854:     vring_used  at 0x7f2e2e2cb240
13.2854:     vring_avail at 0x7f2e2e2cb000
13.2854: Last avail index != used index: 2163 != 1936
13.2854: Got packet, but RX virtqueue not usable yet
---

The pcap file of that passt instance is empty: it didn't have a chance
to send or receive packets yet.
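This part assumes the two numbers in the "Last avail index != used index" line are the usual split-virtqueue counters: the next avail-ring index the device will consume, and the used-ring index it has published. Both are free-running 16-bit values. Their difference modulo 2^16 therefore counts descriptor chains taken from the avail ring but not yet returned as used. Here is a sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split-virtqueue indices are free-running 16-bit
 * counters.  The number of descriptor chains taken from the avail ring but
 * not yet returned through the used ring is their difference modulo 2^16,
 * which also holds across wraparound. */
static uint16_t sketch_vq_inflight(uint16_t last_avail_idx, uint16_t used_idx)
{
	return (uint16_t)(last_avail_idx - used_idx);
}
```

For the values in the log above, that gives 2163 - 1936 = 227.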

-- 
Stefano


[-- Attachment #2: context_passt_2.log --]
[-- Type: text/x-log, Size: 39547 bytes --]

passt_2$ ./passt -s /tmp/passt-tests-57k4fx/migrate/passt_2.socket -P /tmp/passt-tests-57k4fx/migrate/passt_2.pid -f --vhost-user --migrate-exit --migrate-no-linger -p /home/sbrivio/passt/test/test_logs/passt_2.pcap -d -t 10004 -u 10004
0.0006: UNIX domain socket bound at /tmp/passt-tests-57k4fx/migrate/passt_2.socket
0.0006: UNIX domain socket bound at /tmp/passt-tests-57k4fx/migrate/passt_2.socket.repair
0.0006: No IPv6 nameserver available for NDP/DHCPv6
0.0007: Template interface: enp9s0 (IPv4), enp9s0 (IPv6)
0.0007: MAC:
0.0007:     host: 9a:55:9a:55:9a:55
0.0007:     NAT to host 127.0.0.1: 88.198.0.161
0.0007: DHCP:
0.0007:     assign: 88.198.0.164
0.0007:     mask: 255.255.255.224
0.0007:     router: 88.198.0.161
0.0007: DNS:
0.0007:     185.12.64.1
0.0007:     185.12.64.2
0.0007:     NAT to host ::1: fe80::1
0.0007: NDP/DHCPv6:
0.0007:     assign: 2a01:4f8:222:904::2
0.0007:     router: fe80::1
0.0007:     our link-local: fe80::1
0.0007: Inbound forwarding rules (HOST):
0.0007:     TCP [*]:10004  =>  10004 
0.0007:     UDP [*]:10004  =>  10004 
0.0017: You can start qemu with:
0.0017:     kvm ... -chardev socket,id=chr0,path=/tmp/passt-tests-57k4fx/migrate/passt_2.socket -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0

0.0018: Using UDP timeout parameters, timeout: 30, stream_timeout: 120
0.0020: Couldn't read /proc/sys/net/ipv4/tcp_syn_linear_timeouts, using 4 as default value
0.0020: Couldn't read /proc/sys/net/ipv4/tcp_rto_max_ms, using 120000 as default value
0.0020: Using TCP RTO parameters, syn_retries: 6, syn_linear_timeouts: 4, rto_max: 120
0.0023: SO_PEEK_OFF not supported
0.0025: TCP_INFO tcpi_snd_wnd field supported
0.0025: TCP_INFO tcpi_bytes_acked field supported
0.0025: TCP_INFO tcpi_min_rtt field supported
0.0026: Saving packet capture to /home/sbrivio/passt/test/test_logs/passt_2.pcap
0.1005: Accepted TCP_REPAIR helper, PID 14
0.1247: accepted connection from PID 12
0.1247: Sending initial ARP request for guest MAC address
0.1247: Got packet, but RX virtqueue not usable yet
0.1247: Sending initial NDP NS request for guest MAC address
0.1247: Got packet, but RX virtqueue not usable yet
0.1921: ================ Vhost user message ================
0.1921: Request: VHOST_USER_GET_FEATURES (1)
0.1921: Flags:   0x1
0.1921: Size:    0
0.1921: Sending back to guest u64: 0x0000000144008000
0.1922: ================ Vhost user message ================
0.1922: Request: VHOST_USER_GET_PROTOCOL_FEATURES (15)
0.1922: Flags:   0x1
0.1922: Size:    0
0.1922: ================ Vhost user message ================
0.1922: Request: VHOST_USER_SET_PROTOCOL_FEATURES (16)
0.1922: Flags:   0x1
0.1922: Size:    8
0.1922: u64: 0x000000000008000e
0.1922: ================ Vhost user message ================
0.1922: Request: VHOST_USER_SET_OWNER (3)
0.1922: Flags:   0x1
0.1922: Size:    0
0.1923: ================ Vhost user message ================
0.1923: Request: VHOST_USER_GET_FEATURES (1)
0.1923: Flags:   0x1
0.1923: Size:    0
0.1923: Sending back to guest u64: 0x0000000144008000
0.1923: ================ Vhost user message ================
0.1923: Request: VHOST_USER_SET_VRING_CALL (13)
0.1923: Flags:   0x1
0.1923: Size:    8
0.1923: u64: 0x0000000000000000
0.1923: Got call_fd: 78 for vq: 0
0.1923: ================ Vhost user message ================
0.1923: Request: VHOST_USER_SET_VRING_ERR (14)
0.1923: Flags:   0x1
0.1923: Size:    8
0.1923: u64: 0x0000000000000000
0.1924: ================ Vhost user message ================
0.1924: Request: VHOST_USER_SET_VRING_CALL (13)
0.1924: Flags:   0x1
0.1924: Size:    8
0.1924: u64: 0x0000000000000001
0.1924: Got call_fd: 80 for vq: 1
0.1924: ================ Vhost user message ================
0.1924: Request: VHOST_USER_SET_VRING_ERR (14)
0.1924: Flags:   0x1
0.1924: Size:    8
0.1924: u64: 0x0000000000000001
1.1934: TCP inactivity scan
13.2805: ================ Vhost user message ================
13.2806: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2806: Flags:   0x9
13.2806: Size:    8
13.2806: State.index:  0
13.2806: State.enable: 1
13.2806: ================ Vhost user message ================
13.2806: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2806: Flags:   0x9
13.2806: Size:    8
13.2806: State.index:  1
13.2806: State.enable: 1
13.2806: ================ Vhost user message ================
13.2806: Request: VHOST_USER_SET_DEVICE_STATE_FD (42)
13.2806: Flags:   0x1
13.2806: Size:    8
13.2806: Migration requested, fd: 82 (was -1)
13.2806: Handling migration request from fd: 82, target: 1
13.2807: Source magic: 0xb1bb1d1b0bb1d1b0, version: 2, compat: 2
13.2807: Target side migration stage: observed addresses
13.2807: Target side migration stage: transfer flows
13.2807: Receiving 1 flows
13.2807: Flow 0 (NEW): FREE -> NEW
13.2807: Flow 0 (TCP connection): TGT -> TYPED
13.2807: Flow 0 (TCP connection): HOST [192.0.2.1]:53948 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:53948 -> [88.198.0.164]:10001
13.2807: Flow 0 (TCP connection): Side 1 hash table insert: bucket: 189957
13.2807: Flow 0 (TCP connection): TYPED -> ACTIVE
13.2807: Flow 0 (TCP connection): HOST [192.0.2.1]:53948 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:53948 -> [88.198.0.164]:10001
13.2807: Flow 0 (TCP connection): Extended migration data, socket 83 sequences send 1662769585 receive 136300818
13.2807: Flow 0 (TCP connection):   pending queues: send 0 not sent 0 receive 3306378
13.2807: Flow 0 (TCP connection):   window: snd_wl1 139607196 snd_wnd 65536 max 65536 rcv_wnd 0 rcv_wup 139607196
13.2808: Flow 0 (TCP connection):   SO_PEEK_OFF disabled  offset=0
13.2829: Got packet, but RX virtqueue not usable yet
13.2829: Closing migration channel, fd: 82
13.2830: Closing TCP_REPAIR helper socket
13.2830: ================ Vhost user message ================
13.2830: Request: VHOST_USER_CHECK_DEVICE_STATE (43)
13.2830: Flags:   0x1
13.2830: Size:    0
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: ================ Vhost user message ================
13.2830: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2830: Flags:   0x9
13.2830: Size:    8
13.2830: State.index:  0
13.2830: State.enable: 1
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: Got packet, but RX virtqueue not usable yet
13.2830: ================ Vhost user message ================
13.2830: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2830: Flags:   0x9
13.2830: Size:    8
13.2831: State.index:  1
13.2831: State.enable: 1
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2831: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2832: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2833: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2834: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2835: Got packet, but RX virtqueue not usable yet
13.2836: ================ Vhost user message ================
13.2836: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2836: Flags:   0x9
13.2836: Size:    8
13.2836: State.index:  0
13.2836: State.enable: 1
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: ================ Vhost user message ================
13.2836: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2836: Flags:   0x9
13.2836: Size:    8
13.2836: State.index:  1
13.2836: State.enable: 1
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2836: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2837: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2838: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2839: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2840: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2841: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2842: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2843: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2844: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2845: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2846: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2847: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2848: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: ================ Vhost user message ================
13.2849: Request: VHOST_USER_SEND_RARP (19)
13.2849: Flags:   0x1
13.2849: Size:    8
13.2849: Ignore command VHOST_USER_SEND_RARP for 52:54:00:12:34:56
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2849: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2850: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2851: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: ================ Vhost user message ================
13.2852: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2852: Flags:   0x9
13.2852: Size:    8
13.2852: State.index:  0
13.2852: State.enable: 1
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: ================ Vhost user message ================
13.2852: Request: VHOST_USER_SET_VRING_ENABLE (18)
13.2852: Flags:   0x9
13.2852: Size:    8
13.2852: State.index:  1
13.2852: State.enable: 1
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: Got packet, but RX virtqueue not usable yet
13.2852: ================ Vhost user message ================
13.2852: Request: VHOST_USER_SET_FEATURES (2)
13.2852: Flags:   0x1
13.2852: Size:    8
13.2853: u64: 0x0000000140008000
13.2853: Got packet, but RX virtqueue not usable yet
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_MEM_TABLE (5)
13.2853: Flags:   0x9
13.2853: Size:    40
13.2853: vhost-user nregions: 1
13.2853: vhost-user region 0
13.2853:     guest_phys_addr: 0x0000000000000000
13.2853:     memory_size:     0x0000000020000000
13.2853:     userspace_addr   0x00007f0923fff000
13.2853:     mmap_offset      0x0000000000000000
13.2853:     mmap_addr:       0x00007f2e0e388000
13.2853: Got packet, but RX virtqueue not usable yet
13.2853: Got packet, but RX virtqueue not usable yet
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_VRING_NUM (8)
13.2853: Flags:   0x1
13.2853: Size:    8
13.2853: Got packet, but RX virtqueue not usable yet
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_VRING_BASE (10)
13.2853: Flags:   0x1
13.2853: Size:    8
13.2853: State.index: 0
13.2853: State.num:   2163
13.2853: Got packet, but RX virtqueue not usable yet
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_VRING_ADDR (9)
13.2853: Flags:   0x1
13.2853: Size:    40
13.2853: vhost_vring_addr:
13.2853:     index:  0
13.2853:     flags:  0
13.2853:     desc_user_addr:   0x00007f0943f41000
13.2853:     used_user_addr:   0x00007f0943f42240
13.2854:     avail_user_addr:  0x00007f0943f42000
13.2854:     log_guest_addr:   0x000000001ff43240
13.2854: Setting virtq addresses:
13.2854:     vring_desc  at 0x7f2e2e2ca000
13.2854:     vring_used  at 0x7f2e2e2cb240
13.2854:     vring_avail at 0x7f2e2e2cb000
13.2854: Last avail index != used index: 2163 != 1936
13.2854: Got packet, but RX virtqueue not usable yet
13.2854: ================ Vhost user message ================
13.2854: Request: VHOST_USER_SET_VRING_KICK (12)
13.2854: Flags:   0x1
13.2854: Size:    8
13.2854: u64: 0x0000000000000000
13.2854: Got kick_fd: 8 for vq: 0
13.2854: ================ Vhost user message ================
13.2854: Request: VHOST_USER_SET_VRING_CALL (13)
13.2854: Flags:   0x1
13.2854: Size:    8
13.2854: u64: 0x0000000000000000
13.2854: Got call_fd: 82 for vq: 0
13.2854: ================ Vhost user message ================
13.2854: Request: VHOST_USER_SET_VRING_NUM (8)
13.2854: Flags:   0x1
13.2854: Size:    8
13.2854: ================ Vhost user message ================
13.2854: Request: VHOST_USER_SET_VRING_BASE (10)
13.2854: Flags:   0x1
13.2854: Size:    8
13.2854: State.index: 1
13.2854: State.num:   143
13.2854: ================ Vhost user message ================
13.2854: Request: VHOST_USER_SET_VRING_ADDR (9)
13.2854: Flags:   0x1
13.2854: Size:    40
13.2854: vhost_vring_addr:
13.2854:     index:  1
13.2854:     flags:  0
13.2854:     desc_user_addr:   0x00007f0943c6f000
13.2854:     used_user_addr:   0x00007f0943c70240
13.2854:     avail_user_addr:  0x00007f0943c70000
13.2854:     log_guest_addr:   0x000000001fc71240
13.2854: Setting virtq addresses:
13.2854:     vring_desc  at 0x7f2e2dff8000
13.2854:     vring_used  at 0x7f2e2dff9240
13.2854:     vring_avail at 0x7f2e2dff9000
13.2855: ================ Vhost user message ================
13.2855: Request: VHOST_USER_SET_VRING_KICK (12)
13.2855: Flags:   0x1
13.2855: Size:    8
13.2855: u64: 0x0000000000000001
13.2855: Got kick_fd: 78 for vq: 1
13.2855: Waiting for kicks on fd: 78 for vq: 1
13.2855: ================ Vhost user message ================
13.2855: Request: VHOST_USER_SET_VRING_CALL (13)
13.2855: Flags:   0x1
13.2855: Size:    8
13.2855: u64: 0x0000000000000001
13.2855: Got call_fd: 84 for vq: 1
13.3342: ================ Vhost user message ================
13.3342: Request: VHOST_USER_SEND_RARP (19)
13.3342: Flags:   0x1
13.3342: Size:    8
13.3342: Ignore command VHOST_USER_SEND_RARP for 52:54:00:12:34:56
13.4842: ================ Vhost user message ================
13.4842: Request: VHOST_USER_SEND_RARP (19)
13.4842: Flags:   0x1
13.4842: Size:    8
13.4842: Ignore command VHOST_USER_SEND_RARP for 52:54:00:12:34:56
13.7342: ================ Vhost user message ================
13.7342: Request: VHOST_USER_SEND_RARP (19)
13.7342: Flags:   0x1
13.7342: Size:    8
13.7342: Ignore command VHOST_USER_SEND_RARP for 52:54:00:12:34:56
14.0842: ================ Vhost user message ================
14.0842: Request: VHOST_USER_SEND_RARP (19)
14.0842: Flags:   0x1
14.0842: Size:    8
14.0842: Ignore command VHOST_USER_SEND_RARP for 52:54:00:12:34:56
61.2038: Flow 0 (TCP connection): No tap activity for least 30s, send keepalive
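Two small pieces of vhost-user and virtio arithmetic help when reading the log above. Per the vhost-user specification, the u64 payload of VHOST_USER_SET_VRING_KICK and VHOST_USER_SET_VRING_CALL carries the vring index in bits 0-7, and bit 8 is set when no file descriptor accompanies the message. Split-ring avail and used indices are free-running 16-bit counters, so the gap between them has to be computed modulo 2^16. The sketch below decodes the payloads seen above and computes the gap behind "Last avail index != used index: 2163 != 1936". The macro and helper names are hypothetical and are not passt or libvhost-user API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Payload layout from the vhost-user specification (local names, assumed) */
#define VRING_IDX_MASK	0xffULL		/* bits 0-7: vring index */
#define VRING_NOFD_MASK	(1ULL << 8)	/* bit 8: no fd was passed */

struct vring_fd_payload {
	unsigned int index;	/* virtqueue index */
	bool nofd;		/* true if no file descriptor was attached */
};

/* Hypothetical helper: split a KICK/CALL u64 payload into its fields */
static struct vring_fd_payload decode_vring_fd_payload(uint64_t u64)
{
	struct vring_fd_payload p = {
		.index = (unsigned int)(u64 & VRING_IDX_MASK),
		.nofd = (u64 & VRING_NOFD_MASK) != 0,
	};

	return p;
}

/* Hypothetical helper: distance between two free-running 16-bit ring
 * indices; truncating to uint16_t handles wrap-around */
static uint16_t ring_gap(uint16_t ahead, uint16_t behind)
{
	return (uint16_t)(ahead - behind);
}
```

With the values from the log, decode_vring_fd_payload(0x1) yields index 1 with a file descriptor attached, matching "Got kick_fd: 78 for vq: 1". ring_gap(2163, 1936) is 227, the number of entries by which the restored avail index is ahead of the used index read from guest memory.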

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-20 16:18     ` Stefano Brivio
@ 2026-05-20 20:53       ` Stefano Brivio
  2026-05-21  8:30         ` Laurent Vivier
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-20 20:53 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Wed, 20 May 2026 18:18:52 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 20 May 2026 18:07:08 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Wed, 20 May 2026 17:34:45 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Wed, 13 May 2026 13:52:08 +0200
> > > Laurent Vivier <lvivier@redhat.com> wrote:
> > >     
> > > > Currently, the vhost-user path assumes each virtqueue element contains
> > > > exactly one iovec entry covering the entire frame.  This assumption
> > > > breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > > vnet header and the frame payload are in separate buffers, resulting in
> > > > two iovec entries per virtqueue element.
> > > > 
> > > > This series refactors the vhost-user data path so that frame lengths,
> > > > header sizes, and padding are tracked and passed explicitly rather than
> > > > being derived from iovec sizes.  This decoupling is a prerequisite for
> > > > correctly handling padding of multi-buffer frames.      
> > > 
> > > Sorry to bring (likely) bad news, but this series seems to introduce a
> > > regression: I got the migration/rampstream_in tests fail twice in a
> > > row, which I've never saw happening (I think I saw a single failure a
> > > long time ago when the machine had a high CPU load, but nothing else).
> > > 
> > > I'm currently bisecting and the bisect seems to point towards the end
> > > of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > you posted. I haven't spotted anything that might cause issues there.    
> > 
> > Yeah, that's the one :(
> > 
> > $ git bisect bad
> > db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > Author: Laurent Vivier <lvivier@redhat.com>
> > Date:   Wed May 13 13:52:18 2026 +0200
> > 
> >     vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> > 
> > The "TCP/IPv4: sequence check, ramps, inbound" test in rampstream_in
> > gets stuck, once the source is done with the migration, and passt on the
> > destination just printed:
> > 
> > Accepted TCP_REPAIR helper, PID 13
> > accepted connection from PID 16
> > 
> > I'll get captures and logs next. It seems to fail most of the times,
> > I had two failures in a row again.  
> 
> Log from passt --debug attached. Likely highlight:
> 
> ---
> 13.2853: ================ Vhost user message ================
> 13.2853: Request: VHOST_USER_SET_VRING_ADDR (9)
> 13.2853: Flags:   0x1
> 13.2853: Size:    40
> 13.2853: vhost_vring_addr:
> 13.2853:     index:  0
> 13.2853:     flags:  0
> 13.2853:     desc_user_addr:   0x00007f0943f41000
> 13.2853:     used_user_addr:   0x00007f0943f42240
> 13.2854:     avail_user_addr:  0x00007f0943f42000
> 13.2854:     log_guest_addr:   0x000000001ff43240
> 13.2854: Setting virtq addresses:
> 13.2854:     vring_desc  at 0x7f2e2e2ca000
> 13.2854:     vring_used  at 0x7f2e2e2cb240
> 13.2854:     vring_avail at 0x7f2e2e2cb000
> 13.2854: Last avail index != used index: 2163 != 1936
> 13.2854: Got packet, but RX virtqueue not usable yet
> ---
> 
> pcap file of that passt instance empty, it didn't have a chance to
> send/receive packets yet.

...but I bisected 10/10 itself, and realised that reverting the
iov_truncate() -> iov_skip_bytes() conversion in tcp_vu_sock_recv()
like this:

---
diff --git a/tcp_vu.c b/tcp_vu.c
index f6ac76e..ccc031e 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -249,11 +249,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
 	if (!peek_offset_cap)
 		ret -= already_sent;
 
-	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used,
-			   MAX(hdrlen + ret, VNET_HLEN + ETH_ZLEN),
-			   NULL);
-	if ((size_t)i < iov_used)
-		i++;
+	i = iov_truncate(&iov_vu[DISCARD_IOV_NUM], iov_used, ret);
 
 	/* adjust head count */
 	while (*head_cnt > 0 && head[*head_cnt - 1] >= i)
---

hides / fixes the issue.
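The self-contained model below makes the behavioural difference between the two variants easier to reason about. The helper names and semantics are simplified assumptions for illustration, not passt's actual iov_skip_bytes() / iov_truncate(). Both variants compute how many iovec entries are needed to hold a given number of bytes, but only the truncating one also shortens the last entry, so that afterwards the array describes exactly that many bytes. The "adjust head count" loop from the diff is modelled too: head[] holds the index of the first iovec entry of each virtqueue element, and trailing elements that start at or beyond the cut-off are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Model: index of the entry containing byte offset @skip; the
 * remaining offset inside that entry is stored in *@offset */
static size_t model_skip_bytes(const struct iovec *iov, size_t n,
			       size_t skip, size_t *offset)
{
	size_t off = skip, i;

	for (i = 0; i < n; i++) {
		if (off < iov[i].iov_len)
			break;
		off -= iov[i].iov_len;
	}
	if (offset)
		*offset = off;

	return i;
}

/* Variant 1: count entries needed for @size bytes, leave lengths alone */
static size_t model_count(const struct iovec *iov, size_t n, size_t size)
{
	size_t i = model_skip_bytes(iov, n, size, NULL);

	if (i < n)
		i++;
	return i;
}

/* Variant 2: same count, but also shorten the last entry to fit @size */
static size_t model_truncate(struct iovec *iov, size_t n, size_t size)
{
	size_t off, i = model_skip_bytes(iov, n, size, &off);

	if (i < n) {
		iov[i].iov_len = off;
		i++;
	}
	return i;
}

/* Drop trailing elements whose first iovec entry is at or past @used */
static int model_adjust_heads(const int *head, int head_cnt, size_t used)
{
	while (head_cnt > 0 && (size_t)head[head_cnt - 1] >= used)
		head_cnt--;
	return head_cnt;
}
```

For three 100-byte entries and a size of 150, both variants return 2. Only model_truncate() changes the second entry's iov_len to 50, though. Any code that later derives a length from the iovec itself, for example by summing iov_len, sees 200 bytes after the counting variant and 150 after truncation. That is one place where the two approaches can diverge.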

I'm testing things on a kernel without SO_PEEK_OFF support for TCP, but
it doesn't seem to matter ('ret' at this point is the same before and
after your patch).

I don't see what's wrong with your change though. It's not even about
replacing 'ret' with the padded version, because I can also reproduce
the issue with:

	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used, ret,
			   NULL);
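The padded length in the original hunk, MAX(hdrlen + ret, VNET_HLEN + ETH_ZLEN), enforces the minimum Ethernet frame size (60 bytes without FCS, ETH_ZLEN in <linux/if_ether.h>) on top of the vnet header. The sketch below shows that arithmetic. It assumes VNET_HLEN is 12 bytes (the size of struct virtio_net_hdr_mrg_rxbuf) and that hdrlen already includes the vnet header. The EX_* macros and the helper functions are local to this example.

```c
#include <assert.h>
#include <stddef.h>

#define EX_ETH_ZLEN	60	/* minimum Ethernet frame length, no FCS */
#define EX_VNET_HLEN	12	/* assumed: sizeof(struct virtio_net_hdr_mrg_rxbuf) */

#define EX_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Bytes used in the guest buffer for one frame: headers (@hdrlen,
 * assumed to include the vnet header) plus @payload bytes, padded up
 * to the minimum Ethernet frame size */
static size_t padded_len(size_t hdrlen, size_t payload)
{
	return EX_MAX(hdrlen + payload, (size_t)(EX_VNET_HLEN + EX_ETH_ZLEN));
}

/* Number of zero bytes padding appends after the real frame data */
static size_t pad_bytes(size_t hdrlen, size_t payload)
{
	return padded_len(hdrlen, payload) - (hdrlen + payload);
}
```

For example, a TCP/IPv4 segment without options has hdrlen = 12 + 14 + 20 + 20 = 66 bytes. A pure ACK (payload 0) is then padded to 72 bytes (6 zero bytes), while any payload of 6 bytes or more needs no padding. Under these assumptions, the padded and unpadded byte counts differ only when hdrlen + ret is below 72.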

For convenience, this is how I'm selecting the test without bothering
about variables in run():

---
diff --git a/test/run b/test/run
index f858e55..25d7002 100755
--- a/test/run
+++ b/test/run
@@ -71,6 +71,7 @@ run() {
 	perf_init
 	[ ${CI} -eq 1 ]   && video_start ci
 
+dont() {
 	exeter smoke/smoke.sh
 	exeter build/build.py
 	exeter build/static_checkers.sh
@@ -162,6 +163,10 @@ run() {
 	setup migrate
 	test migrate/iperf3_many_out6
 	teardown migrate
+}
+	VHOST_USER=1
+	VALGRIND=0
+
 	setup migrate
 	test migrate/rampstream_in
 	teardown migrate
---

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-20 20:53       ` Stefano Brivio
@ 2026-05-21  8:30         ` Laurent Vivier
  2026-05-21 23:13           ` Laurent Vivier
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Vivier @ 2026-05-21  8:30 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy, David GIbson

On 5/20/26 22:53, Stefano Brivio wrote:
> On Wed, 20 May 2026 18:18:52 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
>> On Wed, 20 May 2026 18:07:08 +0200
>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>
>>> On Wed, 20 May 2026 17:34:45 +0200
>>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>>    
>>>> On Wed, 13 May 2026 13:52:08 +0200
>>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>>      
>>>>> Currently, the vhost-user path assumes each virtqueue element contains
>>>>> exactly one iovec entry covering the entire frame.  This assumption
>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
>>>>> vnet header and the frame payload are in separate buffers, resulting in
>>>>> two iovec entries per virtqueue element.
>>>>>
>>>>> This series refactors the vhost-user data path so that frame lengths,
>>>>> header sizes, and padding are tracked and passed explicitly rather than
>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
>>>>> correctly handling padding of multi-buffer frames.
>>>>
>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
>>>> regression: I got the migration/rampstream_in tests fail twice in a
>>>> row, which I've never saw happening (I think I saw a single failure a
>>>> long time ago when the machine had a high CPU load, but nothing else).
>>>>
>>>> I'm currently bisecting and the bisect seems to point towards the end
>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
>>>> you posted. I haven't spotted anything that might cause issues there.
>>>
>>> Yeah, that's the one :(
>>>
>>> $ git bisect bad
>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
>>> Author: Laurent Vivier <lvivier@redhat.com>
>>> Date:   Wed May 13 13:52:18 2026 +0200
>>>
>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()

I checked on my system with the commit previous to this series,
bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
every time).

 > TCP/IPv4: sequence check, ramps, inbound
...failed.

and rampstream_out hangs sometimes too.

I'm going to try with earlier commits.

Thanks,
Laurent

>>>
>>> The "TCP/IPv4: sequence check, ramps, inbound" test in rampstream_in
>>> gets stuck, once the source is done with the migration, and passt on the
>>> destination just printed:
>>>
>>> Accepted TCP_REPAIR helper, PID 13
>>> accepted connection from PID 16
>>>
>>> I'll get captures and logs next. It seems to fail most of the times,
>>> I had two failures in a row again.
>>
>> Log from passt --debug attached. Likely highlight:
>>
>> ---
>> 13.2853: ================ Vhost user message ================
>> 13.2853: Request: VHOST_USER_SET_VRING_ADDR (9)
>> 13.2853: Flags:   0x1
>> 13.2853: Size:    40
>> 13.2853: vhost_vring_addr:
>> 13.2853:     index:  0
>> 13.2853:     flags:  0
>> 13.2853:     desc_user_addr:   0x00007f0943f41000
>> 13.2853:     used_user_addr:   0x00007f0943f42240
>> 13.2854:     avail_user_addr:  0x00007f0943f42000
>> 13.2854:     log_guest_addr:   0x000000001ff43240
>> 13.2854: Setting virtq addresses:
>> 13.2854:     vring_desc  at 0x7f2e2e2ca000
>> 13.2854:     vring_used  at 0x7f2e2e2cb240
>> 13.2854:     vring_avail at 0x7f2e2e2cb000
>> 13.2854: Last avail index != used index: 2163 != 1936
>> 13.2854: Got packet, but RX virtqueue not usable yet
>> ---
>>
>> pcap file of that passt instance empty, it didn't have a chance to
>> send/receive packets yet.
> 
> ...but I bisected 10/10 itself, and realised that reverting the
> iov_truncate() -> iov_skip_bytes() conversion in tcp_vu_sock_recv()
> like this:
> 
> ---
> diff --git a/tcp_vu.c b/tcp_vu.c
> index f6ac76e..ccc031e 100644
> --- a/tcp_vu.c
> +++ b/tcp_vu.c
> @@ -249,11 +249,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
>   	if (!peek_offset_cap)
>   		ret -= already_sent;
>   
> -	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used,
> -			   MAX(hdrlen + ret, VNET_HLEN + ETH_ZLEN),
> -			   NULL);
> -	if ((size_t)i < iov_used)
> -		i++;
> +	i = iov_truncate(&iov_vu[DISCARD_IOV_NUM], iov_used, ret);
>   
>   	/* adjust head count */
>   	while (*head_cnt > 0 && head[*head_cnt - 1] >= i)
> ---
> 
> hides / fixes the issue.
> 
> I'm testing things on a kernel without SO_PEEK_OFF support for TCP, but
> it doesn't seem to matter ('ret' at this point is the same before and
> after your patch).
> 
> I don't see what's wrong with your change though. It's not even about
> replacing 'ret' with the padded version, because I can also reproduce
> the issue with:
> 
> 	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used, ret,
> 			   NULL);
> 
> For convenience, this is how I'm selecting the test without bothering
> about variables in run():
> 
> ---
> diff --git a/test/run b/test/run
> index f858e55..25d7002 100755
> --- a/test/run
> +++ b/test/run
> @@ -71,6 +71,7 @@ run() {
>   	perf_init
>   	[ ${CI} -eq 1 ]   && video_start ci
>   
> +dont() {
>   	exeter smoke/smoke.sh
>   	exeter build/build.py
>   	exeter build/static_checkers.sh
> @@ -162,6 +163,10 @@ run() {
>   	setup migrate
>   	test migrate/iperf3_many_out6
>   	teardown migrate
> +}
> +	VHOST_USER=1
> +	VALGRIND=0
> +
>   	setup migrate
>   	test migrate/rampstream_in
>   	teardown migrate
> ---
> 



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-21  8:30         ` Laurent Vivier
@ 2026-05-21 23:13           ` Laurent Vivier
  2026-05-22  4:22             ` Stefano Brivio
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Vivier @ 2026-05-21 23:13 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Jon Maloy, David GIbson

On 5/21/26 10:30, Laurent Vivier wrote:
> On 5/20/26 22:53, Stefano Brivio wrote:
>> On Wed, 20 May 2026 18:18:52 +0200
>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>
>>> On Wed, 20 May 2026 18:07:08 +0200
>>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>>
>>>> On Wed, 20 May 2026 17:34:45 +0200
>>>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>>>> On Wed, 13 May 2026 13:52:08 +0200
>>>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>>>> Currently, the vhost-user path assumes each virtqueue element contains
>>>>>> exactly one iovec entry covering the entire frame.  This assumption
>>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
>>>>>> vnet header and the frame payload are in separate buffers, resulting in
>>>>>> two iovec entries per virtqueue element.
>>>>>>
>>>>>> This series refactors the vhost-user data path so that frame lengths,
>>>>>> header sizes, and padding are tracked and passed explicitly rather than
>>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
>>>>>> correctly handling padding of multi-buffer frames.
>>>>>
>>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
>>>>> regression: I got the migration/rampstream_in tests fail twice in a
>>>>> row, which I've never saw happening (I think I saw a single failure a
>>>>> long time ago when the machine had a high CPU load, but nothing else).
>>>>>
>>>>> I'm currently bisecting and the bisect seems to point towards the end
>>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
>>>>> you posted. I haven't spotted anything that might cause issues there.
>>>>
>>>> Yeah, that's the one :(
>>>>
>>>> $ git bisect bad
>>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
>>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
>>>> Author: Laurent Vivier <lvivier@redhat.com>
>>>> Date:   Wed May 13 13:52:18 2026 +0200
>>>>
>>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
> 
> I checked on my system with the commit previous to this series,
> bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> every time).
> 
>  > TCP/IPv4: sequence check, ramps, inbound
> ...failed.
> 
> and rampstream_out hangs sometimes too.
> 
> I'm going to try with earlier commits.

For me the problem can happen with any commit...

As it depends on the execution path and on the load and speed of the system, it looks like
a race condition.

Did you try to test on a host with a kernel patched with
"[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?

Thanks,
Laurent



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-21 23:13           ` Laurent Vivier
@ 2026-05-22  4:22             ` Stefano Brivio
  2026-05-22  5:44               ` Stefano Brivio
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-22  4:22 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Fri, 22 May 2026 01:13:33 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> On 5/21/26 10:30, Laurent Vivier wrote:
> > On 5/20/26 22:53, Stefano Brivio wrote:  
> >> On Wed, 20 May 2026 18:18:52 +0200
> >> Stefano Brivio <sbrivio@redhat.com> wrote:
> >>  
> >>> On Wed, 20 May 2026 18:07:08 +0200
> >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> >>>  
> >>>> On Wed, 20 May 2026 17:34:45 +0200
> >>>> Stefano Brivio <sbrivio@redhat.com> wrote:  
> >>>>> On Wed, 13 May 2026 13:52:08 +0200
> >>>>> Laurent Vivier <lvivier@redhat.com> wrote:  
> >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> >>>>>> two iovec entries per virtqueue element.
> >>>>>>
> >>>>>> This series refactors the vhost-user data path so that frame lengths,
> >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> >>>>>> correctly handling padding of multi-buffer frames.  
> >>>>>
> >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> >>>>> row, which I've never saw happening (I think I saw a single failure a
> >>>>> long time ago when the machine had a high CPU load, but nothing else).
> >>>>>
> >>>>> I'm currently bisecting and the bisect seems to point towards the end
> >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> >>>>> you posted. I haven't spotted anything that might cause issues there.  
> >>>>
> >>>> Yeah, that's the one :(
> >>>>
> >>>> $ git bisect bad
> >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> >>>> Author: Laurent Vivier <lvivier@redhat.com>
> >>>> Date:   Wed May 13 13:52:18 2026 +0200
> >>>>
> >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()  
> > 
> > I checked on my system with the commit previous to this series,
> > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > every time).
> >   
> >  > TCP/IPv4: sequence check, ramps, inbound  
> > ...failed.
> > 
> > and rampstream_out hangs sometimes too.
> > 
> > I'm going to try with earlier commits.
> 
> For me the problem can happen with any commit...
> 
> As it depends on the execution path and on the load and speed of the system it looks like 
> a race condition.

Hah, thanks for checking. Maybe...

> Did you try to test on a host with a kernel patched with
> "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?

Now I tried, and yes, the test doesn't hang anymore! I seem to have an
issue with teardown functions on recent kernels (current net.git HEAD
more or less):

---
+ teardown_migrate
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
qemu-system-x86_64: terminating on signal 15 from pid 34 ()
+ cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
+ /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
18.8974: ================ Vhost user message ================
18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
18.8974: Flags:   0x1
18.8974: Size:    8
18.8974: State.index: 0
18.8975: ================ Vhost user message ================
18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
18.8975: Flags:   0x1
18.8975: Size:    8
18.8975: State.index: 1
qemu-system-x86_64: terminating on signal 15 from pid 35 ()
18.7961: Client connection closed
18.7962: Closing TCP_REPAIR helper socket
+ context_wait qemu_1
+ __name=qemu_1
+ __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
+ __pid=67766
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_2$ 
+ return 0
18.9016: Client connection closed
18.9018: Closing TCP_REPAIR helper socket
+ wait 67766
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n passt_repair_1$ 
+ return 0
+ rc=0
+ rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
+ [ 1 -eq 1 ]
+ echo [Exit code: 0]
+ echo -n qemu_2$ 
+ return 0
2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
Connection closed by UNKNOWN port 65535
...
---

It looks like we stop QEMU a bit too early, but that should be unrelated.

I'm now trying to find some kind of workaround for existing (not fixed)
kernel versions. Maybe stopping rampstream_in for a moment or something
like that.

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  4:22             ` Stefano Brivio
@ 2026-05-22  5:44               ` Stefano Brivio
  2026-05-22  6:15                 ` David GIbson
  2026-05-22 12:04                 ` Stefano Brivio
  0 siblings, 2 replies; 26+ messages in thread
From: Stefano Brivio @ 2026-05-22  5:44 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Fri, 22 May 2026 06:22:39 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri, 22 May 2026 01:13:33 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
> > On 5/21/26 10:30, Laurent Vivier wrote:  
> > > On 5/20/26 22:53, Stefano Brivio wrote:    
> > >> On Wed, 20 May 2026 18:18:52 +0200
> > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > >>    
> > >>> On Wed, 20 May 2026 18:07:08 +0200
> > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > >>>    
> > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:    
> > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:    
> > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > >>>>>> two iovec entries per virtqueue element.
> > >>>>>>
> > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > >>>>>> correctly handling padding of multi-buffer frames.    
> > >>>>>
> > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > >>>>>
> > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > >>>>> you posted. I haven't spotted anything that might cause issues there.    
> > >>>>
> > >>>> Yeah, that's the one :(
> > >>>>
> > >>>> $ git bisect bad
> > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > >>>>
> > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()    
> > > 
> > > I checked on my system with the commit previous to this series,
> > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > every time).
> > >     
> > >  > TCP/IPv4: sequence check, ramps, inbound    
> > > ...failed.
> > > 
> > > and rampstream_out hangs sometimes too.
> > > 
> > > I'm going to try with earlier commits.
> > 
> > For me the problem can happen with any commit...
> > 
> > As it depends on the execution path and on the load and speed of the system it looks like 
> > a race condition.  
> 
> Hah, thanks for checking. Maybe...
> 
> > Did you try to test on a host with a kernel patched with
> > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?  
> 
> Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> issue with teardown functions on recent kernels (current net.git HEAD
> more or less):
> 
> ---
> + teardown_migrate
> + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> 18.8974: ================ Vhost user message ================
> 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> 18.8974: Flags:   0x1
> 18.8974: Size:    8
> 18.8974: State.index: 0
> 18.8975: ================ Vhost user message ================
> 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> 18.8975: Flags:   0x1
> 18.8975: Size:    8
> 18.8975: State.index: 1
> qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> 18.7961: Client connection closed
> 18.7962: Closing TCP_REPAIR helper socket
> + context_wait qemu_1
> + __name=qemu_1
> + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> + rc=0
> + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> + __pid=67766
> + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> + [ 1 -eq 1 ]
> + echo [Exit code: 0]
> + echo -n passt_repair_2$ 
> + return 0
> 18.9016: Client connection closed
> 18.9018: Closing TCP_REPAIR helper socket
> + wait 67766
> + rc=0
> + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> + [ 1 -eq 1 ]
> + echo [Exit code: 0]
> + echo -n passt_repair_1$ 
> + return 0
> + rc=0
> + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> + [ 1 -eq 1 ]
> + echo [Exit code: 0]
> + echo -n qemu_2$ 
> + return 0
> 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> Connection closed by UNKNOWN port 65535
> ...
> ---
> 
> it looks like we stop QEMU a bit too early. But it should be unrelated.
> 
> I'm now trying to find some kind of workaround for existing (not fixed)
> kernel versions. Maybe stopping rampstream_in for a moment or something
> like that.

For some weird reason, even very blatant throttling (100 ms to 1 s delays
every 10000 ramps, or an explicit 500 ms pause via signal before
migration) doesn't help.

So it doesn't seem to be *that* kind of race. I should probably check
the same exact kernel version with fix and without...

-- 
Stefano



* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  5:44               ` Stefano Brivio
@ 2026-05-22  6:15                 ` David GIbson
  2026-05-22  6:23                   ` Stefano Brivio
  2026-05-22 12:04                 ` Stefano Brivio
  1 sibling, 1 reply; 26+ messages in thread
From: David GIbson @ 2026-05-22  6:15 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev, Jon Maloy

[-- Attachment #1: Type: text/plain, Size: 7389 bytes --]

On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:
> On Fri, 22 May 2026 06:22:39 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Fri, 22 May 2026 01:13:33 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> > 
> > > On 5/21/26 10:30, Laurent Vivier wrote:  
> > > > On 5/20/26 22:53, Stefano Brivio wrote:    
> > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>    
> > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>>    
> > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:    
> > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:    
> > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > >>>>>> two iovec entries per virtqueue element.
> > > >>>>>>
> > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > >>>>>> correctly handling padding of multi-buffer frames.    
> > > >>>>>
> > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > >>>>>
> > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > >>>>> you posted. I haven't spotted anything that might cause issues there.    
> > > >>>>
> > > >>>> Yeah, that's the one :(
> > > >>>>
> > > >>>> $ git bisect bad
> > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > >>>>
> > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()    
> > > > 
> > > > I checked on my system with the commit previous to this series,
> > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > everytime).
> > > >     
> > > >  > TCP/IPv4: sequence check, ramps, inbound    
> > > > ...failed.
> > > > 
> > > > and rampstream_out hangs sometime too.
> > > > 
> > > > I'm going to try with ealier commits.    
> > > 
> > > For me the problem can happen with any commit...
> > > 
> > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > a race condition.  
> > 
> > Hah, thanks for checking. Maybe...
> > 
> > > Did you try to test on a host with a kernel patched with
> > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?  
> > 
> > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > issue with teardown functions on recent kernels (current net.git HEAD
> > more or less):
> > 
> > ---
> > + teardown_migrate
> > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > 18.8974: ================ Vhost user message ================
> > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > 18.8974: Flags:   0x1
> > 18.8974: Size:    8
> > 18.8974: State.index: 0
> > 18.8975: ================ Vhost user message ================
> > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > 18.8975: Flags:   0x1
> > 18.8975: Size:    8
> > 18.8975: State.index: 1
> > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > 18.7961: Client connection closed
> > 18.7962: Closing TCP_REPAIR helper socket
> > + context_wait qemu_1
> > + __name=qemu_1
> > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > + __pid=67766
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n passt_repair_2$ 
> > + return 0
> > 18.9016: Client connection closed
> > 18.9018: Closing TCP_REPAIR helper socket
> > + wait 67766
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n passt_repair_1$ 
> > + return 0
> > + rc=0
> > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > + [ 1 -eq 1 ]
> > + echo [Exit code: 0]
> > + echo -n qemu_2$ 
> > + return 0
> > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > Connection closed by UNKNOWN port 65535
> > ...
> > ---
> > 
> > it looks like we stop QEMU a bit too early. But it should be unrelated.
> > 
> > I'm now trying to find some kind of workaround for existing (not fixed)
> > kernel versions. Maybe stopping rampstream_in for a moment or something
> > like that.
> 
> For some weird reason even very blatant throttling (100 ms - 1 s delays
> every 10000 ramps, or an explicit 500 ms pause via signal before
> migration) doesn't help.
> 
> So it doesn't seem to be *that* kind of race. I should probably check
> the same exact kernel version with fix and without...

If it's due to the kernel not stopping the queues on REPAIR, then the
only real way to fix the test is to cut off the source machine's
network before we trigger migration.  That could be done with
netfilter (in a user+netns).  But probably more natural would be to
not do the migration between local passt instances, but actually
between two host namespaces, with separate netifs for external
connectivity and for the migration.  Remove the external netif on the
source, then trigger migration, then add the external netif on the
destination.

It's quite a bit of hassle :(.  But it does model something much
closer to a real migration scenario.  As a bonus it would mean we'd no
longer rely on the hack of guessing when to exit the source passt in
order to allow the destination passt to bind.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  6:15                 ` David GIbson
@ 2026-05-22  6:23                   ` Stefano Brivio
  2026-05-22  6:36                     ` David GIbson
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Brivio @ 2026-05-22  6:23 UTC (permalink / raw)
  To: David GIbson; +Cc: Laurent Vivier, passt-dev, Jon Maloy

On Fri, 22 May 2026 16:15:08 +1000
David GIbson <david@gibson.dropbear.id.au> wrote:

> On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:
> > On Fri, 22 May 2026 06:22:39 +0200
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > On Fri, 22 May 2026 01:13:33 +0200
> > > Laurent Vivier <lvivier@redhat.com> wrote:
> > >   
> > > > On 5/21/26 10:30, Laurent Vivier wrote:    
> > > > > On 5/20/26 22:53, Stefano Brivio wrote:      
> > > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > >>      
> > > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > >>>      
> > > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:      
> > > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:      
> > > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > > >>>>>> two iovec entries per virtqueue element.
> > > > >>>>>>
> > > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > > >>>>>> correctly handling padding of multi-buffer frames.      
> > > > >>>>>
> > > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > > > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > > >>>>>
> > > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > > >>>>> you posted. I haven't spotted anything that might cause issues there.      
> > > > >>>>
> > > > >>>> Yeah, that's the one :(
> > > > >>>>
> > > > >>>> $ git bisect bad
> > > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > > >>>>
> > > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()      
> > > > > 
> > > > > I checked on my system with the commit previous to this series,
> > > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > > everytime).
> > > > >       
> > > > >  > TCP/IPv4: sequence check, ramps, inbound      
> > > > > ...failed.
> > > > > 
> > > > > and rampstream_out hangs sometime too.
> > > > > 
> > > > > I'm going to try with ealier commits.      
> > > > 
> > > > For me the problem can happen with any commit...
> > > > 
> > > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > > a race condition.    
> > > 
> > > Hah, thanks for checking. Maybe...
> > >   
> > > > Did you try to test on a host with a kernel patched with
> > > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?    
> > > 
> > > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > > issue with teardown functions on recent kernels (current net.git HEAD
> > > more or less):
> > > 
> > > ---
> > > + teardown_migrate
> > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > > 18.8974: ================ Vhost user message ================
> > > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > > 18.8974: Flags:   0x1
> > > 18.8974: Size:    8
> > > 18.8974: State.index: 0
> > > 18.8975: ================ Vhost user message ================
> > > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > > 18.8975: Flags:   0x1
> > > 18.8975: Size:    8
> > > 18.8975: State.index: 1
> > > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > > 18.7961: Client connection closed
> > > 18.7962: Closing TCP_REPAIR helper socket
> > > + context_wait qemu_1
> > > + __name=qemu_1
> > > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > + rc=0
> > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > > + __pid=67766
> > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > + [ 1 -eq 1 ]
> > > + echo [Exit code: 0]
> > > + echo -n passt_repair_2$ 
> > > + return 0
> > > 18.9016: Client connection closed
> > > 18.9018: Closing TCP_REPAIR helper socket
> > > + wait 67766
> > > + rc=0
> > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > > + [ 1 -eq 1 ]
> > > + echo [Exit code: 0]
> > > + echo -n passt_repair_1$ 
> > > + return 0
> > > + rc=0
> > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > > + [ 1 -eq 1 ]
> > > + echo [Exit code: 0]
> > > + echo -n qemu_2$ 
> > > + return 0
> > > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > > Connection closed by UNKNOWN port 65535
> > > ...
> > > ---
> > > 
> > > it looks like we stop QEMU a bit too early. But it should be unrelated.
> > > 
> > > I'm now trying to find some kind of workaround for existing (not fixed)
> > > kernel versions. Maybe stopping rampstream_in for a moment or something
> > > like that.  
> > 
> > For some weird reason even very blatant throttling (100 ms - 1 s delays
> > every 10000 ramps, or an explicit 500 ms pause via signal before
> > migration) doesn't help.
> > 
> > So it doesn't seem to be *that* kind of race. I should probably check
> > the same exact kernel version with fix and without...  
> 
> If it's due to the kernel not stopping the queues on REPAIR, then the
> only real way to fix the test is to cut off the source machine's
> network before we trigger migration.

Well, that's a rather complicated way to do it. One could simply stop
the traffic instead. But it doesn't help, so there's probably another
issue.

> That could be done with
> netfilter (in a user+netns).  But probably more natural would be to
> not do the migration between local passt instances, but actually
> between two host namespaces, with separate netifs for external
> connectivity and for the migration.  Remove the external netif on the
> source, then trigger migration, then add the external netif on the
> destination.
> 
> It's quite a bit of hassle :(.  But it does model something much
> closer to a real migration scenario.  As a bonus it would mean we'd no
> longer rely on the hack of guessing when to exit the source passt in
> order to allow the destination passt to bind.

I struggle to see how that would be worth the investment, especially if
we're working around a kernel issue that should eventually be fixed.

Or, at least, right now, I'm just trying to get tests to pass while
keeping Laurent's changes in the tree.

-- 
Stefano


* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  6:23                   ` Stefano Brivio
@ 2026-05-22  6:36                     ` David GIbson
  2026-05-22  6:45                       ` Stefano Brivio
  0 siblings, 1 reply; 26+ messages in thread
From: David GIbson @ 2026-05-22  6:36 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Laurent Vivier, passt-dev, Jon Maloy

[-- Attachment #1: Type: text/plain, Size: 9081 bytes --]

On Fri, May 22, 2026 at 08:23:50AM +0200, Stefano Brivio wrote:
> On Fri, 22 May 2026 16:15:08 +1000
> David GIbson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:
> > > On Fri, 22 May 2026 06:22:39 +0200
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >   
> > > > On Fri, 22 May 2026 01:13:33 +0200
> > > > Laurent Vivier <lvivier@redhat.com> wrote:
> > > >   
> > > > > On 5/21/26 10:30, Laurent Vivier wrote:    
> > > > > > On 5/20/26 22:53, Stefano Brivio wrote:      
> > > > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > >>      
> > > > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > >>>      
> > > > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:      
> > > > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:      
> > > > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > > > >>>>>> two iovec entries per virtqueue element.
> > > > > >>>>>>
> > > > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > > > >>>>>> correctly handling padding of multi-buffer frames.      
> > > > > >>>>>
> > > > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > > > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > > > > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > > > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > > > >>>>>
> > > > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > > > >>>>> you posted. I haven't spotted anything that might cause issues there.      
> > > > > >>>>
> > > > > >>>> Yeah, that's the one :(
> > > > > >>>>
> > > > > >>>> $ git bisect bad
> > > > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > > > >>>>
> > > > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()      
> > > > > > 
> > > > > > I checked on my system with the commit previous to this series,
> > > > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > > > everytime).
> > > > > >       
> > > > > >  > TCP/IPv4: sequence check, ramps, inbound      
> > > > > > ...failed.
> > > > > > 
> > > > > > and rampstream_out hangs sometime too.
> > > > > > 
> > > > > > I'm going to try with ealier commits.      
> > > > > 
> > > > > For me the problem can happen with any commit...
> > > > > 
> > > > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > > > a race condition.    
> > > > 
> > > > Hah, thanks for checking. Maybe...
> > > >   
> > > > > Did you try to test on a host with a kernel patched with
> > > > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?    
> > > > 
> > > > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > > > issue with teardown functions on recent kernels (current net.git HEAD
> > > > more or less):
> > > > 
> > > > ---
> > > > + teardown_migrate
> > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > > > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > > > 18.8974: ================ Vhost user message ================
> > > > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > 18.8974: Flags:   0x1
> > > > 18.8974: Size:    8
> > > > 18.8974: State.index: 0
> > > > 18.8975: ================ Vhost user message ================
> > > > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > 18.8975: Flags:   0x1
> > > > 18.8975: Size:    8
> > > > 18.8975: State.index: 1
> > > > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > > > 18.7961: Client connection closed
> > > > 18.7962: Closing TCP_REPAIR helper socket
> > > > + context_wait qemu_1
> > > > + __name=qemu_1
> > > > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > + rc=0
> > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > > > + __pid=67766
> > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > + [ 1 -eq 1 ]
> > > > + echo [Exit code: 0]
> > > > + echo -n passt_repair_2$ 
> > > > + return 0
> > > > 18.9016: Client connection closed
> > > > 18.9018: Closing TCP_REPAIR helper socket
> > > > + wait 67766
> > > > + rc=0
> > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > > > + [ 1 -eq 1 ]
> > > > + echo [Exit code: 0]
> > > > + echo -n passt_repair_1$ 
> > > > + return 0
> > > > + rc=0
> > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > > > + [ 1 -eq 1 ]
> > > > + echo [Exit code: 0]
> > > > + echo -n qemu_2$ 
> > > > + return 0
> > > > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > > > Connection closed by UNKNOWN port 65535
> > > > ...
> > > > ---
> > > > 
> > > > it looks like we stop QEMU a bit too early. But it should be unrelated.
> > > > 
> > > > I'm now trying to find some kind of workaround for existing (not fixed)
> > > > kernel versions. Maybe stopping rampstream_in for a moment or something
> > > > like that.  
> > > 
> > > For some weird reason even very blatant throttling (100 ms - 1 s delays
> > > every 10000 ramps, or an explicit 500 ms pause via signal before
> > > migration) doesn't help.
> > > 
> > > So it doesn't seem to be *that* kind of race. I should probably check
> > > the same exact kernel version with fix and without...  
> > 
> > If it's due to the kernel not stopping the queues on REPAIR, then the
> > only real way to fix the test is to cut off the source machine's
> > network before we trigger migration.
> 
> Well, that's a rather complicated way to do it. One could simply stop
> the traffic instead.

I don't know that "simply" is quite so simple.  You can suspend the
source of the data, but you then need to wait a hard-to-determine
amount of time for the data already in flight to reach the guest, and
for all the ACKs to come back.
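One way to avoid guessing that delay, assuming the test harness has access to the sending TCP socket on a Linux host (which may not hold for the actual migration tests), is to poll the socket's send queue until the peer has acknowledged everything. On Linux, `SIOCOUTQ` on a TCP socket reports bytes written but not yet acknowledged. The helper below, `wait_send_queue_drained()`, is a hypothetical sketch and not part of passt or its test suite:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical helper: poll until the TCP socket @fd has nothing left in
 * its send queue, i.e. everything written was acknowledged by the peer.
 *
 * Return: 0 once drained; -1 on error, or with errno set to ETIMEDOUT if
 * the queue is still non-empty after roughly @timeout_ms milliseconds.
 */
static int wait_send_queue_drained(int fd, long timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 1000000 };
	socklen_t len = sizeof(int);
	long waited_ms;
	int type;

	assert(timeout_ms >= 0);

	/* Reject datagram sockets: this check is meant for TCP */
	if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
		return -1;
	if (type != SOCK_STREAM) {
		errno = EINVAL;
		return -1;
	}

	for (waited_ms = 0; ; waited_ms++) {
		int outq;

		/* Bytes written but not yet acknowledged (Linux, TCP) */
		if (ioctl(fd, SIOCOUTQ, &outq) < 0)
			return -1;
		if (outq == 0)
			return 0;
		if (waited_ms >= timeout_ms) {
			errno = ETIMEDOUT;
			return -1;
		}
		/* ~1 ms per iteration, so the timeout is approximate */
		nanosleep(&step, NULL);
	}
}
```

The 1 ms polling step is an arbitrary choice. Note that this only tells the sender that its own data has been acknowledged; it does not by itself address the TCP_REPAIR race on unpatched kernels.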

For rampstream_out it's worse: the source is in the guest which isn't
supposed to know about the migration in advance, so you can't really
stop it without stopping the guest's whole network.

> But it doesn't help, so there's probably another
> issue.
> 
> > That could be done with
> > netfilter (in a user+netns).  But probably more natural would be to
> > not do the migration between local passt instances, but actually
> > between two host namespaces, with separate netifs for external
> > connectivity and for the migration.  Remove the external netif on the
> > source, then trigger migration, then add the external netif on the
> > destination.
> > 
> > It's quite a bit of hassle :(.  But it does model something much
> > closer to a real migration scenario.  As a bonus it would mean we'd no
> > longer rely on the hack of guessing when to exit the source passt in
> > order to allow the destination passt to bind.
> 
> I struggle to see how that would be worth the investment, especially if
> we're working around a kernel issue that should eventually be fixed.
> 
> Or, at least, right now, I'm just trying to get tests to pass while
> keeping Laurent changes in the tree..
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  6:36                     ` David GIbson
@ 2026-05-22  6:45                       ` Stefano Brivio
  0 siblings, 0 replies; 26+ messages in thread
From: Stefano Brivio @ 2026-05-22  6:45 UTC (permalink / raw)
  To: David GIbson; +Cc: Laurent Vivier, passt-dev, Jon Maloy

On Fri, 22 May 2026 16:36:52 +1000
David GIbson <david@gibson.dropbear.id.au> wrote:

> On Fri, May 22, 2026 at 08:23:50AM +0200, Stefano Brivio wrote:
> > On Fri, 22 May 2026 16:15:08 +1000
> > David GIbson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Fri, May 22, 2026 at 07:44:56AM +0200, Stefano Brivio wrote:  
> > > > On Fri, 22 May 2026 06:22:39 +0200
> > > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >     
> > > > > On Fri, 22 May 2026 01:13:33 +0200
> > > > > Laurent Vivier <lvivier@redhat.com> wrote:
> > > > >     
> > > > > > On 5/21/26 10:30, Laurent Vivier wrote:      
> > > > > > > On 5/20/26 22:53, Stefano Brivio wrote:        
> > > > > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > > > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > > >>        
> > > > > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > > > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > > >>>        
> > > > > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > > > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:        
> > > > > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > > > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:        
> > > > > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > > > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > > > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > > > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > > > > >>>>>> two iovec entries per virtqueue element.
> > > > > > >>>>>>
> > > > > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > > > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > > > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > > > > >>>>>> correctly handling padding of multi-buffer frames.        
> > > > > > >>>>>
> > > > > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > > > > >>>>> regression: I got the migration/rampstream_in tests fail twice in a
> > > > > > >>>>> row, which I've never saw happening (I think I saw a single failure a
> > > > > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > > > > >>>>>
> > > > > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > > > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > > > > >>>>> you posted. I haven't spotted anything that might cause issues there.        
> > > > > > >>>>
> > > > > > >>>> Yeah, that's the one :(
> > > > > > >>>>
> > > > > > >>>> $ git bisect bad
> > > > > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > > > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > > > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > > > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > > > > >>>>
> > > > > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()        
> > > > > > > 
> > > > > > > I checked on my system with the commit previous to this series,
> > > > > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > > > > everytime).
> > > > > > >         
> > > > > > >  > TCP/IPv4: sequence check, ramps, inbound        
> > > > > > > ...failed.
> > > > > > > 
> > > > > > > and rampstream_out hangs sometime too.
> > > > > > > 
> > > > > > > I'm going to try with ealier commits.        
> > > > > > 
> > > > > > For me the problem can happen with any commit...
> > > > > > 
> > > > > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > > > > a race condition.      
> > > > > 
> > > > > Hah, thanks for checking. Maybe...
> > > > >     
> > > > > > Did you try to test on a host with a kernel patched with
> > > > > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?      
> > > > > 
> > > > > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > > > > issue with teardown functions on recent kernels (current net.git HEAD
> > > > > more or less):
> > > > > 
> > > > > ---
> > > > > + teardown_migrate
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_1.pid
> > > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 16
> > > > > qemu-system-x86_64: terminating on signal 15 from pid 34 ()
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/qemu_2.pid
> > > > > + /home/sbrivio/passt/test/nstool exec /tmp/passt-tests-VVtLn0/migrate/ns1.hold -- kill 15
> > > > > 18.8974: ================ Vhost user message ================
> > > > > 18.8974: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > > 18.8974: Flags:   0x1
> > > > > 18.8974: Size:    8
> > > > > 18.8974: State.index: 0
> > > > > 18.8975: ================ Vhost user message ================
> > > > > 18.8975: Request: VHOST_USER_GET_VRING_BASE (11)
> > > > > 18.8975: Flags:   0x1
> > > > > 18.8975: Size:    8
> > > > > 18.8975: State.index: 1
> > > > > qemu-system-x86_64: terminating on signal 15 from pid 35 ()
> > > > > 18.7961: Client connection closed
> > > > > 18.7962: Closing TCP_REPAIR helper socket
> > > > > + context_wait qemu_1
> > > > > + __name=qemu_1
> > > > > + __pidfile=/tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + cat /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stdout.9pwpVbQr /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_2.stderr.dSY5hBu1
> > > > > + __pid=67766
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_1.pid
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n passt_repair_2$ 
> > > > > + return 0
> > > > > 18.9016: Client connection closed
> > > > > 18.9018: Closing TCP_REPAIR helper socket
> > > > > + wait 67766
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stdout.JEyDGxXe /tmp/passt-tests-VVtLn0/migrate/context_passt_repair_1.stderr.WU550iEI
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n passt_repair_1$ 
> > > > > + return 0
> > > > > + rc=0
> > > > > + rm /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stdout.Dm8EAhfl /tmp/passt-tests-VVtLn0/migrate/context_qemu_2.stderr.207qJYPA
> > > > > + [ 1 -eq 1 ]
> > > > > + echo [Exit code: 0]
> > > > > + echo -n qemu_2$ 
> > > > > + return 0
> > > > > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > > > > Connection closed by UNKNOWN port 65535
> > > > > ...
> > > > > ---
> > > > > 
> > > > > it looks like we stop QEMU a bit too early. But it should be unrelated.
> > > > > 
> > > > > I'm now trying to find some kind of workaround for existing (not fixed)
> > > > > kernel versions. Maybe stopping rampstream_in for a moment or something
> > > > > like that.    
> > > > 
> > > > For some weird reason even very blatant throttling (100 ms - 1 s delays
> > > > every 10000 ramps, or an explicit 500 ms pause via signal before
> > > > migration) doesn't help.
> > > > 
> > > > So it doesn't seem to be *that* kind of race. I should probably check
> > > > the same exact kernel version with fix and without...    
> > > 
> > > If it's due to the kernel not stopping the queues on REPAIR, then the
> > > only real way to fix the test is to cut off the source machine's
> > > network before we trigger migration.  
> > 
> > Well, that's a rather complicated way to do it. One could simply stop
> > the traffic instead.  
> 
> I don't know that "simply" is quite so simple.  You can suspend the
> source of the data, but you need to wait a difficult to ascertain
> amount of time for that to make it to the guest, and all the acks to
> come back.

Looking at captures, that part seems to take around 1-2 ms, so I'm
waiting 100 ms.

> For rampstream_out it's worse: the source is in the guest which isn't
> supposed to know about the migration in advance, so you can't really
> stop it without stopping the guest's whole network.

But we don't have a problem with that one.

-- 
Stefano


* Re: [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
  2026-05-22  5:44               ` Stefano Brivio
  2026-05-22  6:15                 ` David GIbson
@ 2026-05-22 12:04                 ` Stefano Brivio
  1 sibling, 0 replies; 26+ messages in thread
From: Stefano Brivio @ 2026-05-22 12:04 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: passt-dev, Jon Maloy, David GIbson

On Fri, 22 May 2026 07:44:55 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri, 22 May 2026 06:22:39 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Fri, 22 May 2026 01:13:33 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> >   
> > > On 5/21/26 10:30, Laurent Vivier wrote:    
> > > > On 5/20/26 22:53, Stefano Brivio wrote:      
> > > >> On Wed, 20 May 2026 18:18:52 +0200
> > > >> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>      
> > > >>> On Wed, 20 May 2026 18:07:08 +0200
> > > >>> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > >>>      
> > > >>>> On Wed, 20 May 2026 17:34:45 +0200
> > > >>>> Stefano Brivio <sbrivio@redhat.com> wrote:      
> > > >>>>> On Wed, 13 May 2026 13:52:08 +0200
> > > >>>>> Laurent Vivier <lvivier@redhat.com> wrote:      
> > > >>>>>> Currently, the vhost-user path assumes each virtqueue element contains
> > > >>>>>> exactly one iovec entry covering the entire frame.  This assumption
> > > >>>>>> breaks as some virtio-net drivers (notably iPXE) provide descriptors where the
> > > >>>>>> vnet header and the frame payload are in separate buffers, resulting in
> > > >>>>>> two iovec entries per virtqueue element.
> > > >>>>>>
> > > >>>>>> This series refactors the vhost-user data path so that frame lengths,
> > > >>>>>> header sizes, and padding are tracked and passed explicitly rather than
> > > >>>>>> being derived from iovec sizes.  This decoupling is a prerequisite for
> > > >>>>>> correctly handling padding of multi-buffer frames.      
> > > >>>>>
> > > >>>>> Sorry to bring (likely) bad news, but this series seems to introduce a
> > > > >>>>> regression: I got the migration/rampstream_in test failing twice in a
> > > > >>>>> row, which I've never seen happen (I think I saw a single failure a
> > > >>>>> long time ago when the machine had a high CPU load, but nothing else).
> > > >>>>>
> > > >>>>> I'm currently bisecting and the bisect seems to point towards the end
> > > >>>>> of the series (probably 10/10), but I haven't finished yet. I'll keep
> > > >>>>> you posted. I haven't spotted anything that might cause issues there.      
> > > >>>>
> > > >>>> Yeah, that's the one :(
> > > >>>>
> > > >>>> $ git bisect bad
> > > >>>> db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
> > > >>>> commit db798fc60f4c5869cb53168354e068fb4dabd91a
> > > >>>> Author: Laurent Vivier <lvivier@redhat.com>
> > > >>>> Date:   Wed May 13 13:52:18 2026 +0200
> > > >>>>
> > > >>>>      vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()      
> > > > 
> > > > I checked on my system with the commit previous to this series,
> > > > bcc3d37a6e01 ("util: Fix changes to assert_with_msg()") and rampstream_in fails too (not 
> > > > every time).
> > > >       
> > > >  > TCP/IPv4: sequence check, ramps, inbound      
> > > > ...failed.
> > > > 
> > > > and rampstream_out hangs sometimes too.
> > > > 
> > > > I'm going to try with earlier commits.
> > > 
> > > For me the problem can happen with any commit...
> > > 
> > > As it depends on the execution path and on the load and speed of the system it looks like 
> > > a race condition.    
> > 
> > Hah, thanks for checking. Maybe...
> >   
> > > Did you try to test on a host with a kernel patched with
> > > "[PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive" ?    
> > 
> > Now I tried, and yes, the test doesn't hang anymore! I seem to have an
> > issue with teardown functions on recent kernels (current net.git HEAD
> > more or less):
> > 
> > ---
> > [...]
> >
> > 2026/05/22 04:08:23 socat[73089] E connect(5, AF=40 cid:94558 port:22, 16): Connection timed out
> > Connection closed by UNKNOWN port 65535
> > ...
> > ---
> > 
> > it looks like we stop QEMU a bit too early. But it should be unrelated.

Oops, I forgot to upgrade QEMU on the virtual machine I was using to
test those kernel builds: I had a somewhat outdated 8.1 version, and
migration failed for unrelated reasons. It works with 11.0.

Back to kernel versions: the "problem" is that with a recent
net-next.git HEAD, with or without my fix, in a nested VM, the test
always passes (20/20). And I can't easily test things non-nested.

I guess I could just skip that test, for the moment, from the set I run
on git push, and run it manually in the virtual machine.

But judging from captures (test_logs/pasta_1.pcap from PCAP=1 ./run)
I'm fairly sure it's not *that* issue:

  465  12.141763    192.0.2.1 → 88.198.0.164 58451 TCP [TCP Window Full] 34416 → 10001 [PSH, ACK] Seq=10002100 Ack=1 Win=65536 Len=58397 
  466  12.187195 88.198.0.164 → 192.0.2.1    54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0 
  467  13.187281    192.0.2.1 → 88.198.0.164 4150 TCP 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096 

last data transfer from client (rampstream):

  468  13.187358 88.198.0.164 → 192.0.2.1    54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0 

everything acknowledged, migration starts now:

  469  14.143217 fe80::f471:c3ff:fe10:4e45 → ff02::2      70 ICMPv6 Router Solicitation from f6:71:c3:10:4e:45 
  470  14.687123 88.198.0.164 → 192.0.2.1    54 TCP [TCP ZeroWindow] [TCP Keep-Alive] 10001 → 34416 [ACK] Seq=0 Ack=10060497 Win=0 Len=0 

migration completed: and we acknowledge the right sequence (10060497),
so it didn't jump forward.

But starting from this point:

  471  14.687265    192.0.2.1 → 88.198.0.164 60 TCP 34416 → 10001 [ACK] Seq=10060497 Ack=1 Win=65536 Len=0 
  472  16.687412    192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096 
  473  16.687450 88.198.0.164 → 192.0.2.1    54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0 
  474  20.687650    192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096 
  475  20.687692 88.198.0.164 → 192.0.2.1    54 TCP [TCP ZeroWindow] 10001 → 34416 [ACK] Seq=1 Ack=10060497 Win=0 Len=0 
  476  28.687817    192.0.2.1 → 88.198.0.164 4150 TCP [TCP Retransmission] 34416 → 10001 [PSH, ACK] Seq=10060497 Ack=1 Win=65536 Len=4096 

we keep advertising a zero window (that's the kernel doing it really),
as if we were unable to dequeue data.

I enabled --trace just for the target instance of passt, and I don't
see anything suspicious there:

13.0958: Receiving 1 flows
13.0958: Flow 0 (NEW): FREE -> NEW
13.0958: Flow 0 (TCP connection): TGT -> TYPED
13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
13.0958: Flow 0 (TCP connection): Side 1 hash table insert: bucket: 138154
13.0958: Flow 0 (TCP connection): TYPED -> ACTIVE
13.0958: Flow 0 (TCP connection): HOST [192.0.2.1]:49892 -> [88.198.0.164]:10001 => TAP [192.0.2.1]:49892 -> [88.198.0.164]:10001
13.0959: Flow 0 (TCP connection): Extended migration data, socket 83 sequences send 3121929544 receive 1643895001
13.0959: Flow 0 (TCP connection):   pending queues: send 0 not sent 0 receive 3500081
13.0959: Flow 0 (TCP connection):   window: snd_wl1 1647395082 snd_wnd 65536 max 65536 rcv_wnd 0 rcv_wup 1647395082
13.0959: Flow 0 (TCP connection):   SO_PEEK_OFF disabled  offset=0
13.0985: Got packet, but RX virtqueue not usable yet
13.0985: Closing migration channel, fd: 82
13.0985: Closing TCP_REPAIR helper socket
13.0985: passt: epoll event on vhost-user command socket 77 (events: 0x00000001)

then the usual VHOST_USER_CHECK_DEVICE_STATE and VHOST_USER_SET_VRING_ENABLE
commands. After that, a tight loop of:

13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
13.0986: Got packet, but RX virtqueue not usable yet
13.0986: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
13.0986: Got packet, but RX virtqueue not usable yet

until we go further with the vhost-user setup. I still see this message
which I had never noticed (but I didn't try to bisect around it):

13.1006: ================ Vhost user message ================
13.1006: Request: VHOST_USER_SET_VRING_ADDR (9)
[...]
13.1006: Last avail index != used index: 3252 != 3027

and then, after VHOST_USER_SET_VRING_CALL and:

13.1008: passt: epoll event on vhost-user kick socket 78 (events: 0x00000001)
13.1008: vhost-user: got kick_data: 0000000000000001 idx: 1

it's just a tight loop of:

13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)
13.1008: passt: epoll event on connected TCP socket 83 (events: 0x00000001)

as if we weren't dequeueing anything from there.

I'm starting to suspect we might be hitting two different issues: perhaps
things fail on your setup because of the kernel bug with TCP_REPAIR not
freezing the queue, and on my setup for some other reason.

For me it's very deterministic though: with patch 10/10 things always
fail, and without it they never fail.

I guess I'll add more prints and check for more messages before/after
that patch.

-- 
Stefano




Thread overview: 26+ messages
2026-05-13 11:52 [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 01/10] iov: Introduce iov_memset() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 02/10] iov: Add iov_memcpy() to copy data between iovec arrays Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 03/10] vu_common: Move vnethdr setup into vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 04/10] udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 05/10] udp_vu: Pass iov explicitly to helpers instead of using file-scoped array Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 06/10] checksum: Pass explicit L4 length to checksum functions Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 07/10] pcap: Pass explicit L2 length to pcap_iov() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 08/10] vu_common: Pass explicit frame length to vu_flush() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 09/10] tcp: Pass explicit data length to tcp_fill_headers() Laurent Vivier
2026-05-13 11:52 ` [PATCH v4 10/10] vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad() Laurent Vivier
2026-05-14  1:24   ` David Gibson
2026-05-20  0:52 ` [PATCH v4 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element Stefano Brivio
2026-05-20 15:34 ` Stefano Brivio
2026-05-20 16:07   ` Stefano Brivio
2026-05-20 16:18     ` Stefano Brivio
2026-05-20 20:53       ` Stefano Brivio
2026-05-21  8:30         ` Laurent Vivier
2026-05-21 23:13           ` Laurent Vivier
2026-05-22  4:22             ` Stefano Brivio
2026-05-22  5:44               ` Stefano Brivio
2026-05-22  6:15                 ` David GIbson
2026-05-22  6:23                   ` Stefano Brivio
2026-05-22  6:36                     ` David GIbson
2026-05-22  6:45                       ` Stefano Brivio
2026-05-22 12:04                 ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
