public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed
@ 2025-12-05  0:51 Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) " Stefano Brivio
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

Patch 1/5 handles the easy non-batched case with a copy to a padded
buffer (only if needed).

Patches 2/5 and 3/5 clean coding style up before further changes.

Patch 4/5 deals with TCP and UDP batched frames in non-vhost-user modes.

Patch 5/5 is for batched frames in vhost-user mode instead.

v2:
  * in 1/5, calculate vnet_len *after* padding
  * in 5/5:
    - pass the right pointers to memset()
    - explicitly request at least ETH_ZLEN bytes from vu_collect()

Stefano Brivio (5):
  tap: Pad non-batched frames to 802.3 minimum (60 bytes) if needed
  tcp: Fix coding style for comment to enum tcp_iov_parts
  udp: Fix coding style for comment to enum udp_iov_idx
  tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in
    non-vhost-user modes
  tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3
    minimum)

 tap.c          | 11 ++++++++++-
 tcp.c          |  2 --
 tcp_buf.c      | 23 +++++++++++++++++++++++
 tcp_internal.h |  5 ++++-
 tcp_vu.c       | 32 ++++++++++++++++++++++++++++----
 udp.c          | 31 ++++++++++++++++++++++++++-----
 udp_vu.c       | 13 ++++++++++++-
 util.c         |  3 +++
 util.h         |  3 +++
 9 files changed, 109 insertions(+), 14 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) if needed
  2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
@ 2025-12-05  0:51 ` Stefano Brivio
  2025-12-05  3:23   ` David Gibson
  2025-12-05  0:51 ` [PATCH v2 2/5] tcp: Fix coding style for comment to enum tcp_iov_parts Stefano Brivio
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

IEEE 802.3 requires a minimum frame payload of 46 bytes, without a
802.1Q tag. Add padding for the simple tap_send_single() case using
a zero-filled 60-byte buffer and copying data to it if needed.

In theory, we could add a further element in the iovec array, say:

	uint8_t padding[ETH_ZLEN] = { 0 };
	struct iovec iov[3];

	...

		if (l2len < ETH_ZLEN) {
			iov[iovcnt].iov_base = (void *)padding;
			iov[iovcnt].iov_len = ETH_ZLEN - l2len;
			iovcnt++;
		}

and avoid a copy, but that would substantially complicate the
vhost-user path, and it's questionable whether passing a reference
to a further buffer actually causes lower overhead than the simple
copy.

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tap.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tap.c b/tap.c
index 44b0644..e3ea61c 100644
--- a/tap.c
+++ b/tap.c
@@ -130,9 +130,18 @@ unsigned long tap_l2_max_len(const struct ctx *c)
  */
 void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
 {
-	uint32_t vnet_len = htonl(l2len);
+	uint8_t padded[ETH_ZLEN] = { 0 };
 	struct iovec iov[2];
 	size_t iovcnt = 0;
+	uint32_t vnet_len;
+
+	if (l2len < ETH_ZLEN) {
+		memcpy(padded, data, l2len);
+		data = padded;
+		l2len = ETH_ZLEN;
+	}
+
+	vnet_len = htonl(l2len);
 
 	switch (c->mode) {
 	case MODE_PASST:
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 2/5] tcp: Fix coding style for comment to enum tcp_iov_parts
  2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) " Stefano Brivio
@ 2025-12-05  0:51 ` Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 3/5] udp: Fix coding style for comment to enum udp_iov_idx Stefano Brivio
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

...as I'm going to add a new value to it.

Fixes: 95601237ef82 ("tcp: Replace TCP buffer structure by an iovec array")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tcp_internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcp_internal.h b/tcp_internal.h
index f55025c..19e8922 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -57,7 +57,7 @@
 #define CONN_V4(conn)		(!!inany_v4(&TAPFLOW(conn)->oaddr))
 #define CONN_V6(conn)		(!CONN_V4(conn))
 
-/*
+/**
  * enum tcp_iov_parts - I/O vector parts for one TCP frame
  * @TCP_IOV_TAP		tap backend specific header
  * @TCP_IOV_ETH		Ethernet header
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 3/5] udp: Fix coding style for comment to enum udp_iov_idx
  2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) " Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 2/5] tcp: Fix coding style for comment to enum tcp_iov_parts Stefano Brivio
@ 2025-12-05  0:51 ` Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum) Stefano Brivio
  4 siblings, 0 replies; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

...as I'm going to add a new value to it.

Fixes: 3f9bd867b585 ("udp: Split tap-bound UDP packets into multiple buffers using io vector")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 udp.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/udp.c b/udp.c
index 4b625b7..b93c18b 100644
--- a/udp.c
+++ b/udp.c
@@ -164,11 +164,11 @@ udp_meta[UDP_MAX_FRAMES];
 
 /**
  * enum udp_iov_idx - Indices for the buffers making up a single UDP frame
- * @UDP_IOV_TAP         tap specific header
- * @UDP_IOV_ETH         Ethernet header
- * @UDP_IOV_IP          IP (v4/v6) header
- * @UDP_IOV_PAYLOAD     IP payload (UDP header + data)
- * @UDP_NUM_IOVS        the number of entries in the iovec array
+ * @UDP_IOV_TAP		tap specific header
+ * @UDP_IOV_ETH		Ethernet header
+ * @UDP_IOV_IP		IP (v4/v6) header
+ * @UDP_IOV_PAYLOAD	IP payload (UDP header + data)
+ * @UDP_NUM_IOVS	the number of entries in the iovec array
  */
 enum udp_iov_idx {
 	UDP_IOV_TAP,
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes
  2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
                   ` (2 preceding siblings ...)
  2025-12-05  0:51 ` [PATCH v2 3/5] udp: Fix coding style for comment to enum udp_iov_idx Stefano Brivio
@ 2025-12-05  0:51 ` Stefano Brivio
  2025-12-05  5:48   ` Stefano Brivio
  2025-12-05  0:51 ` [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum) Stefano Brivio
  4 siblings, 1 reply; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

Add a further iovec frame part, TCP_IOV_ETH_PAD for TCP and
UDP_IOV_ETH_PAD for UDP, after the payload, make that point to a
zero-filled buffer, and send out a part of it if needed to reach
the minimum frame length given by 802.3, that is, 60 bytes altogether.

The frames we might need to pad are IPv4 only (the IPv6 header is
larger), and are typically TCP ACK segments but can also be small
data segments or datagrams.

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 tcp_buf.c      | 23 +++++++++++++++++++++++
 tcp_internal.h |  2 ++
 udp.c          | 21 +++++++++++++++++++++
 util.c         |  3 +++
 util.h         |  3 +++
 5 files changed, 52 insertions(+)

diff --git a/tcp_buf.c b/tcp_buf.c
index 2058225..5d419d3 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -96,6 +96,7 @@ void tcp_sock_iov_init(const struct ctx *c)
 		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp_payload_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_len = sizeof(struct ethhdr);
 		iov[TCP_IOV_PAYLOAD].iov_base = &tcp_payload[i];
+		iov[TCP_IOV_ETH_PAD].iov_base = eth_pad;
 	}
 }
 
@@ -144,6 +145,22 @@ void tcp_payload_flush(const struct ctx *c)
 	tcp_payload_used = 0;
 }
 
+/**
+ * tcp_l2_buf_pad() - Calculate padding to send out of padding (zero) buffer
+ * @iov:	Pointer to iovec of frame parts we're about to send
+ */
+static void tcp_l2_buf_pad(struct iovec *iov)
+{
+	size_t l2len = iov[TCP_IOV_ETH].iov_len +
+		       iov[TCP_IOV_IP].iov_len +
+		       iov[TCP_IOV_PAYLOAD].iov_len;
+
+	if (l2len < ETH_ZLEN)
+		iov[TCP_IOV_ETH_PAD].iov_len = ETH_ZLEN - l2len;
+	else
+		iov[TCP_IOV_ETH_PAD].iov_len = 0;
+}
+
 /**
  * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers
  * @c:		Execution context
@@ -212,6 +229,8 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	iov[TCP_IOV_PAYLOAD].iov_len = l4len;
 	tcp_l2_buf_fill_headers(c, conn, iov, NULL, seq, false);
 
+	tcp_l2_buf_pad(iov);
+
 	if (flags & DUP_ACK) {
 		struct iovec *dup_iov = tcp_l2_iov[tcp_payload_used];
 		tcp_frame_conns[tcp_payload_used++] = conn;
@@ -223,6 +242,7 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 		memcpy(dup_iov[TCP_IOV_PAYLOAD].iov_base,
 		       iov[TCP_IOV_PAYLOAD].iov_base, l4len);
 		dup_iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+		dup_iov[TCP_IOV_ETH_PAD].iov_len = iov[TCP_IOV_ETH_PAD].iov_len;
 	}
 
 	if (tcp_payload_used > TCP_FRAMES_MEM - 2)
@@ -270,6 +290,9 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 	payload->th.psh = push;
 	iov[TCP_IOV_PAYLOAD].iov_len = dlen + sizeof(struct tcphdr);
 	tcp_l2_buf_fill_headers(c, conn, iov, check, seq, false);
+
+	tcp_l2_buf_pad(iov);
+
 	if (++tcp_payload_used > TCP_FRAMES_MEM - 1)
 		tcp_payload_flush(c);
 }
diff --git a/tcp_internal.h b/tcp_internal.h
index 19e8922..5f8fb35 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -63,6 +63,7 @@
  * @TCP_IOV_ETH		Ethernet header
  * @TCP_IOV_IP		IP (v4/v6) header
  * @TCP_IOV_PAYLOAD	IP payload (TCP header + data)
+ * @TCP_IOV_ETH_PAD	Ethernet (802.3) padding to 60 bytes
  * @TCP_NUM_IOVS 	the number of entries in the iovec array
  */
 enum tcp_iov_parts {
@@ -70,6 +71,7 @@ enum tcp_iov_parts {
 	TCP_IOV_ETH	= 1,
 	TCP_IOV_IP	= 2,
 	TCP_IOV_PAYLOAD	= 3,
+	TCP_IOV_ETH_PAD	= 4,
 	TCP_NUM_IOVS
 };
 
diff --git a/udp.c b/udp.c
index b93c18b..f32f553 100644
--- a/udp.c
+++ b/udp.c
@@ -168,6 +168,7 @@ udp_meta[UDP_MAX_FRAMES];
  * @UDP_IOV_ETH		Ethernet header
  * @UDP_IOV_IP		IP (v4/v6) header
  * @UDP_IOV_PAYLOAD	IP payload (UDP header + data)
+ * @UDP_IOV_ETH_PAD	Ethernet (802.3) padding to 60 bytes
  * @UDP_NUM_IOVS	the number of entries in the iovec array
  */
 enum udp_iov_idx {
@@ -175,6 +176,7 @@ enum udp_iov_idx {
 	UDP_IOV_ETH,
 	UDP_IOV_IP,
 	UDP_IOV_PAYLOAD,
+	UDP_IOV_ETH_PAD,
 	UDP_NUM_IOVS,
 };
 
@@ -239,6 +241,7 @@ static void udp_iov_init_one(const struct ctx *c, size_t i)
 	tiov[UDP_IOV_ETH] = IOV_OF_LVALUE(udp_eth_hdr[i]);
 	tiov[UDP_IOV_TAP] = tap_hdr_iov(c, &meta->taph);
 	tiov[UDP_IOV_PAYLOAD].iov_base = payload;
+	tiov[UDP_IOV_ETH_PAD].iov_base = eth_pad;
 
 	mh->msg_iov	= siov;
 	mh->msg_iovlen	= 1;
@@ -344,6 +347,22 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
 	return l4len;
 }
 
+/**
+ * udp_tap_pad() - Calculate padding to send out of padding (zero) buffer
+ * @iov:	Pointer to iovec of frame parts we're about to send
+ */
+static void udp_tap_pad(struct iovec *iov)
+{
+	size_t l2len = iov[UDP_IOV_ETH].iov_len +
+		       iov[UDP_IOV_IP].iov_len +
+		       iov[UDP_IOV_PAYLOAD].iov_len;
+
+	if (l2len < ETH_ZLEN)
+		iov[UDP_IOV_ETH_PAD].iov_len = ETH_ZLEN - l2len;
+	else
+		iov[UDP_IOV_ETH_PAD].iov_len = 0;
+}
+
 /**
  * udp_tap_prepare() - Convert one datagram into a tap frame
  * @mmh:	Receiving mmsghdr array
@@ -379,6 +398,8 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
 		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip4h);
 	}
 	(*tap_iov)[UDP_IOV_PAYLOAD].iov_len = l4len;
+
+	udp_tap_pad(*tap_iov);
 }
 
 /**
diff --git a/util.c b/util.c
index 590373d..d2039df 100644
--- a/util.c
+++ b/util.c
@@ -39,6 +39,9 @@
 #include <sys/random.h>
 #endif
 
+/* Zero-filled buffer to pad 802.3 frames, up to 60 (ETH_ZLEN) bytes */
+uint8_t eth_pad[ETH_ZLEN] = { 0 };
+
 /**
  * sock_l4_() - Create and bind socket to socket address
  * @c:		Execution context
diff --git a/util.h b/util.h
index 40de694..326012c 100644
--- a/util.h
+++ b/util.h
@@ -17,6 +17,7 @@
 #include <arpa/inet.h>
 #include <unistd.h>
 #include <sys/syscall.h>
+#include <net/ethernet.h>
 
 #include "log.h"
 
@@ -152,6 +153,8 @@ void abort_with_msg(const char *fmt, ...)
 #define ntohll(x)		(be64toh((x)))
 #define htonll(x)		(htobe64((x)))
 
+extern uint8_t eth_pad[ETH_ZLEN];
+
 /**
  * ntohl_unaligned() - Read 32-bit BE value from a possibly unaligned address
  * @p:		Pointer to the BE value in memory
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum)
  2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
                   ` (3 preceding siblings ...)
  2025-12-05  0:51 ` [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes Stefano Brivio
@ 2025-12-05  0:51 ` Stefano Brivio
  2025-12-05  4:07   ` David Gibson
  4 siblings, 1 reply; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  0:51 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

For both TCP and UDP, we request vhost-user buffers that are large
enough to reach ETH_ZLEN (60 bytes), so padding is just a matter of
increasing the appropriate iov_len and clearing bytes in the buffer
as needed.

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c          |  2 --
 tcp_internal.h |  1 +
 tcp_vu.c       | 32 ++++++++++++++++++++++++++++----
 udp_vu.c       | 13 ++++++++++++-
 4 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tcp.c b/tcp.c
index c5486bc..8cd062f 100644
--- a/tcp.c
+++ b/tcp.c
@@ -341,8 +341,6 @@ enum {
 };
 #endif
 
-/* MSS rounding: see SET_MSS() */
-#define MSS_DEFAULT			536
 #define WINDOW_DEFAULT			14600		/* RFC 6928 */
 
 #define RTO_INIT			1		/* s, RFC 6298 */
diff --git a/tcp_internal.h b/tcp_internal.h
index 5f8fb35..d2295c9 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -12,6 +12,7 @@
 #define BUF_DISCARD_SIZE	(1 << 20)
 #define DISCARD_IOV_NUM		DIV_ROUND_UP(MAX_WINDOW, BUF_DISCARD_SIZE)
 
+#define MSS_DEFAULT /* and minimum */	536 /* as it comes from minimum MTU */
 #define MSS4				ROUND_DOWN(IP_MAX_MTU -		   \
 						   sizeof(struct tcphdr) - \
 						   sizeof(struct iphdr),   \
diff --git a/tcp_vu.c b/tcp_vu.c
index 1c81ce3..638813c 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -60,6 +60,26 @@ static size_t tcp_vu_hdrlen(bool v6)
 	return hdrlen;
 }
 
+/**
+ * tcp_vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
+ * @iov:	iovec array storing 802.3 frame with TCP segment inside
+ * @cnt:	Number of entries in @iov
+ */
+static void tcp_vu_pad(struct iovec *iov, size_t cnt)
+{
+	size_t l2len, pad;
+
+	ASSERT(iov_size(iov, cnt) >= sizeof(struct virtio_net_hdr_mrg_rxbuf));
+	l2len = iov_size(iov, cnt) - sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	if (l2len >= ETH_ZLEN)
+		return;
+
+	pad = ETH_ZLEN - l2len;
+
+	memset((char *)iov[cnt - 1].iov_base + iov[cnt - 1].iov_len, 0, pad);
+	iov[cnt - 1].iov_len += pad;
+}
+
 /**
  * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload)
  * @c:		Execution context
@@ -91,12 +111,11 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	vu_set_element(&flags_elem[0], NULL, &flags_iov[0]);
 
 	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
-			      hdrlen + sizeof(struct tcp_syn_opts), NULL);
+			      MAX(hdrlen + sizeof(*opts), ETH_ZLEN), NULL);
 	if (elem_cnt != 1)
 		return -1;
 
-	ASSERT(flags_elem[0].in_sg[0].iov_len >=
-	       hdrlen + sizeof(struct tcp_syn_opts));
+	ASSERT(flags_elem[0].in_sg[0].iov_len >= hdrlen + sizeof(*opts));
 
 	vu_set_vnethdr(vdev, flags_elem[0].in_sg[0].iov_base, 1);
 
@@ -138,6 +157,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	tcp_fill_headers(c, conn, NULL, eh, ip4h, ip6h, th, &payload,
 			 NULL, seq, !*c->pcap);
 
+	tcp_vu_pad(&flags_elem[0].in_sg[0], 1);
+
 	if (*c->pcap) {
 		pcap_iov(&flags_elem[0].in_sg[0], 1,
 			 sizeof(struct virtio_net_hdr_mrg_rxbuf));
@@ -211,7 +232,8 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
 
 		cnt = vu_collect(vdev, vq, &elem[elem_cnt],
 				 VIRTQUEUE_MAX_SIZE - elem_cnt,
-				 MIN(mss, fillsize) + hdrlen, &frame_size);
+				 MAX(MIN(mss, fillsize) + hdrlen, ETH_ZLEN),
+				 &frame_size);
 		if (cnt == 0)
 			break;
 
@@ -456,6 +478,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 
 		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
 
+		tcp_vu_pad(iov, buf_cnt);
+
 		if (*c->pcap) {
 			pcap_iov(iov, buf_cnt,
 				 sizeof(struct virtio_net_hdr_mrg_rxbuf));
diff --git a/udp_vu.c b/udp_vu.c
index 099677f..33dbb9a 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -72,8 +72,8 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
 {
 	const struct vu_dev *vdev = c->vdev;
 	int iov_cnt, idx, iov_used;
+	size_t off, hdrlen, l2len;
 	struct msghdr msg  = { 0 };
-	size_t off, hdrlen;
 
 	ASSERT(!c->no_udp);
 
@@ -116,6 +116,17 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
 		iov_vu[idx].iov_len = off;
 	iov_used = idx + !!off;
 
+	/* pad 802.3 frame to 60 bytes if needed */
+	l2len = *dlen + hdrlen - sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	if (l2len < ETH_ZLEN) {
+		size_t pad = ETH_ZLEN - l2len;
+
+		memset((char *)iov_vu[idx].iov_base + iov_vu[idx].iov_len,
+		       0, pad);
+
+		iov_vu[idx].iov_len += pad;
+	}
+
 	vu_set_vnethdr(vdev, iov_vu[0].iov_base, iov_used);
 
 	/* release unused buffers */
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) if needed
  2025-12-05  0:51 ` [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) " Stefano Brivio
@ 2025-12-05  3:23   ` David Gibson
  0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2025-12-05  3:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]

On Fri, Dec 05, 2025 at 01:51:53AM +0100, Stefano Brivio wrote:
> IEEE 802.3 requires a minimum frame payload of 46 bytes, without a
> 802.1Q tag. Add padding for the simple tap_send_single() case using
> a zero-filled 60-byte buffer and copying data to it if needed.
> 
> In theory, we could add a further element in the iovec array, say:
> 
> 	uint8_t padding[ETH_ZLEN] = { 0 };
> 	struct iovec iov[3];
> 
> 	...
> 
> 		if (l2len < ETH_ZLEN) {
> 			iov[iovcnt].iov_base = (void *)padding;
> 			iov[iovcnt].iov_len = ETH_ZLEN - l2len;
> 			iovcnt++;
> 		}
> 
> and avoid a copy, but that would substantially complicate the
> vhost-user path, and it's questionable whether passing a reference
> to a further buffer actually causes lower overhead than the simple
> copy.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=166
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tap.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tap.c b/tap.c
> index 44b0644..e3ea61c 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -130,9 +130,18 @@ unsigned long tap_l2_max_len(const struct ctx *c)
>   */
>  void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
>  {
> -	uint32_t vnet_len = htonl(l2len);
> +	uint8_t padded[ETH_ZLEN] = { 0 };
>  	struct iovec iov[2];
>  	size_t iovcnt = 0;
> +	uint32_t vnet_len;
> +
> +	if (l2len < ETH_ZLEN) {
> +		memcpy(padded, data, l2len);
> +		data = padded;
> +		l2len = ETH_ZLEN;
> +	}
> +
> +	vnet_len = htonl(l2len);
>  
>  	switch (c->mode) {
>  	case MODE_PASST:
> -- 
> 2.43.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum)
  2025-12-05  0:51 ` [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum) Stefano Brivio
@ 2025-12-05  4:07   ` David Gibson
  2025-12-06  1:26     ` Stefano Brivio
  0 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2025-12-05  4:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

[-- Attachment #1: Type: text/plain, Size: 5847 bytes --]

On Fri, Dec 05, 2025 at 01:51:57AM +0100, Stefano Brivio wrote:
> For both TCP and UDP, we request vhost-user buffers that are large
> enough to reach ETH_ZLEN (60 bytes), so padding is just a matter of
> increasing the appropriate iov_len and clearing bytes in the buffer
> as needed.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=166
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tcp.c          |  2 --
>  tcp_internal.h |  1 +
>  tcp_vu.c       | 32 ++++++++++++++++++++++++++++----
>  udp_vu.c       | 13 ++++++++++++-
>  4 files changed, 41 insertions(+), 7 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index c5486bc..8cd062f 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -341,8 +341,6 @@ enum {
>  };
>  #endif
>  
> -/* MSS rounding: see SET_MSS() */
> -#define MSS_DEFAULT			536
>  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
>  
>  #define RTO_INIT			1		/* s, RFC 6298 */
> diff --git a/tcp_internal.h b/tcp_internal.h
> index 5f8fb35..d2295c9 100644
> --- a/tcp_internal.h
> +++ b/tcp_internal.h
> @@ -12,6 +12,7 @@
>  #define BUF_DISCARD_SIZE	(1 << 20)
>  #define DISCARD_IOV_NUM		DIV_ROUND_UP(MAX_WINDOW, BUF_DISCARD_SIZE)
>  
> +#define MSS_DEFAULT /* and minimum */	536 /* as it comes from minimum MTU */
>  #define MSS4				ROUND_DOWN(IP_MAX_MTU -		   \
>  						   sizeof(struct tcphdr) - \
>  						   sizeof(struct iphdr),   \
> diff --git a/tcp_vu.c b/tcp_vu.c
> index 1c81ce3..638813c 100644
> --- a/tcp_vu.c
> +++ b/tcp_vu.c
> @@ -60,6 +60,26 @@ static size_t tcp_vu_hdrlen(bool v6)
>  	return hdrlen;
>  }
>  
> +/**
> + * tcp_vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
> + * @iov:	iovec array storing 802.3 frame with TCP segment inside
> + * @cnt:	Number of entries in @iov
> + */
> +static void tcp_vu_pad(struct iovec *iov, size_t cnt)
> +{
> +	size_t l2len, pad;
> +
> +	ASSERT(iov_size(iov, cnt) >= sizeof(struct virtio_net_hdr_mrg_rxbuf));
> +	l2len = iov_size(iov, cnt) - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	if (l2len >= ETH_ZLEN)
> +		return;
> +
> +	pad = ETH_ZLEN - l2len;
> +
> +	memset((char *)iov[cnt - 1].iov_base + iov[cnt - 1].iov_len, 0, pad);
> +	iov[cnt - 1].iov_len += pad;

This assumes there's enough allocated space in last buffer to extend
it this way.  I'm pretty sure that's true in practice, but it's not
super obvious from right here.

I can't see a way to avoid that without doing a substantial rework of
both paths.  But we should at least document the assumption in a
comment.

Also, there's not anything really TCP specific about this, so we
should be able to re-use it for UDP, no?

> +}
> +
>  /**
>   * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload)
>   * @c:		Execution context
> @@ -91,12 +111,11 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
>  	vu_set_element(&flags_elem[0], NULL, &flags_iov[0]);
>  
>  	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
> -			      hdrlen + sizeof(struct tcp_syn_opts), NULL);
> +			      MAX(hdrlen + sizeof(*opts), ETH_ZLEN), NULL);
>  	if (elem_cnt != 1)
>  		return -1;
>  
> -	ASSERT(flags_elem[0].in_sg[0].iov_len >=
> -	       hdrlen + sizeof(struct tcp_syn_opts));
> +	ASSERT(flags_elem[0].in_sg[0].iov_len >= hdrlen + sizeof(*opts));
>  
>  	vu_set_vnethdr(vdev, flags_elem[0].in_sg[0].iov_base, 1);
>  
> @@ -138,6 +157,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)

A few lines above here is where we truncate iov_len[] to match the
actual length of the frame.  Replace that with a call to the
pad/truncate function.

>  	tcp_fill_headers(c, conn, NULL, eh, ip4h, ip6h, th, &payload,
>  			 NULL, seq, !*c->pcap);
>  
> +	tcp_vu_pad(&flags_elem[0].in_sg[0], 1);
> +
>  	if (*c->pcap) {
>  		pcap_iov(&flags_elem[0].in_sg[0], 1,
>  			 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> @@ -211,7 +232,8 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
>  
>  		cnt = vu_collect(vdev, vq, &elem[elem_cnt],
>  				 VIRTQUEUE_MAX_SIZE - elem_cnt,
> -				 MIN(mss, fillsize) + hdrlen, &frame_size);
> +				 MAX(MIN(mss, fillsize) + hdrlen, ETH_ZLEN),
> +				 &frame_size);
>  		if (cnt == 0)
>  			break;
>  
> @@ -456,6 +478,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  
>  		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
>  
> +		tcp_vu_pad(iov, buf_cnt);
> +
>  		if (*c->pcap) {
>  			pcap_iov(iov, buf_cnt,
>  				 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> diff --git a/udp_vu.c b/udp_vu.c
> index 099677f..33dbb9a 100644
> --- a/udp_vu.c
> +++ b/udp_vu.c
> @@ -72,8 +72,8 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
>  {
>  	const struct vu_dev *vdev = c->vdev;
>  	int iov_cnt, idx, iov_used;
> +	size_t off, hdrlen, l2len;
>  	struct msghdr msg  = { 0 };
> -	size_t off, hdrlen;
>  
>  	ASSERT(!c->no_udp);
>  
> @@ -116,6 +116,17 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
>  		iov_vu[idx].iov_len = off;
>  	iov_used = idx + !!off;
>  
> +	/* pad 802.3 frame to 60 bytes if needed */
> +	l2len = *dlen + hdrlen - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	if (l2len < ETH_ZLEN) {
> +		size_t pad = ETH_ZLEN - l2len;
> +
> +		memset((char *)iov_vu[idx].iov_base + iov_vu[idx].iov_len,
> +		       0, pad);
> +
> +		iov_vu[idx].iov_len += pad;
> +	}
> +
>  	vu_set_vnethdr(vdev, iov_vu[0].iov_base, iov_used);
>  
>  	/* release unused buffers */
> -- 
> 2.43.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes
  2025-12-05  0:51 ` [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes Stefano Brivio
@ 2025-12-05  5:48   ` Stefano Brivio
  2025-12-05 11:27     ` Stefano Brivio
  0 siblings, 1 reply; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05  5:48 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

On Fri,  5 Dec 2025 01:51:56 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> Add a further iovec frame part, TCP_IOV_ETH_PAD for TCP and
> UDP_IOV_ETH_PAD for UDP, after the payload, make that point to a
> zero-filled buffer, and send out a part of it if needed to reach
> the minimum frame length given by 802.3, that is, 60 bytes altogether.
> 
> The frames we might need to pad are IPv4 only (the IPv6 header is
> larger), and are typically TCP ACK segments but can also be small
> data segments or datagrams.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=166
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>

...for some reason, in combination with the previous series with TCP
throughput fixes, this patch now seems to break basic transfers ("large
transfer", IPv4, guest to host), with passt only.

I'm fairly sure it didn't cause failures when I ran tests on this
series alone, at least twice, once now and once for the RFC version, but
it might also be that I missed running tests (in isolation) for some
reason.

No idea where the issue is, yet...

-- 
Stefano


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes
  2025-12-05  5:48   ` Stefano Brivio
@ 2025-12-05 11:27     ` Stefano Brivio
  0 siblings, 0 replies; 11+ messages in thread
From: Stefano Brivio @ 2025-12-05 11:27 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

On Fri, 5 Dec 2025 06:48:42 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri,  5 Dec 2025 01:51:56 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > Add a further iovec frame part, TCP_IOV_ETH_PAD for TCP and
> > UDP_IOV_ETH_PAD for UDP, after the payload, make that point to a
> > zero-filled buffer, and send out a part of it if needed to reach
> > the minimum frame length given by 802.3, that is, 60 bytes altogether.
> > 
> > The frames we might need to pad are IPv4 only (the IPv6 header is
> > larger), and are typically TCP ACK segments but can also be small
> > data segments or datagrams.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=166
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Laurent Vivier <lvivier@redhat.com>  
> 
> ...for some reason, in combination with the previous series with TCP
> throughput fixes, this patch now seems to break basic transfers ("large
> transfer", IPv4, guest to host), with passt only.

Whoa, it turns out I didn't test this one at all, sorry for that, I
don't know how it happened.

This "clearly" needs:

---
diff --git a/tcp.c b/tcp.c
index c5486bc..7140b22 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1037,7 +1037,7 @@ void tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
 	else
 		tcp_update_csum(psum, th, payload);
 
-	tap_hdr_update(taph, l3len + sizeof(struct ethhdr));
+	tap_hdr_update(taph, MAX(l3len + sizeof(struct ethhdr), ETH_ZLEN));
 }
 
 /**
diff --git a/udp.c b/udp.c
index f32f553..08bec50 100644
--- a/udp.c
+++ b/udp.c
@@ -381,19 +381,25 @@ static void udp_tap_prepare(const struct mmsghdr *mmh,
 	struct ethhdr *eh = (*tap_iov)[UDP_IOV_ETH].iov_base;
 	struct udp_payload_t *bp = &udp_payload[idx];
 	struct udp_meta_t *bm = &udp_meta[idx];
-	size_t l4len;
+	size_t l4len, l2len;
 
 	eth_update_mac(eh, NULL, tap_omac);
 	if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
 		l4len = udp_update_hdr6(&bm->ip6h, bp, toside,
 					mmh[idx].msg_len, no_udp_csum);
-		tap_hdr_update(&bm->taph, l4len + sizeof(bm->ip6h) + ETH_HLEN);
+
+		l2len = MAX(l4len + sizeof(bm->ip6h) + ETH_HLEN, ETH_ZLEN);
+		tap_hdr_update(&bm->taph, l2len);
+
 		eh->h_proto = htons_constant(ETH_P_IPV6);
 		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip6h);
 	} else {
 		l4len = udp_update_hdr4(&bm->ip4h, bp, toside,
 					mmh[idx].msg_len, no_udp_csum);
-		tap_hdr_update(&bm->taph, l4len + sizeof(bm->ip4h) + ETH_HLEN);
+
+		l2len = MAX(l4len + sizeof(bm->ip4h) + ETH_HLEN, ETH_ZLEN);
+		tap_hdr_update(&bm->taph, l2len);
+
 		eh->h_proto = htons_constant(ETH_P_IP);
 		(*tap_iov)[UDP_IOV_IP] = IOV_OF_LVALUE(bm->ip4h);
 	}
---

on top. I'll respin the whole series in a bit (once I addressed David's
comments to 5/5 as well).

-- 
Stefano


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum)
  2025-12-05  4:07   ` David Gibson
@ 2025-12-06  1:26     ` Stefano Brivio
  0 siblings, 0 replies; 11+ messages in thread
From: Stefano Brivio @ 2025-12-06  1:26 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Fri, 5 Dec 2025 15:07:27 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Dec 05, 2025 at 01:51:57AM +0100, Stefano Brivio wrote:
> > For both TCP and UDP, we request vhost-user buffers that are large
> > enough to reach ETH_ZLEN (60 bytes), so padding is just a matter of
> > increasing the appropriate iov_len and clearing bytes in the buffer
> > as needed.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=166
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> >  tcp.c          |  2 --
> >  tcp_internal.h |  1 +
> >  tcp_vu.c       | 32 ++++++++++++++++++++++++++++----
> >  udp_vu.c       | 13 ++++++++++++-
> >  4 files changed, 41 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tcp.c b/tcp.c
> > index c5486bc..8cd062f 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -341,8 +341,6 @@ enum {
> >  };
> >  #endif
> >  
> > -/* MSS rounding: see SET_MSS() */
> > -#define MSS_DEFAULT			536
> >  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
> >  
> >  #define RTO_INIT			1		/* s, RFC 6298 */
> > diff --git a/tcp_internal.h b/tcp_internal.h
> > index 5f8fb35..d2295c9 100644
> > --- a/tcp_internal.h
> > +++ b/tcp_internal.h
> > @@ -12,6 +12,7 @@
> >  #define BUF_DISCARD_SIZE	(1 << 20)
> >  #define DISCARD_IOV_NUM		DIV_ROUND_UP(MAX_WINDOW, BUF_DISCARD_SIZE)
> >  
> > +#define MSS_DEFAULT /* and minimum */	536 /* as it comes from minimum MTU */
> >  #define MSS4				ROUND_DOWN(IP_MAX_MTU -		   \
> >  						   sizeof(struct tcphdr) - \
> >  						   sizeof(struct iphdr),   \
> > diff --git a/tcp_vu.c b/tcp_vu.c
> > index 1c81ce3..638813c 100644
> > --- a/tcp_vu.c
> > +++ b/tcp_vu.c
> > @@ -60,6 +60,26 @@ static size_t tcp_vu_hdrlen(bool v6)
> >  	return hdrlen;
> >  }
> >  
> > +/**
> > + * tcp_vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
> > + * @iov:	iovec array storing 802.3 frame with TCP segment inside
> > + * @cnt:	Number of entries in @iov
> > + */
> > +static void tcp_vu_pad(struct iovec *iov, size_t cnt)
> > +{
> > +	size_t l2len, pad;
> > +
> > +	ASSERT(iov_size(iov, cnt) >= sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > +	l2len = iov_size(iov, cnt) - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +	if (l2len >= ETH_ZLEN)
> > +		return;
> > +
> > +	pad = ETH_ZLEN - l2len;
> > +
> > +	memset((char *)iov[cnt - 1].iov_base + iov[cnt - 1].iov_len, 0, pad);
> > +	iov[cnt - 1].iov_len += pad;  
> 
> This assumes there's enough allocated space in last buffer to extend
> it this way.  I'm pretty sure that's true in practice, but it's not
> super obvious from right here.
> 
> I can't see a way to avoid that without doing a substantial rework of
> both paths.  But we should at least document the assumption in a
> comment.

Oops, that's actually not guaranteed. The only assumption we have is
that the first buffer is at least 54 bytes long for IPv4:

	/* we guess the first iovec provided by the guest can embed
	 * all the headers needed by L2 frame
	 */
	ASSERT(iov[0].iov_len >= hdrlen);

because as far as I understand there's no known vhost-user
implementation that would give us less than 1k or so, and we request at
least 60 bytes from vu_collect(), but that might lead for example to a
situation with iov[n].iov_len = < 54, 1, 5 > and we would write 6 bytes
to the last element.

I could have switched to something like a iov_from_buf() call from the
60-byte eth_pad[] I added in 4/5, but there's no version adjusting
iov_len, and we probably wouldn't need it for anything else.

So I changed approach altogether: I'll just ASSERT() that we have at
least 60 bytes in the first buffer. It's just 6 bytes on top in the
worst case, and still much less than any buffer we might actually see
in practice.

> Also, there's not anything really TCP specific about this, so we
> should be able to re-use it for UDP, no?

Renamed to vu_pad() and moved to vu_common.c.

> > +}
> > +
> >  /**
> >   * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload)
> >   * @c:		Execution context
> > @@ -91,12 +111,11 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
> >  	vu_set_element(&flags_elem[0], NULL, &flags_iov[0]);
> >  
> >  	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
> > -			      hdrlen + sizeof(struct tcp_syn_opts), NULL);
> > +			      MAX(hdrlen + sizeof(*opts), ETH_ZLEN), NULL);
> >  	if (elem_cnt != 1)
> >  		return -1;
> >  
> > -	ASSERT(flags_elem[0].in_sg[0].iov_len >=
> > -	       hdrlen + sizeof(struct tcp_syn_opts));
> > +	ASSERT(flags_elem[0].in_sg[0].iov_len >= hdrlen + sizeof(*opts));
> >  
> >  	vu_set_vnethdr(vdev, flags_elem[0].in_sg[0].iov_base, 1);
> >  
> > @@ -138,6 +157,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)  
> 
> A few lines above here is where we truncate iov_len[] to match the
> actual length of the frame.  Replace that with a call to the
> pad/truncate function.

Ah, no need anymore, as the first buffer is now the only buffer that
might ever need padding.

> >  	tcp_fill_headers(c, conn, NULL, eh, ip4h, ip6h, th, &payload,
> >  			 NULL, seq, !*c->pcap);
> >  
> > +	tcp_vu_pad(&flags_elem[0].in_sg[0], 1);
> > +
> >  	if (*c->pcap) {
> >  		pcap_iov(&flags_elem[0].in_sg[0], 1,
> >  			 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > @@ -211,7 +232,8 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
> >  
> >  		cnt = vu_collect(vdev, vq, &elem[elem_cnt],
> >  				 VIRTQUEUE_MAX_SIZE - elem_cnt,
> > -				 MIN(mss, fillsize) + hdrlen, &frame_size);
> > +				 MAX(MIN(mss, fillsize) + hdrlen, ETH_ZLEN),
> > +				 &frame_size);
> >  		if (cnt == 0)
> >  			break;
> >  
> > @@ -456,6 +478,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  
> >  		tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
> >  
> > +		tcp_vu_pad(iov, buf_cnt);
> > +
> >  		if (*c->pcap) {
> >  			pcap_iov(iov, buf_cnt,
> >  				 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > diff --git a/udp_vu.c b/udp_vu.c
> > index 099677f..33dbb9a 100644
> > --- a/udp_vu.c
> > +++ b/udp_vu.c
> > @@ -72,8 +72,8 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> >  {
> >  	const struct vu_dev *vdev = c->vdev;
> >  	int iov_cnt, idx, iov_used;
> > +	size_t off, hdrlen, l2len;
> >  	struct msghdr msg  = { 0 };
> > -	size_t off, hdrlen;
> >  
> >  	ASSERT(!c->no_udp);
> >  
> > @@ -116,6 +116,17 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> >  		iov_vu[idx].iov_len = off;
> >  	iov_used = idx + !!off;
> >  
> > +	/* pad 802.3 frame to 60 bytes if needed */
> > +	l2len = *dlen + hdrlen - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +	if (l2len < ETH_ZLEN) {
> > +		size_t pad = ETH_ZLEN - l2len;
> > +
> > +		memset((char *)iov_vu[idx].iov_base + iov_vu[idx].iov_len,
> > +		       0, pad);
> > +
> > +		iov_vu[idx].iov_len += pad;
> > +	}
> > +
> >  	vu_set_vnethdr(vdev, iov_vu[0].iov_base, iov_used);
> >  
> >  	/* release unused buffers */
> > -- 
> > 2.43.0

-- 
Stefano


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-12-06  1:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-05  0:51 [PATCH v2 0/5] Pad all inbound frames to 802.3 minimum size if needed Stefano Brivio
2025-12-05  0:51 ` [PATCH v2 1/5] tap: Pad non-batched frames to 802.3 minimum (60 bytes) " Stefano Brivio
2025-12-05  3:23   ` David Gibson
2025-12-05  0:51 ` [PATCH v2 2/5] tcp: Fix coding style for comment to enum tcp_iov_parts Stefano Brivio
2025-12-05  0:51 ` [PATCH v2 3/5] udp: Fix coding style for comment to enum udp_iov_idx Stefano Brivio
2025-12-05  0:51 ` [PATCH v2 4/5] tcp, udp: Pad batched frames to 60 bytes (802.3 minimum) in non-vhost-user modes Stefano Brivio
2025-12-05  5:48   ` Stefano Brivio
2025-12-05 11:27     ` Stefano Brivio
2025-12-05  0:51 ` [PATCH v2 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum) Stefano Brivio
2025-12-05  4:07   ` David Gibson
2025-12-06  1:26     ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).