* [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
@ 2026-06-12 21:58 David du Colombier
2026-06-13 16:24 ` Stefano Brivio
0 siblings, 1 reply; 2+ messages in thread
From: David du Colombier @ 2026-06-12 21:58 UTC (permalink / raw)
To: passt-dev; +Cc: sbrivio, David du Colombier
tap4_handler() requires the L2 payload after the IP header to match
the IP datagram length exactly. Guests whose drivers pad transmitted
frames to the 60 byte Ethernet minimum, as real hardware requires and
as drivers modelled on hardware do (Plan 9's virtio-net, for one),
send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
the exact length check and are dropped without trace.
passt then never sees such a guest's acknowledgements: it
retransmits from the lowest unacknowledged sequence with exponential
backoff while the guest, which received and acknowledged everything,
waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
over --map-host-loopback measured 248 s before this change, 0.27 s
after; bulk transfer over established connections, whose ACKs ride
data segments above the padding threshold, is unaffected). FIN
segments are padded too, so teardown hangs as well. Note that
tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
the receive path was already stricter than the send path.
Trim the trailing padding to the IP datagram length instead, using a
new iov_tail_trim() helper, and keep dropping frames genuinely
shorter than the datagram they claim to carry. IPv6 is unaffected:
its minimal TCP frame is 74 bytes, above the padding threshold.
Signed-off-by: David du Colombier <0intro@gmail.com>
---
iov.c | 35 +++++++++++++++++++++++++++++++++++
iov.h | 2 ++
tap.c | 11 ++++++++++-
3 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/iov.c b/iov.c
index 6fd684a..968a365 100644
--- a/iov.c
+++ b/iov.c
@@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
return j;
}
+
+/**
+ * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
+ * @tail: Pointer to the iov_tail to trim; rebuilt on success to
+ * reference @scratch
+ * @len: Number of bytes to keep
+ * @scratch: Scratch iovec array backing the trimmed tail; must stay
+ * valid as long as the trimmed tail is in use
+ * @scratch_cnt: Number of elements in @scratch
+ *
+ * Return: true on success, false if @tail is shorter than @len or does
+ * not fit in @scratch (@tail is unchanged on failure)
+ */
+bool iov_tail_trim(struct iov_tail *tail, size_t len,
+ struct iovec *scratch, size_t scratch_cnt)
+{
+ ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
+ size_t left = len;
+ unsigned int i;
+
+ if (cnt < 0)
+ return false;
+
+ for (i = 0; i < (size_t)cnt && left; i++) {
+ if (scratch[i].iov_len > left)
+ scratch[i].iov_len = left;
+ left -= scratch[i].iov_len;
+ }
+
+ if (left)
+ return false;
+
+ *tail = IOV_TAIL(scratch, i, 0);
+ return true;
+}
diff --git a/iov.h b/iov.h
index 4fdf14a..3af467e 100644
--- a/iov.h
+++ b/iov.h
@@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
struct iov_tail *tail);
+bool iov_tail_trim(struct iov_tail *tail, size_t len,
+ struct iovec *scratch, size_t scratch_cnt);
/**
* IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
diff --git a/tap.c b/tap.c
index 4cba4c7..b929b21 100644
--- a/tap.c
+++ b/tap.c
@@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
i = 0;
resume:
for (seq_count = 0, seq = NULL; i < in->count; i++) {
+ struct iovec trim_iov[UIO_MAXIOV];
size_t l3len, hlen, l4len;
struct ethhdr eh_storage;
struct iphdr iph_storage;
@@ -775,7 +776,15 @@ resume:
if (!iov_drop_header(&data, hlen))
continue;
- if (iov_tail_size(&data) != l4len)
+ if (iov_tail_size(&data) < l4len)
+ continue;
+
+ /* Drivers modelled on real hardware (Plan 9's virtio, for
+ * one) pad short frames to the 60 byte Ethernet minimum:
+ * trim trailing padding instead of dropping the packet.
+ */
+ if (iov_tail_size(&data) > l4len &&
+ !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))
continue;
if (iph->protocol == IPPROTO_ICMP) {
--
2.54.0
^ permalink raw reply [flat|nested] 2+ messages in thread* Re: [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
2026-06-12 21:58 [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them David du Colombier
@ 2026-06-13 16:24 ` Stefano Brivio
0 siblings, 0 replies; 2+ messages in thread
From: Stefano Brivio @ 2026-06-13 16:24 UTC (permalink / raw)
To: David du Colombier; +Cc: passt-dev, David Gibson, Laurent Vivier
On Fri, 12 Jun 2026 23:58:04 +0200
David du Colombier <0intro@gmail.com> wrote:
> tap4_handler() requires the L2 payload after the IP header to match
> the IP datagram length exactly. Guests whose drivers pad transmitted
> frames to the 60 byte Ethernet minimum, as real hardware requires and
> as drivers modelled on hardware do (Plan 9's virtio-net, for one),
David, thanks a lot for the patch. I wasn't even aware of the fact that
Plan 9 had a virtio-net implementation.
> send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
> header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
> the exact length check and are dropped without trace.
>
> passt then never sees such a guest's acknowledgements: it
> retransmits from the lowest unacknowledged sequence with exponential
> backoff while the guest, which received and acknowledged everything,
> waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
> over --map-host-loopback measured 248 s before this change, 0.27 s
> after; bulk transfer over established connections, whose ACKs ride
> data segments above the padding threshold, is unaffected). FIN
> segments are padded too, so teardown hangs as well. Note that
> tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
> the receive path was already stricter than the send path.
Oops, right, we added padding just a few months ago because of
compatibility issues (https://bugs.passt.top/show_bug.cgi?id=166), but
it didn't occur to me that the receiving side wouldn't accept it.
In some sense, after those changes, passt might not even accept some
of its own frames.
> Trim the trailing padding to the IP datagram length instead, using a
> new iov_tail_trim() helper, and keep dropping frames genuinely
> shorter than the datagram they claim to carry. IPv6 is unaffected:
> its minimal TCP frame is 74 bytes, above the padding threshold.
The patch looks good to me, I'd just give David (Gibson) and Laurent
(both Cc'ed) a chance to review it before merging it, as they're
definitely more familiar than I am with the whole iov_*() machinery.
> Signed-off-by: David du Colombier <0intro@gmail.com>
> ---
> iov.c | 35 +++++++++++++++++++++++++++++++++++
> iov.h | 2 ++
> tap.c | 11 ++++++++++-
> 3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/iov.c b/iov.c
> index 6fd684a..968a365 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>
> return j;
> }
> +
> +/**
> + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
> + * @tail: Pointer to the iov_tail to trim; rebuilt on success to
> + * reference @scratch
> + * @len: Number of bytes to keep
> + * @scratch: Scratch iovec array backing the trimmed tail; must stay
> + * valid as long as the trimmed tail is in use
> + * @scratch_cnt: Number of elements in @scratch
> + *
> + * Return: true on success, false if @tail is shorter than @len or does
> + * not fit in @scratch (@tail is unchanged on failure)
> + */
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> + struct iovec *scratch, size_t scratch_cnt)
> +{
> + ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
> + size_t left = len;
> + unsigned int i;
> +
> + if (cnt < 0)
> + return false;
> +
> + for (i = 0; i < (size_t)cnt && left; i++) {
> + if (scratch[i].iov_len > left)
> + scratch[i].iov_len = left;
> + left -= scratch[i].iov_len;
> + }
> +
> + if (left)
> + return false;
> +
> + *tail = IOV_TAIL(scratch, i, 0);
> + return true;
> +}
> diff --git a/iov.h b/iov.h
> index 4fdf14a..3af467e 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
> void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
> ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> struct iov_tail *tail);
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> + struct iovec *scratch, size_t scratch_cnt);
>
> /**
> * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> diff --git a/tap.c b/tap.c
> index 4cba4c7..b929b21 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
> i = 0;
> resume:
> for (seq_count = 0, seq = NULL; i < in->count; i++) {
> + struct iovec trim_iov[UIO_MAXIOV];
> size_t l3len, hlen, l4len;
> struct ethhdr eh_storage;
> struct iphdr iph_storage;
> @@ -775,7 +776,15 @@ resume:
>
> if (!iov_drop_header(&data, hlen))
> continue;
> - if (iov_tail_size(&data) != l4len)
> + if (iov_tail_size(&data) < l4len)
> + continue;
> +
> + /* Drivers modelled on real hardware (Plan 9's virtio, for
> + * one) pad short frames to the 60 byte Ethernet minimum:
> + * trim trailing padding instead of dropping the packet.
> + */
> + if (iov_tail_size(&data) > l4len &&
> + !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))
> continue;
>
> if (iph->protocol == IPPROTO_ICMP) {
--
Stefano
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-06-13 16:24 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-06-12 21:58 [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them David du Colombier
2026-06-13 16:24 ` Stefano Brivio
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).