From: Stefano Brivio <sbrivio@redhat.com>
To: David du Colombier <0intro@gmail.com>
Cc: passt-dev@passt.top, David Gibson <david@gibson.dropbear.id.au>,
Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
Date: Sat, 13 Jun 2026 18:24:07 +0200 (CEST) [thread overview]
Message-ID: <20260613182406.23b9f430@elisabeth> (raw)
In-Reply-To: <20260612215804.710266-1-0intro@gmail.com>
On Fri, 12 Jun 2026 23:58:04 +0200
David du Colombier <0intro@gmail.com> wrote:
> tap4_handler() requires the L2 payload after the IP header to match
> the IP datagram length exactly. Guests whose drivers pad transmitted
> frames to the 60 byte Ethernet minimum, as real hardware requires and
> as drivers modelled on hardware do (Plan 9's virtio-net, for one),
David, thanks a lot for the patch. I wasn't even aware of the fact that
Plan 9 had a virtio-net implementation.
> send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
> header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
> the exact length check and are dropped without trace.
>
> passt then never sees such a guest's acknowledgements: it
> retransmits from the lowest unacknowledged sequence with exponential
> backoff while the guest, which received and acknowledged everything,
> waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
> over --map-host-loopback measured 248 s before this change, 0.27 s
> after; bulk transfer over established connections, whose ACKs ride
> data segments above the padding threshold, is unaffected). FIN
> segments are padded too, so teardown hangs as well. Note that
> tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
> the receive path was already stricter than the send path.
Oops, right, we added padding just a few months ago because of
compatibility issues (https://bugs.passt.top/show_bug.cgi?id=166), but
it didn't occur to me that the receiving side wouldn't accept it.
In some sense, after those changes, passt might not even accept some
of its own frames.
> Trim the trailing padding to the IP datagram length instead, using a
> new iov_tail_trim() helper, and keep dropping frames genuinely
> shorter than the datagram they claim to carry. IPv6 is unaffected:
> its minimal TCP frame is 74 bytes, above the padding threshold.
The patch looks good to me, I'd just give David (Gibson) and Laurent
(both Cc'ed) a chance to review it before merging it, as they're
definitely more familiar than I am with the whole iov_*() machinery.
> Signed-off-by: David du Colombier <0intro@gmail.com>
> ---
> iov.c | 35 +++++++++++++++++++++++++++++++++++
> iov.h | 2 ++
> tap.c | 11 ++++++++++-
> 3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/iov.c b/iov.c
> index 6fd684a..968a365 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>
> return j;
> }
> +
> +/**
> + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
> + * @tail: Pointer to the iov_tail to trim; rebuilt on success to
> + * reference @scratch
> + * @len: Number of bytes to keep
> + * @scratch: Scratch iovec array backing the trimmed tail; must stay
> + * valid as long as the trimmed tail is in use
> + * @scratch_cnt: Number of elements in @scratch
> + *
> + * Return: true on success, false if @tail is shorter than @len or does
> + * not fit in @scratch (@tail is unchanged on failure)
> + */
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> + struct iovec *scratch, size_t scratch_cnt)
> +{
> + ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
> + size_t left = len;
> + unsigned int i;
> +
> + if (cnt < 0)
> + return false;
> +
> + for (i = 0; i < (size_t)cnt && left; i++) {
> + if (scratch[i].iov_len > left)
> + scratch[i].iov_len = left;
> + left -= scratch[i].iov_len;
> + }
> +
> + if (left)
> + return false;
> +
> + *tail = IOV_TAIL(scratch, i, 0);
> + return true;
> +}
> diff --git a/iov.h b/iov.h
> index 4fdf14a..3af467e 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
> void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
> ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
> struct iov_tail *tail);
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> + struct iovec *scratch, size_t scratch_cnt);
>
> /**
> * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> diff --git a/tap.c b/tap.c
> index 4cba4c7..b929b21 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
> i = 0;
> resume:
> for (seq_count = 0, seq = NULL; i < in->count; i++) {
> + struct iovec trim_iov[UIO_MAXIOV];
> size_t l3len, hlen, l4len;
> struct ethhdr eh_storage;
> struct iphdr iph_storage;
> @@ -775,7 +776,15 @@ resume:
>
> if (!iov_drop_header(&data, hlen))
> continue;
> - if (iov_tail_size(&data) != l4len)
> + if (iov_tail_size(&data) < l4len)
> + continue;
> +
> + /* Drivers modelled on real hardware (Plan 9's virtio, for
> + * one) pad short frames to the 60 byte Ethernet minimum:
> + * trim trailing padding instead of dropping the packet.
> + */
> + if (iov_tail_size(&data) > l4len &&
> + !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))
> continue;
>
> if (iph->protocol == IPPROTO_ICMP) {
--
Stefano
prev parent reply other threads:[~2026-06-13 16:24 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-12 21:58 David du Colombier
2026-06-13 16:24 ` Stefano Brivio [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260613182406.23b9f430@elisabeth \
--to=sbrivio@redhat.com \
--cc=0intro@gmail.com \
--cc=david@gibson.dropbear.id.au \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).