public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Laurent Vivier <lvivier@redhat.com>
To: David du Colombier <0intro@gmail.com>, passt-dev@passt.top
Cc: sbrivio@redhat.com
Subject: Re: [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
Date: Mon, 15 Jun 2026 10:47:47 +0200	[thread overview]
Message-ID: <2192831c-fd2b-4825-ab65-5483484ff1b8@redhat.com> (raw)
In-Reply-To: <20260612215804.710266-1-0intro@gmail.com>

On 6/12/26 23:58, David du Colombier wrote:
> tap4_handler() requires the L2 payload after the IP header to match
> the IP datagram length exactly. Guests whose drivers pad transmitted
> frames to the 60 byte Ethernet minimum, as real hardware requires and
> as drivers modelled on hardware do (Plan 9's virtio-net, for one),
> send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
> header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
> the exact length check and are dropped without trace.
> 
> passt then never sees such a guest's acknowledgements: it
> retransmits from the lowest unacknowledged sequence with exponential
> backoff while the guest, which received and acknowledged everything,
> waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
> over --map-host-loopback measured 248 s before this change, 0.27 s
> after; bulk transfer over established connections, whose ACKs ride
> data segments above the padding threshold, is unaffected). FIN
> segments are padded too, so teardown hangs as well. Note that
> tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
> the receive path was already stricter than the send path.
> 
> Trim the trailing padding to the IP datagram length instead, using a
> new iov_tail_trim() helper, and keep dropping frames genuinely
> shorter than the datagram they claim to carry. IPv6 is unaffected:
> its minimal TCP frame is 74 bytes, above the padding threshold.
> 
> Signed-off-by: David du Colombier <0intro@gmail.com>
> ---
>   iov.c | 35 +++++++++++++++++++++++++++++++++++
>   iov.h |  2 ++
>   tap.c | 11 ++++++++++-
>   3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/iov.c b/iov.c
> index 6fd684a..968a365 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>   
>   	return j;
>   }
> +
> +/**
> + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
> + * @tail:	 Pointer to the iov_tail to trim; rebuilt on success to
> + *		 reference @scratch
> + * @len:	 Number of bytes to keep
> + * @scratch:	 Scratch iovec array backing the trimmed tail; must stay
> + *		 valid as long as the trimmed tail is in use
> + * @scratch_cnt: Number of elements in @scratch
> + *
> + * Return: true on success, false if @tail is shorter than @len or does
> + *	   not fit in @scratch (@tail is unchanged on failure)
> + */
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt)
> +{
> +	ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
> +	size_t left = len;
> +	unsigned int i;
> +
> +	if (cnt < 0)
> +		return false;
> +
> +	for (i = 0; i < (size_t)cnt && left; i++) {
> +		if (scratch[i].iov_len > left)
> +			scratch[i].iov_len = left;
> +		left -= scratch[i].iov_len;
> +	}
> +
> +	if (left)
> +		return false;
> +
> +	*tail = IOV_TAIL(scratch, i, 0);
> +	return true;
> +}
> diff --git a/iov.h b/iov.h
> index 4fdf14a..3af467e 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
>   void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
>   ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>   		       struct iov_tail *tail);
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt);
>   
>   /**
>    * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> diff --git a/tap.c b/tap.c
> index 4cba4c7..b929b21 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
>   	i = 0;
>   resume:
>   	for (seq_count = 0, seq = NULL; i < in->count; i++) {
> +		struct iovec trim_iov[UIO_MAXIOV];
>   		size_t l3len, hlen, l4len;
>   		struct ethhdr eh_storage;
>   		struct iphdr iph_storage;
> @@ -775,7 +776,15 @@ resume:
>   
>   		if (!iov_drop_header(&data, hlen))
>   			continue;
> -		if (iov_tail_size(&data) != l4len)
> +		if (iov_tail_size(&data) < l4len)
> +			continue;
> +
> +		/* Drivers modelled on real hardware (Plan 9's virtio, for
> +		 * one) pad short frames to the 60 byte Ethernet minimum:
> +		 * trim trailing padding instead of dropping the packet.
> +		 */
> +		if (iov_tail_size(&data) > l4len &&

Perhaps we can avoid to compute twice iov_tail_size(&data) and store it in a variable?

> +		    !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))

I think ARRAY_SIZE(trim_iov) would be a better choice than UIO_MAXIOV.

>   			continue;
>   
>   		if (iph->protocol == IPPROTO_ICMP) {

otherwise LGTM.

Thanks,
Laurent


      parent reply	other threads:[~2026-06-15  8:47 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12 21:58 David du Colombier
2026-06-13 16:24 ` Stefano Brivio
2026-06-15  8:47 ` Laurent Vivier [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2192831c-fd2b-4825-ab65-5483484ff1b8@redhat.com \
    --to=lvivier@redhat.com \
    --cc=0intro@gmail.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).