public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David du Colombier <0intro@gmail.com>
Cc: passt-dev@passt.top, David Gibson <david@gibson.dropbear.id.au>,
	Laurent Vivier <lvivier@redhat.com>
Subject: Re: [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
Date: Sat, 13 Jun 2026 18:24:07 +0200 (CEST)	[thread overview]
Message-ID: <20260613182406.23b9f430@elisabeth> (raw)
In-Reply-To: <20260612215804.710266-1-0intro@gmail.com>

On Fri, 12 Jun 2026 23:58:04 +0200
David du Colombier <0intro@gmail.com> wrote:

> tap4_handler() requires the L2 payload after the IP header to match
> the IP datagram length exactly. Guests whose drivers pad transmitted
> frames to the 60 byte Ethernet minimum, as real hardware requires and
> as drivers modelled on hardware do (Plan 9's virtio-net, for one),

David, thanks a lot for the patch. I wasn't even aware of the fact that
Plan 9 had a virtio-net implementation.

> send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
> header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
> the exact length check and are dropped without trace.
> 
> passt then never sees such a guest's acknowledgements: it
> retransmits from the lowest unacknowledged sequence with exponential
> backoff while the guest, which received and acknowledged everything,
> waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
> over --map-host-loopback measured 248 s before this change, 0.27 s
> after; bulk transfer over established connections, whose ACKs ride
> data segments above the padding threshold, is unaffected). FIN
> segments are padded too, so teardown hangs as well. Note that
> tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
> the receive path was already stricter than the send path.

Oops, right, we added padding just a few months ago because of
compatibility issues (https://bugs.passt.top/show_bug.cgi?id=166), but
it didn't occur to me that the receiving side wouldn't accept it.

In some sense, after those changes, passt might not even accept some
of its own frames.

> Trim the trailing padding to the IP datagram length instead, using a
> new iov_tail_trim() helper, and keep dropping frames genuinely
> shorter than the datagram they claim to carry. IPv6 is unaffected:
> its minimal TCP frame is 74 bytes, above the padding threshold.

The patch looks good to me, I'd just give David (Gibson) and Laurent
(both Cc'ed) a chance to review it before merging it, as they're
definitely more familiar than I am with the whole iov_*() machinery.

> Signed-off-by: David du Colombier <0intro@gmail.com>
> ---
>  iov.c | 35 +++++++++++++++++++++++++++++++++++
>  iov.h |  2 ++
>  tap.c | 11 ++++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/iov.c b/iov.c
> index 6fd684a..968a365 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>  
>  	return j;
>  }
> +
> +/**
> + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
> + * @tail:	 Pointer to the iov_tail to trim; rebuilt on success to
> + *		 reference @scratch
> + * @len:	 Number of bytes to keep
> + * @scratch:	 Scratch iovec array backing the trimmed tail; must stay
> + *		 valid as long as the trimmed tail is in use
> + * @scratch_cnt: Number of elements in @scratch
> + *
> + * Return: true on success, false if @tail is shorter than @len or does
> + *	   not fit in @scratch (@tail is unchanged on failure)
> + */
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt)
> +{
> +	ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
> +	size_t left = len;
> +	unsigned int i;
> +
> +	if (cnt < 0)
> +		return false;
> +
> +	for (i = 0; i < (size_t)cnt && left; i++) {
> +		if (scratch[i].iov_len > left)
> +			scratch[i].iov_len = left;
> +		left -= scratch[i].iov_len;
> +	}
> +
> +	if (left)
> +		return false;
> +
> +	*tail = IOV_TAIL(scratch, i, 0);
> +	return true;
> +}
> diff --git a/iov.h b/iov.h
> index 4fdf14a..3af467e 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
>  void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
>  ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>  		       struct iov_tail *tail);
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt);
>  
>  /**
>   * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> diff --git a/tap.c b/tap.c
> index 4cba4c7..b929b21 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
>  	i = 0;
>  resume:
>  	for (seq_count = 0, seq = NULL; i < in->count; i++) {
> +		struct iovec trim_iov[UIO_MAXIOV];
>  		size_t l3len, hlen, l4len;
>  		struct ethhdr eh_storage;
>  		struct iphdr iph_storage;
> @@ -775,7 +776,15 @@ resume:
>  
>  		if (!iov_drop_header(&data, hlen))
>  			continue;
> -		if (iov_tail_size(&data) != l4len)
> +		if (iov_tail_size(&data) < l4len)
> +			continue;
> +
> +		/* Drivers modelled on real hardware (Plan 9's virtio, for
> +		 * one) pad short frames to the 60 byte Ethernet minimum:
> +		 * trim trailing padding instead of dropping the packet.
> +		 */
> +		if (iov_tail_size(&data) > l4len &&
> +		    !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))
>  			continue;
>  
>  		if (iph->protocol == IPPROTO_ICMP) {

-- 
Stefano


      reply	other threads:[~2026-06-13 16:24 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12 21:58 David du Colombier
2026-06-13 16:24 ` Stefano Brivio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260613182406.23b9f430@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=0intro@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).