From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202606 header.b=CdCOib/B; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id D53995A026E for ; Mon, 22 Jun 2026 05:35:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202606; t=1782099351; bh=mIJyZVX2arPN7+4iNTMdGzH06apv/d1QIAcshL1Hbh8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=CdCOib/Bxsw0JSfAiDqwEoQkUaSfEpqvHO9tltYMd+fSDNcDywD1SA/pU5x4fxs8G dOGsPS8G870lkC+MfouPhRRM7dUQAeoaVJbxPxJuxTTeVocxIJyQNcGwaUNjmE7Pl8 aPPEwHRcBe/RJgwLrdQxx0oIVq8oUUJfjo3T+WgosSUyZPT6RaAM9AzuiZ0UjvAWhq 4Qllqxkob07W1zVgAfj9CxwCZ0TBKI6OHPdASFwv6XFNv2ItlPk6aB79xtzwwr+pjA iEevP0qrqVuF5jPooOMFBIYwZ/yN4eioUftuSqDjDytOKL6ffyIlqgICeeiEGxCe+K cUZGqJg8XpfjQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4gkDPM5yH0z4wGv; Mon, 22 Jun 2026 13:35:51 +1000 (AEST) Date: Mon, 22 Jun 2026 12:53:45 +1000 From: David Gibson To: David du Colombier <0intro@gmail.com> Subject: Re: [PATCH v2] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them Message-ID: References: <20260612214338.12345-1-0intro@gmail.com> <20260616163705.1285803-1-0intro@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="+rHgJau+iJr43zPp" Content-Disposition: inline In-Reply-To: <20260616163705.1285803-1-0intro@gmail.com> Message-ID-Hash: IJBSERBQH5BDSGCYFEQGHV6KV2LDZCAQ X-Message-ID-Hash: IJBSERBQH5BDSGCYFEQGHV6KV2LDZCAQ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, lvivier@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --+rHgJau+iJr43zPp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 16, 2026 at 06:37:05PM +0200, David du Colombier wrote: > tap4_handler() requires the L2 payload after the IP header to match > the IP datagram length exactly. Guests whose drivers pad transmitted > frames to the 60 byte Ethernet minimum, as real hardware requires and > as drivers modelled on hardware do (Plan 9's virtio-net, for one), > send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet > header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail > the exact length check and are dropped without trace. >=20 > passt then never sees such a guest's acknowledgements: it > retransmits from the lowest unacknowledged sequence with exponential > backoff while the guest, which received and acknowledged everything, > waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch > over --map-host-loopback measured 248 s before this change, 0.27 s > after; bulk transfer over established connections, whose ACKs ride > data segments above the padding threshold, is unaffected). FIN > segments are padded too, so teardown hangs as well. Note that > tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so > the receive path was already stricter than the send path. >=20 > Trim the trailing padding to the IP datagram length instead, using a > new iov_tail_trim() helper, and keep dropping frames genuinely > shorter than the datagram they claim to carry. IPv6 is unaffected: > its minimal TCP frame is 74 bytes, above the padding threshold. >=20 > Signed-off-by: David du Colombier <0intro@gmail.com> > --- > v2: > - store the iov_tail_size() result in a local variable instead of > computing it twice > - pass ARRAY_SIZE(trim_iov) instead of UIO_MAXIOV This is addressing a real, nasty, bug and I'm glad this is merged. Couple of thoughts that we might want to consider in the future, though. [snip] > +/** > + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array > + * @tail: Pointer to the iov_tail to trim; rebuilt on success to > + * reference @scratch > + * @len: Number of bytes to keep > + * @scratch: Scratch iovec array backing the trimmed tail; must stay > + * valid as long as the trimmed tail is in use > + * @scratch_cnt: Number of elements in @scratch > + * > + * Return: true on success, false if @tail is shorter than @len or does > + * not fit in @scratch (@tail is unchanged on failure) > + */ This interface seems kind of fragile - it imposes a pretty subtle lifetime constraint on @scratch. It also arguably breaks the model of an iov_tail() as a "window" into an underlying, immutable iovec array. I've been wondering for a while if we should extend (and rename) iov_tail to be such a window that can skip bytes from either end of the underlying iovec[], not just the beginning - so it would track a (byte) length as well as offset field. That would make merge iov_tail_clone() and iov_tail_trim(): take the "window" and transcribe it in fresh iov[]. As with existing iov_tail_clone() calls, we'd generally want to leave that to the last possible moment to avoid either repeatedly copying or tricky lifetime dependencies between different iovec arrays. Laurent, I feel like that could be useful for some of the vhost-user stuff you've done. Any thoughts? > +bool iov_tail_trim(struct iov_tail *tail, size_t len, > + struct iovec *scratch, size_t scratch_cnt) > +{ > + ssize_t cnt =3D iov_tail_clone(scratch, scratch_cnt, tail); > + size_t left =3D len; > + unsigned int i; > + > + if (cnt < 0) > + return false; > + > + for (i =3D 0; i < (size_t)cnt && left; i++) { > + if (scratch[i].iov_len > left) > + scratch[i].iov_len =3D left; > + left -=3D scratch[i].iov_len; > + } > + > + if (left) > + return false; > + > + *tail =3D IOV_TAIL(scratch, i, 0); > + return true; > +} > diff --git a/iov.h b/iov.h > index 4fdf14a..3af467e 100644 > --- a/iov.h > +++ b/iov.h > @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const vo= id *v, size_t len); > void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, siz= e_t align); > ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt, > struct iov_tail *tail); > +bool iov_tail_trim(struct iov_tail *tail, size_t len, > + struct iovec *scratch, size_t scratch_cnt); > =20 > /** > * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail > diff --git a/tap.c b/tap.c > index 4cba4c7..6ed7f8d 100644 > --- a/tap.c > +++ b/tap.c > @@ -716,7 +716,8 @@ static int tap4_handler(struct ctx *c, const struct p= ool *in, > i =3D 0; > resume: > for (seq_count =3D 0, seq =3D NULL; i < in->count; i++) { > - size_t l3len, hlen, l4len; > + struct iovec trim_iov[UIO_MAXIOV]; > + size_t l3len, hlen, l4len, check; > struct ethhdr eh_storage; > struct iphdr iph_storage; > struct udphdr uh_storage; > @@ -775,7 +776,17 @@ resume: > =20 > if (!iov_drop_header(&data, hlen)) > continue; > - if (iov_tail_size(&data) !=3D l4len) > + > + check =3D iov_tail_size(&data); > + if (check < l4len) > + continue; > + > + /* Drivers modelled on real hardware (Plan 9's virtio, for > + * one) pad short frames to the 60 byte Ethernet minimum: > + * trim trailing padding instead of dropping the packet. > + */ > + if (check > l4len && > + !iov_tail_trim(&data, l4len, trim_iov, ARRAY_SIZE(trim_iov))) > continue; > =20 > if (iph->protocol =3D=3D IPPROTO_ICMP) { > --=20 > 2.54.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --+rHgJau+iJr43zPp Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmo4o60ACgkQzQJF27ox 2GdHLhAAn1fTCWy+gUpJf5Pon0IdmBe7L7DVYVDvJFwt2VlqxqZ2bV4DokFt2m8M tnB7iTQ9eYVOKNhB5zSZtxu0HM4Reu2M9qeZA/2T4mi6RzoByNSMwZ6O9iN2k89d He7Ji+ok6LecgzHzwCAwtLiUSXATF1+7CIHRsillcEZwt8KDKhctf8STE3BpRmMP W5idME/Z0buTkrixcSlSffqHYxAWPG/RAGrb7l+YxoA1PMfYg/5qItabj/GT81yg V2v8ckdby6mAN/kP0ntdeESBFfJupwDMPu+CxRcadJohuLXYr2TfvLf9ya0q9oHe JJhMszhaCKqWyQsAnDhX1icIUvNhww5x7ZnAqaRnvvoD1g9qrw7k2DZwpK4sEJBQ Srirmj+WB+C1jlEOhrU1iNN0XJOO3/XDORdS0vfVw3iOXt4GsW/mswSSf3uOSGAv 61dlWdiGFfugC4Gc8U1UGS5J/jdaKZc7w2HX84oHNu0L2QhcLcA+83UztMkw8Y2O FTSHvhqmG/cVTUCp+TtDs4TOl6XNpyj1+KkAd3lw31RGtf/xu2rkwA116HkCtObn lwKJB0mObpUvQjtZ/GVbHrzL2/e99w97+Kf1ET96K5o1p9Ft/cixvBwa5elwd3uk 07sLDys5Jvc0VYr9am7U+OtPD8XAkboyyxXUGlg2ABgj5DzZDmc= =/Rvi -----END PGP SIGNATURE----- --+rHgJau+iJr43zPp--