From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Brivio <sbrivio@redhat.com>
To: David du Colombier <0intro@gmail.com>
CC: passt-dev@passt.top, David Gibson, Laurent Vivier
Subject: Re: [PATCH] tap: Trim Ethernet padding from short IPv4 frames instead of dropping them
Date: Sat, 13 Jun 2026 18:24:07 +0200 (CEST)
Message-ID: <20260613182406.23b9f430@elisabeth>
In-Reply-To: <20260612215804.710266-1-0intro@gmail.com>
References: <20260612215804.710266-1-0intro@gmail.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Fri, 12 Jun 2026 23:58:04 +0200
David du Colombier <0intro@gmail.com> wrote:

> tap4_handler() requires the L2 payload after the IP header to match
> the IP datagram length exactly. Guests whose drivers pad transmitted
> frames to the 60 byte Ethernet minimum, as real hardware requires and
> as drivers modelled on hardware do (Plan 9's virtio-net, for one),

David, thanks a lot for the patch.
I wasn't even aware of the fact that Plan 9 had a virtio-net
implementation.

> send pure ACK and FIN segments as 60 byte frames: 14 byte Ethernet
> header, 40 byte IPv4 datagram, 6 padding octets. Those frames fail
> the exact length check and are dropped without trace.
>
> passt then never sees such a guest's acknowledgements: it
> retransmits from the lowest unacknowledged sequence with exponential
> backoff while the guest, which received and acknowledged everything,
> waits. Every fresh connection stalls for minutes (a 1 MiB HTTP fetch
> over --map-host-loopback measured 248 s before this change, 0.27 s
> after; bulk transfer over established connections, whose ACKs ride
> data segments above the padding threshold, is unaffected). FIN
> segments are padded too, so teardown hangs as well. Note that
> tap_send_single() pads passt's own outbound frames to ETH_ZLEN, so
> the receive path was already stricter than the send path.

Oops, right, we added padding just a few months ago because of
compatibility issues (https://bugs.passt.top/show_bug.cgi?id=166), but
it didn't occur to me that the receiving side wouldn't accept it.

In some sense, after those changes, passt might not even accept some of
its own frames.

> Trim the trailing padding to the IP datagram length instead, using a
> new iov_tail_trim() helper, and keep dropping frames genuinely
> shorter than the datagram they claim to carry. IPv6 is unaffected:
> its minimal TCP frame is 74 bytes, above the padding threshold.

The patch looks good to me, I'd just give David (Gibson) and Laurent
(both Cc'ed) a chance to review it before merging it, as they're
definitely more familiar than I am with the whole iov_*() machinery.

> Signed-off-by: David du Colombier <0intro@gmail.com>
> ---
>  iov.c | 35 +++++++++++++++++++++++++++++++++++
>  iov.h |  2 ++
>  tap.c | 11 ++++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/iov.c b/iov.c
> index 6fd684a..968a365 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -450,3 +450,38 @@ ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>
>  	return j;
>  }
> +
> +/**
> + * iov_tail_trim() - Limit a tail to @len bytes via a scratch iovec array
> + * @tail:		Pointer to the iov_tail to trim; rebuilt on success to
> + *			reference @scratch
> + * @len:		Number of bytes to keep
> + * @scratch:		Scratch iovec array backing the trimmed tail; must stay
> + *			valid as long as the trimmed tail is in use
> + * @scratch_cnt:	Number of elements in @scratch
> + *
> + * Return: true on success, false if @tail is shorter than @len or does
> + *	   not fit in @scratch (@tail is unchanged on failure)
> + */
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt)
> +{
> +	ssize_t cnt = iov_tail_clone(scratch, scratch_cnt, tail);
> +	size_t left = len;
> +	unsigned int i;
> +
> +	if (cnt < 0)
> +		return false;
> +
> +	for (i = 0; i < (size_t)cnt && left; i++) {
> +		if (scratch[i].iov_len > left)
> +			scratch[i].iov_len = left;
> +		left -= scratch[i].iov_len;
> +	}
> +
> +	if (left)
> +		return false;
> +
> +	*tail = IOV_TAIL(scratch, i, 0);
> +	return true;
> +}
> diff --git a/iov.h b/iov.h
> index 4fdf14a..3af467e 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -97,6 +97,8 @@ size_t iov_push_header_(struct iov_tail *tail, const void *v, size_t len);
>  void *iov_remove_header_(struct iov_tail *tail, void *v, size_t len, size_t align);
>  ssize_t iov_tail_clone(struct iovec *dst_iov, size_t dst_iov_cnt,
>  		       struct iov_tail *tail);
> +bool iov_tail_trim(struct iov_tail *tail, size_t len,
> +		   struct iovec *scratch, size_t scratch_cnt);
>
>  /**
>   * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail
> diff --git a/tap.c b/tap.c
> index 4cba4c7..b929b21 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -716,6 +716,7 @@ static int tap4_handler(struct ctx *c, const struct pool *in,
>  	i = 0;
>  resume:
>  	for (seq_count = 0, seq = NULL; i < in->count; i++) {
> +		struct iovec trim_iov[UIO_MAXIOV];
>  		size_t l3len, hlen, l4len;
>  		struct ethhdr eh_storage;
>  		struct iphdr iph_storage;
> @@ -775,7 +776,15 @@ resume:
>
>  		if (!iov_drop_header(&data, hlen))
>  			continue;
> -		if (iov_tail_size(&data) != l4len)
> +		if (iov_tail_size(&data) < l4len)
> +			continue;
> +
> +		/* Drivers modelled on real hardware (Plan 9's virtio, for
> +		 * one) pad short frames to the 60 byte Ethernet minimum:
> +		 * trim trailing padding instead of dropping the packet.
> +		 */
> +		if (iov_tail_size(&data) > l4len &&
> +		    !iov_tail_trim(&data, l4len, trim_iov, UIO_MAXIOV))
>  			continue;
>
>  		if (iph->protocol == IPPROTO_ICMP) {

-- 
Stefano