Date: Fri, 5 Dec 2025 01:51:43 +0100
From: Stefano Brivio
To: David Gibson
Subject: Re: [RFC PATCH 5/5] tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum)
Message-ID: <20251205015143.24f8d43e@elisabeth>
References: <20251103101612.1412079-1-sbrivio@redhat.com> <20251103101612.1412079-6-sbrivio@redhat.com>
Organization: Red Hat
CC: passt-dev@passt.top, Laurent Vivier
List-Id: Development discussion and patches for passt

On Wed, 5 Nov 2025 14:49:59 +1100
David Gibson wrote:

> On Mon, Nov 03, 2025 at 11:16:12AM +0100, Stefano Brivio wrote:
> > For both TCP and UDP, we request vhost-user buffers that are large
> > enough to reach ETH_ZLEN (60 bytes), so padding is just a matter of
> > increasing the appropriate iov_len and clearing bytes in the buffer
> > as needed.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=166
> > Signed-off-by: Stefano Brivio
> 
> I think this is correct, apart from the nasty bug Laurent spotted.
> 
> I'm less certain if this is the most natural way to do it.
> 
> > ---
> >  tcp.c          |  2 --
> >  tcp_internal.h |  1 +
> >  tcp_vu.c       | 27 +++++++++++++++++++++++++++
> >  udp_vu.c       | 11 ++++++++++-
> >  4 files changed, 38 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tcp.c b/tcp.c
> > index e91c0cf..039688d 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -335,8 +335,6 @@ enum {
> >  };
> >  #endif
> >  
> > -/* MSS rounding: see SET_MSS() */
> > -#define MSS_DEFAULT 536
> >  #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> >  
> >  #define ACK_INTERVAL 10 /* ms */
> > diff --git a/tcp_internal.h b/tcp_internal.h
> > index 5f8fb35..d2295c9 100644
> > --- a/tcp_internal.h
> > +++ b/tcp_internal.h
> > @@ -12,6 +12,7 @@
> >  #define BUF_DISCARD_SIZE (1 << 20)
> >  #define DISCARD_IOV_NUM DIV_ROUND_UP(MAX_WINDOW, BUF_DISCARD_SIZE)
> >  
> > +#define MSS_DEFAULT /* and minimum */ 536 /* as it comes from minimum MTU */
> >  #define MSS4 ROUND_DOWN(IP_MAX_MTU - \
> >  		sizeof(struct tcphdr) - \
> >  		sizeof(struct iphdr), \
> > diff --git a/tcp_vu.c b/tcp_vu.c
> > index 1c81ce3..7239401 100644
> > --- a/tcp_vu.c
> > +++ b/tcp_vu.c
> > @@ -60,6 +60,29 @@ static size_t tcp_vu_hdrlen(bool v6)
> >  	return hdrlen;
> >  }
> >  
> > +/**
> > + * tcp_vu_pad() - Pad 802.3 frame to minimum length (60 bytes) if needed
> > + * @iov:	iovec array storing 802.3 frame with TCP segment inside
> > + * @cnt:	Number of entries in @iov
> > + */
> > +static void tcp_vu_pad(struct iovec *iov, size_t cnt)
> > +{
> > +	size_t l2len, pad;
> > +
> > +	ASSERT(iov_size(iov, cnt) >= sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > +	l2len = iov_size(iov, cnt) - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> 
> Re-obtaining l2len from iov_size() seems kind of awkward, since the
> callers should already know the length - they've just used it to
> populate iov_len.
That's only the case for tcp_vu_send_flag() though, because
tcp_vu_data_from_sock() can use split buffers, and iov_len of the first
element is not the same as the whole frame length.

That is, you could (very much in theory) have iov_len set to 50 for the
first iov item, set to 4 for the second iov item, and the frame needs
padding, but you can't tell that from the first iov item itself.

> > +	if (l2len >= ETH_ZLEN)
> > +		return;
> > +
> > +	pad = ETH_ZLEN - l2len;
> > +
> > +	/* tcp_vu_sock_recv() requests at least MSS-sized vhost-user buffers */
> > +	static_assert(ETH_ZLEN <= MSS_DEFAULT);
> 
> So, this is true for the data path, but not AFAICT for the flags path.
> 
> There _is_ still enough space in this case, because we request space
> for (tcp_vu_hdrlen() + sizeof(struct tcp_syn_opts)) which works out to:
>     ETH_HLEN       14
>   + IP header      20
>   + TCP header     20
>   + tcp_syn_opts    8
>   ----
>                    62 > ETH_ZLEN
> 
> But the comment and assert are misleading.

Dropped, in favour of:

> It seems like it would make more sense to clamp ETH_ZLEN as a lower
> length bound before we vu_collect() the buffers.

this.

> Or indeed, like we should be calculating l2len already including the
> clamping.

That's not trivial to do for the data path, I think (see above). I think
it would be doable with a rework of the tcp_vu_data_from_sock() loop,
but I'd say it's beyond the scope of this series.
> > +	memset(&iov[cnt - 1].iov_base + iov[cnt - 1].iov_len, 0, pad);
> > +	iov[cnt - 1].iov_len += pad;
> > +}
> > +
> >  /**
> >   * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload)
> >   * @c:		Execution context
> > @@ -138,6 +161,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
> >  	tcp_fill_headers(c, conn, NULL, eh, ip4h, ip6h, th, &payload,
> >  			 NULL, seq, !*c->pcap);
> >  
> > +	tcp_vu_pad(&flags_elem[0].in_sg[0], 1);
> > +
> >  	if (*c->pcap) {
> >  		pcap_iov(&flags_elem[0].in_sg[0], 1,
> >  			 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > @@ -456,6 +481,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> >  
> >  	tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap, push);
> >  
> > +	tcp_vu_pad(iov, buf_cnt);
> > +
> >  	if (*c->pcap) {
> >  		pcap_iov(iov, buf_cnt,
> >  			 sizeof(struct virtio_net_hdr_mrg_rxbuf));
> > diff --git a/udp_vu.c b/udp_vu.c
> > index 099677f..1b60860 100644
> > --- a/udp_vu.c
> > +++ b/udp_vu.c
> > @@ -72,8 +72,8 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> >  {
> >  	const struct vu_dev *vdev = c->vdev;
> >  	int iov_cnt, idx, iov_used;
> > +	size_t off, hdrlen, l2len;
> >  	struct msghdr msg = { 0 };
> > -	size_t off, hdrlen;
> >  
> >  	ASSERT(!c->no_udp);
> >  
> > @@ -116,6 +116,15 @@ static int udp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq, int s,
> >  	iov_vu[idx].iov_len = off;
> >  	iov_used = idx + !!off;
> >  
> > +	/* pad 802.3 frame to 60 bytes if needed */
> > +	l2len = *dlen + hdrlen - sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +	if (l2len < ETH_ZLEN) {
> > +		size_t pad = ETH_ZLEN - l2len;
> > +
> > +		iov_vu[idx].iov_len += pad;
> > +		memset(&iov_vu[idx].iov_base + off, 0, pad);
> > +	}
> > +
> >  	vu_set_vnethdr(vdev, iov_vu[0].iov_base, iov_used);
> >  
> >  	/* release unused buffers */
> > -- 
> > 2.43.0

-- 
Stefano