Date: Tue, 9 Sep 2025 18:13:56 +0200
From: Stefano Brivio
To: "xugu@redhat.com"
CC: passt-dev@passt.top, Jon Maloy, Laurent Vivier
Subject: Re: [PATCH] Reduce tcp_buf_discard size
Message-ID: <20250909181356.456d48c6@elisabeth>
In-Reply-To: <20250908110439.22327-1-xugu@redhat.com>
References: <20250908110439.22327-1-xugu@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

Thanks for the patch, it looks good to me and all tests pass with and
without SO_PEEK_OFF support!

Jon, Laurent, would you mind having a quick look before I apply this?

Gu, there's just one stray / trailing whitespace, indicated below, but
there's no need to send a new version for that, I will just drop it on
merge:

On Mon, 8 Sep 2025 20:04:39 +0900
"xugu@redhat.com" wrote:

> From: Xun Gu
> 
> On kernels without SO_PEEK_OFF, a 16MB static buffer is used to
> discard sent data. This patch reduces the buffer to 1MB.
> 
> Larger discards are now handled by using multiple iovec entries
> pointing to the same 1MB buffer.
> 
> Signed-off-by: Xun Gu
> ---
>  tcp.c          | 66 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  tcp_buf.c      | 18 +++++---------
>  tcp_internal.h |  7 +++++-
>  tcp_vu.c       | 17 ++++---------
>  4 files changed, 82 insertions(+), 26 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index a27b069..253cdb3 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -399,7 +399,7 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
>   */
>  static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>  
> -char tcp_buf_discard [MAX_WINDOW];
> +char tcp_buf_discard [BUF_DISCARD_SIZE];
>  
>  /* Does the kernel support TCP_PEEK_OFF? */
>  bool peek_offset_cap;
> @@ -3766,3 +3766,67 @@ fail:
>  
>  	return 0;
>  }
> +
> +/**
> + * tcp_prepare_iov() - Prepare iov according to kernel capability
> + * @msg:	Message header to update
> + * @iov:	iovec to receive TCP payload and data to discard
> + * @already_sent:	Bytes sent after the last acknowledged one
> + * @payload_iov_cnt:	Number of TCP payload iovec entries
> + *
> + * Return: 0 on success, -1 if already_sent cannot be discarded fully
> + */
> +int tcp_prepare_iov(struct msghdr *msg, struct iovec *iov,
> +		    uint32_t already_sent, int payload_iov_cnt)
> +{
> +	/*
> +	 * IOV layout
> +	 * |- tcp_buf_discard -|---------- TCP data slots ------------|
> +	 *
> +	 * with discarded data:
> +	 * |------ddddddddddddd|ttttttttttttt-------------------------|
> +	 *        ^
> +	 *        |
> +	 *        msg_iov
> +	 *
> +	 * without discarded data:
> +	 * |-------------------|ttttttttttttt-------------------------|
> +	 *                      ^
> +	 *                      |
> +	 *                      msg_iov
> +	 * d: discard data
> +	 * t: TCP data
> +	 */
> +	if (peek_offset_cap) {
> +		msg->msg_iov = iov + DISCARD_IOV_NUM;
> +		msg->msg_iovlen = payload_iov_cnt;
> +	} else {
> +		int discard_cnt, discard_iov_rem;
> +		struct iovec *iov_start;
> +		int i;
> +
> +		discard_cnt = DIV_ROUND_UP(already_sent, BUF_DISCARD_SIZE);
> +		if (discard_cnt > DISCARD_IOV_NUM) {
> +			debug("Failed to discard %u already sent bytes",
> +			      already_sent);
> +			return -1;
> +		}
> +
> +		discard_iov_rem = already_sent % BUF_DISCARD_SIZE;
> +
> +		iov_start = iov + (DISCARD_IOV_NUM - discard_cnt);
> +
> +		/* Multiple iov entries pointing to the same buffer */
> +		for (i = 0; i < discard_cnt; i++) {
> +			iov_start[i].iov_base = tcp_buf_discard;
> +			iov_start[i].iov_len = BUF_DISCARD_SIZE;
> +		}
> +		if (discard_iov_rem)
> +			iov[DISCARD_IOV_NUM - 1].iov_len = discard_iov_rem;
> +
> +		msg->msg_iov = iov_start;
> +		msg->msg_iovlen = discard_cnt + payload_iov_cnt;
> +	}
> +
> +	return 0;
> +}
> diff --git a/tcp_buf.c b/tcp_buf.c
> index bc898de..4ebb013 100644
> --- a/tcp_buf.c
> +++ b/tcp_buf.c
> @@ -60,7 +60,7 @@ static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM];
>  static unsigned int tcp_payload_used;
>  
>  /* recvmsg()/sendmsg() data for tap */
> -static struct iovec iov_sock [TCP_FRAMES_MEM + 1];
> +static struct iovec iov_sock [TCP_FRAMES_MEM + DISCARD_IOV_NUM];
>  
>  static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS];
>  
> @@ -326,15 +326,9 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  		iov_rem = (wnd_scaled - already_sent) % mss;
>  	}
>  
> -	/* Prepare iov according to kernel capability */
> -	if (!peek_offset_cap) {
> -		mh_sock.msg_iov = iov_sock;
> -		iov_sock[0].iov_base = tcp_buf_discard;
> -		iov_sock[0].iov_len = already_sent;
> -		mh_sock.msg_iovlen = fill_bufs + 1;
> -	} else {
> -		mh_sock.msg_iov = &iov_sock[1];
> -		mh_sock.msg_iovlen = fill_bufs;
> +	if (tcp_prepare_iov(&mh_sock, iov_sock, already_sent, fill_bufs)) {
> +		tcp_rst(c, conn);
> +		return -1;
>  	}
>  
>  	if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) {
> @@ -344,12 +338,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
>  		tcp_payload_used = 0;
>  	}
>  
> -	for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) {
> +	for (i = 0, iov = iov_sock + DISCARD_IOV_NUM; i < fill_bufs; i++, iov++) {
>  		iov->iov_base = &tcp_payload[tcp_payload_used + i].data;
>  		iov->iov_len = mss;
>  	}
>  	if (iov_rem)
> -		iov_sock[fill_bufs].iov_len = iov_rem;
> +		iov_sock[fill_bufs + DISCARD_IOV_NUM - 1].iov_len = iov_rem;
>  
>  	/* Receive into buffers, don't dequeue until acknowledged by guest. */
>  	do
> diff --git a/tcp_internal.h b/tcp_internal.h
> index 9dae688..d0009f8 100644
> --- a/tcp_internal.h
> +++ b/tcp_internal.h
> @@ -9,6 +9,9 @@
>  #define MAX_WS 8
>  #define MAX_WINDOW (1 << (16 + (MAX_WS)))
>  
> +#define BUF_DISCARD_SIZE (1 << 20) 
                                     ^ ...here, after the ')' (git log/show shows it in red).

-- 
Stefano
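
For anyone following the thread later: the multi-iovec discard described in the commit message can be tried in isolation. What follows is a minimal sketch of the arithmetic only, not the code from the patch. The names discard_iov_fill() and discard_buf are hypothetical, and DISCARD_IOV_NUM = 16 is an assumption, since the quoted hunk is trimmed before its definition (16 is simply MAX_WINDOW / BUF_DISCARD_SIZE with MAX_WS = 8).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define BUF_DISCARD_SIZE	(1 << 20)	/* 1 MiB, as in the patch */
#define DISCARD_IOV_NUM		16		/* ASSUMPTION: 16 MiB / 1 MiB */

/* Hypothetical stand-in for tcp_buf_discard: contents are never read */
static char discard_buf[BUF_DISCARD_SIZE];

/**
 * discard_iov_fill() - Point trailing discard slots at one shared buffer
 * @iov:		Array whose first DISCARD_IOV_NUM entries are discard slots
 * @already_sent:	Bytes to skip before the payload slots
 * @cnt:		Set to the number of discard slots in use
 *
 * Return: first iovec entry to pass to recvmsg(), NULL if the bytes
 *	   to discard don't fit in DISCARD_IOV_NUM slots
 */
static struct iovec *discard_iov_fill(struct iovec *iov,
				      uint32_t already_sent, int *cnt)
{
	uint32_t rem = already_sent % BUF_DISCARD_SIZE;
	/* Round up without risking overflow of already_sent */
	uint32_t n = already_sent / BUF_DISCARD_SIZE + (rem ? 1 : 0);
	struct iovec *start;
	uint32_t i;

	if (n > DISCARD_IOV_NUM)
		return NULL;

	/* Used slots sit right before the payload slots */
	start = iov + (DISCARD_IOV_NUM - n);

	/* Every slot aliases the same buffer: later data overwrites earlier */
	for (i = 0; i < n; i++) {
		start[i].iov_base = discard_buf;
		start[i].iov_len = BUF_DISCARD_SIZE;
	}

	/* The last slot takes the partial chunk, if any */
	if (rem)
		iov[DISCARD_IOV_NUM - 1].iov_len = rem;

	*cnt = (int)n;
	return start;
}
```

With already_sent = 3 MiB + 5 bytes, this uses four slots: three of 1 MiB and a last one of 5 bytes, all pointing to the same buffer. Aliasing is safe here because the discarded bytes are never looked at, which is what lets the static buffer shrink from 16MB to 1MB.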