From: "xugu@redhat.com"
To: passt-dev@passt.top
CC: xugu@redhat.com
Subject: [PATCH] Reduce tcp_buf_discard size
Date: Mon, 8 Sep 2025 20:04:39 +0900
Message-ID: <20250908110439.22327-1-xugu@redhat.com>
List-Id: Development discussion and patches for passt

From: Xun Gu

On kernels without SO_PEEK_OFF, a 16MB static buffer is used to
discard sent data. This patch reduces the buffer to 1MB. Larger
discards are now handled by using multiple iovec entries pointing
to the same 1MB buffer.

Signed-off-by: Xun Gu
---
 tcp.c          | 66 +++++++++++++++++++++++++++++++++++++++++++++++++-
 tcp_buf.c      | 18 +++++---------
 tcp_internal.h |  7 +++++-
 tcp_vu.c       | 17 ++++---------
 4 files changed, 82 insertions(+), 26 deletions(-)

diff --git a/tcp.c b/tcp.c
index a27b069..253cdb3 100644
--- a/tcp.c
+++ b/tcp.c
@@ -399,7 +399,7 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
  */
 static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
 
-char tcp_buf_discard [MAX_WINDOW];
+char tcp_buf_discard [BUF_DISCARD_SIZE];
 
 /* Does the kernel support TCP_PEEK_OFF? */
 bool peek_offset_cap;
@@ -3766,3 +3766,67 @@ fail:
 
 	return 0;
 }
+
+/**
+ * tcp_prepare_iov() - Prepare iov according to kernel capability
+ * @msg: Message header to update
+ * @iov: iovec to receive TCP payload and data to discard
+ * @already_sent: Bytes sent after the last acknowledged one
+ * @payload_iov_cnt: Number of TCP payload iovec entries
+ *
+ * Return: 0 on success, -1 if already_sent cannot be discarded fully
+ */
+int tcp_prepare_iov(struct msghdr *msg, struct iovec *iov,
+		    uint32_t already_sent, int payload_iov_cnt)
+{
+	/*
+	 * IOV layout
+	 * |- tcp_buf_discard -|---------- TCP data slots ------------|
+	 *
+	 * with discarded data:
+	 * |------ddddddddddddd|ttttttttttttt-------------------------|
+	 *        ^
+	 *        |
+	 *     msg_iov
+	 *
+	 * without discarded data:
+	 * |-------------------|ttttttttttttt-------------------------|
+	 *                      ^
+	 *                      |
+	 *                   msg_iov
+	 * d: discard data
+	 * t: TCP data
+	 */
+	if (peek_offset_cap) {
+		msg->msg_iov = iov + DISCARD_IOV_NUM;
+		msg->msg_iovlen = payload_iov_cnt;
+	} else {
+		int discard_cnt, discard_iov_rem;
+		struct iovec *iov_start;
+		int i;
+
+		discard_cnt = DIV_ROUND_UP(already_sent, BUF_DISCARD_SIZE);
+		if (discard_cnt > DISCARD_IOV_NUM) {
+			debug("Failed to discard %u already sent bytes",
+			      already_sent);
+			return -1;
+		}
+
+		discard_iov_rem = already_sent % BUF_DISCARD_SIZE;
+
+		iov_start = iov + (DISCARD_IOV_NUM - discard_cnt);
+
+		/* Multiple iov entries pointing to the same buffer */
+		for (i = 0; i < discard_cnt; i++) {
+			iov_start[i].iov_base = tcp_buf_discard;
+			iov_start[i].iov_len = BUF_DISCARD_SIZE;
+		}
+		if (discard_iov_rem)
+			iov[DISCARD_IOV_NUM - 1].iov_len = discard_iov_rem;
+
+		msg->msg_iov = iov_start;
+		msg->msg_iovlen = discard_cnt + payload_iov_cnt;
+	}
+
+	return 0;
+}
diff --git a/tcp_buf.c b/tcp_buf.c
index bc898de..4ebb013 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -60,7 +60,7 @@ static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM];
 static unsigned int tcp_payload_used;
 
 /* recvmsg()/sendmsg() data for tap */
-static struct iovec iov_sock [TCP_FRAMES_MEM + 1];
+static struct iovec iov_sock [TCP_FRAMES_MEM + DISCARD_IOV_NUM];
 
 static struct iovec tcp_l2_iov[TCP_FRAMES_MEM][TCP_NUM_IOVS];
 
@@ -326,15 +326,9 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		iov_rem = (wnd_scaled - already_sent) % mss;
 	}
 
-	/* Prepare iov according to kernel capability */
-	if (!peek_offset_cap) {
-		mh_sock.msg_iov = iov_sock;
-		iov_sock[0].iov_base = tcp_buf_discard;
-		iov_sock[0].iov_len = already_sent;
-		mh_sock.msg_iovlen = fill_bufs + 1;
-	} else {
-		mh_sock.msg_iov = &iov_sock[1];
-		mh_sock.msg_iovlen = fill_bufs;
+	if (tcp_prepare_iov(&mh_sock, iov_sock, already_sent, fill_bufs)) {
+		tcp_rst(c, conn);
+		return -1;
 	}
 
 	if (tcp_payload_used + fill_bufs > TCP_FRAMES_MEM) {
@@ -344,12 +338,12 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
 		tcp_payload_used = 0;
 	}
 
-	for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) {
+	for (i = 0, iov = iov_sock + DISCARD_IOV_NUM; i < fill_bufs; i++, iov++) {
 		iov->iov_base = &tcp_payload[tcp_payload_used + i].data;
 		iov->iov_len = mss;
 	}
 	if (iov_rem)
-		iov_sock[fill_bufs].iov_len = iov_rem;
+		iov_sock[fill_bufs + DISCARD_IOV_NUM - 1].iov_len = iov_rem;
 
 	/* Receive into buffers, don't dequeue until acknowledged by guest. */
 	do
diff --git a/tcp_internal.h b/tcp_internal.h
index 9dae688..d0009f8 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -9,6 +9,9 @@
 #define MAX_WS 8
 #define MAX_WINDOW (1 << (16 + (MAX_WS)))
 
+#define BUF_DISCARD_SIZE (1 << 20)
+#define DISCARD_IOV_NUM DIV_ROUND_UP(MAX_WINDOW, BUF_DISCARD_SIZE)
+
 #define MSS4 ROUND_DOWN(IP_MAX_MTU -			\
 			sizeof(struct tcphdr) -		\
 			sizeof(struct iphdr),		\
@@ -139,7 +142,7 @@ struct tcp_syn_opts {
 	.ws = TCP_OPT_WS(ws_),				\
 })
 
-extern char tcp_buf_discard [MAX_WINDOW];
+extern char tcp_buf_discard [BUF_DISCARD_SIZE];
 
 void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
 		  unsigned long flag);
@@ -180,4 +183,6 @@ int tcp_prepare_flags(const struct ctx *c, struct tcp_tap_conn *conn,
 		      size_t *optlen);
 int tcp_set_peek_offset(const struct tcp_tap_conn *conn, int offset);
 
+int tcp_prepare_iov(struct msghdr *msg, struct iovec *iov,
+		    uint32_t already_sent, int payload_iov_cnt);
 #endif /* TCP_INTERNAL_H */
diff --git a/tcp_vu.c b/tcp_vu.c
index cb39bc2..097ca13 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -35,7 +35,7 @@
 #include "vu_common.h"
 #include
 
-static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE + 1];
+static struct iovec iov_vu[VIRTQUEUE_MAX_SIZE + DISCARD_IOV_NUM];
 static struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE];
 static int head[VIRTQUEUE_MAX_SIZE + 1];
 
@@ -200,7 +200,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c,
 
 	hdrlen = tcp_vu_hdrlen(v6);
 
-	vu_init_elem(elem, &iov_vu[1], VIRTQUEUE_MAX_SIZE);
+	vu_init_elem(elem, &iov_vu[DISCARD_IOV_NUM], VIRTQUEUE_MAX_SIZE);
 
 	elem_cnt = 0;
 	*head_cnt = 0;
@@ -228,16 +228,9 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c,
 		elem_cnt += cnt;
 	}
 
-	if (peek_offset_cap) {
-		mh_sock.msg_iov = iov_vu + 1;
-		mh_sock.msg_iovlen = elem_cnt;
-	} else {
-		iov_vu[0].iov_base = tcp_buf_discard;
-		iov_vu[0].iov_len = already_sent;
-
-		mh_sock.msg_iov = iov_vu;
-		mh_sock.msg_iovlen = elem_cnt + 1;
-	}
+	if (tcp_prepare_iov(&mh_sock, iov_vu, already_sent, elem_cnt))
+		/* Expect caller to do a TCP reset */
+		return -1;
 
 	do
 		ret = recvmsg(s, &mh_sock, MSG_PEEK);
-- 
2.51.0
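
For reviewers who want to check the discard arithmetic in isolation, here is a minimal standalone sketch of the same idea: covering a discard length with several iovec entries that all point at one small buffer, packed against the end of the discard region. It is not code from the patch. The DEMO_* constants, demo_discard, demo_discard_iov() and demo_iov_total() are hypothetical names, and the sizes are deliberately much smaller than BUF_DISCARD_SIZE and MAX_WINDOW.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical demo sizes, far smaller than the values used in the patch */
#define DEMO_DISCARD_SIZE	4096
#define DEMO_MAX_WINDOW		(16 * DEMO_DISCARD_SIZE)
#define DEMO_DISCARD_IOV_NUM	(DEMO_MAX_WINDOW / DEMO_DISCARD_SIZE)

/* Single scratch buffer shared by every discard entry */
static char demo_discard[DEMO_DISCARD_SIZE];

/**
 * demo_discard_iov() - Cover @len bytes with entries sharing demo_discard
 * @iov:	Array of at least DEMO_DISCARD_IOV_NUM entries
 * @len:	Number of bytes to throw away
 *
 * Entries are packed against the end of the discard region, so that payload
 * entries placed right after it stay contiguous with the discard entries.
 *
 * Return: index of first entry used (DEMO_DISCARD_IOV_NUM if @len is zero),
 *	   -1 if @len does not fit in the discard region
 */
static int demo_discard_iov(struct iovec *iov, size_t len)
{
	size_t cnt = (len + DEMO_DISCARD_SIZE - 1) / DEMO_DISCARD_SIZE;
	size_t rem = len % DEMO_DISCARD_SIZE;
	size_t first, i;

	assert(iov);

	if (cnt > DEMO_DISCARD_IOV_NUM)
		return -1;

	first = DEMO_DISCARD_IOV_NUM - cnt;
	for (i = first; i < DEMO_DISCARD_IOV_NUM; i++) {
		iov[i].iov_base = demo_discard;
		iov[i].iov_len = DEMO_DISCARD_SIZE;
	}

	/* Last entry is shortened when @len is not a multiple of the buffer */
	if (rem)
		iov[DEMO_DISCARD_IOV_NUM - 1].iov_len = rem;

	return (int)first;
}

/* Sum of iov_len over @cnt entries, to check the total covered length */
static size_t demo_iov_total(const struct iovec *iov, size_t cnt)
{
	size_t total = 0, i;

	for (i = 0; i < cnt; i++)
		total += iov[i].iov_len;

	return total;
}
```

With these demo sizes, a 10000-byte discard uses three entries (4096 + 4096 + 1808). As far as I can tell, the approach relies on the shared buffer simply being overwritten as each entry is filled, which is harmless because its contents are never read.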