From: David Gibson
To: passt-dev@passt.top, Stefano Brivio
CC: Paul Holzinger, David Gibson
Subject: [PATCH v2 4/4] tcp_splice: Simplify tracking of read/written bytes
Date: Thu, 21 May 2026 16:37:45 +1000
Message-ID: <20260521063745.1211215-5-david@gibson.dropbear.id.au>
In-Reply-To: <20260521063745.1211215-1-david@gibson.dropbear.id.au>
References: <20260521063745.1211215-1-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.54.0
List-Id: Development discussion and patches for passt

For each direction of each spliced connection, we keep track of how
many bytes we've read from one socket and written to the other.
However, we never actually care about the absolute values of these,
only the difference between them, which represents how much data is
currently "in flight" in the splicing pipe. Simplify the handling by
having a single variable tracking the number of bytes in the pipe.

As a bonus, the new scheme makes it clearer that we don't need to worry
about overflows: pending can never become larger than the maximum pipe
buffer size, well within 32 bits.

I _think_ the old scheme was safe in the case of overflow - again under
the assumption that read/written can never be further apart than the
pipe buffer size. However, it's much harder to reason about this case.
It's certainly plausible that an overflow could occur - sending 4GiB
through a local socket is entirely achievable.

Signed-off-by: David Gibson
---
 tcp_conn.h   |  6 ++----
 tcp_splice.c | 18 +++++++++---------
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/tcp_conn.h b/tcp_conn.h
index 9f5bee03..c8381aa7 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -206,8 +206,7 @@ struct tcp_tap_transfer_ext {
  * @f:			Generic flow information
  * @s:			File descriptor for sockets
  * @pipe:		File descriptors for pipes
- * @read:		Bytes read (not fully written to other side in one shot)
- * @written:		Bytes written (not fully written from one other side read)
+ * @pending:		Bytes currently in each pipe
  * @events:		Events observed/actions performed on connection
  * @flags:		Connection flags (attributes, not events)
  */
@@ -218,8 +217,7 @@ struct tcp_splice_conn {
 	int s[SIDES];
 	int pipe[SIDES][2];
 
-	uint32_t read[SIDES];
-	uint32_t written[SIDES];
+	uint32_t pending[SIDES];
 
 	uint8_t events;
 #define SPLICE_CLOSED			0
diff --git a/tcp_splice.c b/tcp_splice.c
index ae92bbd9..af50e715 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -292,7 +292,7 @@ bool tcp_splice_flow_defer(struct tcp_splice_conn *conn)
 			conn->s[sidei] = -1;
 		}
 
-		conn->read[sidei] = conn->written[sidei] = 0;
+		conn->pending[sidei] = 0;
 	}
 
 	conn->events = SPLICE_CLOSED;
@@ -494,7 +494,7 @@ static int tcp_splice_forward(struct ctx *c,
 	int eof = 0;
 
 	while (1) {
-		ssize_t readlen, written, pending;
+		ssize_t readlen, written;
 		int more = 0;
 
 retry:
@@ -543,7 +543,7 @@ retry:
 		flow_trace(conn, "%zi from write-side call (passed %zi)",
 			   written, c->tcp.pipe_size);
 
-		/* Most common case: skip updating counters. */
+		/* Most common case: skip updating count of pending bytes */
 		if (readlen > 0 && readlen == written) {
 			if (readlen >= (long)c->tcp.pipe_size * 10 / 100)
 				continue;
@@ -567,11 +567,11 @@ retry:
 			continue;
 		}
 
-		conn->read[fromsidei] += readlen > 0 ? readlen : 0;
-		conn->written[fromsidei] += written > 0 ? written : 0;
+		conn->pending[fromsidei] += readlen > 0 ? readlen : 0;
+		conn->pending[fromsidei] -= written > 0 ? written : 0;
 
 		if (written < 0) {
-			if (conn->read[fromsidei] == conn->written[fromsidei])
+			if (!conn->pending[fromsidei])
 				break;
 
 			conn_event(conn, OUT_WAIT(!fromsidei));
@@ -581,15 +581,15 @@ retry:
 		if (never_read && written == (long)(c->tcp.pipe_size))
 			goto retry;
 
-		pending = conn->read[fromsidei] - conn->written[fromsidei];
-		if (!never_read && written > 0 && written < pending)
+		if (!never_read && written > 0 &&
+		    written < conn->pending[fromsidei])
 			goto retry;
 
 		if (eof)
 			break;
 	}
 
-	if (conn->read[fromsidei] == conn->written[fromsidei] && eof) {
+	if (!conn->pending[fromsidei] && eof) {
 		unsigned sidei;
 
 		flow_foreach_sidei(sidei) {
-- 
2.54.0
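
For anyone who wants to convince themselves of the overflow argument in
the commit message, here is a minimal standalone sketch. It is not part
of the patch: the struct and function names (old_counters, new_counter,
old_account, new_account, old_in_flight, schemes_agree_across_wrap) are
hypothetical, and the byte counts are made-up values. It models the two
accounting schemes with the same uint32_t types and checks that they
report the same number of in-flight bytes even when the old read/written
counters wrap past 2^32, assuming, as the commit message does, that the
two counters never drift further apart than the pipe buffer size.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme, as described in the patch: two free-running counters */
struct old_counters {
	uint32_t read;
	uint32_t written;
};

/* New scheme: a single count of bytes currently in the pipe */
struct new_counter {
	uint32_t pending;
};

/* Hypothetical helper: account for one read/write pair, old scheme */
static void old_account(struct old_counters *o,
			uint32_t readlen, uint32_t written)
{
	o->read += readlen;	/* may wrap modulo 2^32 */
	o->written += written;	/* may wrap modulo 2^32 */
}

/* Hypothetical helper: account for one read/write pair, new scheme */
static void new_account(struct new_counter *n,
			uint32_t readlen, uint32_t written)
{
	n->pending += readlen;
	/* Assumed invariant: we can't write more than is in the pipe */
	assert(written <= n->pending);
	n->pending -= written;
}

/* In-flight bytes under the old scheme: difference modulo 2^32 */
static uint32_t old_in_flight(const struct old_counters *o)
{
	return o->read - o->written;
}

/* Returns 1 if both schemes agree after the old counters wrap */
static int schemes_agree_across_wrap(void)
{
	/* Made-up starting point: both old counters just below 2^32 */
	struct old_counters o = { UINT32_MAX - 100, UINT32_MAX - 100 };
	struct new_counter n = { 0 };

	/* Read 4096 bytes into the pipe, write only 1000 out of it */
	old_account(&o, 4096, 1000);
	new_account(&n, 4096, 1000);

	/* Both old counters have wrapped, yet 3096 bytes are pending */
	return old_in_flight(&o) == n.pending && n.pending == 3096;
}
```

Calling schemes_agree_across_wrap() from a small main() should return 1.
The old scheme only gets the right answer because unsigned subtraction
is modular, whereas the single counter never approaches the wrap point
in the first place, which is the "easier to reason about" property the
commit message is after.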