From: David Gibson <david@gibson.dropbear.id.au>
To: passt-dev@passt.top, Stefano Brivio <sbrivio@redhat.com>
Cc: Paul Holzinger <pholzing@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH 4/6] tcp_splice: Simplify tracking of read/written bytes
Date: Wed, 20 May 2026 23:08:49 +1000 [thread overview]
Message-ID: <20260520130851.436931-5-david@gibson.dropbear.id.au> (raw)
In-Reply-To: <20260520130851.436931-1-david@gibson.dropbear.id.au>
For each each direction of each spliced connection, we keep track of how
many bytes we've read from one socket and written to the other. However,
we never actually care about the absolute values of these, only the
difference between them, which represents how much data is currently "in
flight" in the splicing pipe.
Simplify the handling by having a single variable tracking the number of
bytes in the pipe.
As a bonus, the new scheme makes it clearer that we don't need to worry
about overflows: pending can never become larger than the maximum pipe
bufffer size, well within 32-bits.
I _think_ the old scheme was safe in the case of overflow - again under
the assumption that read/written can never be further apart than the pipe
buffer size. However, it's much harder to reason about this case. It's
certainly plausible that an overflow could occur - sending 4GiB through
a local socket is entirely achievable.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_conn.h | 6 ++----
tcp_splice.c | 18 +++++++++---------
2 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/tcp_conn.h b/tcp_conn.h
index 9f5bee03..c8381aa7 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -206,8 +206,7 @@ struct tcp_tap_transfer_ext {
* @f: Generic flow information
* @s: File descriptor for sockets
* @pipe: File descriptors for pipes
- * @read: Bytes read (not fully written to other side in one shot)
- * @written: Bytes written (not fully written from one other side read)
+ * @pending: Bytes currently in each pipe
* @events: Events observed/actions performed on connection
* @flags: Connection flags (attributes, not events)
*/
@@ -218,8 +217,7 @@ struct tcp_splice_conn {
int s[SIDES];
int pipe[SIDES][2];
- uint32_t read[SIDES];
- uint32_t written[SIDES];
+ uint32_t pending[SIDES];
uint8_t events;
#define SPLICE_CLOSED 0
diff --git a/tcp_splice.c b/tcp_splice.c
index 18e8b303..8fbd490f 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -292,7 +292,7 @@ bool tcp_splice_flow_defer(struct tcp_splice_conn *conn)
conn->s[sidei] = -1;
}
- conn->read[sidei] = conn->written[sidei] = 0;
+ conn->pending[sidei] = 0;
}
conn->events = SPLICE_CLOSED;
@@ -490,7 +490,7 @@ static int tcp_splice_forward(struct ctx *c, struct
int eof = 0;
while (1) {
- ssize_t readlen, written, pending;
+ ssize_t readlen, written;
int more = 0;
retry:
@@ -537,7 +537,7 @@ retry:
flow_trace(conn, "%zi from write-side call (passed %zi)",
written, c->tcp.pipe_size);
- /* Most common case: skip updating counters. */
+ /* Most common case: skip updating pending. */
if (readlen > 0 && readlen == written) {
if (readlen >= (long)c->tcp.pipe_size * 10 / 100)
continue;
@@ -561,11 +561,11 @@ retry:
continue;
}
- conn->read[fromsidei] += readlen > 0 ? readlen : 0;
- conn->written[fromsidei] += written > 0 ? written : 0;
+ conn->pending[fromsidei] += readlen > 0 ? readlen : 0;
+ conn->pending[fromsidei] -= written > 0 ? written : 0;
if (written < 0) {
- if (conn->read[fromsidei] == conn->written[fromsidei])
+ if (!conn->pending[fromsidei])
break;
conn_event(conn, OUT_WAIT(!fromsidei));
@@ -575,15 +575,15 @@ retry:
if (never_read && written == (long)(c->tcp.pipe_size))
goto retry;
- pending = conn->read[fromsidei] - conn->written[fromsidei];
- if (!never_read && written > 0 && written < pending)
+ if (!never_read && written > 0 &&
+ written < conn->pending[fromsidei])
goto retry;
if (eof)
break;
}
- if (conn->read[fromsidei] == conn->written[fromsidei] && eof) {
+ if (!conn->pending[fromsidei] && eof) {
unsigned sidei;
flow_foreach_sidei(sidei) {
--
2.54.0
next prev parent reply other threads:[~2026-05-20 13:08 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-20 13:08 [PATCH 0/6] Fix race condition while closing spliced connections David Gibson
2026-05-20 13:08 ` [PATCH 1/6] tcp_splice: Improve error reporting David Gibson
2026-05-20 14:31 ` Stefano Brivio
2026-05-20 13:08 ` [PATCH 2/6] tcp_splice: Avoid missing EOF recognition while forwarding David Gibson
2026-05-20 13:08 ` [PATCH 3/6] tcp_splice: Clean up flow control path for splice forwarding David Gibson
2026-05-20 13:08 ` David Gibson [this message]
2026-05-20 13:08 ` [PATCH 5/6] tcp_splice: Simplify EPOLLRDHUP / eof / FIN handling David Gibson
2026-05-20 13:08 ` [PATCH 6/6] tcp_splice: Simplify shutdown(2) handling David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260520130851.436931-5-david@gibson.dropbear.id.au \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=pholzing@redhat.com \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).