From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202602 header.b=U4X8dcX3; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 49E875A0271 for ; Thu, 21 May 2026 04:29:14 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202602; t=1779330549; bh=40rPdNlhL1QYvOXWC8SaCyXZwndA1n/v/WKtwBCfuVo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=U4X8dcX3Ukg3V29zk0Za8VuGl9KZKrtGkTO0zr/n8T0JNBWjJPsHad3jyFaErHQKR Ce1UbeaJoC/O5fMgjlhIO1wkeuo4oPJm79Yin8p1gLkeV3n2tdK3SWfdQHYMDRA4/K EPfrBOaHNj8LojyRifiHyfgneYGhM5dAU/bjd7eVzDAhdnC6PPWJZWqHo8aGYai61h 1TvvkSxkMiGwKYeCWtfQHBwNcunoX9KLmO9nGbNCYERmGTnS7eKG5WRveHmbloKRoB +ixB6eiXy5v5XRBi/CM1uYAPvKHXZq5ToVYRwRY0maLmIcf66NOoJKSG7hXkXwIH7n +eE/OscWms2bA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4gLXR96FyNz4wLR; Thu, 21 May 2026 12:29:09 +1000 (AEST) Date: Thu, 21 May 2026 10:54:47 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 4/6] tcp_splice: Simplify tracking of read/written bytes Message-ID: References: <20260520130851.436931-1-david@gibson.dropbear.id.au> <20260520130851.436931-5-david@gibson.dropbear.id.au> <20260520222911.6d12ff70@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="ApnfoTlutpRP7oiz" Content-Disposition: inline In-Reply-To: <20260520222911.6d12ff70@elisabeth> Message-ID-Hash: VH32H7T5SKIUX3YU5PPELNBD26Y2S27H X-Message-ID-Hash: VH32H7T5SKIUX3YU5PPELNBD26Y2S27H X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --ApnfoTlutpRP7oiz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, May 20, 2026 at 10:29:12PM +0200, Stefano Brivio wrote: > On Wed, 20 May 2026 23:08:49 +1000 > David Gibson wrote: >=20 > > For each each direction of each spliced connection, we keep track of how > > many bytes we've read from one socket and written to the other. Howeve= r, > > we never actually care about the absolute values of these, only the > > difference between them, which represents how much data is currently "in > > flight" in the splicing pipe. > >=20 > > Simplify the handling by having a single variable tracking the number of > > bytes in the pipe. >=20 > For me it actually looks slightly more complicated to think about it > this way, I added explicit 'read' and 'written' after being bitten by > some issue I introduced with a previous 'pending' concept, but I have > to admit it slightly simplifies the overflow topic. >=20 > > As a bonus, the new scheme makes it clearer that we don't need to worry > > about overflows: pending can never become larger than the maximum pipe > > bufffer size, well within 32-bits. > >=20 > > I _think_ the old scheme was safe in the case of overflow - again under > > the assumption that read/written can never be further apart than the pi= pe > > buffer size. However, it's much harder to reason about this case. It's > > certainly plausible that an overflow could occur - sending 4GiB through > > a local socket is entirely achievable. >=20 > For me it looked pretty simple: you can overflow 32 bits (at 100 Gbps, > but without hitting the "optimised" case, it would take about five > minutes), but all the operations between the two counters are between > two uint32_t, so they would happen in uint32_t, hence modulo 32 bits, > similar to TCP sequences. Plus it's only equality comparisons, so we don't need SEQ_GT or the like. Yeah, that's the reasoning, but to me that's still a lot more than "can't exceed pipe size". > Anyway, overall, I think it's an improvement over the original. One nit > here: >=20 > > Signed-off-by: David Gibson > > --- > > tcp_conn.h | 6 ++---- > > tcp_splice.c | 18 +++++++++--------- > > 2 files changed, 11 insertions(+), 13 deletions(-) > >=20 > > diff --git a/tcp_conn.h b/tcp_conn.h > > index 9f5bee03..c8381aa7 100644 > > --- a/tcp_conn.h > > +++ b/tcp_conn.h > > @@ -206,8 +206,7 @@ struct tcp_tap_transfer_ext { > > * @f: Generic flow information > > * @s: File descriptor for sockets > > * @pipe: File descriptors for pipes > > - * @read: Bytes read (not fully written to other side in one shot) > > - * @written: Bytes written (not fully written from one other side rea= d) > > + * @pending: Bytes currently in each pipe > > * @events: Events observed/actions performed on connection > > * @flags: Connection flags (attributes, not events) > > */ > > @@ -218,8 +217,7 @@ struct tcp_splice_conn { > > int s[SIDES]; > > int pipe[SIDES][2]; > > =20 > > - uint32_t read[SIDES]; > > - uint32_t written[SIDES]; > > + uint32_t pending[SIDES]; > > =20 > > uint8_t events; > > #define SPLICE_CLOSED 0 > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 18e8b303..8fbd490f 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -292,7 +292,7 @@ bool tcp_splice_flow_defer(struct tcp_splice_conn *= conn) > > conn->s[sidei] =3D -1; > > } > > =20 > > - conn->read[sidei] =3D conn->written[sidei] =3D 0; > > + conn->pending[sidei] =3D 0; > > } > > =20 > > conn->events =3D SPLICE_CLOSED; > > @@ -490,7 +490,7 @@ static int tcp_splice_forward(struct ctx *c, struct > > int eof =3D 0; > > =20 > > while (1) { > > - ssize_t readlen, written, pending; > > + ssize_t readlen, written; > > int more =3D 0; > > =20 > > retry: > > @@ -537,7 +537,7 @@ retry: > > flow_trace(conn, "%zi from write-side call (passed %zi)", > > written, c->tcp.pipe_size); > > =20 > > - /* Most common case: skip updating counters. */ > > + /* Most common case: skip updating pending. */ >=20 > "pending" isn't a noun (even though the variable name is, but it's > not quite obvious that you're referring to it). I think that: >=20 > /* Most common case: skip updating count of pending bytes */ >=20 > would be slightly clearer (and also omit the '.' because it's not a > complete sentence, as we usually do on single-line comments, similarly > to most occurrences in the kernel). Done. >=20 > > if (readlen > 0 && readlen =3D=3D written) { > > if (readlen >=3D (long)c->tcp.pipe_size * 10 / 100) > > continue; > > @@ -561,11 +561,11 @@ retry: > > continue; > > } > > =20 > > - conn->read[fromsidei] +=3D readlen > 0 ? readlen : 0; > > - conn->written[fromsidei] +=3D written > 0 ? written : 0; > > + conn->pending[fromsidei] +=3D readlen > 0 ? readlen : 0; > > + conn->pending[fromsidei] -=3D written > 0 ? written : 0; > > =20 > > if (written < 0) { > > - if (conn->read[fromsidei] =3D=3D conn->written[fromsidei]) > > + if (!conn->pending[fromsidei]) > > break; > > =20 > > conn_event(conn, OUT_WAIT(!fromsidei)); > > @@ -575,15 +575,15 @@ retry: > > if (never_read && written =3D=3D (long)(c->tcp.pipe_size)) > > goto retry; > > =20 > > - pending =3D conn->read[fromsidei] - conn->written[fromsidei]; > > - if (!never_read && written > 0 && written < pending) > > + if (!never_read && written > 0 && > > + written < conn->pending[fromsidei]) > > goto retry; > > =20 > > if (eof) > > break; > > } > > =20 > > - if (conn->read[fromsidei] =3D=3D conn->written[fromsidei] && eof) { > > + if (!conn->pending[fromsidei] && eof) { > > unsigned sidei; > > =20 > > flow_foreach_sidei(sidei) { >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --ApnfoTlutpRP7oiz Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmoOV9cACgkQzQJF27ox 2GfgpRAAlfRRhzkemzlGZ86ClUDUpAM64sPv4jyEyWElTj6b9kKG+Wip7ybdRO7X vuHMnQY3XoTIafuWgahJq4W+O08onHqTUuZbYTdY+XNfgEUFZlkl4+ufs+tL+KOR DkRE2gaqmDeUv7A8AlYQKviRL5OHdAXKHrS1Dfb0CAc7314yXeKV6siIAzzQ36ov Cc4xUzOablehzPnAisT5WEgebZ0YDz1eVg32bfTf42MybXGJwZq+nCJV/w8GBBJt oz31s/alIb90Jcd3WEMOKwHandpMGB/YGHH6T/fpxLM42liRmKniz9l+Kw8OZ2gq XC0E+9bdu359lepKHZdd3bP+XmE72zdfvXu37AHNQaXqv2fs3kYpRAFf1/Q8pjHx 4Tl0VKtwswQAL2I9JjL3QSoS5/FPjg+ZhXX05fV13qlKoRjmLleKzzfZ1/kDLU+f Wl2ez6xLEKluZQmaKjBjz3zI3O2EUft1V1SCWZgh62/ZraMxesbHuDfPrO/5VCY/ 669GExyr1wXAlFHsznoUxgT4tYwuB2Ht763wEQRwAZGDqlUvzQCotxaxyzkZNMjf abCMaBZ3DJBPNt0+vTcsEnhBJSr6VNKVQ85O1Lfn0A674azFxQ9AxHXO0w6yx99n xAn2FZrkjAjV2Fp4OI7CHOitLi/i3emYKsn8ozoRpXqkRADJE5w= =LvRy -----END PGP SIGNATURE----- --ApnfoTlutpRP7oiz--