From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202602 header.b=e21ISOaL; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id F247F5A0262 for ; Thu, 04 Jun 2026 07:26:47 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202602; t=1780550804; bh=XIYzzXdj2qlee6JeXVqN0oQgRNvDqiXhpW7dSzDi4bA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=e21ISOaLlkChbNM6GVtmaSv9LMzRWAq5JLFJI1K1Lg+cwzkXPBST3h5GSG0kOfVQn 1MckJln4u/uDwX5OanEOqXhAzvRHvjPngbnIJjv8EPfQDNTavucJXpfRUggyEG+mc6 0HpoJjqtxRAkh3vqlyar56VzPdSbJENJO5avmOgngoqSz+t8y0hyPQvTj/9Px47RFn 3A4G3ijCl2dh3IdaYDa3h0W6MhsVadlu4YvOJSy2qvVSdBskDQZJlKQd33OfQq0oxe x92ES7yUu39+4Xy8MOyw32SNzHS/5M/LhnlPKQFKxY71hvU9vtEousMjEEL4+2sK5/ 6z2p2YCiXSMPw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4gWCjc50x0z4wck; Thu, 04 Jun 2026 15:26:44 +1000 (AEST) Date: Thu, 4 Jun 2026 15:14:08 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 6/8] tcp_splice: Simplify / correct OUT_WAIT flag handling Message-ID: References: <20260528050213.679685-1-david@gibson.dropbear.id.au> <20260528050213.679685-7-david@gibson.dropbear.id.au> <20260604064134.478745ed@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="kv7LiTe1hAitSpg7" Content-Disposition: inline In-Reply-To: <20260604064134.478745ed@elisabeth> Message-ID-Hash: 7FLBLRKLR7KWEEPXNVSUYIVMNRUZN5WY X-Message-ID-Hash: 7FLBLRKLR7KWEEPXNVSUYIVMNRUZN5WY X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --kv7LiTe1hAitSpg7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 04, 2026 at 06:41:37AM +0200, Stefano Brivio wrote: > On Thu, 28 May 2026 15:02:11 +1000 > David Gibson wrote: >=20 > > We set the OUT_WAIT flag if we stop forwarding due to EAGAIN, but there= 's > > still data in the pipe. That ensures we wake up when the output socket= has > > room to drain the pipe into. > >=20 > > We clear the OUT_WAIT flag when we complete forwarding on an EPOLLOUT > > event, but that's not quite right. Even though it's called on an EPOLL= OUT, > > tcp_splice_forward() could, in principle empty the pipe, but also read > > enough new data from the other side to fill it again. That would set > > OUT_WAIT internally, but it would be cleared after returning meaning > > we could miss a necessary wakeup. >=20 > The current logic in tcp_splice_sock_handler(): >=20 > if (events & EPOLLOUT) { > if (tcp_splice_forward(c, conn, !evsidei, now)) > goto reset; > conn_event(conn, ~OUT_WAIT(evsidei)); > } >=20 > if (events & EPOLLIN) { > if (tcp_splice_forward(c, conn, evsidei, now)) > goto reset; > } >=20 > would prevent the case you described, because if we read new data from > the other side filling the pipe, we'll hit (events & EPOLLIN) and set > OUT_WAIT again if needed. Nope. The (events & EPOLLIN) is an event on the same socket, forwarding in the opposite direction. The pipe would be refilled by data on the _other_ socket forwarding in the same direction. Now, _usually_ you'd then get an EPOLLIN on that other socket and that would trigger the wake up. But, this is actually a rare case where we might "miss" an event because we're using level not edge trigger (rather than the other way around). Consider just one direction of flow from socket A to socket B 1. epoll_wait() returns (just) an EPOLLOUT on socket B, nothing has arrived yet on socket A, so no EPOLLIN there. 2. Data arrives on socket A. 3. We reach tcp_splice_forward(), it empties the pipe, but refills it with the data that arrived in step (2). It happens that this also consumes all the data that arrived in (2) - we got exactly one pipe's worth of data. 4. We return from tcp_splice_forward() and clear OUT_WAIT. 5. We return to the epoll_wait(), but because we already read the data from socket A, and we're using level triggered events, we don't get an EPOLLIN 6. Space becomes available on socket B, but we don't get an EPOLLOUT, because OUT_WAIT is clear =2E..and we're stuck. Unlikely, but possible > But there's a case this should actually fix, even though I've never > seen it happening in practice: what if we *don't* read new data from > the other side, and we can't empty the pipe in one EPOLLOUT shot anyway? >=20 > I hadn't considered that before but if the receiver is slow enough > that's probably possible. True, that's probably more likely than the scenario above, actually. >=20 > > The condition on whether we need write side wakeups is actually fairly > > simple: we need them if and only if we return to the main loop with data > > in the pipe. Maintain that in a single place - right after we exit the > > forwarding loop in tcp_splice_forward(). > >=20 > > Signed-off-by: David Gibson > > --- > > tcp_splice.c | 16 +++++++++------- > > 1 file changed, 9 insertions(+), 7 deletions(-) > >=20 > > diff --git a/tcp_splice.c b/tcp_splice.c > > index 42902684..5f412584 100644 > > --- a/tcp_splice.c > > +++ b/tcp_splice.c > > @@ -531,19 +531,22 @@ static int tcp_splice_forward(struct ctx *c, > > conn->pending[fromsidei] +=3D readlen > 0 ? readlen : 0; > > conn->pending[fromsidei] -=3D written > 0 ? written : 0; > > =20 > > - if (written < 0) { > > - if (!conn->pending[fromsidei]) > > - break; > > - > > - conn_event(conn, OUT_WAIT(!fromsidei)); > > + if (written < 0) > > break; > > - } > > =20 > > if (conn->events & FIN_RCVD(fromsidei) && > > !conn->pending[fromsidei]) > > break; > > } > > =20 > > + /* We need write-side wakeups if and only if we have data in the pipe= to > > + * drain. > > + */ > > + if (conn->pending[fromsidei]) > > + conn_event(conn, OUT_WAIT(!fromsidei)); > > + else > > + conn_event(conn, ~OUT_WAIT(!fromsidei)); > > + > > if ((conn->events & FIN_RCVD(fromsidei)) && > > !(conn->events & FIN_SENT(!fromsidei)) && > > !conn->pending[fromsidei]) { > > @@ -606,7 +609,6 @@ void tcp_splice_sock_handler(struct ctx *c, union e= poll_ref ref, > > if (events & EPOLLOUT) { > > if (tcp_splice_forward(c, conn, !evsidei, now)) > > goto reset; > > - conn_event(conn, ~OUT_WAIT(evsidei)); > > } > > =20 > > if (events & (EPOLLIN | EPOLLRDHUP)) { >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --kv7LiTe1hAitSpg7 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmohCY0ACgkQzQJF27ox 2Gfgkw//WoAkgmp+NFgpXMxpOGhdk8KJbQB+OIVStetmASndh6ZL67+DbZBlFXmE f2oV0OxCf+98fPJlDxzyNOda31Tz+M6Gjq4/kymRPbWITBGdXrHAO+wMEcpfQrjg bxHhhACGXFOd8mYAtJtojbfPaW0lsc8bF/qSLXwoqwXma6aOoQw4NcBmVD5MrrKg Cz0d9ciHDmOyRo+62lnfIbNbYwyGnXCCLgpKmukiezHNSkQGMFo955sh3fOAzZC3 leYlkSF058RuKQIH4NsuwqOP7woqY3sMnSOwZKTXv8F7mWgkG3nSYep/fnPXMXnO FNyGP/wJUDYC2OqWBl2Zc9xHZuh7B9d0o2yLarXcyIsMReHMow+igkts7O1HbGVH L8coLz92AJvlv0392xhSu9g1SO8qJcwrnLZbg2rrSgaKqHI4mB/Juw3ZXfyIU0PA Lo9InhR2NORKGSwxMHpsg1jqyWAtz2eiU4dnaXs4yhTEPZwS+dkWEQ/Go+2hjl4p IlLJCk46aXtRLvkQYP3fomhjK2dSXjunMNw9ppKJqPhUoCoes2cwhtf3xlS1BDSX REe5gWyvozgu9ZmBzlwMIyWvbZgbfT4bN9C633DX/DCNHW4BnulXCAXi4dunEqmj 2fvt/py4naUlAEdkl50iBCjBhXPai/vyC5h8cPWrvs7iokHLmHI= =liHe -----END PGP SIGNATURE----- --kv7LiTe1hAitSpg7--