From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 21 May 2026 12:03:33 +1000
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top, Paul Holzinger
Subject: Re: [PATCH 5/6] tcp_splice: Simplify EPOLLRDHUP / eof / FIN handling
References: <20260520130851.436931-1-david@gibson.dropbear.id.au>
 <20260520130851.436931-6-david@gibson.dropbear.id.au>
 <20260520223003.37ceb0f8@elisabeth>
In-Reply-To: <20260520223003.37ceb0f8@elisabeth>
List-Id: Development discussion and patches for passt
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="pfwUcQ5dvr/cMkni"
Content-Disposition: inline

--pfwUcQ5dvr/cMkni
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, May 20, 2026 at 10:30:04PM +0200, Stefano Brivio wrote:
> On Wed, 20 May 2026 23:08:50 +1000
> David Gibson wrote:
>
> > There are two ways we can tell one of our sockets has received a FIN. We
> > can either see an EPOLLRDHUP epoll event, or we can get a zero-length read
> > (EOF) on the socket. We currently use both, in a mildly confusing way:
> > we only set the FIN_RCVD() flag based on the EPOLLRDHUP event, but then
> > some other close out logic is based on seeing an EOF.
> >
> > Simplify this by setting the flag based on only the EOF. To make sure we
> > don't miss an event if we get an EPOLLRDHUP with no data, we trigger the
> > forwarding path for EPOLLRDHUP as well as EPOLLIN.
> >
> > Signed-off-by: David Gibson
> > ---
> >  tcp_splice.c | 14 +++++---------
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 8fbd490f..b45f0060 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -487,7 +487,6 @@ static int tcp_splice_forward(struct ctx *c, struct
> >  	uint8_t lowat_set_flag = RCVLOWAT_SET(fromsidei);
> >  	uint8_t lowat_act_flag = RCVLOWAT_ACT(fromsidei);
> >  	int never_read = 1;
> > -	int eof = 0;
> >  
> >  	while (1) {
> >  		ssize_t readlen, written;
> > @@ -510,7 +509,7 @@ retry:
> >  		flow_trace(conn, "%zi from read-side call", readlen);
> >  
> >  		if (!readlen) {
> > -			eof = 1;
> > +			conn_event(conn, FIN_RCVD(fromsidei));
>
> I'm not sure if I really found a concrete issue with this, but it looks
> a bit scary, because it changes the semantics of FIN_RCVD, which used to
> mean that we infer we received a FIN, regardless of whether we're done
> processing all data from that half of the connection.
>
> Now FIN_RCVD is only set if we actually processed all the data and we
> hit the end of file.

True. But the only place that tested FIN_RCVD was at the end of
tcp_splice_forward(), conditional on 'eof' anyway. In a sense, this
was the cause of bug202 - we had FIN_RCVD set, but we didn't process
it and shutdown() on the other side, because we didn't have eof.

> The (potential) issue I see here is that we get EPOLLRDHUP, splice()
> returns -1 with EAGAIN in errno because we had no room in the pipe,
> and it would have returned 0 instead.
>
> Will we ever get our zero-sized "read" later? If not, we might have
> missed EPOLLRDHUP *and* the end of file. I'm not entirely sure we have
> guarantees in that sense from splice().

It's not really about guarantees from splice. I'm pretty sure this is
ok, reasoning as follows.

Consider all the exit points from the loop body:
 - Each return is a return -1, so we kill the connection anyway.
   They don't matter.
 - Each continue, goto retry and the end of the body will do the read
   side splice() again, so get another chance to see the EOF
 - That leaves just the breaks

Consider each break (there are three, since patch 2 of this series)

		if (written < 0) {
			if (!conn->pending[fromsidei])
				break;

(1) The pipe is empty and the write-splice returned EAGAIN, so it
    didn't remove data from the pipe. Therefore, the pipe must have
    been empty before the write-splice. Which means the read-splice
    can't have blocked on a full pipe.

			conn_event(conn, OUT_WAIT(!fromsidei));
			break;
		}

(2) The pipe is non-empty and the write-splice returned EAGAIN, so it
    must have blocked on the output socket. We've set OUT_WAIT(), so
    we'll get an EPOLLOUT at some point which will cause us to
    read-splice again, meaning we get another chance to see the EOF.

		[...]

		if (conn->events & FIN_RCVD(fromsidei))
			break;

(3) By the new semantics of FIN_RCVD, we *have* seen the EOF.

> The existing implementation distinguishes between end-of-file we hit in
> a given iteration, and EPOLLRDHUP we might have seen at any time.
> That was actually intended.

It might be intended, but I can't see that we did anything with that
information.

That said, the conditions on which we exit / retry this loop are pretty
darn confusing. I'll see if I can improve them.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

--pfwUcQ5dvr/cMkni--