From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>,
Anshu Kumari <anskuma@redhat.com>
Subject: Re: [PATCH 3/6] tcp_splice: Clean up flow control path for splice forwarding
Date: Fri, 12 Jun 2026 18:18:47 +0200 (CEST)
Message-ID: <20260612181841.40e698e7@elisabeth>
In-Reply-To: <ag5W0IzdN01FJhYH@zatzit>

On Thu, 21 May 2026 10:50:24 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Wed, May 20, 2026 at 10:28:52PM +0200, Stefano Brivio wrote:
> > Ah, yes, it looks better now. Three remarks:
> >
> > On Wed, 20 May 2026 23:08:48 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > Splice forwarding can be blocked either waiting for data from one side
> > > or waiting for space on the other. For that reason,
> > > tcp_splice_sock_handler() on either socket can forward data in either or
> > > both directions, depending on whether we have EPOLLIN, EPOLLOUT or both
> > > events.
> > >
> > > The flow control for this is quite hard to follow though, since we forward
> > > in one direction, then sometimes loop back with a goto to do it in the
> > > other direction. Simplify this by adding a tcp_splice_forward() function
> > > with the logic to forward in one direction and calling it either once or
> > > twice from tcp_splice_sock_handler().
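
As a reading aid for the description above, here is a minimal sketch of which direction(s) end up being forwarded for a given event set. It's not code from the patch: toy_forward_sides() is a hypothetical helper, and it only assumes the order visible in the new handler (EPOLLOUT handled first, forwarding from the other side, then EPOLLIN, forwarding from the side that got the event).

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical helper, for illustration only: list the sides we would
 * forward *from* for events reported on side @evsidei.
 *
 * Return: number of entries written to @from (0, 1 or 2)
 */
static int toy_forward_sides(unsigned evsidei, uint32_t events,
			     unsigned from[2])
{
	int n = 0;

	/* Space became available on evsidei: push data from the other side */
	if (events & EPOLLOUT)
		from[n++] = !evsidei;

	/* Data became available on evsidei: forward from this side */
	if (events & EPOLLIN)
		from[n++] = evsidei;

	return n;
}
```

With both events set on side 0, this yields side 1 first and then side 0, that is, the two calls to tcp_splice_forward() mentioned above.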
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > tcp_splice.c | 137 ++++++++++++++++++++++++++-------------------------
> > > 1 file changed, 71 insertions(+), 66 deletions(-)
> > >
> > > diff --git a/tcp_splice.c b/tcp_splice.c
> > > index 34ffea73..18e8b303 100644
> > > --- a/tcp_splice.c
> > > +++ b/tcp_splice.c
> > > @@ -474,67 +474,20 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0)
> > > }
> > >
> > > /**
> > > - * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection
> > > + * tcp_splice_forward() - Forward data in one direction using splice()
> > > * @c: Execution context
> > > - * @ref: epoll reference
> > > - * @events: epoll events bitmap
> > > + * @conn: Connection to forward data for
> > > + * @fromsidei: Side to forward data from
> > > *
> > > * #syscalls:pasta splice
> > > */
> > > -void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
> > > - uint32_t events)
> > > +static int tcp_splice_forward(struct ctx *c, struct
> > > + tcp_splice_conn *conn, unsigned fromsidei)
> >
> > I think the 'struct tcp_splice_conn *conn' argument should all be on the
> > same line.
>
> Oops, definitely. Forgot to document the return value too.
>
> > > {
> > > - struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside);
> > > - unsigned evsidei = ref.flowside.sidei, fromsidei;
> > > - uint8_t lowat_set_flag, lowat_act_flag;
> > > - int eof, never_read;
> > > -
> > > - assert(conn->f.type == FLOW_TCP_SPLICE);
> > > -
> > > - if (conn->events == SPLICE_CLOSED)
> > > - return;
> > > -
> > > - if (events & EPOLLERR) {
> > > - int err, rc;
> > > - socklen_t sl = sizeof(err);
> > > -
> > > - rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl);
> > > - if (rc)
> > > - flow_perror(conn, "Error retrieving SO_ERROR");
> > > - else
> > > - flow_dbg(conn, "Error event on %s socket: %s",
> > > - pif_name(conn->f.pif[evsidei]),
> > > - strerror_(err));
> > > - goto reset;
> > > - }
> > > -
> > > - if (conn->events == SPLICE_CONNECT) {
> > > - if (!(events & EPOLLOUT)) {
> > > - flow_err(conn, "Unexpected events 0x%x during connect",
> > > - events);
> > > - goto reset;
> > > - }
> > > - if (tcp_splice_connect_finish(c, conn))
> > > - goto reset;
> > > - }
> > > -
> > > - if (events & EPOLLOUT) {
> > > - fromsidei = !evsidei;
> > > - conn_event(conn, ~OUT_WAIT(evsidei));
> > > - } else {
> > > - fromsidei = evsidei;
> > > - }
> > > -
> > > - if (events & EPOLLRDHUP)
> > > - /* For side 0 this is fake, but implied */
> > > - conn_event(conn, FIN_RCVD(evsidei));
> > > -
> > > -swap:
> > > - eof = 0;
> > > - never_read = 1;
> > > -
> > > - lowat_set_flag = RCVLOWAT_SET(fromsidei);
> > > - lowat_act_flag = RCVLOWAT_ACT(fromsidei);
> > > + uint8_t lowat_set_flag = RCVLOWAT_SET(fromsidei);
> > > + uint8_t lowat_act_flag = RCVLOWAT_ACT(fromsidei);
> > > + int never_read = 1;
> > > + int eof = 0;
> > >
> > > while (1) {
> > > ssize_t readlen, written, pending;
> > > @@ -551,7 +504,7 @@ retry:
> > > if (readlen < 0 && errno != EAGAIN) {
> > > flow_perror(conn, "Splicing from %s socket",
> > > pif_name(conn->f.pif[fromsidei]));
> > > - goto reset;
> > > + return -1;
> > > }
> > >
> > > flow_trace(conn, "%zi from read-side call", readlen);
> > > @@ -578,7 +531,7 @@ retry:
> > > if (written < 0 && errno != EAGAIN) {
> > > flow_perror(conn, "Splicing to %s socket",
> > > pif_name(conn->f.pif[!fromsidei]));
> > > - goto reset;
> > > + return -1;
> > > }
> > >
> > > flow_trace(conn, "%zi from write-side call (passed %zi)",
> > > @@ -639,24 +592,76 @@ retry:
> > > if (shutdown(conn->s[!sidei], SHUT_WR) < 0) {
> > > flow_perror(conn, "shutdown() on %s",
> > > pif_name(conn->f.pif[!sidei]));
> > > - goto reset;
> > > + return -1;
> > > }
> > > conn_event(conn, FIN_SENT(!sidei));
> > > }
> > > }
> > > }
> > >
> > > - if (CONN_HAS(conn, FIN_SENT(0) | FIN_SENT(1))) {
> > > - /* Clean close, no reset */
> > > - conn_flag(conn, CLOSING);
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection
> > > + * @c: Execution context
> > > + * @ref: epoll reference
> > > + * @events: epoll events bitmap
> > > + */
> > > +void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
> > > + uint32_t events)
> > > +{
> > > + struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside);
> > > + unsigned evsidei = ref.flowside.sidei;
> > > +
> > > + assert(conn->f.type == FLOW_TCP_SPLICE);
> > > +
> > > + if (conn->events == SPLICE_CLOSED)
> > > return;
> > > +
> > > + if (events & EPOLLERR) {
> > > + int err, rc;
> > > + socklen_t sl = sizeof(err);
> > > +
> > > + rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl);
> > > + if (rc)
> > > + flow_perror(conn, "Error retrieving SO_ERROR");
> > > + else
> > > + flow_dbg(conn, "Error event on %s socket: %s",
> > > + pif_name(conn->f.pif[evsidei]),
> > > + strerror_(err));
> > > + goto reset;
> > > + }
> > > +
> > > + if (conn->events == SPLICE_CONNECT) {
> > > + if (!(events & EPOLLOUT)) {
> > > + flow_err(conn, "Unexpected events 0x%x during connect",
> > > + events);
> > > + goto reset;
> > > + }
> > > + if (tcp_splice_connect_finish(c, conn))
> > > + goto reset;
> > > + }
> > > +
> > > + if (events & EPOLLRDHUP)
> > > + /* For side 0 this is fake, but implied */
> > > + conn_event(conn, FIN_RCVD(evsidei));
> >
> > I saw this all goes away in 5/6, so it wouldn't be relevant. But in
> > case we decide to drop 5/6, here are my remarks on this.
> >
> > EPOLLRDHUP is now handled before checking the other direction of the
> > connection in case of EPOLLOUT.
>
> I'm pretty sure that hasn't changed. In the old code EPOLLRDHUP
> handling was before we did any of the actual data handling for EPOLLIN
> or EPOLLOUT.

Well, kind of, in the sense that it's true we did that before any data
handling, but we had two checks:

	if (conn->events == SPLICE_CLOSED)
		return;

	[...]

	if (conn->events == SPLICE_CONNECT) {
		if (!(events & EPOLLOUT)) {
			[...]
			goto reset;
		}
		if (tcp_splice_connect_finish(c, conn))
			goto reset;
	}

based on conn->events _before_ setting FIN_RCVD(evsidei).

Now, that should never be relevant for SPLICE_CLOSED. I'm not sure about
SPLICE_CONNECT, what if we get EPOLLRDHUP right away as we are
re-establishing a connection? I need to look into that, but I wasn't
able to see any difference in behaviour so far.

What I really missed here is:

> > I think it actually makes more sense this way because we update flags
> > with everything we know until that point, and it shouldn't have a
> > functional effect (the check at the end of the new tcp_splice_forward()
> > is on FIN_RCVD(fromsidei)), but I'm raising that in case the change
> > wasn't intended.
> >
> > > +
> > > + if (events & EPOLLOUT) {
> > > + if (tcp_splice_forward(c, conn, !evsidei))
> > > + goto reset;
> > > + conn_event(conn, ~OUT_WAIT(evsidei));
^^^
this swap, which caused https://bugs.passt.top/show_bug.cgi?id=207.

Earlier, we had:

	if (events & EPOLLOUT) {
		fromsidei = !evsidei;
		conn_event(conn, ~OUT_WAIT(evsidei));
	} ...

and then the rest of what's now tcp_splice_forward().

If we clear OUT_WAIT *later*, that is, after tcp_splice_forward() returns,
we drop it even if tcp_splice_forward() decided to keep it after processing
an EPOLLOUT event, and we'll miss events.

It turns out that 4ccb2eebaa02 ("tcp_splice: Simplify / correct OUT_WAIT
flag handling") fixes this. I'm still checking whether the fix is complete,
though.

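
To spell out the ordering problem, here's a minimal, self-contained sketch. It's a toy model, not passt code: struct toy_conn, toy_forward() and the OUT_WAIT() definition below are made up for illustration, and it assumes that the forwarding step sets OUT_WAIT on the destination side whenever the write side is still blocked.

```c
#include <stdint.h>

/* Hypothetical stand-in, not the real passt definition */
#define OUT_WAIT(sidei)	((uint8_t)(1u << (sidei)))

struct toy_conn {
	uint8_t events;		/* only the OUT_WAIT bits are modelled */
};

/* Stand-in for tcp_splice_forward(): if the write side is still blocked
 * (assumption), ask to be woken up again by setting OUT_WAIT on it.
 */
static void toy_forward(struct toy_conn *conn, unsigned fromsidei,
			int write_blocked)
{
	if (write_blocked)
		conn->events |= OUT_WAIT(!fromsidei);
}

/* Old order: clear OUT_WAIT first, then forward.
 * Return: OUT_WAIT bit for @evsidei left set after handling EPOLLOUT
 */
static uint8_t epollout_clear_first(unsigned evsidei, int write_blocked)
{
	struct toy_conn conn = { .events = OUT_WAIT(evsidei) };

	conn.events &= ~OUT_WAIT(evsidei);
	toy_forward(&conn, !evsidei, write_blocked);

	return conn.events & OUT_WAIT(evsidei);
}

/* Order from this patch: forward first, then clear OUT_WAIT.
 * Return: OUT_WAIT bit for @evsidei left set after handling EPOLLOUT
 */
static uint8_t epollout_clear_last(unsigned evsidei, int write_blocked)
{
	struct toy_conn conn = { .events = OUT_WAIT(evsidei) };

	toy_forward(&conn, !evsidei, write_blocked);
	conn.events &= ~OUT_WAIT(evsidei);

	return conn.events & OUT_WAIT(evsidei);
}
```

In this model, with the write side still blocked, epollout_clear_first() leaves OUT_WAIT set for the event side, while epollout_clear_last() always returns with it cleared, so nothing would ask for the next EPOLLOUT.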
> > > }
> > >
> > > - if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
> > > - events = EPOLLIN;
> > > + if (events & EPOLLIN) {
> > > + if (tcp_splice_forward(c, conn, evsidei))
> > > + goto reset;
> >
> > This should be:
> >
> > goto reset;
> >
> > instead of:
> >
> > goto reset;
>
> Oops, fixed.
>
> >
> > > + }
> > >
> > > - fromsidei = !fromsidei;
> > > - goto swap;
> > > + if (CONN_HAS(conn, FIN_SENT(0) | FIN_SENT(1))) {
> > > + /* Clean close, no reset */
> > > + conn_flag(conn, CLOSING);
> > > + return;
> > > }
> > >
> > > if (events & EPOLLHUP) {
> >
> > --
> > Stefano
--
Stefano