public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>,
	Anshu Kumari <anskuma@redhat.com>
Subject: Re: [PATCH 3/6] tcp_splice: Clean up flow control path for splice forwarding
Date: Fri, 12 Jun 2026 18:18:47 +0200 (CEST)
Message-ID: <20260612181841.40e698e7@elisabeth>
In-Reply-To: <ag5W0IzdN01FJhYH@zatzit>

On Thu, 21 May 2026 10:50:24 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 20, 2026 at 10:28:52PM +0200, Stefano Brivio wrote:
> > Ah, yes, it looks better now. Three remarks:
> > 
> > On Wed, 20 May 2026 23:08:48 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > Splice forwarding can be blocked either waiting for data from one side
> > > or waiting for space on the other.  For that reason,
> > > tcp_splice_sock_handler() on either socket can forward data in either or
> > > both directions, depending on whether we have EPOLLIN, EPOLLOUT or both
> > > events.
> > > 
> > > The flow control for this is quite hard to follow though, since we forward
> > > in one direction, then sometimes loop back with a goto to do it in the
> > > other direction.  Simplify this by adding a tcp_splice_forward() function
> > > with the logic to forward in one direction and calling it either once or
> > > twice from tcp_splice_sock_handler().
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  tcp_splice.c | 137 ++++++++++++++++++++++++++-------------------------
> > >  1 file changed, 71 insertions(+), 66 deletions(-)
> > > 
> > > diff --git a/tcp_splice.c b/tcp_splice.c
> > > index 34ffea73..18e8b303 100644
> > > --- a/tcp_splice.c
> > > +++ b/tcp_splice.c
> > > @@ -474,67 +474,20 @@ void tcp_splice_conn_from_sock(const struct ctx *c, union flow *flow, int s0)
> > >  }
> > >  
> > >  /**
> > > - * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection
> > > + * tcp_splice_forward() - Forward data in one direction using splice()
> > >   * @c:		Execution context
> > > - * @ref:	epoll reference
> > > - * @events:	epoll events bitmap
> > > + * @conn:	Connection to forward data for
> > > + * @fromsidei:	Side to forward data from
> > >   *
> > >   * #syscalls:pasta splice
> > >   */
> > > -void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
> > > -			     uint32_t events)
> > > +static int tcp_splice_forward(struct ctx *c, struct
> > > +			      tcp_splice_conn *conn, unsigned fromsidei)  
> > 
> > I think the struct
> > argument should all be on the same line.  
> 
> Oops, definitely.  Forgot to document the return value too.
> 
> > >  {
> > > -	struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside);
> > > -	unsigned evsidei = ref.flowside.sidei, fromsidei;
> > > -	uint8_t lowat_set_flag, lowat_act_flag;
> > > -	int eof, never_read;
> > > -
> > > -	assert(conn->f.type == FLOW_TCP_SPLICE);
> > > -
> > > -	if (conn->events == SPLICE_CLOSED)
> > > -		return;
> > > -
> > > -	if (events & EPOLLERR) {
> > > -		int err, rc;
> > > -		socklen_t sl = sizeof(err);
> > > -
> > > -		rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl);
> > > -		if (rc)
> > > -			flow_perror(conn, "Error retrieving SO_ERROR");
> > > -		else
> > > -			flow_dbg(conn, "Error event on %s socket: %s",
> > > -				 pif_name(conn->f.pif[evsidei]),
> > > -				 strerror_(err));
> > > -		goto reset;
> > > -	}
> > > -
> > > -	if (conn->events == SPLICE_CONNECT) {
> > > -		if (!(events & EPOLLOUT)) {
> > > -			flow_err(conn, "Unexpected events 0x%x during connect",
> > > -				 events);
> > > -			goto reset;
> > > -		}
> > > -		if (tcp_splice_connect_finish(c, conn))
> > > -			goto reset;
> > > -	}
> > > -
> > > -	if (events & EPOLLOUT) {
> > > -		fromsidei = !evsidei;
> > > -		conn_event(conn, ~OUT_WAIT(evsidei));
> > > -	} else {
> > > -		fromsidei = evsidei;
> > > -	}
> > > -
> > > -	if (events & EPOLLRDHUP)
> > > -		/* For side 0 this is fake, but implied */
> > > -		conn_event(conn, FIN_RCVD(evsidei));
> > > -
> > > -swap:
> > > -	eof = 0;
> > > -	never_read = 1;
> > > -
> > > -	lowat_set_flag = RCVLOWAT_SET(fromsidei);
> > > -	lowat_act_flag = RCVLOWAT_ACT(fromsidei);
> > > +	uint8_t lowat_set_flag = RCVLOWAT_SET(fromsidei);
> > > +	uint8_t lowat_act_flag = RCVLOWAT_ACT(fromsidei);
> > > +	int never_read = 1;
> > > +	int eof = 0;
> > >  
> > >  	while (1) {
> > >  		ssize_t readlen, written, pending;
> > > @@ -551,7 +504,7 @@ retry:
> > >  		if (readlen < 0 && errno != EAGAIN) {
> > >  			flow_perror(conn, "Splicing from %s socket",
> > >  				    pif_name(conn->f.pif[fromsidei]));
> > > -			goto reset;
> > > +			return -1;
> > >  		}
> > >  
> > >  		flow_trace(conn, "%zi from read-side call", readlen);
> > > @@ -578,7 +531,7 @@ retry:
> > >  		if (written < 0 && errno != EAGAIN) {
> > >  			flow_perror(conn, "Splicing to %s socket",
> > >  				    pif_name(conn->f.pif[!fromsidei]));
> > > -			goto reset;
> > > +			return -1;
> > >  		}
> > >  
> > >  		flow_trace(conn, "%zi from write-side call (passed %zi)",
> > > @@ -639,24 +592,76 @@ retry:
> > >  				if (shutdown(conn->s[!sidei], SHUT_WR) < 0) {
> > >  					flow_perror(conn, "shutdown() on %s",
> > >  						    pif_name(conn->f.pif[!sidei]));
> > > -					goto reset;
> > > +					return -1;
> > >  				}
> > >  				conn_event(conn, FIN_SENT(!sidei));
> > >  			}
> > >  		}
> > >  	}
> > >  
> > > -	if (CONN_HAS(conn, FIN_SENT(0) | FIN_SENT(1))) {
> > > -		/* Clean close, no reset */
> > > -		conn_flag(conn, CLOSING);
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * tcp_splice_sock_handler() - Handler for socket mapped to spliced connection
> > > + * @c:		Execution context
> > > + * @ref:	epoll reference
> > > + * @events:	epoll events bitmap
> > > + */
> > > +void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
> > > +			     uint32_t events)
> > > +{
> > > +	struct tcp_splice_conn *conn = conn_at_sidx(ref.flowside);
> > > +	unsigned evsidei = ref.flowside.sidei;
> > > +
> > > +	assert(conn->f.type == FLOW_TCP_SPLICE);
> > > +
> > > +	if (conn->events == SPLICE_CLOSED)
> > >  		return;
> > > +
> > > +	if (events & EPOLLERR) {
> > > +		int err, rc;
> > > +		socklen_t sl = sizeof(err);
> > > +
> > > +		rc = getsockopt(ref.fd, SOL_SOCKET, SO_ERROR, &err, &sl);
> > > +		if (rc)
> > > +			flow_perror(conn, "Error retrieving SO_ERROR");
> > > +		else
> > > +			flow_dbg(conn, "Error event on %s socket: %s",
> > > +				 pif_name(conn->f.pif[evsidei]),
> > > +				 strerror_(err));
> > > +		goto reset;
> > > +	}
> > > +
> > > +	if (conn->events == SPLICE_CONNECT) {
> > > +		if (!(events & EPOLLOUT)) {
> > > +			flow_err(conn, "Unexpected events 0x%x during connect",
> > > +				 events);
> > > +			goto reset;
> > > +		}
> > > +		if (tcp_splice_connect_finish(c, conn))
> > > +			goto reset;
> > > +	}
> > > +
> > > +	if (events & EPOLLRDHUP)
> > > +		/* For side 0 this is fake, but implied */
> > > +		conn_event(conn, FIN_RCVD(evsidei));  
> > 
> > I saw this all goes away in 5/6, so it wouldn't be relevant. But in
> > case we decide to drop 5/6, here are my remarks on this.
> > 
> > EPOLLRDHUP is now handled before checking the other direction of the
> > connection in case of EPOLLOUT.  
> 
> I'm pretty sure that hasn't changed.  In the old code EPOLLRDHUP
> handling was before we did any of the actual data handling for EPOLLIN
> or EPOLLOUT.

Well, kind of, in the sense that it's true we did that before any data
handling, but we had two checks:

	if (conn->events == SPLICE_CLOSED)
		return;

	[...]

	if (conn->events == SPLICE_CONNECT) {
		if (!(events & EPOLLOUT)) {
			[...]
			goto reset;
		}
		if (tcp_splice_connect_finish(c, conn))
			goto reset;
	}

based on conn->events _before_ setting FIN_RCVD(evsidei).

Now, that should never be relevant for SPLICE_CLOSED. I'm not sure about
SPLICE_CONNECT, though: what if we get EPOLLRDHUP right away as we are
re-establishing a connection? I need to look into that, but I wasn't
able to see any difference in behaviour so far.

What I really missed here is:

> > I think it actually makes more sense this way because we update flags
> > with everything we know until that point, and it shouldn't have a
> > functional effect (the check at the end of the new tcp_splice_forward()
> > is on FIN_RCVD(fromsidei)), but I'm raising that in case the change
> > wasn't intended.
> >   
> > > +
> > > +	if (events & EPOLLOUT) {
> > > +		if (tcp_splice_forward(c, conn, !evsidei))
> > > +			goto reset;
> > > +		conn_event(conn, ~OUT_WAIT(evsidei));

		^^^

this swap, which caused https://bugs.passt.top/show_bug.cgi?id=207.

Earlier, we had:

	if (events & EPOLLOUT) {
		fromsidei = !evsidei;
		conn_event(conn, ~OUT_WAIT(evsidei));
	} ...

and then the rest of what's now tcp_splice_forward().

If we clear OUT_WAIT *later*, we drop it even if tcp_splice_forward() decided
to keep it while processing the EPOLLOUT event, and we'll miss events.
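
As a minimal sketch of why the ordering matters, here is a reduced model, not
the actual passt code: MODEL_OUT_WAIT, struct conn_model and the model_*()
functions are all hypothetical names, and the forwarding pass is assumed to
always find the write side full and ask for another EPOLLOUT notification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the OUT_WAIT() event bit of one side;
 * this is not the passt definition
 */
#define MODEL_OUT_WAIT	0x1

/* Hypothetical, reduced connection state: just an event bitmap */
struct conn_model {
	uint8_t events;
};

/* Model of a forwarding pass that finds the write side full again and
 * therefore asks for another EPOLLOUT notification by setting the flag
 */
static void model_forward_blocked(struct conn_model *conn)
{
	assert(conn);
	conn->events |= MODEL_OUT_WAIT;
}

/* Ordering before the patch: clear the flag, then forward.
 * Returns 1 if the flag is still set on exit, 0 otherwise.
 */
static int model_clear_then_forward(struct conn_model *conn)
{
	conn->events &= (uint8_t)~MODEL_OUT_WAIT;
	model_forward_blocked(conn);
	return !!(conn->events & MODEL_OUT_WAIT);
}

/* Ordering in the patch: forward, then clear the flag.
 * Returns 1 if the flag is still set on exit, 0 otherwise.
 */
static int model_forward_then_clear(struct conn_model *conn)
{
	model_forward_blocked(conn);
	conn->events &= (uint8_t)~MODEL_OUT_WAIT;
	return !!(conn->events & MODEL_OUT_WAIT);
}
```

Starting from a connection with the flag set, the first ordering leaves it
set on exit, while the second one drops it, so anything keyed on that flag to
request EPOLLOUT would stop asking for it.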

It turns out that 4ccb2eebaa02 ("tcp_splice: Simplify / correct OUT_WAIT
flag handling") fixes this. I'm still checking whether the fix is complete
though.

> > >  	}
> > >  
> > > -	if ((events & (EPOLLIN | EPOLLOUT)) == (EPOLLIN | EPOLLOUT)) {
> > > -		events = EPOLLIN;
> > > +	if (events & EPOLLIN) {
> > > +		if (tcp_splice_forward(c, conn, evsidei))
> > > +		    goto reset;  
> > 
> > This should be:
> > 
> > 			goto reset;
> > 
> > instead of:
> > 
> > 		    goto reset;  
> 
> Oops, fixed.
> 
> >   
> > > +	}
> > >  
> > > -		fromsidei = !fromsidei;
> > > -		goto swap;
> > > +	if (CONN_HAS(conn, FIN_SENT(0) | FIN_SENT(1))) {
> > > +		/* Clean close, no reset */
> > > +		conn_flag(conn, CLOSING);
> > > +		return;
> > >  	}
> > >  
> > >  	if (events & EPOLLHUP) {  
> > 
> > -- 
> > Stefano

-- 
Stefano

