On Thu, Jun 04, 2026 at 06:41:37AM +0200, Stefano Brivio wrote:
> On Thu, 28 May 2026 15:02:11 +1000
> David Gibson wrote:
>
> > We set the OUT_WAIT flag if we stop forwarding due to EAGAIN, but there's
> > still data in the pipe.  That ensures we wake up when the output socket has
> > room to drain the pipe into.
> >
> > We clear the OUT_WAIT flag when we complete forwarding on an EPOLLOUT
> > event, but that's not quite right.  Even though it's called on an EPOLLOUT,
> > tcp_splice_forward() could, in principle empty the pipe, but also read
> > enough new data from the other side to fill it again.  That would set
> > OUT_WAIT internally, but it would be cleared after returning meaning
> > we could miss a necessary wakeup.
>
> The current logic in tcp_splice_sock_handler():
>
> 	if (events & EPOLLOUT) {
> 		if (tcp_splice_forward(c, conn, !evsidei, now))
> 			goto reset;
> 		conn_event(conn, ~OUT_WAIT(evsidei));
> 	}
>
> 	if (events & EPOLLIN) {
> 		if (tcp_splice_forward(c, conn, evsidei, now))
> 			goto reset;
> 	}
>
> would prevent the case you described, because if we read new data from
> the other side filling the pipe, we'll hit (events & EPOLLIN) and set
> OUT_WAIT again if needed.

Nope.  The (events & EPOLLIN) is an event on the same socket,
forwarding in the opposite direction.  The pipe would be refilled by
data on the _other_ socket forwarding in the same direction.

Now, _usually_ you'd then get an EPOLLIN on that other socket and that
would trigger the wake up.  But, this is actually a rare case where we
might "miss" an event because we're using level not edge trigger
(rather than the other way around).

Consider just one direction of flow from socket A to socket B

1. epoll_wait() returns (just) an EPOLLOUT on socket B, nothing has
   arrived yet on socket A, so no EPOLLIN there.
2. Data arrives on socket A.
3. We reach tcp_splice_forward(), it empties the pipe, but refills it
   with the data that arrived in step (2).
   It happens that this also consumes all the data that arrived in
   (2) - we got exactly one pipe's worth of data.
4. We return from tcp_splice_forward() and clear OUT_WAIT.
5. We return to the epoll_wait(), but because we already read the data
   from socket A, and we're using level triggered events, we don't get
   an EPOLLIN
6. Space becomes available on socket B, but we don't get an EPOLLOUT,
   because OUT_WAIT is clear

...and we're stuck.  Unlikely, but possible

> But there's a case this should actually fix, even though I've never
> seen it happening in practice: what if we *don't* read new data from
> the other side, and we can't empty the pipe in one EPOLLOUT shot anyway?
>
> I hadn't considered that before but if the receiver is slow enough
> that's probably possible.

True, that's probably more likely than the scenario above, actually.

>
> > The condition on whether we need write side wakeups is actually fairly
> > simple: we need them if and only if we return to the main loop with data
> > in the pipe.  Maintain that in a single place - right after we exit the
> > forwarding loop in tcp_splice_forward().
> >
> > Signed-off-by: David Gibson
> > ---
> >  tcp_splice.c | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/tcp_splice.c b/tcp_splice.c
> > index 42902684..5f412584 100644
> > --- a/tcp_splice.c
> > +++ b/tcp_splice.c
> > @@ -531,19 +531,22 @@ static int tcp_splice_forward(struct ctx *c,
> >  		conn->pending[fromsidei] += readlen > 0 ? readlen : 0;
> >  		conn->pending[fromsidei] -= written > 0 ? written : 0;
> >  
> > -		if (written < 0) {
> > -			if (!conn->pending[fromsidei])
> > -				break;
> > -
> > -			conn_event(conn, OUT_WAIT(!fromsidei));
> > +		if (written < 0)
> >  			break;
> > -		}
> >  
> >  		if (conn->events & FIN_RCVD(fromsidei) &&
> >  		    !conn->pending[fromsidei])
> >  			break;
> >  	}
> >  
> > +	/* We need write-side wakeups if and only if we have data in the pipe to
> > +	 * drain.
> > +	 */
> > +	if (conn->pending[fromsidei])
> > +		conn_event(conn, OUT_WAIT(!fromsidei));
> > +	else
> > +		conn_event(conn, ~OUT_WAIT(!fromsidei));
> > +
> >  	if ((conn->events & FIN_RCVD(fromsidei)) &&
> >  	    !(conn->events & FIN_SENT(!fromsidei)) &&
> >  	    !conn->pending[fromsidei]) {
> > @@ -606,7 +609,6 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref,
> >  	if (events & EPOLLOUT) {
> >  		if (tcp_splice_forward(c, conn, !evsidei, now))
> >  			goto reset;
> > -		conn_event(conn, ~OUT_WAIT(evsidei));
> >  	}
> >  
> >  	if (events & (EPOLLIN | EPOLLRDHUP)) {
>
> -- 
> Stefano
>

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
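
For anyone who wants to poke at the A-to-B stall outside of passt, here
is a rough toy model of one direction of flow.  It is only a sketch and
not the real tcp_splice.c: every name in it (toy_flow, toy_forward(),
PIPE_CAP and so on) is made up for illustration, sizes are abstract
units rather than bytes, and it assumes the only possible wakeups are a
level-triggered EPOLLIN while unread input remains and an EPOLLOUT that
is monitored only while the OUT_WAIT stand-in is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PIPE_CAP 4	/* hypothetical pipe capacity, arbitrary units */

enum toy_policy { TOY_OLD, TOY_NEW };	/* before / after the patch */

struct toy_flow {
	size_t in_avail;	/* unread data queued on socket A */
	size_t pending;		/* data sitting in the pipe */
	size_t out_room;	/* free space on socket B */
	bool out_wait;		/* stand-in for OUT_WAIT on socket B */
};

static size_t toy_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* One forwarding pass: fill the pipe from A, drain it into B, repeat
 * until the write side makes no progress (models EAGAIN).
 */
static void toy_forward(struct toy_flow *f, enum toy_policy policy)
{
	for (;;) {
		size_t rd = toy_min(f->in_avail, PIPE_CAP - f->pending);
		size_t wr;

		f->in_avail -= rd;
		f->pending += rd;

		wr = toy_min(f->pending, f->out_room);
		f->pending -= wr;
		f->out_room -= wr;

		assert(f->pending <= PIPE_CAP);

		if (!wr) {
			/* Old behaviour: flag set inside the loop */
			if (policy == TOY_OLD && f->pending)
				f->out_wait = true;
			break;
		}
	}

	/* New behaviour: flag tracks "data left in the pipe" exactly */
	if (policy == TOY_NEW)
		f->out_wait = f->pending != 0;
}

/* EPOLLOUT handling on socket B */
static void toy_epollout(struct toy_flow *f, enum toy_policy policy)
{
	toy_forward(f, policy);
	if (policy == TOY_OLD)
		f->out_wait = false;	/* the unconditional clear */
}

/* Stuck: data in the pipe, no unread input to raise EPOLLIN, and
 * EPOLLOUT not being watched.
 */
static bool toy_stalled(const struct toy_flow *f)
{
	return f->pending && !f->in_avail && !f->out_wait;
}

/* Start with a full pipe and OUT_WAIT set, deliver one EPOLLOUT */
static bool toy_scenario_stalls(enum toy_policy policy,
				size_t in_avail, size_t out_room)
{
	struct toy_flow f = {
		.in_avail = in_avail,
		.pending = PIPE_CAP,
		.out_room = out_room,
		.out_wait = true,
	};

	toy_epollout(&f, policy);
	return toy_stalled(&f);
}
```

In this model, toy_scenario_stalls(TOY_OLD, PIPE_CAP, PIPE_CAP)
corresponds to steps 1-6 above (exactly one pipe's worth arrives on A)
and reports a stall, while the same call with TOY_NEW does not.
Passing in_avail = 0 and out_room = PIPE_CAP / 2 models the slow
receiver case Stefano describes, with the same outcome for each policy.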