On Wed, May 08, 2024 at 11:03:38AM +0200, Stefano Brivio wrote: > In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(), > from pipe to receiving socket, that doesn't necessarily mean that the > pipe is empty: the receiver buffer might be full instead. > > Hence, we can't use the 'never_read' flag to decide that there's > nothing to wait for: even if we didn't read anything from the sending > side in a given iteration, we might still have data to send in the > pipe. Use read/written counters, instead. > > This fixes an issue where large bulk transfers would occasionally > hang. From a corresponding strace: > > 0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1 > 0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0 > 0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0 > 0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) > 0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576 > 0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) > 0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415 > 0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) > 0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) > 0.000044 epoll_wait(4, [], 8, 1000) = 0 > > we're reading from socket 211 into to the pipe end numbered 151, > which connects to pipe end 150, and from there we're writing into > socket 115. > > We initially drop EPOLLOUT from the set of monitored flags for socket > 115, because it already signaled it's ready for output. Then we read > nothing from socket 211 (the sender had nothing to send), and we keep > emptying the pipe into socket 115 (first 1048576 bytes, then 520415 > bytes). > > This call of tcp_splice_sock_handler() ends with EAGAIN on the writing > side, and we just exit this function without setting the OUT_WAIT_1 > flag (and, in turn, EPOLLOUT for socket 115). However, it turns out, > the pipe wasn't actually emptied, and while socket 211 had nothing > more to send, we should have waited on socket 115 to be ready for > output again. > > As a further step, we could consider not clearing EPOLLOUT at all, > unless the read/written counters match, but I'm first trying to fix > this ugly issue with a minimal patch. > > Link: https://github.com/containers/podman/issues/22575 > Link: https://github.com/containers/podman/issues/22593 > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp_splice.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tcp_splice.c b/tcp_splice.c > index 42b7be0..4c36b72 100644 > --- a/tcp_splice.c > +++ b/tcp_splice.c > @@ -616,7 +616,7 @@ eintr: > if (errno != EAGAIN) > goto close; > > - if (never_read) > + if (conn->read[fromside] == conn->written[fromside]) > break; > > conn_event(c, conn, -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson