From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied
Date: Thu, 9 May 2024 10:28:22 +1000
Message-ID: <ZjwYptEKmBNBMIui@zatzit>
In-Reply-To: <20240508090338.2735208-1-sbrivio@redhat.com>
On Wed, May 08, 2024 at 11:03:38AM +0200, Stefano Brivio wrote:
> In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(),
> from pipe to receiving socket, that doesn't necessarily mean that the
> pipe is empty: the receiver buffer might be full instead.
>
> Hence, we can't use the 'never_read' flag to decide that there's
> nothing to wait for: even if we didn't read anything from the sending
> side in a given iteration, we might still have data to send in the
> pipe. Use read/written counters, instead.
>
> This fixes an issue where large bulk transfers would occasionally
> hang. From a corresponding strace:
>
> 0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1
> 0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0
> 0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0
> 0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
> 0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576
> 0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
> 0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415
> 0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
> 0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
> 0.000044 epoll_wait(4, [], 8, 1000) = 0
>
> we're reading from socket 211 into the pipe end numbered 151,
> which connects to pipe end 150, and from there we're writing into
> socket 115.
>
> We initially drop EPOLLOUT from the set of monitored flags for socket
> 115, because it already signaled it's ready for output. Then we read
> nothing from socket 211 (the sender had nothing to send), and we keep
> emptying the pipe into socket 115 (first 1048576 bytes, then 520415
> bytes).
>
> This call of tcp_splice_sock_handler() ends with EAGAIN on the writing
> side, and we just exit this function without setting the OUT_WAIT_1
> flag (and, in turn, EPOLLOUT for socket 115). However, it turns out,
> the pipe wasn't actually emptied, and while socket 211 had nothing
> more to send, we should have waited on socket 115 to be ready for
> output again.
>
> As a further step, we could consider not clearing EPOLLOUT at all,
> unless the read/written counters match, but I'm first trying to fix
> this ugly issue with a minimal patch.
>
> Link: https://github.com/containers/podman/issues/22575
> Link: https://github.com/containers/podman/issues/22593
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp_splice.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 42b7be0..4c36b72 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -616,7 +616,7 @@ eintr:
>  			if (errno != EAGAIN)
>  				goto close;
>
> -			if (never_read)
> +			if (conn->read[fromside] == conn->written[fromside])
>  				break;
>
>  			conn_event(c, conn,
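
To illustrate the difference between the two checks, here is a minimal
sketch. It is not part of the patch: the struct and both helper names are
hypothetical stand-ins, and only the comparison of read and written
counters is taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-side byte counters;
 * this is not the actual passt connection structure. */
struct splice_counters {
	uint64_t read;		/* bytes spliced from the sending socket into the pipe */
	uint64_t written;	/* bytes spliced from the pipe into the receiving socket */
};

/* Old check: after EAGAIN on the write side, wait for EPOLLOUT only if
 * this call of the handler read something from the sender. */
static bool must_wait_out_old(bool never_read)
{
	return !never_read;
}

/* New check: after EAGAIN on the write side, wait for EPOLLOUT whenever
 * the pipe still holds data, regardless of what was read in this call. */
static bool must_wait_out_new(const struct splice_counters *s)
{
	/* In this sketch, we can't have written more than we read */
	assert(s->written <= s->read);

	return s->read != s->written;
}
```

In the situation from the strace, nothing is read during the call
(never_read stays true), so must_wait_out_old() returns false and
EPOLLOUT is never re-armed. As long as the pipe still holds data left
over from an earlier call, read is greater than written, and
must_wait_out_new() returns true.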
--
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson