public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied
Date: Thu, 9 May 2024 10:28:22 +1000	[thread overview]
Message-ID: <ZjwYptEKmBNBMIui@zatzit> (raw)
In-Reply-To: <20240508090338.2735208-1-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3641 bytes --]

On Wed, May 08, 2024 at 11:03:38AM +0200, Stefano Brivio wrote:
> In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(),
> from pipe to receiving socket, that doesn't necessarily mean that the
> pipe is empty: the receiver buffer might be full instead.
> 
> Hence, we can't use the 'never_read' flag to decide that there's
> nothing to wait for: even if we didn't read anything from the sending
> side in a given iteration, we might still have data to send in the
> pipe. Use read/written counters, instead.
> 
> This fixes an issue where large bulk transfers would occasionally
> hang. From a corresponding strace:
> 
>      0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1
>      0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0
>      0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0
>      0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
>      0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576
>      0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
>      0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415
>      0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
>      0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
>      0.000044 epoll_wait(4, [], 8, 1000) = 0
> 
> we're reading from socket 211 into to the pipe end numbered 151,
> which connects to pipe end 150, and from there we're writing into
> socket 115.
> 
> We initially drop EPOLLOUT from the set of monitored flags for socket
> 115, because it already signaled it's ready for output. Then we read
> nothing from socket 211 (the sender had nothing to send), and we keep
> emptying the pipe into socket 115 (first 1048576 bytes, then 520415
> bytes).
> 
> This call of tcp_splice_sock_handler() ends with EAGAIN on the writing
> side, and we just exit this function without setting the OUT_WAIT_1
> flag (and, in turn, EPOLLOUT for socket 115). However, it turns out,
> the pipe wasn't actually emptied, and while socket 211 had nothing
> more to send, we should have waited on socket 115 to be ready for
> output again.
> 
> As a further step, we could consider not clearing EPOLLOUT at all,
> unless the read/written counters match, but I'm first trying to fix
> this ugly issue with a minimal patch.
> 
> Link: https://github.com/containers/podman/issues/22575
> Link: https://github.com/containers/podman/issues/22593
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tcp_splice.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tcp_splice.c b/tcp_splice.c
> index 42b7be0..4c36b72 100644
> --- a/tcp_splice.c
> +++ b/tcp_splice.c
> @@ -616,7 +616,7 @@ eintr:
>  			if (errno != EAGAIN)
>  				goto close;
>  
> -			if (never_read)
> +			if (conn->read[fromside] == conn->written[fromside])
>  				break;
>  
>  			conn_event(c, conn,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2024-05-09  0:28 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-08  9:03 [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied Stefano Brivio
2024-05-09  0:28 ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjwYptEKmBNBMIui@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).