public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied
Date: Wed,  8 May 2024 11:03:38 +0200	[thread overview]
Message-ID: <20240508090338.2735208-1-sbrivio@redhat.com> (raw)

In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(),
from pipe to receiving socket, that doesn't necessarily mean that the
pipe is empty: the receiver buffer might be full instead.

Hence, we can't use the 'never_read' flag to decide that there's
nothing to wait for: even if we didn't read anything from the sending
side in a given iteration, we might still have data to send in the
pipe. Use read/written counters, instead.

This fixes an issue where large bulk transfers would occasionally
hang. From a corresponding strace:

     0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1
     0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0
     0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0
     0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
     0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576
     0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
     0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415
     0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
     0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
     0.000044 epoll_wait(4, [], 8, 1000) = 0

we're reading from socket 211 into to the pipe end numbered 151,
which connects to pipe end 150, and from there we're writing into
socket 115.

We initially drop EPOLLOUT from the set of monitored flags for socket
115, because it already signaled it's ready for output. Then we read
nothing from socket 211 (the sender had nothing to send), and we keep
emptying the pipe into socket 115 (first 1048576 bytes, then 520415
bytes).

This call of tcp_splice_sock_handler() ends with EAGAIN on the writing
side, and we just exit this function without setting the OUT_WAIT_1
flag (and, in turn, EPOLLOUT for socket 115). However, it turns out,
the pipe wasn't actually emptied, and while socket 211 had nothing
more to send, we should have waited on socket 115 to be ready for
output again.

As a further step, we could consider not clearing EPOLLOUT at all,
unless the read/written counters match, but I'm first trying to fix
this ugly issue with a minimal patch.

Link: https://github.com/containers/podman/issues/22575
Link: https://github.com/containers/podman/issues/22593
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp_splice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcp_splice.c b/tcp_splice.c
index 42b7be0..4c36b72 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -616,7 +616,7 @@ eintr:
 			if (errno != EAGAIN)
 				goto close;
 
-			if (never_read)
+			if (conn->read[fromside] == conn->written[fromside])
 				break;
 
 			conn_event(c, conn,
-- 
@@ -616,7 +616,7 @@ eintr:
 			if (errno != EAGAIN)
 				goto close;
 
-			if (never_read)
+			if (conn->read[fromside] == conn->written[fromside])
 				break;
 
 			conn_event(c, conn,
-- 
2.43.0


             reply	other threads:[~2024-05-08  9:03 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-08  9:03 Stefano Brivio [this message]
2024-05-09  0:28 ` [PATCH] tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240508090338.2735208-1-sbrivio@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).