* [PATCH 1/2] tcp_splice: Don't double could bytes read on EINTR
2025-04-09 6:35 [PATCH 0/2] Fix bugs in tcp_splice_sock_handler() David Gibson
@ 2025-04-09 6:35 ` David Gibson
2025-04-09 6:35 ` [PATCH 2/2] tcp_splice: Don't clobber errno before checking for EAGAIN David Gibson
2025-04-09 22:41 ` [PATCH 0/2] Fix bugs in tcp_splice_sock_handler() Stefano Brivio
2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2025-04-09 6:35 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
In tcp_splice_sock_handler(), if we get an EINTR on our second splice()
(pipe to output socket) we - as we should - go back and retry it. However,
we do so *after* we've already updated our byte counters. That does no
harm for the conn->written[] counter - since the second splice() returned
an error it will be advanced by 0. However we also advance the
conn->read[] counter, and then do so again when the splice() succeeds.
This results in the counters being out of sync, and us thinking we have
remaining data in the pipe when we don't, which can leave us in an
infinite loop once the stream finishes.
Fix this by moving the EINTR handling to directly next to the splice()
call (which is what we usually do for EINTR). As a bonus this removes one
mildly confusing goto.
For symmetry, also rework the EINTR handling on the first splice() the same
way, although that doesn't (as far as I can tell) have buggy side effects.
Link: https://github.com/containers/podman/issues/23686
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 0d10e3d4..7c3b56f9 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -520,15 +520,14 @@ swap:
int more = 0;
retry:
- readlen = splice(conn->s[fromsidei], NULL,
- conn->pipe[fromsidei][1], NULL,
- c->tcp.pipe_size,
- SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
+ do
+ readlen = splice(conn->s[fromsidei], NULL,
+ conn->pipe[fromsidei][1], NULL,
+ c->tcp.pipe_size,
+ SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
+ while (readlen < 0 && errno == EINTR);
flow_trace(conn, "%zi from read-side call", readlen);
if (readlen < 0) {
- if (errno == EINTR)
- goto retry;
-
if (errno != EAGAIN)
goto close;
} else if (!readlen) {
@@ -543,10 +542,13 @@ retry:
conn_flag(c, conn, lowat_act_flag);
}
-eintr:
- written = splice(conn->pipe[fromsidei][0], NULL,
- conn->s[!fromsidei], NULL, c->tcp.pipe_size,
- SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
+ do
+ written = splice(conn->pipe[fromsidei][0], NULL,
+ conn->s[!fromsidei], NULL,
+ c->tcp.pipe_size,
+ SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
+ while (written < 0 && errno == EINTR);
+
flow_trace(conn, "%zi from write-side call (passed %zi)",
written, c->tcp.pipe_size);
@@ -578,9 +580,6 @@ eintr:
conn->written[fromsidei] += written > 0 ? written : 0;
if (written < 0) {
- if (errno == EINTR)
- goto eintr;
-
if (errno != EAGAIN)
goto close;
--
@@ -520,15 +520,14 @@ swap:
int more = 0;
retry:
- readlen = splice(conn->s[fromsidei], NULL,
- conn->pipe[fromsidei][1], NULL,
- c->tcp.pipe_size,
- SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
+ do
+ readlen = splice(conn->s[fromsidei], NULL,
+ conn->pipe[fromsidei][1], NULL,
+ c->tcp.pipe_size,
+ SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
+ while (readlen < 0 && errno == EINTR);
flow_trace(conn, "%zi from read-side call", readlen);
if (readlen < 0) {
- if (errno == EINTR)
- goto retry;
-
if (errno != EAGAIN)
goto close;
} else if (!readlen) {
@@ -543,10 +542,13 @@ retry:
conn_flag(c, conn, lowat_act_flag);
}
-eintr:
- written = splice(conn->pipe[fromsidei][0], NULL,
- conn->s[!fromsidei], NULL, c->tcp.pipe_size,
- SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
+ do
+ written = splice(conn->pipe[fromsidei][0], NULL,
+ conn->s[!fromsidei], NULL,
+ c->tcp.pipe_size,
+ SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
+ while (written < 0 && errno == EINTR);
+
flow_trace(conn, "%zi from write-side call (passed %zi)",
written, c->tcp.pipe_size);
@@ -578,9 +580,6 @@ eintr:
conn->written[fromsidei] += written > 0 ? written : 0;
if (written < 0) {
- if (errno == EINTR)
- goto eintr;
-
if (errno != EAGAIN)
goto close;
--
2.49.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] tcp_splice: Don't clobber errno before checking for EAGAIN
2025-04-09 6:35 [PATCH 0/2] Fix bugs in tcp_splice_sock_handler() David Gibson
2025-04-09 6:35 ` [PATCH 1/2] tcp_splice: Don't double could bytes read on EINTR David Gibson
@ 2025-04-09 6:35 ` David Gibson
2025-04-09 22:41 ` [PATCH 0/2] Fix bugs in tcp_splice_sock_handler() Stefano Brivio
2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2025-04-09 6:35 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Like many places, tcp_splice_sock_handler() needs to handle EAGAIN
specially, in this case for both of its splice() calls. Unfortunately it
tests for EAGAIN some time after those calls. In between there has been
at least a flow_trace() which could have clobbered errno. Move the test on
errno closer to the relevant system calls to avoid this problem.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_splice.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/tcp_splice.c b/tcp_splice.c
index 7c3b56f9..60455d64 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -526,13 +526,15 @@ retry:
c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
while (readlen < 0 && errno == EINTR);
+
+ if (readlen < 0 && errno != EAGAIN)
+ goto close;
+
flow_trace(conn, "%zi from read-side call", readlen);
- if (readlen < 0) {
- if (errno != EAGAIN)
- goto close;
- } else if (!readlen) {
+
+ if (!readlen) {
eof = 1;
- } else {
+ } else if (readlen > 0) {
never_read = 0;
if (readlen >= (long)c->tcp.pipe_size * 90 / 100)
@@ -549,6 +551,9 @@ retry:
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
while (written < 0 && errno == EINTR);
+ if (written < 0 && errno != EAGAIN)
+ goto close;
+
flow_trace(conn, "%zi from write-side call (passed %zi)",
written, c->tcp.pipe_size);
@@ -580,9 +585,6 @@ retry:
conn->written[fromsidei] += written > 0 ? written : 0;
if (written < 0) {
- if (errno != EAGAIN)
- goto close;
-
if (conn->read[fromsidei] == conn->written[fromsidei])
break;
--
@@ -526,13 +526,15 @@ retry:
c->tcp.pipe_size,
SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
while (readlen < 0 && errno == EINTR);
+
+ if (readlen < 0 && errno != EAGAIN)
+ goto close;
+
flow_trace(conn, "%zi from read-side call", readlen);
- if (readlen < 0) {
- if (errno != EAGAIN)
- goto close;
- } else if (!readlen) {
+
+ if (!readlen) {
eof = 1;
- } else {
+ } else if (readlen > 0) {
never_read = 0;
if (readlen >= (long)c->tcp.pipe_size * 90 / 100)
@@ -549,6 +551,9 @@ retry:
SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK);
while (written < 0 && errno == EINTR);
+ if (written < 0 && errno != EAGAIN)
+ goto close;
+
flow_trace(conn, "%zi from write-side call (passed %zi)",
written, c->tcp.pipe_size);
@@ -580,9 +585,6 @@ retry:
conn->written[fromsidei] += written > 0 ? written : 0;
if (written < 0) {
- if (errno != EAGAIN)
- goto close;
-
if (conn->read[fromsidei] == conn->written[fromsidei])
break;
--
2.49.0
^ permalink raw reply related [flat|nested] 4+ messages in thread