public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH] tcp: Clear and set ACK_FROM_TAP_DUE also on unchanged ACK sequence from peer
@ 2023-03-23 16:08 Stefano Brivio
  2023-03-24  5:18 ` David Gibson
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Brivio @ 2023-03-23 16:08 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson

Since commit cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as
needed, update timer"), we don't clear ACK_FROM_TAP_DUE whenever we
process an ACK segment, but, more correctly, only if we're really not
waiting for a further ACK segment, that is, only if the acknowledged
sequence matches what we sent.

In the new function implementing this, tcp_update_seqack_from_tap(),
we also reset the retransmission counter and store the updated ACK
sequence. Both should be done iff forward progress is acknowledged,
implied by the fact that the new ACK sequence is greater than the
one we previously stored.

At that point, it looked natural to also include the statements that
clear and set the ACK_FROM_TAP_DUE flag inside the same conditional
block: if we're not making forward progress, the need for an ACK, or
lack thereof, should remain unchanged.

There's a case where this isn't true, though: if a client initiates
a connection, and the server doesn't send data, with the client also
not sending any further data except for what's possibly sent along
with the first ACK segment following SYN, ACK from the server, we'll
never, in the established state of the connection, call
tcp_update_seqack_from_tap() with reported forward progress.

In that case, we'll never reset the initial ACK_FROM_TAP_DUE (used
to trigger handshake timeouts), interpret the trigger as a the need
for a retransmission (which won't actually retransmit anything), and
eventually time out a perfectly healthy connection once we reach the
maximum retransmission count.

This is relatively simple to reproduce if we switch back to 30s
iperf3 test runs, but it depends on the timing of the iperf3 client:
sometimes the first ACK from the client (part of the three-way
handshake) will come with data (and we'll hit the problem), sometimes
data will be sent later (and we call to tcp_update_seqack_from_tap()
from tcp_data_from_tap() later, avoiding the issue).

A reliable reproducer is a simpler:

  $ strace -e accept,shutdown socat TCP-LISTEN:1111 STDIO &
  [2] 2202832
  $ pasta --config-net -- sh -c '(sleep 30; echo a) | socat STDIN TCP:88.198.0.161:1111'
  accept(5, {sa_family=AF_INET, sin_port=htons(57200), sin_addr=inet_addr("127.0.0.1")}, [16]) = 6
  shutdown(6, SHUT_RDWR)                  = 0
  --- SIGTTOU {si_signo=SIGTTOU, si_code=SI_KERNEL} ---
  --- stopped by SIGTTOU ---
  2023/03/23 16:05:06 socat[3] E write(5, 0x5645dbca9000, 2): Connection reset by peer

where the socat client connects, and no data exchange happens for 30s
in either direction. Here, I'm using the default gateway address to
reach the socat server on the host.

Fix this by clearing and setting the ACK_FROM_TAP_DUE flag regardless
of reported forward progress. If we clear it when it's already unset,
conn_flag() will do nothing with it. If it was set, it's always fine
to reschedule the timer (as long as we're waiting for a further ACK),
because we just received an ACK segment, no matter its sequence.

Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Analysed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tcp.c b/tcp.c
index f156287..b225bbe 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1610,12 +1610,12 @@ out:
 static void tcp_update_seqack_from_tap(const struct ctx *c,
 				       struct tcp_tap_conn *conn, uint32_t seq)
 {
-	if (SEQ_GT(seq, conn->seq_ack_from_tap)) {
-		if (seq == conn->seq_to_tap)
-			conn_flag(c, conn, ~ACK_FROM_TAP_DUE);
-		else
-			conn_flag(c, conn, ACK_FROM_TAP_DUE);
+	if (seq == conn->seq_to_tap)
+		conn_flag(c, conn, ~ACK_FROM_TAP_DUE);
+	else
+		conn_flag(c, conn, ACK_FROM_TAP_DUE);
 
+	if (SEQ_GT(seq, conn->seq_ack_from_tap)) {
 		conn->retrans = 0;
 		conn->seq_ack_from_tap = seq;
 	}
-- 
@@ -1610,12 +1610,12 @@ out:
 static void tcp_update_seqack_from_tap(const struct ctx *c,
 				       struct tcp_tap_conn *conn, uint32_t seq)
 {
-	if (SEQ_GT(seq, conn->seq_ack_from_tap)) {
-		if (seq == conn->seq_to_tap)
-			conn_flag(c, conn, ~ACK_FROM_TAP_DUE);
-		else
-			conn_flag(c, conn, ACK_FROM_TAP_DUE);
+	if (seq == conn->seq_to_tap)
+		conn_flag(c, conn, ~ACK_FROM_TAP_DUE);
+	else
+		conn_flag(c, conn, ACK_FROM_TAP_DUE);
 
+	if (SEQ_GT(seq, conn->seq_ack_from_tap)) {
 		conn->retrans = 0;
 		conn->seq_ack_from_tap = seq;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-03-24  9:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-23 16:08 [PATCH] tcp: Clear and set ACK_FROM_TAP_DUE also on unchanged ACK sequence from peer Stefano Brivio
2023-03-24  5:18 ` David Gibson
2023-03-24  7:42   ` Stefano Brivio
2023-03-24  9:20     ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).