public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Asahi Lina <lina@asahilina.net>, Sergio Lopez <slp@redhat.com>,
	Jon Maloy <jmaloy@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
Date: Thu, 16 Jan 2025 21:32:47 +0100	[thread overview]
Message-ID: <20250116203250.784496-2-sbrivio@redhat.com> (raw)
In-Reply-To: <20250116203250.784496-1-sbrivio@redhat.com>

In the next patches, I'm extending the usage of STALLED to a few more
cases.

Doing so revealed this issue: if we set STALLED and, consequently,
EPOLLOUT (which is wrong, fixed later) right after we set a connection
to ESTABLISHED (which also happened by mistake while I was preparing
another change), with the guest sending data together with the final
ACK in the handshake, say:

  41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1
  41.3662: Flow 2 (NEW): FREE -> NEW
  41.3663: Flow 2 (INI): NEW -> INI
  41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ?
  41.3665: Flow 2 (TGT): INI -> TGT
  41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
  41.3667: Flow 2 (TCP connection): TGT -> TYPED
  41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
  41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
  41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814
  41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE
  41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
  41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD
  41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE
  41.3675: Flow 2 (TCP connection): timer expires in 10.000s
  41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1
  41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped
  41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED
  41.3678: Flow 2 (TCP connection): STALLED
  41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1
  41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE
  41.3680: Flow 2 (TCP connection): timer expires in 0.010s
  41.3680: Flow 2 (TCP connection): STALLED dropped

we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(),
but ignore window and ACK sequence update. At this point, we think we
acknowledged all the data to the guest (but we didn't) and we'll
happily proceed to clear the ACK_TO_TAP_DUE flag:

  41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped
  41.3780: Flow 2 (TCP connection): timer expires in 7200.000s
  41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1
  41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1
  42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1

while the guest starts retransmitting that data desperately, without
ever getting an ACK segment from us:

   1433  38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128
   1434  38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
   1435  38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0
   1436  38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
   1437  38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767
   1438  38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0
   1439  38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
   1440  39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
   1441  40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192

because seq_ack_to_tap is already set to the sequence after frame
number 1437 in the example.

For some reason, I could only reproduce this with vhost-user, IPv6,
and passt running under valgrind while taking captures. Even under
these conditions, it happens quite rarely.

Forcibly send an ACK segment if we update the ACK sequence (or the
advertised window).

Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tcp.c b/tcp.c
index ec433f7..72fca63 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 		if (events & EPOLLIN)
 			tcp_data_from_sock(c, conn);
 
-		if (events & EPOLLOUT)
-			tcp_update_seqack_wnd(c, conn, false, NULL);
+		if (events & EPOLLOUT) {
+			if (tcp_update_seqack_wnd(c, conn, false, NULL))
+				tcp_send_flag(c, conn, ACK);
+		}
 
 		return;
 	}
-- 
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 		if (events & EPOLLIN)
 			tcp_data_from_sock(c, conn);
 
-		if (events & EPOLLOUT)
-			tcp_update_seqack_wnd(c, conn, false, NULL);
+		if (events & EPOLLOUT) {
+			if (tcp_update_seqack_wnd(c, conn, false, NULL))
+				tcp_send_flag(c, conn, ACK);
+		}
 
 		return;
 	}
-- 
2.43.0


  reply	other threads:[~2025-01-16 20:32 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 20:32 [PATCH 0/4] Fixes for EAGAIN/EPOLLIN storm and related issues Stefano Brivio
2025-01-16 20:32 ` Stefano Brivio [this message]
2025-01-16 20:32 ` [PATCH 2/4] tcp: Don't subscribe to EPOLLOUT events on STALLED Stefano Brivio
2025-01-16 20:32 ` [PATCH 3/4] tcp: Set EPOLLET when when reading from a socket fails with EAGAIN Stefano Brivio
2025-01-16 20:32 ` [PATCH 4/4] tcp: Mask EPOLLIN altogether if we're blocked waiting on an ACK from the guest Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250116203250.784496-2-sbrivio@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=jmaloy@redhat.com \
    --cc=lina@asahilina.net \
    --cc=passt-dev@passt.top \
    --cc=slp@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).