From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Asahi Lina <lina@asahilina.net>, Sergio Lopez <slp@redhat.com>,
Jon Maloy <jmaloy@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
Date: Thu, 16 Jan 2025 21:32:47 +0100 [thread overview]
Message-ID: <20250116203250.784496-2-sbrivio@redhat.com> (raw)
In-Reply-To: <20250116203250.784496-1-sbrivio@redhat.com>
In the next patches, I'm extending the usage of STALLED to a few more
cases.
Doing so revealed this issue: if we set STALLED and, consequently,
EPOLLOUT (which is wrong, fixed later) right after we set a connection
to ESTABLISHED (which also happened by mistake while I was preparing
another change), with the guest sending data together with the final
ACK in the handshake, say:
41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1
41.3662: Flow 2 (NEW): FREE -> NEW
41.3663: Flow 2 (INI): NEW -> INI
41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ?
41.3665: Flow 2 (TGT): INI -> TGT
41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3667: Flow 2 (TCP connection): TGT -> TYPED
41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814
41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE
41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD
41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE
41.3675: Flow 2 (TCP connection): timer expires in 10.000s
41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1
41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped
41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED
41.3678: Flow 2 (TCP connection): STALLED
41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1
41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE
41.3680: Flow 2 (TCP connection): timer expires in 0.010s
41.3680: Flow 2 (TCP connection): STALLED dropped
we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(),
but ignore window and ACK sequence update. At this point, we think we
acknowledged all the data to the guest (but we didn't) and we'll
happily proceed to clear the ACK_TO_TAP_DUE flag:
41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped
41.3780: Flow 2 (TCP connection): timer expires in 7200.000s
41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1
41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1
42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1
while the guest starts retransmitting that data desperately, without
ever getting an ACK segment from us:
1433 38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128
1434 38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
1435 38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0
1436 38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1437 38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767
1438 38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0
1439 38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1440 39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
1441 40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
because seq_ack_to_tap is already set to the sequence after frame
number 1437 in the example.
For some reason, I could only reproduce this with vhost-user, IPv6,
and passt running under valgrind while taking captures. Even under
these conditions, it happens quite rarely.
Forcibly send an ACK segment if we update the ACK sequence (or the
advertised window).
Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tcp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tcp.c b/tcp.c
index ec433f7..72fca63 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
if (events & EPOLLIN)
tcp_data_from_sock(c, conn);
- if (events & EPOLLOUT)
- tcp_update_seqack_wnd(c, conn, false, NULL);
+ if (events & EPOLLOUT) {
+ if (tcp_update_seqack_wnd(c, conn, false, NULL))
+ tcp_send_flag(c, conn, ACK);
+ }
return;
}
--
@@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
if (events & EPOLLIN)
tcp_data_from_sock(c, conn);
- if (events & EPOLLOUT)
- tcp_update_seqack_wnd(c, conn, false, NULL);
+ if (events & EPOLLOUT) {
+ if (tcp_update_seqack_wnd(c, conn, false, NULL))
+ tcp_send_flag(c, conn, ACK);
+ }
return;
}
--
2.43.0
next prev parent reply other threads:[~2025-01-16 20:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-16 20:32 [PATCH 0/4] Fixes for EAGAIN/EPOLLIN storm and related issues Stefano Brivio
2025-01-16 20:32 ` Stefano Brivio [this message]
2025-01-16 20:32 ` [PATCH 2/4] tcp: Don't subscribe to EPOLLOUT events on STALLED Stefano Brivio
2025-01-16 20:32 ` [PATCH 3/4] tcp: Set EPOLLET when when reading from a socket fails with EAGAIN Stefano Brivio
2025-01-16 20:32 ` [PATCH 4/4] tcp: Mask EPOLLIN altogether if we're blocked waiting on an ACK from the guest Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250116203250.784496-2-sbrivio@redhat.com \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=jmaloy@redhat.com \
--cc=lina@asahilina.net \
--cc=passt-dev@passt.top \
--cc=slp@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).