From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Asahi Lina <lina@asahilina.net>,
Sergio Lopez <slp@redhat.com>, Jon Maloy <jmaloy@redhat.com>
Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
Date: Fri, 24 Jan 2025 13:57:09 +1100 [thread overview]
Message-ID: <Z5MBhQdRHHOVqRmY@zatzit> (raw)
In-Reply-To: <20250116203250.784496-2-sbrivio@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5502 bytes --]
On Thu, Jan 16, 2025 at 09:32:47PM +0100, Stefano Brivio wrote:
> In the next patches, I'm extending the usage of STALLED to a few more
> cases.
>
> Doing so revealed this issue: if we set STALLED and, consequently,
> EPOLLOUT (which is wrong, fixed later) right after we set a connection
> to ESTABLISHED (which also happened by mistake while I was preparing
> another change), with the guest sending data together with the final
> ACK in the handshake, say:
>
> 41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1
> 41.3662: Flow 2 (NEW): FREE -> NEW
> 41.3663: Flow 2 (INI): NEW -> INI
> 41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ?
> 41.3665: Flow 2 (TGT): INI -> TGT
> 41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
> 41.3667: Flow 2 (TCP connection): TGT -> TYPED
> 41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
> 41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
> 41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814
> 41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE
> 41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
> 41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD
> 41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE
> 41.3675: Flow 2 (TCP connection): timer expires in 10.000s
> 41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1
> 41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped
> 41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED
> 41.3678: Flow 2 (TCP connection): STALLED
> 41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1
> 41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE
> 41.3680: Flow 2 (TCP connection): timer expires in 0.010s
> 41.3680: Flow 2 (TCP connection): STALLED dropped
>
> we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(),
> but ignore window and ACK sequence update. At this point, we think we
> acknowledged all the data to the guest (but we didn't) and we'll
> happily proceed to clear the ACK_TO_TAP_DUE flag:
>
> 41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped
> 41.3780: Flow 2 (TCP connection): timer expires in 7200.000s
> 41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1
> 41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1
> 42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1
>
> while the guest starts retransmitting that data desperately, without
> ever getting an ACK segment from us:
>
> 1433 38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128
> 1434 38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
> 1435 38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0
> 1436 38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
> 1437 38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767
> 1438 38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0
> 1439 38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
> 1440 39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
> 1441 40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
>
> because seq_ack_to_tap is already set to the sequence after frame
> number 1437 in the example.
>
> For some reason, I could only reproduce this with vhost-user, IPv6,
> and passt running under valgrind while taking captures. Even under
> these conditions, it happens quite rarely.
>
> Forcibly send an ACK segment if we update the ACK sequence (or the
> advertised window).
>
> Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index ec433f7..72fca63 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
> if (events & EPOLLIN)
> tcp_data_from_sock(c, conn);
>
> - if (events & EPOLLOUT)
> - tcp_update_seqack_wnd(c, conn, false, NULL);
> + if (events & EPOLLOUT) {
> + if (tcp_update_seqack_wnd(c, conn, false, NULL))
> + tcp_send_flag(c, conn, ACK);
> + }
>
> return;
> }
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-01-24 3:38 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-16 20:32 [PATCH 0/4] Fixes for EAGAIN/EPOLLIN storm and related issues Stefano Brivio
2025-01-16 20:32 ` [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up Stefano Brivio
2025-01-24 2:57 ` David Gibson [this message]
2025-01-16 20:32 ` [PATCH 2/4] tcp: Don't subscribe to EPOLLOUT events on STALLED Stefano Brivio
2025-01-24 2:58 ` David Gibson
2025-01-16 20:32 ` [PATCH 3/4] tcp: Set EPOLLET when when reading from a socket fails with EAGAIN Stefano Brivio
2025-01-24 3:00 ` David Gibson
2025-01-16 20:32 ` [PATCH 4/4] tcp: Mask EPOLLIN altogether if we're blocked waiting on an ACK from the guest Stefano Brivio
2025-01-24 3:06 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5MBhQdRHHOVqRmY@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=jmaloy@redhat.com \
--cc=lina@asahilina.net \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
--cc=slp@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).