public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Asahi Lina <lina@asahilina.net>,
	Sergio Lopez <slp@redhat.com>, Jon Maloy <jmaloy@redhat.com>
Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up
Date: Fri, 24 Jan 2025 13:57:09 +1100	[thread overview]
Message-ID: <Z5MBhQdRHHOVqRmY@zatzit> (raw)
In-Reply-To: <20250116203250.784496-2-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5502 bytes --]

On Thu, Jan 16, 2025 at 09:32:47PM +0100, Stefano Brivio wrote:
> In the next patches, I'm extending the usage of STALLED to a few more
> cases.
> 
> Doing so revealed this issue: if we set STALLED and, consequently,
> EPOLLOUT (which is wrong, fixed later) right after we set a connection
> to ESTABLISHED (which also happened by mistake while I was preparing
> another change), with the guest sending data together with the final
> ACK in the handshake, say:
> 
>   41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1
>   41.3662: Flow 2 (NEW): FREE -> NEW
>   41.3663: Flow 2 (INI): NEW -> INI
>   41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ?
>   41.3665: Flow 2 (TGT): INI -> TGT
>   41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
>   41.3667: Flow 2 (TCP connection): TGT -> TYPED
>   41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
>   41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
>   41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814
>   41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE
>   41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003
>   41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD
>   41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE
>   41.3675: Flow 2 (TCP connection): timer expires in 10.000s
>   41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1
>   41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped
>   41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED
>   41.3678: Flow 2 (TCP connection): STALLED
>   41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1
>   41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE
>   41.3680: Flow 2 (TCP connection): timer expires in 0.010s
>   41.3680: Flow 2 (TCP connection): STALLED dropped
> 
> we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(),
> but ignore window and ACK sequence update. At this point, we think we
> acknowledged all the data to the guest (but we didn't) and we'll
> happily proceed to clear the ACK_TO_TAP_DUE flag:
> 
>   41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped
>   41.3780: Flow 2 (TCP connection): timer expires in 7200.000s
>   41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1
>   41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1
>   42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1
> 
> while the guest starts retransmitting that data desperately, without
> ever getting an ACK segment from us:
> 
>    1433  38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128
>    1434  38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256
>    1435  38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0
>    1436  38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
>    1437  38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767
>    1438  38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0
>    1439  38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
>    1440  39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
>    1441  40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192
> 
> because seq_ack_to_tap is already set to the sequence after frame
> number 1437 in the example.
> 
> For some reason, I could only reproduce this with vhost-user, IPv6,
> and passt running under valgrind while taking captures. Even under
> these conditions, it happens quite rarely.
> 
> Forcibly send an ACK segment if we update the ACK sequence (or the
> advertised window).
> 
> Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tcp.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index ec433f7..72fca63 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
>  		if (events & EPOLLIN)
>  			tcp_data_from_sock(c, conn);
>  
> -		if (events & EPOLLOUT)
> -			tcp_update_seqack_wnd(c, conn, false, NULL);
> +		if (events & EPOLLOUT) {
> +			if (tcp_update_seqack_wnd(c, conn, false, NULL))
> +				tcp_send_flag(c, conn, ACK);
> +		}
>  
>  		return;
>  	}

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-01-24  3:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 20:32 [PATCH 0/4] Fixes for EAGAIN/EPOLLIN storm and related issues Stefano Brivio
2025-01-16 20:32 ` [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up Stefano Brivio
2025-01-24  2:57   ` David Gibson [this message]
2025-01-16 20:32 ` [PATCH 2/4] tcp: Don't subscribe to EPOLLOUT events on STALLED Stefano Brivio
2025-01-24  2:58   ` David Gibson
2025-01-16 20:32 ` [PATCH 3/4] tcp: Set EPOLLET when when reading from a socket fails with EAGAIN Stefano Brivio
2025-01-24  3:00   ` David Gibson
2025-01-16 20:32 ` [PATCH 4/4] tcp: Mask EPOLLIN altogether if we're blocked waiting on an ACK from the guest Stefano Brivio
2025-01-24  3:06   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5MBhQdRHHOVqRmY@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=jmaloy@redhat.com \
    --cc=lina@asahilina.net \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    --cc=slp@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).