From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=Lq0AnE68; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 8CB825A0272 for ; Fri, 24 Jan 2025 04:38:25 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1737689892; bh=rZ+KWseTMQnb1MsVLZrve63lKjEubzwGYm7mVFl6oPc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Lq0AnE68eCW/KwcuZW8mGhOr6Vp7Mhq+NIaPEvBP8U3XBRYiACEH8sFuHXsVPP3Mk CX5OAiIhGTuK6Wf4LcoxqLlSsmNS2n5W2szh8dnmRzBuf+1YoxEyHaXI7Dii1S5ELJ YVLVyvOUUj7tUTT1KRN2OHaI0eowrT0cyfd/YHW+q3YVY1OjQOTrkCbfI8/q9DupQ4 j7T/ggAnkjYxBjns+DleLQxKNpIOIPhyvOE7+EcIgvbIaxUV5ZDLO/IoyC4gxCzHyZ ol8P0Cr3v+k7dEVipa/gdaHvQUJpDHG7z3l9w59HKRlejUgmGxjsqo5LdNUJYpBz7N Cdrt8AFmk+xOA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4YfNnJ3bdLz4wj1; Fri, 24 Jan 2025 14:38:12 +1100 (AEDT) Date: Fri, 24 Jan 2025 13:57:09 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up Message-ID: References: <20250116203250.784496-1-sbrivio@redhat.com> <20250116203250.784496-2-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="piUVwROEdO71lDfS" Content-Disposition: inline In-Reply-To: <20250116203250.784496-2-sbrivio@redhat.com> Message-ID-Hash: QSI2FCAHNEWTHZAQ6FS7J7T63D367F3P X-Message-ID-Hash: QSI2FCAHNEWTHZAQ6FS7J7T63D367F3P X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Asahi Lina , Sergio Lopez , Jon Maloy X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --piUVwROEdO71lDfS Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jan 16, 2025 at 09:32:47PM +0100, Stefano Brivio wrote: > In the next patches, I'm extending the usage of STALLED to a few more > cases. >=20 > Doing so revealed this issue: if we set STALLED and, consequently, > EPOLLOUT (which is wrong, fixed later) right after we set a connection > to ESTABLISHED (which also happened by mistake while I was preparing > another change), with the guest sending data together with the final > ACK in the handshake, say: >=20 > 41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1 > 41.3662: Flow 2 (NEW): FREE -> NEW > 41.3663: Flow 2 (INI): NEW -> INI > 41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a5= 5::1]:10003 =3D> ? > 41.3665: Flow 2 (TGT): INI -> TGT > 41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a5= 5::1]:10003 =3D> HOST [::]:0 -> [2001:db8:9a55::1]:10003 > 41.3667: Flow 2 (TCP connection): TGT -> TYPED > 41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2= 001:db8:9a55::1]:10003 =3D> HOST [::]:0 -> [2001:db8:9a55::1]:10003 > 41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT > 41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339= 814 > 41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE > 41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2= 001:db8:9a55::1]:10003 =3D> HOST [::]:0 -> [2001:db8:9a55::1]:10003 > 41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD > 41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE > 41.3675: Flow 2 (TCP connection): timer expires in 10.000s > 41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1 > 41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped > 41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED > 41.3678: Flow 2 (TCP connection): STALLED > 41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1 > 41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE > 41.3680: Flow 2 (TCP connection): timer expires in 0.010s > 41.3680: Flow 2 (TCP connection): STALLED dropped >=20 > we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(), > but ignore window and ACK sequence update. At this point, we think we > acknowledged all the data to the guest (but we didn't) and we'll > happily proceed to clear the ACK_TO_TAP_DUE flag: >=20 > 41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped > 41.3780: Flow 2 (TCP connection): timer expires in 7200.000s > 41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1 > 41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1 > 42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1 >=20 > while the guest starts retransmitting that data desperately, without > ever getting an ACK segment from us: >=20 > 1433 38.746353 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 94 TCP = 54312 =E2=86=92 10003 [SYN] Seq=3D0 Win=3D65460 Len=3D0 MSS=3D65460 SACK_PE= RM TSval=3D1089126192 TSecr=3D0 WS=3D128 > 1434 38.747357 2001:db8:9a55::1 =E2=86=92 2a01:4f8:222:904::2 82 TCP = 10003 =E2=86=92 54312 [SYN, ACK] Seq=3D0 Ack=3D1 Win=3D65535 Len=3D0 MSS=3D= 61440 WS=3D256 > 1435 38.747500 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 74 TCP = 54312 =E2=86=92 10003 [ACK] Seq=3D1 Ack=3D1 Win=3D65536 Len=3D0 > 1436 38.747769 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 8266 TC= P 54312 =E2=86=92 10003 [PSH, ACK] Seq=3D1 Ack=3D1 Win=3D65536 Len=3D8192 > 1437 38.747798 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 32841 T= CP 54312 =E2=86=92 10003 [ACK] Seq=3D8193 Ack=3D1 Win=3D65536 Len=3D32767 > 1438 38.748049 2001:db8:9a55::1 =E2=86=92 2a01:4f8:222:904::2 74 TCP = [TCP Window Update] 10003 =E2=86=92 54312 [ACK] Seq=3D1 Ack=3D1 Win=3D65280= Len=3D0 > 1439 38.954044 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 8266 TC= P [TCP Retransmission] 54312 =E2=86=92 10003 [PSH, ACK] Seq=3D1 Ack=3D1 Win= =3D65536 Len=3D8192 > 1440 39.370096 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 8266 TC= P [TCP Retransmission] 54312 =E2=86=92 10003 [PSH, ACK] Seq=3D1 Ack=3D1 Win= =3D65536 Len=3D8192 > 1441 40.202135 2a01:4f8:222:904::2 =E2=86=92 2001:db8:9a55::1 8266 TC= P [TCP Retransmission] 54312 =E2=86=92 10003 [PSH, ACK] Seq=3D1 Ack=3D1 Win= =3D65536 Len=3D8192 >=20 > because seq_ack_to_tap is already set to the sequence after frame > number 1437 in the example. >=20 > For some reason, I could only reproduce this with vhost-user, IPv6, > and passt running under valgrind while taking captures. Even under > these conditions, it happens quite rarely. >=20 > Forcibly send an ACK segment if we update the ACK sequence (or the > advertised window). >=20 > Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, spli= t out spliced implementation") > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index ec433f7..72fca63 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -2200,8 +2200,10 @@ void tcp_sock_handler(const struct ctx *c, union e= poll_ref ref, > if (events & EPOLLIN) > tcp_data_from_sock(c, conn); > =20 > - if (events & EPOLLOUT) > - tcp_update_seqack_wnd(c, conn, false, NULL); > + if (events & EPOLLOUT) { > + if (tcp_update_seqack_wnd(c, conn, false, NULL)) > + tcp_send_flag(c, conn, ACK); > + } > =20 > return; > } --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --piUVwROEdO71lDfS Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmeTAYAACgkQzQJF27ox 2Gfd4Q/8CUy8jNgcqGu4fhMI0xnLHpBxWc6cSMWJCZ9wxCxsaIDNry2amKeKsBNp MC9iE1E3oemusWX8m4hd/19DpdgG8NZ3ty55jgtVNC+vcAguBqV/YEwPi6AWR/9j FMwJ6eJIv2nYiYpO4KP46EKmg/ULvHd0UqyFpVysILpZrBvqtBMcLmjtU7KyWPZ1 zEikVPzrba1mKrMYHqrLgmTl+mgcS8HlhuLHHs2WODLHXQ4MQHk1auQ/JF7WwHQJ EFIeRdH2H869mqsKwJtoWUncneWCVOGZLa1EPxXuH0gqkXGX6PRJCFKqQx3sVUOy /GrFVC8gRfusRQK82jVrlkVqQ0kRZM0ZxAUJIcDKCqiVeEuMQzvDhJwcTHVxyMTW bVIBhEvuGuLCzUhTpIcMVGs36r/ubyn8bN9yV6LHT7bbm7/JV7OrMbmgmkxVaot5 hITQqvT/KHgPEz9H4KfYJ8P3U28iAAbFytv85G0IU0WyhbyqDpRivOlF3fkKNGAa 2NMzXWoZyLScS8SMTkwfLaLS7XjJUU/FqDbSBCTLzM1UTgOx88VvaaUcglBb98U7 8oxawXpRMItu6z4eUrvvJGLWf7pzzSjQEgZZvhahU5DroP4pXRqR4YJVWSUsTV/6 +ua84kKvgg5TOo1qyBMAPlh93DUFoRsUMc13hL+ix+gk2vuo58c= =1XjU -----END PGP SIGNATURE----- --piUVwROEdO71lDfS--