From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Jon Maloy, Paul Holzinger
Date: Wed, 10 Sep 2025 12:20:19 +1000
Subject: Re: [PATCH v4 3/8] tcp: Rewind sequence when guest shrinks window to zero
In-Reply-To: <20250909181655.2990223-4-sbrivio@redhat.com>
References: <20250909181655.2990223-1-sbrivio@redhat.com> <20250909181655.2990223-4-sbrivio@redhat.com>

On Tue, Sep 09, 2025 at 08:16:50PM +0200, Stefano Brivio wrote:
> A window shrunk to zero means by definition that anything else that
> might be in flight is now out of window. Restart from the currently
> acknowledged sequence.
>
> We need to do that both in tcp_tap_window_update(), where we already
> check for zero-window updates, as well as in tcp_data_from_tap(),
> because we might get one of those updates in a batch of packets that
> also contains a non-zero window update.
>
> Suggested-by: Jon Maloy
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

Though a couple of documentation nits below.

> ---
>  tcp.c | 34 +++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 86e08f1..12d42e0 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1268,19 +1268,25 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn,
>
>  /**
>   * tcp_tap_window_update() - Process an updated window from tap side
> + * @c:		Execution context
>   * @conn:	Connection pointer
>   * @wnd:	Window value, host order, unscaled
>   */
> -static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
> +static void tcp_tap_window_update(const struct ctx *c,
> +				  struct tcp_tap_conn *conn, unsigned wnd)
>  {
>  	wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
>
>  	/* Work-around for bug introduced in peer kernel code, commit
> -	 * e2142825c120 ("net: tcp: send zero-window ACK when no memory").
> -	 * We don't update if window shrank to zero.
> +	 * e2142825c120 ("net: tcp: send zero-window ACK when no memory"): don't
> +	 * update the window if it shrank to zero, so that we'll eventually
> +	 * retry to send data, but rewind the sequence as that obviously implies
> +	 * that no data beyond the updated window will ever be acknowledged.

As noted earlier "will ever be acknowledged" might be a bit
misleading. Maybe "no data beyond the window will be acknowledged
until it is retransmitted".

>  	 */
> -	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap))
> +	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) {
> +		tcp_rewind_seq(c, conn);
>  		return;
> +	}
>
>  	conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX);
>
> @@ -1709,7 +1715,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			tcp_timer_ctl(c, conn);
>
>  			if (p->count == 1) {
> -				tcp_tap_window_update(conn, ntohs(th->window));
> +				tcp_tap_window_update(c, conn,
> +						      ntohs(th->window));
>  				return 1;
>  			}
>
> @@ -1728,6 +1735,15 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  				       ack_seq == max_ack_seq &&
>  				       ntohs(th->window) == max_ack_seq_wnd;
>
> +				/* See tcp_tap_window_update() for details. On
> +				 * top of that, we also need to check here if a
> +				 * zero-window update is contained in a batch of
> +				 * packets that includes a non-zero window as
> +				 * well.

I'm not 100% convinced of this reasoning. But at worst this should
result in some unnecessary but mostly harmless retransmits, and it
seems to fix the problem empirically, so I'm not suggesting changing
it at this time.

> +				 */
> +				if (!ntohs(th->window))
> +					tcp_rewind_seq(c, conn);
> +
>  				max_ack_seq_wnd = ntohs(th->window);
>  				max_ack_seq = ack_seq;
>  			}
> @@ -1791,7 +1807,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  	if (ack && !tcp_sock_consume(conn, max_ack_seq))
>  		tcp_update_seqack_from_tap(c, conn, max_ack_seq);
>
> -	tcp_tap_window_update(conn, max_ack_seq_wnd);
> +	tcp_tap_window_update(c, conn, max_ack_seq_wnd);
>
>  	if (retr) {
>  		flow_trace(conn,
> @@ -1880,7 +1896,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
>  				      const struct tcphdr *th,
>  				      const char *opts, size_t optlen)
>  {
> -	tcp_tap_window_update(conn, ntohs(th->window));
> +	tcp_tap_window_update(c, conn, ntohs(th->window));
>  	tcp_get_tap_ws(conn, opts, optlen);
>
>  	/* First value is not scaled */
> @@ -2085,7 +2101,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  		if (!th->ack)
>  			goto reset;
>
> -		tcp_tap_window_update(conn, ntohs(th->window));
> +		tcp_tap_window_update(c, conn, ntohs(th->window));
>
>  		tcp_data_from_sock(c, conn);
>
> @@ -2097,7 +2113,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  	if (conn->events & TAP_FIN_RCVD) {
>  		tcp_sock_consume(conn, ntohl(th->ack_seq));
>  		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> -		tcp_tap_window_update(conn, ntohs(th->window));
> +		tcp_tap_window_update(c, conn, ntohs(th->window));
>  		tcp_data_from_sock(c, conn);
>
>  		if (conn->events & SOCK_FIN_RCVD &&
> --
> 2.43.0
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
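
For anyone following the thread without the tree at hand, the zero-window handling discussed in this review can be sketched roughly as below. This is only an illustration under stated assumptions, not the passt code: struct sketch_conn, sketch_seq_lt(), sketch_rewind() and sketch_window_update() are hypothetical names, window scaling is left out, and the body of the real tcp_rewind_seq() does not appear in this message, so the stand-in here simply restarts from the acknowledged sequence as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced connection state: only the fields named in the patch */
struct sketch_conn {
	uint32_t seq_to_tap;		/* next sequence we would send to the guest */
	uint32_t seq_ack_from_tap;	/* last sequence acknowledged by the guest */
	uint16_t wnd_from_tap;		/* last window value accepted from the guest */
};

/* Wrap-safe "a is before b" comparison (an assumption, not passt's SEQ_LT) */
static bool sketch_seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Stand-in for tcp_rewind_seq(): restart from the acknowledged sequence */
static void sketch_rewind(struct sketch_conn *conn)
{
	assert(conn);
	conn->seq_to_tap = conn->seq_ack_from_tap;
}

/* Returns true if the update was a zero window that triggered a rewind */
static bool sketch_window_update(struct sketch_conn *conn, uint16_t wnd)
{
	if (!wnd && sketch_seq_lt(conn->seq_ack_from_tap, conn->seq_to_tap)) {
		/* Data is still unacknowledged: rewind, but keep the previous
		 * window value so that sending is retried later.
		 */
		sketch_rewind(conn);
		return true;
	}

	conn->wnd_from_tap = wnd;
	return false;
}
```

With 4000 bytes in flight, a zero-window update moves seq_to_tap back to seq_ack_from_tap and leaves wnd_from_tap alone, whereas a later non-zero update is stored as usual. The batch case in tcp_data_from_tap() would call the same rewind for any zero-window segment in the batch, which is where the possibly unnecessary retransmits mentioned above could come from.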