Date: Mon, 1 Sep 2025 14:28:33 +1000
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Jon Maloy, Paul Holzinger
Subject: Re: [PATCH v3 3/7] tcp: Rewind sequence when guest shrinks window to zero
In-Reply-To: <20250829201132.1561650-4-sbrivio@redhat.com>

On Fri, Aug 29, 2025 at 10:11:28PM +0200, Stefano Brivio wrote:
> A window shrunk to zero means by definition that anything else that
> might be in flight is now out of window. Restart from the currently
> acknowledged sequence.
>
> We need to do that both in tcp_tap_window_update(), where we already
> check for zero-window updates, as well as in tcp_data_from_tap(),
> because we might get one of those updates in a batch of packets that
> also contains a non-zero window update.
>
> Suggested-by: Jon Maloy
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

> ---
>  tcp.c | 34 +++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 1402ca2..11c9c84 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1257,19 +1257,25 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn,
>
>  /**
>   * tcp_tap_window_update() - Process an updated window from tap side
> + * @c:		Execution context
>   * @conn:	Connection pointer
>   * @wnd:	Window value, host order, unscaled
>   */
> -static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
> +static void tcp_tap_window_update(const struct ctx *c,
> +				  struct tcp_tap_conn *conn, unsigned wnd)
>  {
>  	wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
>
>  	/* Work-around for bug introduced in peer kernel code, commit
> -	 * e2142825c120 ("net: tcp: send zero-window ACK when no memory").
> -	 * We don't update if window shrank to zero.
> +	 * e2142825c120 ("net: tcp: send zero-window ACK when no memory"): don't
> +	 * update the window if it shrank to zero, so that we'll eventually
> +	 * retry to send data, but rewind the sequence as that obviously implies
> +	 * that no data beyond the updated window will ever be acknowledged.

Nit: Arguably "no data...will ever" is not quite right. It presumably
won't be acknowledged until we resend it at least once, but we certainly
hope it will be acknowledged after that point.

>  	 */
> -	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap))
> +	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) {
> +		tcp_rewind_seq(c, conn);
>  		return;
> +	}
>
>  	conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX);
>
> @@ -1694,7 +1700,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  		tcp_timer_ctl(c, conn);
>
>  	if (p->count == 1) {
> -		tcp_tap_window_update(conn, ntohs(th->window));
> +		tcp_tap_window_update(c, conn,
> +				      ntohs(th->window));
>  		return 1;
>  	}
>
> @@ -1713,6 +1720,15 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			ack_seq == max_ack_seq &&
>  			ntohs(th->window) == max_ack_seq_wnd;
>
> +			/* See tcp_tap_window_update() for details. On
> +			 * top of that, we also need to check here if a
> +			 * zero-window update is contained in a batch of
> +			 * packets that includes a non-zero window as
> +			 * well.
> +			 */
> +			if (!ntohs(th->window))
> +				tcp_rewind_seq(c, conn);
> +
>  			max_ack_seq_wnd = ntohs(th->window);
>  			max_ack_seq = ack_seq;
>  		}
> @@ -1772,7 +1788,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  	if (ack && !tcp_sock_consume(conn, max_ack_seq))
>  		tcp_update_seqack_from_tap(c, conn, max_ack_seq);
>
> -	tcp_tap_window_update(conn, max_ack_seq_wnd);
> +	tcp_tap_window_update(c, conn, max_ack_seq_wnd);
>
>  	if (retr) {
>  		flow_trace(conn,
> @@ -1861,7 +1877,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
>  				      const struct tcphdr *th,
>  				      const char *opts, size_t optlen)
>  {
> -	tcp_tap_window_update(conn, ntohs(th->window));
> +	tcp_tap_window_update(c, conn, ntohs(th->window));
>  	tcp_get_tap_ws(conn, opts, optlen);
>
>  	/* First value is not scaled */
> @@ -2059,7 +2075,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  		if (!th->ack)
>  			goto reset;
>
> -		tcp_tap_window_update(conn, ntohs(th->window));
> +		tcp_tap_window_update(c, conn, ntohs(th->window));
>
>  		tcp_data_from_sock(c, conn);
>
> @@ -2071,7 +2087,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  		if (conn->events & TAP_FIN_RCVD) {
>  			tcp_sock_consume(conn, ntohl(th->ack_seq));
>  			tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> -			tcp_tap_window_update(conn, ntohs(th->window));
> +			tcp_tap_window_update(c, conn, ntohs(th->window));
>  			tcp_data_from_sock(c, conn);
>
>  			if (conn->events & SOCK_FIN_RCVD &&
> -- 
> 2.43.0
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
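For illustration, here is a minimal, self-contained C sketch of the two cases the commit message describes: a zero-window update with data in flight, and a zero window hidden in a batch whose last ACK carries a non-zero window. This is not passt code. struct sketch_conn, struct sketch_ack, sketch_rewind_seq(), sketch_window_update(), sketch_process_batch() and SKETCH_SEQ_LT() are hypothetical names. Since the body of tcp_rewind_seq() isn't part of this patch, the sketch assumes that rewinding simply moves seq_to_tap back to seq_ack_from_tap, and it also assumes that seq_to_tap is clamped up to the newly acknowledged sequence after the batch. Window scaling, timers and the socket side are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wraparound-safe "a is before b" test on 32-bit sequence
 * numbers; a common formulation, not passt's own SEQ_LT() definition */
#define SKETCH_SEQ_LT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/* Hypothetical, heavily simplified connection state */
struct sketch_conn {
	uint32_t seq_to_tap;		/* next sequence we'd send to guest */
	uint32_t seq_ack_from_tap;	/* last sequence acked by guest */
	unsigned wnd_from_tap;		/* window last accepted from guest */
};

/* Hypothetical: one ACK segment from the guest within a batch */
struct sketch_ack {
	uint32_t ack_seq;
	unsigned wnd;
};

/* Assumed rewind semantics: forget anything sent past the acknowledged
 * sequence, so that it's sent again once the window reopens */
static void sketch_rewind_seq(struct sketch_conn *conn)
{
	conn->seq_to_tap = conn->seq_ack_from_tap;
}

/* Modelled on the zero-window branch of tcp_tap_window_update() in this
 * patch: with data in flight, keep the old window but rewind */
static void sketch_window_update(struct sketch_conn *conn, unsigned wnd)
{
	if (!wnd && SKETCH_SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) {
		sketch_rewind_seq(conn);
		return;
	}

	conn->wnd_from_tap = wnd;
}

/* Modelled on the batch loop in tcp_data_from_tap(): only the window
 * paired with the highest ACK is applied after the loop, so a zero window
 * seen earlier in the batch has to trigger the rewind by itself */
static void sketch_process_batch(struct sketch_conn *conn,
				 const struct sketch_ack *acks, size_t n)
{
	uint32_t max_ack_seq = conn->seq_ack_from_tap;
	unsigned max_ack_seq_wnd = conn->wnd_from_tap;
	size_t i;

	for (i = 0; i < n; i++) {
		if (SKETCH_SEQ_LT(acks[i].ack_seq, max_ack_seq))
			continue;	/* older ACK: ignore it */

		if (!acks[i].wnd)
			sketch_rewind_seq(conn);

		max_ack_seq = acks[i].ack_seq;
		max_ack_seq_wnd = acks[i].wnd;
	}

	/* Apply the cumulative ACK. Assumption of this sketch: data the
	 * guest acknowledged never needs resending, so don't leave
	 * seq_to_tap behind it after a rewind */
	conn->seq_ack_from_tap = max_ack_seq;
	if (SKETCH_SEQ_LT(conn->seq_to_tap, max_ack_seq))
		conn->seq_to_tap = max_ack_seq;

	sketch_window_update(conn, max_ack_seq_wnd);
}
```

For example, with 3000 bytes in flight (seq_ack_from_tap 1000, seq_to_tap 4000) and a batch of two ACKs, {1500, window 0} followed by {2000, window 4096}, the sketch ends with seq_to_tap at 2000 and a window of 4096. Looking only at the window paired with the highest ACK, as the final window update does, would have left seq_to_tap at 4000, which is why the batch loop needs its own zero-window check.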