From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=M7XSykq2; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 1DE825A0278 for ; Wed, 10 Sep 2025 09:20:50 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1757488845; bh=t4ng+AgmiPlpF94qGA6i8mqCU1RPb1mALRh5oo6Dcts=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=M7XSykq29d3/0i0xJb81zoPoVjo2BRw+HrpyzV2UtyaTuN1PTcwYaec3uJ+5/CpJs sWctoZl2FOlwlnYlKdWdISvTV3CU6DnolKuaW9mSOBh8ZYJHzuGNldRi1mPUkgDwcY Cndt0U1Zu0x66SDdpNPtwiD/X59iYsnO+39CHB6hTp2cgokMLvVH67w8Dsq53UqBpe d6nTC+8PYcW0oEMW6wwz9ijWHGcOtG/jcnCsxmE+J+DEM4JuWrY1Vd3VbHkyv2kWGR sD66bPnZ/sT3ql7zq8bHbn+jOQbz4kNhAQk9p3h20avBh7LHy3mnLXkAnM4FVt3EYn gnk3FnhJjDLpA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cMBtP69lBz4wB0; Wed, 10 Sep 2025 17:20:45 +1000 (AEST) Date: Wed, 10 Sep 2025 17:18:54 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v4 3/8] tcp: Rewind sequence when guest shrinks window to zero Message-ID: References: <20250909181655.2990223-1-sbrivio@redhat.com> <20250909181655.2990223-4-sbrivio@redhat.com> <20250910083748.6b566880@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="fdTQ1Z2x2Iufhfuv" Content-Disposition: inline In-Reply-To: <20250910083748.6b566880@elisabeth> Message-ID-Hash: XYXSEFEEB52U7NJUWAQBIWJK7S5NWPI6 X-Message-ID-Hash: XYXSEFEEB52U7NJUWAQBIWJK7S5NWPI6 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Jon Maloy , Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --fdTQ1Z2x2Iufhfuv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Sep 10, 2025 at 08:37:48AM +0200, Stefano Brivio wrote: > On Wed, 10 Sep 2025 12:20:19 +1000 > David Gibson wrote: >=20 > > On Tue, Sep 09, 2025 at 08:16:50PM +0200, Stefano Brivio wrote: > > > A window shrunk to zero means by definition that anything else that > > > might be in flight is now out of window. Restart from the currently > > > acknowledged sequence. > > >=20 > > > We need to do that both in tcp_tap_window_update(), where we already > > > check for zero-window updates, as well as in tcp_data_from_tap(), > > > because we might get one of those updates in a batch of packets that > > > also contains a non-zero window update. > > >=20 > > > Suggested-by: Jon Maloy > > > Signed-off-by: Stefano Brivio =20 > >=20 > > Reviewed-by: David Gibson > >=20 > > Though a couple of documentation nits below. > >=20 > > > --- > > > tcp.c | 34 +++++++++++++++++++++++++--------- > > > 1 file changed, 25 insertions(+), 9 deletions(-) > > >=20 > > > diff --git a/tcp.c b/tcp.c > > > index 86e08f1..12d42e0 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -1268,19 +1268,25 @@ static void tcp_get_tap_ws(struct tcp_tap_con= n *conn, > > > =20 > > > /** > > > * tcp_tap_window_update() - Process an updated window from tap side > > > + * @c: Execution context > > > * @conn: Connection pointer > > > * @wnd: Window value, host order, unscaled > > > */ > > > -static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigne= d wnd) > > > +static void tcp_tap_window_update(const struct ctx *c, > > > + struct tcp_tap_conn *conn, unsigned wnd) > > > { > > > wnd =3D MIN(MAX_WINDOW, wnd << conn->ws_from_tap); > > > =20 > > > /* Work-around for bug introduced in peer kernel code, commit > > > - * e2142825c120 ("net: tcp: send zero-window ACK when no memory"). > > > - * We don't update if window shrank to zero. > > > + * e2142825c120 ("net: tcp: send zero-window ACK when no memory"): = don't > > > + * update the window if it shrank to zero, so that we'll eventually > > > + * retry to send data, but rewind the sequence as that obviously im= plies > > > + * that no data beyond the updated window will ever be acknowledged= =2E =20 > >=20 > > As noted earlier "will ever be acknowledged" might be a bit > > misleading. Maybe "no data beyond the window will be acknowledged > > until it is retransmitted". >=20 > Sorry, I actually fixed the patch after your same comment on v3 and > then for some reason I threw away the changes together with some of > your Reviewed-by: tags. Fixed as we discussed on v3 now. Ah, right. It's happened to all of us. >=20 > > > */ > > > - if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) > > > + if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) { > > > + tcp_rewind_seq(c, conn); > > > return; > > > + } > > > =20 > > > conn->wnd_from_tap =3D MIN(wnd >> conn->ws_from_tap, USHRT_MAX); > > > =20 > > > @@ -1709,7 +1715,8 @@ static int tcp_data_from_tap(const struct ctx *= c, struct tcp_tap_conn *conn, > > > tcp_timer_ctl(c, conn); > > > =20 > > > if (p->count =3D=3D 1) { > > > - tcp_tap_window_update(conn, ntohs(th->window)); > > > + tcp_tap_window_update(c, conn, > > > + ntohs(th->window)); > > > return 1; > > > } > > > =20 > > > @@ -1728,6 +1735,15 @@ static int tcp_data_from_tap(const struct ctx = *c, struct tcp_tap_conn *conn, > > > ack_seq =3D=3D max_ack_seq && > > > ntohs(th->window) =3D=3D max_ack_seq_wnd; > > > =20 > > > + /* See tcp_tap_window_update() for details. On > > > + * top of that, we also need to check here if a > > > + * zero-window update is contained in a batch of > > > + * packets that includes a non-zero window as > > > + * well. =20 > >=20 > > I'm not 100% convinced of this reasoning. But at worst this should > > result in some unnecessary but mostly harmless retransmits, and it > > seems to fix the problem empirically, so I'm not suggesting changing > > it at this time. >=20 > Right, strictly speaking, it *might* be that the peer didn't actually > throw data away, which should be indicated by the fact that the same > data is acknowledged in another segment of this same batch. >=20 > But handling that without making this part unreasonably complicated > implies a refactor of this whole thing: we should extract a function > dealing with sequences and perhaps another one with window updates from > here. >=20 > I plan to file a ticket or two with stuff that emerged from this series. Ok, sounds good. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --fdTQ1Z2x2Iufhfuv Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjBJlEACgkQzQJF27ox 2GeFmA//WZqqtEW8ri57MEGZnppfXZK/C2bB3Uy/PBJNS+csG3SYqL6MD7UZ05xu X02T03t25OMCGi5AFyzPmlUfB+Y5tZ99bA5PRyuv/UoBnqAt8rcbEy3mk8/jodtt eRP++dJtNXse8/7CgBc+z/rqQN0LQTBy1Xi4IEmpIcF6xVZf3dTEWQu5JFmZC0gH 8A78Zz3lDqsFYnhr18ILI1fJzrELPV2XmWfKHVnWGT2AKRpXoWDanSjoSwr97CBQ EkqwnPSHeMpOBo5zre4HmdyzcKLinwC5z73szPimmC4Cqz+3jO7HqNC0dED2Bu74 bWPSaup0GmDo/+BYKlKDdvGu5lQTHowIdvmh25XGpMBU8f7rFl2OIoBEEvrhVcI1 6uAI+jeMX+oTKh4ESkSVJx9eQ+5C54oyl8rHpk1PRbNcgdIlHWHnnnrHfdTUpv/L BEXCepUy/UL6zZXaz+sQe5BV/4GoTYLy9DsjmI1vq5wpDJl3gao7UELxZNnUzdid nGHgvsZ/qjauQYlS2NPB8EiyVhbDAoyCcL/fbnqTRSBNN018siG5FWedv2D5pAS1 J13aPTowEF4ckWJ9Vcl1q1uUcUmOu5Oi1BPcZcjxk2VCTgQAwiM9BwhXx03LTfJ9 6y2TwgNRZ0hsxsNpsFfhhxjPMSRbzcZfXlHw5hp+2m1VBw9WR7M= =RBP0 -----END PGP SIGNATURE----- --fdTQ1Z2x2Iufhfuv--