From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id C4BCB5A026D for ; Tue, 3 Oct 2023 04:50:49 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1696301446; bh=SBlZD+tm74JT1op+WWgLXL1OWpJJMSDpD5cHJXfKPl8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=CQNgRucNJ8zWCcS3bKUh9m689GWG+srOhegPYYsI1SSjpt9hbA3LUg9C/HSsTd+gk KSwFGOsjVOSJxDtefnNvGbx8Kq8OBbnKRBNt6XAZI+h97jaeXh345gzLoGQW96FEOs WKK/hwxiGTw0f2EyVwiaVwKMofTq2cNR323YUWh4= Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4S02Pf3vnYz4xPQ; Tue, 3 Oct 2023 13:50:46 +1100 (AEDT) Date: Tue, 3 Oct 2023 13:47:13 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2 2/3] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag Message-ID: References: <20230929150446.2671959-1-sbrivio@redhat.com> <20230929150446.2671959-3-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="qE7guAADj18MVqdl" Content-Disposition: inline In-Reply-To: <20230929150446.2671959-3-sbrivio@redhat.com> Message-ID-Hash: 527HSVM7SFC7436V7JCJL5LLZCGDZ3G4 X-Message-ID-Hash: 527HSVM7SFC7436V7JCJL5LLZCGDZ3G4 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Matej Hrica X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --qE7guAADj18MVqdl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Sep 29, 2023 at 05:04:45PM +0200, Stefano Brivio wrote: > It looks like we need it as workaround for this situation, readily > reproducible at least with a 6.5 Linux kernel, with default rmem_max > and wmem_max values: >=20 > - an iperf3 client on the host sends about 160 KiB, typically > segmented into five frames by passt. We read this data using > MSG_PEEK >=20 > - the iperf3 server on the guest starts receiving >=20 > - meanwhile, the host kernel advertised a zero-sized window to the > receiver, as expected >=20 > - eventually, the guest acknowledges all the data sent so far, and > we drop it from the buffer, courtesy of tcp_sock_consume(), using > recv() with MSG_TRUNC >=20 > - the client, however, doesn't get an updated window value, and > even keepalive packets are answered with zero-window segments, > until the connection is closed >=20 > It looks like dropping data from a socket using MSG_TRUNC doesn't > cause a recalculation of the window, which would be expected as a > result of any receiving operation that invalidates data on a buffer > (that is, not with MSG_PEEK). >=20 > Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to > the previous value we clamped to, forces a recalculation of the > window which is advertised to the guest. >=20 > I couldn't quite confirm this issue by following all the possible > code paths in the kernel, yet. If confirmed, this should be fixed in > the kernel, but meanwhile this workaround looks robust to me (and it > will be needed for backward compatibility anyway). >=20 > Reported-by: Matej Hrica > Link: https://bugs.passt.top/show_bug.cgi?id=3D74 > Analysed-by: David Gibson > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > tcp.c | 29 ++++++++++++++++++++++++----- > 1 file changed, 24 insertions(+), 5 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index dff5e79..32917c8 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, = struct tcp_tap_conn *conn, > wnd <<=3D conn->ws_from_tap; > wnd =3D MIN(MAX_WINDOW, wnd); > =20 > - if (conn->flags & WND_CLAMPED) { > + /* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up > + * with a zero-sized window on a TCP socket, dropping data (once > + * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear > + * to be enough to make the kernel advertise a non-zero window to the > + * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing > + * value, fixes this. > + * > + * The STALLED flag on a connection is a sufficient indication that we > + * might have a zero-sized window on the socket, because it's set if we > + * exhausted the tap-side window, or if everything we receive from a > + * socket is already in flight to the guest. > + * > + * So, if STALLED is set, and we received a window value from the tap, > + * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated > + * further and fixed in the kernel instead, if confirmed. > + */ > + if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) { > if (prev_scaled =3D=3D wnd) > return; > =20 > @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struc= t tcp_tap_conn *conn, > i =3D keep - 1; > } > =20 > - tcp_clamp_window(c, conn, max_ack_seq_wnd); > - > /* On socket flush failure, pretend there was no ACK, try again later */ > if (ack && !tcp_sock_consume(conn, max_ack_seq)) > tcp_update_seqack_from_tap(c, conn, max_ack_seq); > =20 > + tcp_clamp_window(c, conn, max_ack_seq_wnd); > + > if (retr) { > trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u", > max_ack_seq, conn->seq_to_tap); > @@ -2572,8 +2588,6 @@ int tcp_tap_handler(struct ctx *c, int af, const vo= id *saddr, const void *daddr, > if (th->ack && !(conn->events & ESTABLISHED)) > tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > =20 > - conn_flag(c, conn, ~STALLED); > - > /* Establishing connection from socket */ > if (conn->events & SOCK_ACCEPTED) { > if (th->syn && th->ack && !th->fin) { > @@ -2628,6 +2642,11 @@ int tcp_tap_handler(struct ctx *c, int af, const v= oid *saddr, const void *daddr, > if (count =3D=3D -1) > goto reset; > =20 > + /* Note: STALLED matters for tcp_clamp_window(): unset it only after > + * processing data (and window) from the tap side > + */ > + conn_flag(c, conn, ~STALLED); > + > if (conn->seq_ack_to_tap !=3D conn->seq_from_tap) > ack_due =3D 1; > =20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --qE7guAADj18MVqdl Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmUbgKsACgkQzQJF27ox 2GevHw//bd1ec9BFFzvVh0ULz3Nk4gRyNWgWGCUcj8ZR/VLZ9KwU4MqgGtlpTUnO sHAUG/b8WAzgC4aJ7d52j34uxgB3/gZwMwZI7oCA1e8QXi2dTuToiZQki9rXWrGH PihOCunFnE/uRjgArXuxzurzloRDUNwd368DvrIHLdVUrBTKx8Mc1QARXGeA6w26 RK/mhddU8D+c+NsGJYGDbzqIGDIGhQSQGcBWlHJJtZ2KzOoEvsUlfb9GSpy8TVKr Gqq6W1Up7r9/RaL/as2qlr7V77mBi3BPK7ArgAC7Ctw1CtEDgYVjF2caEpLXzgLQ ewrmjJTTvUi3nyALLUo/f4vFxv/WdT5+dVFZYPtN7g6RvnNpr9YHs3vSAMhnE/sZ zg0Vax3GReUIojbgTxTTuC2TfwD+iM0lbwcz7nizcRacbRlsjwfkeWr7x73ncpCo 4q/+dBai9Z+ry25uNSrB7SZGV4XUVTDrODk/0IkKRMg/Og/4OXWiynctQnGVFSV6 JRiD0/ojRVppVkYbbUO5XyFdC7RJrWyv2Nog95PEJUiVfdgLINGNpdZM6z/Bg6Et k/a41Z9XKV6gjC6ciKss2KITl8/fOEpPtDzTFolH3WRN4twovjO57iTV1b94Irxb UYbziWSiPYuVpyRIs3g91Lolqx+/bX9UrN7bNflKtB3ordyuySk= =mVWB -----END PGP SIGNATURE----- --qE7guAADj18MVqdl--