From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 3FF1E5A026F for ; Sat, 23 Sep 2023 10:08:57 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1695456531; bh=4ZBiFTz9/I7DZYRC3pmiriujM1qjEvaxgDcsAg2nA9U=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=btRPTNZamQkT3UgGmYw28qRzMVho+tGHNTv0yjut0T0iBZexGLQwAv6+729XDv3yu mndYwgMAoiTmnZj8BCz+nCOwmpeMj5HOWsbT2JGrATxRcFBuwKnQ9TbxSxGiHtbtRd 0e5EKGlBgeOqrHgrFN8k6Iaykgf2oTbCKkfrno6A= Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Rt1xH49HJz4xPc; Sat, 23 Sep 2023 18:08:51 +1000 (AEST) Date: Sat, 23 Sep 2023 17:55:44 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag Message-ID: References: <20230922220610.58767-1-sbrivio@redhat.com> <20230922220610.58767-4-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="cFHo+B/aDL5SBtFD" Content-Disposition: inline In-Reply-To: <20230922220610.58767-4-sbrivio@redhat.com> Message-ID-Hash: FLYLXCQKV3AY32ZGZNQFV7FPAGSLKOB4 X-Message-ID-Hash: FLYLXCQKV3AY32ZGZNQFV7FPAGSLKOB4 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Matej Hrica , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --cFHo+B/aDL5SBtFD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Sep 23, 2023 at 12:06:08AM +0200, Stefano Brivio wrote: > It looks like we need it as workaround for this situation, readily > reproducible at least with a 6.5 Linux kernel, with default rmem_max > and wmem_max values: >=20 > - an iperf3 client on the host sends about 160 KiB, typically > segmented into five frames by passt. We read this data using > MSG_PEEK >=20 > - the iperf3 server on the guest starts receiving >=20 > - meanwhile, the host kernel advertised a zero-sized window to the > receiver, as expected You noted the s/receiver/sender/ here.. >=20 > - eventually, the guest acknowledges all the data sent so far, and > we drop it from the buffer, courtesy of tcp_sock_consume(), using > recv() with MSG_TRUNC >=20 > - the client, however, doesn't get an updated window value, and > even keepalive packets are answered with zero-window segments, > until the connection is closed >=20 > It looks like dropping data from a socket using MSG_TRUNC doesn't > cause a recalculation of the window, which would be expected as a > result of any receiving operation that invalidates data on a buffer > (that is, not with MSG_PEEK). >=20 > Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to > the previous value we clamped to, forces a recalculation of the > window which is advertised to the guest. =2E.and the s/guest/sender/ here.. >=20 > I couldn't quite confirm this issue by following all the possible > code paths in the kernel, yet. If confirmed, this should be fixed in > the kernel, but meanwhile this workaround looks robust to me (and it > will be needed for backward compatibility anyway). >=20 > Reported-by: Matej Hrica > Link: https://bugs.passt.top/show_bug.cgi?id=3D74 > Analysed-by: David Gibson > Signed-off-by: Stefano Brivio > --- > tcp.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 5528e05..4606f17 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, = struct tcp_tap_conn *conn, > wnd <<=3D conn->ws_from_tap; > wnd =3D MIN(MAX_WINDOW, wnd); > =20 > - if (conn->flags & WND_CLAMPED) { > + /* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up > + * with a zero-sized window on a TCP socket, dropping data (once > + * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear > + * to be enough to make the kernel advertise a non-zero window to the > + * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing =2E. but you need another s/receiver/sender/ here too. > + * value, fixes this. > + * > + * The STALLED flag on a connection is a sufficient indication that we > + * might have a zero-sized window on the socket, because it's set if we > + * exhausted the tap-side window, or if everything we receive from a > + * socket is already in flight to the guest. > + * > + * So, if STALLED is set, and we received a window value from the tap, > + * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated > + * further and fixed in the kernel instead, if confirmed. > + */ > + if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) { > if (prev_scaled =3D=3D wnd) > return; > =20 > @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struc= t tcp_tap_conn *conn, > i =3D keep - 1; > } > =20 > - tcp_clamp_window(c, conn, max_ack_seq_wnd); > - > /* On socket flush failure, pretend there was no ACK, try again later */ > if (ack && !tcp_sock_consume(conn, max_ack_seq)) > tcp_update_seqack_from_tap(c, conn, max_ack_seq); > =20 > + tcp_clamp_window(c, conn, max_ack_seq_wnd); > + > if (retr) { > trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u", > max_ack_seq, conn->seq_to_tap); --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --cFHo+B/aDL5SBtFD Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmUOmesACgkQzQJF27ox 2GeJYA//TTMxIdYdzmt1xgKTQIAmGR972kEiuXZg6auQbsZndldIBANLIwL5rzSf 3FU8jXvGBcuhAqHcskzeFwA7z3WjRnDMrdJm+dHFFt+dS+kNwS/AU8aysnO3L9in dX3ON2W4nXpH0dUaLTOqRBjy/qpqFeZm6FuzC8siY20d5bLEjZ4xFucEL9aX2hQ/ r7G7nwPFAUNq8d586WVZoJWbVzdv0cye7Z+fAW17PCW65pdFOpAQEmGlKONZ+W0H lb3BB0z0zMOXhzRKsMo3mOdmSaGOuVMDImoeQ24n7HciBXs9hsx26XaIMc8Hc36M SOY5faxrWsG6/BIdtnl4bsIlpyKRHd7L198PlzotC53dOZfiwJp3IPbcANa07A6F Urz99bjo8F10Yle0SZpL+yIuvuMGo+ru+JKLg/gIDNxn8Ax95WL7tZi/GCKvJBc7 SKU2uRpS8oMxQL6rrtT20gbuEG+gFF/knHQYbafqU+cVVhtMoKkuLSXK/Qc5R21o c/imwfLxj4kwTq50nmEpLe6bA7lY9iLYtr+N/UT0Ftv4OtIwexJqLqK87bVAjVzR hHD1oZt5gKfTwuYwylDaaGKmLrP8z7OzLfSwEM2ANn0h6A2x2Ui4xNPVn+uZUOFJ knV7yu6OFocawg0ooMNWfC8Lg0bxtmor8t2eJtWwublMNnMX7Cc= =Arv7 -----END PGP SIGNATURE----- --cFHo+B/aDL5SBtFD--