Date: Fri, 5 Dec 2025 13:34:07 +1100
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top, Max Chernoff
Subject: Re: [PATCH 6/8] tcp: Allow exceeding the available sending buffer size in window advertisements
References: <20251204074542.2156548-1-sbrivio@redhat.com> <20251204074542.2156548-7-sbrivio@redhat.com>
In-Reply-To: <20251204074542.2156548-7-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Thu, Dec 04, 2025 at 08:45:39AM +0100, Stefano Brivio wrote:
> ...under two conditions:
>
> - the remote peer is advertising a bigger value to us, meaning that a
>   bigger sending buffer is likely to benefit throughput, AND

I think this condition is redundant: if the remote peer is advertising
less, we'll clamp new_wnd_to_tap to that value anyway.

> - this is not a short-lived connection, where the latency cost of
>   retransmissions would be otherwise unacceptable.
>
> By doing this, we can reliably trigger TCP buffer size auto-tuning (as
> long as it's available) on bulk data transfers.
>
> Signed-off-by: Stefano Brivio
> ---
>  tcp.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/tcp.c b/tcp.c
> index 2220059..454df69 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -353,6 +353,13 @@ enum {
>  #define LOW_RTT_TABLE_SIZE		8
>  #define LOW_RTT_THRESHOLD		10 /* us */
>
> +/* Try to avoid retransmissions to improve latency on short-lived connections */
> +#define SHORT_CONN_BYTES		(16ULL * 1024 * 1024)
> +
> +/* Temporarily exceed available sending buffer to force TCP auto-tuning */
> +#define SNDBUF_BOOST_FACTOR		150 /* % */
> +#define SNDBUF_BOOST(x)		((x) * SNDBUF_BOOST_FACTOR / 100)

For the short term, the fact that this works empirically is enough.
For the longer term, it would be nice to have a better understanding
of what this "overcommit" amount is actually estimating.

I think what we're looking for is an estimate of the number of bytes
that will have left the buffer by the time the guest gets back to us.
So:
	 *

Alas, I don't see a way to estimate either of those from the
information we already track - we'd need additional bookkeeping.

>  #define ACK_IF_NEEDED	0		/* See tcp_send_flag() */
>
>  #define CONN_IS_CLOSING(conn)						\
> @@ -1137,6 +1144,9 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>
>  	if ((int)sendq > SNDBUF_GET(conn)) /* Due to memory pressure? */
>  		limit = 0;
> +	else if ((int)tinfo->tcpi_snd_wnd > SNDBUF_GET(conn) &&
> +		 tinfo->tcpi_bytes_acked > SHORT_CONN_BYTES)

This is pretty subtle; I think it would be worth putting some of the
rationale in a comment, not just in the commit message.

> +		limit = SNDBUF_BOOST(SNDBUF_GET(conn)) - (int)sendq;
>  	else
>  		limit = SNDBUF_GET(conn) - (int)sendq;
>
> --
> 2.43.0
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
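
For reference, the limit selection in the second hunk quoted above can be pulled out into a standalone function to make the arithmetic easier to check. This is only a rough sketch, not the code in tcp.c: wnd_limit() is a hypothetical helper, and its parameters stand in for SNDBUF_GET(conn), sendq, tinfo->tcpi_snd_wnd and tinfo->tcpi_bytes_acked. The three macros are copied from the patch.

```c
#include <stdint.h>

/* Copied from the quoted patch */
#define SHORT_CONN_BYTES		(16ULL * 1024 * 1024)
#define SNDBUF_BOOST_FACTOR		150 /* % */
#define SNDBUF_BOOST(x)		((x) * SNDBUF_BOOST_FACTOR / 100)

/*
 * Hypothetical helper (not in tcp.c): mirrors the if / else if / else
 * chain from the quoted hunk.
 *
 * sndbuf:	stands in for SNDBUF_GET(conn)
 * sendq:	bytes currently queued in the sending buffer
 * snd_wnd:	stands in for tinfo->tcpi_snd_wnd (window advertised by peer)
 * bytes_acked:	stands in for tinfo->tcpi_bytes_acked
 */
static int wnd_limit(int sndbuf, uint32_t sendq, uint32_t snd_wnd,
		     uint64_t bytes_acked)
{
	if ((int)sendq > sndbuf)	/* Due to memory pressure? */
		return 0;

	/* Peer offers more than we can buffer, and not a short connection:
	 * advertise up to 150% of the buffer, minus what's already queued
	 */
	if ((int)snd_wnd > sndbuf && bytes_acked > SHORT_CONN_BYTES)
		return SNDBUF_BOOST(sndbuf) - (int)sendq;

	return sndbuf - (int)sendq;
}
```

With a 1000000-byte buffer, 200000 bytes queued, a peer window of 4000000 bytes and 32 MiB already acknowledged, this gives 1500000 - 200000 = 1300000. With only 1 MiB acknowledged, or with a peer window smaller than the buffer, it stays at 1000000 - 200000 = 800000.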