From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top, Max Chernoff
Date: Mon, 8 Dec 2025 19:31:00 +1100
Subject: Re: [PATCH v2 5/9] tcp: Acknowledge everything if it looks like bulk traffic, not interactive
References: <20251208002229.391162-1-sbrivio@redhat.com> <20251208002229.391162-6-sbrivio@redhat.com> <20251208082529.6ce78f65@elisabeth>
In-Reply-To: <20251208082529.6ce78f65@elisabeth>
List-Id: Development discussion and patches for passt

On Mon, Dec 08, 2025 at 08:25:29AM +0100, Stefano Brivio wrote:
> On Mon, 8 Dec 2025 16:54:55 +1100
> David Gibson wrote:
>
> > On Mon, Dec 08, 2025 at 01:22:13AM +0100, Stefano Brivio wrote:
> > > ...instead of checking if the current sending buffer is less than
> > > SNDBUF_SMALL, because this isn't simply an optimisation to coalesce
> > > ACK segments: we rely on having enough data at once from the sender
> > > to make the buffer grow by means of TCP buffer size tuning
> > > implemented in the Linux kernel.
> > >
> > > This is important if we're trying to maximise throughput, but not
> > > desirable for interactive traffic, where we want to be transparent as
> > > possible and avoid introducing unnecessary latency.
> > >
> > > Use the tcpi_delivery_rate field reported by the Linux kernel, if
> > > available, to calculate the current bandwidth-delay product: if it's
> > > significantly smaller than the available sending buffer, conclude that
> > > we're not bandwidth-bound and this is likely to be interactive
> > > traffic, so acknowledge data only as it's acknowledged by the peer.
> > >
> > > Conversely, if the bandwidth-delay product is comparable to the size
> > > of the sending buffer (more than 5%), we're probably bandwidth-bound
> > > or... bound to be: acknowledge everything in that case.
> >
> > Ah, nice.  This reasoning is much clearer to me than the previous
> > spin.
> >
> > >
> > > Signed-off-by: Stefano Brivio
> > > ---
> > >  tcp.c | 45 +++++++++++++++++++++++++++++++++------------
> > >  1 file changed, 33 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 9bf7b8b..533c8a7 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -353,6 +353,9 @@ enum {
> > >  #define LOW_RTT_TABLE_SIZE 8
> > >  #define LOW_RTT_THRESHOLD 10 /* us */
> > >  
> > > +/* Ratio of buffer to bandwidth * delay product implying interactive traffic */
> > > +#define SNDBUF_TO_BW_DELAY_INTERACTIVE /* > */ 20 /* (i.e. < 5% of buffer) */
> > > +
> > >  #define ACK_IF_NEEDED 0 /* See tcp_send_flag() */
> > >  
> > >  #define CONN_IS_CLOSING(conn) \
> > > @@ -426,11 +429,13 @@ socklen_t tcp_info_size;
> > >  	sizeof(((struct tcp_info_linux *)NULL)->tcpi_##f_)) <= tcp_info_size)
> > >  
> > >  /* Kernel reports sending window in TCP_INFO (kernel commit 8f7baad7f035) */
> > > -#define snd_wnd_cap tcp_info_cap(snd_wnd)
> > > +#define snd_wnd_cap tcp_info_cap(snd_wnd)
> > >  /* Kernel reports bytes acked in TCP_INFO (kernel commit 0df48c26d84) */
> > > -#define bytes_acked_cap tcp_info_cap(bytes_acked)
> > > +#define bytes_acked_cap tcp_info_cap(bytes_acked)
> > >  /* Kernel reports minimum RTT in TCP_INFO (kernel commit cd9b266095f4) */
> > > -#define min_rtt_cap tcp_info_cap(min_rtt)
> > > +#define min_rtt_cap tcp_info_cap(min_rtt)
> > > +/* Kernel reports delivery rate in TCP_INFO (kernel commit eb8329e0a04d) */
> > > +#define delivery_rate_cap tcp_info_cap(delivery_rate)
> > >  
> > >  /* sendmsg() to socket */
> > >  static struct iovec tcp_iov [UIO_MAXIOV];
> > > @@ -1048,6 +1053,7 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
> > >  	socklen_t sl = sizeof(*tinfo);
> > >  	struct tcp_info_linux tinfo_new;
> > >  	uint32_t new_wnd_to_tap = prev_wnd_to_tap;
> > > +	bool ack_everything = true;
> > >  	int s = conn->sock;
> > >  
> > >  	/* At this point we could ack all the data we've accepted for forwarding
> > > @@ -1057,7 +1063,8 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
> > >  	 * control behaviour.
> > >  	 *
> > >  	 * For it to be possible and worth it we need:
> > > -	 * - The TCP_INFO Linux extension which gives us the peer acked bytes
> > > +	 * - The TCP_INFO Linux extensions which give us the peer acked bytes
> > > +	 *   and the delivery rate (outbound bandwidth at receiver)
> > >  	 * - Not to be told not to (force_seq)
> > >  	 * - Not half-closed in the peer->guest direction
> > >  	 *   With no data coming from the peer, we might not get events which
> > > @@ -1067,19 +1074,36 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
> > >  	 *   Data goes from socket to socket, with nothing meaningfully "in
> > >  	 *   flight".
> > >  	 * - Not a pseudo-local connection (e.g. to a VM on the same host)
> > > -	 * - Large enough send buffer
> > > -	 *   In these cases, there's not enough in flight to bother.
> > > +	 *   If it is, there's not enough in flight to bother.
> > > +	 * - Sending buffer significantly larger than bandwidth * delay product
> > > +	 *   Meaning we're not bandwidth-bound and this is likely to be
> > > +	 *   interactive traffic where we want to preserve transparent
> > > +	 *   connection behaviour and latency.
> >
> > Do we actually want the sending buffer size here?  Or the amount of
> > buffer that's actually in use (SIOCOUTQ)?  If we had a burst transfer
> > followed by interactive traffic, the kernel could still have a large
> > send buffer allocated, no?
>
> The kernel shrinks it rather fast, and if it's not fast enough, then it
> still looks like bulk traffic. I tried several metrics (including
> something based on the data just sent, which approximates SIOCOUTQ),
> they are not as good as the current buffer size.

Ok.  Thinking about this, I guess the kernel has had quite some time
to tweak its heuristics here, so making indirect use of that
experience would be a good idea.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
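
For anyone following the thread, the check described in the quoted commit message can be sketched roughly as below. This is not the code from the patch: the hunk that actually sets ack_everything isn't quoted in this message. The helper name looks_interactive(), its parameters and their units are assumptions made for illustration; only SNDBUF_TO_BW_DELAY_INTERACTIVE and its value of 20 come from the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the quoted diff: a sending buffer more than 20 times the
 * bandwidth * delay product (product < 5% of buffer) implies interactive
 * traffic
 */
#define SNDBUF_TO_BW_DELAY_INTERACTIVE	20

/**
 * looks_interactive() - Hypothetical helper, not part of passt
 * @rate:	Delivery rate, assumed to be in bytes per second
 * @rtt_us:	Round-trip time, assumed to be in microseconds
 * @sndbuf:	Current sending buffer size, in bytes
 *
 * Return: true if the bandwidth * delay product is below 5% of the buffer
 */
static bool looks_interactive(uint64_t rate, uint32_t rtt_us, uint32_t sndbuf)
{
	/* Bytes that can be in flight during one round trip; this doesn't
	 * overflow for any realistic rate (up to ~10^13 bytes per second
	 * with a one-second RTT)
	 */
	uint64_t bw_delay = rate * rtt_us / 1000000;

	return bw_delay * SNDBUF_TO_BW_DELAY_INTERACTIVE < sndbuf;
}
```

With a delivery rate of 1 MB/s and a 1 ms RTT, the product is about 1000 bytes, well under 5% of a 64 KiB buffer, so this sketch would treat the connection as interactive and acknowledge data only as the peer does. At 125 MB/s and 10 ms the product is about 1.25 MB, more than 5% of a 4 MiB buffer, so everything would be acknowledged.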