Date: Tue, 9 Dec 2025 16:12:32 +1100
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, Max Chernoff
Subject: Re: [PATCH v3 06/10] tcp: Acknowledge everything if it looks like bulk traffic, not interactive
References: <20251208072024.3884137-1-sbrivio@redhat.com> <20251208072024.3884137-7-sbrivio@redhat.com>
In-Reply-To: <20251208072024.3884137-7-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Mon, Dec 08, 2025 at 08:20:19AM +0100, Stefano Brivio wrote:
> ...instead of checking if the current sending buffer is less than
> SNDBUF_SMALL, because this isn't simply an optimisation to coalesce
> ACK segments: we rely on having enough data at once from the sender
> to make the buffer grow by means of TCP buffer size tuning
> implemented in the Linux kernel.
>
> This is important if we're trying to maximise throughput, but not
> desirable for interactive traffic, where we want to be transparent as
> possible and avoid introducing unnecessary latency.
>
> Use the tcpi_delivery_rate field reported by the Linux kernel, if
> available, to calculate the current bandwidth-delay product: if it's
> significantly smaller than the available sending buffer, conclude that
> we're not bandwidth-bound and this is likely to be interactive
> traffic, so acknowledge data only as it's acknowledged by the peer.
>
> Conversely, if the bandwidth-delay product is comparable to the size
> of the sending buffer (more than 5%), we're probably bandwidth-bound
> or... bound to be: acknowledge everything in that case.
>
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

> ---
>  tcp.c | 45 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index b2e4174..923c1f2 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -353,6 +353,9 @@ enum {
>  #define LOW_RTT_TABLE_SIZE              8
>  #define LOW_RTT_THRESHOLD               10 /* us */
>
> +/* Ratio of buffer to bandwidth * delay product implying interactive traffic */
> +#define SNDBUF_TO_BW_DELAY_INTERACTIVE  /* > */ 20 /* (i.e. < 5% of buffer) */
> +
>  #define ACK_IF_NEEDED   0               /* See tcp_send_flag() */
>
>  #define CONN_IS_CLOSING(conn)                                           \
> @@ -426,11 +429,13 @@ socklen_t tcp_info_size;
>            sizeof(((struct tcp_info_linux *)NULL)->tcpi_##f_)) <= tcp_info_size)
>
>  /* Kernel reports sending window in TCP_INFO (kernel commit 8f7baad7f035) */
> -#define snd_wnd_cap     tcp_info_cap(snd_wnd)
> +#define snd_wnd_cap             tcp_info_cap(snd_wnd)
>  /* Kernel reports bytes acked in TCP_INFO (kernel commit 0df48c26d84) */
> -#define bytes_acked_cap tcp_info_cap(bytes_acked)
> +#define bytes_acked_cap         tcp_info_cap(bytes_acked)
>  /* Kernel reports minimum RTT in TCP_INFO (kernel commit cd9b266095f4) */
> -#define min_rtt_cap     tcp_info_cap(min_rtt)
> +#define min_rtt_cap             tcp_info_cap(min_rtt)
> +/* Kernel reports delivery rate in TCP_INFO (kernel commit eb8329e0a04d) */
> +#define delivery_rate_cap       tcp_info_cap(delivery_rate)
>
>  /* sendmsg() to socket */
>  static struct iovec     tcp_iov                 [UIO_MAXIOV];
> @@ -1050,6 +1055,7 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>          socklen_t sl = sizeof(*tinfo);
>          struct tcp_info_linux tinfo_new;
>          uint32_t new_wnd_to_tap = prev_wnd_to_tap;
> +        bool ack_everything = true;
>          int s = conn->sock;
>
>          /* At this point we could ack all the data we've accepted for forwarding
> @@ -1059,7 +1065,8 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>           * control behaviour.
>           *
>           * For it to be possible and worth it we need:
> -         *  - The TCP_INFO Linux extension which gives us the peer acked bytes
> +         *  - The TCP_INFO Linux extensions which give us the peer acked bytes
> +         *    and the delivery rate (outbound bandwidth at receiver)
>           *  - Not to be told not to (force_seq)
>           *  - Not half-closed in the peer->guest direction
>           *      With no data coming from the peer, we might not get events which
> @@ -1069,19 +1076,36 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>           *      Data goes from socket to socket, with nothing meaningfully "in
>           *      flight".
>           *  - Not a pseudo-local connection (e.g. to a VM on the same host)
> -         *  - Large enough send buffer
> -         *      In these cases, there's not enough in flight to bother.
> +         *      If it is, there's not enough in flight to bother.
> +         *  - Sending buffer significantly larger than bandwidth * delay product
> +         *      Meaning we're not bandwidth-bound and this is likely to be
> +         *      interactive traffic where we want to preserve transparent
> +         *      connection behaviour and latency.
> +         *
> +         *      Otherwise, we probably want to maximise throughput, which needs
> +         *      sending buffer auto-tuning, triggered in turn by filling up the
> +         *      outbound socket queue.
>           */
> -        if (bytes_acked_cap && !force_seq &&
> +        if (bytes_acked_cap && delivery_rate_cap && !force_seq &&
>              !CONN_IS_CLOSING(conn) &&
> -            !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn) &&
> -            (unsigned)SNDBUF_GET(conn) >= SNDBUF_SMALL) {
> +            !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn)) {
>                  if (!tinfo) {
>                          tinfo = &tinfo_new;
>                          if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
>                                  return 0;
>                  }
>
> +                if ((unsigned)SNDBUF_GET(conn) > (long long)tinfo->tcpi_rtt *
> +                                                 tinfo->tcpi_delivery_rate /
> +                                                 1000 / 1000 *
> +                                                 SNDBUF_TO_BW_DELAY_INTERACTIVE)
> +                        ack_everything = false;
> +        }
> +
> +        if (ack_everything) {
> +                /* Fall back to acknowledging everything we got */
> +                conn->seq_ack_to_tap = conn->seq_from_tap;
> +        } else {
>                  /* This trips a cppcheck bug in some versions, including
>                   * cppcheck 2.18.3.
>                   * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
> @@ -1089,9 +1113,6 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>                  /* cppcheck-suppress [uninitvar,unmatchedSuppression] */
>                  conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
>                                         conn->seq_init_from_tap;
> -        } else {
> -                /* Fall back to acknowledging everything we got */
> -                conn->seq_ack_to_tap = conn->seq_from_tap;
>          }
>
>          /* It's occasionally possible for us to go from using the fallback above
> --
> 2.43.0
>

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
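
For readers following the arithmetic in the quoted hunk, the buffer versus
bandwidth * delay product check can be sketched in isolation as below. This
is a minimal sketch, not the passt code: bw_delay_product() and
looks_interactive() are hypothetical helpers, and their plain integer
parameters stand in for SNDBUF_GET(conn), tinfo->tcpi_rtt and
tinfo->tcpi_delivery_rate. It assumes tcpi_rtt is in microseconds and
tcpi_delivery_rate in bytes per second. Only the constant
SNDBUF_TO_BW_DELAY_INTERACTIVE and the order of the arithmetic operations
are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch: a buffer more than 20 times the bandwidth * delay product
 * (i.e. product below 5% of the buffer) is taken to mean interactive traffic.
 */
#define SNDBUF_TO_BW_DELAY_INTERACTIVE  20

/* Hypothetical helper, not part of passt.
 * rtt_us:        round-trip time in microseconds (assumed unit of tcpi_rtt)
 * delivery_rate: bytes per second (assumed unit of tcpi_delivery_rate)
 * Returns the bandwidth * delay product in bytes.
 */
static uint64_t bw_delay_product(uint32_t rtt_us, uint64_t delivery_rate)
{
        /* us * bytes/s / 10^6 = bytes, same operation order as the patch */
        return (uint64_t)rtt_us * delivery_rate / 1000 / 1000;
}

/* Hypothetical helper, not part of passt.
 * Returns true if the sending buffer is more than
 * SNDBUF_TO_BW_DELAY_INTERACTIVE times the bandwidth * delay product,
 * the case where the patch clears ack_everything.
 */
static bool looks_interactive(uint32_t sndbuf, uint32_t rtt_us,
                              uint64_t delivery_rate)
{
        return sndbuf > bw_delay_product(rtt_us, delivery_rate) *
                        SNDBUF_TO_BW_DELAY_INTERACTIVE;
}
```

As a worked case under these assumptions: with a 256 KiB buffer, a 20 ms RTT
and a delivery rate of 12500000 bytes per second, the product is 250000
bytes, well above 5% of the buffer, so the sketch would acknowledge
everything. At 1000 bytes per second the product drops to 20 bytes and the
connection would be treated as interactive.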