public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Max Chernoff <git@maxchernoff.ca>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v3 06/10] tcp: Acknowledge everything if it looks like bulk traffic, not interactive
Date: Mon,  8 Dec 2025 08:20:19 +0100	[thread overview]
Message-ID: <20251208072024.3884137-7-sbrivio@redhat.com> (raw)
In-Reply-To: <20251208072024.3884137-1-sbrivio@redhat.com>

...instead of checking if the current sending buffer is less than
SNDBUF_SMALL, because this isn't simply an optimisation to coalesce
ACK segments: we rely on having enough data at once from the sender
to make the buffer grow by means of TCP buffer size tuning
implemented in the Linux kernel.

This is important if we're trying to maximise throughput, but not
desirable for interactive traffic, where we want to be transparent as
possible and avoid introducing unnecessary latency.

Use the tcpi_delivery_rate field reported by the Linux kernel, if
available, to calculate the current bandwidth-delay product: if it's
significantly smaller than the available sending buffer, conclude that
we're not bandwidth-bound and this is likely to be interactive
traffic, so acknowledge data only as it's acknowledged by the peer.

Conversely, if the bandwidth-delay product is comparable to the size
of the sending buffer (more than 5%), we're probably bandwidth-bound
or... bound to be: acknowledge everything in that case.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 45 +++++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/tcp.c b/tcp.c
index b2e4174..923c1f2 100644
--- a/tcp.c
+++ b/tcp.c
@@ -353,6 +353,9 @@ enum {
 #define LOW_RTT_TABLE_SIZE		8
 #define LOW_RTT_THRESHOLD		10 /* us */
 
+/* Ratio of buffer to bandwidth * delay product implying interactive traffic */
+#define SNDBUF_TO_BW_DELAY_INTERACTIVE	/* > */ 20 /* (i.e. < 5% of buffer) */
+
 #define ACK_IF_NEEDED	0		/* See tcp_send_flag() */
 
 #define CONN_IS_CLOSING(conn)						\
@@ -426,11 +429,13 @@ socklen_t tcp_info_size;
 	  sizeof(((struct tcp_info_linux *)NULL)->tcpi_##f_)) <= tcp_info_size)
 
 /* Kernel reports sending window in TCP_INFO (kernel commit 8f7baad7f035) */
-#define snd_wnd_cap	tcp_info_cap(snd_wnd)
+#define snd_wnd_cap		tcp_info_cap(snd_wnd)
 /* Kernel reports bytes acked in TCP_INFO (kernel commit 0df48c26d84) */
-#define bytes_acked_cap	tcp_info_cap(bytes_acked)
+#define bytes_acked_cap		tcp_info_cap(bytes_acked)
 /* Kernel reports minimum RTT in TCP_INFO (kernel commit cd9b266095f4) */
-#define min_rtt_cap	tcp_info_cap(min_rtt)
+#define min_rtt_cap		tcp_info_cap(min_rtt)
+/* Kernel reports delivery rate in TCP_INFO (kernel commit eb8329e0a04d) */
+#define delivery_rate_cap	tcp_info_cap(delivery_rate)
 
 /* sendmsg() to socket */
 static struct iovec	tcp_iov			[UIO_MAXIOV];
@@ -1050,6 +1055,7 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 	socklen_t sl = sizeof(*tinfo);
 	struct tcp_info_linux tinfo_new;
 	uint32_t new_wnd_to_tap = prev_wnd_to_tap;
+	bool ack_everything = true;
 	int s = conn->sock;
 
 	/* At this point we could ack all the data we've accepted for forwarding
@@ -1059,7 +1065,8 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 	 * control behaviour.
 	 *
 	 * For it to be possible and worth it we need:
-	 *  - The TCP_INFO Linux extension which gives us the peer acked bytes
+	 *  - The TCP_INFO Linux extensions which give us the peer acked bytes
+	 *    and the delivery rate (outbound bandwidth at receiver)
 	 *  - Not to be told not to (force_seq)
 	 *  - Not half-closed in the peer->guest direction
 	 *      With no data coming from the peer, we might not get events which
@@ -1069,19 +1076,36 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 	 *      Data goes from socket to socket, with nothing meaningfully "in
 	 *      flight".
 	 *  - Not a pseudo-local connection (e.g. to a VM on the same host)
-	 *  - Large enough send buffer
-	 *      In these cases, there's not enough in flight to bother.
+	 *      If it is, there's not enough in flight to bother.
+	 *  - Sending buffer significantly larger than bandwidth * delay product
+	 *      Meaning we're not bandwidth-bound and this is likely to be
+	 *      interactive traffic where we want to preserve transparent
+	 *      connection behaviour and latency.
+	 *
+	 *      Otherwise, we probably want to maximise throughput, which needs
+	 *      sending buffer auto-tuning, triggered in turn by filling up the
+	 *      outbound socket queue.
 	 */
-	if (bytes_acked_cap && !force_seq &&
+	if (bytes_acked_cap && delivery_rate_cap && !force_seq &&
 	    !CONN_IS_CLOSING(conn) &&
-	    !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn) &&
-	    (unsigned)SNDBUF_GET(conn) >= SNDBUF_SMALL) {
+	    !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn)) {
 		if (!tinfo) {
 			tinfo = &tinfo_new;
 			if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
 				return 0;
 		}
 
+		if ((unsigned)SNDBUF_GET(conn) > (long long)tinfo->tcpi_rtt *
+						 tinfo->tcpi_delivery_rate /
+						 1000 / 1000 *
+						 SNDBUF_TO_BW_DELAY_INTERACTIVE)
+			ack_everything = false;
+	}
+
+	if (ack_everything) {
+		/* Fall back to acknowledging everything we got */
+		conn->seq_ack_to_tap = conn->seq_from_tap;
+	} else {
 		/* This trips a cppcheck bug in some versions, including
 		 * cppcheck 2.18.3.
 		 * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
@@ -1089,9 +1113,6 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 		/* cppcheck-suppress [uninitvar,unmatchedSuppression] */
 		conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
 		                       conn->seq_init_from_tap;
-	} else {
-		/* Fall back to acknowledging everything we got */
-		conn->seq_ack_to_tap = conn->seq_from_tap;
 	}
 
 	/* It's occasionally possible for us to go from using the fallback above
-- 
2.43.0


  parent reply	other threads:[~2025-12-08  7:20 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-08  7:20 [PATCH v3 00/10] tcp: Fix throughput issues with non-local peers Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 01/10] tcp, util: Add function for scaling to linearly interpolated factor, use it Stefano Brivio
2025-12-09  5:05   ` David Gibson
2025-12-08  7:20 ` [PATCH v3 02/10] tcp: Change usage factor of sending buffer in tcp_get_sndbuf() to 75% Stefano Brivio
2025-12-09  5:05   ` David Gibson
2025-12-08  7:20 ` [PATCH v3 03/10] tcp: Limit advertised window to available, not total sending buffer size Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 04/10] tcp: Adaptive interval based on RTT for socket-side acknowledgement checks Stefano Brivio
2025-12-09  5:10   ` David Gibson
2025-12-09 22:49     ` Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 05/10] tcp: Don't clear ACK_TO_TAP_DUE if we're advertising a zero-sized window Stefano Brivio
2025-12-08  7:20 ` Stefano Brivio [this message]
2025-12-09  5:12   ` [PATCH v3 06/10] tcp: Acknowledge everything if it looks like bulk traffic, not interactive David Gibson
2025-12-08  7:20 ` [PATCH v3 07/10] tcp: Don't limit window to less-than-MSS values, use zero instead Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 08/10] tcp: Allow exceeding the available sending buffer size in window advertisements Stefano Brivio
2025-12-08  8:14   ` Max Chernoff
2025-12-08  8:15   ` Max Chernoff
2025-12-08  8:27     ` Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 09/10] tcp: Send a duplicate ACK also on complete sendmsg() failure Stefano Brivio
2025-12-08  7:20 ` [PATCH v3 10/10] tcp: Skip redundant ACK on partial " Stefano Brivio
2025-12-08  8:11 ` [PATCH v3 00/10] tcp: Fix throughput issues with non-local peers Max Chernoff
2025-12-08  8:25   ` Stefano Brivio
2025-12-08  8:51     ` Max Chernoff
2025-12-08  9:00       ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251208072024.3884137-7-sbrivio@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=git@maxchernoff.ca \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).