public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>,
	Matej Hrica <mhrica@redhat.com>
Cc: passt-dev@passt.top
Subject: [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag
Date: Sat, 23 Sep 2023 00:06:08 +0200	[thread overview]
Message-ID: <20230922220610.58767-4-sbrivio@redhat.com> (raw)
In-Reply-To: <20230922220610.58767-1-sbrivio@redhat.com>

It looks like we need it as workaround for this situation, readily
reproducible at least with a 6.5 Linux kernel, with default rmem_max
and wmem_max values:

- an iperf3 client on the host sends about 160 KiB, typically
  segmented into five frames by passt. We read this data using
  MSG_PEEK

- the iperf3 server on the guest starts receiving

- meanwhile, the host kernel advertised a zero-sized window to the
  receiver, as expected

- eventually, the guest acknowledges all the data sent so far, and
  we drop it from the buffer, courtesy of tcp_sock_consume(), using
  recv() with MSG_TRUNC

- the client, however, doesn't get an updated window value, and
  even keepalive packets are answered with zero-window segments,
  until the connection is closed

It looks like dropping data from a socket using MSG_TRUNC doesn't
cause a recalculation of the window, which would be expected as a
result of any receiving operation that invalidates data on a buffer
(that is, not with MSG_PEEK).

Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
the previous value we clamped to, forces a recalculation of the
window which is advertised to the guest.

I couldn't quite confirm this issue by following all the possible
code paths in the kernel, yet. If confirmed, this should be fixed in
the kernel, but meanwhile this workaround looks robust to me (and it
will be needed for backward compatibility anyway).

Reported-by: Matej Hrica <mhrica@redhat.com>
Link: https://bugs.passt.top/show_bug.cgi?id=74
Analysed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tcp.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tcp.c b/tcp.c
index 5528e05..4606f17 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn,
 	wnd <<= conn->ws_from_tap;
 	wnd = MIN(MAX_WINDOW, wnd);
 
-	if (conn->flags & WND_CLAMPED) {
+	/* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up
+	 * with a zero-sized window on a TCP socket, dropping data (once
+	 * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear
+	 * to be enough to make the kernel advertise a non-zero window to the
+	 * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing
+	 * value, fixes this.
+	 *
+	 * The STALLED flag on a connection is a sufficient indication that we
+	 * might have a zero-sized window on the socket, because it's set if we
+	 * exhausted the tap-side window, or if everything we receive from a
+	 * socket is already in flight to the guest.
+	 *
+	 * So, if STALLED is set, and we received a window value from the tap,
+	 * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated
+	 * further and fixed in the kernel instead, if confirmed.
+	 */
+	if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) {
 		if (prev_scaled == wnd)
 			return;
 
@@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn,
 			i = keep - 1;
 	}
 
-	tcp_clamp_window(c, conn, max_ack_seq_wnd);
-
 	/* On socket flush failure, pretend there was no ACK, try again later */
 	if (ack && !tcp_sock_consume(conn, max_ack_seq))
 		tcp_update_seqack_from_tap(c, conn, max_ack_seq);
 
+	tcp_clamp_window(c, conn, max_ack_seq_wnd);
+
 	if (retr) {
 		trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u",
 		      max_ack_seq, conn->seq_to_tap);
-- 
@@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn,
 	wnd <<= conn->ws_from_tap;
 	wnd = MIN(MAX_WINDOW, wnd);
 
-	if (conn->flags & WND_CLAMPED) {
+	/* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up
+	 * with a zero-sized window on a TCP socket, dropping data (once
+	 * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear
+	 * to be enough to make the kernel advertise a non-zero window to the
+	 * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing
+	 * value, fixes this.
+	 *
+	 * The STALLED flag on a connection is a sufficient indication that we
+	 * might have a zero-sized window on the socket, because it's set if we
+	 * exhausted the tap-side window, or if everything we receive from a
+	 * socket is already in flight to the guest.
+	 *
+	 * So, if STALLED is set, and we received a window value from the tap,
+	 * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated
+	 * further and fixed in the kernel instead, if confirmed.
+	 */
+	if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) {
 		if (prev_scaled == wnd)
 			return;
 
@@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn,
 			i = keep - 1;
 	}
 
-	tcp_clamp_window(c, conn, max_ack_seq_wnd);
-
 	/* On socket flush failure, pretend there was no ACK, try again later */
 	if (ack && !tcp_sock_consume(conn, max_ack_seq))
 		tcp_update_seqack_from_tap(c, conn, max_ack_seq);
 
+	tcp_clamp_window(c, conn, max_ack_seq_wnd);
+
 	if (retr) {
 		trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u",
 		      max_ack_seq, conn->seq_to_tap);
-- 
2.39.2


  parent reply	other threads:[~2023-09-22 22:06 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22 22:06 [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers Stefano Brivio
2023-09-22 22:06 ` [PATCH RFT 1/5] tcp: Fix comment to tcp_sock_consume() Stefano Brivio
2023-09-23  2:48   ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 2/5] tcp: Reset STALLED flag on ACK only, check for pending socket data Stefano Brivio
2023-09-25  3:07   ` David Gibson
2023-09-27 17:05     ` Stefano Brivio
2023-09-28  1:48       ` David Gibson
2023-09-29 15:20         ` Stefano Brivio
2023-10-03  3:20           ` David Gibson
2023-10-05  6:18             ` Stefano Brivio
2023-10-05  7:36               ` David Gibson
2023-09-22 22:06 ` Stefano Brivio [this message]
2023-09-22 22:31   ` [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag Stefano Brivio
2023-09-23  7:55   ` David Gibson
2023-09-25  4:09   ` David Gibson
2023-09-25  4:10     ` David Gibson
2023-09-25  4:21     ` David Gibson
2023-09-27 17:05       ` Stefano Brivio
2023-09-28  1:51         ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 4/5] tcp, tap: Don't increase tap-side sequence counter for dropped frames Stefano Brivio
2023-09-25  4:47   ` David Gibson
2023-09-27 17:06     ` Stefano Brivio
2023-09-28  1:58       ` David Gibson
2023-09-29 15:19         ` Stefano Brivio
2023-10-03  3:22           ` David Gibson
2023-10-05  6:19             ` Stefano Brivio
2023-10-05  7:38               ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 5/5] passt.1: Add note about tuning rmem_max and wmem_max for throughput Stefano Brivio
2023-09-25  4:57   ` David Gibson
2023-09-27 17:06     ` Stefano Brivio
2023-09-28  2:02       ` David Gibson
2023-09-25  5:52 ` [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230922220610.58767-4-sbrivio@redhat.com \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=mhrica@redhat.com \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).