public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Matej Hrica <mhrica@redhat.com>, passt-dev@passt.top
Subject: Re: [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag
Date: Sat, 23 Sep 2023 17:55:44 +1000	[thread overview]
Message-ID: <ZQ6aACKk3LpHl7F4@zatzit> (raw)
In-Reply-To: <20230922220610.58767-4-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4144 bytes --]

On Sat, Sep 23, 2023 at 12:06:08AM +0200, Stefano Brivio wrote:
> It looks like we need it as workaround for this situation, readily
> reproducible at least with a 6.5 Linux kernel, with default rmem_max
> and wmem_max values:
> 
> - an iperf3 client on the host sends about 160 KiB, typically
>   segmented into five frames by passt. We read this data using
>   MSG_PEEK
> 
> - the iperf3 server on the guest starts receiving
> 
> - meanwhile, the host kernel advertised a zero-sized window to the
>   receiver, as expected

You noted the s/receiver/sender/ here..

> 
> - eventually, the guest acknowledges all the data sent so far, and
>   we drop it from the buffer, courtesy of tcp_sock_consume(), using
>   recv() with MSG_TRUNC
> 
> - the client, however, doesn't get an updated window value, and
>   even keepalive packets are answered with zero-window segments,
>   until the connection is closed
> 
> It looks like dropping data from a socket using MSG_TRUNC doesn't
> cause a recalculation of the window, which would be expected as a
> result of any receiving operation that invalidates data on a buffer
> (that is, not with MSG_PEEK).
> 
> Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
> the previous value we clamped to, forces a recalculation of the
> window which is advertised to the guest.

..and the s/guest/sender/ here..

> 
> I couldn't quite confirm this issue by following all the possible
> code paths in the kernel, yet. If confirmed, this should be fixed in
> the kernel, but meanwhile this workaround looks robust to me (and it
> will be needed for backward compatibility anyway).
> 
> Reported-by: Matej Hrica <mhrica@redhat.com>
> Link: https://bugs.passt.top/show_bug.cgi?id=74
> Analysed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tcp.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 5528e05..4606f17 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn,
>  	wnd <<= conn->ws_from_tap;
>  	wnd = MIN(MAX_WINDOW, wnd);
>  
> -	if (conn->flags & WND_CLAMPED) {
> +	/* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up
> +	 * with a zero-sized window on a TCP socket, dropping data (once
> +	 * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear
> +	 * to be enough to make the kernel advertise a non-zero window to the
> +	 * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing

.. but you need another s/receiver/sender/ here too.

> +	 * value, fixes this.
> +	 *
> +	 * The STALLED flag on a connection is a sufficient indication that we
> +	 * might have a zero-sized window on the socket, because it's set if we
> +	 * exhausted the tap-side window, or if everything we receive from a
> +	 * socket is already in flight to the guest.
> +	 *
> +	 * So, if STALLED is set, and we received a window value from the tap,
> +	 * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated
> +	 * further and fixed in the kernel instead, if confirmed.
> +	 */
> +	if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) {
>  		if (prev_scaled == wnd)
>  			return;
>  
> @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn,
>  			i = keep - 1;
>  	}
>  
> -	tcp_clamp_window(c, conn, max_ack_seq_wnd);
> -
>  	/* On socket flush failure, pretend there was no ACK, try again later */
>  	if (ack && !tcp_sock_consume(conn, max_ack_seq))
>  		tcp_update_seqack_from_tap(c, conn, max_ack_seq);
>  
> +	tcp_clamp_window(c, conn, max_ack_seq_wnd);
> +
>  	if (retr) {
>  		trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u",
>  		      max_ack_seq, conn->seq_to_tap);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2023-09-23  8:08 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22 22:06 [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers Stefano Brivio
2023-09-22 22:06 ` [PATCH RFT 1/5] tcp: Fix comment to tcp_sock_consume() Stefano Brivio
2023-09-23  2:48   ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 2/5] tcp: Reset STALLED flag on ACK only, check for pending socket data Stefano Brivio
2023-09-25  3:07   ` David Gibson
2023-09-27 17:05     ` Stefano Brivio
2023-09-28  1:48       ` David Gibson
2023-09-29 15:20         ` Stefano Brivio
2023-10-03  3:20           ` David Gibson
2023-10-05  6:18             ` Stefano Brivio
2023-10-05  7:36               ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag Stefano Brivio
2023-09-22 22:31   ` Stefano Brivio
2023-09-23  7:55   ` David Gibson [this message]
2023-09-25  4:09   ` David Gibson
2023-09-25  4:10     ` David Gibson
2023-09-25  4:21     ` David Gibson
2023-09-27 17:05       ` Stefano Brivio
2023-09-28  1:51         ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 4/5] tcp, tap: Don't increase tap-side sequence counter for dropped frames Stefano Brivio
2023-09-25  4:47   ` David Gibson
2023-09-27 17:06     ` Stefano Brivio
2023-09-28  1:58       ` David Gibson
2023-09-29 15:19         ` Stefano Brivio
2023-10-03  3:22           ` David Gibson
2023-10-05  6:19             ` Stefano Brivio
2023-10-05  7:38               ` David Gibson
2023-09-22 22:06 ` [PATCH RFT 5/5] passt.1: Add note about tuning rmem_max and wmem_max for throughput Stefano Brivio
2023-09-25  4:57   ` David Gibson
2023-09-27 17:06     ` Stefano Brivio
2023-09-28  2:02       ` David Gibson
2023-09-25  5:52 ` [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQ6aACKk3LpHl7F4@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=mhrica@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).