public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Max Chernoff <git@maxchernoff.ca>
Subject: Re: [PATCH v2 6/9] tcp: Don't limit window to less-than-MSS values, use zero instead
Date: Mon, 8 Dec 2025 17:43:14 +1100	[thread overview]
Message-ID: <aTZzgtcKWLb28zrf@zatzit> (raw)
In-Reply-To: <20251208002229.391162-7-sbrivio@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2900 bytes --]

On Mon, Dec 08, 2025 at 01:22:14AM +0100, Stefano Brivio wrote:
> If the sender uses data clumping (including Nagle's algorithm) for
> Silly Window Syndrome (SWS) avoidance, advertising less than a MSS
> means the sender might stop sending altogether, and window updates
> after a low window condition are just as important as they are in
> a zero-window condition.
> 
> For simplicity, approximate that limit to zero, as we have an
> implementation forcing window updates after zero-sized windows.
> This matches the suggestion from RFC 813, section 4.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Looking at this again, I'm worrying if it might allow a pathalogical
case here: unlikely to hit, but very bad if it did.

Suppose we have:
  1. A receiver that wants to consume its input in fixed largish
     (~64kiB) records
  2. The receiver has locked its SO_RCVBUF to that record length, or
     only slightly more
  3. The receive buf is near full - but not quite a full record's
     worth

The receiver doesn't consume anything, because it doesn't have a full
record.  Its rcvbuf is near full, so its kernel advertises only a
small window.  We approximate that to zero, so the sender can't send
anything.  So, the record never gets completed and we stall
completely.

The solution would be to "time out" this limitation of the advertised
window, not sure how complicated that would be in practice.

> ---
>  tcp.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/tcp.c b/tcp.c
> index 533c8a7..3c046a5 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1155,6 +1155,23 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>  		else
>  			limit = SNDBUF_GET(conn) - (int)sendq;
>  
> +		/* If the sender uses mechanisms to prevent Silly Window
> +		 * Syndrome (SWS, described in RFC 813 Section 3) it's critical
> +		 * that, should the window ever become less than the MSS, we
> +		 * advertise a new value once it increases again to be above it.
> +		 *
> +		 * The mechanism to avoid SWS in the kernel is, implicitly,
> +		 * implemented by Nagle's algorithm (which was proposed after
> +		 * RFC 813).
> +		 *
> +		 * To this end, for simplicity, approximate a window value below
> +		 * the MSS to zero, as we already have mechanisms in place to
> +		 * force updates after the window becomes zero. This matches the
> +		 * suggestion from RFC 813, Section 4.
> +		 */
> +		if (limit < MSS_GET(conn))
> +			limit = 0;
> +
>  		new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd, limit);
>  	}
>  
> -- 
> 2.43.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2025-12-08  6:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-08  0:22 [PATCH v2 0/9] tcp: Fix throughput issues with non-local peers Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 1/9] tcp, util: Add function for scaling to linearly interpolated factor, use it Stefano Brivio
2025-12-08  5:33   ` David Gibson
2025-12-08  0:22 ` [PATCH v2 2/9] tcp: Limit advertised window to available, not total sending buffer size Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 3/9] tcp: Adaptive interval based on RTT for socket-side acknowledgement checks Stefano Brivio
2025-12-08  5:41   ` David Gibson
2025-12-08  7:22     ` Stefano Brivio
2025-12-08  8:28       ` David Gibson
2025-12-08  0:22 ` [PATCH v2 4/9] tcp: Don't clear ACK_TO_TAP_DUE if we're advertising a zero-sized window Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 5/9] tcp: Acknowledge everything if it looks like bulk traffic, not interactive Stefano Brivio
2025-12-08  5:54   ` David Gibson
2025-12-08  7:25     ` Stefano Brivio
2025-12-08  8:31       ` David Gibson
2025-12-08  0:22 ` [PATCH v2 6/9] tcp: Don't limit window to less-than-MSS values, use zero instead Stefano Brivio
2025-12-08  6:43   ` David Gibson [this message]
2025-12-08  8:11     ` Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 7/9] tcp: Allow exceeding the available sending buffer size in window advertisements Stefano Brivio
2025-12-08  6:25   ` David Gibson
2025-12-08  7:45     ` Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 8/9] tcp: Send a duplicate ACK also on complete sendmsg() failure Stefano Brivio
2025-12-08  0:22 ` [PATCH v2 9/9] tcp: Skip redundant ACK on partial " Stefano Brivio
2025-12-08  6:46 ` [PATCH v2 0/9] tcp: Fix throughput issues with non-local peers David Gibson
2025-12-08  8:22   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTZzgtcKWLb28zrf@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=git@maxchernoff.ca \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).