Date: Sat, 13 Dec 2025 15:31:08 +0100
From: Stefano Brivio
To: passt-dev@passt.top
CC: David Gibson, Max Chernoff, Tyler Cloud
Subject: Re: [PATCH] tcp: Use less-than-MSS window on no queued data, or no data sent recently
Message-ID: <20251213153108.1b7b54e7@elisabeth>
In-Reply-To: <20251213142540.1319527-1-sbrivio@redhat.com>
References: <20251213142540.1319527-1-sbrivio@redhat.com>

On Sat, 13 Dec 2025 15:25:40 +0100 Stefano Brivio wrote:

> We limit the advertised window to guests and containers to the
> available length of the sending buffer, and if it's less than the MSS,
> since commit cf1925fb7b77 ("tcp: Don't limit window to less-than-MSS
> values, use zero instead"), we approximate that limit to zero.
>
> This way, we'll trigger a window update as soon as we realise that we
> can advertise a larger value, just like we do in all other cases where
> we advertise a zero-sized window.
>
> By doing that, we don't wait for the peer to send us data before we
> update the window. This matters because the guest or container might
> be trying to aggregate more data and won't send us anything at all if
> the advertised window is too small.
>
> However, this might be problematic in two situations:
>
> 1. one, reported by Tyler, where the remote (receiving) peer
>    advertises a window that's smaller than what we usually get and
>    very close to the MSS, causing the kernel to give us a starting
>    size of the buffer that's less than the MSS we advertise to the
>    guest or container.
>
>    If this happens, we'll never advertise a non-zero window after
>    the handshake, and the container or guest will never send us any
>    data at all.
>
>    With a simple 'curl https://cloudflare.com/', we get, with default
>    TCP memory parameters, a 65535-byte window from the peer, and 46080
>    bytes of initial sending buffer from the kernel. But we advertised
>    a 65480-byte MSS, and we'll never actually receive the client
>    request.
>
>    This seems to be specific to Cloudflare for some reason, probably
>    deriving from a particular tuning of TCP parameters on their
>    servers.
>
> 2. another one, hypothesised by David, where the peer might only be
>    willing to process (and acknowledge) data in batches.
>
>    We might have queued outbound data which is, at the same time, not
>    enough to fill one of these batches and be acknowledged and removed
>    from the sending queue, but enough to make our available buffer
>    smaller than the MSS, and the connection will hang.
>
> Take care of both cases by:
>
> a. not approximating the sending buffer to zero if we have no outbound
>    queued data at all, because in that case we don't expect the
>    available buffer to increase if we don't send any data, so there's
>    no point in waiting for it to grow larger than the MSS.
>
>    This fixes problem 1. above.
>
> b. also using the full sending buffer size if we haven't sent data to
>    the socket for a while (reported by tcpi_last_data_sent). This part
>    was already suggested by David in:
>
>      https://archives.passt.top/passt-dev/aTZzgtcKWLb28zrf@zatzit/
>
>    and I'm now picking ten times the RTT as a somewhat arbitrary
>    threshold.
>
>    This is meant to take care of potential problem 2. above, but it
>    also happens to fix 1.
>
> Reported-by: Tyler Cloud
> Link: https://bugs.passt.top/show_bug.cgi?id=183

And, I forgot:

Fixes: cf1925fb7b77 ("tcp: Don't limit window to less-than-MSS values, use zero instead")

> Suggested-by: David Gibson
> Signed-off-by: Stefano Brivio
> ---
>  tcp.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/tcp.c b/tcp.c
> index 81bc114..b179e39 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1211,8 +1211,21 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>  	 * the MSS to zero, as we already have mechanisms in place to
>  	 * force updates after the window becomes zero. This matches the
>  	 * suggestion from RFC 813, Section 4.
> +	 *
> +	 * But don't do this if, either:
> +	 *
> +	 * - there's nothing in the outbound queue: the size of the
> +	 *   sending buffer is limiting us, and it won't increase if we
> +	 *   don't send data, so there's no point in waiting, or
> +	 *
> +	 * - we haven't sent data in a while (somewhat arbitrarily, ten
> +	 *   times the RTT), as that might indicate that the receiver
> +	 *   will only process data in batches that are large enough,
> +	 *   but we won't send enough to fill one because we're stuck
> +	 *   with pending data in the outbound queue
>  	 */
> -	if (limit < MSS_GET(conn))
> +	if (limit < MSS_GET(conn) && sendq &&
> +	    tinfo->tcpi_last_data_sent < tinfo->tcpi_rtt / 1000 * 10)
>  		limit = 0;
>
>  	new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd, limit);

-- 
Stefano
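
For anybody who wants to play with the new condition outside of passt, here is a minimal standalone sketch of the check from the hunk above. The function name window_rounds_to_zero() and its parameters are hypothetical and not part of passt. The units are an assumption based on Linux's struct tcp_info, where tcpi_last_data_sent is in milliseconds and tcpi_rtt is in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of passt: returns true if a sending-buffer
 * limit below the MSS should be advertised as a zero window.
 *
 * limit:             available sending buffer, in bytes
 * mss:               MSS advertised to the guest or container, in bytes
 * sendq:             bytes currently queued outbound on the socket
 * last_data_sent_ms: time since data was last sent, in milliseconds
 *                    (assumed unit of tcpi_last_data_sent)
 * rtt_us:            smoothed RTT, in microseconds (assumed unit of tcpi_rtt)
 */
static bool window_rounds_to_zero(uint32_t limit, uint32_t mss,
				  uint32_t sendq, uint32_t last_data_sent_ms,
				  uint32_t rtt_us)
{
	/* Same expression as in the patch: round down to zero only if data
	 * is queued and we sent something within the last ten RTTs.
	 *
	 * Integer division: an RTT below 1000 microseconds gives a zero
	 * threshold, so in that case this never returns true.
	 */
	return limit < mss && sendq &&
	       last_data_sent_ms < rtt_us / 1000 * 10;
}
```

With the values from the Cloudflare report (46080-byte limit, 65480-byte MSS, nothing queued), this returns false, so the 46080-byte limit is advertised instead of zero.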