From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Max Chernoff <git@maxchernoff.ca>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH 1/8] tcp: Limit advertised window to available, not total sending buffer size
Date: Thu, 4 Dec 2025 08:45:34 +0100 [thread overview]
Message-ID: <20251204074542.2156548-2-sbrivio@redhat.com> (raw)
In-Reply-To: <20251204074542.2156548-1-sbrivio@redhat.com>
For non-local connections, we advertise the same window size as what
the peer in turn advertises to us, and limit it to the buffer size
reported via SO_SNDBUF.
That's not quite correct: in order to later avoid failures while
queueing data to the socket, we need to limit the window to the
available buffer size, not the total one.
Use the SIOCOUTQ ioctl and subtract the number of outbound queued
bytes from the total buffer size, then clamp to this value.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
README.md | 2 +-
tcp.c | 18 ++++++++++++++++--
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 897ae8b..8fdc0a3 100644
--- a/README.md
+++ b/README.md
@@ -291,7 +291,7 @@ speeding up local connections, and usually requiring NAT. _pasta_:
* ✅ all capabilities dropped, other than `CAP_NET_BIND_SERVICE` (if granted)
* ✅ with default options, user, mount, IPC, UTS, PID namespaces are detached
* ✅ no external dependencies (other than a standard C library)
-* ✅ restrictive seccomp profiles (33 syscalls allowed for _passt_, 43 for
+* ✅ restrictive seccomp profiles (34 syscalls allowed for _passt_, 43 for
_pasta_ on x86_64)
* ✅ examples of [AppArmor](/passt/tree/contrib/apparmor) and
[SELinux](/passt/tree/contrib/selinux) profiles available
diff --git a/tcp.c b/tcp.c
index fa95f6b..863ccdb 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1031,6 +1031,8 @@ void tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
* @tinfo: tcp_info from kernel, can be NULL if not pre-fetched
*
* Return: 1 if sequence or window were updated, 0 otherwise
+ *
+ * #syscalls ioctl
*/
int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
bool force_seq, struct tcp_info_linux *tinfo)
@@ -1113,9 +1115,21 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
if ((conn->flags & LOCAL) || tcp_rtt_dst_low(conn)) {
new_wnd_to_tap = tinfo->tcpi_snd_wnd;
} else {
+ uint32_t sendq;
+ int limit;
+
+ if (ioctl(s, SIOCOUTQ, &sendq)) {
+ debug_perror("SIOCOUTQ on socket %i, assuming 0", s);
+ sendq = 0;
+ }
tcp_get_sndbuf(conn);
- new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd,
- SNDBUF_GET(conn));
+
+ if ((int)sendq > SNDBUF_GET(conn)) /* Due to memory pressure? */
+ limit = 0;
+ else
+ limit = SNDBUF_GET(conn) - (int)sendq;
+
+ new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd, limit);
}
new_wnd_to_tap = MIN(new_wnd_to_tap, MAX_WINDOW);
--
2.43.0
next prev parent reply other threads:[~2025-12-04 7:45 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-04 7:45 [PATCH 0/8] tcp: Fix throughput issues with non-local peers Stefano Brivio
2025-12-04 7:45 ` Stefano Brivio [this message]
2025-12-04 23:10 ` [PATCH 1/8] tcp: Limit advertised window to available, not total sending buffer size David Gibson
2025-12-04 7:45 ` [PATCH 2/8] tcp: Adaptive interval based on RTT for socket-side acknowledgement checks Stefano Brivio
2025-12-04 23:48 ` David Gibson
2025-12-05 1:20 ` Stefano Brivio
2025-12-05 2:49 ` David Gibson
2025-12-04 7:45 ` [PATCH 3/8] tcp: Don't clear ACK_TO_TAP_DUE if we're advertising a zero-sized window Stefano Brivio
2025-12-04 23:50 ` David Gibson
2025-12-04 7:45 ` [PATCH 4/8] tcp: Acknowledge everything if sending buffer is less than SNDBUF_BIG Stefano Brivio
2025-12-05 0:08 ` David Gibson
2025-12-05 1:20 ` Stefano Brivio
2025-12-05 2:50 ` David Gibson
2025-12-08 0:19 ` Stefano Brivio
2025-12-04 7:45 ` [PATCH 5/8] tcp: Don't limit window to less-than-MSS values, use zero instead Stefano Brivio
2025-12-05 0:35 ` David Gibson
2025-12-05 1:20 ` Stefano Brivio
2025-12-05 2:53 ` David Gibson
2025-12-04 7:45 ` [PATCH 6/8] tcp: Allow exceeding the available sending buffer size in window advertisements Stefano Brivio
2025-12-05 2:34 ` David Gibson
2025-12-08 0:20 ` Stefano Brivio
2025-12-04 7:45 ` [PATCH 7/8] tcp: Send a duplicate ACK also on complete sendmsg() failure Stefano Brivio
2025-12-05 2:35 ` David Gibson
2025-12-04 7:45 ` [PATCH 8/8] tcp: Skip redundant ACK on partial " Stefano Brivio
2025-12-05 2:36 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251204074542.2156548-2-sbrivio@redhat.com \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=git@maxchernoff.ca \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).