On Thu, Dec 04, 2025 at 08:45:34AM +0100, Stefano Brivio wrote:
> For non-local connections, we advertise the same window size as what
> the peer in turn advertises to us, and limit it to the buffer size
> reported via SO_SNDBUF.
> 
> That's not quite correct: in order to later avoid failures while
> queueing data to the socket, we need to limit the window to the
> available buffer size, not the total one.
> 
> Use the SIOCOUTQ ioctl and subtract the number of outbound queued
> bytes from the total buffer size, then clamp to this value.
> 
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

> ---
>  README.md |  2 +-
>  tcp.c     | 18 ++++++++++++++++--
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/README.md b/README.md
> index 897ae8b..8fdc0a3 100644
> --- a/README.md
> +++ b/README.md
> @@ -291,7 +291,7 @@ speeding up local connections, and usually requiring NAT. _pasta_:
>  * ✅ all capabilities dropped, other than `CAP_NET_BIND_SERVICE` (if granted)
>  * ✅ with default options, user, mount, IPC, UTS, PID namespaces are detached
>  * ✅ no external dependencies (other than a standard C library)
> -* ✅ restrictive seccomp profiles (33 syscalls allowed for _passt_, 43 for
> +* ✅ restrictive seccomp profiles (34 syscalls allowed for _passt_, 43 for
>    _pasta_ on x86_64)
>  * ✅ examples of [AppArmor](/passt/tree/contrib/apparmor) and
>    [SELinux](/passt/tree/contrib/selinux) profiles available
> diff --git a/tcp.c b/tcp.c
> index fa95f6b..863ccdb 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1031,6 +1031,8 @@ void tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
>   * @tinfo:	tcp_info from kernel, can be NULL if not pre-fetched
>   *
>   * Return: 1 if sequence or window were updated, 0 otherwise
> + *
> + * #syscalls ioctl
>   */
>  int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>  			  bool force_seq, struct tcp_info_linux *tinfo)
> @@ -1113,9 +1115,21 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
>  	if ((conn->flags & LOCAL) || tcp_rtt_dst_low(conn)) {
>  		new_wnd_to_tap = tinfo->tcpi_snd_wnd;
>  	} else {
> +		uint32_t sendq;
> +		int limit;
> +
> +		if (ioctl(s, SIOCOUTQ, &sendq)) {
> +			debug_perror("SIOCOUTQ on socket %i, assuming 0", s);
> +			sendq = 0;
> +		}
>  		tcp_get_sndbuf(conn);
> -		new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd,
> -				     SNDBUF_GET(conn));
> +
> +		if ((int)sendq > SNDBUF_GET(conn)) /* Due to memory pressure? */
> +			limit = 0;
> +		else
> +			limit = SNDBUF_GET(conn) - (int)sendq;
> +
> +		new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd, limit);
>  	}
>  
>  	new_wnd_to_tap = MIN(new_wnd_to_tap, MAX_WINDOW);
> -- 
> 2.43.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
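
For readers following along without the tree at hand, the clamping described in the commit message could be sketched standalone roughly as below. This is only an illustration, not the code from the patch: wnd_limit() and wnd_to_advertise() are hypothetical names, the buffer size is read directly with getsockopt(SO_SNDBUF) rather than through passt's tcp_get_sndbuf() / SNDBUF_GET(), and falling back to the peer's window when SO_SNDBUF can't be read is an assumption of this sketch.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical helper: bytes still available in the sending buffer, given
 * its total size and the number of bytes already queued. Returns 0 if the
 * queue is reported as larger than the buffer.
 */
int wnd_limit(int sndbuf, int sendq)
{
	if (sendq > sndbuf)
		return 0;

	return sndbuf - sendq;
}

/* Hypothetical helper: window to advertise for socket @s, given the window
 * the peer advertises to us (@peer_wnd), clamped to the available part of
 * the sending buffer.
 */
int wnd_to_advertise(int s, int peer_wnd)
{
	socklen_t sl = sizeof(int);
	int sndbuf, sendq, limit;

	/* Assumption of this sketch: without a buffer size, don't clamp */
	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &sl))
		return peer_wnd;

	/* As in the patch: if SIOCOUTQ fails, assume nothing is queued */
	if (ioctl(s, SIOCOUTQ, &sendq))
		sendq = 0;

	limit = wnd_limit(sndbuf, sendq);

	return peer_wnd < limit ? peer_wnd : limit;
}
```

With a 1000-byte buffer and 200 bytes queued, wnd_limit() gives 800, so a peer window of 500 would be advertised unchanged, while a peer window of 2000 would be clamped to 800.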