public inbox for passt-user@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Max Chernoff <git@maxchernoff.ca>
Cc: Stefano Brivio <sbrivio@redhat.com>, passt-user@passt.top
Subject: Re: pasta slow at HTTP upload
Date: Mon, 17 Nov 2025 16:13:26 +1100
Message-ID: <aRqu9o1FoIUWmDHD@zatzit>
In-Reply-To: <176301983731.2033508.12381101277059600955@maja>


On Wed, Nov 12, 2025 at 05:30:35PM -0700, Max Chernoff via user wrote:
> Date: Wed, 12 Nov 2025 17:30:35 -0700
> From: Max Chernoff <git@maxchernoff.ca>
> To: Stefano Brivio <sbrivio@redhat.com>
> CC: passt-user@passt.top
> Subject: Re: pasta slow at HTTP upload
> List-Id: "For passt users: support, questions and answers"
>  <passt-user.passt.top>
> 
> Hi Stefano,
> 
> On Wed, 2025-11-12 at 13:53 +0100, Stefano Brivio wrote:
> > As a further hack, you could probably do something like this on top:
> >
> > ---
> > diff --git a/tcp.c b/tcp.c
> > index 697f80d..8c50ee0 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -339,7 +339,7 @@ enum {
> >  #define MSS_DEFAULT			536
> >  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
> >
> > -#define ACK_INTERVAL			1		/* ms */
> > +#define ACK_INTERVAL			200		/* us */
> >  #define SYN_TIMEOUT			10		/* s */
> >  #define ACK_TIMEOUT			2
> >  #define FIN_TIMEOUT			60
> > @@ -582,7 +582,7 @@ static void tcp_timer_ctl(struct tcp_tap_conn *conn)
> >  	}
> >
> >  	if (conn->flags & ACK_TO_TAP_DUE) {
> > -		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > +		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000;
> >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> >  		if (!(conn->events & ESTABLISHED))
> >  			it.it_value.tv_sec = SYN_TIMEOUT;
> > ---
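
Side note on the arithmetic in that hack: it switches ACK_INTERVAL from milliseconds to microseconds, so the factor converting it to nanoseconds for tv_nsec drops from 1000 * 1000 to 1000 (200 us becomes 200000 ns).  A minimal sketch of the same conversion, assuming a one-shot struct itimerspec of the kind passed to timerfd_settime(2); ack_timer_from_us() is a hypothetical helper, not pasta code, and it also carries whole seconds into tv_sec because tv_nsec must stay below 1000000000:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper (not part of pasta): build a one-shot timer value
 * from a delay in microseconds.  tv_nsec must stay in [0, 999999999],
 * so whole seconds are carried into tv_sec. */
static struct itimerspec ack_timer_from_us(long delay_us)
{
	struct itimerspec it = { 0 };	/* it_interval stays 0: fire once */

	it.it_value.tv_sec = delay_us / 1000000;
	it.it_value.tv_nsec = (delay_us % 1000000) * 1000;	/* us -> ns */
	return it;
}
```

With the original 1 ms interval this corresponds to ack_timer_from_us(1000), i.e. tv_nsec = 1000000; with the patched value it is ack_timer_from_us(200), i.e. tv_nsec = 200000.
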
> 
> That actually makes it worse again, about as bad as before the patch.
> But I've just tried rebuilding with the original patch again, and also with
> the exact same binary that I used yesterday, and that's slow now too.
> I've verified with pgrep that Podman is using the correct pasta version,
> so I have no idea what's happening.
> 
> However, I do remember that for the past few months, some uploads would
> randomly go really quickly, so maybe the problem happens sporadically,
> and when I was testing the patched version I just happened to get
> (un)lucky?
> 
> > >     net.core.wmem_max=7500000
> > >     net.core.rmem_max=7500000
> >
> > Those were settings we recommended for KubeVirt until
> > https://github.com/kubevirt/user-guide/pull/933, but they don't seem to
> > necessarily make sense as we seem to have made peace with the TCP
> > auto-tuning mechanism in Linux meanwhile.
> >
> > See also https://bugs.passt.top/show_bug.cgi?id=138 and commit
> > 71249ef3f9bc ("tcp, tcp_splice: Don't set SO_SNDBUF and SO_RCVBUF to
> > maximum values").
> >
> > As the issue here is about socket (kernel) buffers being "too small" for
> > a while, I guess that those settings plus reverting that commit would
> > "fix" the issue entirely for you. But it's impractical to rely on users
> > to set those, that's why I'm looking for something adaptive which still
> > plays nicely with TCP auto-tuning instead.
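
Background on those knobs: socket(7) documents that Linux doubles an SO_SNDBUF value set with setsockopt(2) (to allow for bookkeeping overhead), reports the doubled value from getsockopt(2), and limits the request to net.core.wmem_max.  In the kernel, setting SO_SNDBUF explicitly also locks the buffer size for that socket, which takes it out of TCP send-buffer auto-tuning.  A minimal sketch, assuming Linux; expected_sndbuf() and request_sndbuf() are hypothetical names, and the model ignores the kernel's small minimum buffer size:

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Model of the documented Linux rule: the requested size is capped at
 * net.core.wmem_max, then doubled.  (The kernel's minimum is ignored.) */
static long expected_sndbuf(long requested, long wmem_max)
{
	long capped = requested < wmem_max ? requested : wmem_max;

	return 2 * capped;
}

/* Set SO_SNDBUF explicitly and return what getsockopt() reports, or -1.
 * Note: an explicit setting also disables send-buffer auto-tuning
 * for this socket. */
static int request_sndbuf(int fd, int requested)
{
	int reported = 0;
	socklen_t len = sizeof(reported);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &reported, &len) < 0)
		return -1;
	return reported;
}
```

Plugging in the numbers from this thread, expected_sndbuf(7500000, 212992) is 425984 with the default wmem_max, and expected_sndbuf(7500000, 7500000) is 15000000 with the custom one.
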
> 
> Ah, I didn't know that those were (or used to be) recommended for pasta; I set
> those for Caddy, since it complains on startup if they aren't set:
> 
>     https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes
> 
> > >     net.ipv4.tcp_notsent_lowat=131072
> > >     net.core.default_qdisc=cake
> > >     net.ipv4.tcp_congestion_control=bbr
> 
> I set those as a general performance tuning thing (not for pasta
> specifically) based off of
> 
>     https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency/
> 
>     https://grapheneos.org/articles/server-traffic-shaping
> 
> > I'm not sure if those really matter for pasta, but I haven't really
> > thought about them.
> 
> Aha, those do seem to be the issue. Using the original (unpatched)
> pasta:

Huh, interesting.

>     (Set everything to my original settings)
>     $ sudo sysctl -w net.core.wmem_max=7500000 net.core.rmem_max=7500000 net.ipv4.tcp_notsent_lowat=131072 net.core.default_qdisc=cake net.ipv4.tcp_congestion_control=bbr
> 
>     (Test with --network=host)
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=host quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M     69  20.0M  0:00:04  0:00:04 --:--:-- 20.4M
> 
>     (Test with --network=pasta)
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>       4  100M    0     0    4 5120k      0  42440  0:41:10  0:02:03  0:39:07 26188
>     (Stopped early since I got sick of waiting)
> 
>     (Set everything to the kernel defaults)
>     $ sudo sysctl -w net.core.wmem_max=212992 net.core.rmem_max=212992 net.ipv4.tcp_notsent_lowat=4294967295 net.core.default_qdisc=fq_codel net.ipv4.tcp_congestion_control=cubic
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M      4  1431k  0:01:11  0:01:11 --:--:-- 1399k
> 
>     (tcp_congestion_control=default, tcp_notsent_lowat=custom, [rw]mem_max=default)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=cubic net.ipv4.tcp_notsent_lowat=131072 net.core.rmem_max=212992 net.core.wmem_max=212992
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M      5  1484k  0:01:08  0:01:08 --:--:-- 1400k
> 
>     (tcp_congestion_control=custom, tcp_notsent_lowat=default, [rw]mem_max=default)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr net.ipv4.tcp_notsent_lowat=4294967295 net.core.rmem_max=212992 net.core.wmem_max=212992
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M     12  3576k  0:00:28  0:00:28 --:--:-- 14.9M
> 
>     (tcp_congestion_control=custom, tcp_notsent_lowat=custom, [rw]mem_max=default)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr net.ipv4.tcp_notsent_lowat=131072 net.core.rmem_max=212992 net.core.wmem_max=212992
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M      8  2595k  0:00:39  0:00:39 --:--:-- 17.1M
> 
>     (tcp_congestion_control=default, tcp_notsent_lowat=default, [rw]mem_max=custom)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=cubic net.ipv4.tcp_notsent_lowat=4294967295 net.core.rmem_max=7500000 net.core.wmem_max=7500000
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M      7  2107k  0:00:48  0:00:48 --:--:-- 8310k
> 
>     (tcp_congestion_control=custom, tcp_notsent_lowat=default, [rw]mem_max=custom)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr net.ipv4.tcp_notsent_lowat=4294967295 net.core.rmem_max=7500000 net.core.wmem_max=7500000
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M     15  4620k  0:00:22  0:00:22 --:--:-- 12.1M
> 
>     (tcp_congestion_control=custom, tcp_notsent_lowat=custom, [rw]mem_max=custom)
>     $ sudo sysctl -w net.ipv4.tcp_congestion_control=bbr net.ipv4.tcp_notsent_lowat=131072 net.core.rmem_max=7500000 net.core.wmem_max=7500000
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=pasta quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>       5  100M    0     0    5 5376k      0  29899  0:58:27  0:03:04  0:55:23     0
>     (Stopped early since I got sick of waiting)
> 
>     (Test with --network=host again)
>     $ podman run --rm --pull=newer --volume="$(realpath .):/srv/:Z" --workdir=/srv/ --network=host quay.io/fedora/fedora-minimal curl --output /dev/null --progress-meter --form file=@./test.tar.gz "https://www.ctan.org/submit/validate"
>       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                      Dload  Upload   Total   Spent    Left  Speed
>     100  100M    0   345  100  100M     69  20.0M  0:00:04  0:00:04 --:--:-- 24.3M

Well, that sure is a confusing pattern.  Looks like tcp_notsent_lowat
is the biggest culprit... except for the outlier where it seems ok
with both custom notsent_lowat and custom congestion control.

How repeatable is each of these results?
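
One way to make runs easier to compare is to record the active values alongside every curl invocation.  A minimal sketch, assuming the standard procfs paths for the sysctls above (e.g. /proc/sys/net/ipv4/tcp_notsent_lowat); read_sysctl() and dump_tcp_knobs() are hypothetical helpers, not part of any tool mentioned here.  Some of these sysctls are per network namespace, so values read inside the container may differ from the host's:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a sysctl's current value from procfs into
 * buf, stripping the trailing newline.  Returns 0 on success, -1 on error. */
static int read_sysctl(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Print the knobs varied in the runs above, one per line ("?" if unreadable). */
static void dump_tcp_knobs(void)
{
	static const char *const paths[] = {
		"/proc/sys/net/ipv4/tcp_congestion_control",
		"/proc/sys/net/ipv4/tcp_notsent_lowat",
		"/proc/sys/net/core/rmem_max",
		"/proc/sys/net/core/wmem_max",
		"/proc/sys/net/core/default_qdisc",
	};
	char val[64];

	for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
		printf("%s = %s\n", paths[i],
		       read_sysctl(paths[i], val, sizeof(val)) ? "?" : val);
}
```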

> I know fairly little about networking and the kernel, so if the answer
> is just "don't set those sysctls together", I'd be okay with that. But I
> haven't changed these sysctls since February, my upload speeds via pasta
> were fine up until a few months ago, and the upload speeds are still
> okay with --network=host, so I suspect that this is a bug somewhere.

I have a vague theory why tcp_notsent_lowat might be bad for pasta but
not --network=host.  I'm far from certain about it though, and I don't
have a good idea what change on the pasta side might have caused this.
Actually, is it just pasta that's changed in that time?  Or could it
be a new kernel version or something else?

I'm having some trouble wrapping my head around what tcp_notsent_lowat
does, but I _think_ the upshot of it will be to make the socket send
path more "bursty":  we'll fill the socket send buffer (which could be
very large), then stop.  We won't wake up again to refill until the
amount in the socket buffer drops below tcp_notsent_lowat.  Assuming
we've been getting plenty of data from the other side, we'll then
refill the buffer very quickly.
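
For reference, the kernel's ip-sysctl documentation describes tcp_notsent_lowat as the system-wide default for the per-socket TCP_NOTSENT_LOWAT option: poll()/select()/epoll() report POLLOUT only if the amount of unsent bytes is below the limit and the write queue is not full, and sendmsg() will not add new buffers once the limit is hit.  The default is UINT_MAX (4294967295, as in the reset command above), i.e. effectively no limit.  A minimal sketch, assuming Linux, that encodes that documented writability condition as a hypothetical predicate and shows how one socket can override the system-wide value, which might help test the theory on a single socket:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25	/* value from <linux/tcp.h> */
#endif

/* Hypothetical predicate modelling the documented POLLOUT condition:
 * the write queue is not full AND unsent bytes are below the limit. */
static bool lowat_writable(unsigned long queued, unsigned long sndbuf,
			   unsigned long unsent, unsigned long lowat)
{
	return queued < sndbuf && unsent < lowat;
}

/* Override net.ipv4.tcp_notsent_lowat for one socket; return the value
 * read back with getsockopt(), or -1 on error. */
static int set_notsent_lowat(int fd, int lowat)
{
	int readback = 0;
	socklen_t len = sizeof(readback);

	if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat)) < 0)
		return -1;
	if (getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &readback, &len) < 0)
		return -1;
	return readback;
}
```

For example, set_notsent_lowat(fd, 131072) applies the custom value to one socket only; per the same documentation, the global sysctl governs only sockets that don't set TCP_NOTSENT_LOWAT themselves.
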

Fewer wakeups sound like a good thing in the abstract, but I'm
wondering if this burstiness is messing with the flow control on the
other side, causing the guest side peer to slow down to nothing.  If
that is the case, it would make sense that it gets even worse with
large socket buffer sizes, since that would make the "bursts" even
bigger.

Likewise I was wondering if having different congestion control
algorithms on each side might somehow conflict, causing a slowdown.  I
don't think the data you have above really suggest that, though.
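
If it helps rule the congestion-control idea in or out, tcp(7) documents a per-socket TCP_CONGESTION option, so the algorithm used by a given socket can be read back (and, within net.ipv4.tcp_allowed_congestion_control for unprivileged processes, changed) without touching the system-wide default.  A minimal sketch, assuming Linux; get_congestion() and set_congestion() are hypothetical helper names:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13	/* value from <linux/tcp.h> */
#endif

#define CC_NAME_MAX 16		/* TCP_CA_NAME_MAX in the kernel */

/* Copy the socket's congestion-control algorithm name into buf,
 * always NUL-terminated.  Returns 0 on success, -1 on error. */
static int get_congestion(int fd, char buf[CC_NAME_MAX])
{
	socklen_t len = CC_NAME_MAX - 1;	/* leave room for the NUL */

	memset(buf, 0, CC_NAME_MAX);
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}

/* Select an algorithm for this socket only, e.g. "cubic" or "bbr". */
static int set_congestion(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}
```

Reading the value on sockets on each side of pasta would show whether the algorithms actually differ in a given setup.
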

> I also find it quite interesting that setting any of the sysctls
> individually or in pairs improves the upload speeds, but setting all 3
> at once slows it down drastically.

I'm curious about the case with default congestion control, but custom
lowat and mem_max.  I think that one's missing from the table above.
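
The runs above vary three knobs (congestion control, tcp_notsent_lowat, and [rw]mem_max) between the custom values and the kernel defaults, giving 2^3 = 8 combinations; seven of them appear above.  A minimal sketch that enumerates every combination as a sysctl command line so none is skipped; the values are copied from the commands in this thread, default_qdisc is left out because it was not changed again after the reset to fq_codel, and combo_for(), combo_cmd() and print_matrix() are hypothetical helpers:

```c
#include <stdio.h>

/* Values copied from the sysctl commands in this thread. */
static const char cc_default[] = "cubic", cc_custom[] = "bbr";
static const char lowat_default[] = "4294967295", lowat_custom[] = "131072";
static const char mem_default[] = "212992", mem_custom[] = "7500000";

struct combo {
	const char *cc;		/* net.ipv4.tcp_congestion_control */
	const char *lowat;	/* net.ipv4.tcp_notsent_lowat */
	const char *mem;	/* net.core.rmem_max and net.core.wmem_max */
};

/* Bit 0 = custom congestion control, bit 1 = custom tcp_notsent_lowat,
 * bit 2 = custom [rw]mem_max. */
static struct combo combo_for(unsigned mask)
{
	struct combo c = {
		.cc = (mask & 1) ? cc_custom : cc_default,
		.lowat = (mask & 2) ? lowat_custom : lowat_default,
		.mem = (mask & 4) ? mem_custom : mem_default,
	};
	return c;
}

/* Format the sysctl command for one combination; returns snprintf()'s result. */
static int combo_cmd(unsigned mask, char *buf, size_t len)
{
	struct combo c = combo_for(mask);

	return snprintf(buf, len,
			"sudo sysctl -w net.ipv4.tcp_congestion_control=%s "
			"net.ipv4.tcp_notsent_lowat=%s "
			"net.core.rmem_max=%s net.core.wmem_max=%s",
			c.cc, c.lowat, c.mem, c.mem);
}

/* Print all 8 combinations, each tagged with its mask. */
static void print_matrix(void)
{
	char cmd[256];

	for (unsigned mask = 0; mask < 8; mask++) {
		combo_cmd(mask, cmd, sizeof(cmd));
		printf("[%u] %s\n", mask, cmd);
	}
}
```

Mask 6 (default congestion control, custom lowat and [rw]mem_max) is the combination missing from the runs above.
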

> I bisected a kernel a few weeks ago,
> so I can try that here if you think that this is a kernel bug and not a
> pasta bug.

Could be either; I have no real intuition either way so far.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson



Thread overview: 8+ messages
     [not found] <176293029592.2033508.497353982367240204@maja>
2025-11-12  6:55 ` Stefano Brivio
2025-11-12 10:32   ` Stefano Brivio
2025-11-12 11:22     ` Max Chernoff
2025-11-12 12:53       ` Stefano Brivio
2025-11-13  0:30         ` Max Chernoff
     [not found]         ` <176301983731.2033508.12381101277059600955@maja>
2025-11-17  5:13           ` David Gibson [this message]
2025-11-23  9:12             ` Max Chernoff
2025-11-12  6:11 Max Chernoff
