From: Eugenio Perez Martin <eperezma@redhat.com>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Jason Wang <jasowang@redhat.com>,
Jeff Nelson <jenelson@redhat.com>,
Paul Holzinger <pholzing@redhat.com>
Subject: Re: vhost-kernel net on pasta: from 26 to 37Gbit/s
Date: Wed, 11 Jun 2025 09:04:57 +0200 [thread overview]
Message-ID: <CAJaqyWcWMhSfrajNpVwpRMda3x5Y62QvrpWRRRXrVrR9gx78nQ@mail.gmail.com> (raw)
In-Reply-To: <20250610172931.4c730f04@elisabeth>
On Tue, Jun 10, 2025 at 5:29 PM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> [Adding Paul as Podman developer]
>
> On Mon, 9 Jun 2025 11:59:21 +0200
> Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> > On Fri, Jun 6, 2025 at 6:37 PM Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > On Fri, 6 Jun 2025 16:32:38 +0200
> > > Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > > On Wed, May 21, 2025 at 12:35 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, May 21, 2025 at 12:09 PM Stefano Brivio <sbrivio@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, 20 May 2025 17:09:44 +0200
> > > > > > Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > Now if I isolate the vhost kernel thread [1] I get way more
> > > > > > > performance as expected:
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > > [ 5] 0.00-10.00 sec 43.1 GBytes 37.1 Gbits/sec 0 sender
> > > > > > > [ 5] 0.00-10.04 sec 43.1 GBytes 36.9 Gbits/sec receiver
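(For reference, this is roughly how I do the isolation -- a sketch only,
the CPU number is an example from my setup:

    # vhost kernel threads are named vhost-<owner pid>; pin the one
    # spawned for pasta onto a CPU reserved with isolcpus= on the
    # kernel command line:
    taskset -pc 2 "$(pgrep vhost- | head -n 1)"

)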
> > > > > > >
> > > > > > > After analyzing perf output, rep_movs_alternative is the most called
> > > > > > > function in the three iperf3 (~20%Self), passt.avx2 (~15%Self) and
> > > > > > > vhost (~15%Self)
> > > > > >
> > > > > > Interesting... s/most called function/function using the most cycles/, I
> > > > > > suppose.
> > > > > >
> > > > >
> > > > > Right!
> > > > >
> > > > > > So it looks somewhat similar to
> > > > > >
> > > > > > https://archives.passt.top/passt-dev/20241017021027.2ac9ea53@elisabeth/
> > > > > >
> > > > > > now?
> > > > > >
> > > > >
> > > > > Kind of. Below tcp_sendmsg_locked I don't see sk_page_frag_refill but
> > > > > skb_do_copy_data_nocache. Not sure if that means anything, as it
> > > > > should not be affected by vhost.
> > > > >
> > > > > > > But I don't see any of them consuming 100% of a CPU in
> > > > > > > top: pasta consumes ~85% CPU, both the iperf3 client and server consume
> > > > > > > ~60%, and vhost consumes ~53%.
> > > > > > >
> > > > > > > So... I have mixed feelings about this :). By "default" it seems to
> > > > > > > perform worse, but maybe my test is too synthetic.
> > > > > >
> > > > > > Well, surely we can't ask Podman users to pin specific stuff to given
> > > > > > CPU threads. :)
> > > > > >
> > > > >
> > > > > Yes, but maybe the result changes under the right scheduling? I'm
> > > > > isolating the CPUs entirely, which is certainly not the usual case for
> > > > > pasta :).
> > > > >
> > > > > > > There is room for improvement with the mentioned optimizations, so I'd
> > > > > > > continue applying them, moving on to UDP and TCP zerocopy, and
> > > > > > > developing zerocopy vhost RX.
> > > > > >
> > > > > > That definitely makes sense to me.
> > > > > >
> > > > >
> > > > > Good!
> > > > >
> > > > > > > With these numbers I think the series should not be
> > > > > > > merged at the moment. I could send it as an RFC if you want, but I've
> > > > > > > not applied the comments the first version received, POC style :).
> > > > > >
> > > > > > I don't think it's really needed for you to spend time on
> > > > > > semi-polishing something just to have an RFC if you're still working on
> > > > > > it. I guess the implementation will change substantially anyway once
> > > > > > you factor in further optimisations.
> > > > > >
> > > > >
> > > > > Agree! I'll keep iterating on this then.
> > > > >
> > > >
> > > > Actually, if I remove all the taskset calls etc. and trust the kernel
> > > > scheduler, vanilla pasta gives me:
> > > > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > > > Connecting to host 10.6.68.254, port 5201
> > > > [ 5] local 10.6.68.20 port 40408 connected to 10.6.68.254 port 5201
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 3.11 GBytes 26.7 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 1.00-2.00 sec 3.11 GBytes 26.7 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 2.00-3.00 sec 3.12 GBytes 26.8 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 3.00-4.00 sec 3.11 GBytes 26.7 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 4.00-5.00 sec 3.10 GBytes 26.6 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 5.00-6.00 sec 3.11 GBytes 26.7 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 6.00-7.00 sec 3.11 GBytes 26.7 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 7.00-8.00 sec 3.09 GBytes 26.6 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 8.00-9.00 sec 3.08 GBytes 26.5 Gbits/sec 0 25.4 MBytes
> > > > [ 5] 9.00-10.00 sec 3.10 GBytes 26.6 Gbits/sec 0 25.4 MBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 31.0 GBytes 26.7 Gbits/sec 0 sender
> > > > [ 5] 0.00-10.04 sec 31.0 GBytes 26.5 Gbits/sec receiver
> > > >
> > > > And with vhost-net:
> > > > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > > > ...
> > > > Connecting to host 10.6.68.254, port 5201
> > > > [ 5] local 10.6.68.20 port 46720 connected to 10.6.68.254 port 5201
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 4.17 GBytes 35.8 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 1.00-2.00 sec 4.17 GBytes 35.9 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 2.00-3.00 sec 4.16 GBytes 35.7 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 3.00-4.00 sec 4.14 GBytes 35.6 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 4.00-5.00 sec 4.16 GBytes 35.7 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 5.00-6.00 sec 4.16 GBytes 35.8 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 6.00-7.00 sec 4.18 GBytes 35.9 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 7.00-8.00 sec 4.19 GBytes 35.9 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 8.00-9.00 sec 4.18 GBytes 35.9 Gbits/sec 0 11.9 MBytes
> > > > [ 5] 9.00-10.00 sec 4.18 GBytes 35.9 Gbits/sec 0 11.9 MBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 41.7 GBytes 35.8 Gbits/sec 0 sender
> > > > [ 5] 0.00-10.04 sec 41.7 GBytes 35.7 Gbits/sec receiver
> > > >
> > > > If I go the extra mile and also disable notifications (the difference
> > > > might be just noise, but...)
> > > > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > > > ...
> > > > Connecting to host 10.6.68.254, port 5201
> > > > [ 5] local 10.6.68.20 port 56590 connected to 10.6.68.254 port 5201
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 4.19 GBytes 36.0 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 1.00-2.00 sec 4.18 GBytes 35.9 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 2.00-3.00 sec 4.18 GBytes 35.9 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 3.00-4.00 sec 4.20 GBytes 36.1 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 4.00-5.00 sec 4.21 GBytes 36.2 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 5.00-6.00 sec 4.21 GBytes 36.1 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 6.00-7.00 sec 4.20 GBytes 36.1 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 7.00-8.00 sec 4.23 GBytes 36.4 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 8.00-9.00 sec 4.24 GBytes 36.4 Gbits/sec 0 12.4 MBytes
> > > > [ 5] 9.00-10.00 sec 4.21 GBytes 36.2 Gbits/sec 0 12.4 MBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 42.1 GBytes 36.1 Gbits/sec 0 sender
> > > > [ 5] 0.00-10.04 sec 42.1 GBytes 36.0 Gbits/sec receiver
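(By "disabling notifications" I mean suppressing vhost's used-buffer
signal on the split ring and polling instead. Roughly, assuming
VIRTIO_RING_F_EVENT_IDX is not negotiated and 'avail' points at the
avail ring shared with vhost -- the helper name is mine, not the actual
patch:

    #include <endian.h>
    #include <linux/virtio_ring.h>

    /* Ask the device (vhost) not to signal the call eventfd when it
     * consumes buffers; the used ring is polled instead. */
    static void vring_disable_cb(struct vring_avail *avail)
    {
            avail->flags = htole16(VRING_AVAIL_F_NO_INTERRUPT);
    }

plus the usual memory barriers around reading the used index.)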
> > > >
> > > > So I guess the best option is to run performance tests closer to a
> > > > real-world workload against the new version and see if it works
> > > > better?
> > >
> > > Well, that's certainly a possibility.
> > >
> > > I'd say the biggest value for vhost-net usage in pasta is reaching
> > > throughput figures that are comparable with veth, with or without
> > > multithreading (keeping an eye on bytes per cycle, of course), with or
> > > without kernel changes, so that users won't need to choose between
> > > rootless and performance anymore.
> > >
> > > It would also simplify things in Podman quite a lot (and to some extent
> > > in rootlesskit / Docker as well). We're pretty much there with virtual
> > > machines, just not quite with containers (which is somewhat ironic, but
> > > of course there's a good reason for that).
> > >
> > > If we're clearly wasting cycles in vhost-net (because of the bounce
> > > buffer, plus something else perhaps?) *and* there's a plausible
> > > solution for that in sight *and* the interface would change anyway,
> > > running throughput tests and polishing up the current version with a
> > > half-baked solution at the moment sounds a bit wasteful to me.
> >
> > My point is that I'm testing a very synthetic scenario. If everybody
> > agrees this is close enough to real-world ones, I'm OK with continuing
> > to improve the edges we see. If not, maybe we're picking the wrong
> > fruit even if it is low-hanging?
> >
> > Getting a table like [1] would shed light on this, especially if it is
> > just a matter of running "make performance" or similar. Maybe we need
> > to include longer queues? Focus on a given scenario? What if UDP
> > improves but TCP doesn't?
>
> Well, it's a matter of running ./run under test/ (or 'make' there).
> Have you tried that with your patch? It's kind of representative in the
> sense that it uses several message sizes and different values for the
> sending window.
>
Yes, but it freezes in my environment.
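(Concretely, I'm launching it as below -- assuming just the entry
points you mention, nothing else special:

    cd /home/passt/test
    ./run        # or: make, from the same directory

)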
Copying the contents of the different windows; top-left:
# tail -f --retry /home/passt/test/test_logs/context_unshare.log
/home/passt/test/test_logs/context_ns.log
tail: warning: --retry only effective for the initial open
==> /home/passt/test/test_logs/context_unshare.log <==
unshare$ tail: cannot open '/home/passt/test/test_logs/context_ns.log'
for reading: No such file or directory
---
top-right:
# while cat /tmp/passt-tests-7HpbEm/log_pipe; do :; done
Test layout: single pasta instance with namespace.
---
bottom-left:
# tail -f --retry /home/passt/test/test_logs/context_host.log
tail: warning: --retry only effective for the initial open
host$
---
bottom-right:
# tail -f --retry /home/passt/test/test_logs/context_passt.log
tail: warning: --retry only effective for the initial open
passt$
---
And test/test_logs/test.log:
=== build/all
> Build passt
? ! [ -e passt ]
? [ -f passt ]
...passed.
> Build pasta
? ! [ -e pasta ]
? [ -h pasta ]
...passed.
> Build qrap
? ! [ -e qrap ]
? [ -f qrap ]
...passed.
> Build all
? ! [ -e passt ]
? ! [ -e pasta ]
? ! [ -e qrap ]
? [ -f passt ]
? [ -h pasta ]
? [ -f qrap ]
...passed.
> Install
? [ -f /tmp/passt-tests-7HpbEm/build/all/prefix/bin/passt ]
? [ -h /tmp/passt-tests-7HpbEm/build/all/prefix/bin/pasta ]
? [ -f /tmp/passt-tests-7HpbEm/build/all/prefix/bin/qrap ]
? man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W passt
? man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W pasta
? man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W qrap
...passed.
> Uninstall
? ! [ -f /tmp/passt-tests-7HpbEm/build/all/prefix/bin/passt ]
? ! [ -h /tmp/passt-tests-7HpbEm/build/all/prefix/bin/pasta ]
? ! [ -f /tmp/passt-tests-7HpbEm/build/all/prefix/bin/qrap ]
? ! man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W passt
2>/dev/null
? ! man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W pasta
2>/dev/null
? ! man -M /tmp/passt-tests-7HpbEm/build/all/prefix/share/man -W qrap
2>/dev/null
...passed.
=== build/cppcheck
...skipped.
=== build/clang_tidy
...skipped.
---
> > Now more points about this scenario:
> > 1) I don't see 100% CPU usage in any single component:
> > CPU%
> > 84.2 passt.avx2
> > 57.9 iperf3
> > 57.2 iperf3
> > 50.7 vhost-1805109
>
> Still, I bet we're using an awful amount of cycles compared to veth.
>
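(A rough way to put a number on that -- hypothetical commands, assuming
pasta's process name matches "passt" and a 10 s run:

    # Count cycles spent by pasta while iperf3 moves a known amount of
    # data, then divide cycles by bytes; repeat on veth to compare.
    perf stat -e cycles -p "$(pgrep passt | head -n 1)" -- sleep 10 &
    iperf3 -c 10.6.68.254 -w 8M -t 10

)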
> > 2) The most used (Self%) function in vhost is rep_movs_alternative,
> > called from skb_copy_datagram_iter, so yes, ZeroCopy should help a lot
> > here.
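(That attribution comes from something like the following -- a sketch,
with the thread name from my run above:

    # PID == TID for a kernel thread, so -t works with pgrep's output
    perf record -g -t "$(pgrep vhost- | head -n 1)" -- sleep 10
    perf report    # expand rep_movs_alternative to see the
                   # skb_copy_datagram_iter call chain

)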
> >
> > Now, is "iperf3 -w 8M" representative? I'm sure ZC helps in this
> > scenario, does it make it worse if we have small packets? Do we care?
>
> We don't care _a lot_ about small packets because we can typically use
> large packets, inbound and outbound, at least for TCP (bulk) transfers.
> But users are doing all sorts of things with containers, including bulk
> transfers and VPN traffic over UDP, so we do, a bit.
>
> Again, the main value of using vhost-net, I think, is making "rootful"
> networking essentially unnecessary, or necessary just for niche use
> cases (say, non-TCP, non-UDP traffic, or macvlan-like cases). If there
> are relatively common use cases where pasta performs pretty badly
> compared to veth, we'll still need rootful networking.
>
> So, yes, it is representative, but not necessarily universal.
>
> > I'm totally ok with continuing trying with ZC, I just want to make
> > sure we're not missing anything :).
>
> In any case, it looks like vhost-net zero-copy is a bigger task than we
> thought, so even if we don't reach a universal solution that makes
> rootful networking essentially unnecessary but still end up with a big
> improvement ready, there's of course a lot of value in it. Your call...
>
Got it, thanks!