* Testing vhost-user with "virtio-net: tweak for better TX performance in NAPI mode"
From: Laurent Vivier @ 2025-03-18 11:07 UTC
To: passt-dev
Hi,
as reported by Stefano, there is an asymmetry in throughput between host and guest with
vhost-user.
I've tested the following kernel patch from Jason to see whether it improves performance:
------------------------------------------------------------------------------
commit e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
Author: Jason Wang <jasowang@redhat.com>
Date: Tue Feb 18 10:39:08 2025 +0800
virtio-net: tweak for better TX performance in NAPI mode
There are several issues in start_xmit():
- Transmitted packets need to be freed before sending a packet; this
introduces delay and increases the average packet transmit
time. It also increases the time spent holding the TX lock.
- Notification is enabled after free_old_xmit_skbs(), which
introduces unnecessary interrupts if the TX notification happens on the
same CPU that is doing the transmission (the virtio-net
driver is actually optimized for this case).
So this patch tries to avoid those issues by not cleaning transmitted
packets in start_xmit() when TX NAPI is enabled, and by disabling
notifications even more aggressively: notification is disabled from the
beginning of start_xmit(). But we can't enable delayed
notification after TX is stopped, as we would lose the
notifications. Instead, delayed notification is enabled
after the virtqueue is kicked, for best performance.
Performance numbers:
1) single queue 2 vcpus guest with pktgen_sample03_burst_single_flow.sh
(burst 256) + testpmd (rxonly) on the host:
- When pinning TX IRQ to the pktgen VCPU: split virtqueue PPS
increased 55% from 6.89 Mpps to 10.7 Mpps and 32% of TX interrupts were
eliminated. Packed virtqueue PPS increased 50% from 7.09 Mpps to
10.7 Mpps, and 99% of TX interrupts were eliminated.
- When pinning TX IRQ to a VCPU other than pktgen: split virtqueue PPS
increased 96% from 5.29 Mpps to 10.4 Mpps and 45% of TX interrupts
were eliminated; packed virtqueue PPS increased 78% from 6.12
Mpps to 10.9 Mpps and 99% of TX interrupts were eliminated.
2) single queue 1 vcpu guest + vhost-net/TAP on the host: a single-
session netperf from guest to host shows an 82% improvement, from
31 Gb/s to 58 Gb/s; %stddev was reduced from 34.5% to 1.9% and 88%
of TX interrupts were eliminated.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
------------------------------------------------------------------------------
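To make the reordering concrete, here is a minimal, stubbed userspace sketch
of the transmit path the commit describes. All function names here are
illustrative stand-ins of my own, not the actual code from
drivers/net/virtio_net.c:

    /* Minimal sketch (illustrative, not the driver code): with TX NAPI,
     * completed buffers are reclaimed in the NAPI poll handler instead
     * of inline, and delayed notification is enabled only after the
     * kick so a stopped queue cannot miss its wakeup interrupt. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool tx_napi_enabled = true;     /* assumption: NAPI mode on */

    static void free_old_xmit_skbs(void) { puts("reclaim completed buffers inline"); }
    static void xmit_skb(void)           { puts("queue skb on the TX virtqueue"); }
    static void virtqueue_kick(void)     { puts("kick: notify the device"); }
    static void enable_cb_delayed(void)  { puts("re-enable delayed TX callbacks"); }

    static void start_xmit_sketch(void)
    {
            if (!tx_napi_enabled)
                    free_old_xmit_skbs();   /* legacy path only: clean inline */
            /* NAPI path: reclaiming is left to the TX NAPI poll handler */

            xmit_skb();
            virtqueue_kick();
            enable_cb_delayed();            /* after the kick, per the commit */
    }

    int main(void)
    {
            start_xmit_sketch();
            return 0;
    }

The point is just the ordering: queue, kick, then re-enable delayed
callbacks, with buffer reclaiming moved out of the hot path.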
Host setup before the tests (stop the firewall and raise the socket buffer
limits so the 16M iperf3 windows below can take effect):
systemctl stop firewalld.service || service iptables stop || iptables -F
/sbin/sysctl -w net.core.rmem_max=536870912
/sbin/sysctl -w net.core.wmem_max=536870912
I made my tests using a 6.14-rc7 kernel:
From guest:
iperf3 -c 10.6.68.254 -P2 -Z -t5 -l 1M -w 16M
[SUM] 0.00-5.00 sec 14.5 GBytes 24.9 Gbits/sec 0 sender
[SUM] 0.00-5.00 sec 14.5 GBytes 24.9 Gbits/sec receiver
From host:
iperf3 -c localhost -P2 -Z -t5 -p 10001 -l 1M -w 16M
[SUM] 0.00-5.00 sec 28.9 GBytes 49.6 Gbits/sec 0 sender
[SUM] 0.00-5.03 sec 28.8 GBytes 49.2 Gbits/sec receiver
The results with 6.14-rc7 + e13b6da7045f:
From guest:
iperf3 -c 10.6.68.254 -P2 -Z -t5 -l 1M -w 16M
[SUM] 0.00-5.00 sec 14.8 GBytes 25.4 Gbits/sec 0 sender
[SUM] 0.00-5.01 sec 14.8 GBytes 25.4 Gbits/sec receiver
From host:
iperf3 -c localhost -P2 -Z -t5 -p 10001 -l 1M -w 16M
[SUM] 0.00-5.00 sec 28.5 GBytes 48.9 Gbits/sec 0 sender
[SUM] 0.00-5.03 sec 28.4 GBytes 48.6 Gbits/sec receiver
We get only a ~2% improvement: guest-to-host goes from 24.9 to
25.4 Gbits/s, while host-to-guest is essentially unchanged, so the
asymmetry remains.
Thanks,
Laurent
* Re: Testing vhost-user with "virtio-net: tweak for better TX performance in NAPI mode"
From: Stefano Brivio @ 2025-03-18 16:33 UTC
To: Laurent Vivier; +Cc: passt-dev
On Tue, 18 Mar 2025 12:07:09 +0100
Laurent Vivier <lvivier@redhat.com> wrote:
> The results with 6.14-rc7 + e13b6da7045f:
Thanks for checking!
> From guest:
>
> iperf3 -c 10.6.68.254 -P2 -Z -t5 -l 1M -w 16M
> [SUM] 0.00-5.00 sec 14.8 GBytes 25.4 Gbits/sec 0 sender
> [SUM] 0.00-5.01 sec 14.8 GBytes 25.4 Gbits/sec receiver
>
> From host:
>
> iperf3 -c localhost -P2 -Z -t5 -p 10001 -l 1M -w 16M
> [SUM] 0.00-5.00 sec 28.5 GBytes 48.9 Gbits/sec 0 sender
> [SUM] 0.00-5.03 sec 28.4 GBytes 48.6 Gbits/sec receiver
>
> We get only a ~2% improvement [...]
Ouch. :( Then there's something else... by the way, for reference, my
investigation back then stopped a bit after:
https://archives.passt.top/passt-dev/20241010090801.23da8bff@elisabeth/
that is, I tried zeroing pages "fast":
https://archives.passt.top/passt-dev/20241017021027.2ac9ea53@elisabeth/
but it didn't really change the asymmetry. I was getting the same
numbers you're getting now.
Whatever, I guess it's not so important, just... one day we should
figure it out. :)
--
Stefano