public inbox for passt-dev@passt.top
From: Laurent Vivier <lvivier@redhat.com>
To: passt-dev@passt.top
Subject: Testing vhost-user with "virtio-net: tweak for better TX performance in NAPI mode"
Date: Tue, 18 Mar 2025 12:07:09 +0100
Message-ID: <2f3fa74f-523d-4c03-bffb-8d154c40261b@redhat.com>

Hi,

As reported by Stefano, there is an asymmetry in throughput between the
guest-to-host and host-to-guest directions with vhost-user.

I've tested the following kernel patch from Jason to see whether it improves the
performance (a simplified sketch of the reordering it describes follows the quoted commit):

------------------------------------------------------------------------------
commit e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Feb 18 10:39:08 2025 +0800

     virtio-net: tweak for better TX performance in NAPI mode

     There are several issues in start_xmit():

     - Transmitted packets need to be freed before sending a packet; this
       introduces delay and increases the average packet transmit
       time. It also increases the time spent holding the TX lock.
     - Notification is enabled after free_old_xmit_skbs(), which
       introduces unnecessary interrupts if the TX notification happens
       on the same CPU that is doing the transmission (the virtio-net
       driver is actually optimized for this case).

     So this patch tries to avoid those issues by not cleaning transmitted
     packets in start_xmit() when TX NAPI is enabled, and by disabling
     notifications even more aggressively: notifications are disabled from
     the beginning of start_xmit(). But we can't enable the delayed
     notification after TX is stopped, as we would lose notifications.
     Instead, the delayed notification is enabled after the virtqueue is
     kicked, for best performance.

     Performance numbers:

     1) single queue, 2 vCPU guest with pktgen_sample03_burst_single_flow.sh
        (burst 256) + testpmd (rxonly) on the host:

     - When pinning the TX IRQ to the pktgen vCPU: split virtqueue PPS
       increased 55%, from 6.89 Mpps to 10.7 Mpps, and 32% of TX
       interrupts were eliminated. Packed virtqueue PPS increased 50%,
       from 7.09 Mpps to 10.7 Mpps, and 99% of TX interrupts were
       eliminated.

     - When pinning the TX IRQ to a vCPU other than the pktgen one: split
       virtqueue PPS increased 96%, from 5.29 Mpps to 10.4 Mpps, and 45%
       of TX interrupts were eliminated; packed virtqueue PPS increased
       78%, from 6.12 Mpps to 10.9 Mpps, and 99% of TX interrupts were
       eliminated.

     2) single queue, 1 vCPU guest + vhost-net/TAP on the host: a single
        netperf session from guest to host shows an 82% improvement, from
        31 Gb/s to 58 Gb/s; %stddev was reduced from 34.5% to 1.9%, and
        88% of TX interrupts were eliminated.

     Signed-off-by: Jason Wang <jasowang@redhat.com>
     Acked-by: Michael S. Tsirkin <mst@redhat.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>
------------------------------------------------------------------------------
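
To make the reordering concrete, here is a minimal, self-contained C
sketch of the old and the patched start_xmit() flows as I read the
commit message above. The virtqueue helpers are stubbed out so it
compiles and runs standalone; the names are modeled on the virtio API,
but this is a simplified illustration, not the actual driver code.

/* Minimal sketch of the TX path reordering described above.  The
 * virtqueue helpers are stubs so this compiles standalone; the names
 * are modeled on the virtio API, but this is NOT the real driver code. */
#include <stdbool.h>
#include <stdio.h>

static bool tx_napi_on = true;  /* assume TX NAPI mode is enabled */

static void virtqueue_disable_cb(void)        { puts("disable TX notifications"); }
static void virtqueue_enable_cb_delayed(void) { puts("arm delayed TX notification"); }
static void free_old_xmit_skbs(void)          { puts("free completed TX buffers"); }
static void xmit_skb(void)                    { puts("queue skb on the TX virtqueue"); }
static void virtqueue_kick(void)              { puts("kick the device"); }

/* Old flow: completed buffers are freed at the head of start_xmit()
 * (extra time under the TX lock), and the notification is re-armed
 * before the kick, which can trigger spurious TX interrupts on the
 * transmitting CPU. */
static void start_xmit_old(void)
{
        free_old_xmit_skbs();
        virtqueue_enable_cb_delayed();
        xmit_skb();
        virtqueue_kick();
}

/* Patched flow (simplified): with TX NAPI the cleaning is left to the
 * NAPI poller, notifications stay disabled from the beginning of
 * start_xmit(), and the delayed notification is only re-armed after
 * the virtqueue has been kicked. */
static void start_xmit_new(void)
{
        virtqueue_disable_cb();
        if (!tx_napi_on)
                free_old_xmit_skbs();  /* only in non-NAPI mode */
        xmit_skb();
        virtqueue_kick();
        virtqueue_enable_cb_delayed();
}

int main(void)
{
        puts("-- old flow --");
        start_xmit_old();
        puts("-- patched flow --");
        start_xmit_new();
        return 0;
}

Running it just prints the two orderings; the point is that in the
patched flow nothing is freed in the hot path and no notification is
armed until after the kick.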

Host setup before the tests: stop the firewall (or flush the rules) and
raise the socket buffer limits so that iperf3 can use the 16 MiB window
requested with -w:

systemctl stop firewalld.service || service iptables stop || iptables -F
/sbin/sysctl -w net.core.rmem_max=536870912
/sbin/sysctl -w net.core.wmem_max=536870912

I ran my tests using the 6.14-rc7 kernel:

   From guest:

   iperf3 -c 10.6.68.254  -P2 -Z -t5  -l 1M -w 16M
   [SUM]   0.00-5.00   sec  14.5 GBytes  24.9 Gbits/sec    0             sender
   [SUM]   0.00-5.00   sec  14.5 GBytes  24.9 Gbits/sec                  receiver

   From host:

   iperf3 -c localhost -P2 -Z -t5  -p 10001 -l 1M -w 16M
   [SUM]   0.00-5.00   sec  28.9 GBytes  49.6 Gbits/sec    0             sender
   [SUM]   0.00-5.03   sec  28.8 GBytes  49.2 Gbits/sec                  receiver

The results with 6.14-rc7 + e13b6da7045f:

   From guest:

   iperf3 -c 10.6.68.254  -P2 -Z -t5  -l 1M -w 16M
   [SUM]   0.00-5.00   sec  14.8 GBytes  25.4 Gbits/sec    0             sender
   [SUM]   0.00-5.01   sec  14.8 GBytes  25.4 Gbits/sec                  receiver

   From host:

   iperf3 -c localhost -P2 -Z -t5  -p 10001 -l 1M -w 16M
   [SUM]   0.00-5.00   sec  28.5 GBytes  48.9 Gbits/sec    0             sender
   [SUM]   0.00-5.03   sec  28.4 GBytes  48.6 Gbits/sec                  receiver

So the patch gives only about a 2% improvement in the guest-to-host
direction (24.9 to 25.4 Gbit/s), and the host-side numbers are
essentially unchanged (49.6 vs 48.9 Gbit/s).

Thanks,
Laurent


