public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v7 0/8] Add vhost-user support to passt. (part 3)
Date: Thu, 10 Oct 2024 09:08:01 +0200
Message-ID: <20241010090801.23da8bff@elisabeth>
In-Reply-To: <20241009193726.7e2e5790@elisabeth>

On Wed, 9 Oct 2024 19:37:26 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed,  9 Oct 2024 11:07:07 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
> > This series of patches adds vhost-user support to passt,
> > allowing passt to connect to the QEMU network backend using
> > virtqueues rather than a socket.
> > 
> > With QEMU, rather than connecting with:
> > 
> >   -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket
> > 
> > we will use:
> > 
> >   -chardev socket,id=chr0,path=/tmp/passt_1.socket
> >   -netdev vhost-user,id=netdev0,chardev=chr0
> >   -device virtio-net,netdev=netdev0
> >   -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE
> >   -numa node,memdev=memfd0
> > 
> > The memory backend is needed to share data between passt and QEMU.
> > 
> > Performance comparison between "-netdev stream" and "-netdev vhost-user":  
> 
> On my setup, with a few tweaks (don't ask me why... we should figure
> it out eventually):

...I had a closer look with perf(1).

For inbound traffic (I checked with IPv6), a reasonably expanded
output (20 seconds of iperf3 with those parameters):

--
Samples: 80K of event 'cycles', Event count (approx.): 81191130978
  Children      Self  Command     Shared Object         Symbol
-   89.20%     0.07%  passt.avx2  [kernel.kallsyms]     [k] entry_SYSCALL_64_after_hwframe ▒
     89.13% entry_SYSCALL_64_after_hwframe                                                 ◆
      - do_syscall_64                                                                      ▒
         - 69.37% __sys_recvmsg                                                            ▒
            - 69.26% ___sys_recvmsg                                                        ▒
               - 68.11% ____sys_recvmsg                                                    ▒
                  - 67.80% inet6_recvmsg                                                   ▒
                     - tcp_recvmsg                                                         ▒
                        - 67.56% tcp_recvmsg_locked                                        ▒
                           - 64.14% skb_copy_datagram_iter                                 ▒
                              - __skb_datagram_iter                                        ▒
                                 - 51.42% _copy_to_iter                                    ▒
                                      48.74% copy_user_generic_string                      ▒
                                 - 6.96% simple_copy_to_iter                               ▒
                                    + __check_object_size                                  ▒
                           + 1.09% __tcp_transmit_skb                                      ▒
                           + 0.75% tcp_rcv_space_adjust                                    ▒
                           + 0.61% __tcp_cleanup_rbuf                                      ▒
               + 1.01% copy_msghdr_from_user                                               ▒
         - 16.60% __x64_sys_recvfrom                                                       ▒
            - __sys_recvfrom                                                               ▒
               - 16.43% inet6_recvmsg                                                      ▒
                  - tcp_recvmsg                                                            ▒
                     - 14.27% tcp_recvmsg_locked                                           ▒
                        - 12.67% __tcp_transmit_skb                                        ▒
                           - 12.64% __ip_queue_xmit                                        ▒
                              - 12.56% ip_finish_output2                                   ▒
                                 - 12.51% __local_bh_enable_ip                             ▒
                                    - do_softirq.part.0                                    ▒
                                       - 12.50% __softirqentry_text_start                  ▒
                                          - 12.49% net_rx_action                           ▒
                                             - 12.47% __napi_poll                          ▒
                                                - process_backlog                          ▒
                                                   - 12.41% __netif_receive_skb_one_core   ▒
                                                      - 9.75% ip_local_deliver_finish      ▒
                                                         - 9.52% ip_protocol_deliver_rcu   ▒
                                                            + 9.28% tcp_v4_rcv             ▒
                                                      - 2.02% ip_local_deliver             ▒
                                                         - 1.92% nf_hook_slow              ▒
                                                            + 1.84% nft_do_chain_ipv4      ▒
                          0.70% __sk_mem_reduce_allocated                                  ▒
                     + 2.10% release_sock                                                  ▒
         + 1.16% __x64_sys_timerfd_settime                                                 ▒
         + 0.56% ksys_write                                                                ▒
         + 0.54% __x64_sys_epoll_wait                                                      ▒
+   89.13%     0.06%  passt.avx2  [kernel.kallsyms]     [k] do_syscall_64                  ▒
+   84.23%     0.02%  passt.avx2  [kernel.kallsyms]     [k] inet6_recvmsg                  ▒
+   84.21%     0.04%  passt.avx2  [kernel.kallsyms]     [k] tcp_recvmsg                    ▒
+   81.84%     0.97%  passt.avx2  [kernel.kallsyms]     [k] tcp_recvmsg_locked             ▒
+   74.96%     0.00%  passt.avx2  [unknown]             [k] 0000000000000000               ▒
+   69.78%     0.07%  passt.avx2  libc.so.6             [.] __libc_recvmsg                 ▒
+   69.37%     0.02%  passt.avx2  [kernel.kallsyms]     [k] __sys_recvmsg                  ▒
+   69.26%     0.03%  passt.avx2  [kernel.kallsyms]     [k] ___sys_recvmsg                 ▒
+   68.11%     0.12%  passt.avx2  [kernel.kallsyms]     [k] ____sys_recvmsg                ▒
+   64.14%     0.06%  passt.avx2  [kernel.kallsyms]     [k] skb_copy_datagram_iter         ▒
+   64.08%     5.60%  passt.avx2  [kernel.kallsyms]     [k] __skb_datagram_iter            ▒
+   51.44%     2.68%  passt.avx2  [kernel.kallsyms]     [k] _copy_to_iter                  ▒
+   49.16%    49.08%  passt.avx2  [kernel.kallsyms]     [k] copy_user_generic_string       ◆
+   16.84%     0.00%  passt.avx2  [unknown]             [k] 0xffff000000000000             ▒
+   16.77%     0.02%  passt.avx2  libc.so.6             [.] __libc_recv                    ▒
+   16.60%     0.00%  passt.avx2  [kernel.kallsyms]     [k] __x64_sys_recvfrom             ▒
+   16.60%     0.07%  passt.avx2  [kernel.kallsyms]     [k] __sys_recvfrom                 ▒
+   13.81%     1.28%  passt.avx2  [kernel.kallsyms]     [k] __tcp_transmit_skb             ▒
+   13.76%     0.06%  passt.avx2  [kernel.kallsyms]     [k] __ip_queue_xmit                ▒
+   13.64%     0.10%  passt.avx2  [kernel.kallsyms]     [k] ip_finish_output2              ▒
+   13.62%     0.14%  passt.avx2  [kernel.kallsyms]     [k] __local_bh_enable_ip           ▒
+   13.57%     0.01%  passt.avx2  [kernel.kallsyms]     [k] __softirqentry_text_start      ▒
+   13.56%     0.01%  passt.avx2  [kernel.kallsyms]     [k] do_softirq.part.0              ▒
+   13.53%     0.01%  passt.avx2  [kernel.kallsyms]     [k] net_rx_action                  ▒
+   13.51%     0.00%  passt.avx2  [kernel.kallsyms]     [k] __napi_poll                    ▒
+   13.51%     0.04%  passt.avx2  [kernel.kallsyms]     [k] process_backlog                ▒
+   13.45%     0.02%  passt.avx2  [kernel.kallsyms]     [k] __netif_receive_skb_one_core   ▒
+   11.27%     0.04%  passt.avx2  [kernel.kallsyms]     [k] tcp_v4_do_rcv                  ▒
+   10.96%     0.06%  passt.avx2  [kernel.kallsyms]     [k] tcp_rcv_established            ▒
+   10.74%     0.02%  passt.avx2  [kernel.kallsyms]     [k] ip_local_deliver_finish        ▒
+   10.51%     0.04%  passt.avx2  [kernel.kallsyms]     [k] ip_protocol_deliver_rcu        ▒
+   10.26%     0.17%  passt.avx2  [kernel.kallsyms]     [k] tcp_v4_rcv                     ▒
+    8.14%     0.01%  passt.avx2  [kernel.kallsyms]     [k] __tcp_push_pending_frames      ▒
+    8.13%     0.73%  passt.avx2  [kernel.kallsyms]     [k] tcp_write_xmit                 ▒
+    6.96%     0.26%  passt.avx2  [kernel.kallsyms]     [k] simple_copy_to_iter            ▒
+    6.79%     4.72%  passt.avx2  [kernel.kallsyms]     [k] __check_object_size            ▒
+    3.33%     0.16%  passt.avx2  [kernel.kallsyms]     [k] nf_hook_slow                   ▒
+    3.09%     0.09%  passt.avx2  [nf_tables]           [k] nft_do_chain_ipv4              ▒
+    3.00%     2.28%  passt.avx2  [nf_tables]           [k] nft_do_chain                   ▒
+    2.85%     2.84%  passt.avx2  passt.avx2            [.] vu_init_elem                   ▒
+    2.22%     0.02%  passt.avx2  [kernel.kallsyms]     [k] release_sock                   ▒
+    2.15%     0.02%  passt.avx2  [kernel.kallsyms]     [k] __release_sock                 ▒
+    2.04%     0.08%  passt.avx2  [kernel.kallsyms]     [k] ip_local_deliver               ▒
+    1.80%     1.79%  passt.avx2  [kernel.kallsyms]     [k] __virt_addr_valid              ▒
+    1.57%     0.03%  passt.avx2  libc.so.6             [.] timerfd_settime                ▒
     1.53%     1.53%  passt.avx2  passt.avx2            [.] vu_queue_map_desc.isra.0       ▒
--

There's not much we can improve here (and the throughput is anyway very
close to iperf3 to iperf3 on the host's loopback, ~50 Gbps vs. ~70
Gbps): the bulk of the cycles goes to copy_user_generic_string(),
copying data from sockets into the queue, plus related bookkeeping.

The only functions in passt itself using more than 1% of cycles are
vu_init_elem() and vu_queue_map_desc(). Perhaps we could try to speed
those up... one day.
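
That 1% check can be scripted. A minimal sketch in Python, assuming the
input is the text of the collapsed rows shown above (for example as
printed by perf report --stdio). The function name, the regular
expression and the sample rows are illustrative only, not part of passt
or perf:

```python
import re

# Matches collapsed rows: "+ Children% Self% command object [k] symbol".
# Expanded call-graph rows carry a single percentage and do not match.
ROW = re.compile(
    r"^\s*[+-]?\s*(?P<children>\d+\.\d+)%\s+(?P<self>\d+\.\d+)%\s+"
    r"(?P<command>\S+)\s+(?P<dso>\S+)\s+\[(?P<kind>.)\]\s+(?P<symbol>\S+)"
)


def userspace_hotspots(report, dso="passt.avx2", threshold=1.0):
    """Return (symbol, self%) for rows in `dso` with Self above `threshold`."""
    hits = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m and m["dso"] == dso and float(m["self"]) > threshold:
            hits.append((m["symbol"], float(m["self"])))
    return hits


# A few rows copied from the listing above (trailing markers dropped).
SAMPLE = """\
+   49.16%    49.08%  passt.avx2  [kernel.kallsyms]     [k] copy_user_generic_string
+    3.00%     2.28%  passt.avx2  [nf_tables]           [k] nft_do_chain
+    2.85%     2.84%  passt.avx2  passt.avx2            [.] vu_init_elem
+    1.57%     0.03%  passt.avx2  libc.so.6             [.] timerfd_settime
     1.53%     1.53%  passt.avx2  passt.avx2            [.] vu_queue_map_desc.isra.0
"""

for symbol, self_pct in userspace_hotspots(SAMPLE):
    print(f"{self_pct:5.2f}%  {symbol}")
```

Filtering on the "Shared Object" column rather than on "Command" is
deliberate: every row here has passt.avx2 as the command, including the
kernel and libc ones.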

Full perf output (you can load it with perf report -i ...), if you're
curious, at:

  https://passt.top/static/vu_tcp_ipv6_inbound.perf

For outbound traffic (I tried with IPv4), which is much slower for some
reason (~25 Gbps):

--
Samples: 79K of event 'cycles', Event count (approx.): 73661070737
  Children      Self  Command     Shared Object         Symbol
-   91.00%     0.23%  passt.avx2  [kernel.kallsyms]     [k] entry_SYSCALL_64_after_hwframe ◆
     90.78% entry_SYSCALL_64_after_hwframe                                                 ▒
      - do_syscall_64                                                                      ▒
         - 78.75% __sys_sendmsg                                                            ▒
            - 78.58% ___sys_sendmsg                                                        ▒
               - 78.06% ____sys_sendmsg                                                    ▒
                  - sock_sendmsg                                                           ▒
                     - 77.58% tcp_sendmsg                                                  ▒
                        - 68.63% tcp_sendmsg_locked                                        ▒
                           - 26.24% sk_page_frag_refill                                    ▒
                              - skb_page_frag_refill                                       ▒
                                 - 25.87% __alloc_pages                                    ▒
                                    - 25.61% get_page_from_freelist                        ▒
                                         24.51% clear_page_rep                             ▒
                           - 23.08% _copy_from_iter                                        ▒
                                22.88% copy_user_generic_string                            ▒
                           - 8.77% tcp_write_xmit                                          ▒
                              - 8.19% __tcp_transmit_skb                                   ▒
                                 - 7.86% __ip_queue_xmit                                   ▒
                                    - 7.13% ip_finish_output2                              ▒
                                       - 6.65% __local_bh_enable_ip                        ▒
                                          - 6.60% do_softirq.part.0                        ▒
                                             - 6.51% __softirqentry_text_start             ▒
                                                - 6.40% net_rx_action                      ▒
                                                   - 5.43% __napi_poll                     ▒
                                                      + process_backlog                    ▒
                                                     0.50% napi_consume_skb                ▒
                           + 5.39% __tcp_push_pending_frames                               ▒
                           + 2.03% tcp_stream_alloc_skb                                    ▒
                           + 1.48% tcp_wmem_schedule                                       ▒
                        + 8.58% release_sock                                               ▒
         - 4.57% ksys_write                                                                ▒
            - 4.41% vfs_write                                                              ▒
               - 3.96% eventfd_write                                                       ▒
                  - 3.46% __wake_up_common                                                 ▒
                     - irqfd_wakeup                                                        ▒
                        - 3.15% kvm_arch_set_irq_inatomic                                  ▒
                           - 3.11% kvm_irq_delivery_to_apic_fast                           ▒
                              - 2.01% __apic_accept_irq                                    ▒
                                   0.93% svm_complete_interrupt_delivery                   ▒
         + 3.91% __x64_sys_epoll_wait                                                      ▒
         + 1.20% __x64_sys_getsockopt                                                      ▒
         + 0.78% syscall_trace_enter.constprop.0                                           ▒
           0.71% syscall_exit_to_user_mode                                                 ▒
         + 0.61% ksys_read                                                                 ▒
--

...there are no users of more than 1% of cycles in passt itself. The
bulk of it is sendmsg(), as expected. One notable thing is that the
kernel spends an awful amount of cycles zeroing pages so that we can
fill them. I looked into that "issue" a long time ago:

  https://github.com/netoptimizer/prototype-kernel/pull/39/commits/2c8223c30d7f280a9e456d8e690adb0869ed8c5c

...maybe I can try out a kernel with a version of that as
clear_page_rep() and see what happens. Anyway, same here, I don't
see anything we can really improve in passt.
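
To put the page zeroing in proportion, a quick back-of-the-envelope
check in Python. It uses figures copied from the call graph above and
assumes they are absolute percentages of all sampled cycles; the
constant and function names are hypothetical:

```python
# Percentages of total cycles, copied from the outbound call graph.
TCP_SENDMSG_LOCKED = 68.63
CLEAR_PAGE_REP = 24.51            # zeroing freshly allocated pages
COPY_USER_GENERIC_STRING = 22.88  # copying the payload into those pages


def share(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


zeroing = share(CLEAR_PAGE_REP, TCP_SENDMSG_LOCKED)
copying = share(COPY_USER_GENERIC_STRING, TCP_SENDMSG_LOCKED)

print(f"zeroing pages:   {zeroing:.1f}% of tcp_sendmsg_locked()")
print(f"copying payload: {copying:.1f}% of tcp_sendmsg_locked()")
```

By these figures, zeroing pages costs about as much as copying the
payload itself.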

Full output at: https://passt.top/static/vu_tcp_ipv4_outbound.perf

-- 
Stefano

