Date: Thu, 10 Oct 2024 09:08:01 +0200
From: Stefano Brivio
To: Laurent Vivier
Subject: Re: [PATCH v7 0/8] Add vhost-user support to passt.
 (part 3)
Message-ID: <20241010090801.23da8bff@elisabeth>
In-Reply-To: <20241009193726.7e2e5790@elisabeth>
References: <20241009090716.691361-1-lvivier@redhat.com> <20241009193726.7e2e5790@elisabeth>
Organization: Red Hat
CC: passt-dev@passt.top
List-Id: Development discussion and patches for passt

On Wed, 9 Oct 2024 19:37:26 +0200
Stefano Brivio wrote:

> On Wed, 9 Oct 2024 11:07:07 +0200
> Laurent Vivier wrote:
>
> > This series of patches adds vhost-user support to passt
> > and then allows passt to connect to QEMU network backend using
> > virtqueue rather than a socket.
> >
> > With QEMU, rather than using to connect:
> >
> >   -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket
> >
> > we will use:
> >
> >   -chardev socket,id=chr0,path=/tmp/passt_1.socket
> >   -netdev vhost-user,id=netdev0,chardev=chr0
> >   -device virtio-net,netdev=netdev0
> >   -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE
> >   -numa node,memdev=memfd0
> >
> > The memory backend is needed to share data between passt and QEMU.
> >
> > Performance comparison between "-netdev stream" and "-netdev vhost-user":
>
> On my setup, with a few tweaks (don't ask me why... we should figure
> out eventually):

...I had a closer look with perf(1). For inbound traffic (I checked with
IPv6), a reasonably expanded output (20 seconds of iperf3 with those
parameters):

--
Samples: 80K of event 'cycles', Event count (approx.): 81191130978
  Children       Self  Command     Shared Object      Symbol
-   89.20%     0.07%  passt.avx2  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
     89.13% entry_SYSCALL_64_after_hwframe
   - do_syscall_64
      - 69.37% __sys_recvmsg
         - 69.26% ___sys_recvmsg
            - 68.11% ____sys_recvmsg
               - 67.80% inet6_recvmsg
                  - tcp_recvmsg
                     - 67.56% tcp_recvmsg_locked
                        - 64.14% skb_copy_datagram_iter
                           - __skb_datagram_iter
                              - 51.42% _copy_to_iter
                                   48.74% copy_user_generic_string
                              - 6.96% simple_copy_to_iter
                                 + __check_object_size
                        + 1.09% __tcp_transmit_skb
                        + 0.75% tcp_rcv_space_adjust
                        + 0.61% __tcp_cleanup_rbuf
            + 1.01% copy_msghdr_from_user
      - 16.60% __x64_sys_recvfrom
         - __sys_recvfrom
            - 16.43% inet6_recvmsg
               - tcp_recvmsg
                  - 14.27% tcp_recvmsg_locked
                     - 12.67% __tcp_transmit_skb
                        - 12.64% __ip_queue_xmit
                           - 12.56% ip_finish_output2
                              - 12.51% __local_bh_enable_ip
                                 - do_softirq.part.0
                                    - 12.50% __softirqentry_text_start
                                       - 12.49% net_rx_action
                                          - 12.47% __napi_poll
                                             - process_backlog
                                                - 12.41% __netif_receive_skb_one_core
                                                   - 9.75% ip_local_deliver_finish
                                                      - 9.52% ip_protocol_deliver_rcu
                                                         + 9.28% tcp_v4_rcv
                                                   - 2.02% ip_local_deliver
                                                      - 1.92% nf_hook_slow
                                                         + 1.84% nft_do_chain_ipv4
                       0.70% __sk_mem_reduce_allocated
                  + 2.10% release_sock
      + 1.16% __x64_sys_timerfd_settime
      + 0.56% ksys_write
      + 0.54% __x64_sys_epoll_wait
+   89.13%     0.06%  passt.avx2  [kernel.kallsyms]  [k] do_syscall_64
+   84.23%     0.02%  passt.avx2  [kernel.kallsyms]  [k] inet6_recvmsg
+   84.21%     0.04%  passt.avx2  [kernel.kallsyms]  [k] tcp_recvmsg
+   81.84%     0.97%  passt.avx2  [kernel.kallsyms]  [k] tcp_recvmsg_locked
+   74.96%     0.00%  passt.avx2  [unknown]          [k] 0000000000000000
+   69.78%     0.07%  passt.avx2  libc.so.6          [.] __libc_recvmsg
+   69.37%     0.02%  passt.avx2  [kernel.kallsyms]  [k] __sys_recvmsg
+   69.26%     0.03%  passt.avx2  [kernel.kallsyms]  [k] ___sys_recvmsg
+   68.11%     0.12%  passt.avx2  [kernel.kallsyms]  [k] ____sys_recvmsg
+   64.14%     0.06%  passt.avx2  [kernel.kallsyms]  [k] skb_copy_datagram_iter
+   64.08%     5.60%  passt.avx2  [kernel.kallsyms]  [k] __skb_datagram_iter
+   51.44%     2.68%  passt.avx2  [kernel.kallsyms]  [k] _copy_to_iter
+   49.16%    49.08%  passt.avx2  [kernel.kallsyms]  [k] copy_user_generic_string
+   16.84%     0.00%  passt.avx2  [unknown]          [k] 0xffff000000000000
+   16.77%     0.02%  passt.avx2  libc.so.6          [.] __libc_recv
+   16.60%     0.00%  passt.avx2  [kernel.kallsyms]  [k] __x64_sys_recvfrom
+   16.60%     0.07%  passt.avx2  [kernel.kallsyms]  [k] __sys_recvfrom
+   13.81%     1.28%  passt.avx2  [kernel.kallsyms]  [k] __tcp_transmit_skb
+   13.76%     0.06%  passt.avx2  [kernel.kallsyms]  [k] __ip_queue_xmit
+   13.64%     0.10%  passt.avx2  [kernel.kallsyms]  [k] ip_finish_output2
+   13.62%     0.14%  passt.avx2  [kernel.kallsyms]  [k] __local_bh_enable_ip
+   13.57%     0.01%  passt.avx2  [kernel.kallsyms]  [k] __softirqentry_text_start
+   13.56%     0.01%  passt.avx2  [kernel.kallsyms]  [k] do_softirq.part.0
+   13.53%     0.01%  passt.avx2  [kernel.kallsyms]  [k] net_rx_action
+   13.51%     0.00%  passt.avx2  [kernel.kallsyms]  [k] __napi_poll
+   13.51%     0.04%  passt.avx2  [kernel.kallsyms]  [k] process_backlog
+   13.45%     0.02%  passt.avx2  [kernel.kallsyms]  [k] __netif_receive_skb_one_core
+   11.27%     0.04%  passt.avx2  [kernel.kallsyms]  [k] tcp_v4_do_rcv
+   10.96%     0.06%  passt.avx2  [kernel.kallsyms]  [k] tcp_rcv_established
+   10.74%     0.02%  passt.avx2  [kernel.kallsyms]  [k] ip_local_deliver_finish
+   10.51%     0.04%  passt.avx2  [kernel.kallsyms]  [k] ip_protocol_deliver_rcu
+   10.26%     0.17%  passt.avx2  [kernel.kallsyms]  [k] tcp_v4_rcv
+    8.14%     0.01%  passt.avx2  [kernel.kallsyms]  [k] __tcp_push_pending_frames
+    8.13%     0.73%  passt.avx2  [kernel.kallsyms]  [k] tcp_write_xmit
+    6.96%     0.26%  passt.avx2  [kernel.kallsyms]  [k] simple_copy_to_iter
+    6.79%     4.72%  passt.avx2  [kernel.kallsyms]  [k] __check_object_size
+    3.33%     0.16%  passt.avx2  [kernel.kallsyms]  [k] nf_hook_slow
+    3.09%     0.09%  passt.avx2  [nf_tables]        [k] nft_do_chain_ipv4
+    3.00%     2.28%  passt.avx2  [nf_tables]        [k] nft_do_chain
+    2.85%     2.84%  passt.avx2  passt.avx2         [.] vu_init_elem
+    2.22%     0.02%  passt.avx2  [kernel.kallsyms]  [k] release_sock
+    2.15%     0.02%  passt.avx2  [kernel.kallsyms]  [k] __release_sock
+    2.04%     0.08%  passt.avx2  [kernel.kallsyms]  [k] ip_local_deliver
+    1.80%     1.79%  passt.avx2  [kernel.kallsyms]  [k] __virt_addr_valid
+    1.57%     0.03%  passt.avx2  libc.so.6          [.] timerfd_settime
     1.53%     1.53%  passt.avx2  passt.avx2         [.] vu_queue_map_desc.isra.0
--

There's not much we can improve (and the throughput is anyway very close
to iperf3 to iperf3 on host's loopback, ~50 Gbps vs. ~70 Gbps): the bulk
of it is copy_user_generic_string() reading from sockets into the queue
and related bookkeeping.

The only users of more than 1% of cycles in passt itself are
vu_init_elem() and vu_queue_map_desc(); perhaps we could try to speed
those up... one day.

Full perf output (you can load it with perf report -i ...), if you're
curious, at:

  https://passt.top/static/vu_tcp_ipv6_inbound.perf

For outbound traffic (I tried with IPv4), which is much slower for some
reason (~25 Gbps):

--
Samples: 79K of event 'cycles', Event count (approx.): 73661070737
  Children       Self  Command     Shared Object      Symbol
-   91.00%     0.23%  passt.avx2  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
     90.78% entry_SYSCALL_64_after_hwframe
   - do_syscall_64
      - 78.75% __sys_sendmsg
         - 78.58% ___sys_sendmsg
            - 78.06% ____sys_sendmsg
               - sock_sendmsg
                  - 77.58% tcp_sendmsg
                     - 68.63% tcp_sendmsg_locked
                        - 26.24% sk_page_frag_refill
                           - skb_page_frag_refill
                              - 25.87% __alloc_pages
                                 - 25.61% get_page_from_freelist
                                      24.51% clear_page_rep
                        - 23.08% _copy_from_iter
                             22.88% copy_user_generic_string
                        - 8.77% tcp_write_xmit
                           - 8.19% __tcp_transmit_skb
                              - 7.86% __ip_queue_xmit
                                 - 7.13% ip_finish_output2
                                    - 6.65% __local_bh_enable_ip
                                       - 6.60% do_softirq.part.0
                                          - 6.51% __softirqentry_text_start
                                             - 6.40% net_rx_action
                                                - 5.43% __napi_poll
                                                   + process_backlog
                                                  0.50% napi_consume_skb
                        + 5.39% __tcp_push_pending_frames
                        + 2.03% tcp_stream_alloc_skb
                        + 1.48% tcp_wmem_schedule
                     + 8.58% release_sock
      - 4.57% ksys_write
         - 4.41% vfs_write
            - 3.96% eventfd_write
               - 3.46% __wake_up_common
                  - irqfd_wakeup
                     - 3.15% kvm_arch_set_irq_inatomic
                        - 3.11% kvm_irq_delivery_to_apic_fast
                           - 2.01% __apic_accept_irq
                                0.93% svm_complete_interrupt_delivery
      + 3.91% __x64_sys_epoll_wait
      + 1.20% __x64_sys_getsockopt
      + 0.78% syscall_trace_enter.constprop.0
        0.71% syscall_exit_to_user_mode
      + 0.61% ksys_read
--

...there are no users of more than 1% of cycles in passt itself. The bulk
of it is sendmsg(), as expected. One notable thing is that the kernel
spends an awful amount of cycles (24.51% in clear_page_rep()) zeroing
pages so that we can fill them. I looked into that "issue" a long time
ago:

  https://github.com/netoptimizer/prototype-kernel/pull/39/commits/2c8223c30d7f280a9e456d8e690adb0869ed8c5c

...maybe I can try out a kernel with a version of that as
clear_page_rep() and see what happens.

Anyway, same here, I don't see anything we can really improve in passt.

Full output at:

  https://passt.top/static/vu_tcp_ipv4_outbound.perf

-- 
Stefano