public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: Laurent Vivier <lvivier@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v7 0/8] Add vhost-user support to passt. (part 3)
Date: Fri, 11 Oct 2024 20:07:30 +0200
Message-ID: <20241011200730.63c97dc7@elisabeth>
In-Reply-To: <20241010090801.23da8bff@elisabeth>

On Thu, 10 Oct 2024 09:08:01 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> For outbound traffic (I tried with IPv4), which is much slower for some
> reason (~25 Gbps):
> 
> --
> Samples: 79K of event 'cycles', Event count (approx.): 73661070737
>   Children      Self  Command     Shared Object         Symbol
> -   91.00%     0.23%  passt.avx2  [kernel.kallsyms]     [k] entry_SYSCALL_64_after_hwframe
>      90.78% entry_SYSCALL_64_after_hwframe
>       - do_syscall_64
>          - 78.75% __sys_sendmsg
>             - 78.58% ___sys_sendmsg
>                - 78.06% ____sys_sendmsg
>                   - sock_sendmsg
>                      - 77.58% tcp_sendmsg
>                         - 68.63% tcp_sendmsg_locked
>                            - 26.24% sk_page_frag_refill
>                               - skb_page_frag_refill
>                                  - 25.87% __alloc_pages
>                                     - 25.61% get_page_from_freelist
>                                          24.51% clear_page_rep
>                            - 23.08% _copy_from_iter
>                                 22.88% copy_user_generic_string
>                            - 8.77% tcp_write_xmit
>                               - 8.19% __tcp_transmit_skb
>                                  - 7.86% __ip_queue_xmit
>                                     - 7.13% ip_finish_output2
>                                        - 6.65% __local_bh_enable_ip
>                                           - 6.60% do_softirq.part.0
>                                              - 6.51% __softirqentry_text_start
>                                                 - 6.40% net_rx_action
>                                                    - 5.43% __napi_poll
>                                                       + process_backlog
>                                                      0.50% napi_consume_skb
>                            + 5.39% __tcp_push_pending_frames
>                            + 2.03% tcp_stream_alloc_skb
>                            + 1.48% tcp_wmem_schedule
>                         + 8.58% release_sock
>          - 4.57% ksys_write
>             - 4.41% vfs_write
>                - 3.96% eventfd_write
>                   - 3.46% __wake_up_common
>                      - irqfd_wakeup
>                         - 3.15% kvm_arch_set_irq_inatomic
>                            - 3.11% kvm_irq_delivery_to_apic_fast
>                               - 2.01% __apic_accept_irq
>                                    0.93% svm_complete_interrupt_delivery
>          + 3.91% __x64_sys_epoll_wait
>          + 1.20% __x64_sys_getsockopt
>          + 0.78% syscall_trace_enter.constprop.0
>            0.71% syscall_exit_to_user_mode
>          + 0.61% ksys_read
> --
> 
> ...there are no users of more than 1% cycles in passt itself. The bulk of
> it is sendmsg() as expected, one notable thing is that the kernel spends
> an awful amount of cycles zeroing pages so that we can fill them. I looked
> into that "issue" a long time ago,
> 
>   https://github.com/netoptimizer/prototype-kernel/pull/39/commits/2c8223c30d7f280a9e456d8e690adb0869ed8c5c
> 
> ...maybe I can try out a kernel with a version of that as
> clear_page_rep() and see what happens.

...so I tried, it looks like this, but it doesn't boot for some reason:

--
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f3d257c45225..4079012ce765 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,6 +44,17 @@ void clear_page_orig(void *page);
 void clear_page_rep(void *page);
 void clear_page_erms(void *page);
 
+#define MEMSET_AVX2_ZERO(reg)					\
+	asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
+#define MEMSET_AVX2_STORE(loc, reg)				\
+	asm volatile("vmovdqa %%ymm" #reg ", %0" : "=m" (loc))
+
+#define YMM_BYTES		(256 / 8)
+#define BYTES_TO_YMM(x)		((x) / YMM_BYTES)
+extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
+extern void kernel_fpu_end(void);
+extern bool irq_fpu_usable(void);
+
 static inline void clear_page(void *page)
 {
 	/*
@@ -51,6 +62,18 @@ static inline void clear_page(void *page)
 	 * below clobbers @page, so we perform unpoisoning before it.
 	 */
 	kmsan_unpoison_memory(page, PAGE_SIZE);
+
+	if (irq_fpu_usable()) {
+		int i;
+
+		kernel_fpu_begin();
+		MEMSET_AVX2_ZERO(0);
+		for (i = 0; i < BYTES_TO_YMM(PAGE_SIZE); i++)
+			MEMSET_AVX2_STORE(((unsigned char *)page)[YMM_BYTES * i], 0);
+		kernel_fpu_end();
+		return;
+	}
+
 	alternative_call_2(clear_page_orig,
 			   clear_page_rep, X86_FEATURE_REP_GOOD,
 			   clear_page_erms, X86_FEATURE_ERMS,
--

...I'm not sure if that's something we can do at early boot, so perhaps
I should add something specific in skb_page_frag_refill() instead. But
that's for another day/week/month...
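
For illustration, the store loop from the patch above can be tried out in
userspace, where no kernel_fpu_begin()/kernel_fpu_end() bracket is needed.
This is only a sketch under assumptions: clear_page_user(),
clear_page_avx2(), alloc_page_user() and PAGE_SIZE_BYTES are hypothetical
names, the page size is assumed to be 4096 bytes, and the code relies on the
GCC/Clang extensions __builtin_cpu_supports() and
__attribute__((target("avx2"))). Unlike the patch, it checks for AVX2 at
run time and falls back to memset(). It says nothing about why the patched
kernel fails to boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

#define PAGE_SIZE_BYTES	4096		/* assumption: 4 KiB pages */
#define YMM_BYTES	(256 / 8)	/* one YMM register holds 32 bytes */

#if defined(__x86_64__)
/* Zero one page with aligned 256-bit stores (vmovdqa), as in the patch */
__attribute__((target("avx2")))
static void clear_page_avx2(void *page)
{
	__m256i zero = _mm256_setzero_si256();
	unsigned char *p = page;
	size_t i;

	for (i = 0; i < PAGE_SIZE_BYTES / YMM_BYTES; i++)
		_mm256_store_si256((__m256i *)(p + YMM_BYTES * i), zero);
}
#endif

/* Hypothetical userspace counterpart of clear_page() */
static void clear_page_user(void *page)
{
	/* Aligned stores fault on addresses not aligned to 32 bytes */
	assert((uintptr_t)page % YMM_BYTES == 0);

#if defined(__x86_64__)
	if (__builtin_cpu_supports("avx2")) {
		clear_page_avx2(page);
		return;
	}
#endif
	/* Fallback for CPUs (or architectures) without AVX2 */
	memset(page, 0, PAGE_SIZE_BYTES);
}

/* Hypothetical helper: page-aligned buffer, release with free() */
static void *alloc_page_user(void)
{
	return aligned_alloc(PAGE_SIZE_BYTES, PAGE_SIZE_BYTES);
}
```

A caller would get a buffer from alloc_page_user(), fill it, and then call
clear_page_user() on it. Timing that call against a plain memset() on the
same buffer gives a rough idea of whether the AVX2 stores are worth
pursuing at all on a given CPU.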

-- 
Stefano

