Date: Fri, 11 Oct 2024 20:07:30 +0200
From: Stefano Brivio
To: Laurent Vivier
CC: passt-dev@passt.top
Subject: Re: [PATCH v7 0/8] Add vhost-user support to passt. (part 3)
Message-ID: <20241011200730.63c97dc7@elisabeth>
In-Reply-To: <20241010090801.23da8bff@elisabeth>
References: <20241009090716.691361-1-lvivier@redhat.com> <20241009193726.7e2e5790@elisabeth> <20241010090801.23da8bff@elisabeth>
Organization: Red Hat

On Thu, 10 Oct 2024 09:08:01 +0200
Stefano Brivio wrote:

> For outbound traffic (I tried with IPv4), which is much slower for some
> reason (~25 Gbps):
>
> --
> Samples: 79K of event 'cycles', Event count (approx.): 73661070737
>   Children      Self  Command     Shared Object      Symbol
> -   91.00%     0.23%  passt.avx2  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>      90.78% entry_SYSCALL_64_after_hwframe
>       - do_syscall_64
>          - 78.75% __sys_sendmsg
>             - 78.58% ___sys_sendmsg
>                - 78.06% ____sys_sendmsg
>                   - sock_sendmsg
>                      - 77.58% tcp_sendmsg
>                         - 68.63% tcp_sendmsg_locked
>                            - 26.24% sk_page_frag_refill
>                               - skb_page_frag_refill
>                                  - 25.87% __alloc_pages
>                                     - 25.61% get_page_from_freelist
>                                          24.51% clear_page_rep
>                            - 23.08% _copy_from_iter
>                                 22.88% copy_user_generic_string
>                            - 8.77% tcp_write_xmit
>                               - 8.19% __tcp_transmit_skb
>                                  - 7.86% __ip_queue_xmit
>                                     - 7.13% ip_finish_output2
>                                        - 6.65% __local_bh_enable_ip
>                                           - 6.60% do_softirq.part.0
>                                              - 6.51% __softirqentry_text_start
>                                                 - 6.40% net_rx_action
>                                                    - 5.43% __napi_poll
>                                                       + process_backlog
>                                                      0.50% napi_consume_skb
>                            + 5.39% __tcp_push_pending_frames
>                            + 2.03% tcp_stream_alloc_skb
>                            + 1.48% tcp_wmem_schedule
>                         + 8.58% release_sock
>          - 4.57% ksys_write
>             - 4.41% vfs_write
>                - 3.96% eventfd_write
>                   - 3.46% __wake_up_common
>                      - irqfd_wakeup
>                         - 3.15% kvm_arch_set_irq_inatomic
>                            - 3.11% kvm_irq_delivery_to_apic_fast
>                               - 2.01% __apic_accept_irq
>                                    0.93% svm_complete_interrupt_delivery
>          + 3.91% __x64_sys_epoll_wait
>          + 1.20% __x64_sys_getsockopt
>          + 0.78% syscall_trace_enter.constprop.0
>            0.71% syscall_exit_to_user_mode
>          + 0.61% ksys_read
> --
>
> ...there are no users of more than 1% cycles in passt itself. The bulk of
> it is sendmsg() as expected, one notable thing is that the kernel spends
> an awful amount of cycles zeroing pages so that we can fill them. I looked
> into that "issue" a long time ago,
>
>   https://github.com/netoptimizer/prototype-kernel/pull/39/commits/2c8223c30d7f280a9e456d8e690adb0869ed8c5c
>
> ...maybe I can try out a kernel with a version of that as
> clear_page_rep() and see what happens.

...so I tried, it looks like this, but it doesn't boot for some reason:

--
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f3d257c45225..4079012ce765 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,6 +44,17 @@ void clear_page_orig(void *page);
 void clear_page_rep(void *page);
 void clear_page_erms(void *page);
 
+#define MEMSET_AVX2_ZERO(reg)					\
+	asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
+#define MEMSET_AVX2_STORE(loc, reg)				\
+	asm volatile("vmovdqa %%ymm" #reg ", %0" : "=m" (loc))
+
+#define YMM_BYTES		(256 / 8)
+#define BYTES_TO_YMM(x)		((x) / YMM_BYTES)
+extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
+extern void kernel_fpu_end(void);
+extern bool irq_fpu_usable(void);
+
 static inline void clear_page(void *page)
 {
 	/*
@@ -51,6 +62,18 @@ static inline void clear_page(void *page)
 	 * below clobbers @page, so we perform unpoisoning before it.
 	 */
 	kmsan_unpoison_memory(page, PAGE_SIZE);
+
+	if (irq_fpu_usable()) {
+		int i;
+
+		kernel_fpu_begin();
+		MEMSET_AVX2_ZERO(0);
+		for (i = 0; i < BYTES_TO_YMM(PAGE_SIZE); i++)
+			MEMSET_AVX2_STORE(((unsigned char *)page)[YMM_BYTES * i], 0);
+		kernel_fpu_end();
+		return;
+	}
+
 	alternative_call_2(clear_page_orig,
 			   clear_page_rep, X86_FEATURE_REP_GOOD,
 			   clear_page_erms, X86_FEATURE_ERMS,
--

...I'm not sure if that's something we can do at early boot, so perhaps
I should add something specific in skb_page_frag_refill() instead.

But that's for another day/week/month...

-- 
Stefano
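
For reference, the zeroing loop from the patch above can be tried in user
space, without touching the kernel. What follows is a minimal sketch, not the
patch itself: it assumes 4 KiB pages, a 32-byte-aligned buffer, and GCC or
Clang on x86-64. The names clear_page_sketch(), clear_page_avx2() and
have_avx2() are made up for illustration. It uses compiler intrinsics in place
of the inline asm, and falls back to memset() where AVX2 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_BYTES	4096		/* assumption: 4 KiB pages */
#define YMM_BYTES	(256 / 8)	/* one YMM register holds 32 bytes */

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Zero one page with aligned 256-bit stores. The caller must have checked
 * that the CPU supports AVX2 and that page is 32-byte aligned.
 */
__attribute__((target("avx2")))
static void clear_page_avx2(void *page)
{
	__m256i zero = _mm256_setzero_si256();
	unsigned char *p = page;
	size_t i;

	for (i = 0; i < PAGE_SIZE_BYTES / YMM_BYTES; i++)
		_mm256_store_si256((__m256i *)(p + i * YMM_BYTES), zero);
}

/* Runtime CPU feature check (GCC/Clang builtin) */
static int have_avx2(void)
{
	return __builtin_cpu_supports("avx2");
}
#else
/* Other architectures or compilers: no AVX2 path, plain memset() only */
static void clear_page_avx2(void *page)
{
	memset(page, 0, PAGE_SIZE_BYTES);
}

static int have_avx2(void)
{
	return 0;
}
#endif

/* Hypothetical helper: clear one page, using AVX2 stores when available */
static void clear_page_sketch(void *page)
{
	/* vmovdqa-style aligned stores fault on unaligned addresses */
	assert((uintptr_t)page % YMM_BYTES == 0);

	if (have_avx2())
		clear_page_avx2(page);
	else
		memset(page, 0, PAGE_SIZE_BYTES);
}
```

A buffer from aligned_alloc(32, PAGE_SIZE_BYTES) satisfies the alignment
requirement. Unlike the kernel version, user space needs no
kernel_fpu_begin()/kernel_fpu_end() pair around the vector stores, so this
only exercises the store loop and says nothing about the boot failure.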