Date: Fri, 6 Jun 2025 18:37:02 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: Eugenio Perez Martin
Cc: passt-dev@passt.top, Jason Wang, Jeff Nelson
Subject: Re: vhost-kernel net on pasta: from 26 to 37Gbit/s
Message-ID: <20250606183702.0ff9a3c7@elisabeth>
References: <20250521120855.5cdaeb04@elisabeth>

On Fri, 6 Jun 2025 16:32:38 +0200
Eugenio Perez Martin wrote:

> On Wed, May 21, 2025 at 12:35 PM Eugenio Perez Martin wrote:
> >
> > On Wed, May 21, 2025 at 12:09 PM Stefano Brivio wrote:
> > >
> > > On Tue, 20 May 2025 17:09:44 +0200
> > > Eugenio Perez Martin wrote:
> > >
> > > > [...]
> > > >
> > > > Now if I isolate the vhost kernel thread [1] I get way more
> > > > performance as expected:
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.00  sec  43.1 GBytes  37.1 Gbits/sec    0   sender
> > > > [  5]   0.00-10.04  sec  43.1 GBytes  36.9 Gbits/sec        receiver
> > > >
> > > > After analyzing perf output, rep_movs_alternative is the most called
> > > > function in the three: iperf3 (~20% Self), passt.avx2 (~15% Self)
> > > > and vhost (~15% Self)
> > >
> > > Interesting... s/most called function/function using the most
> > > cycles/, I suppose.
> >
> > Right!
> >
> > > So it looks somewhat similar to
> > >
> > > https://archives.passt.top/passt-dev/20241017021027.2ac9ea53@elisabeth/
> > >
> > > now?
> >
> > Kind of. Below tcp_sendmsg_locked I don't see sk_page_frag_refill but
> > skb_do_copy_data_nocache. Not sure if that means something, as it
> > should not be affected by vhost.
> >
> > > > But I don't see any of them consuming 100% of CPU in
> > > > top: pasta consumes ~85% %CPU, both iperf3 client and server
> > > > consume 60%, and vhost consumes ~53%.
> > > >
> > > > So... I have mixed feelings about this :). By "default" it seems to
> > > > have less performance, but my test is maybe too synthetic.
> > >
> > > Well, surely we can't ask Podman users to pin specific stuff to given
> > > CPU threads. :)
> >
> > Yes but maybe the result changes under the right schedule? I'm
> > isolating the CPUs entirely, which is not the usual case for pasta for
> > sure :).
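> >
> > (For the record, the perf figures I quoted above come from plain
> > sampling of each process, roughly along these lines -- the exact
> > flags are from memory, so take this as a sketch:
> >
> >   perf record -g -p "$(pidof passt.avx2)" -- sleep 10
> >   perf report --no-children
> >
> > and likewise for the iperf3 and vhost thread PIDs; the "Self"
> > percentages are what the report output shows.)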
> >
> > > > There is room for improvement with the mentioned optimizations so
> > > > I'd continue applying them, continuing with UDP and TCP zerocopy,
> > > > and developing zerocopy vhost rx.
> > >
> > > That definitely makes sense to me.
> >
> > Good!
> >
> > > > With these numbers I think the series should not be
> > > > merged at the moment. I could send it as RFC if you want but I've
> > > > not applied the comments the first one received, POC style :).
> > >
> > > I don't think it's really needed for you to spend time on
> > > semi-polishing something just to have an RFC if you're still working
> > > on it. I guess the implementation will change substantially anyway
> > > once you factor in further optimisations.
> >
> > Agree! I'll keep iterating on this then.
>
> Actually, if I remove all the taskset etc, and trust the kernel
> scheduler, vanilla pasta gives me:
> [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> Connecting to host 10.6.68.254, port 5201
> [  5] local 10.6.68.20 port 40408 connected to 10.6.68.254 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> [  5]   1.00-2.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> [  5]   2.00-3.00   sec  3.12 GBytes  26.8 Gbits/sec    0   25.4 MBytes
> [  5]   3.00-4.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> [  5]   4.00-5.00   sec  3.10 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> [  5]   5.00-6.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> [  5]   6.00-7.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> [  5]   7.00-8.00   sec  3.09 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> [  5]   8.00-9.00   sec  3.08 GBytes  26.5 Gbits/sec    0   25.4 MBytes
> [  5]   9.00-10.00  sec  3.10 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  31.0 GBytes  26.7 Gbits/sec    0   sender
> [  5]   0.00-10.04  sec  31.0 GBytes  26.5 Gbits/sec        receiver
>
> And with vhost-net:
> [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> ...
> Connecting to host 10.6.68.254, port 5201
> [  5] local 10.6.68.20 port 46720 connected to 10.6.68.254 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.17 GBytes  35.8 Gbits/sec    0   11.9 MBytes
> [  5]   1.00-2.00   sec  4.17 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> [  5]   2.00-3.00   sec  4.16 GBytes  35.7 Gbits/sec    0   11.9 MBytes
> [  5]   3.00-4.00   sec  4.14 GBytes  35.6 Gbits/sec    0   11.9 MBytes
> [  5]   4.00-5.00   sec  4.16 GBytes  35.7 Gbits/sec    0   11.9 MBytes
> [  5]   5.00-6.00   sec  4.16 GBytes  35.8 Gbits/sec    0   11.9 MBytes
> [  5]   6.00-7.00   sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> [  5]   7.00-8.00   sec  4.19 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> [  5]   8.00-9.00   sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> [  5]   9.00-10.00  sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  41.7 GBytes  35.8 Gbits/sec    0   sender
> [  5]   0.00-10.04  sec  41.7 GBytes  35.7 Gbits/sec        receiver
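>
> (For completeness, the pinning I removed was along these lines -- the
> CPU numbers here are illustrative, not necessarily the ones I used:
>
>   # pin pasta, and the iperf3 client it spawns, to one core
>   taskset -c 2 /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
>   # move the vhost-<pid> kernel thread to a core of its own
>   taskset -pc 3 "$(pgrep '^vhost-' | head -n1)"
>
> plus isolcpus= on the kernel command line so the scheduler leaves
> those cores alone.)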
> If I go the extra mile and disable notifications (it might be just
> noise, but...)
> [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> ...
> Connecting to host 10.6.68.254, port 5201
> [  5] local 10.6.68.20 port 56590 connected to 10.6.68.254 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.19 GBytes  36.0 Gbits/sec    0   12.4 MBytes
> [  5]   1.00-2.00   sec  4.18 GBytes  35.9 Gbits/sec    0   12.4 MBytes
> [  5]   2.00-3.00   sec  4.18 GBytes  35.9 Gbits/sec    0   12.4 MBytes
> [  5]   3.00-4.00   sec  4.20 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> [  5]   4.00-5.00   sec  4.21 GBytes  36.2 Gbits/sec    0   12.4 MBytes
> [  5]   5.00-6.00   sec  4.21 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> [  5]   6.00-7.00   sec  4.20 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> [  5]   7.00-8.00   sec  4.23 GBytes  36.4 Gbits/sec    0   12.4 MBytes
> [  5]   8.00-9.00   sec  4.24 GBytes  36.4 Gbits/sec    0   12.4 MBytes
> [  5]   9.00-10.00  sec  4.21 GBytes  36.2 Gbits/sec    0   12.4 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  42.1 GBytes  36.1 Gbits/sec    0   sender
> [  5]   0.00-10.04  sec  42.1 GBytes  36.0 Gbits/sec        receiver
>
> So I guess the best is to actually run performance tests closer to
> real-world workload against the new version and see if it works
> better?

Well, that's certainly a possibility. I'd say the biggest value for
vhost-net usage in pasta is reaching throughput figures that are
comparable with veth, with or without multithreading (keeping an eye on
bytes per cycle, of course), with or without kernel changes, so that
users won't need to choose between rootless and performance anymore.

It would also simplify things in Podman quite a lot (and to some extent
in rootlesskit / Docker as well). We're pretty much there with virtual
machines, just not quite with containers (which is somewhat ironic, but
of course there's a good reason for that).

If we're clearly wasting cycles in vhost-net (because of the bounce
buffer, plus something else perhaps?) *and* there's a somewhat possible
solution for that in sight *and* the interface would change anyway,
running throughput tests and polishing up the current version with a
half-baked solution at the moment sounds a bit wasteful to me.

But if one of those assumptions doesn't hold, or if you feel the need
to consolidate the current status, perhaps polishing up the current
version right now and actually evaluating throughput (as well as
overhead) makes sense to me, yes.
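(For reference, by "comparable with veth" I mean the usual baseline of
a veth pair between two namespaces, something like the sketch below --
names and addresses are illustrative, and it needs CAP_NET_ADMIN:

  ip netns add ns0; ip netns add ns1
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns ns0; ip link set veth1 netns ns1
  ip -n ns0 addr add 10.0.0.1/24 dev veth0; ip -n ns0 link set veth0 up
  ip -n ns1 addr add 10.0.0.2/24 dev veth1; ip -n ns1 link set veth1 up
  ip netns exec ns1 iperf3 -s -D
  ip netns exec ns0 iperf3 -c 10.0.0.2 -w 8M

with the same iperf3 options as your runs above.)

-- 
Stefano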