From: Eugenio Perez Martin
Date: Mon, 9 Jun 2025 11:59:21 +0200
Subject: Re: vhost-kernel net on pasta: from 26 to 37Gbit/s
To: Stefano Brivio
CC: passt-dev@passt.top, Jason Wang, Jeff Nelson

On Fri, Jun 6, 2025 at 6:37 PM Stefano Brivio wrote:
>
> On Fri, 6 Jun 2025 16:32:38 +0200
> Eugenio Perez Martin wrote:
>
> > On Wed, May 21, 2025 at 12:35 PM Eugenio Perez Martin
> > wrote:
> > >
> > > On Wed, May 21, 2025 at 12:09 PM Stefano Brivio wrote:
> > > >
> > > > On Tue, 20 May 2025 17:09:44 +0200
> > > > Eugenio Perez Martin wrote:
> > > >
> > > > > [...]
> > > > >
> > > > > Now if I isolate the vhost kernel thread [1] I get way more
> > > > > performance as expected:
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > > [  5]   0.00-10.00  sec  43.1 GBytes  37.1 Gbits/sec    0             sender
> > > > > [  5]   0.00-10.04  sec  43.1 GBytes  36.9 Gbits/sec                  receiver
> > > > >
> > > > > After analyzing perf output, rep_movs_alternative is the most called
> > > > > function in the three iperf3 (~20%Self), passt.avx2 (~15%Self) and
> > > > > vhost (~15%Self)
> > > >
> > > > Interesting... s/most called function/function using the most cycles/, I
> > > > suppose.
> > > >
> > >
> > > Right!
> > >
> > > > So it looks somewhat similar to
> > > >
> > > >   https://archives.passt.top/passt-dev/20241017021027.2ac9ea53@elisabeth/
> > > >
> > > > now?
> > > >
> > >
> > > Kind of. Below tcp_sendmsg_locked I don't see sk_page_frag_refill but
> > > skb_do_copy_data_nocache. Not sure if that means something, as it
> > > should not be affected by vhost.
> > >
> > > > > But I don't see any of them consuming 100% of CPU in
> > > > > top: pasta consumes ~85% %CPU, both iperf3 client and server consumes
> > > > > 60%, and vhost consumes ~53%.
> > > > >
> > > > > So... I have mixed feelings about this :). By "default" it seems to
> > > > > have less performance, but my test is maybe too synthetic.
> > > >
> > > > Well, surely we can't ask Podman users to pin specific stuff to given
> > > > CPU threads. :)
> > > >
> > >
> > > Yes but maybe the result changes under the right schedule? I'm
> > > isolating the CPUs entirely, which is not the usual case for pasta for
> > > sure :).
> > >
> > > > > There is room for improvement with the mentioned optimizations so I'd
> > > > > continue applying them, continuing with UDP and TCP zerocopy, and
> > > > > developing zerocopy vhost rx.
> > > >
> > > > That definitely makes sense to me.
> > > >
> > >
> > > Good!
> > >
> > > > > With these numbers I think the series should not be
> > > > > merged at the moment. I could send it as RFC if you want but I've not
> > > > > applied the comments the first one received, POC style :).
> > > >
> > > > I don't think it's really needed for you to spend time on
> > > > semi-polishing something just to have an RFC if you're still working on
> > > > it. I guess the implementation will change substantially anyway once
> > > > you factor in further optimisations.
> > > >
> > >
> > > Agree! I'll keep iterating on this then.
> > >
> >
> > Actually, if I remove all the taskset etc, and trust the kernel
> > scheduler, vanilla pasta gives me:
> > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > Connecting to host 10.6.68.254, port 5201
> > [  5] local 10.6.68.20 port 40408 connected to 10.6.68.254 port 5201
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> > [  5]   1.00-2.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> > [  5]   2.00-3.00   sec  3.12 GBytes  26.8 Gbits/sec    0   25.4 MBytes
> > [  5]   3.00-4.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> > [  5]   4.00-5.00   sec  3.10 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> > [  5]   5.00-6.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> > [  5]   6.00-7.00   sec  3.11 GBytes  26.7 Gbits/sec    0   25.4 MBytes
> > [  5]   7.00-8.00   sec  3.09 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> > [  5]   8.00-9.00   sec  3.08 GBytes  26.5 Gbits/sec    0   25.4 MBytes
> > [  5]   9.00-10.00  sec  3.10 GBytes  26.6 Gbits/sec    0   25.4 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  31.0 GBytes  26.7 Gbits/sec    0             sender
> > [  5]   0.00-10.04  sec  31.0 GBytes  26.5 Gbits/sec                  receiver
> >
> > And with vhost-net:
> > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > ...
> > Connecting to host 10.6.68.254, port 5201
> > [  5] local 10.6.68.20 port 46720 connected to 10.6.68.254 port 5201
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  4.17 GBytes  35.8 Gbits/sec    0   11.9 MBytes
> > [  5]   1.00-2.00   sec  4.17 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> > [  5]   2.00-3.00   sec  4.16 GBytes  35.7 Gbits/sec    0   11.9 MBytes
> > [  5]   3.00-4.00   sec  4.14 GBytes  35.6 Gbits/sec    0   11.9 MBytes
> > [  5]   4.00-5.00   sec  4.16 GBytes  35.7 Gbits/sec    0   11.9 MBytes
> > [  5]   5.00-6.00   sec  4.16 GBytes  35.8 Gbits/sec    0   11.9 MBytes
> > [  5]   6.00-7.00   sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> > [  5]   7.00-8.00   sec  4.19 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> > [  5]   8.00-9.00   sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> > [  5]   9.00-10.00  sec  4.18 GBytes  35.9 Gbits/sec    0   11.9 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  41.7 GBytes  35.8 Gbits/sec    0             sender
> > [  5]   0.00-10.04  sec  41.7 GBytes  35.7 Gbits/sec                  receiver
> >
> > If I go the extra mile and disable notifications (it might be just
> > noise, but...)
> > [pasta@virtlab716 ~]$ /home/passt/pasta --config-net iperf3 -c 10.6.68.254 -w 8M
> > ...
> > Connecting to host 10.6.68.254, port 5201
> > [  5] local 10.6.68.20 port 56590 connected to 10.6.68.254 port 5201
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  4.19 GBytes  36.0 Gbits/sec    0   12.4 MBytes
> > [  5]   1.00-2.00   sec  4.18 GBytes  35.9 Gbits/sec    0   12.4 MBytes
> > [  5]   2.00-3.00   sec  4.18 GBytes  35.9 Gbits/sec    0   12.4 MBytes
> > [  5]   3.00-4.00   sec  4.20 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> > [  5]   4.00-5.00   sec  4.21 GBytes  36.2 Gbits/sec    0   12.4 MBytes
> > [  5]   5.00-6.00   sec  4.21 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> > [  5]   6.00-7.00   sec  4.20 GBytes  36.1 Gbits/sec    0   12.4 MBytes
> > [  5]   7.00-8.00   sec  4.23 GBytes  36.4 Gbits/sec    0   12.4 MBytes
> > [  5]   8.00-9.00   sec  4.24 GBytes  36.4 Gbits/sec    0   12.4 MBytes
> > [  5]   9.00-10.00  sec  4.21 GBytes  36.2 Gbits/sec    0   12.4 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  42.1 GBytes  36.1 Gbits/sec    0             sender
> > [  5]   0.00-10.04  sec  42.1 GBytes  36.0 Gbits/sec                  receiver
> >
> > So I guess the best is to actually run performance tests closer to
> > real-world workload against the new version and see if it works
> > better?
>
> Well, that's certainly a possibility.
>
> I'd say the biggest value for vhost-net usage in pasta is reaching
> throughput figures that are comparable with veth, with or without
> multithreading (keeping an eye on bytes per cycle, of course), with or
> without kernel changes, so that users won't need to choose between
> rootless and performance anymore.
>
> It would also simplify things in Podman quite a lot (and to some extent
> in rootlesskit / Docker as well). We're pretty much there with virtual
> machines, just not quite with containers (which is somewhat ironic, but
> of course there's a good reason for that).
>
> If we're clearly wasting cycles in vhost-net (because of the bounce
> buffer, plus something else perhaps?) *and* there's a somewhat possible
> solution for that in sight *and* the interface would change anyway,
> running throughput tests and polishing up the current version with a
> half-baked solution at the moment sounds a bit wasteful to me.
>

My point is that I'm testing a very synthetic scenario. If everybody
agrees this is close enough to real-world ones, I'm OK with continuing
to improve the edges we see. If not, maybe we're picking the wrong
fruit, even if it is low-hanging?

Getting a table like [1] would shed some light on this, especially if
it is just a matter of running "make performance" or similar. Maybe we
need to include longer queues? Focus on a given scenario? Maybe UDP
does better, but what about TCP?

Some more points about this scenario:

1) I don't see 100% CPU usage in any element:

   CPU%
   84.2  passt.avx2
   57.9  iperf3
   57.2  iperf3
   50.7  vhost-1805109

2) The most used function (by Self%) in vhost is rep_movs_alternative,
   called from skb_copy_datagram_iter, so yes, zerocopy (ZC) should help
   a lot here.

Now, is "iperf3 -w 8M" representative? I'm sure ZC helps in this
scenario, but does it make things worse with small packets? Do we care?

I'm totally OK with continuing to try ZC, I just want to make sure
we're not missing anything :).

Thanks!

[1] https://passt.top/passt/about/#performance_1

> But if one of those assumptions doesn't hold, or if you feel the need to
> consolidate the current status, perhaps polishing up the current
> version right now and actually evaluating throughput (as well as
> overhead) makes sense to me, yes.
>
> --
> Stefano
>