Date: Wed, 21 Dec 2022 17:00:24 +1100
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH 4/8] udp: Receive multiple datagrams at once on the pasta sock->tap path
In-Reply-To: <20221220114246.737b0c3e@elisabeth>

On Tue, Dec 20, 2022 at 11:42:46AM +0100, Stefano Brivio wrote:
> Sorry for the further delay,
> 
> On Wed, 14 Dec 2022 11:35:46 +0100 Stefano Brivio wrote:
> 
> > On Wed, 14 Dec 2022 12:42:14 +1100 David Gibson wrote:
> > 
> > > On Tue, Dec 13, 2022 at 11:48:47PM +0100, Stefano Brivio wrote:
> > > > Sorry for the long delay here,
> > > > 
> > > > On Mon, 5 Dec 2022 19:14:21 +1100 David Gibson wrote:
> > > > 
> > > > > Usually udp_sock_handler() will receive multiple (up to 32)
> > > > > datagrams at once, then forward them all to the tap interface.
> > > > > For unclear reasons, though, when in pasta mode we will only
> > > > > receive and forward a single datagram at a time. Change it to
> > > > > receive multiple datagrams at once, like the other paths.
> > > > 
> > > > This is explained in the commit message of 6c931118643c ("tcp, udp:
> > > > Receive batching doesn't pay off when writing single frames to tap").
> > > > 
> > > > I think it's worth re-checking the throughput now, as this path is a
> > > > bit different, but unfortunately I didn't include it in the "perf"
> > > > tests :( because at the time I introduced those I wasn't sure it
> > > > even made sense to have traffic from the same host directed to the
> > > > tap device.
> > > > 
> > > > The iperf3 runs where I observed this are actually the ones from the
> > > > Podman demo.
> > > > Ideally, that case should also be checked in the perf/pasta_udp
> > > > tests.
> > > 
> > > Hm, ok.
> > > 
> > > > How fundamental is this for the rest of the series? I couldn't find
> > > > any actual dependency on it, but I might be missing something.
> > > 
> > > So the issue is that prior to this change, in pasta we receive
> > > multiple frames at once on the splice path, but one frame at a time
> > > on the tap path. By the end of this series we can't do that any more,
> > > because we don't know before the recvmmsg() which one we'll be doing.
> > 
> > Oh, right, I see. Then let me add this path to the perf/pasta_udp test
> > and check how relevant this is now, I'll get back to you in a bit.
> 
> I was checking the wrong path. With this:
> 
> diff --git a/test/perf/pasta_udp b/test/perf/pasta_udp
> index 27ea724..973c2f4 100644
> --- a/test/perf/pasta_udp
> +++ b/test/perf/pasta_udp
> @@ -31,6 +31,14 @@ report pasta lo_udp 1 __FREQ__
>  
>  th MTU 1500B 4000B 16384B 65535B
>  
> +tr UDP throughput over IPv6: host to ns
> +nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
> +nsout ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.scope == "global" and .prefixlen == 64).local'
> +bw -
> +bw -
> +bw -
> +iperf3 BW host ns __ADDR6__ 100${i}2 __THREADS__ __TIME__ __OPTS__ -b 15G
> +bw __BW__ 7.0 9.0
>  
>  tr UDP throughput over IPv6: ns to host
>  ns ip link set dev lo mtu 1500
> diff --git a/test/run b/test/run
> index e07513f..b53182b 100755
> --- a/test/run
> +++ b/test/run
> @@ -67,6 +67,14 @@ run() {
>  	test build/clang_tidy
>  	teardown build
>  
> +	VALGRIND=0
> +	setup passt_in_ns
> +	test passt/ndp
> +	test passt/dhcp
> +	test perf/pasta_udp
> +	test passt_in_ns/shutdown
> +	teardown passt_in_ns
> +
>  	setup pasta
>  	test pasta/ndp
>  	test pasta/dhcp

Ah, ok. Can we add that to the standard set of tests ASAP, please?

> I get 21.6 Gbps after this series, and 29.7 Gbps before -- it's quite
> significant.

Drat.

> And there's nothing strange in perf's output, really: the distribution
> of overhead per function is pretty much the same, but writing multiple
> messages to the tap device just takes more cycles per message than
> writing a single message.

That's so weird. It should be basically an identical set of write()s,
except that they happen in a batch, rather than a bit spread out. I
guess it has to be some kind of cache locality thing. I wonder if the
difference would go away or reverse if we had a way to submit multiple
frames with a single syscall.

> I'm a bit ashamed to propose this, but what do you think of something
> like:
> 
> 	if (c->mode == MODE_PASTA) {
> 		if (recvmmsg(ref.r.s, mmh_recv, 1, 0, NULL) <= 0)
> 			return;
> 
> 		if (udp_mmh_splice_port(v6, mmh_recv)) {
> 			n = recvmmsg(ref.r.s, mmh_recv + 1,
> 				     UDP_MAX_FRAMES - 1, 0, NULL);
> 		}
> 
> 		if (n > 0)
> 			n++;
> 		else
> 			n = 1;
> 	} else {
> 		n = recvmmsg(ref.r.s, mmh_recv, UDP_MAX_FRAMES, 0, NULL);
> 		if (n <= 0)
> 			return;
> 	}
> 
> ? Other than the inherent ugliness, it looks like a good approximation
> to me.

Hmm. Well, the first question is how much impact going one message at
a time has on the spliced throughput. If it's not too bad, then we
could just always go one at a time for pasta, regardless of splicing.
And we could even abstract that difference into the tap backend with a
callback like tap_batch_size(c).
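Roughly what I have in mind -- just an untested sketch, assuming the
existing struct ctx, MODE_PASTA and UDP_MAX_FRAMES from udp.c:

	/**
	 * tap_batch_size() - Number of datagrams to receive in one pass
	 * @c:	Execution context
	 *
	 * Return: 1 for pasta, where batching single-frame writes to the
	 *         tap device doesn't pay off, UDP_MAX_FRAMES for passt,
	 *         where batching does help
	 */
	static int tap_batch_size(const struct ctx *c)
	{
		if (c->mode == MODE_PASTA)
			return 1;

		return UDP_MAX_FRAMES;
	}

Then udp_sock_handler() wouldn't need to know anything about the mode:

	n = recvmmsg(ref.r.s, mmh_recv, tap_batch_size(c), 0, NULL);
	if (n <= 0)
		return;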
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson