Date: Thu, 2 Jan 2025 14:46:45 +1100
From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 11/12] tap: Don't size pool_tap[46] for the maximum number of packets
References: <20241220083535.1372523-1-david@gibson.dropbear.id.au>
 <20241220083535.1372523-12-david@gibson.dropbear.id.au>
 <20250101225444.130c1034@elisabeth>
In-Reply-To: <20250101225444.130c1034@elisabeth>

On Wed, Jan 01, 2025 at 10:54:44PM +0100, Stefano Brivio wrote:
> On Fri, 20 Dec 2024 19:35:34 +1100
> David Gibson wrote:
> 
> > Currently we attempt to size pool_tap[46] so they have room for the maximum
> > possible number of packets that could fit in pkt_buf, TAP_MSGS.  However,
> > the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
> > the minimum possible L2 frame size.  But, we don't enforce that L2 frames
> > are at least ETH_ZLEN when we receive them from the tap backend, and since
> > we're dealing with virtual interfaces we don't have the physical Ethernet
> > limitations requiring that length.  Indeed it is possible to generate a
> > legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on
> > the 'pasta' backend is only 42 bytes long).
> >
> > It's also unclear if this limit is sufficient for vhost-user, which isn't
> > limited by the size of pkt_buf as the other modes are.
> >
> > We could attempt to correct the calculation, but that would leave us with
> > even larger arrays, which in practice rarely accumulate more than a handful
> > of packets.  So, instead, put an arbitrary cap on the number of packets we
> > can put in a batch, and if we run out of space, process and flush the
> > batch.
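For reference, the shape of the change is roughly the following.  This is
a minimal sketch with made-up names, not the code actually in the patch:

/* Rather than sizing the pool for the theoretical worst case, cap the
 * number of frames per batch and process the batch early whenever the
 * cap is hit.
 */
#include <stddef.h>
#include <sys/uio.h>

#define BATCH_MAX	256	/* arbitrary cap -- the number under discussion */

struct batch {
	struct iovec frame[BATCH_MAX];	/* one entry per queued L2 frame */
	size_t count;
};

/* Stand-in for handing the queued frames to the protocol handlers */
static void batch_process(struct batch *b)
{
	/* ... tap4/tap6 handlers would consume frame[0..count) here ... */
	b->count = 0;
}

static void batch_add(struct batch *b, void *base, size_t len)
{
	if (b->count == BATCH_MAX)	/* out of slots: flush, then reuse */
		batch_process(b);

	b->frame[b->count].iov_base = base;
	b->frame[b->count].iov_len = len;
	b->count++;
}

The point is that the pool stays small and bounded no matter how small
individual frames can get, at the cost of occasionally processing more
than one batch per receive cycle.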
> I ran a few more tests with this, keeping TAP_MSGS at 256, and in
> general I couldn't really see a difference in latency (especially for
> UDP streams with small packets) or throughput. Figures from short
> throughput tests (such as the ones from the test suite) look a bit more
> variable, but I don't have any statistically meaningful data.
> 
> Then I looked into how many messages we might have in the array without
> this change, and I realised that, with the throughput tests from the
> suite, we very easily exceed the 256 limit.

Ah, interesting.

> Perhaps surprisingly we get the highest buffer counts with TCP transfers
> and intermediate MTUs: we're at about 4000-5000 with 1500 bytes (and
> more like ~1000 with 1280 bytes) meaning that we move 6 to 8 megabytes
> in one shot, every 5-10ms (at 8 Gbps). With that kind of time interval,
> the extra system call overhead from forcibly flushing batches might
> become rather relevant.

Really?  I thought syscall overhead (as in the part that's per-syscall,
rather than per-work) was generally in the tens of µs range, rather than
the ms range.
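Back of the envelope, using your numbers (so an estimate, not a
measurement):

    8 Gbps is roughly 1 GB/s, so a 6-8 MB batch accumulates in ~6-8 ms;
    one extra flush per batch at ~10-50 µs of per-syscall cost is then
    around 0.1-0.8% of that interval.

That should be lost in the noise, unless the per-flush cost turns out to
be much higher than I expect.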
But in any case, I'd be fine with upping the size of the array to 4k
or 8k based on that empirical data.  That's still much smaller than
the >150k we have now.

> With lower MTUs, it looks like we have a lower CPU load and
> transmissions are scheduled differently (resulting in smaller batches),
> but I didn't really trace things.

Ok.  I wonder if with the small MTUs we're hitting throughput
bottlenecks elsewhere which mean this particular path isn't
over-exercised.

> So I start thinking that this has the *potential* to introduce a
> performance regression in some cases and we shouldn't just assume that
> some arbitrary 256 limit is good enough. I didn't check with perf(1),
> though.
> 
> Right now that array takes, effectively, less than 100 KiB (it's ~5000
> copies of struct iovec, 16 bytes each), and in theory that could be
> ~2.5 MiB (at 161319 items). Even if we double or triple that (let's
> assume we use 2 * ETH_ALEN to keep it simple) it's not much... and will
> have no practical effect anyway.

Yeah, I guess.  I don't love the fact that currently, for correctness
(not spuriously dropping packets), we rely on a fairly complex
calculation that's based on information from different layers: the
buffer size and enforcement is in the packet pool layer and is
independent of packet layout, but the minimum frame size comes from the
tap layer and depends quite specifically on which L2 encapsulation
we're using.

> All in all, I think we shouldn't change this limit without a deeper
> understanding of the practical impact. While this change doesn't bring
> any practical advantage, the current behaviour is somewhat tested by
> now, and a small limit isn't.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson