From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=Q+DaDqAR; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 1E12B5A0278 for ; Mon, 25 Aug 2025 03:55:38 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1756086934; bh=l7cfbvFoapWukesr1YDUwazZp0oG74bRhFCyc4vJjks=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Q+DaDqARvxsw7sWgN/UJ5c6pHc/L4Ob75+oUe/3+lvVoQSghVor1aAZGOblpGXik/ 2S4WPCqzFGxpzpNBS++p67FZE5l6hy9T0Fp7+biW5CLrDsywhdzG4ASBBoInZ3wDkt sq9JQxMN9NzTqxlcut35woptJSQddSq1H6/mvqNcsXwOs4oyXWbqFSbxl/3hRFIt5v 0W+EJtn9RKxSTXijbEDkLhQNbTQ2aABkmmQ5JI6zw+GKS6rgxcMf1sWbNQtsKQVeI7 Ab/LYlgVL4BVFetbuJQPVubr6ro1/BLjpSnpRxgOSSJY6ILnnZjrjoBJB1ySqoGK6x YxGV8U2D6dadA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4c9DQZ6CpXz4x3L; Mon, 25 Aug 2025 11:55:34 +1000 (AEST) Date: Mon, 25 Aug 2025 11:48:24 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v4 9/9] fwd: Added cache table for ARP/NDP contents Message-ID: References: <20250820031005.2725591-1-jmaloy@redhat.com> <20250820031005.2725591-10-jmaloy@redhat.com> <20250821125336.0b8ef0dc@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="PBNSxTFKdF/QfjXu" Content-Disposition: inline In-Reply-To: <20250821125336.0b8ef0dc@elisabeth> Message-ID-Hash: NYQ2CPVWKNZAT27V7WJWY2K6NKPM6RSC X-Message-ID-Hash: NYQ2CPVWKNZAT27V7WJWY2K6NKPM6RSC X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Jon Maloy , dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --PBNSxTFKdF/QfjXu Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Aug 21, 2025 at 12:53:36PM +0200, Stefano Brivio wrote: > On Thu, 21 Aug 2025 12:03:47 +1000 > David Gibson wrote: >=20 > > On Tue, Aug 19, 2025 at 11:10:05PM -0400, Jon Maloy wrote: > > > We add a cache table to keep partial contents of the kernel ARP/NDP > > > tables. This way, we drastically reduce the number of netlink calls > > > to read those tables. =20 > >=20 > > Do you have data to suggest this is necessary? >=20 > I'll chime in as I originally suggested that we need this cache. >=20 > Without it, we'll have one netlink query for each local, non-loopback > flow being established, which sounds rather absurd (...am I missing > something?). Not just local: we only need the MAC for local flows, but we don't know if it's local until we look for the MAC. Yeah, that's true. I guess I don't have a good sense of the relative latency of a netlink lookup versus a flow establishment, so I don't have an intuition as to whether that's absurd or not. Since you do, I'll defer to your judgement on this. > I haven't tested these changes yet but I suppose the usual tcp_crr test > should show the issue. >=20 > It's not just about TCP CRR latency though. We're adding this mostly > for use cases where some kind of LAN service is implemented by a > container (say, Pi-hole), and we can probably expect a ton of > short-lived TCP flows in those cases (say, DNS requests over TCP). Actually, I would have expected more UDP than TCP, but the issue is the same. > > It's a lot of code to optimise something only needed for some pretty > > uncommon cases. >=20 > Actually, it's a bit less code than I expected, but I don't understand > why you're assuming those cases are uncommon. Hm, maybe. > By the way, note that we should be able to get rid of most of this once > we implement a netlink monitor (which we need for other purposes), > because at that point we can also subscribe to ARP / neighbour table > changes. Not really. We don't need explicit queries in that case, but we still need the data structure to record the information we get from the monitor. The bulk of this code is implementing that data structure. > > > We create dummy cache entries representing non-(not-yet)-existing > > > ARP/NDP entries when needed. We add a short expiration time to each > > > such entry, so that we can know when to make repeated calls to the > > > kernel tables in the beginning. We also add an access counter to the > > > entries, to ensure that the timer becomes longer and the call frequen= cy > > > abates over time if no ARP/NDP entry is created. > > >=20 > > > For regular entries we use a much longer timer, with the purpose to > > > update the entry in the rare case that a remote host changes its > > > MAC address. > > >=20 > > > Signed-off-by: Jon Maloy > > > --- > > > arp.c | 3 +- > > > conf.c | 2 + > > > flow.c | 5 +- > > > fwd.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++= ++ > > > fwd.h | 4 ++ > > > ndp.c | 3 +- > > > tcp.c | 3 +- > > > 7 files changed, 218 insertions(+), 8 deletions(-) > > >=20 > > > diff --git a/arp.c b/arp.c > > > index c37867a..040d4fe 100644 > > > --- a/arp.c > > > +++ b/arp.c > > > @@ -29,7 +29,6 @@ > > > #include "dhcp.h" > > > #include "passt.h" > > > #include "tap.h" > > > -#include "netlink.h" > > > =20 > > > /** > > > * arp() - Check if this is a supported ARP message, reply as needed > > > @@ -79,7 +78,7 @@ int arp(const struct ctx *c, const struct pool *p) > > > */ > > > inany_from_af(&tgt, AF_INET, am->tip); > > > if (!fwd_inany_nat(c, &tgt)) > > > - nl_neigh_mac_get(nl_sock, &tgt, c->ifi4, am->sha); > > > + fwd_neigh_mac_get(c, &tgt, c->ifi4, am->sha); > > > =20 > > > memcpy(swap, am->tip, sizeof(am->tip)); > > > memcpy(am->tip, am->sip, sizeof(am->tip)); > > > diff --git a/conf.c b/conf.c > > > index f47f48e..0abdbf7 100644 > > > --- a/conf.c > > > +++ b/conf.c > > > @@ -2122,6 +2122,8 @@ void conf(struct ctx *c, int argc, char **argv) > > > c->udp.fwd_out.mode =3D fwd_default; > > > =20 > > > fwd_scan_ports_init(c); > > > + if (fwd_mac_cache_init()) > > > + die("Failed to initiate neighnor MAC cache"); =20 > >=20 > > "neighnor" >=20 > Jon, by the way, we've been using BrE quite consistently throughout the > codebase, so the two occurrences of "neighbour" (code comments only) > have, well, a 'u' in them. I'd try to keep that consistency if it > doesn't bother anybody. >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --PBNSxTFKdF/QfjXu Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmirwNoACgkQzQJF27ox 2Gf0QBAAlg46qPUqRutpbCayl8Lt3n3VEtUdtEU2xOOzO2GOHbOspF1suclPHRX8 CxsyLqpfWHYYlB4x9/gwncf4EDl9l014/um8vvd5w3RxwY6T33lsdp9AebUl/4rr OMP71+7uAWg3ThBnZv6yE0/tBvel1t1S5pALUVqMku4/ytqxPYOgMihcFSQuHlVG LF8jiq8qA6Xruh5OWLP4Mknc39DGylgV+BqWVArAQ7jKd5rw2s4R/lmATPtTSZMc Z8t7X/ZSlL623ZNKryv8hzYaCShvm+8TXimqK36cduRzUddjTH6u6wBEk/U5CjrE M5Uk03FWE5ki91qYnfzg0IxGsUlbU8yZrRD8dsYKVabEitsCQBVCbIXFseMfa+G8 JRGt+2g2UjzWzURcijhST0UvMjJ6kpedCvvvUxRkAmwOg0NDFA/HRL+ocPHz/Apb KejrO6cx38Sir8+qvqRcCLON+9tTzw2Wqg8CsWaWRboFYYVOTbERxUY3JQD5NHXJ JjFfW0MNslnf9+m95uGkJwYOOB3Q5eE5l1kK0KT/YzlSDzUOXKDE5rxZXxDqgQK7 GrZ08rVAZItN2Dw5pg3wyZrB9KteiqqymJ2WxMw5PwQhwqQoEysmIIiha64WaKMr AvJcY1Y0dnXGjF03qWPLNmpDklqqxsjGJY6lv2tWI13vCNbe3BE= =c+Wd -----END PGP SIGNATURE----- --PBNSxTFKdF/QfjXu--