From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=SR597lB/; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 6B8CA5A026F for ; Mon, 29 Sep 2025 08:04:11 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1759125849; bh=/A+CqwVvnvgBylHmN1aK4fig4YwPkVk/dkfK9sFsHuY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SR597lB/YWvY/Vxs4TR2dNFxqB3lgSJGPqRKJC5IrsRHPDnWFt/cy6NinqvH4A85+ mKlEyDcAfwsem8YuSfBLb8qOV9zpEFKhAg/YqbdnCQPWIlgMNCzfM9SaTLP+Sie6Fq DS9nmanZ0Mgu2cxWjJ1xz0dXnFvxTHN69hr2vHxFgHFR0zEG9PlRKyi1PXO7m7AOh8 GfBxqt/r7BQfEyXEJmUxBSE/jpt4OmzSyBA3Z6MHbquOSfDoEo9qnMnjrNBoGemxn9 nQiMGoaxyhP7vMmZLePrGuI+PNBwjGbkovXUucZF29jss/HbOSOGXlh9tcx9jqhU9z a2HOvitTx0H6A== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cZrHF0dmfz4wBD; Mon, 29 Sep 2025 16:04:09 +1000 (AEST) Date: Mon, 29 Sep 2025 15:56:30 +1000 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v11 2/9] fwd: Add cache table for ARP/NDP contents Message-ID: References: <20250927192522.3024554-1-jmaloy@redhat.com> <20250927192522.3024554-3-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="/8PtvUmeC2QRSX9f" Content-Disposition: inline In-Reply-To: <20250927192522.3024554-3-jmaloy@redhat.com> Message-ID-Hash: GAXW4HUP35W6CRFIKNU3OLXARTPNSJGE X-Message-ID-Hash: GAXW4HUP35W6CRFIKNU3OLXARTPNSJGE X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --/8PtvUmeC2QRSX9f Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Sep 27, 2025 at 03:25:15PM -0400, Jon Maloy wrote: > We add a cache table to keep track of the contents of the kernel ARP > and NDP tables. The table is fed from the just introduced netlink based > neigbour subscription function. The new table eliminates the need for > explicit netlink calls to find a host's MAC address. >=20 > Signed-off-by: Jon Maloy A few comments below, but they're all pretty minor, so Reviewed-by: David Gibson >=20 > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions > v7: - NOW using MAC_ZERO where needed > - I am still using linear back-off for empty cache entries. Even > an incoming, flow-creating packet from a local host gives no > guarantee that its MAC address is in the ARP table, so we must > allow for a few new attempts at first possible occasions. Only > after several failed lookups can we conclude that we probably > never will succeed. Hence the back-off. > - Fixed a bug that David inadvertently made me aware of: I only > intended to set the initial expiry value to MAC_CACHE_RENEWAL > when an ARP/NDP table lookup was successful. > - Improved struct and function description comments. > v8: - Total re-design of table, adapting to the new, subscription > based way of updating it. > v9: - Catering for MAC address change for an existing host. > v10: - Changes according to feedback from David Gibson > --- > conf.c | 1 + > fwd.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 7 +++ > netlink.c | 2 + > 4 files changed, 174 insertions(+) >=20 > diff --git a/conf.c b/conf.c > index 66b9e63..cfd8590 100644 > --- a/conf.c > +++ b/conf.c > @@ -2133,6 +2133,7 @@ void conf(struct ctx *c, int argc, char **argv) > c->udp.fwd_out.mode =3D fwd_default; > =20 > fwd_scan_ports_init(c); > + fwd_neigh_table_init(); > =20 > if (!c->quiet) > conf_print(c); > diff --git a/fwd.c b/fwd.c > index 250cf56..2fd6cee 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -33,6 +33,170 @@ static in_port_t fwd_ephemeral_max =3D NUM_PORTS - 1; > =20 > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > =20 > +#define NEIGH_TABLE_SLOTS 1024 /* Must be power of two */ A comment is better than nothing, but a static_assert() is better still. IMO, not having the restriction at all is even better, since it's pretty easy to do so in this case. > +#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2) > + > +/** > + * neigh_table_entry - Entry in the ARP/NDP table > + * @next: Next entry in slot or free list > + * @addr: IP address of represented host > + * @mac: MAC address of represented host > + */ > +struct neigh_table_entry { > + struct neigh_table_entry *next; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > +}; > + > +/** > + * neigh_table - Cache of ARP/NDP table contents > + * @entries: Entries to be plugged into the hash slots when allocated > + * @slots: Hash table slots > + * @free: Linked list of unused entries > + */ > +struct neigh_table { > + struct neigh_table_entry entries[NEIGH_TABLE_SIZE]; > + struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS]; Given that we have bucket chains, I don't think it's particularly required for this to be bigger than SIZE (unlike the flow table, where linear probing means it's absolutely vital we have more hash buckets than the maximum number of entries). > + struct neigh_table_entry *free; > +}; > + > +static struct neigh_table neigh_table; > + > +/** > + * neigh_table_slot() - Hash key to a number within the table range > + * @c: Execution context > + * @key: The key to be used for the hash > + * > + * Return: The resulting hash value > + */ > +static inline size_t neigh_table_slot(const struct ctx *c, > + const union inany_addr *key) > +{ > + struct siphash_state st =3D SIPHASH_INIT(c->hash_secret); > + uint32_t i; > + > + inany_siphash_feed(&st, key); > + i =3D siphash_final(&st, sizeof(*key), 0); > + > + return ((size_t)i) & (NEIGH_TABLE_SIZE - 1); > +} > + > +/** > + * fwd_neigh_table_find() - Find a MAC table entry > + * @c: Execution context > + * @addr: Neighbour address to be used as key for the lookup > + * > + * Return: The found entry, if any. Otherwise NULL. > + */ > +static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *= c, > + const union inany_addr *addr) > +{ > + size_t slot =3D neigh_table_slot(c, addr); > + struct neigh_table_entry *e =3D neigh_table.slots[slot]; > + > + while (e && !inany_equals(&e->addr, addr)) > + e =3D e->next; > + > + return e; > +} > + > +/** > + * fwd_neigh_table_update() - Allocate a neigbour table entry from the f= ree list I like the change to the name, but now the one line description is misleading. > + * @c: Execution context > + * @addr: Address used to determine insertion slot and store in entry > + * @mac: The MAC address associated with the neighbour address > + */ > +void fwd_neigh_table_update(const struct ctx *c, > + const union inany_addr *addr, uint8_t *mac) > +{ > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e; > + ssize_t slot; > + > + /* MAC address might change sometimes */ > + e =3D fwd_neigh_table_find(c, addr); > + if (e) { > + memcpy(e->mac, mac, ETH_ALEN); > + return; > + } > + > + e =3D t->free; > + if (!e) > + return; > + t->free =3D e->next; > + > + slot =3D neigh_table_slot(c, addr); > + e->next =3D t->slots[slot]; > + t->slots[slot] =3D e; > + > + memcpy(&e->addr, addr, sizeof(*addr)); > + memcpy(e->mac, mac, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_table_free() - Remove an entry from a slot and add it to fr= ee list > + * @c: Execution context > + * @addr: Neighbour address used to find the slot for the entry > + */ > +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *a= ddr) > +{ > + ssize_t slot =3D neigh_table_slot(c, addr); > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e, **prev; > + > + prev =3D &t->slots[slot]; > + e =3D t->slots[slot]; > + while (e && !inany_equals(&e->addr, addr)) { > + prev =3D &e->next; > + e =3D e->next; > + } > + if (!e) > + return; > + > + *prev =3D e->next; > + e->next =3D t->free; > + t->free =3D e; > + memset(&e->addr, 0, sizeof(*addr)); > + memset(e->mac, 0, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_mac_get() - Lookup MAC address in the ARP/NDP table > + * @c: Execution context > + * @addr: Neighbour address used as lookup key > + * @mac: Buffer for Ethernet MAC to return, found or default value. > + * > + * Return: true if real MAC found, false if not found or if failure > + */ > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) > +{ > + struct neigh_table_entry *e =3D fwd_neigh_table_find(c, addr); > + > + if (e) > + memcpy(mac, e->mac, ETH_ALEN); > + else > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + > + return !!e; > +} > + > +/** > + * fwd_neigh_table_init() - Initialize the neighbor table > + */ > +void fwd_neigh_table_init(void) > +{ > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e; > + > + memset(t, 0, sizeof(*t)); > + for (int i =3D 0; i < NEIGH_TABLE_SIZE; i++) { > + e =3D &t->entries[i]; > + e->next =3D t->free; > + t->free =3D e; > + } > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers eph= emeral > * > * Work out what ports the host thinks are emphemeral and record it for = later > diff --git a/fwd.h b/fwd.h > index 65c7c96..5464349 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8= _t proto, > const struct flowside *ini, struct flowside *tgt); > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > +void fwd_neigh_table_update(const struct ctx *c, > + const union inany_addr *addr, uint8_t *mac); > +void fwd_neigh_table_free(const struct ctx *c, > + const union inany_addr *addr); > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_neigh_table_init(void); > =20 > #endif /* FWD_H */ > diff --git a/netlink.c b/netlink.c > index 0dc4e9d..da9e1d5 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -1189,8 +1189,10 @@ static void nl_neigh_msg_read(const struct ctx *c,= struct nlmsghdr *nh) > if (nh->nlmsg_type =3D=3D RTM_NEWNEIGH && > ndm->ndm_state & NUD_VALID) { > trace("neigh table update: %s / %s", ip_str, mac_str); > + fwd_neigh_table_update(c, &addr, mac); > } else { > trace("neigh table delete: %s / %s", ip_str, mac_str); > + fwd_neigh_table_free(c, &addr); > } > } > =20 > --=20 > 2.50.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --/8PtvUmeC2QRSX9f Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjaH30ACgkQzQJF27ox 2GczKg/8CEvSUWDrX1hT/J+6IWabgyvcwpyEI2yK5JeUXoB/2BHqysafJORuvZ0l cAuJNFR+zg5fCXka/bC22i378fDyiIJrf/RX8R5+rsjkf5WeICH6QCvA/DZJgBIv JyjZV+LC0jpxNOxDlL3mGbREfB712HH31gBdXUgidD7Apk9DbnKfWcR7zlwg0sJr Ne/AF+W85XQvC3IwCCAgpHetizCuE11DVAXCOzy8bXHz2HO77j8VMaDjpqDjb7dG pJkM+yKYvuFIQ9EO5e4jy28a+iWh2VzPVE3T+Qh3bjcfbDebri907/cjB/YVA9io jawoVn1LdICeZ1sG/2hMVwQQ3SmlFvwpG0SohqD7EZnLdcWdkp7gT/l3KmGDtJtU /0ZfcIGezi+qZjv1eUsvAc4wgn4AWgV/YFP2VZgS6QDxykYXo/Oqc77wMQPXpjc8 wXtQPlB+fSHYVHabuunfeZPMiJFqqZ0b+28WnTilRObz8xz9123jQvCgBpWKz92x DC6mffkwghUxxcGdHppck7VxDVN0EVturvg5/j900mQMKvBcAOj2wIgV455WtOdL ZvUlDeLqr3m3mzETn46nt+BN1D5DtAjwa7vBeoSfzEczVzBx9Jz6aAYphpEqXzmq 7I+ysUEBfajOZvnoHXQMTCpskYuHbfWP8FscgolD+xSqxibun6w= =qZRC -----END PGP SIGNATURE----- --/8PtvUmeC2QRSX9f--