From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202510 header.b=X8H2gvVX; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 309B85A061A for ; Tue, 14 Oct 2025 07:04:51 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202510; t=1760418288; bh=iE5efkOBUEJVlxXZRfhZgryTUiwQc4IDNRLf5wTuT5s=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=X8H2gvVX0KsoS+w42Z3KnMlzcXJT8ojblh2pUHld2amUSB7/q+2XFlJtaLNmO/w4a qpLimU9zIH7e/D5l6p21nkEyJ6hDjx1836yz6U08Co0sz2o9/dj7I80E19y/rl/TzK jquIn5jGoQObp9t07nWFurPybArZe9gVi/3d8yFUbzIKDfo+hIHqwYCV2ESy+a2AvX KcFEpRjsnDxk0aAbO53jtj4HuwHDDBsxoSV4HJDUDhd4P8NNUC67GfY3M5ctC0EaVM tYqaNND9IIalFr2ikMA8/bwekPZcXAq6/MzVQHUgKLfEZ8/3W9H+TpS1ht8ifmEhY4 vgtYoP28bvS6g== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cm2Fr12Drz4wCl; Tue, 14 Oct 2025 16:04:48 +1100 (AEDT) Date: Tue, 14 Oct 2025 15:55:54 +1100 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v13 03/10] fwd: Add cache table for ARP/NDP contents Message-ID: References: <20251012193337.616835-1-jmaloy@redhat.com> <20251012193337.616835-4-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="MsVaFzxL+iTqLd3Y" Content-Disposition: inline In-Reply-To: <20251012193337.616835-4-jmaloy@redhat.com> Message-ID-Hash: 7SAL5JXPUZJ7PJNUSUZKHWY4LRG4BMCQ X-Message-ID-Hash: 7SAL5JXPUZJ7PJNUSUZKHWY4LRG4BMCQ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --MsVaFzxL+iTqLd3Y Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Oct 12, 2025 at 03:33:30PM -0400, Jon Maloy wrote: > We add a cache table to keep track of the contents of the kernel ARP > and NDP tables. The table is fed from the just introduced netlink based > neigbour subscription function. The new table eliminates the need for > explicit netlink calls to find a host's MAC address. Last sentence only really makes sense in the context of earlier versions of this series, which won't be in the final commit log. >=20 > Signed-off-by: Jon Maloy >=20 > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions > v7: - NOW using MAC_ZERO where needed > - I am still using linear back-off for empty cache entries. Even > an incoming, flow-creating packet from a local host gives no > guarantee that its MAC address is in the ARP table, so we must > allow for a few new attempts at first possible occasions. Only > after several failed lookups can we conclude that we probably > never will succeed. Hence the back-off. > - Fixed a bug that David inadvertently made me aware of: I only > intended to set the initial expiry value to MAC_CACHE_RENEWAL > when an ARP/NDP table lookup was successful. > - Improved struct and function description comments. > v8: - Total re-design of table, adapting to the new, subscription > based way of updating it. > v9: - Catering for MAC address change for an existing host. > v10: - Changes according to feedback from David Gibson > v12: - Changes according to feedback from David and Stefano > - Added dummy entries for loopback and default GW addresses > v13: - Changes according to feedback and discussions with David > and Stefano > --- > fwd.c | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 7 ++ > netlink.c | 3 + > passt.c | 1 + > 4 files changed, 233 insertions(+) >=20 > diff --git a/fwd.c b/fwd.c > index 250cf56..d062ebc 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -26,6 +26,7 @@ > #include "passt.h" > #include "lineread.h" > #include "flow_table.h" > +#include "netlink.h" > =20 > /* Empheral port range: values from RFC 6335 */ > static in_port_t fwd_ephemeral_min =3D (1 << 15) + (1 << 14); > @@ -33,6 +34,227 @@ static in_port_t fwd_ephemeral_max =3D NUM_PORTS - 1; > =20 > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > =20 > +#define NEIGH_TABLE_SLOTS 1024 > +#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2) > +static_assert((NEIGH_TABLE_SLOTS & (NEIGH_TABLE_SLOTS - 1)) =3D=3D 0, > + "NEIGH_TABLE_SLOTS must be a power of two"); > + > +/** > + * struct neigh_table_entry - Entry in the ARP/NDP table > + * @next: Next entry in slot or free list > + * @addr: IP address of represented host > + * @mac: MAC address of represented host > + * @permanent: Entry cannot be altered or freed by notification > + */ > +struct neigh_table_entry { > + struct neigh_table_entry *next; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > + bool permanent; > +}; > + > +/** > + * struct neigh_table - Cache of ARP/NDP table contents > + * @entries: Entries to be plugged into the hash slots when allocated > + * @slots: Hash table slots > + * @free: Linked list of unused entries > + */ > +struct neigh_table { > + struct neigh_table_entry entries[NEIGH_TABLE_SIZE]; > + struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS]; > + struct neigh_table_entry *free; > +}; > + > +static struct neigh_table neigh_table; > + > +/** > + * neigh_table_slot() - Hash key to a number within the table range > + * @c: Execution context > + * @key: The key to be used for the hash > + * > + * Return: the resulting hash value > + */ > +static size_t neigh_table_slot(const struct ctx *c, > + const union inany_addr *key) > +{ > + struct siphash_state st =3D SIPHASH_INIT(c->hash_secret); > + uint32_t i; > + > + inany_siphash_feed(&st, key); > + i =3D siphash_final(&st, sizeof(*key), 0); > + > + return ((size_t)i) & (NEIGH_TABLE_SIZE - 1); > +} > + > +/** > + * fwd_neigh_table_find() - Find a MAC table entry > + * @c: Execution context > + * @addr: Neighbour address to be used as key for the lookup > + * > + * Return: the matching entry, if found. Otherwise NULL > + */ > +static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *= c, > + const union inany_addr *addr) > +{ > + size_t slot =3D neigh_table_slot(c, addr); > + struct neigh_table_entry *e =3D neigh_table.slots[slot]; > + > + while (e && !inany_equals(&e->addr, addr)) > + e =3D e->next; > + > + return e; > +} > + > +/** > + * fwd_neigh_table_update() - Allocate or update neighbour table entry > + * @c: Execution context > + * @addr: IP address used to determine insertion slot and store in entry > + * @mac: The MAC address associated with the neighbour address > + * @permanent: Created entry cannot be altered or freed > + */ > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr = *addr, > + const uint8_t *mac, bool permanent) > +{ > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e; > + union inany_addr daddr; > + ssize_t slot; > + > + /* We only add guest-side visible addresses */ > + if (!nat_inbound(c, addr, &daddr)) > + return; To me, it makes more sense to have the nat_inbound() in the caller. That way the hash table is just a hash table, with no specific knowledge of what the contents mean. The caller is the thing which knows it has a host side address, and wants to update a corresponding entry in a table indexed by guest side address. It also means that the fwd_neigh_table_*() functions will consistently take guest side addresses. > + /* MAC address might change sometimes */ > + e =3D fwd_neigh_table_find(c, &daddr); > + if (e) { > + if (!e->permanent) > + memcpy(e->mac, mac, ETH_ALEN); > + return; > + } > + > + e =3D t->free; > + if (!e) { > + debug("Failed to allocate neighbour table entry"); > + return; > + } > + t->free =3D e->next; > + slot =3D neigh_table_slot(c, &daddr); > + e->next =3D t->slots[slot]; > + t->slots[slot] =3D e; > + > + memcpy(&e->addr, &daddr, sizeof(daddr)); > + memcpy(e->mac, mac, ETH_ALEN); > + e->permanent =3D permanent; > +} > + > +/** > + * fwd_neigh_table_free() - Remove an entry from a slot and add it to fr= ee list > + * @c: Execution context > + * @addr: IP address used to find the slot for the entry > + */ > +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *a= ddr) > +{ > + ssize_t slot =3D neigh_table_slot(c, addr); > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e, **prev; > + > + prev =3D &t->slots[slot]; > + e =3D t->slots[slot]; > + while (e && !inany_equals(&e->addr, addr)) { > + prev =3D &e->next; > + e =3D e->next; > + } > + > + if (!e || e->permanent) > + return; > + > + *prev =3D e->next; > + e->next =3D t->free; > + t->free =3D e; > + memset(&e->addr, 0, sizeof(*addr)); > + memset(e->mac, 0, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table > + * @c: Execution context > + * @addr: Neighbour IP address used as lookup key > + * @mac: Buffer for returned MAC address > + */ > +void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) > +{ > + const struct neigh_table_entry *e =3D fwd_neigh_table_find(c, addr); > + > + if (e) > + memcpy(mac, e->mac, ETH_ALEN); > + else > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_table_init() - Initialize the neighbour table > + * @c: Execution context > + */ > +void fwd_neigh_table_init(const struct ctx *c) > +{ > + union inany_addr mhl =3D inany_from_v4(c->ip4.map_host_loopback); > + union inany_addr mga =3D inany_from_v4(c->ip4.map_guest_addr); > + union inany_addr ggw =3D inany_from_v4(c->ip4.guest_gw); > + struct neigh_table *t =3D &neigh_table; > + struct neigh_table_entry *e; > + int i; > + > + memset(t, 0, sizeof(*t)); > + > + for (i =3D 0; i < NEIGH_TABLE_SIZE; i++) { > + e =3D &t->entries[i]; > + e->next =3D t->free; > + t->free =3D e; > + } > + > + /* Blocker entries to stop events from hosts using these addresses */ > + if (!inany_is_unspecified4(&mhl)) > + fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true); Here mhl is already a guest side address - and may not have a corresponding address host side - so we definitely don't want the nat_inbound() inside fwd_neigh_table_update(). > + if (!inany_is_unspecified4(&ggw) && !c->no_map_gw) > + fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true); Same here. > + if (!inany_is_unspecified4(&mga) && !inany_equals(&mhl, &mga)) { > + uint8_t mac[ETH_ALEN]; > + int rc; > + > + rc =3D nl_link_get_mac(nl_sock, c->ifi4, mac); > + if (rc < 0) { > + debug("Couldn't get ip4 MAC addr: %s", strerror_(-rc)); > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + } > + fwd_neigh_table_update(c, &mga, mac, true); And here. > + } > + > + mhl =3D *(union inany_addr *)&c->ip6.map_host_loopback; > + mga =3D *(union inany_addr *)&c->ip4.map_guest_addr; > + ggw =3D *(union inany_addr *)&c->ip4.guest_gw; > + > + if (!inany_is_unspecified6(&mhl)) > + fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true); =2E.and so forth. > + if (!inany_is_unspecified6(&ggw) && !c->no_map_gw) > + fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true); > + > + if (!inany_is_unspecified6(&mga) && !inany_equals(&mhl, &mga)) { > + uint8_t mac[ETH_ALEN]; > + int rc; > + > + rc =3D nl_link_get_mac(nl_sock, c->ifi6, mac); > + if (rc < 0) { > + debug("Couldn't get ip6 MAC addr: %s", strerror_(-rc)); > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + } > + fwd_neigh_table_update(c, &mga, mac, true); > + } > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers eph= emeral > * > * Work out what ports the host thinks are emphemeral and record it for = later > diff --git a/fwd.h b/fwd.h > index 65c7c96..352f3b5 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8= _t proto, > const struct flowside *ini, struct flowside *tgt); > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr = *addr, > + const uint8_t *mac, bool permanent); > +void fwd_neigh_table_free(const struct ctx *c, > + const union inany_addr *addr); > +void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_neigh_table_init(const struct ctx *c); > =20 > #endif /* FWD_H */ > diff --git a/netlink.c b/netlink.c > index 99dcb72..61796fb 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -1186,10 +1186,12 @@ static void nl_neigh_msg_read(const struct ctx *c= , struct nlmsghdr *nh) > =20 > if (nh->nlmsg_type =3D=3D RTM_DELNEIGH) { > trace("neigh table delete: %s", ip_str); > + fwd_neigh_table_free(c, &addr); You'll need a nat_inbound() on the delete side as well. Again, I think it makes more sense here than in fwd_neigh_table_free(), but at the moment it's in neither. > return; > } > if (!(ndm->ndm_state & NUD_VALID)) { > trace("neigh table: invalid state for %s", ip_str); > + fwd_neigh_table_free(c, &addr); > return; > } > if (nh->nlmsg_type !=3D RTM_NEWNEIGH || !lladdr) { > @@ -1202,6 +1204,7 @@ static void nl_neigh_msg_read(const struct ctx *c, = struct nlmsghdr *nh) > memcpy(mac, lladdr, ETH_ALEN); > eth_ntop(mac, mac_str, sizeof(mac_str)); > trace("neigh table update: %s / %s", ip_str, mac_str); > + fwd_neigh_table_update(c, &addr, mac, false); > } > =20 > /** > diff --git a/passt.c b/passt.c > index e21d6ba..98fc430 100644 > --- a/passt.c > +++ b/passt.c > @@ -324,6 +324,7 @@ int main(int argc, char **argv) > =20 > pcap_init(&c); > =20 > + fwd_neigh_table_init(&c); > nl_neigh_notify_init(&c); > =20 > if (!c.foreground) { > --=20 > 2.50.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --MsVaFzxL+iTqLd3Y Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjt19kACgkQzQJF27ox 2GeBWQ//dfFofQnyNd8wE7QlNNBY0VzNRd6S1D0qIRzu2J9YW1ovsP+7ycu6ayPD judLuLeKseTsSrBHBYwbZkwHEwPzmFDft4nf13G0DnI9yCFElUUlcFJpNPT9fnLB dxdly6WUeJM+QSM3vTPR2elAoAPEF0rfoELfGtCh+gmEPJnJtmFsQoOHmZ9ZeoCF h14nzgCBZOTYz4lKF6c9F7JP677zWCkHeBcabxH+xEUle2mlITqL+AHL4fKBvdFz R6KnMC59wYmNp1YVjs/hb/UAXUot0NslqWNjiTBEmcaeM52pKu27OvUllMqXZDtr XuAPmD+DuZkWo950UiD/4kF1Zw6LbReXJVLrkI4pUQ9Zes2tN+HWZoCILbdKuf14 QcqmEXtIF8ZySW0mT4OJXbAO9Mc71aw41SZqAjig0yYBSlFLEu/EBs203PWxFx1o PmzWIju8QN9hA2D566jNQXY1iNyfcQgzLKxu3GFXmhJhl4NyiUulC6/Z5kQo41+j bW3J9KmHta7SoYXRA2/okYfLMnYgew+aBK6ksUR61a16tQKXSTpaBVFG5XuXhNSQ MPn++JlVu1dWJrs8QVrG8DgoY6LZL60lzKlYTlVRQx0zIKfRplqr4TMrDfdQeEPp 4c2e4u/SQCV2zUI+JWs37GZLe2tRtz/Jgwnv1KbyJzXc0bnq4yM= =SMVO -----END PGP SIGNATURE----- --MsVaFzxL+iTqLd3Y--