From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=QKU/OrOJ; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 56E305A0271 for ; Wed, 10 Sep 2025 04:57:17 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1757473034; bh=plNWgxGeNGMoNwPCtMaCyKMjK6zftynV9ZzrPLSPbQ4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=QKU/OrOJNAmJkOZlHu7sBdNiKqLqwaF0V9JeK+iqG8sBcv5WA4k/5Qm3K54f5j9Jn zXjrXZakWCjFo1Qa4ZRkGetMbUh80JD+PM7PO2283rEdyO2gINc7im+Lm6ILM2RdYU M20TaObF4yqEU0S5YklTV9pjIQnGXuQNzYjReG8Wi+2BgJDIP0mNkqNY2h9V/wXxhk HihyeH3GgdYt8EzFQb8xWtfU2TW/PHaMX8Zd9NL1L1qa+KY6UDt2/nWzDDSg/Xw6Ot gKnFuQq8TwTjGldT4ZL0UR5UxbPGTvHkWS53mG/5lDegV+68B0ZBBy2hqtf40IP/z1 clQwG7MI/2YrA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cM52L2SmQz4wBC; Wed, 10 Sep 2025 12:57:14 +1000 (AEST) Date: Wed, 10 Sep 2025 12:48:27 +1000 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v6 2/8] fwd: Added cache table for ARP/NDP contents Message-ID: References: <20250910015919.173414-1-jmaloy@redhat.com> <20250910015919.173414-3-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="KCQDCYZx3MYJMTCl" Content-Disposition: inline In-Reply-To: <20250910015919.173414-3-jmaloy@redhat.com> Message-ID-Hash: NS2VESS4FDRBU3BUXQFXPOIDT6Q4H4K4 X-Message-ID-Hash: NS2VESS4FDRBU3BUXQFXPOIDT6Q4H4K4 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --KCQDCYZx3MYJMTCl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Sep 09, 2025 at 09:59:13PM -0400, Jon Maloy wrote: > We add a cache table to keep partial contents of the kernel ARP/NDP > tables. This way, we drastically reduce the number of netlink calls > to read those tables. >=20 > We create undefined cache entries representing non- or not-yet- > existing ARP/NDP entries when needed. We add a short expiration time > to each such entry, so that we can know when to make repeated calls to > the kernel tables in the beginning. We also add an access counter to the > entries, to ensure that the timer becomes longer and the call frequency > abates over time if no ARP/NDP entry shows up. >=20 > For regular entries we use a much longer timer, with the purpose to > update the entry in the rare case that a remote host changes its > MAC address. I don't think the description of expiry behaviour is correct any more >=20 > Signed-off-by: Jon Maloy >=20 > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions The last point doesn't appear to be correct... > --- > conf.c | 1 + > fwd.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 4 ++ > util.c | 12 ++++ > util.h | 1 + > 5 files changed, 194 insertions(+) >=20 > diff --git a/conf.c b/conf.c > index f47f48e..27b04d3 100644 > --- a/conf.c > +++ b/conf.c > @@ -2122,6 +2122,7 @@ void conf(struct ctx *c, int argc, char **argv) > c->udp.fwd_out.mode =3D fwd_default; > =20 > fwd_scan_ports_init(c); > + fwd_mac_cache_init(); > =20 > if (!c->quiet) > conf_print(c); > diff --git a/fwd.c b/fwd.c > index 250cf56..c58ba9f 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -19,6 +19,8 @@ > #include > #include > #include > +#include > +#include > =20 > #include "util.h" > #include "ip.h" > @@ -26,6 +28,8 @@ > #include "passt.h" > #include "lineread.h" > #include "flow_table.h" > +#include "inany.h" > +#include "netlink.h" > =20 > /* Empheral port range: values from RFC 6335 */ > static in_port_t fwd_ephemeral_min =3D (1 << 15) + (1 << 14); > @@ -33,6 +37,178 @@ static in_port_t fwd_ephemeral_max =3D NUM_PORTS - 1; > =20 > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > =20 > +#define MAC_CACHE_SIZE 128 > +#define MAC_CACHE_RENEWAL 3600 /* Refresh entry from ARP/NDP every hour= */ > + > +/* Partial cache of ARP/NDP table contents */ > +struct mac_cache_entry { > + struct mac_cache_entry *prev; > + struct mac_cache_entry *next; > + struct timespec expiry; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > + uint16_t access_count; > +}; > + > +struct mac_cache_table { > + struct mac_cache_entry mac_table[MAC_CACHE_SIZE]; > + > + /* FIFO/LRU queue */ > + struct mac_cache_entry *head; > + struct mac_cache_entry *tail; > +}; > + > +static struct mac_cache_table mac_cache; > +const uint8_t undefined_mac[6] =3D {0, 0, 0, 0, 0, 0}; =2E..this is still here, and I don't see MAC_ZERO used anywhere. > +/** > + * fwd_mac_cache_unlink() - Unlink entry from LRU queue > + */ > +static void fwd_mac_cache_unlink(struct mac_cache_entry *e) > +{ > + struct mac_cache_table *t =3D &mac_cache; > + > + if (e->prev) > + e->prev->next =3D e->next; > + else > + t->head =3D e->next; > + > + if (e->next) > + e->next->prev =3D e->prev; > + else > + t->tail =3D e->prev; > + > + e->prev =3D e->next =3D NULL; > +} > + > +/** > + * fwd_mac_cache_append_tail() - Add entry to tail of LRU queue > + */ > +static void fwd_mac_cache_append_to_tail(struct mac_cache_entry *e) > +{ > + struct mac_cache_table *t =3D &mac_cache; > + > + e->next =3D NULL; > + e->prev =3D t->tail; > + t->tail->next =3D e; > + t->tail =3D e; > +} > + > +/** > + * fwd_mac_cache_move_to_tail() - Move entry to tail of LRU queue > + */ > +static void fwd_mac_cache_move_to_tail(struct mac_cache_entry *e) > +{ > + struct mac_cache_table *t =3D &mac_cache; > + > + if (t->tail =3D=3D e) > + return; > + > + fwd_mac_cache_unlink(e); > + fwd_mac_cache_append_to_tail(e); > +} > + > +/** > + * mac_entry_set_expiry() - Set the time for a cache entry to expire > + * @e: Cache entry > + * @expiry: Expiration time, in seconds from current moment. > + */ > +static void mac_entry_set_expiry(struct mac_cache_entry *e, struct times= pec *now, int expiry) > +{ > + e->expiry =3D *now; > + e->expiry.tv_sec +=3D expiry; > +} > + > +/** > + * fwd_mac_cache_find() - Find an entry in the ARP/NDP cache table > + * @addr: IPv4 or IPv6 address, used as key for the lookup > + * > + * Return: Pointer to the entry on success, NULL on failure. > + */ > +static struct mac_cache_entry *fwd_mac_cache_find(const union inany_addr= *addr) > +{ > + const struct mac_cache_table *t =3D &mac_cache; > + struct mac_cache_entry *e =3D t->tail; > + > + for (e =3D t->tail; e; e =3D e->prev) { > + if (inany_equals(&e->addr, addr)) > + return e; > + } > + return NULL; > +} > + > +/** > + * fwd_neigh_mac_get() - Find a MAC address the ARP/NDP cache table This is missing an "in" at least, and doesn't really distinguish this =66rom fwd_mac_cache_find(). > + * @c: Execution context > + * @addr: IPv4 or IPv6 address > + * @ifi: Interface index > + * @mac: Buffer for Ethernet MAC to return, found or default. > + * > + * Return: true if real MAC found, false if not found or failure > + */ > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) > +{ > + struct mac_cache_entry *e =3D fwd_mac_cache_find(addr); > + int ifi =3D inany_v4(addr) ? c->ifi4 : c->ifi6; > + bool refresh =3D false; > + bool found =3D false; > + struct timespec now; > + > + clock_gettime(CLOCK_MONOTONIC, &now); > + > + if (e) { > + refresh =3D timespec_before(&e->expiry, &now); > + } else { > + /* Recycle least recently used entry */ > + e =3D mac_cache.head; > + e->addr =3D *addr; > + memcpy(e->mac, undefined_mac, ETH_ALEN); MAC_ZERO. > + e->access_count =3D 0; > + refresh =3D true; > + } > + > + if (!refresh) { > + found =3D !MAC_IS_ZERO(e->mac); > + } else { > + found =3D nl_neigh_mac_get(nl_sock, addr, ifi, e->mac); > + mac_entry_set_expiry(e, &now, MAC_CACHE_RENEWAL); > + } > + > + if (found) { > + memcpy(mac, e->mac, ETH_ALEN); > + } else { > + /* Do linear back-off from new netlink calls if nothing found */ > + mac_entry_set_expiry(e, &now, e->access_count++); Do we need this? Particularly now we're treating negative and positive entries the same for initial expiry. > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + } > + > + /* Set to most recently used */ > + fwd_mac_cache_move_to_tail(e); > + > + return found; > +} > + > +/** > + * fwd_mac_cache_init() - Initiate ARP/NDP cache table > + */ > +void fwd_mac_cache_init(void) > +{ > + struct mac_cache_table *t =3D &mac_cache; > + struct mac_cache_entry *e; > + int i; > + > + memset(t, 0, sizeof(*t)); > + > + for (i =3D 0; i < MAC_CACHE_SIZE; i++) { > + e =3D &t->mac_table[i]; > + e->prev =3D (i =3D=3D 0) ? NULL : &t->mac_table[i - 1]; > + e->next =3D (i < (MAC_CACHE_SIZE - 1)) ? &t->mac_table[i + 1] : NULL; > + } > + t->head =3D &t->mac_table[0]; > + t->tail =3D &t->mac_table[MAC_CACHE_SIZE - 1]; > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers eph= emeral > * > * Work out what ports the host thinks are emphemeral and record it for = later > diff --git a/fwd.h b/fwd.h > index 65c7c96..728601f 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -57,4 +57,8 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_= t proto, > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > =20 > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_mac_cache_init(void); > + > #endif /* FWD_H */ > diff --git a/util.c b/util.c > index c492f90..8a8846c 100644 > --- a/util.c > +++ b/util.c > @@ -305,6 +305,18 @@ long timespec_diff_ms(const struct timespec *a, cons= t struct timespec *b) > return timespec_diff_us(a, b) / 1000; > } > =20 > +/** > + * timespec_before() - Check the relation between two points in time > + * @a: Point in time to be tested > + * @b: Point in time test a against > + * Return: True if a comes before b, otherwise b > + */ > +bool timespec_before(const struct timespec *a, const struct timespec *b) > +{ > + return (a->tv_sec < b->tv_sec) || > + (a->tv_sec =3D=3D b->tv_sec && a->tv_nsec < b->tv_nsec); > +} > + > /** > * bitmap_set() - Set single bit in bitmap > * @map: Pointer to bitmap > diff --git a/util.h b/util.h > index 2a8c38f..5ec3f22 100644 > --- a/util.h > +++ b/util.h > @@ -207,6 +207,7 @@ int sock_unix(char *sock_path); > void sock_probe_mem(struct ctx *c); > long timespec_diff_ms(const struct timespec *a, const struct timespec *b= ); > int64_t timespec_diff_us(const struct timespec *a, const struct timespec= *b); > +bool timespec_before(const struct timespec *a, const struct timespec *b); > void bitmap_set(uint8_t *map, unsigned bit); > void bitmap_clear(uint8_t *map, unsigned bit); > bool bitmap_isset(const uint8_t *map, unsigned bit); > --=20 > 2.50.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --KCQDCYZx3MYJMTCl Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjA5voACgkQzQJF27ox 2GfCKxAAmNu89T6X++novT8XuTsbmAzkUNSPk9aOYZlt43NaBfZp8L1hmjQ2tBxY YOmdhPUFLr2Hdfz86iEDmwL0pUmpFHN1Q8BdwBpyMY/50Cli9euheBBLG6asAwk/ iWVNSexy0UJ79j2FDsMuSeEkZ3MEdzxi3hD34By7Hl27DuHI0aTcPVHiWr/FISnp VshQ1i7e1cmHmaaCKaI5qKr00x2QxS4e9LF8uSG5Heuf/JyXBGq/A3zO7exrzKEH xeGQzzjB3Uzus4pRzhsZvgHMPKI5VNaJQ4pcXfSDZgq8p64db8waNRnaAhqFPR/E jJiWNW5/jpB4/TP0GNFd+a/itoplB+W5pUBoZ9u5RJcS+nxYz8SenbPlO001CLIo VygHzOSOV5ZGatUJodWuszLcbkluWkd7P7tBf5/yvmqHC4O9YWDeHbAyTUYvIw+m VtMoVtHHCpDci7IWU4xYON9v4w7nEGTk+Lp12uCFsHhdjVSz2x2HEw79QZbnhdiN 5GT2GJilnvk+7R+ySVgKv50nM0nphYgznWlcHpSQjS5BpE8Ip2MW73jJuJYp9tEq TFfXAZ0X0tB23jWTKcKy1wlZbNgXsf0LCgum1KYH/faWDCSHjt8QYtXUGNCRscxM IgA+q2WUmi+uICR1hCcdEK5aXJO5ZbtLd+xxJaQHOqhaAb7rYlg= =eF3s -----END PGP SIGNATURE----- --KCQDCYZx3MYJMTCl--