From: Jon Maloy <jmaloy@redhat.com>
To: sbrivio@redhat.com, dgibson@redhat.com,
david@gibson.dropbear.id.au, jmaloy@redhat.com,
passt-dev@passt.top
Subject: [PATCH v8 2/8] fwd: Add cache table for ARP/NDP contents
Date: Sun, 21 Sep 2025 18:08:59 -0400
Message-ID: <20250921220905.478621-3-jmaloy@redhat.com>
In-Reply-To: <20250921220905.478621-1-jmaloy@redhat.com>
We add a cache table that mirrors the contents of the kernel ARP and
NDP tables. The table is fed by the netlink-based neighbour
subscription function introduced in the previous patch. The new table
eliminates the need for explicit netlink calls to find a host's MAC
address.
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
v5: - Moved to earlier in series to reduce rebase conflicts
v6: - Squashed the hash list commit and the FIFO/LRU queue commit
- Removed hash lookup. We now only use linear lookup in a
linked list
- Eliminated dynamic memory allocation.
- Ensured there is only one call to clock_gettime()
- Using MAC_ZERO instead of the previously dedicated definitions
v7: - NOW using MAC_ZERO where needed
- I am still using linear back-off for empty cache entries. Even
an incoming, flow-creating packet from a local host gives no
guarantee that its MAC address is in the ARP table, so we must
allow for a few new attempts at first possible occasions. Only
after several failed lookups can we conclude that we probably
never will succeed. Hence the back-off.
- Fixed a bug that David inadvertently made me aware of: I only
intended to set the initial expiry value to MAC_CACHE_RENEWAL
when an ARP/NDP table lookup was successful.
- Improved struct and function description comments.
v8: - Total re-design of table, adapting to the new, subscription
based way of updating it.
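
For readers who want the table layout in isolation, here is a minimal,
standalone sketch of the same idea: a fixed pool of entries, a free
list threaded through that pool, and hash slots holding singly linked
chains. It is not the code from this patch. POOL_SIZE, the uint32_t
key, the toy multiplicative hash and the cache_*() names are all
hypothetical stand-ins (the patch keys on union inany_addr and hashes
with siphash). The mask in slot_idx() only works when the pool size is
a power of two, which the sketch asserts at compile time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 8	/* hypothetical size; must be a power of two */
#define MAC_LEN   6

_Static_assert((POOL_SIZE & (POOL_SIZE - 1)) == 0, "POOL_SIZE must be 2^n");

struct entry {
	struct entry *next;	/* next in slot chain, or next free entry */
	uint32_t key;		/* stand-in for the neighbour address */
	uint8_t mac[MAC_LEN];
};

static struct entry pool[POOL_SIZE];	/* statically allocated storage */
static struct entry *slots[POOL_SIZE];	/* heads of the hash chains */
static struct entry *free_list;		/* unused entries */

/* Toy hash, assumption only: a keyed hash would be used in practice */
static size_t slot_idx(uint32_t key)
{
	return ((key * 2654435761u) >> 16) & (POOL_SIZE - 1);
}

/* Empty all slots and put every pool entry on the free list */
static void cache_init(void)
{
	memset(slots, 0, sizeof(slots));
	free_list = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

/* Walk the chain of the key's slot; NULL if the key is not cached */
static struct entry *cache_find(uint32_t key)
{
	struct entry *e = slots[slot_idx(key)];

	while (e && e->key != key)
		e = e->next;
	return e;
}

/* Return false if the key is already present or the pool is exhausted */
static bool cache_add(uint32_t key, const uint8_t *mac)
{
	size_t idx = slot_idx(key);
	struct entry *e;

	if (cache_find(key) || !free_list)
		return false;

	e = free_list;			/* pop from the free list */
	free_list = e->next;
	e->next = slots[idx];		/* push onto the slot chain */
	slots[idx] = e;
	e->key = key;
	memcpy(e->mac, mac, MAC_LEN);
	return true;
}

/* Unlink the entry from its chain and return it to the free list */
static bool cache_del(uint32_t key)
{
	struct entry **prev = &slots[slot_idx(key)];
	struct entry *e;

	for (e = *prev; e && e->key != key; e = e->next)
		prev = &e->next;
	if (!e)
		return false;

	*prev = e->next;
	e->next = free_list;
	free_list = e;
	return true;
}
```

Because every entry lives either on exactly one slot chain or on the
free list, no dynamic allocation is needed, and a full pool simply
makes further insertions fail until an entry is deleted.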
---
conf.c | 1 +
fwd.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fwd.h | 7 +++
netlink.c | 15 ++++--
4 files changed, 177 insertions(+), 5 deletions(-)
diff --git a/conf.c b/conf.c
index 66b9e63..cc491c3 100644
--- a/conf.c
+++ b/conf.c
@@ -2133,6 +2133,7 @@ void conf(struct ctx *c, int argc, char **argv)
c->udp.fwd_out.mode = fwd_default;
fwd_scan_ports_init(c);
+ fwd_neigh_cache_init();
if (!c->quiet)
conf_print(c);
diff --git a/fwd.c b/fwd.c
index 250cf56..139ad5c 100644
--- a/fwd.c
+++ b/fwd.c
@@ -32,6 +32,165 @@ static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
static in_port_t fwd_ephemeral_max = NUM_PORTS - 1;
#define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range"
+#define MAC_CACHE_SIZE 1024
+
+/**
+ * struct mac_cache_entry - Entry in the ARP/NDP table cache
+ * @next: Next entry in slot or free list
+ * @addr: IP address of represented host
+ * @mac: MAC address of represented host, if known
+ */
+struct mac_cache_entry {
+ struct mac_cache_entry *next;
+ union inany_addr addr;
+ uint8_t mac[ETH_ALEN];
+};
+
+/**
+ * struct mac_cache_table - Cache of ARP/NDP table contents
+ * @entries: Entries to be plugged into the hash slots when allocated
+ * @slots: Hash table slots
+ * @free: Linked list of unused entries
+ */
+struct mac_cache_table {
+ struct mac_cache_entry entries[MAC_CACHE_SIZE];
+ struct mac_cache_entry *slots[MAC_CACHE_SIZE];
+ struct mac_cache_entry *free;
+};
+
+static struct mac_cache_table mac_cache;
+
+/**
+ * fwd_mac_cache_slot_idx() - Hash key to a number within the table range
+ * @c: Execution context
+ * @key: The key to be used for the hash
+ *
+ * Return: The resulting hash value
+ */
+static inline size_t fwd_mac_cache_slot_idx(const struct ctx *c,
+ const union inany_addr *key)
+{
+ struct siphash_state st = SIPHASH_INIT(c->hash_secret);
+ uint32_t i;
+
+ inany_siphash_feed(&st, key);
+ i = siphash_final(&st, sizeof(*key), 0);
+
+ return ((size_t)i) & (MAC_CACHE_SIZE - 1);
+}
+
+/**
+ * fwd_mac_cache_find() - Find a MAC cache table entry
+ * @c: Execution context
+ * @addr: Neighbour address to be used as key for the lookup
+ *
+ * Return: The found entry, if any. Otherwise NULL.
+ */
+static struct mac_cache_entry *fwd_mac_cache_find(const struct ctx *c,
+ const union inany_addr *addr)
+{
+ size_t idx = fwd_mac_cache_slot_idx(c, addr);
+ struct mac_cache_entry *e = mac_cache.slots[idx];
+
+ while (e && !inany_equals(&e->addr, addr))
+ e = e->next;
+
+ return e;
+}
+
+/**
+ * fwd_neigh_mac_cache_alloc() - Insert an entry taken from the free list
+ * @c: Execution context
+ * @addr: Address used to determine insertion slot and store in entry
+ * @mac: The MAC address associated with the neighbour address
+ */
+void fwd_neigh_mac_cache_alloc(const struct ctx *c,
+ const union inany_addr *addr, uint8_t *mac)
+{
+ struct mac_cache_table *t = &mac_cache;
+ struct mac_cache_entry *e;
+ size_t idx;
+
+ if (fwd_mac_cache_find(c, addr))
+ return;
+
+ e = t->free;
+ if (!e)
+ return;
+
+ t->free = e->next;
+
+ idx = fwd_mac_cache_slot_idx(c, addr);
+ e->next = t->slots[idx];
+ t->slots[idx] = e;
+
+ memcpy(&e->addr, addr, sizeof(*addr));
+ memcpy(e->mac, mac, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_mac_cache_free() - Unlink an entry and return it to free list
+ * @c: Execution context
+ * @addr: Neighbour address used to find the slot for the entry
+ */
+void fwd_neigh_mac_cache_free(const struct ctx *c, const union inany_addr *addr)
+{
+ size_t idx = fwd_mac_cache_slot_idx(c, addr);
+ struct mac_cache_table *t = &mac_cache;
+ struct mac_cache_entry *e, **prev;
+
+ prev = &t->slots[idx];
+ e = t->slots[idx];
+ while (e && !inany_equals(&e->addr, addr)) {
+ prev = &e->next;
+ e = e->next;
+ }
+ if (!e)
+ return;
+
+ *prev = e->next;
+ e->next = t->free;
+ t->free = e;
+ memset(&e->addr, 0, sizeof(*addr));
+ memset(e->mac, 0, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_mac_get() - Lookup MAC address in the ARP/NDP cache table
+ * @c: Execution context
+ * @addr: Neighbour address used as lookup key
+ * @mac: Buffer for Ethernet MAC to return, found or default value.
+ *
+ * Return: true if real MAC found, false if not found or if failure
+ */
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+ uint8_t *mac)
+{
+ struct mac_cache_entry *e = fwd_mac_cache_find(c, addr);
+
+ if (e)
+ memcpy(mac, e->mac, ETH_ALEN);
+ else
+ memcpy(mac, c->our_tap_mac, ETH_ALEN);
+
+ return !!e;
+}
+
+/**
+ * fwd_neigh_cache_init() - Initialize the neighbor ARP/NDP cache table
+ */
+void fwd_neigh_cache_init(void)
+{
+ struct mac_cache_table *t = &mac_cache;
+ struct mac_cache_entry *e;
+
+ memset(t, 0, sizeof(*t));
+ for (int i = 0; i < MAC_CACHE_SIZE; i++) {
+ e = &t->entries[i];
+ e->next = t->free;
+ t->free = e;
+ }
+}
/** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
*
diff --git a/fwd.h b/fwd.h
index 65c7c96..a0e8fbc 100644
--- a/fwd.h
+++ b/fwd.h
@@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
const struct flowside *ini, struct flowside *tgt);
uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
const struct flowside *ini, struct flowside *tgt);
+void fwd_neigh_mac_cache_alloc(const struct ctx *c,
+ const union inany_addr *addr, uint8_t *mac);
+void fwd_neigh_mac_cache_free(const struct ctx *c,
+ const union inany_addr *addr);
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+ uint8_t *mac);
+void fwd_neigh_cache_init(void);
#endif /* FWD_H */
diff --git a/netlink.c b/netlink.c
index 1faf3da..1e1ec43 100644
--- a/netlink.c
+++ b/netlink.c
@@ -131,6 +131,8 @@ int nl_neigh_subscr_init(struct ctx *c)
*/
void nl_neigh_subscr_handler(struct ctx *c)
{
+ union inany_addr addr;
+ uint8_t mac[ETH_ALEN];
struct nlmsghdr *nh;
char buf[NLBUFSIZ];
ssize_t n;
@@ -183,17 +185,20 @@ void nl_neigh_subscr_handler(struct ctx *c)
dstlen != sizeof(struct in6_addr))
continue;
- char abuf[INET6_ADDRSTRLEN];
+ if (!lladdr || lladdr_len != ETH_ALEN)
+ continue;
+
+ memcpy(mac, lladdr, ETH_ALEN);
if (dstlen == sizeof(struct in_addr))
- inet_ntop(AF_INET, dst, abuf, sizeof(abuf));
+ addr = inany_from_v4(*(struct in_addr *) dst);
else
- inet_ntop(AF_INET6, dst, abuf, sizeof(abuf));
+ addr.a6 = *(struct in6_addr *) dst;
if (nh->nlmsg_type == RTM_NEWNEIGH)
- debug("neigh: NEW %s lladdr_len=%zu", abuf, lladdr_len);
+ fwd_neigh_mac_cache_alloc(c, &addr, mac);
else
- debug("neigh: DEL %s", abuf);
+ fwd_neigh_mac_cache_free(c, &addr);
}
}
}
--
2.50.1