public inbox for passt-dev@passt.top
From: Jon Maloy <jmaloy@redhat.com>
To: sbrivio@redhat.com, dgibson@redhat.com,
	david@gibson.dropbear.id.au, jmaloy@redhat.com,
	passt-dev@passt.top
Subject: [PATCH v14 03/10] fwd: Add cache table for ARP/NDP contents
Date: Tue, 14 Oct 2025 22:55:14 -0400
Message-ID: <20251015025521.1449156-4-jmaloy@redhat.com>
In-Reply-To: <20251015025521.1449156-1-jmaloy@redhat.com>

Add a cache table to keep track of the contents of the kernel ARP
and NDP tables. The table is fed by the netlink-based neighbour
subscription function introduced earlier in this series.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>

---
v5: - Moved to earlier in series to reduce rebase conflicts
v6: - Squashed the hash list commit and the FIFO/LRU queue commit
    - Removed hash lookup. We now only use linear lookup in a
      linked list
    - Eliminated dynamic memory allocation.
    - Ensured there is only one call to clock_gettime()
    - Using MAC_ZERO instead of the previously dedicated definitions
v7: - NOW using MAC_ZERO where needed
    - I am still using linear back-off for empty cache entries. Even
      an incoming, flow-creating packet from a local host gives no
      guarantee that its MAC address is in the ARP table, so we must
      allow for a few new attempts at first possible occasions. Only
      after several failed lookups can we conclude that we probably
      never will succeed. Hence the back-off.
    - Fixed a bug that David inadvertently made me aware of: I only
      intended to set the initial expiry value to MAC_CACHE_RENEWAL
      when an ARP/NDP table lookup was successful.
    - Improved struct and function description comments.
v8: - Total re-design of table, adapting to the new, subscription
      based way of updating it.
v9: - Catering for MAC address change for an existing host.
v10: - Changes according to feedback from David Gibson
v12: - Changes according to feedback from David and Stefano
     - Added dummy entries for loopback and default GW addresses
v13: - Changes according to feedback and discussions with David
       and Stefano
v14: - Moved the call to nat_inbound() to a much more sensible
       place in netlink.c, as suggested by David.
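
For readers who want the shape of the table without the passt context,
here is a minimal standalone sketch of the same idea: a fixed pool of
entries, chained hash slots, and a free list, with no dynamic
allocation. It is an illustration under assumptions, not the code in
this patch: the names (entry, table_update() and so on), the tiny table
size, the 32-bit key standing in for union inany_addr, and the
multiplicative toy hash replacing the keyed siphash are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS	8		/* hypothetical size, must be a power of two */
#define POOL	(SLOTS / 2)	/* same slots-to-entries ratio as the patch */
static_assert((SLOTS & (SLOTS - 1)) == 0, "SLOTS must be a power of two");

struct entry {
	struct entry *next;	/* next entry in slot chain or free list */
	uint32_t key;		/* stands in for union inany_addr */
	uint8_t mac[6];
	bool permanent;		/* never altered or freed once set */
};

static struct entry pool[POOL];
static struct entry *slots[SLOTS];
static struct entry *free_list;

/* Toy hash for illustration only; the patch uses a keyed siphash */
static size_t slot_of(uint32_t key)
{
	return (key * 2654435761u) & (SLOTS - 1);
}

/* Empty all slots and thread every pool entry onto the free list */
static void table_init(void)
{
	size_t i;

	memset(slots, 0, sizeof(slots));
	memset(pool, 0, sizeof(pool));
	free_list = NULL;
	for (i = 0; i < POOL; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

/* Linear walk of one slot chain; NULL if the key is absent */
static struct entry *table_find(uint32_t key)
{
	struct entry *e = slots[slot_of(key)];

	while (e && e->key != key)
		e = e->next;
	return e;
}

/* Update an existing entry or take one from the free list.
 * Returns false only if the pool is exhausted.
 */
static bool table_update(uint32_t key, const uint8_t *mac, bool permanent)
{
	struct entry *e = table_find(key);
	size_t s;

	if (e) {
		if (!e->permanent)
			memcpy(e->mac, mac, sizeof(e->mac));
		return true;
	}
	e = free_list;
	if (!e)
		return false;
	free_list = e->next;
	s = slot_of(key);
	e->next = slots[s];
	slots[s] = e;
	e->key = key;
	memcpy(e->mac, mac, sizeof(e->mac));
	e->permanent = permanent;
	return true;
}

/* Unlink a non-permanent entry and return it to the free list */
static void table_free(uint32_t key)
{
	struct entry **prev = &slots[slot_of(key)], *e;

	for (e = *prev; e && e->key != key; e = e->next)
		prev = &e->next;
	if (!e || e->permanent)
		return;
	*prev = e->next;
	e->next = free_list;
	free_list = e;
}
```

A caller would run table_init() once, then table_update() on each
"new neighbour" event and table_free() on each "delete" event. An entry
inserted with permanent set survives both, which is how the blocker
entries for the loopback and gateway addresses are meant to behave.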
---
 fwd.c     | 217 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fwd.h     |   7 ++
 netlink.c |  10 ++-
 passt.c   |   1 +
 4 files changed, 233 insertions(+), 2 deletions(-)

diff --git a/fwd.c b/fwd.c
index 250cf56..f70e4fc 100644
--- a/fwd.c
+++ b/fwd.c
@@ -26,6 +26,7 @@
 #include "passt.h"
 #include "lineread.h"
 #include "flow_table.h"
+#include "netlink.h"
 
 /* Empheral port range: values from RFC 6335 */
 static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
@@ -33,6 +34,222 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1;
 
 #define PORT_RANGE_SYSCTL	"/proc/sys/net/ipv4/ip_local_port_range"
 
+#define NEIGH_TABLE_SLOTS    1024
+#define NEIGH_TABLE_SIZE     (NEIGH_TABLE_SLOTS / 2)
+static_assert((NEIGH_TABLE_SLOTS & (NEIGH_TABLE_SLOTS - 1)) == 0,
+	      "NEIGH_TABLE_SLOTS must be a power of two");
+
+/**
+ * struct neigh_table_entry - Entry in the ARP/NDP table
+ * @next:	Next entry in slot or free list
+ * @addr:	IP address of represented host
+ * @mac:	MAC address of represented host
+ * @permanent:	Entry cannot be altered or freed by notification
+ */
+struct neigh_table_entry {
+	struct neigh_table_entry *next;
+	union inany_addr addr;
+	uint8_t mac[ETH_ALEN];
+	bool permanent;
+};
+
+/**
+ * struct neigh_table - Cache of ARP/NDP table contents
+ * @entries:	Entries to be plugged into the hash slots when allocated
+ * @slots:	Hash table slots
+ * @free:	Linked list of unused entries
+ */
+struct neigh_table {
+	struct neigh_table_entry entries[NEIGH_TABLE_SIZE];
+	struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS];
+	struct neigh_table_entry *free;
+};
+
+static struct neigh_table neigh_table;
+
+/**
+ * neigh_table_slot() - Hash key to a number within the table range
+ * @c:		Execution context
+ * @key:	The key to be used for the hash
+ *
+ * Return: the resulting hash value
+ */
+static size_t neigh_table_slot(const struct ctx *c,
+			       const union inany_addr *key)
+{
+	struct siphash_state st = SIPHASH_INIT(c->hash_secret);
+	uint32_t i;
+
+	inany_siphash_feed(&st, key);
+	i = siphash_final(&st, sizeof(*key), 0);
+
+	return ((size_t)i) & (NEIGH_TABLE_SLOTS - 1);
+}
+
+/**
+ * fwd_neigh_table_find() - Find a MAC table entry
+ * @c:		Execution context
+ * @addr:	Neighbour address to be used as key for the lookup
+ *
+ * Return: the matching entry, if found. Otherwise NULL
+ */
+static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c,
+						      const union inany_addr *addr)
+{
+	size_t slot = neigh_table_slot(c, addr);
+	struct neigh_table_entry *e = neigh_table.slots[slot];
+
+	while (e && !inany_equals(&e->addr, addr))
+		e = e->next;
+
+	return e;
+}
+
+/**
+ * fwd_neigh_table_update() - Allocate or update neighbour table entry
+ * @c:		Execution context
+ * @addr:	IP address used to determine insertion slot and store in entry
+ * @mac:	The MAC address associated with the neighbour address
+ * @permanent:	Created entry cannot be altered or freed
+ */
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
+			    const uint8_t *mac, bool permanent)
+{
+	struct neigh_table *t = &neigh_table;
+	struct neigh_table_entry *e;
+	size_t slot;
+
+	/* MAC address might change sometimes */
+	e = fwd_neigh_table_find(c, addr);
+	if (e) {
+		if (!e->permanent)
+			memcpy(e->mac, mac, ETH_ALEN);
+		return;
+	}
+
+	e = t->free;
+	if (!e) {
+		debug("Failed to allocate neighbour table entry");
+		return;
+	}
+	t->free = e->next;
+	slot = neigh_table_slot(c, addr);
+	e->next = t->slots[slot];
+	t->slots[slot] = e;
+
+	memcpy(&e->addr, addr, sizeof(*addr));
+	memcpy(e->mac, mac, ETH_ALEN);
+	e->permanent = permanent;
+}
+
+/**
+ * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list
+ * @c:		Execution context
+ * @addr:	IP address used to find the slot for the entry
+ */
+void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr)
+{
+	size_t slot = neigh_table_slot(c, addr);
+	struct neigh_table *t = &neigh_table;
+	struct neigh_table_entry *e, **prev;
+
+	prev = &t->slots[slot];
+	e = t->slots[slot];
+	while (e && !inany_equals(&e->addr, addr)) {
+		prev = &e->next;
+		e = e->next;
+	}
+
+	if (!e || e->permanent)
+		return;
+
+	*prev = e->next;
+	e->next = t->free;
+	t->free = e;
+	memset(&e->addr, 0, sizeof(*addr));
+	memset(e->mac, 0, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table
+ * @c:		Execution context
+ * @addr:	Neighbour IP address used as lookup key
+ * @mac:	Buffer for returned MAC address
+ */
+void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac)
+{
+	const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr);
+
+	if (e)
+		memcpy(mac, e->mac, ETH_ALEN);
+	else
+		memcpy(mac, c->our_tap_mac, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_table_init() - Initialize the neighbour table
+ * @c:		Execution context
+ */
+void fwd_neigh_table_init(const struct ctx *c)
+{
+	union inany_addr mhl = inany_from_v4(c->ip4.map_host_loopback);
+	union inany_addr mga = inany_from_v4(c->ip4.map_guest_addr);
+	union inany_addr ggw = inany_from_v4(c->ip4.guest_gw);
+	struct neigh_table *t = &neigh_table;
+	struct neigh_table_entry *e;
+	int i;
+
+	memset(t, 0, sizeof(*t));
+
+	for (i = 0; i < NEIGH_TABLE_SIZE; i++) {
+		e = &t->entries[i];
+		e->next = t->free;
+		t->free = e;
+	}
+
+	/* Blocker entries to stop events from hosts using these addresses */
+	if (!inany_is_unspecified4(&mhl))
+		fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true);
+
+	if (!inany_is_unspecified4(&ggw) && !c->no_map_gw)
+		fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true);
+
+	if (!inany_is_unspecified4(&mga) && !inany_equals(&mhl, &mga)) {
+		uint8_t mac[ETH_ALEN];
+		int rc;
+
+		rc = nl_link_get_mac(nl_sock, c->ifi4, mac);
+		if (rc < 0) {
+			debug("Couldn't get ip4 MAC addr: %s", strerror_(-rc));
+			memcpy(mac, c->our_tap_mac, ETH_ALEN);
+		}
+		fwd_neigh_table_update(c, &mga, mac, true);
+	}
+
+	mhl = *(union inany_addr *)&c->ip6.map_host_loopback;
+	mga = *(union inany_addr *)&c->ip6.map_guest_addr;
+	ggw = *(union inany_addr *)&c->ip6.guest_gw;
+
+	if (!inany_is_unspecified6(&mhl))
+		fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true);
+
+	if (!inany_is_unspecified6(&ggw) && !c->no_map_gw)
+		fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true);
+
+	if (!inany_is_unspecified6(&mga) && !inany_equals(&mhl, &mga)) {
+		uint8_t mac[ETH_ALEN];
+		int rc;
+
+		rc = nl_link_get_mac(nl_sock, c->ifi6, mac);
+		if (rc < 0) {
+			debug("Couldn't get ip6 MAC addr: %s", strerror_(-rc));
+			memcpy(mac, c->our_tap_mac, ETH_ALEN);
+		}
+		fwd_neigh_table_update(c, &mga, mac, true);
+	}
+}
+
 /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
  *
  * Work out what ports the host thinks are emphemeral and record it for later
diff --git a/fwd.h b/fwd.h
index 65c7c96..352f3b5 100644
--- a/fwd.h
+++ b/fwd.h
@@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 			    const struct flowside *ini, struct flowside *tgt);
 uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 			  const struct flowside *ini, struct flowside *tgt);
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
+			    const uint8_t *mac, bool permanent);
+void fwd_neigh_table_free(const struct ctx *c,
+			  const union inany_addr *addr);
+void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac);
+void fwd_neigh_table_init(const struct ctx *c);
 
 #endif /* FWD_H */
diff --git a/netlink.c b/netlink.c
index 186383c..85643bd 100644
--- a/netlink.c
+++ b/netlink.c
@@ -1123,10 +1123,10 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
 	char ip_str[INET6_ADDRSTRLEN];
 	char mac_str[ETH_ADDRSTRLEN];
 	const uint8_t *lladdr = NULL;
+	union inany_addr addr, daddr;
 	const void *dst = NULL;
 	size_t lladdr_len = 0;
 	uint8_t mac[ETH_ALEN];
-	union inany_addr addr;
 	size_t dstlen = 0;
 
 	if (nh->nlmsg_type == NLMSG_DONE)
@@ -1183,15 +1183,20 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
 		warn("netlink: wrong address length in AF_INET6 notification");
 		return;
 	}
+	/* We only handle guest-side visible addresses */
 	inany_from_af(&addr, ndm->ndm_family, dst);
-	inany_ntop(dst, ip_str, sizeof(ip_str));
+	if (!nat_inbound(c, &addr, &daddr))
+		return;
+	inany_ntop(&daddr, ip_str, sizeof(ip_str));
 
 	if (nh->nlmsg_type == RTM_DELNEIGH) {
 		trace("neigh table delete: %s", ip_str);
+		fwd_neigh_table_free(c, &daddr);
 		return;
 	}
 	if (!(ndm->ndm_state & NUD_VALID)) {
 		trace("neigh table: invalid state for %s", ip_str);
+		fwd_neigh_table_free(c, &daddr);
 		return;
 	}
 	if (nh->nlmsg_type != RTM_NEWNEIGH || !lladdr) {
@@ -1204,6 +1209,7 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
 	memcpy(mac, lladdr, ETH_ALEN);
 	eth_ntop(mac, mac_str, sizeof(mac_str));
 	trace("neigh table update: %s / %s", ip_str, mac_str);
+	fwd_neigh_table_update(c, &daddr, mac, false);
 }
 
 /**
diff --git a/passt.c b/passt.c
index e21d6ba..98fc430 100644
--- a/passt.c
+++ b/passt.c
@@ -324,6 +324,7 @@ int main(int argc, char **argv)
 
 	pcap_init(&c);
 
+	fwd_neigh_table_init(&c);
 	nl_neigh_notify_init(&c);
 
 	if (!c.foreground) {
-- 
2.50.1

