From: Jon Maloy
To: sbrivio@redhat.com, dgibson@redhat.com, david@gibson.dropbear.id.au, jmaloy@redhat.com, passt-dev@passt.top
Subject: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents
Date: Thu, 2 Oct 2025 20:34:05 -0400
Message-ID: <20251003003412.588801-3-jmaloy@redhat.com>
In-Reply-To: <20251003003412.588801-1-jmaloy@redhat.com>
References: <20251003003412.588801-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

We add a cache table to keep track of the contents of the kernel ARP
and NDP tables. The table is fed from the just-introduced, netlink-based
neighbour subscription function. The new table eliminates the need for
explicit netlink calls to find a host's MAC address.

Signed-off-by: Jon Maloy
---
v5: - Moved to earlier in series to reduce rebase conflicts
v6: - Squashed the hash list commit and the FIFO/LRU queue commit
    - Removed hash lookup. We now only use linear lookup in a
      linked list
    - Eliminated dynamic memory allocation.
    - Ensured there is only one call to clock_gettime()
    - Using MAC_ZERO instead of the previously dedicated definitions
v7: - NOW using MAC_ZERO where needed
    - I am still using linear back-off for empty cache entries. Even
      an incoming, flow-creating packet from a local host gives no
      guarantee that its MAC address is in the ARP table, so we must
      allow for a few new attempts at first possible occasions. Only
      after several failed lookups can we conclude that we probably
      never will succeed. Hence the back-off.
    - Fixed a bug that David inadvertently made me aware of: I only
      intended to set the initial expiry value to MAC_CACHE_RENEWAL
      when an ARP/NDP table lookup was successful.
    - Improved struct and function description comments.
v8: - Total re-design of table, adapting to the new, subscription
      based way of updating it.
v9: - Catering for MAC address change for an existing host.
v10: - Changes according to feedback from David Gibson
v12: - Changes according to feedback from David and Stefano
     - Added dummy entries for loopback and default GW addresses
---
 conf.c    |   1 +
 fwd.c     | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fwd.h     |   7 +++
 netlink.c |   7 ++-
 4 files changed, 195 insertions(+), 2 deletions(-)

diff --git a/conf.c b/conf.c
index 66b9e63..3224ee6 100644
--- a/conf.c
+++ b/conf.c
@@ -2133,6 +2133,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		c->udp.fwd_out.mode = fwd_default;
 
 	fwd_scan_ports_init(c);
+	fwd_neigh_table_init(c);
 
 	if (!c->quiet)
 		conf_print(c);
diff --git a/fwd.c b/fwd.c
index 250cf56..c34bb1c 100644
--- a/fwd.c
+++ b/fwd.c
@@ -33,6 +33,188 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1;
 
 #define PORT_RANGE_SYSCTL	"/proc/sys/net/ipv4/ip_local_port_range"
 
+#define NEIGH_TABLE_SLOTS 1024  /* Must be power of two */
+#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2)
+static_assert((NEIGH_TABLE_SLOTS & (NEIGH_TABLE_SLOTS - 1)) == 0,
+	      "NEIGH_TABLE_SLOTS must be a power of two");
+
+/**
+ * struct neigh_table_entry - Entry in the ARP/NDP table
+ * @next:	Next entry in slot or free list
+ * @addr:	IP address of represented host
+ * @mac:	MAC address of represented host
+ */
+struct neigh_table_entry {
+	struct neigh_table_entry *next;
+	union inany_addr addr;
+	uint8_t mac[ETH_ALEN];
+};
+
+/**
+ * struct neigh_table - Cache of ARP/NDP table contents
+ * @entries:	Entries to be plugged into the hash slots when allocated
+ * @slots:	Hash table slots
+ * @free:	Linked list of unused entries
+ */
+struct neigh_table {
+	struct neigh_table_entry entries[NEIGH_TABLE_SIZE];
+	struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS];
+	struct neigh_table_entry *free;
+};
+
+static struct neigh_table neigh_table;
+
+/**
+ * neigh_table_slot() - Hash key to a number within the table range
+ * @c:		Execution context
+ * @key:	The key to be used for the hash
+ *
+ * Return: The resulting hash value
+ */
+static size_t neigh_table_slot(const struct ctx *c,
+			       const union inany_addr *key)
+{
+	struct siphash_state st = SIPHASH_INIT(c->hash_secret);
+	uint32_t i;
+
+	inany_siphash_feed(&st, key);
+	i = siphash_final(&st, sizeof(*key), 0);
+
+	return ((size_t)i) & (NEIGH_TABLE_SIZE - 1);
+}
+
+/**
+ * fwd_neigh_table_find() - Find a MAC table entry
+ * @c:		Execution context
+ * @addr:	Neighbour address to be used as key for the lookup
+ *
+ * Return: The matching entry, if found. Otherwise NULL.
+ */
+static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c,
+						      const union inany_addr *addr)
+{
+	size_t slot = neigh_table_slot(c, addr);
+	struct neigh_table_entry *e = neigh_table.slots[slot];
+
+	while (e && !inany_equals(&e->addr, addr))
+		e = e->next;
+
+	return e;
+}
+
+/**
+ * fwd_neigh_table_update() - Allocate or update neighbour table entry
+ * @c:		Execution context
+ * @addr:	IP address used to determine insertion slot and store in entry
+ * @mac:	The MAC address associated with the neighbour address
+ */
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
+			    const uint8_t *mac)
+{
+	struct neigh_table *t = &neigh_table;
+	struct neigh_table_entry *e;
+	ssize_t slot;
+
+	/* MAC address might change sometimes */
+	e = fwd_neigh_table_find(c, addr);
+	if (e) {
+		if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw)))
+			return;
+		if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw))
+			return;
+
+		memcpy(e->mac, mac, ETH_ALEN);
+		return;
+	}
+
+	e = t->free;
+	if (!e) {
+		debug("Failed to allocate neighbour table entry");
+		return;
+	}
+	t->free = e->next;
+
+	slot = neigh_table_slot(c, addr);
+	e->next = t->slots[slot];
+	t->slots[slot] = e;
+
+	memcpy(&e->addr, addr, sizeof(*addr));
+	memcpy(e->mac, mac, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list
+ * @c:		Execution context
+ * @addr:	IP address used to find the slot for the entry
+ */
+void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr)
+{
+	ssize_t slot = neigh_table_slot(c, addr);
+	struct neigh_table *t = &neigh_table;
+	struct neigh_table_entry *e, **prev;
+
+	prev = &t->slots[slot];
+	e = t->slots[slot];
+	while (e && !inany_equals(&e->addr, addr)) {
+		prev = &e->next;
+		e = e->next;
+	}
+	if (!e)
+		return;
+
+	*prev = e->next;
+	e->next = t->free;
+	t->free = e;
+	memset(&e->addr, 0, sizeof(*addr));
+	memset(e->mac, 0, ETH_ALEN);
+}
+
+/**
+ * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table
+ * @c:		Execution context
+ * @addr:	Neighbour IP address used as lookup key
+ * @mac:	Buffer for Ethernet MAC to return, found or default value.
+ *
+ * Return: true if real MAC found, false if not found or if failure
+ */
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac)
+{
+	const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr);
+
+	if (e)
+		memcpy(mac, e->mac, ETH_ALEN);
+	else
+		memcpy(mac, c->our_tap_mac, ETH_ALEN);
+
+	return !!e;
+}
+
+/**
+ * fwd_neigh_table_init() - Initialize the neighbour table
+ * @c:		Execution context
+ */
+void fwd_neigh_table_init(const struct ctx *c)
+{
+	struct neigh_table *t = &neigh_table;
+	const uint8_t *omac = c->our_tap_mac;
+	struct neigh_table_entry *e;
+	int i;
+
+	memset(t, 0, sizeof(*t));
+	for (i = 0; i < NEIGH_TABLE_SIZE; i++) {
+		e = &t->entries[i];
+		e->next = t->free;
+		t->free = e;
+	}
+
+	/* These addresses must always map to our own MAC address */
+	fwd_neigh_table_update(c, &inany_loopback4, omac);
+	fwd_neigh_table_update(c, &inany_loopback6, omac);
+	fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac);
+	fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac);
+}
+
 /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
  *
  * Work out what ports the host thinks are emphemeral and record it for later
diff --git a/fwd.h b/fwd.h
index 65c7c96..6ca743c 100644
--- a/fwd.h
+++ b/fwd.h
@@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 			     const struct flowside *ini, struct flowside *tgt);
 uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 			  const struct flowside *ini, struct flowside *tgt);
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
+			    const uint8_t *mac);
+void fwd_neigh_table_free(const struct ctx *c,
+			  const union inany_addr *addr);
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac);
+void fwd_neigh_table_init(const struct ctx *c);
 
 #endif /* FWD_H */
diff --git a/netlink.c b/netlink.c
index 3fe2fdd..4be5fcf 100644
--- a/netlink.c
+++ b/netlink.c
@@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
 	inany_from_af(&addr, ndm->ndm_family, dst);
 	inany_ntop(dst, ip_str, sizeof(ip_str));
 
-	if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID)
+	if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) {
 		trace("neigh table update: %s / %s", ip_str, mac_str);
-	else
+		fwd_neigh_table_update(c, &addr, mac);
+	} else {
 		trace("neigh table delete: %s / %s", ip_str, mac_str);
+		fwd_neigh_table_free(c, &addr);
+	}
 }
 
 /**
-- 
2.50.1
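
For readers who want to experiment with the data structure outside the passt tree, the scheme described above (a statically allocated pool of entries, a chained hash table, and a free list) can be sketched in standalone C. This is only an illustration under stated assumptions, not the code of the patch: the names table_init(), table_update(), table_free() and table_mac_get(), the uint32_t key and the toy multiplicative hash are all hypothetical stand-ins for union inany_addr and the keyed siphash used in fwd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN   6
#define SLOTS     8             /* hypothetical size, must be a power of two */
#define POOL_SIZE (SLOTS / 2)   /* fixed number of entries, no malloc() */
static_assert((SLOTS & (SLOTS - 1)) == 0, "SLOTS must be a power of two");

struct entry {
	struct entry *next;     /* next entry in hash chain or free list */
	uint32_t addr;          /* assumption: stand-in for union inany_addr */
	uint8_t mac[MAC_LEN];
};

static struct entry pool[POOL_SIZE];
static struct entry *slots[SLOTS];
static struct entry *free_list;

/* Toy multiplicative hash, an assumption replacing the keyed siphash */
static size_t slot_of(uint32_t addr)
{
	return (size_t)((addr * 2654435761u) >> 16) & (SLOTS - 1);
}

/* Empty all slots and thread every pool entry onto the free list */
static void table_init(void)
{
	size_t i;

	memset(slots, 0, sizeof(slots));
	free_list = NULL;
	for (i = 0; i < POOL_SIZE; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

/* Walk the chain of the slot the address hashes to */
static struct entry *table_find(uint32_t addr)
{
	struct entry *e = slots[slot_of(addr)];

	while (e && e->addr != addr)
		e = e->next;

	return e;
}

/* Insert a new entry or refresh the MAC; false if the pool is exhausted */
static bool table_update(uint32_t addr, const uint8_t *mac)
{
	struct entry *e = table_find(addr);
	size_t s;

	if (!e) {
		e = free_list;
		if (!e)
			return false;
		free_list = e->next;

		s = slot_of(addr);
		e->next = slots[s];
		slots[s] = e;
		e->addr = addr;
	}
	memcpy(e->mac, mac, MAC_LEN);

	return true;
}

/* Unlink the entry from its chain and return it to the free list */
static void table_free(uint32_t addr)
{
	struct entry **prev = &slots[slot_of(addr)], *e = *prev;

	while (e && e->addr != addr) {
		prev = &e->next;
		e = e->next;
	}
	if (!e)
		return;

	*prev = e->next;
	e->next = free_list;
	free_list = e;
}

/* Copy the MAC into the caller's buffer if the address is known */
static bool table_mac_get(uint32_t addr, uint8_t *mac)
{
	const struct entry *e = table_find(addr);

	if (e)
		memcpy(mac, e->mac, MAC_LEN);

	return e != NULL;
}
```

In this sketch, a caller would run table_init() once, call table_update() for each "new neighbour" notification and table_free() for each deletion, and use table_mac_get() on the data path. Because entries only move between the chains and the free list, no allocation happens after start-up, and an update fails cleanly once the pool is exhausted.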