From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=F+QAcnVL; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 865735A0619 for ; Tue, 07 Oct 2025 00:40:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759790453; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GuU6tctkAS8XxWvQGv6oBsecfNU+Opwy9ZxDVeLzFLc=; b=F+QAcnVLjPe4y48n7rGPn2mbb6ER2xcZvPjmHaeDfFGqbfrTjiE4W1CMicEPIKgKzUgEmz LP+WaxCvV3qD18d+v8AJTpRuNJGsH1jbaWcWDJEL9q8W/2G4ri/yYW4dKa9t7EkfXqT3ED gekM45XBXpZCCbyNFY49RyUfR92NfzM= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-163-8uj67l2kMRKU46dC09OwaA-1; Mon, 06 Oct 2025 18:40:52 -0400 X-MC-Unique: 8uj67l2kMRKU46dC09OwaA-1 X-Mimecast-MFC-AGG-ID: 8uj67l2kMRKU46dC09OwaA_1759790451 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-46e377d8c80so20383975e9.3 for ; Mon, 06 Oct 2025 15:40:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759790451; x=1760395251; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=GuU6tctkAS8XxWvQGv6oBsecfNU+Opwy9ZxDVeLzFLc=; b=eyhO4jgVGDvzw0DgEaIKQepsgoy0fGSN//xDinCs1jfPQQNOlMxim6GcShYErX0GNb MYPllsXyNIDVoh6XFD+RJQdiWlgzPaprbZern8A6XNUpQ775iPRufMSQtxTiR8l7QMxD u8TL1WiKYg3UAfs4CWO4nTyIomj+lYYRQ5tWjg1lUVmDiNddAlghE+LjVnqwkAFcpbmO Yk358gUL6vd6rX4d8K52+yj6pVtJfKlC8JmKgA/AHvwmJtwdXsILR0l3MYEKaO9OohP5 K+dCSHhAJA+h6o1lY8ZoddqsAFXK5WSjejp5GHq3W4acui2/+7pJcLnqg8WGmqfdeElv VqLQ== X-Forwarded-Encrypted: i=1; AJvYcCXNCKRbyWUbMzywm371XrXVDKNWeQAFxG38khPQJ/jM0AUIbp9oRMrNHnNF7U8Gi4ve5OG08dQvtag=@passt.top X-Gm-Message-State: AOJu0YwXg7cWG89ILKegSuMUatJPe3+JELgI73VJV3ERPiJ6ZvyEI4t3 oHOZzJ6BsLeGQtB+blOVnxRyxmVREcJ/b3xv4tFwPrMEhH+2dPwH4xpSBWwLSSpnG7U0iwyyGYV FwQjTiH7vlFXle32ybiWFxiP2eTa1BFNRGvw9ChqUuAs9o2ukFdILAQ== X-Gm-Gg: ASbGnctOHdkORUkN2NXroqP9LcNI9irbICyjSihnWDOn877WKEOlXobrN62eCbGLLEC NAD4zBJJDByT9CypLL459ZTokIWAxzD/f49mew5Ha27AvSFY4dKyu5faEY7RW2/37NahjXxTHDQ uu+oRx4R6WsilZfg8r8SdUSD8PMvXcww5/Q0ejkvzozfnMmpdmg3QDhDTfFKGt25DYj87ui5Byj KgsEAz2XTj2pqgPn27ziP1gvLAV6kMfXAlDQizyUb/sOiN9ulfT3jW/CJHVpK0jskDk9Zh1fyV6 ncf9lR7v2ehC4KWsQgy9nF+EKdeRnBszY36d1DqY5S9jt6FJoS59pAyd X-Received: by 2002:a05:600c:8b45:b0:46e:3dc2:ebac with SMTP id 5b1f17b1804b1-46e71168aa1mr95869535e9.27.1759790450926; Mon, 06 Oct 2025 15:40:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFdltM+frT0Bm7Q8WgxmCaoudaBrweDwS0Z0vXhZLUovDbNuK0ETyDDAzdtxqXCsqlLEfVIqA== X-Received: by 2002:a05:600c:8b45:b0:46e:3dc2:ebac with SMTP id 5b1f17b1804b1-46e71168aa1mr95869375e9.27.1759790450439; Mon, 06 Oct 2025 15:40:50 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46fa3acf0c1sm2843495e9.9.2025.10.06.15.40.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Oct 2025 15:40:49 -0700 (PDT) Date: Tue, 7 Oct 2025 00:40:48 +0200 From: Stefano Brivio To: Jon Maloy Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents Message-ID: <20251007004048.6ce0c5f8@elisabeth> In-Reply-To: <20251003003412.588801-3-jmaloy@redhat.com> References: <20251003003412.588801-1-jmaloy@redhat.com> <20251003003412.588801-3-jmaloy@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 1rJjXI-t7hEf6Wr-q0l7xlpiYSPpwq-sgUincnTaDMQ_1759790451 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: YAUBWPJ6VNLJSZ3I3DFIBOZ53WAZIPRU X-Message-ID-Hash: YAUBWPJ6VNLJSZ3I3DFIBOZ53WAZIPRU X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: dgibson@redhat.com, david@gibson.dropbear.id.au, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 2 Oct 2025 20:34:05 -0400 Jon Maloy wrote: > We add a cache table to keep track of the contents of the kernel ARP > and NDP tables. The table is fed from the just introduced netlink based > neigbour subscription function. The new table eliminates the need for > explicit netlink calls to find a host's MAC address. > > Signed-off-by: Jon Maloy > > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions > v7: - NOW using MAC_ZERO where needed > - I am still using linear back-off for empty cache entries. Even > an incoming, flow-creating packet from a local host gives no > guarantee that its MAC address is in the ARP table, so we must > allow for a few new attempts at first possible occasions. Only > after several failed lookups can we conclude that we probably > never will succeed. Hence the back-off. > - Fixed a bug that David inadvertently made me aware of: I only > intended to set the initial expiry value to MAC_CACHE_RENEWAL > when an ARP/NDP table lookup was successful. > - Improved struct and function description comments. > v8: - Total re-design of table, adapting to the new, subscription > based way of updating it. > v9: - Catering for MAC address change for an existing host. > v10: - Changes according to feedback from David Gibson > v12: - Changes according to feedback from David and Stefano > - Added dummy entries for loopback and default GW addresses > --- > conf.c | 1 + > fwd.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 7 +++ > netlink.c | 7 ++- > 4 files changed, 195 insertions(+), 2 deletions(-) > > diff --git a/conf.c b/conf.c > index 66b9e63..3224ee6 100644 > --- a/conf.c > +++ b/conf.c > @@ -2133,6 +2133,7 @@ void conf(struct ctx *c, int argc, char **argv) > c->udp.fwd_out.mode = fwd_default; > > fwd_scan_ports_init(c); > + fwd_neigh_table_init(c); > > if (!c->quiet) > conf_print(c); > diff --git a/fwd.c b/fwd.c > index 250cf56..c34bb1c 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -33,6 +33,188 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1; > > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > > +#define NEIGH_TABLE_SLOTS 1024 /* Must be power of two */ > +#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2) > +static_assert((NEIGH_TABLE_SLOTS & (NEIGH_TABLE_SLOTS - 1)) == 0, > + "NEIGH_TABLE_SLOTS must be a power of two"); > + > +/** > + * struct neigh_table_entry - Entry in the ARP/NDP table Nit: excess whitespace before "Entry". > + * @next: Next entry in slot or free list > + * @addr: IP address of represented host > + * @mac: MAC address of represented host > + */ > +struct neigh_table_entry { > + struct neigh_table_entry *next; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > +}; > + > +/** > + * struct neigh_table - Cache of ARP/NDP table contents Same here. > + * @entries: Entries to be plugged into the hash slots when allocated > + * @slots: Hash table slots > + * @free: Linked list of unused entries > + */ > +struct neigh_table { > + struct neigh_table_entry entries[NEIGH_TABLE_SIZE]; > + struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS]; > + struct neigh_table_entry *free; > +}; > + > +static struct neigh_table neigh_table; > + > +/** > + * neigh_table_slot() - Hash key to a number within the table range > + * @c: Execution context > + * @key: The key to be used for the hash > + * > + * Return: The resulting hash value Nit: s/The/the/, it's part of the same statement / sentence. > + */ > +static size_t neigh_table_slot(const struct ctx *c, > + const union inany_addr *key) > +{ > + struct siphash_state st = SIPHASH_INIT(c->hash_secret); > + uint32_t i; > + > + inany_siphash_feed(&st, key); > + i = siphash_final(&st, sizeof(*key), 0); > + > + return ((size_t)i) & (NEIGH_TABLE_SIZE - 1); > +} > + > +/** > + * fwd_neigh_table_find() - Find a MAC table entry > + * @c: Execution context > + * @addr: Neighbour address to be used as key for the lookup > + * > + * Return: The matching entry, if found. Otherwise NULL. Same here. > + */ > +static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c, > + const union inany_addr *addr) > +{ > + size_t slot = neigh_table_slot(c, addr); > + struct neigh_table_entry *e = neigh_table.slots[slot]; > + > + while (e && !inany_equals(&e->addr, addr)) > + e = e->next; > + > + return e; > +} > + > +/** > + * fwd_neigh_table_update() - Allocate or update neighbour table entry > + * @c: Execution context > + * @addr: IP address used to determine insertion slot and store in entry > + * @mac: The MAC address associated with the neighbour address > + */ > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, > + const uint8_t *mac) > +{ > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e; > + ssize_t slot; > + > + /* MAC address might change sometimes */ > + e = fwd_neigh_table_find(c, addr); > + if (e) { > + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw))) > + return; > + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw)) > + return; > + > + memcpy(e->mac, mac, ETH_ALEN); > + return; > + } > + > + e = t->free; > + if (!e) { > + debug("Failed to allocate neighbour table entry"); > + return; > + } > + t->free = e->next; > + > + slot = neigh_table_slot(c, addr); > + e->next = t->slots[slot]; > + t->slots[slot] = e; > + > + memcpy(&e->addr, addr, sizeof(*addr)); > + memcpy(e->mac, mac, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list > + * @c: Execution context > + * @addr: IP address used to find the slot for the entry > + */ > +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr) > +{ > + ssize_t slot = neigh_table_slot(c, addr); > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e, **prev; > + > + prev = &t->slots[slot]; > + e = t->slots[slot]; > + while (e && !inany_equals(&e->addr, addr)) { > + prev = &e->next; > + e = e->next; > + } > + if (!e) > + return; > + > + *prev = e->next; > + e->next = t->free; > + t->free = e; > + memset(&e->addr, 0, sizeof(*addr)); > + memset(e->mac, 0, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table > + * @c: Execution context > + * @addr: Neighbour IP address used as lookup key > + * @mac: Buffer for Ethernet MAC to return, found or default value. Nit: trailing '.' The whole description isn't really clear to me. It sounds like there are three possible alternatives, that is, "return", "found", or "default". Maybe simply: Buffer for returned MAC address ? > + * > + * Return: true if real MAC found, false if not found or if failure This is a MAC address, not the media access control itself (you can't really... find it). > + */ > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) As I pointed out on v11: you seem to be ignoring the return value everywhere in this series, so returning a value here seems to be a left-over. I'm not reviewing past this point. > +{ > + const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); > + > + if (e) > + memcpy(mac, e->mac, ETH_ALEN); > + else > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + > + return !!e; > +} > + > +/** > + * fwd_neigh_table_init() - Initialize the neighbour table > + * @c: Execution context > + */ > +void fwd_neigh_table_init(const struct ctx *c) > +{ > + struct neigh_table *t = &neigh_table; > + const uint8_t *omac = c->our_tap_mac; > + struct neigh_table_entry *e; > + int i; > + > + memset(t, 0, sizeof(*t)); > + for (i = 0; i < NEIGH_TABLE_SIZE; i++) { > + e = &t->entries[i]; > + e->next = t->free; > + t->free = e; > + } > + > + /* These addresses must always map to our own MAC address */ > + fwd_neigh_table_update(c, &inany_loopback4, omac); > + fwd_neigh_table_update(c, &inany_loopback6, omac); > + fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac); > + fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac); > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral > * > * Work out what ports the host thinks are emphemeral and record it for later > diff --git a/fwd.h b/fwd.h > index 65c7c96..6ca743c 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, > + const uint8_t *mac); > +void fwd_neigh_table_free(const struct ctx *c, > + const union inany_addr *addr); > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_neigh_table_init(const struct ctx *c); > > #endif /* FWD_H */ > diff --git a/netlink.c b/netlink.c > index 3fe2fdd..4be5fcf 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > inany_from_af(&addr, ndm->ndm_family, dst); > inany_ntop(dst, ip_str, sizeof(ip_str)); > > - if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) > + if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) { > trace("neigh table update: %s / %s", ip_str, mac_str); > - else > + fwd_neigh_table_update(c, &addr, mac); > + } else { > trace("neigh table delete: %s / %s", ip_str, mac_str); > + fwd_neigh_table_free(c, &addr); > + } > } > > /** -- Stefano