From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ZQiaSYNI; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id BB3BE5A0275 for ; Tue, 30 Sep 2025 01:58:48 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759190327; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IpT/GukUukhnO0IZAfs8gEzObt07k4OaoTOuT8W6vq0=; b=ZQiaSYNIkZUqxrnAg4hHs9t7DP0smTOPFcdeG8s9xOmbaOrMCg5Cteaq9A//VDbWhQGClm I3PRhzijJSQ3necEGIP+MmzvHlri0KA929m8jBYBkeCITashXO78yeUX9Wens8MIjfteCx qYgOwNOf9hyQEjlcmkEuv67vtVEtfQo= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-78-0IsIQXjgONWVbkPp7G05dw-1; Mon, 29 Sep 2025 19:58:46 -0400 X-MC-Unique: 0IsIQXjgONWVbkPp7G05dw-1 X-Mimecast-MFC-AGG-ID: 0IsIQXjgONWVbkPp7G05dw_1759190325 Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-3ee888281c3so5461334f8f.3 for ; Mon, 29 Sep 2025 16:58:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759190325; x=1759795125; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=IpT/GukUukhnO0IZAfs8gEzObt07k4OaoTOuT8W6vq0=; b=XHfHoeI24CMNBGF20jYp8EAbxCty8iwlYRVNexGm4laxfJU98nhQrofs2KCBkQeHf4 NHUWvUft78QGd+RuSLNAkmEIHVUr22ychjS2kuL/6GGbpo7i38364yXVEBqMYpZSov/k UhZ9MKVO2vsJQSPcRilMC+GKSqw2wMSPa5xNpO3/zADVhQuIp3zuZ040O8X1Ses3QxPI xvWtTso/WYmhAgpIEypY29pCzNRNm+ZUTaAwsuJg2UAWJ9kcdYKimo82zP5w9zwjQ/fY Ge9ZjS7X1fYOcxhLU2uaFKOi9TZUyBq40HhykbaxnO8XgDhgjn/BkJFTYlzIOciomlB6 N20Q== X-Forwarded-Encrypted: i=1; AJvYcCXxjqIBXcUwY4rke2m8i4P504u+cjGcJffGqWt/6t1drMPGZz4NmOcBDDziqJJwFnk3IhAtQA23HUo=@passt.top X-Gm-Message-State: AOJu0Yxh4L2hvGaz5f5UXbBrJkLwfuVsilbqdVRj5YzRFTZLloMt3H6O 43BnG4dGadU+92g8AH28O0wZvywrwxjbV9NN9fnZRLq7EtBpBGPNmwiRFeihKrdMeYYOOTbAZRz DLBtGHUOJSHxml2Ok5aPQE68tuhLjF4VNQnnnKpzf04SqPwbyqHB1JQ== X-Gm-Gg: ASbGncumF3SbZOPvK+E8/bCmt9u1LzoDiIFSrLQ0oXOdZfTpVBfS29kAz/PqGQZTs+6 gHRGyXC+ioCqmHyDRja++rViik+Oh/6LNwzKsPhZ8Rtm8LVFLcSZtfKD27Fi/0TSAPPjcK/7WMH 3BIQsBhKytzdAkZoIAnNXZbxvN1WEdX9OpSsTAOlsPLru+TWKh5cSawcrHfOuifiQTCE6Lo5lq9 vT93fc0TAko4iy3inA8fy95nG0cAgCpFwk7iuCckpRa5N/cY16YRMHF2qr3m+PM08m9zv8kJAGC GcGyFioVHd8EVk5txQohDPjfQal3dpjz5u9xfhpOPxYa4h+J1QHfowiLnAvzsnlysJPT X-Received: by 2002:a05:6000:2906:b0:3ec:db8b:cbf1 with SMTP id ffacd0b85a97d-40e47831113mr15048535f8f.24.1759190324621; Mon, 29 Sep 2025 16:58:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFdoE9uXJYEQk4w3dxq+UcqzrI9YXUE2dnwMmZIyM8MwiteHidFJLyEkysd5Sxveafw2w5elg== X-Received: by 2002:a05:6000:2906:b0:3ec:db8b:cbf1 with SMTP id ffacd0b85a97d-40e47831113mr15048524f8f.24.1759190324145; Mon, 29 Sep 2025 16:58:44 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46e5c3cad50sm501085e9.3.2025.09.29.16.58.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Sep 2025 16:58:43 -0700 (PDT) Date: Tue, 30 Sep 2025 01:58:42 +0200 From: Stefano Brivio To: Jon Maloy Subject: Re: [PATCH v11 2/9] fwd: Add cache table for ARP/NDP contents Message-ID: <20250930015842.68341327@elisabeth> In-Reply-To: <20250927192522.3024554-3-jmaloy@redhat.com> References: <20250927192522.3024554-1-jmaloy@redhat.com> <20250927192522.3024554-3-jmaloy@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: ThGcxZ4blao-X9dL-p-Zwxoso4t0g-oiWH9bqDlhRx8_1759190325 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: WJELD6Z4C67Z2R2ZMBBMIMXTTT4B6XMH X-Message-ID-Hash: WJELD6Z4C67Z2R2ZMBBMIMXTTT4B6XMH X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: dgibson@redhat.com, david@gibson.dropbear.id.au, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Almost entirely nitpicks here: On Sat, 27 Sep 2025 15:25:15 -0400 Jon Maloy wrote: > We add a cache table to keep track of the contents of the kernel ARP > and NDP tables. The table is fed from the just introduced netlink based > neigbour subscription function. The new table eliminates the need for > explicit netlink calls to find a host's MAC address. > > Signed-off-by: Jon Maloy > > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions > v7: - NOW using MAC_ZERO where needed > - I am still using linear back-off for empty cache entries. Even > an incoming, flow-creating packet from a local host gives no > guarantee that its MAC address is in the ARP table, so we must > allow for a few new attempts at first possible occasions. Only > after several failed lookups can we conclude that we probably > never will succeed. Hence the back-off. > - Fixed a bug that David inadvertently made me aware of: I only > intended to set the initial expiry value to MAC_CACHE_RENEWAL > when an ARP/NDP table lookup was successful. > - Improved struct and function description comments. > v8: - Total re-design of table, adapting to the new, subscription > based way of updating it. > v9: - Catering for MAC address change for an existing host. > v10: - Changes according to feedback from David Gibson > --- > conf.c | 1 + > fwd.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 7 +++ > netlink.c | 2 + > 4 files changed, 174 insertions(+) > > diff --git a/conf.c b/conf.c > index 66b9e63..cfd8590 100644 > --- a/conf.c > +++ b/conf.c > @@ -2133,6 +2133,7 @@ void conf(struct ctx *c, int argc, char **argv) > c->udp.fwd_out.mode = fwd_default; > > fwd_scan_ports_init(c); > + fwd_neigh_table_init(); > > if (!c->quiet) > conf_print(c); > diff --git a/fwd.c b/fwd.c > index 250cf56..2fd6cee 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -33,6 +33,170 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1; > > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > > +#define NEIGH_TABLE_SLOTS 1024 /* Must be power of two */ > +#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2) > + > +/** > + * neigh_table_entry - Entry in the ARP/NDP table It's a struct (it could be a union), so kerneldoc says: * struct neigh_table_entry - ... > + * @next: Next entry in slot or free list > + * @addr: IP address of represented host > + * @mac: MAC address of represented host > + */ > +struct neigh_table_entry { > + struct neigh_table_entry *next; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > +}; > + > +/** > + * neigh_table - Cache of ARP/NDP table contents Same as above. > + * @entries: Entries to be plugged into the hash slots when allocated > + * @slots: Hash table slots > + * @free: Linked list of unused entries > + */ > +struct neigh_table { > + struct neigh_table_entry entries[NEIGH_TABLE_SIZE]; > + struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS]; > + struct neigh_table_entry *free; > +}; > + > +static struct neigh_table neigh_table; > + > +/** > + * neigh_table_slot() - Hash key to a number within the table range > + * @c: Execution context > + * @key: The key to be used for the hash > + * > + * Return: The resulting hash value > + */ > +static inline size_t neigh_table_slot(const struct ctx *c, See: https://www.kernel.org/doc/html/latest/process/coding-style.html#the-inline-disease > + const union inany_addr *key) > +{ > + struct siphash_state st = SIPHASH_INIT(c->hash_secret); > + uint32_t i; > + > + inany_siphash_feed(&st, key); > + i = siphash_final(&st, sizeof(*key), 0); > + > + return ((size_t)i) & (NEIGH_TABLE_SIZE - 1); > +} > + > +/** > + * fwd_neigh_table_find() - Find a MAC table entry Strictly speaking, it's a MAC address table -- MAC refers to the access control itself. > + * @c: Execution context > + * @addr: Neighbour address to be used as key for the lookup > + * > + * Return: The found entry, if any. Otherwise NULL. In the spirit of the closing lines of: https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/ I'd suggest that in Germanic languages verb participles don't always behave like adjectives, so you can say "The entry [that was] found", but "The found entry" is rather awkward. Or go with "matching entry", gerunds are forgiving. > + */ > +static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c, > + const union inany_addr *addr) > +{ > + size_t slot = neigh_table_slot(c, addr); > + struct neigh_table_entry *e = neigh_table.slots[slot]; > + > + while (e && !inany_equals(&e->addr, addr)) > + e = e->next; > + > + return e; > +} > + > +/** > + * fwd_neigh_table_update() - Allocate a neigbour table entry from the free list neighbour > + * @c: Execution context > + * @addr: Address used to determine insertion slot and store in entry > + * @mac: The MAC address associated with the neighbour address > + */ > +void fwd_neigh_table_update(const struct ctx *c, > + const union inany_addr *addr, uint8_t *mac) Cppcheck suggests that: fwd.c:112:47: style: Parameter 'mac' can be declared as pointer to const [constParameterPointer] const union inany_addr *addr, uint8_t *mac) ^ > +{ > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e; > + ssize_t slot; > + > + /* MAC address might change sometimes */ > + e = fwd_neigh_table_find(c, addr); > + if (e) { > + memcpy(e->mac, mac, ETH_ALEN); > + return; > + } > + > + e = t->free; > + if (!e) > + return; Log a debug() or trace() message here? We might be staring at something like this for a while otherwise. > + t->free = e->next; > + > + slot = neigh_table_slot(c, addr); > + e->next = t->slots[slot]; > + t->slots[slot] = e; > + > + memcpy(&e->addr, addr, sizeof(*addr)); > + memcpy(e->mac, mac, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list > + * @c: Execution context > + * @addr: Neighbour address used to find the slot for the entry The neighbour address can be a link-layer / MAC address, or an IP address. Perhaps go with "IP address" here, so that it's clear that we free by IP address, and not by MAC address? > + */ > +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr) > +{ > + ssize_t slot = neigh_table_slot(c, addr); > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e, **prev; > + > + prev = &t->slots[slot]; > + e = t->slots[slot]; > + while (e && !inany_equals(&e->addr, addr)) { > + prev = &e->next; > + e = e->next; > + } > + if (!e) > + return; > + > + *prev = e->next; > + e->next = t->free; > + t->free = e; > + memset(&e->addr, 0, sizeof(*addr)); > + memset(e->mac, 0, ETH_ALEN); Do we care about zeroing them? If we do because we might find those entries, note that both all-zero MAC address and IP addresses are valid. If we can't find those entries linked anywhere, we don't necessarily care about zeroing them. If it makes you feel better, though, I'd say keep that. > +} > + > +/** > + * fwd_neigh_mac_get() - Lookup MAC address in the ARP/NDP table Look up (the verb, not the noun) > + * @c: Execution context > + * @addr: Neighbour address used as lookup key > + * @mac: Buffer for Ethernet MAC to return, found or default value. > + * > + * Return: true if real MAC found, false if not found or if failure on failure ...by the way, you're ignoring this return value in the rest of the series. Is it a left-over? > + */ > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) > +{ > + struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); My pedantic friend reports: fwd.c:182:28: style: Variable 'e' can be declared as pointer to const [constVariablePointer] struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); ^ > + > + if (e) > + memcpy(mac, e->mac, ETH_ALEN); > + else > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + > + return !!e; > +} > + > +/** > + * fwd_neigh_table_init() - Initialize the neighbor table neighbour, for consistency, not correctness. > + */ > +void fwd_neigh_table_init(void) > +{ > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e; > + > + memset(t, 0, sizeof(*t)); > + for (int i = 0; i < NEIGH_TABLE_SIZE; i++) { For consistency: we never declare variables in loop headers. > + e = &t->entries[i]; > + e->next = t->free; > + t->free = e; This, by the way, has the practical effect of making the kernel allocate memory for the full table, even if it's otherwise unused, because of those pointers. It's ~24 kB so I don't really care, and I can't think of a much better way to organise it, either. > + } > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral > * > * Work out what ports the host thinks are emphemeral and record it for later > diff --git a/fwd.h b/fwd.h > index 65c7c96..5464349 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > +void fwd_neigh_table_update(const struct ctx *c, > + const union inany_addr *addr, uint8_t *mac); > +void fwd_neigh_table_free(const struct ctx *c, > + const union inany_addr *addr); > +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_neigh_table_init(void); > > #endif /* FWD_H */ > diff --git a/netlink.c b/netlink.c > index 0dc4e9d..da9e1d5 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -1189,8 +1189,10 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > if (nh->nlmsg_type == RTM_NEWNEIGH && > ndm->ndm_state & NUD_VALID) { > trace("neigh table update: %s / %s", ip_str, mac_str); > + fwd_neigh_table_update(c, &addr, mac); > } else { > trace("neigh table delete: %s / %s", ip_str, mac_str); > + fwd_neigh_table_free(c, &addr); > } > } >