From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=NgbYc857; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 27B1B5A026F for ; Sun, 19 Oct 2025 12:07:59 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1760868478; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7Nfd0R8yu2+Va/IfInyinql70YHygVvjBg60Kqos7aE=; b=NgbYc8570Rj0byqq/9FWIwWg2Bv3OsGjIZQXx33kaIBPhUW+oVknsWL99cKCtwFJJr8leM wwjjdCXZ4n5zJsPdNbqCbRpACXpxiFY6DYX1y8BWgBSYgjUxX8S0+qEm9e2hzB7LcE6JlS ZlXWqJNkrfZ9T1OkrCqaDEElAAkoUnc= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-92-nhZ9mjRJOOGOs7BK0iGUlQ-1; Sun, 19 Oct 2025 06:07:56 -0400 X-MC-Unique: nhZ9mjRJOOGOs7BK0iGUlQ-1 X-Mimecast-MFC-AGG-ID: nhZ9mjRJOOGOs7BK0iGUlQ_1760868475 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-427027a2095so3693128f8f.2 for ; Sun, 19 Oct 2025 03:07:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760868475; x=1761473275; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=7Nfd0R8yu2+Va/IfInyinql70YHygVvjBg60Kqos7aE=; b=JzPjBqt2LqZuecZ/wYVVKm8VkrnZee434hMYGwQraUjysU+YhRUIRBE4JJn3AfPOtl M3LYv5Gr+aELSXOUVstOU6Wb5mZAjX8um132OWFlPqlt0jbKe4+6zoHdw3Hi1GjaD8cu qu/7RxrpO1IVeH8UZwRNd/I7mM2kYUnVzlRpIT0fb2o+a9G2xZWe0PCTC7hOrIJEh3Hj FjKTtsTTvGYpZuSgYWumIPy5ypq5XwvShI7a1ueQnWoO4kKOM5tlnGm89f7WG/sKpVNi bR/rwvWJzldzfpuBf6IUKNLRwvAX7leKIavXj8r+8kp13f/hOnTJ1Sf+eeV+H4iyuc2z ekpQ== X-Forwarded-Encrypted: i=1; AJvYcCVSSpPcHaKGSy5TfYvnARKT8EOlk+eloob2HXX+B5l0vNjJeV607fLW7AheKkAYfVCYTaDEAl9Z33Q=@passt.top X-Gm-Message-State: AOJu0YzTuu5JfenyCj7KNSx7pGgpdXnz1ZS7SFb12pLAzpE5CRVQNIrn rKZpChK+eJS6CHhkQvI+mb2p1naHC2i4nyv1l6gfOTEYI8g4gx7CC6lJD4+CA/FcHWAr1c1HQaH LD8UvuuXfy8mSB/O+X1w9lE4wYYshg+ncv9HMGArYqP4xLq3QzU7/kQ== X-Gm-Gg: ASbGnctjMo9tc3LyIn4sNYSDgtk/NXFI6k/u7YkjPv7DH27YtyE2fJbqovz2+Ant1qz uvP1n0lcnxPqI6ZFZ1CGXwCb7RAI+HNdiEvGxz9BgTHJZ+IeOB2oTwVywDQlZ4djbritaVW6PR9 O2hVXz7GPLQUN7JB7i7psnM+BzZg1bFk52TOOnvgm9jPVB4aGKJFdEO8EyAQwMUT4nxKsV+d0i7 JdJZRAcg93uFzQ6E/yigP0lTbiEsSCS7pLRLJpk1s9FUp1xJNO/bDkjau3DADV6omqMpBQzKT9A 5Sc9hJ6eaMN/jEXkBSwr7MSwtDovf8HWC1LpPF2vR86BoqWLAMFhLFnI2dl148+kNUGBGToMllp B+Omn1tbbf+44DFDnNl2/vB/ZMNo= X-Received: by 2002:a5d:5888:0:b0:427:151:3dae with SMTP id ffacd0b85a97d-42704d8dfa2mr6228230f8f.20.1760868475142; Sun, 19 Oct 2025 03:07:55 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFHP1i3TF349HeYcu8grx9891/FfuP7aSJKtKnRGAWzz4LN0UrzrKd1slzs/kyf5s7T/jd4FA== X-Received: by 2002:a5d:5888:0:b0:427:151:3dae with SMTP id ffacd0b85a97d-42704d8dfa2mr6228213f8f.20.1760868474588; Sun, 19 Oct 2025 03:07:54 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-427ea5a0f88sm9190548f8f.7.2025.10.19.03.07.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 19 Oct 2025 03:07:53 -0700 (PDT) Date: Sun, 19 Oct 2025 12:07:52 +0200 From: Stefano Brivio To: Jon Maloy Subject: Re: [PATCH v14 03/10] fwd: Add cache table for ARP/NDP contents Message-ID: <20251019120752.6b48503a@elisabeth> In-Reply-To: <20251015025521.1449156-4-jmaloy@redhat.com> References: <20251015025521.1449156-1-jmaloy@redhat.com> <20251015025521.1449156-4-jmaloy@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 0hHpJA8pvv50E7rNggbN_o_vyozzcYsZGDPXbxZQkdU_1760868475 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: TJY24GRFLWBWFCUKO6S7GIRJFV5B45XH X-Message-ID-Hash: TJY24GRFLWBWFCUKO6S7GIRJFV5B45XH X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: dgibson@redhat.com, david@gibson.dropbear.id.au, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 14 Oct 2025 22:55:14 -0400 Jon Maloy wrote: > We add a cache table to keep track of the contents of the kernel ARP > and NDP tables. The table is fed from the just introduced netlink based > neigbour subscription function. > > Signed-off-by: Jon Maloy > > --- > v5: - Moved to earlier in series to reduce rebase conflicts > v6: - Sqashed the hash list commit and the FIFO/LRU queue commit > - Removed hash lookup. We now only use linear lookup in a > linked list > - Eliminated dynamic memory allocation. > - Ensured there is only one call to clock_gettime() > - Using MAC_ZERO instead of the previously dedicated definitions > v7: - NOW using MAC_ZERO where needed > - I am still using linear back-off for empty cache entries. Even > an incoming, flow-creating packet from a local host gives no > guarantee that its MAC address is in the ARP table, so we must > allow for a few new attempts at first possible occasions. Only > after several failed lookups can we conclude that we probably > never will succeed. Hence the back-off. > - Fixed a bug that David inadvertently made me aware of: I only > intended to set the initial expiry value to MAC_CACHE_RENEWAL > when an ARP/NDP table lookup was successful. > - Improved struct and function description comments. > v8: - Total re-design of table, adapting to the new, subscription > based way of updating it. > v9: - Catering for MAC address change for an existing host. > v10: - Changes according to feedback from David Gibson > v12: - Changes according to feedback from David and Stefano > - Added dummy entries for loopback and default GW addresses > v13: - Changes according to feedback and discussions with David > and Stefano > v14: - Moved the call to nat_inbound() to a much more sensible > place in netlink.c, as suggested by David. > --- > fwd.c | 217 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fwd.h | 7 ++ > netlink.c | 10 ++- > passt.c | 1 + > 4 files changed, 233 insertions(+), 2 deletions(-) > > diff --git a/fwd.c b/fwd.c > index 250cf56..f70e4fc 100644 > --- a/fwd.c > +++ b/fwd.c > @@ -26,6 +26,7 @@ > #include "passt.h" > #include "lineread.h" > #include "flow_table.h" > +#include "netlink.h" > > /* Empheral port range: values from RFC 6335 */ > static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14); > @@ -33,6 +34,222 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1; > > #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range" > > +#define NEIGH_TABLE_SLOTS 1024 > +#define NEIGH_TABLE_SIZE (NEIGH_TABLE_SLOTS / 2) > +static_assert((NEIGH_TABLE_SLOTS & (NEIGH_TABLE_SLOTS - 1)) == 0, > + "NEIGH_TABLE_SLOTS must be a power of two"); > + > +/** > + * struct neigh_table_entry - Entry in the ARP/NDP table > + * @next: Next entry in slot or free list > + * @addr: IP address of represented host > + * @mac: MAC address of represented host > + * @permanent: Entry cannot be altered or freed by notification > + */ > +struct neigh_table_entry { > + struct neigh_table_entry *next; > + union inany_addr addr; > + uint8_t mac[ETH_ALEN]; > + bool permanent; > +}; > + > +/** > + * struct neigh_table - Cache of ARP/NDP table contents > + * @entries: Entries to be plugged into the hash slots when allocated > + * @slots: Hash table slots > + * @free: Linked list of unused entries > + */ > +struct neigh_table { > + struct neigh_table_entry entries[NEIGH_TABLE_SIZE]; > + struct neigh_table_entry *slots[NEIGH_TABLE_SLOTS]; > + struct neigh_table_entry *free; > +}; > + > +static struct neigh_table neigh_table; > + > +/** > + * neigh_table_slot() - Hash key to a number within the table range > + * @c: Execution context > + * @key: The key to be used for the hash > + * > + * Return: the resulting hash value > + */ > +static size_t neigh_table_slot(const struct ctx *c, > + const union inany_addr *key) > +{ > + struct siphash_state st = SIPHASH_INIT(c->hash_secret); > + uint32_t i; > + > + inany_siphash_feed(&st, key); > + i = siphash_final(&st, sizeof(*key), 0); > + > + return ((size_t)i) & (NEIGH_TABLE_SIZE - 1); > +} > + > +/** > + * fwd_neigh_table_find() - Find a MAC table entry > + * @c: Execution context > + * @addr: Neighbour address to be used as key for the lookup > + * > + * Return: the matching entry, if found. Otherwise NULL > + */ > +static struct neigh_table_entry *fwd_neigh_table_find(const struct ctx *c, > + const union inany_addr *addr) > +{ > + size_t slot = neigh_table_slot(c, addr); > + struct neigh_table_entry *e = neigh_table.slots[slot]; > + > + while (e && !inany_equals(&e->addr, addr)) > + e = e->next; > + > + return e; > +} > + > +/** > + * fwd_neigh_table_update() - Allocate or update neighbour table entry > + * @c: Execution context > + * @addr: IP address used to determine insertion slot and store in entry > + * @mac: The MAC address associated with the neighbour address > + * @permanent: Created entry cannot be altered or freed > + */ > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, > + const uint8_t *mac, bool permanent) > +{ > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e; > + ssize_t slot; > + > + /* MAC address might change sometimes */ > + e = fwd_neigh_table_find(c, addr); > + if (e) { > + if (!e->permanent) > + memcpy(e->mac, mac, ETH_ALEN); > + return; > + } > + > + e = t->free; > + if (!e) { > + debug("Failed to allocate neighbour table entry"); > + return; > + } > + t->free = e->next; > + slot = neigh_table_slot(c, addr); > + e->next = t->slots[slot]; > + t->slots[slot] = e; > + > + memcpy(&e->addr, addr, sizeof(*addr)); > + memcpy(e->mac, mac, ETH_ALEN); > + e->permanent = permanent; > +} > + > +/** > + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list > + * @c: Execution context > + * @addr: IP address used to find the slot for the entry > + */ > +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr) > +{ > + ssize_t slot = neigh_table_slot(c, addr); > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e, **prev; > + > + prev = &t->slots[slot]; > + e = t->slots[slot]; > + while (e && !inany_equals(&e->addr, addr)) { > + prev = &e->next; > + e = e->next; > + } > + > + if (!e || e->permanent) > + return; > + > + *prev = e->next; > + e->next = t->free; > + t->free = e; > + memset(&e->addr, 0, sizeof(*addr)); > + memset(e->mac, 0, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table > + * @c: Execution context > + * @addr: Neighbour IP address used as lookup key > + * @mac: Buffer for returned MAC address > + */ > +void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac) > +{ > + const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); > + > + if (e) > + memcpy(mac, e->mac, ETH_ALEN); > + else > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > +} > + > +/** > + * fwd_neigh_table_init() - Initialize the neighbour table > + * @c: Execution context > + */ > +void fwd_neigh_table_init(const struct ctx *c) > +{ > + union inany_addr mhl = inany_from_v4(c->ip4.map_host_loopback); > + union inany_addr mga = inany_from_v4(c->ip4.map_guest_addr); > + union inany_addr ggw = inany_from_v4(c->ip4.guest_gw); I already mentioned this a couple of times: guest_gw is *not* *relevant*. Ignore it. You needed to add 2/10 and a bunch of lines below because of it, but you shouldn't use it at all. It's not relevant because, for the purposes of this series, everything you need is already reflected in map_guest_addr and map_host_loopback. If no_map_gw is false, and no other options are given, then map_host_loopback is set to guest_gw. If no_map_gw is true, and no other options are given, then map_host_loopback is *not* set to guest_gw. If zero to two of map_host_loopback and map_guest_addr are set, they will tell you which zero to two addresses are translated (for each IP version). > + struct neigh_table *t = &neigh_table; > + struct neigh_table_entry *e; > + int i; > + > + memset(t, 0, sizeof(*t)); > + > + for (i = 0; i < NEIGH_TABLE_SIZE; i++) { > + e = &t->entries[i]; > + e->next = t->free; > + t->free = e; > + } > + > + /* Blocker entries to stop events from hosts using these addresses */ > + if (!inany_is_unspecified4(&mhl)) > + fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true); > + > + if (!inany_is_unspecified4(&ggw) && !c->no_map_gw) > + fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true); This is not needed, and in some cases wrong: - if guest_gw is set and no_map_gw is false, but map_host_loopback wasn't configured, mhl is the same as ggw, and you already added an entry. No harm done, just one excess entry here - if guest_gw is set and no_map_gw is false, but map_host_loopback was configured, mhl is different from ggw. You already added the right entry we'll map in that case (mhl), and now you're adding an entry that's plain wrong (ggw), because we're *not* mapping ggw if the user gave another address. Just ignore guest_gw altogether. > + > + if (!inany_is_unspecified4(&mga) && !inany_equals(&mhl, &mga)) { > + uint8_t mac[ETH_ALEN]; > + int rc; > + > + rc = nl_link_get_mac(nl_sock, c->ifi4, mac); > + if (rc < 0) { > + debug("Couldn't get ip4 MAC addr: %s", strerror_(-rc)); IPv4 > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + } > + fwd_neigh_table_update(c, &mga, mac, true); > + } > + > + mhl = *(union inany_addr *)&c->ip6.map_host_loopback; > + mga = *(union inany_addr *)&c->ip4.map_guest_addr; Shouldn't this be c->ip6.map_guest_addr? > + ggw = *(union inany_addr *)&c->ip4.guest_gw; Not needed. > + > + if (!inany_is_unspecified6(&mhl)) > + fwd_neigh_table_update(c, &mhl, c->our_tap_mac, true); > + > + if (!inany_is_unspecified6(&ggw) && !c->no_map_gw) > + fwd_neigh_table_update(c, &ggw, c->our_tap_mac, true); Not needed, possibly harmful. > + > + if (!inany_is_unspecified6(&mga) && !inany_equals(&mhl, &mga)) { > + uint8_t mac[ETH_ALEN]; > + int rc; > + > + rc = nl_link_get_mac(nl_sock, c->ifi6, mac); > + if (rc < 0) { > + debug("Couldn't get ip6 MAC addr: %s", strerror_(-rc)); IPv6 > + memcpy(mac, c->our_tap_mac, ETH_ALEN); > + } > + fwd_neigh_table_update(c, &mga, mac, true); > + } > +} > + > /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral > * > * Work out what ports the host thinks are emphemeral and record it for later > diff --git a/fwd.h b/fwd.h > index 65c7c96..352f3b5 100644 > --- a/fwd.h > +++ b/fwd.h > @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > const struct flowside *ini, struct flowside *tgt); > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, > + const uint8_t *mac, bool permanent); > +void fwd_neigh_table_free(const struct ctx *c, > + const union inany_addr *addr); > +void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, > + uint8_t *mac); > +void fwd_neigh_table_init(const struct ctx *c); > > #endif /* FWD_H */ > diff --git a/netlink.c b/netlink.c > index 186383c..85643bd 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -1123,10 +1123,10 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > char ip_str[INET6_ADDRSTRLEN]; > char mac_str[ETH_ADDRSTRLEN]; > const uint8_t *lladdr = NULL; > + union inany_addr addr, daddr; > const void *dst = NULL; > size_t lladdr_len = 0; > uint8_t mac[ETH_ALEN]; > - union inany_addr addr; > size_t dstlen = 0; > > if (nh->nlmsg_type == NLMSG_DONE) > @@ -1183,15 +1183,20 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > warn("netlink: wrong address length in AF_INET6 notification"); > return; > } > + /* We only handle guest-side visible addresses */ > inany_from_af(&addr, ndm->ndm_family, dst); > - inany_ntop(dst, ip_str, sizeof(ip_str)); > + if (!nat_inbound(c, &addr, &daddr)) > + return; > + inany_ntop(&daddr, ip_str, sizeof(ip_str)); > > if (nh->nlmsg_type == RTM_DELNEIGH) { > trace("neigh table delete: %s", ip_str); > + fwd_neigh_table_free(c, &daddr); > return; > } > if (!(ndm->ndm_state & NUD_VALID)) { > trace("neigh table: invalid state for %s", ip_str); > + fwd_neigh_table_free(c, &daddr); > return; > } > if (nh->nlmsg_type != RTM_NEWNEIGH || !lladdr) { > @@ -1204,6 +1209,7 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > memcpy(mac, lladdr, ETH_ALEN); > eth_ntop(mac, mac_str, sizeof(mac_str)); > trace("neigh table update: %s / %s", ip_str, mac_str); > + fwd_neigh_table_update(c, &daddr, mac, false); > } > > /** > diff --git a/passt.c b/passt.c > index e21d6ba..98fc430 100644 > --- a/passt.c > +++ b/passt.c > @@ -324,6 +324,7 @@ int main(int argc, char **argv) > > pcap_init(&c); > > + fwd_neigh_table_init(&c); > nl_neigh_notify_init(&c); > > if (!c.foreground) { -- Stefano