public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents
Date: Fri, 3 Oct 2025 14:31:31 +1000	[thread overview]
Message-ID: <aN9Ro2wO9C6FdwxG@zatzit> (raw)
In-Reply-To: <20251003003412.588801-3-jmaloy@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 8423 bytes --]

On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:
> We add a cache table to keep track of the contents of the kernel ARP
> and NDP tables. The table is fed from the just introduced netlink based
> neigbour subscription function. The new table eliminates the need for
> explicit netlink calls to find a host's MAC address.
> 
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> 
> ---
> v5: - Moved to earlier in series to reduce rebase conflicts
> v6: - Sqashed the hash list commit and the FIFO/LRU queue commit
>     - Removed hash lookup. We now only use linear lookup in a
>       linked list
>     - Eliminated dynamic memory allocation.
>     - Ensured there is only one call to clock_gettime()
>     - Using MAC_ZERO instead of the previously dedicated definitions
> v7: - NOW using MAC_ZERO where needed
>     - I am still using linear back-off for empty cache entries. Even
>       an incoming, flow-creating packet from a local host gives no
>       guarantee that its MAC address is in the ARP table, so we must
>       allow for a few new attempts at first possible occasions. Only
>       after several failed lookups can we conclude that we probably
>       never will succeed. Hence the back-off.
>     - Fixed a bug that David inadvertently made me aware of: I only
>       intended to set the initial expiry value to MAC_CACHE_RENEWAL
>       when an ARP/NDP table lookup was successful.
>     - Improved struct and function description comments.
> v8: - Total re-design of table, adapting to the new, subscription
>       based way of updating it.
> v9: - Catering for MAC address change for an existing host.
> v10: - Changes according to feedback from David Gibson
> v12: - Changes according to feedback from David and Stefano
>      - Added dummy entries for loopback and default GW addresses
> ---
>  conf.c    |   1 +
>  fwd.c     | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fwd.h     |   7 +++
>  netlink.c |   7 ++-
>  4 files changed, 195 insertions(+), 2 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 66b9e63..3224ee6 100644
> --- a/conf.c
> +++ b/conf.c
[snip]
> +/**
> + * fwd_neigh_table_update() - Allocate or update neighbour table entry
> + * @c:		Execution context
> + * @addr:	IP address used to determine insertion slot and store in entry
> + * @mac:	The MAC address associated with the neighbour address
> + */
> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
> +			    const uint8_t *mac)
> +{
> +	struct neigh_table *t = &neigh_table;
> +	struct neigh_table_entry *e;
> +	ssize_t slot;
> +
> +	/* MAC address might change sometimes */
> +	e = fwd_neigh_table_find(c, addr);
> +	if (e) {
> +		if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw)))
> +			return;
> +		if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw))
> +			return;

This doesn't make sense to me.  From the way its looked up in 4/9, the
IP addresses in the table are as seen by the host, not by the guest
(we look up the table *after* applying NAT).  Which makes guest_gw not
meaningful here.

You _do_ need to handle the case that addr is loopback (which guest_gw
might be translated to), and that's handled by your dummy entries.

The other case we might need to consider here is if the (translated)
address is the host's IP.  Do we want to give out_tap_addr for that
case?  Or the host's real MAC on the template interface?  If the
latter will that be in the host ARP table, or would we have to handle
it specially?

> +
> +		memcpy(e->mac, mac, ETH_ALEN);
> +		return;
> +	}
> +
> +	e = t->free;
> +	if (!e) {
> +		debug("Failed to allocate neighbour table entry");
> +		return;
> +	}
> +	t->free = e->next;
> +
> +	slot = neigh_table_slot(c, addr);
> +	e->next = t->slots[slot];
> +	t->slots[slot] = e;
> +
> +	memcpy(&e->addr, addr, sizeof(*addr));
> +	memcpy(e->mac, mac, ETH_ALEN);
> +}
> +
> +/**
> + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list
> + * @c:		Execution context
> + * @addr:	IP address used to find the slot for the entry
> + */
> +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr)
> +{
> +	ssize_t slot = neigh_table_slot(c, addr);
> +	struct neigh_table *t = &neigh_table;
> +	struct neigh_table_entry *e, **prev;
> +
> +	prev = &t->slots[slot];
> +	e = t->slots[slot];
> +	while (e && !inany_equals(&e->addr, addr)) {
> +		prev = &e->next;
> +		e = e->next;
> +	}
> +	if (!e)
> +		return;
> +
> +	*prev = e->next;
> +	e->next = t->free;
> +	t->free = e;
> +	memset(&e->addr, 0, sizeof(*addr));
> +	memset(e->mac, 0, ETH_ALEN);

As Stefano noted earlier, 0xff is probably a better placeholder here,
since 00:00:00:00:00:00 is a valid MAC.

> +}
> +
> +/**
> + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table
> + * @c:		Execution context
> + * @addr:	Neighbour IP address used as lookup key
> + * @mac:	Buffer for Ethernet MAC to return, found or default value.
> + *
> + * Return: true if real MAC found, false if not found or if failure
> + */
> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
> +		       uint8_t *mac)
> +{
> +	const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr);
> +
> +	if (e)
> +		memcpy(mac, e->mac, ETH_ALEN);
> +	else
> +		memcpy(mac, c->our_tap_mac, ETH_ALEN);
> +
> +	return !!e;
> +}
> +
> +/**
> + * fwd_neigh_table_init() - Initialize the neighbour table
> + * @c:		Execution context
> + */
> +void fwd_neigh_table_init(const struct ctx *c)
> +{
> +	struct neigh_table *t = &neigh_table;
> +	const uint8_t *omac = c->our_tap_mac;
> +	struct neigh_table_entry *e;
> +	int i;
> +
> +	memset(t, 0, sizeof(*t));
> +	for (i = 0; i < NEIGH_TABLE_SIZE; i++) {
> +		e = &t->entries[i];
> +		e->next = t->free;
> +		t->free = e;
> +	}
> +
> +	/* These addresses must always map to our own MAC address */
> +	fwd_neigh_table_update(c, &inany_loopback4, omac);
> +	fwd_neigh_table_update(c, &inany_loopback6, omac);

These make sense.

> +	fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac);
> +	fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac);

But these don't.  For two reasons:
 * As noted guest_gw is only meaningful guest side, but the table is
   based on host side addresses.
 * These will be no-ops, since you explicitly exclude these addresses
   in fwd_neigh_table_update().

> +}
> +
>  /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
>   *
>   * Work out what ports the host thinks are emphemeral and record it for later
> diff --git a/fwd.h b/fwd.h
> index 65c7c96..6ca743c 100644
> --- a/fwd.h
> +++ b/fwd.h
> @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
>  			    const struct flowside *ini, struct flowside *tgt);
>  uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
>  			  const struct flowside *ini, struct flowside *tgt);
> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
> +			    const uint8_t *mac);
> +void fwd_neigh_table_free(const struct ctx *c,
> +			  const union inany_addr *addr);
> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
> +		       uint8_t *mac);
> +void fwd_neigh_table_init(const struct ctx *c);
>  
>  #endif /* FWD_H */
> diff --git a/netlink.c b/netlink.c
> index 3fe2fdd..4be5fcf 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
>  	inany_from_af(&addr, ndm->ndm_family, dst);
>  	inany_ntop(dst, ip_str, sizeof(ip_str));
>  
> -	if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID)
> +	if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) {
>  		trace("neigh table update: %s / %s", ip_str, mac_str);
> -	else
> +		fwd_neigh_table_update(c, &addr, mac);
> +	} else {
>  		trace("neigh table delete: %s / %s", ip_str, mac_str);
> +		fwd_neigh_table_free(c, &addr);
> +	}
>  }
>  
>  /**
> -- 
> 2.50.1
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2025-10-03  5:33 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03  0:34 [PATCH v12 0/9] Use true MAC address of LAN local remote hosts Jon Maloy
2025-10-03  0:34 ` [PATCH v12 1/9] netlink: add subsciption on changes in NDP/ARP table Jon Maloy
2025-10-03  4:01   ` David Gibson
2025-10-06 22:33   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents Jon Maloy
2025-10-03  1:03   ` Jon Maloy
2025-10-03  4:31   ` David Gibson [this message]
2025-10-05 15:52     ` Jon Maloy
2025-10-06 22:33       ` Stefano Brivio
2025-10-07  3:33       ` David Gibson
2025-10-06 22:40   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 3/9] arp/ndp: send ARP announcement / unsolicited NA when neigbour entry added Jon Maloy
2025-10-03  4:41   ` David Gibson
2025-10-07 10:10     ` Stefano Brivio
2025-10-06 22:51   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 4/9] arp/ndp: respond with true MAC address of LAN local remote hosts Jon Maloy
2025-10-03  4:48   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 5/9] flow: add MAC address of LAN local remote hosts to flow Jon Maloy
2025-10-03  0:34 ` [PATCH v12 6/9] udp: forward external source MAC address through tap interface Jon Maloy
2025-10-03  4:52   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 7/9] tcp: " Jon Maloy
2025-10-03  4:54   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 8/9] tap: change signature of function tap_push_l2h() Jon Maloy
2025-10-03  0:34 ` [PATCH v12 9/9] icmp: let icmp use mac address from flowside structure Jon Maloy
2025-10-03  4:57   ` David Gibson
2025-10-03  5:33 ` [PATCH v12 0/9] Use true MAC address of LAN local remote hosts David Gibson
2025-10-05 13:39   ` Jon Maloy
2025-10-07  0:56     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aN9Ro2wO9C6FdwxG@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).