From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents
Date: Sun, 5 Oct 2025 11:52:42 -0400 [thread overview]
Message-ID: <82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com> (raw)
In-Reply-To: <aN9Ro2wO9C6FdwxG@zatzit>
On 2025-10-03 00:31, David Gibson wrote:
> On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:
>> We add a cache table to keep track of the contents of the kernel ARP
>> and NDP tables. The table is fed from the just introduced netlink based
[...]
>> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
>> + const uint8_t *mac)
>> +{
>> + struct neigh_table *t = &neigh_table;
>> + struct neigh_table_entry *e;
>> + ssize_t slot;
>> +
>> + /* MAC address might change sometimes */
>> + e = fwd_neigh_table_find(c, addr);
>> + if (e) {
>> + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw)))
>> + return;
>> + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw))
>> + return;
>
> This doesn't make sense to me. From the way its looked up in 4/9, the
> IP addresses in the table are as seen by the host, not by the guest
> (we look up the table *after* applying NAT). Which makes guest_gw not
> meaningful here.
>
> You _do_ need to handle the case that addr is loopback (which guest_gw
> might be translated to), and that's handled by your dummy entries.
>
> The other case we might need to consider here is if the (translated)
> address is the host's IP. Do we want to give out_tap_addr for that
> case? Or the host's real MAC on the template interface? If the
> latter will that be in the host ARP table, or would we have to handle
> it specially?
>
This is my current understanding of this:
1) Adding the loopback entries to the neigbour table is harmless,
but unnecessary. fwd_neigh_mac_get() will always return our_tap_mac
when it doesn't find an entry in the table, which is the case here.
This addition was a mistake.
2) Regarding the default gw, we have two cases:
2.1) guest_gw IP is the same as the host_gw IP:
In this case, we'll have an event trying to inserting an entry
into the table with the host_gw's real mac address.
In v11, this mapping was announced to the guest, but later
contradicted by all ARP/NDP messages he receives, because they all
come from PASTA/PASST and uses our_tap_mac as source mac in the
message header. This is probably harmless, but causes confusion in
the guest and a warning wireshark.
In v12, I therefore chose to add the entry but suppress the
announcements for the gw, and also the updates of
the mac fields from subsequent events.
This is not right either. We either have to be consistent, and
add the real host_gw mac in all messages sent from the purported
gw address, or we just as consistently use our_tap_mac. I think
the latter is simpler with less consequences for the code.
I think we should just simply suppress all events from the host
gw in such cases.
(Here I have a question: I don't see any NAT mapping making
it possible to communicate between the guest and the the host gw
when guest gw IP and host gw IP are are equal. Shouldn't there be
one?)
2.2) guest_gw IP is different from host_gw IP:
This case is simpler. We just treat the host_gw as any other
local host, add and update its entry at new events, and pass it on
to the guest as an announcement at first contact, consistently
using that host's true mapping.
3) Regarding the host address, we also have two cases:
3.1) guest IP is the same as the host IP (on the default interface):
In this case the guest can only reach the host by using the
guest_gw IP (which lands on the host), some or the other IPs
on the host, or the map_host_loopback IP, as far as I
understand.
The host IP will never enter the neigbour table by ARP/NDP
event, and there is no reason for us to add a dummy for it.
3.2) guest IP differs from the host IP.
In this case we let fwd_neigh_table_init() add an entry for the
host as if it were any other host, using the IP and mac
addresses from the template interface.
Conclusion:
- We use nat_inbound() instead of nat_outbound() before consulting the
table, like you suggest.
- We need to manually add an entry representing the host in the case the
host IP differs from the guest IP. This entry is announced.
In the case the IPs are the same we don't add any entry.
- Local host entries are added by ARP/NDP events, but only if
nat_inbound() doesn't translate the IP address (will that ever
happen?).
We suppress all events from the host gw in case it has the same
IP as the guest gw (unless this is covered by the nat_inbound()
check?).
- We may want a NAT mapping making it possible to reach the host gw
in the case it has the same IP as the guest gw. This has no effect
on our neighbour table.
If you agree with my above analysis I can go ahead and post a v13 of
the series early next week, including the above changes and fixes
for your other comments to the series.
///jon
>> +
>> + memcpy(e->mac, mac, ETH_ALEN);
>> + return;
>> + }
>> +
>> + e = t->free;
>> + if (!e) {
>> + debug("Failed to allocate neighbour table entry");
>> + return;
>> + }
>> + t->free = e->next;
>> +
>> + slot = neigh_table_slot(c, addr);
>> + e->next = t->slots[slot];
>> + t->slots[slot] = e;
>> +
>> + memcpy(&e->addr, addr, sizeof(*addr));
>> + memcpy(e->mac, mac, ETH_ALEN);
>> +}
>> +
>> +/**
>> + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list
>> + * @c: Execution context
>> + * @addr: IP address used to find the slot for the entry
>> + */
>> +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr)
>> +{
>> + ssize_t slot = neigh_table_slot(c, addr);
>> + struct neigh_table *t = &neigh_table;
>> + struct neigh_table_entry *e, **prev;
>> +
>> + prev = &t->slots[slot];
>> + e = t->slots[slot];
>> + while (e && !inany_equals(&e->addr, addr)) {
>> + prev = &e->next;
>> + e = e->next;
>> + }
>> + if (!e)
>> + return;
>> +
>> + *prev = e->next;
>> + e->next = t->free;
>> + t->free = e;
>> + memset(&e->addr, 0, sizeof(*addr));
>> + memset(e->mac, 0, ETH_ALEN);
>
> As Stefano noted earlier, 0xff is probably a better placeholder here,
> since 00:00:00:00:00:00 is a valid MAC.
>
>> +}
>> +
>> +/**
>> + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table
>> + * @c: Execution context
>> + * @addr: Neighbour IP address used as lookup key
>> + * @mac: Buffer for Ethernet MAC to return, found or default value.
>> + *
>> + * Return: true if real MAC found, false if not found or if failure
>> + */
>> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
>> + uint8_t *mac)
>> +{
>> + const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr);
>> +
>> + if (e)
>> + memcpy(mac, e->mac, ETH_ALEN);
>> + else
>> + memcpy(mac, c->our_tap_mac, ETH_ALEN);
>> +
>> + return !!e;
>> +}
>> +
>> +/**
>> + * fwd_neigh_table_init() - Initialize the neighbour table
>> + * @c: Execution context
>> + */
>> +void fwd_neigh_table_init(const struct ctx *c)
>> +{
>> + struct neigh_table *t = &neigh_table;
>> + const uint8_t *omac = c->our_tap_mac;
>> + struct neigh_table_entry *e;
>> + int i;
>> +
>> + memset(t, 0, sizeof(*t));
>> + for (i = 0; i < NEIGH_TABLE_SIZE; i++) {
>> + e = &t->entries[i];
>> + e->next = t->free;
>> + t->free = e;
>> + }
>> +
>> + /* These addresses must always map to our own MAC address */
>> + fwd_neigh_table_update(c, &inany_loopback4, omac);
>> + fwd_neigh_table_update(c, &inany_loopback6, omac);
>
> These make sense.
>
>> + fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac);
>> + fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac);
>
> But these don't. For two reasons:
> * As noted guest_gw is only meaningful guest side, but the table is
> based on host side addresses.
> * These will be no-ops, since you explicitly exclude these addresses
> in fwd_neigh_table_update().
>
>> +}
>> +
>> /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
>> *
>> * Work out what ports the host thinks are emphemeral and record it for later
>> diff --git a/fwd.h b/fwd.h
>> index 65c7c96..6ca743c 100644
>> --- a/fwd.h
>> +++ b/fwd.h
>> @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
>> const struct flowside *ini, struct flowside *tgt);
>> uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
>> const struct flowside *ini, struct flowside *tgt);
>> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
>> + const uint8_t *mac);
>> +void fwd_neigh_table_free(const struct ctx *c,
>> + const union inany_addr *addr);
>> +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
>> + uint8_t *mac);
>> +void fwd_neigh_table_init(const struct ctx *c);
>>
>> #endif /* FWD_H */
>> diff --git a/netlink.c b/netlink.c
>> index 3fe2fdd..4be5fcf 100644
>> --- a/netlink.c
>> +++ b/netlink.c
>> @@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh)
>> inany_from_af(&addr, ndm->ndm_family, dst);
>> inany_ntop(dst, ip_str, sizeof(ip_str));
>>
>> - if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID)
>> + if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) {
>> trace("neigh table update: %s / %s", ip_str, mac_str);
>> - else
>> + fwd_neigh_table_update(c, &addr, mac);
>> + } else {
>> trace("neigh table delete: %s / %s", ip_str, mac_str);
>> + fwd_neigh_table_free(c, &addr);
>> + }
>> }
>>
>> /**
>> --
>> 2.50.1
>>
>
next prev parent reply other threads:[~2025-10-05 15:52 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-03 0:34 [PATCH v12 0/9] Use true MAC address of LAN local remote hosts Jon Maloy
2025-10-03 0:34 ` [PATCH v12 1/9] netlink: add subsciption on changes in NDP/ARP table Jon Maloy
2025-10-03 4:01 ` David Gibson
2025-10-06 22:33 ` Stefano Brivio
2025-10-03 0:34 ` [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents Jon Maloy
2025-10-03 1:03 ` Jon Maloy
2025-10-03 4:31 ` David Gibson
2025-10-05 15:52 ` Jon Maloy [this message]
2025-10-06 22:33 ` Stefano Brivio
2025-10-07 3:33 ` David Gibson
2025-10-06 22:40 ` Stefano Brivio
2025-10-03 0:34 ` [PATCH v12 3/9] arp/ndp: send ARP announcement / unsolicited NA when neigbour entry added Jon Maloy
2025-10-03 4:41 ` David Gibson
2025-10-07 10:10 ` Stefano Brivio
2025-10-06 22:51 ` Stefano Brivio
2025-10-03 0:34 ` [PATCH v12 4/9] arp/ndp: respond with true MAC address of LAN local remote hosts Jon Maloy
2025-10-03 4:48 ` David Gibson
2025-10-03 0:34 ` [PATCH v12 5/9] flow: add MAC address of LAN local remote hosts to flow Jon Maloy
2025-10-03 0:34 ` [PATCH v12 6/9] udp: forward external source MAC address through tap interface Jon Maloy
2025-10-03 4:52 ` David Gibson
2025-10-03 0:34 ` [PATCH v12 7/9] tcp: " Jon Maloy
2025-10-03 4:54 ` David Gibson
2025-10-03 0:34 ` [PATCH v12 8/9] tap: change signature of function tap_push_l2h() Jon Maloy
2025-10-03 0:34 ` [PATCH v12 9/9] icmp: let icmp use mac address from flowside structure Jon Maloy
2025-10-03 4:57 ` David Gibson
2025-10-03 5:33 ` [PATCH v12 0/9] Use true MAC address of LAN local remote hosts David Gibson
2025-10-05 13:39 ` Jon Maloy
2025-10-07 0:56 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com \
--to=jmaloy@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).