On Sun, Oct 05, 2025 at 11:52:42AM -0400, Jon Maloy wrote:
> 
> 
> On 2025-10-03 00:31, David Gibson wrote:
> > On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:
> > > We add a cache table to keep track of the contents of the kernel ARP
> > > and NDP tables. The table is fed from the just introduced netlink based
> 
> [...]
> 
> > > +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
> > > +			    const uint8_t *mac)
> > > +{
> > > +	struct neigh_table *t = &neigh_table;
> > > +	struct neigh_table_entry *e;
> > > +	ssize_t slot;
> > > +
> > > +	/* MAC address might change sometimes */
> > > +	e = fwd_neigh_table_find(c, addr);
> > > +	if (e) {
> > > +		if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw)))
> > > +			return;
> > > +		if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw))
> > > +			return;
> > 
> > This doesn't make sense to me.  From the way it's looked up in 4/9, the
> > IP addresses in the table are as seen by the host, not by the guest
> > (we look up the table *after* applying NAT).  Which makes guest_gw not
> > meaningful here.
> > 
> > You _do_ need to handle the case that addr is loopback (which guest_gw
> > might be translated to), and that's handled by your dummy entries.
> > 
> > The other case we might need to consider here is if the (translated)
> > address is the host's IP.  Do we want to give out_tap_addr for that
> > case?  Or the host's real MAC on the template interface?  If the
> > latter will that be in the host ARP table, or would we have to handle
> > it specially?
> > 
> This is my current understanding of this:
> 
> 1) Adding the loopback entries to the neighbour table is harmless,
>    but unnecessary. fwd_neigh_mac_get() will always return our_tap_mac
>    when it doesn't find an entry in the table, which is the case here.
>    This addition was a mistake.

Yes.  In fact, it goes even further than that: now that we're
conceptualizing this in terms of guest side addresses, we should never
ever see loopback addresses on tap (I believe we already filter for
this before we get to this point).

> 2) Regarding the default gw, we have two cases:
> 
> 2.1) guest_gw IP is the same as the host_gw IP:
>      In this case, we'll have an event trying to insert an entry
>      into the table with the host_gw's real mac address.

Yes.  Indeed, the same happens if guest_gw has the same address as
anything on the host's network segment.

>      In v11, this mapping was announced to the guest, but later
>      contradicted by all ARP/NDP messages it receives, because they all
>      come from PASTA/PASST and use our_tap_mac as source mac in the
>      message header. This is probably harmless, but causes confusion in
>      the guest and a warning in wireshark.
>      In v12, I therefore chose to add the entry but suppress the
>      announcements for the gw, and also the updates of
>      the mac fields from subsequent events.
>      This is not right either. We either have to be consistent, and
>      add the real host_gw mac in all messages sent from the purported
>      gw address, or we just as consistently use our_tap_mac. I think
>      the latter is simpler with less consequences for the code.
>      I think we should just simply suppress all events from the host
>      gw in such cases.
>      (Here I have a question: I don't see any NAT mapping making
>       it possible to communicate between the guest and the host gw
>       when guest gw IP and host gw IP are equal. Shouldn't there be
>       one?)

No, there isn't.  This is a limitation in passt that's been there
forever and is tricky to remove.  The problem is that we can't
allocate IP addresses, but we're trying to give the guest the illusion
of being its own network host with its own address.

Re-using the host's IP for the guest has a number of advantages - we
can avoid many NATs, and in particular it means (most) peers will see
the same address as the guest thinks it has.  However it means the guest
can't contact the host at all, which is pretty inconvenient.

The gateway NAT is essentially sacrificing the ability for the guest
to contact the (host) gateway, in order to let it contact the host
itself.  This is a good tradeoff when the gateway is a dumb network
appliance (common in a data centre) - it's not such a great tradeoff
when the gateway also has other services running on it (common in a
home network).

There are other ways of approaching the tradeoffs here, none of them
perfect.  podman uses a different one by default, using link-local
(169.254.x.y) addresses for the NATs.

One of the goals of the flexible forwarding stuff I'm (gradually)
working on is to make more of those options easier to access.

> 2.2) guest_gw IP is different from host_gw IP:
>      This case is simpler. We just treat the host_gw as any other
>      local host, add and update its entry at new events, and pass it on
>      to the guest as an announcement at first contact, consistently
>      using that host's true mapping.

So, what to do is a bit confused by the fact that there are 4
conceptually different addresses that usually have the same value:
  a) The host gw address
  b) The guest gw address
  c) The --map-host-loopback address (c->ip*.map_host_loopback)
  d) The --map-guest-addr address (c->ip*.map_guest_addr)

I don't think the host gw address is relevant here, except insofar as
it's affected by having the same value as others.

The guest gw address is *usually* not directly relevant for forwarding
decisions.  But here it kind of is.  We can be essentially certain the
guest will ARP this address, and it will look weird if the MAC address
that returns isn't the same as the MAC for packets from peers outside
the network segment (i.e. those we are simulating as being routed
through the guest's gateway).

> 3) Regarding the host address, we also have two cases:
> 
> 3.1) guest IP is the same as the host IP (on the default interface):
>      In this case the guest can only reach the host by using the
>      guest_gw IP (which lands on the host), some of the other IPs
>      on the host, or the map_host_loopback IP, as far as I
>      understand.

It can only reach the host via the map_host_loopback or map_guest_addr
addresses.  The guest_gw IP is not relevant... except that it's
usually the same as one of the former.

>      The host IP will never enter the neighbour table by ARP/NDP
>      event, and there is no reason for us to add a dummy for it.
> 
> 3.2) guest IP differs from the host IP.
>      In this case we let fwd_neigh_table_init() add an entry for the
>      host as if it were any other host, using the IP and mac
>      addresses from the template interface.

*thinks*...

* The map_host_loopback and map_guest_addr addresses really refer to
  the host, regardless of what that might be shadowing out in the
  wider internet
   * map_host_loopback address should always use our_tap_mac,
     regardless of anything else.  The interface it's referring to (lo
     on the host) has no MAC address, so our_tap_mac is the only real
     choice.
   * map_guest_addr usually refers to the host, but could refer to
     something else (host addr != guest addr).
       * If it refers to the host, we kind of have a choice whether to
         use our_tap_mac or the host's real MAC addr
       * If it refers to something else, we should probably use the
         real MAC addr if it's on the same network segment
       * I think the best thing here is not to treat this
         specially, relying on nat_inbound to handle it.  If it's not
         the host, that will preserve the MAC.  If it is the host, it
         will preserve the MAC iff we get an event for the host's own
         MAC.
   * host_gw needs no special handling.  If it's the same as guest_gw
     it will be handled by one of the other cases.  If it's !=
     guest_gw, then it's just another machine on the host's network
     segment and will be handled the same way
   * guest_gw is the trickiest to think about.
       * guest_gw == map_host_loopback
          * our_tap_mac is the only real choice - but that's already
            handled by the map_host_loopback rule
       * guest_gw == map_guest_addr && guest_addr == host_addr
          * guest_gw refers to the host, so we kind of have a choice between
            our_tap_mac and the host's real MAC addr
       * guest_gw == map_guest_addr && guest_addr != host_addr
          * guest_gw refers to some machine that's not the host, but
            when we forward packets we're acting on our own, it's not
	    something that other machine is doing
       * guest_gw != map_guest_addr && guest_gw == host_addr
          * guest_gw refers to the host without NAT.  Makes sense to
	    use host's real MAC addr here?
       * A bunch of other combinations
          * My brain hurts.

The confusion with guest_gw comes because the guest can communicate
with it in two different ways: it can be communicating with guest_gw
directly as a peer (so the guest_gw IP actually appears in the
packets), or it can be communicating only as a router (no guest_gw IP
in the packets, but the guest will expect the router's MAC address
there).  The question is what do we need to keep consistent between
those cases so that we don't confuse anyone.

I think the way to go - for now at least - is to lock the MAC address
for guest_gw to our_tap_mac, regardless of what else is going on.
That means we pass up the opportunity to preserve the MAC address of
the host router, even if that doesn't conflict with our NATs
(--no-map-gw).  But I think it's the simplest way to keep things
consistent.  Otherwise we need special logic when we forward a packet
from outside the host network to set the MAC to the router's MAC, even
though the router's IP doesn't appear in the packet or flow.
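
To make that concrete, here is a rough sketch of the MAC selection this
implies.  It is an illustration only, not code from the series: it is
IPv4-only, and struct sketch_cfg and sketch_mac_for() are hypothetical
names standing in for the real context and lookup helper.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAC_LEN 6

/* Hypothetical, simplified configuration (not passt's struct ctx):
 * IPv4 only, addresses as host-order integers, 0 == not configured.
 */
struct sketch_cfg {
	uint32_t guest_gw;
	uint32_t map_host_loopback;
	uint8_t our_tap_mac[SKETCH_MAC_LEN];
};

/* Pick the source MAC the guest should see for guest-side address @addr.
 * @table_mac is the MAC learned from the host's neighbour table for that
 * address, or NULL if there is no entry.
 */
static const uint8_t *sketch_mac_for(const struct sketch_cfg *cfg,
				     uint32_t addr, const uint8_t *table_mac)
{
	assert(cfg);

	/* lo on the host has no MAC: our_tap_mac is the only real choice */
	if (cfg->map_host_loopback && addr == cfg->map_host_loopback)
		return cfg->our_tap_mac;

	/* Locked to our_tap_mac, regardless of what else is going on */
	if (cfg->guest_gw && addr == cfg->guest_gw)
		return cfg->our_tap_mac;

	/* Anything else: preserve the real MAC if we know it */
	return table_mac ? table_mac : cfg->our_tap_mac;
}
```

The point of the ordering is that the two locked addresses are checked
before any learned entry is consulted, so a neighbour event can never
change what the guest sees for them.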


> Conclusion:
> - We use nat_inbound() instead of nat_outbound() before consulting the
>   table, like you suggest.

Agreed.

> - We need to manually add an entry representing the host in the case the
>   host IP differs from the guest IP. This entry is announced.
>   In the case the IPs are the same we don't add any entry.

Rather than deal with multiple cases explicitly here, I think we
always want to insert an entry for nat_inbound(host_addr), unless that
conflicts with the rules below.

> - Local host entries are added by ARP/NDP events, but only if
>   nat_inbound() doesn't translate the IP address (will that ever
>   happen?).

I don't think it matters whether nat_inbound() translates, only whether
the guest side address is == map_host_loopback or == guest_gw.  We
can't allow an entry for those cases, but I think we can in all other
cases, whether or not there's been a (non-identity) translation.
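
Roughly, the check I have in mind looks like the sketch below.  Again
this is only an illustration under simplifying assumptions: IPv4 only,
all names are hypothetical, and sketch_nat_inbound() is a toy stand-in
that models just the host address -> map_guest_addr translation, not
everything the real nat_inbound() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified configuration (not passt's struct ctx):
 * IPv4 only, addresses as host-order integers, 0 == not configured.
 */
struct sketch_cfg {
	uint32_t host_addr;
	uint32_t guest_gw;
	uint32_t map_host_loopback;
	uint32_t map_guest_addr;
};

/* Toy stand-in for nat_inbound(): the host's own address is presented
 * to the guest as map_guest_addr, everything else passes through.
 */
static uint32_t sketch_nat_inbound(const struct sketch_cfg *cfg,
				   uint32_t host_side)
{
	assert(cfg);

	if (cfg->map_guest_addr && host_side == cfg->host_addr)
		return cfg->map_guest_addr;

	return host_side;
}

/* May a neighbour event for @host_side create or update a table entry?
 * Only the resulting guest-side address matters, not whether a
 * (non-identity) translation took place.
 */
static bool sketch_entry_allowed(const struct sketch_cfg *cfg,
				 uint32_t host_side)
{
	uint32_t guest_side = sketch_nat_inbound(cfg, host_side);

	if (cfg->map_host_loopback && guest_side == cfg->map_host_loopback)
		return false;
	if (cfg->guest_gw && guest_side == cfg->guest_gw)
		return false;

	return true;
}
```

With this shape, the manually inserted entry for nat_inbound(host_addr)
goes through the same check as entries coming from ARP/NDP events, so
the host needs no separate case.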

>   We suppress all events from the host gw in case it has the same
>   IP as the guest gw (unless this is covered by the nat_inbound()
>   check?).

I believe that is handled by the above checks.

> - We may want a NAT mapping making it possible to reach the host gw
>   in the case it has the same IP as the guest gw. This has no effect
>   on our neighbour table.

host_gw == guest_gw doesn't of itself prevent us from reaching it.
host_gw == map_guest_addr does.  There's not really a reason to add a
new NAT.  If you need to reach the host gw, use --no-map-gw and/or
override --map-guest-addr.  That will leave something else
inaccessible instead, but there's no getting around that - we can't
create a free IP out of thin air[0].

> If you agree with my above analysis I can go ahead and post a v13 of
> the series early next week, including the above changes and fixes
> for your other comments to the series.

With the revisions noted above, then yes.

[0] Using link-local addresses, maybe we can, but doing that is a
larger project.  And, to do so implies that the "link" is just from
the guest to passt, not encompassing the host's neighbourhood, which
doesn't really match with preserving MAC addresses in the first place.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson