public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Jon Maloy <jmaloy@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents
Date: Tue, 7 Oct 2025 00:33:17 +0200	[thread overview]
Message-ID: <20251007003317.5b4dc567@elisabeth> (raw)
In-Reply-To: <82865ef4-326e-48af-ab9a-6367f2c9e023@redhat.com>

On Sun, 5 Oct 2025 11:52:42 -0400
Jon Maloy <jmaloy@redhat.com> wrote:

> On 2025-10-03 00:31, David Gibson wrote:
> > On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:  
> >> We add a cache table to keep track of the contents of the kernel ARP
> >> and NDP tables. The table is fed from the just introduced netlink based  
> 
> [...]
> 
> >> +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
> >> +			    const uint8_t *mac)
> >> +{
> >> +	struct neigh_table *t = &neigh_table;
> >> +	struct neigh_table_entry *e;
> >> +	ssize_t slot;
> >> +
> >> +	/* MAC address might change sometimes */
> >> +	e = fwd_neigh_table_find(c, addr);
> >> +	if (e) {
> >> +		if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw)))
> >> +			return;
> >> +		if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw))
> >> +			return;  
> > 
> > This doesn't make sense to me.  From the way its looked up in 4/9, the
> > IP addresses in the table are as seen by the host, not by the guest
> > (we look up the table *after* applying NAT).  Which makes guest_gw not
> > meaningful here.
> > 
> > You _do_ need to handle the case that addr is loopback (which guest_gw
> > might be translated to), and that's handled by your dummy entries.
> > 
> > The other case we might need to consider here is if the (translated)
> > address is the host's IP.  Do we want to give out_tap_addr for that
> > case?  Or the host's real MAC on the template interface?  If the
> > latter will that be in the host ARP table, or would we have to handle
> > it specially?
> >   
> This is my current understanding of this:
> 
> 1) Adding the loopback entries to the neigbour table is harmless,
>     but unnecessary.

Loopback entries yes, but I thought you would add the equivalent
guest-side entries.

>     fwd_neigh_mac_get() will always return our_tap_mac
>     when it doesn't find an entry in the table, which is the case here.

Kind of, because while with IPv4:

  $ pasta sh -c 'ip n a 127.0.0.1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n'
  0.0.0.0 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT

that won't match anything, with IPv6:

  $ pasta sh -c 'ip n a ::1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n'
  ::1 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT

that would match the loopback address. I don't think we actually need
to care about this, though.

>     This addition was a mistake.
>
> 
> 2) Regarding the default gw, we have two cases:
> 
> 2.1) guest_gw IP is the same as the host_gw IP:

I'm afraid we're losing a substantial amount of time on ambiguous
terminology, so let me clarify what my understanding of these terms is:

a. "guest_gw IP" -> any of the two IP addresses, in the guest,
   possibly mapping to the host

...that's because the default gateway address in the guest doesn't
really matter, so I don't think you're referring to the IP address of
the default gateway in the guest.

b. "host_gw IP" -> the IP address of the default gateway on the host

>       In this case, we'll have an event trying to inserting an entry
>       into the table with the host_gw's real mac address.
>       In v11, this mapping was announced to the guest, but later
>       contradicted by all ARP/NDP messages he receives, because they all
>       come from PASTA/PASST and uses our_tap_mac as source mac in the
>       message header. This is probably harmless, but causes confusion in
>       the guest and a warning wireshark.
>       In v12, I therefore chose to add the entry but suppress the
>       announcements for the gw, and also the updates of
>       the mac fields from subsequent events.
>       This is not right either. We either have to be consistent, and
>       add the real host_gw mac in all messages sent from the purported
>       gw address, or we just as consistently use our_tap_mac. I think
>       the latter is simpler with less consequences for the code.

I also think we should use our_tap_mac for any message coming from the
host itself: they're not coming from the gateway (as seen by the host),
which we are effectively shadowing.

>       I think we should just simply suppress all events from the host
>       gw in such cases.

This is generally fine, I think, but it comes with the additional
problem that, should we ever make the map_guest_addr or
map_host_loopback parameters configurable at runtime, we'll miss the
entry at that point and we'll need to rework this.

If it doesn't cause substantial effort, I would suggest that, if
entries always refer to IP addresses as seen from the guest (David's
suggestion), you would actually add the entries mapping to the host to
the table, resolving to our_tap_mac.

Those addresses are *not* c->ip[46].guest_gw though. They are
c->ip[46].map_guest_addr and c->ip[46].map_host_loopback.

See nat_outbound() for an overview of translated addresses.

>       (Here I have a question: I don't see any NAT mapping making
>        it possible to communicate between the guest and the the host gw
>        when guest gw IP and host gw IP are are equal. Shouldn't there be
>        one?)

Right, there's no such parameter at the moment, because the assumption
is that, if you need to reach the default gateway of the host, you
would specify a non-default --map-host-loopback (possibly 'none').

On the other hand we plan to make this more generic, perhaps as part of
https://bugs.passt.top/show_bug.cgi?id=140, perhaps as part of another
effort.

The flow table should enable arbitrary NATs, and was implemented with
that in mind. We just don't have a suitable configuration front-end at
the moment, even though command line parameters would probably be
enough to cover most common cases.

> 2.2) guest_gw IP is different from host_gw IP:
>       This case is simpler. We just treat the host_gw as any other
>       local host, add and update its entry at new events, and pass it on
>       to the guest as an announcement at first contact, consistently
>       using that host's true mapping.
> 
> 3) Regarding the host address, we also have two cases:
> 
> 3.1) guest IP is the same as the host IP (on the default interface):

My interpretation of the terms here is:

c. "guest IP" -> one of the configured guest IP addresses
   (c->ip[46].addr)

d. "host IP" -> one of the IP addresses configured on the host (not in
   struct ctx, but two of those will become c->ip[46].addr by default)

>       In this case the guest can only reach the host by using the
>       guest_gw IP (which lands on the host),

Not according to my interpretation of "guest_gw IP" (a. above).

>       some or the other IPs
>       on the host, or the map_host_loopback IP, as far as I
>       understand.

Yes. And c->ip[46].map_guest_addr on top of them.

>       The host IP will never enter the neigbour table by ARP/NDP
>       event, and there is no reason for us to add a dummy for it.

If none of the host IP addresses are visible to the guest, right,
there's no need for any matching entry.

> 3.2) guest IP differs from the host IP.
>       In this case we let fwd_neigh_table_init() add an entry for the
>       host as if it were any other host, using the IP and mac
>       addresses from the template interface.

This assumes that we'll use MAC addresses assigned to host interfaces,
instead of using our_tap_mac.

I think it looks more elegant than the alternative, but mind that there
can be 2^32 - 1 interfaces on Linux (see dev_index_reserve() in
net/core/dev.c), so that might flood the table right away. Maybe
our_tap_mac isn't that bad after all.

> Conclusion:
> - We use nat_inbound() instead of nat_outbound() before consulting the
>    table, like you suggest.

If this implies standardising on guest-side addresses, it makes sense
to me as well.

> - We need to manually add an entry representing the host in the case the
>    host IP differs from the  guest IP. This entry is announced.
>    In the case the IPs are the same we don't add any entry.

I don't think host and guest IP addresses are really related to this
point. It's rather about the two sets of addresses that might map to
the host.

> - Local host entries are added by ARP/NDP events, but only if
>    nat_inbound() doesn't translate the IP address (will that ever
>    happen?).
>    We suppress all events from the host gw in case it has the same
>    IP as the guest gw (unless this is covered by the nat_inbound()
>    check?).

I wouldn't hide any address like that, to make this a bit more
future-proof.

> - We may want a NAT mapping making it possible to reach the host gw
>    in the case it has the same IP as the guest gw. This has no effect
>    on our neighbour table.
> 
> If you agree with my above analysis I can go ahead and post a v13 of
> the series early next week, including the above changes and fixes
> for your other comments to the series.

...the rest sounds good to me.

-- 
Stefano


  reply	other threads:[~2025-10-06 22:33 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03  0:34 [PATCH v12 0/9] Use true MAC address of LAN local remote hosts Jon Maloy
2025-10-03  0:34 ` [PATCH v12 1/9] netlink: add subsciption on changes in NDP/ARP table Jon Maloy
2025-10-03  4:01   ` David Gibson
2025-10-06 22:33   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 2/9] fwd: Add cache table for ARP/NDP contents Jon Maloy
2025-10-03  1:03   ` Jon Maloy
2025-10-03  4:31   ` David Gibson
2025-10-05 15:52     ` Jon Maloy
2025-10-06 22:33       ` Stefano Brivio [this message]
2025-10-07  3:33       ` David Gibson
2025-10-06 22:40   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 3/9] arp/ndp: send ARP announcement / unsolicited NA when neigbour entry added Jon Maloy
2025-10-03  4:41   ` David Gibson
2025-10-07 10:10     ` Stefano Brivio
2025-10-06 22:51   ` Stefano Brivio
2025-10-03  0:34 ` [PATCH v12 4/9] arp/ndp: respond with true MAC address of LAN local remote hosts Jon Maloy
2025-10-03  4:48   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 5/9] flow: add MAC address of LAN local remote hosts to flow Jon Maloy
2025-10-03  0:34 ` [PATCH v12 6/9] udp: forward external source MAC address through tap interface Jon Maloy
2025-10-03  4:52   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 7/9] tcp: " Jon Maloy
2025-10-03  4:54   ` David Gibson
2025-10-03  0:34 ` [PATCH v12 8/9] tap: change signature of function tap_push_l2h() Jon Maloy
2025-10-03  0:34 ` [PATCH v12 9/9] icmp: let icmp use mac address from flowside structure Jon Maloy
2025-10-03  4:57   ` David Gibson
2025-10-03  5:33 ` [PATCH v12 0/9] Use true MAC address of LAN local remote hosts David Gibson
2025-10-05 13:39   ` Jon Maloy
2025-10-07  0:56     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251007003317.5b4dc567@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dgibson@redhat.com \
    --cc=jmaloy@redhat.com \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).