On Wed, Oct 08, 2025 at 12:01:18PM +0200, Stefano Brivio wrote:
> On Wed, 8 Oct 2025 11:27:32 +1100
> David Gibson wrote:
>
> > On Tue, Oct 07, 2025 at 12:10:22PM +0200, Stefano Brivio wrote:
> > > On Fri, 3 Oct 2025 14:41:56 +1000
> > > David Gibson wrote:
> > >
> > > > On Thu, Oct 02, 2025 at 08:34:06PM -0400, Jon Maloy wrote:
> > > > > ARP announcements and unsolicited NAs should be handled with caution
> > > > > because of the risk of malignant users emitting them to disturb
> > > > > network communication.
> > > > >
> > > > > There is however one case we where we know it is legitimate
> > > > > and safe for us to send out such messages: The one time we switch
> > > > > from using ctx->own_tap_mac to a MAC address received via the
> > > > > recently added neigbour subscription function. Later changes to
> > > > > the MAC address of a host in an existing entry cannot be fully
> > > > > trusted, so we abstain from doing it in such cases.
> > > > >
> > > > > When sending this type of messages, we notice that the guest accepts
> > > > > the update, but shortly later asks for a confirmation in the form of
> > > > > a regular ARP/NS request. This is responded to with the new value,
> > > > > and we have exactly the effect we wanted.
> > > > >
> > > > > This commit adds this functionality.
> > > > >
> > > > > Signed-off-by: Jon Maloy
> > > > >
> > > > > ---
> > > > > v10: -Made small changes based of feedback from David G.
> > > > > v11: -Moved from 'Gratuitous ARP reply' model to 'ARP Announcement'
> > > > >      model.
> > > > > v12: -Excluding loopback and default GW addresses from the ARP/NA
> > > > >      announcement to be sent to the guest
> > > > > ---
> > > > >  arp.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> > > > >  arp.h |  2 ++
> > > > >  fwd.c | 16 ++++++++++++++++
> > > > >  ndp.c | 10 ++++++++++
> > > > >  ndp.h |  1 +
> > > > >  5 files changed, 71 insertions(+)
> > > > >
> > > > > diff --git a/arp.c b/arp.c
> > > > > index ad088b1..b08780f 100644
> > > > > --- a/arp.c
> > > > > +++ b/arp.c
> > > > > @@ -146,3 +146,45 @@ void arp_send_init_req(const struct ctx *c)
> > > > >  	debug("Sending initial ARP request for guest MAC address");
> > > > >  	tap_send_single(c, &req, sizeof(req));
> > > > >  }
> > > > > +
> > > > > +/**
> > > > > + * arp_announce() - Send an ARP announcement for an IPv4 host
> > > > > + * @c:		Execution context
> > > > > + * @ip:	IPv4 address we announce as owned by @mac
> > > > > + * @mac:	MAC address to advertise for @ip
> > > > > + */
> > > > > +void arp_announce(const struct ctx *c, struct in_addr *ip,
> > > > > +		  const unsigned char *mac)
> > > > > +{
> > > > > +	char ip_str[INET_ADDRSTRLEN];
> > > > > +	char mac_str[ETH_ADDRSTRLEN];
> > > > > +	struct {
> > > > > +		struct ethhdr eh;
> > > > > +		struct arphdr ah;
> > > > > +		struct arpmsg am;
> > > > > +	} __attribute__((__packed__)) annc;
> > > > > +
> > > > > +	/* Ethernet header */
> > > > > +	annc.eh.h_proto = htons(ETH_P_ARP);
> > > > > +	memcpy(annc.eh.h_dest, MAC_BROADCAST, sizeof(annc.eh.h_dest));
> > > > > +	memcpy(annc.eh.h_source, mac, sizeof(annc.eh.h_source));
> > > > > +
> > > > > +	/* ARP header */
> > > > > +	annc.ah.ar_op = htons(ARPOP_REQUEST);
> > > > > +	annc.ah.ar_hrd = htons(ARPHRD_ETHER);
> > > > > +	annc.ah.ar_pro = htons(ETH_P_IP);
> > > > > +	annc.ah.ar_hln = ETH_ALEN;
> > > > > +	annc.ah.ar_pln = 4;
> > > > > +
> > > > > +	/* ARP message */
> > > > > +	memcpy(annc.am.sha, mac, sizeof(annc.am.sha));
> > > > > +	memcpy(annc.am.sip, ip, sizeof(annc.am.sip));
> > > > > +	memcpy(annc.am.tha, MAC_BROADCAST, sizeof(annc.am.tha));
> > > > > +	memcpy(annc.am.tip, ip, sizeof(annc.am.tip));
> > > >
> > > > As noted in several earlier revisions, having sip == tip (but with
> > > > different mac addresses) looks odd. Is that what the RFCs say to do
> > > > for ARP announcements?
> > > >
> > > > > +	inet_ntop(AF_INET, ip, ip_str, sizeof(ip_str));
> > > > > +	eth_ntop(mac, mac_str, sizeof(mac_str));
> > > > > +	debug("Announcing ARP for %s / %s", ip_str, mac_str);
> > > > > +
> > > > > +	tap_send_single(c, &annc, sizeof(annc));
> > > > > +}
> > > > > diff --git a/arp.h b/arp.h
> > > > > index d5ad0e1..4862e90 100644
> > > > > --- a/arp.h
> > > > > +++ b/arp.h
> > > > > @@ -22,5 +22,7 @@ struct arpmsg {
> > > > >
> > > > >  int arp(const struct ctx *c, struct iov_tail *data);
> > > > >  void arp_send_init_req(const struct ctx *c);
> > > > > +void arp_announce(const struct ctx *c, struct in_addr *ip,
> > > > > +		  const unsigned char *mac);
> > > > >
> > > > >  #endif /* ARP_H */
> > > > > diff --git a/fwd.c b/fwd.c
> > > > > index c34bb1c..ade97c8 100644
> > > > > --- a/fwd.c
> > > > > +++ b/fwd.c
> > > > > @@ -26,6 +26,8 @@
> > > > >  #include "passt.h"
> > > > >  #include "lineread.h"
> > > > >  #include "flow_table.h"
> > > > > +#include "arp.h"
> > > > > +#include "ndp.h"
> > > > >
> > > > >  /* Empheral port range: values from RFC 6335 */
> > > > >  static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
> > > > > @@ -140,6 +142,20 @@ void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,
> > > > >
> > > > >  	memcpy(&e->addr, addr, sizeof(*addr));
> > > > >  	memcpy(e->mac, mac, ETH_ALEN);
> > > > > +
> > > > > +	if (inany_equals(addr, &inany_loopback4))
> > > > > +		return;
> > > > > +	if (inany_equals(addr, &inany_loopback6))
> > > > > +		return;
> > > >
> > > > Since you need these explicit checks anyway, there's not much point to
> > > > the dummy entries you created - you could exit on these addresses
> > > > before even looking up the table.
> > >
> > > I guess those entries make sense if we can drop all these checks as a
> > > result. I think we should be able to.
> >
> > We couldn't in this version, because that might have allowed the
> > entries for loopback to be updated, which is certainly wrong. But
> > it will all need re-examination after moving everything over to guest
> > side addresses which AIUI is the plan for the next spin.
>
> Yes, I was talking about the next version. For context, when we first
> discussed about the possibility of these entries with Jon, my
> assumption was that the whole series used guest-side link-layer
> addresses exclusively,

We did use guest-side link-layer addresses - host-side LL addresses
might not even exist. The question is about whether we use guest side
or host side IP addresses to index the table.

> but that wasn't the case, hence (I think) the
> current struggle. If we go in that direction, I hope it's possible.

Thinking a bit more closely, I don't think it is, for much the same
reason it wasn't in this draft. According to the rules Jon and I
thrashed out elsewhere in the thread, there are certain guest side
addresses that must be locked to use our_tap_mac. We're essentially
shadowing something that might exist on the host side, so we should
use our MAC not the MAC of whatever is shadowed. Just pre-populating
an entry won't do the trick, because it could be overwritten if the
right events occur for the shadowed host.

> By the way, while they are probably more elegant because we can skip
> explicit cases, they might be a bit more complicated to manage compared
> to those explicit cases the day we get to change addresses and routes
> dynamically using a netlink monitor, because at that point we might
> need to remove some entries based on old addresses / default gateways.
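
To make the locking rule above a bit more concrete, here is a rough,
standalone sketch of one way an entry could be pinned to our own MAC so
that later updates for the shadowed host are refused. Everything in it
is hypothetical and for illustration only: struct neigh_entry, struct
neigh_table, neigh_lock(), neigh_update(), neigh_lookup(), MAC_LEN and
NEIGH_MAX are made-up names, not anything in the series or in passt,
and it only handles IPv4 addresses as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN		6	/* hypothetical stand-in for ETH_ALEN */
#define NEIGH_MAX	8	/* hypothetical, arbitrary table size */

/* Hypothetical entry, indexed by guest side IPv4 address */
struct neigh_entry {
	uint32_t addr;			/* guest side address */
	unsigned char mac[MAC_LEN];	/* MAC we present to the guest */
	bool used;			/* slot holds a valid entry */
	bool locked;			/* pinned to our MAC, never updated */
};

struct neigh_table {
	struct neigh_entry e[NEIGH_MAX];
};

/* Return the entry for @addr, or NULL if there is none */
static struct neigh_entry *neigh_find(struct neigh_table *t, uint32_t addr)
{
	size_t i;

	for (i = 0; i < NEIGH_MAX; i++)
		if (t->e[i].used && t->e[i].addr == addr)
			return &t->e[i];
	return NULL;
}

/* Return the entry for @addr, claiming a free slot if needed; NULL if full */
static struct neigh_entry *neigh_get(struct neigh_table *t, uint32_t addr)
{
	struct neigh_entry *e = neigh_find(t, addr);
	size_t i;

	if (e)
		return e;

	for (i = 0; i < NEIGH_MAX; i++) {
		if (t->e[i].used)
			continue;
		e = &t->e[i];
		memset(e, 0, sizeof(*e));
		e->used = true;
		e->addr = addr;
		return e;
	}
	return NULL;
}

/* Pin @addr to @our_mac: later neigh_update() calls for it are refused */
bool neigh_lock(struct neigh_table *t, uint32_t addr,
		const unsigned char *our_mac)
{
	struct neigh_entry *e = neigh_get(t, addr);

	assert(our_mac);
	if (!e)
		return false;
	memcpy(e->mac, our_mac, MAC_LEN);
	e->locked = true;
	return true;
}

/* Record @mac for @addr; false if the entry is locked or the table is full */
bool neigh_update(struct neigh_table *t, uint32_t addr,
		  const unsigned char *mac)
{
	struct neigh_entry *e = neigh_find(t, addr);

	assert(mac);
	if (e && e->locked)
		return false;
	if (!e && !(e = neigh_get(t, addr)))
		return false;
	memcpy(e->mac, mac, MAC_LEN);
	return true;
}

/* MAC to present to the guest for @addr, or NULL if we have no entry */
const unsigned char *neigh_lookup(struct neigh_table *t, uint32_t addr)
{
	struct neigh_entry *e = neigh_find(t, addr);

	return e ? e->mac : NULL;
}
```

With something along these lines, the addresses we shadow would be
locked once at start-up and the host side events would simply bounce
off them. Whether a per-entry flag is actually nicer than the explicit
checks is a separate question, and it doesn't address removing entries
when addresses or routes change.
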
>
> But given that this is already complicated enough, we can keep that
> problem for later, and just go with the simplest possible approach
> (whatever it is) for the moment.
>
> --
> Stefano
>

--
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson