On Wed, Sep 24, 2025 at 06:18:52PM -0400, Jon Maloy wrote:
>
>
> On 2025-09-23 23:22, David Gibson wrote:
> > On Tue, Sep 23, 2025 at 09:13:30PM -0400, Jon Maloy wrote:
> > > Gratuitous ARP and unsolicited NA should be handled with caution
> > > because of the risk of malignant users emitting them to disturb
> > > network communication.
> > >
> > > There is however one case where we know it is legitimate
> > > and safe for us to send out such messages: The one time we switch
> > > from using ctx->own_tap_mac to a MAC address received via the
> > > recently added neighbour subscription function. Later changes to
> > > the MAC address of a host in an existing entry cannot be fully
> > > trusted, so we abstain from doing it in such cases.
> >
> > So, I think you're right that the gratuitous ARP is safe in this case.
> >
> > But it concerns me that (other than some edge cases) we're sending
> > data to the guest under own_tap_mac before we get the real MAC.  At
> > the point we send data from a flow to the guest, I would have expected
> > to already have an entry in the host neighbour table (because by
> > definition the host is talking to the peer), therefore in our cache,
> > by the subscriber.
> >
> > I'm wondering if it could be as simple as both the neighbour update
> > and the actual data coming in the same epoll batch, and we could avoid
> > the temporary use of own_tap_mac by prioritising processing of the
> > neighbour events.
>
> I experimented a bit with this. My test program is a simple UDP
> client-server pair, exchanging first 3 UDP messages client->server,
> followed by 3 messages server->client.

With the client on the guest, and server outside?  How is the outside
machine arranged - is it a physically separate host?  A bridged VM or
container on the same host?  Something else?

> First, I changed the main() loop a bit, so that netlink events are
> handled before all other events, if any.
> (Basically, I added
> an extra loop before the main loop, only handling netlink events, before
> moving on to the main loop (where netlink events had been excluded.)
> This should secure absolute priority of netlink events before any other
> events. As you will see below, this made no difference to the scenarios
> I describe.

Drat.

> 1: When starting the container, I notice that there is no subscription
>    event in PASTA, even though I can see the entry for the remote host
>    is present in the host's ARP table. There is never any event coming
>    up even if I wait for 10+ minutes.

Huh.... do we need to do something to ensure we get events for
existing entries in the host ARP table, not just ones that are added
or updated after we're running?

> 2: The guest attempts to send the first UDP. An ARP request is
>    sent to PASTA, and responded to with the 9a:9a: address.

Maybe we still need to explicitly ask for an ARP resolution when the
guest ARPs.

> 3: The UDP, and two more UDPs, are sent via PASTA to the remote host.
>    Those are responded to and sent back to the guest.
> 4: I now receive a neighbour event, and can update my cache, but since
>    there is still no new ARP request from the guest, even if I wait
>    for many minutes, he continues in the belief the old address
>    is confirmed.
> 5: If I run the same test again after a few minutes,
>    the guest *does* send out an ARP request a few seconds after the
>    message exchange, and is now updated with the correct address.
>
> - If I run this sequence in the opposite direction everything seems to
>   work ok, at least if the ARP entry is already present on the local
>   host.
>
> - When I delete that ARP entry before running the sequence,

Delete it from the host ARP table, you mean?

>   a neighbour
>   event shows up after some seconds, but it can take up to a minute, at
>   least.

Oof.  I guess some delay is inevitable, but that's way longer than I
would have expected.
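
On the question above about entries that already exist in the host ARP
table: a multicast subscription to neighbour notifications only reports
changes, so entries that predate the subscription would need an explicit
dump request.  The following is a rough sketch, not taken from the passt
tree, of how such an RTM_GETNEIGH dump request could be built.  The names
neigh_dump_req and neigh_dump_req_init() are made up for illustration;
the constants and structures are the standard Linux rtnetlink ones.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by a neighbour
 * message header, with no attributes (no filter, i.e. "dump everything").
 */
struct neigh_dump_req {
	struct nlmsghdr nlh;
	struct ndmsg ndm;
};

/* Fill in a request asking the kernel to dump all current neighbour
 * entries for @family (AF_INET for ARP, AF_INET6 for NDP).
 * This only builds the message; sending it on a NETLINK_ROUTE socket and
 * parsing the RTM_NEWNEIGH replies is left to the caller.
 */
void neigh_dump_req_init(struct neigh_dump_req *req,
			 unsigned char family, uint32_t seq)
{
	memset(req, 0, sizeof(*req));

	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ndm));
	req->nlh.nlmsg_type = RTM_GETNEIGH;
	/* NLM_F_DUMP makes the kernel return every matching entry */
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;

	req->ndm.ndm_family = family;
}
```

Issuing one such request per address family right after subscribing
would, under these assumptions, seed the cache with the entries the host
already has, so that the first ARP reply to the guest need not fall back
to the default tap MAC.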
>   If I run my sequence from the remote host before that happens,
>   there will be an ARP request from the guest (for the response UDPs),
>   responded to with the default tap mac, and it will remain
>   like that for a long time, since the guest considers the mac address
>   confirmed. It doesn't help much that a neighbour event shows up some
>   seconds after the exchange.
>
> In brief, the guest *will* be updated eventually, but depending on luck
> and timing it may take a long time, at least several minutes.
> My gratuitous ARPs/unsolicited NAs don't completely solve this issue,
> but they significantly reduce the potential time gap before the guest
> gets properly updated.
>
> >
> > > When sending this type of messages, we notice that the guest accepts
> > > the update, but also asks for a confirmation in the form of a regular
> > > ARP/NS request. This is responded to with the new value, and we have
> > > exactly the effect we wanted.
> > >
> > > This commit adds this functionality.
> > >
> > > Signed-off-by: Jon Maloy
> > > ---
> > >  arp.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  arp.h |  2 ++
> > >  fwd.c | 11 +++++++++++
> > >  ndp.c | 10 ++++++++++
> > >  ndp.h |  1 +
> > >  5 files changed, 63 insertions(+)
> > >
> > > diff --git a/arp.c b/arp.c
> > > index 442faff..259f736 100644
> > > --- a/arp.c
> > > +++ b/arp.c
> > > @@ -151,3 +151,42 @@ void arp_send_init_req(const struct ctx *c)
> > >  	debug("Sending initial ARP request for guest MAC address");
> > >  	tap_send_single(c, &req, sizeof(req));
> > >  }
> > > +
> > > +/**
> > > + * arp_send_gratuitous() - Send a gratuitous ARP announcement for an IPv4 host
> > > + * @c:		Execution context
> > > + * @ip:	IPv4 address we announce as owned by @mac
> > > + * @mac:	MAC address to advertise for @ip
> > > + */
> > > +void arp_send_gratuitous(const struct ctx *c, struct in_addr ip,
> > > +			 const unsigned char *mac)
> > > +{
> > > +	char ip_str[INET_ADDRSTRLEN];
> > > +	struct {
> > > +		struct ethhdr eh;
> > > +		struct arphdr ah;
> > > +		struct arpmsg am;
> > > +	} __attribute__((__packed__)) req;
> >
> > 'req' is not a great name, since this is an ARP response, not a
> > request (but see below).
> >
> > > +	/* Ethernet header */
> > > +	req.eh.h_proto = htons(ETH_P_ARP);
> > > +	memcpy(req.eh.h_dest, MAC_BROADCAST, sizeof(req.eh.h_dest));
> > > +	memcpy(req.eh.h_source, c->our_tap_mac, sizeof(req.eh.h_source));
> > > +
> > > +	/* ARP header */
> > > +	req.ah.ar_op = htons(ARPOP_REPLY);
> > > +	req.ah.ar_hrd = htons(ARPHRD_ETHER);
> > > +	req.ah.ar_pro = htons(ETH_P_IP);
> > > +	req.ah.ar_hln = ETH_ALEN;
> > > +	req.ah.ar_pln = 4;
> > > +
> > > +	/* ARP message */
> > > +	memcpy(req.am.sha, mac, sizeof(req.am.sha));
> > > +	memcpy(req.am.sip, &ip, sizeof(req.am.sip));
> > > +	memcpy(req.am.tha, MAC_BROADCAST, sizeof(req.am.tha));
> > > +	memcpy(req.am.tip, &ip, sizeof(req.am.tip));
> >
> > So, I was trying to check if it made sense to use the same IP for both
> > source and target here, and came across
> > https://www.rfc-editor.org/rfc/rfc5227#section-3
> >
> > Which suggests we should (counter intuitively) be using ARP requests,
> > not ARP replies for announcements.
>
> Instead of gratuitous ARP, you mean? I can try it.

It suggests that what's traditionally meant by "gratuitous ARP" is
actually ARP requests, not responses as you might expect.  There's
some detailed reasoning there, I'd give it a read.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
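
For reference, a rough sketch of what the request-style announcement
described in RFC 5227 could look like, as opposed to the ARPOP_REPLY
variant in the patch above.  This is an illustration only, not the
actual passt code: struct arp_announce, arp_announce_fill() and the
SK_* constants are invented for the example so that it stands alone.
The field values follow RFC 5227: opcode REQUEST, sender and target IP
both set to the announced address, target hardware address zeroed.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Local copies of the well-known protocol constants (illustrative names) */
#define SK_ETH_ALEN	6
#define SK_ETH_P_IP	0x0800
#define SK_ETH_P_ARP	0x0806
#define SK_ARPHRD_ETHER	1
#define SK_ARPOP_REQUEST 1

/* Hypothetical flat layout of an Ethernet + ARP (IPv4) frame */
struct arp_announce {
	uint8_t eth_dst[SK_ETH_ALEN];
	uint8_t eth_src[SK_ETH_ALEN];
	uint16_t eth_proto;
	uint16_t ar_hrd;
	uint16_t ar_pro;
	uint8_t ar_hln;
	uint8_t ar_pln;
	uint16_t ar_op;
	uint8_t sha[SK_ETH_ALEN];
	uint8_t sip[4];
	uint8_t tha[SK_ETH_ALEN];
	uint8_t tip[4];
} __attribute__((__packed__));

/* Build an ARP announcement saying that @ip is reachable at @mac.
 * @our_mac is the Ethernet source address of the frame itself.
 */
void arp_announce_fill(struct arp_announce *f, const uint8_t *our_mac,
		       const uint8_t *mac, struct in_addr ip)
{
	memset(f, 0, sizeof(*f));	/* also zeroes the target MAC */

	/* Ethernet header: broadcast so every neighbour sees it */
	memset(f->eth_dst, 0xff, sizeof(f->eth_dst));
	memcpy(f->eth_src, our_mac, sizeof(f->eth_src));
	f->eth_proto = htons(SK_ETH_P_ARP);

	/* ARP header: a request, not a reply */
	f->ar_hrd = htons(SK_ARPHRD_ETHER);
	f->ar_pro = htons(SK_ETH_P_IP);
	f->ar_hln = SK_ETH_ALEN;
	f->ar_pln = 4;
	f->ar_op = htons(SK_ARPOP_REQUEST);

	/* ARP body: sender and target IP are both the announced address */
	memcpy(f->sha, mac, sizeof(f->sha));
	memcpy(f->sip, &ip, sizeof(f->sip));
	memcpy(f->tip, &ip, sizeof(f->tip));
}
```

Compared with the patch, the differences are the opcode and the target
hardware address, which is left as zeroes instead of the broadcast
address.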