From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=jkwvLhdM; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 5AB035A026F for ; Mon, 29 Sep 2025 07:49:52 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1759124989; bh=T5LuJ93hWfXTI7yiFhQ38enro3gG7hjUngsgUvLAnck=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=jkwvLhdM/Lx3hRYvRU8D+8xYdjNZycitUg48uMjn7KpPxJCqQuSZ1Ot/KiYMmL9MI 6FBiwpSg45zlyZhikMukc9oF8/ykHMU6At7eX/We5mIPga3LLxEmWjYTgtPf5XqFxl YH72VlZqklX3IfwIWtVieS57gIBOgXOhh/N3+Jq/pUoUNSsJu273xZpx71ELsRWGn2 Fbd5PM+aWDsfF2VovorPC9Wgra0hWkxNGYzl61y88LzgjtAab2vO+e+ssIxKQsrzqd 2G026sPYvWCkVi0qhwM9lxetfVThC2TBTH1O59+qg0C/dfWH4TXs+o31t2BfH7+NhY yD4QWVyodZgfQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cZqyj5ySWz4wBD; Mon, 29 Sep 2025 15:49:49 +1000 (AEST) Date: Mon, 29 Sep 2025 14:29:23 +1000 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v11 1/9] netlink: add subsciption on changes in NDP/ARP table Message-ID: References: <20250927192522.3024554-1-jmaloy@redhat.com> <20250927192522.3024554-2-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="65YdbpEDdAfxF2e6" Content-Disposition: inline In-Reply-To: <20250927192522.3024554-2-jmaloy@redhat.com> Message-ID-Hash: 7M6EUUPMZXYBJRJT6ZNAWECESEJITQ2I X-Message-ID-Hash: 7M6EUUPMZXYBJRJT6ZNAWECESEJITQ2I X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --65YdbpEDdAfxF2e6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Sep 27, 2025 at 03:25:14PM -0400, Jon Maloy wrote: > The solution to bug https://bugs.passt.top/show_bug.cgi?id=3D120 > requires the ability to translate from an IP address to its > corresponding MAC address in cases where those are present in > the ARP or NDP tables. >=20 > To keep track of the contents of these tables we add a netlink > based neighbour subscription feature. >=20 > Signed-off-by: Jon Maloy Reviewed-by: David Gibson Although there are some suggestions for polish below. >=20 > --- > v3: - Added an attribute contianing NDA_DST to sent message, so > that we let the kernel do the filtering of the IP address > and return only one entry. > - Added interface index to the call signature. Since the only > interface we know is the template interface, this limits > the number of hosts that will be seen as 'network segment > local' from a PASST viewpoint. > v4: - Made loop independent of attribute order. > - Ignoring L2 addresses which are not of size ETH_ALEN. > v5: - Changed return value of new function, so caller can know if > a MAC address really was found. > v6: - Removed warning printout which had ended up in the wrong > commit. > v8: - Changed to neighbour event subscription model > - netlink: arp/ndp table subscription > v10:- Updated according to David's latest comments on v8 > - Added functionaly where we initially read current > state of ARP/NDP tables > --- > epoll_type.h | 2 + > netlink.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++ > netlink.h | 4 ++ > passt.c | 8 +++ > 4 files changed, 194 insertions(+) >=20 > diff --git a/epoll_type.h b/epoll_type.h > index 12ac59b..a90ffb6 100644 > --- a/epoll_type.h > +++ b/epoll_type.h > @@ -44,6 +44,8 @@ enum epoll_type { > EPOLL_TYPE_REPAIR_LISTEN, > /* TCP_REPAIR helper socket */ > EPOLL_TYPE_REPAIR, > + /* Netlink neighbour subscription socket */ > + EPOLL_TYPE_NL_NEIGH, > =20 > EPOLL_NUM_TYPES, > }; > diff --git a/netlink.c b/netlink.c > index c436780..0dc4e9d 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > =20 > #include > #include > @@ -40,6 +41,10 @@ > #define RTNH_NEXT_AND_DEC(rtnh, attrlen) \ > ((attrlen) -=3D RTNH_ALIGN((rtnh)->rtnh_len), RTNH_NEXT(rtnh)) > =20 > +/* Convenience macro borrowed from kernel */ > +#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | \ > + NUD_REACHABLE | NUD_PROBE | NUD_STALE) > + > /* Netlink expects a buffer of at least 8kiB or the system page size, > * whichever is larger. 32kiB is recommended for more efficient. > * Since the largest page size on any remotely common Linux setup is > @@ -53,6 +58,7 @@ > int nl_sock =3D -1; > int nl_sock_ns =3D -1; > static int nl_seq =3D 1; > +static int nl_sock_neigh =3D -1; > =20 > /** > * nl_sock_init_do() - Set up netlink sockets in init or target namespace > @@ -1103,3 +1109,177 @@ int nl_link_set_flags(int s, unsigned int ifi, > =20 > return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req)); > } > + > +/** > + * nl_neigh_msg_read() - Interpret a neigbour state message from netlink > + * > + */ > +static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > +{ > + struct ndmsg *ndm =3D NLMSG_DATA(nh); > + struct rtattr *rta =3D (struct rtattr *)(ndm + 1); > + size_t na =3D NLMSG_PAYLOAD(nh, sizeof(*ndm)); > + char ip_str[INET6_ADDRSTRLEN]; > + char mac_str[ETH_ADDRSTRLEN]; > + const uint8_t *lladdr =3D NULL; > + const void *dst =3D NULL; > + size_t lladdr_len =3D 0; > + uint8_t mac[ETH_ALEN]; > + union inany_addr addr; > + size_t dstlen =3D 0; > + > + if (nh->nlmsg_type =3D=3D NLMSG_DONE) > + return; > + > + if (nh->nlmsg_type =3D=3D NLMSG_ERROR) { > + warn("NLMSG type error"); This message could be a bit more helpful, e.g. saying that it's on the neighbour watching socket. There's also an errno in the message (see nl_status()). > + return; > + } > + > + if (nh->nlmsg_type !=3D RTM_NEWNEIGH && > + nh->nlmsg_type !=3D RTM_DELNEIGH) > + return; > + > + for (; RTA_OK(rta, na); rta =3D RTA_NEXT(rta, na)) { > + switch (rta->rta_type) { > + case NDA_DST: > + dst =3D RTA_DATA(rta); > + dstlen =3D RTA_PAYLOAD(rta); > + break; > + case NDA_LLADDR: > + lladdr =3D RTA_DATA(rta); > + lladdr_len =3D RTA_PAYLOAD(rta); > + break; > + default: > + break; > + } > + } > + > + if (!dst) > + return; > + > + if (!lladdr || lladdr_len !=3D ETH_ALEN) > + return; > + > + if (ndm->ndm_type !=3D ARPHRD_ETHER) > + return; > + > + memcpy(mac, lladdr, ETH_ALEN); > + eth_ntop(mac, mac_str, sizeof(mac_str)); It would be nice to avoid the stringification if we're not printing the trace message, but that is kind of awkward to do. > + if (ndm->ndm_family =3D=3D AF_INET && > + (dstlen !=3D sizeof(struct in_addr) || > + ndm->ndm_ifindex !=3D c->ifi4)) > + return; > + > + if (ndm->ndm_family =3D=3D AF_INET6 && > + (dstlen !=3D sizeof(struct in6_addr) || > + ndm->ndm_ifindex !=3D c->ifi6)) > + return; Hmm. These are basically correct, but make me uncomfortable, because they're checking for two very different conditions. ifindex !=3D ifi is basically an expected thing; we get a message for an interface we don't care about, ignore it and carry on. The other condition - addr length doesn't match addr family means the kernel has done something really unexpected. There's not much we can do but ignore it (maybe log) and carry on, but lumping it together with the expected cases seems misleading to me. > + > + if (ndm->ndm_family =3D=3D AF_INET) > + addr =3D inany_from_v4(*(struct in_addr *) dst); > + else if (ndm->ndm_family =3D=3D AF_INET6) > + addr.a6 =3D *(struct in6_addr *) dst; > + else > + return; You can use inany_from_af(&addr, ndm->ndm_family, dst) You will need to exclude non-IP addresses beforehand, though. > + > + inet_ntop(ndm->ndm_family, dst, ip_str, sizeof(ip_str)); > + > + if (nh->nlmsg_type =3D=3D RTM_NEWNEIGH && > + ndm->ndm_state & NUD_VALID) { > + trace("neigh table update: %s / %s", ip_str, mac_str); > + } else { > + trace("neigh table delete: %s / %s", ip_str, mac_str); > + } > +} > + > +/** > + * nl_neigh_subscr_sync() - Read current contents ARP/NDP tables > + * > + */ > +static void nl_neigh_subscr_sync(const struct ctx *c, int proto, int ifi) > +{ > + struct { > + struct nlmsghdr nlh; > + struct ndmsg ndm; > + } req =3D { 0 }; > + struct nlmsghdr *nh; > + char buf[NLBUFSIZ]; > + ssize_t status; > + uint32_t seq; > + > + req.ndm.ndm_family =3D proto; > + req.ndm.ndm_ifindex =3D ifi; > + seq =3D nl_send(nl_sock_neigh, &req, RTM_GETNEIGH, NLM_F_DUMP, sizeof(r= eq)); > + nl_foreach_oftype(nh, status, nl_sock_neigh, buf, seq, RTM_NEWNEIGH) > + nl_neigh_msg_read(c, nh); Should probably check status afterwards, and log an error if necessary. > +} > + > +/** > + * nl_neigh_subscr_handler() - Non-blocking drain of pending neighbour u= pdates > + * @c: Execution context > + */ > +void nl_neigh_subscr_handler(struct ctx *c) > +{ > + struct nlmsghdr *nh; > + char buf[NLBUFSIZ]; > + ssize_t n; > + > + for (;;) { > + n =3D recv(nl_sock_neigh, buf, sizeof(buf), MSG_DONTWAIT); > + if (n < 0) { > + if (errno !=3D EAGAIN) > + warn_perror("Failed to read from netlink sock"); > + return; > + } > + nh =3D (struct nlmsghdr *)buf; > + for (; NLMSG_OK(nh, n); nh =3D NLMSG_NEXT(nh, n)) > + nl_neigh_msg_read(c, nh); > + } > +} > + > +/** > + * nl_neigh_subscr_init() - Open netlink sock and subscribe to neighbour= events > + * > + * Return: 0 on success, -1 on failure > + */ > +int nl_neigh_subscr_init(struct ctx *c) > +{ > + union epoll_ref ref =3D { > + .type =3D EPOLL_TYPE_NL_NEIGH > + }; > + struct epoll_event ev =3D { > + .events =3D EPOLLIN > + }; > + struct sockaddr_nl addr =3D { > + .nl_family =3D AF_NETLINK, > + .nl_groups =3D RTMGRP_NEIGH, > + }; > + > + if (nl_sock_neigh >=3D 0) > + return 0; > + > + nl_sock_neigh =3D socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_R= OUTE); > + if (nl_sock_neigh < 0) > + return -1; > + > + if (bind(nl_sock_neigh, (struct sockaddr *)&addr, sizeof(addr)) < 0) { > + close(nl_sock_neigh); > + nl_sock_neigh =3D -1; > + return -1; > + } > + > + ref.fd =3D nl_sock_neigh; > + ev.data.u64 =3D ref.u64; > + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, nl_sock_neigh, &ev) =3D=3D -1)= { > + close(nl_sock_neigh); > + nl_sock_neigh =3D -1; > + return -1; > + } > + > + nl_neigh_subscr_sync(c, AF_INET, c->ifi4); > + nl_neigh_subscr_sync(c, AF_INET6, c->ifi6); > + > + return 0; > +} > diff --git a/netlink.h b/netlink.h > index b51e99c..a7d3506 100644 > --- a/netlink.h > +++ b/netlink.h > @@ -17,6 +17,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src, > int s_dst, unsigned int ifi_dst, sa_family_t af); > int nl_addr_get(int s, unsigned int ifi, sa_family_t af, > void *addr, int *prefix_len, void *addr_l); > +bool nl_neigh_mac_get(int s, const union inany_addr *addr, int ifi, > + unsigned char *mac); > int nl_addr_set(int s, unsigned int ifi, sa_family_t af, > const void *addr, int prefix_len); > int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr); > @@ -28,5 +30,7 @@ int nl_link_set_mac(int s, unsigned int ifi, const void= *mac); > int nl_link_set_mtu(int s, unsigned int ifi, int mtu); > int nl_link_set_flags(int s, unsigned int ifi, > unsigned int set, unsigned int change); > +int nl_neigh_subscr_init(struct ctx *c); > +void nl_neigh_subscr_handler(struct ctx *c); > =20 > #endif /* NETLINK_H */ > diff --git a/passt.c b/passt.c > index 31fbb75..e20bbad 100644 > --- a/passt.c > +++ b/passt.c > @@ -53,6 +53,7 @@ > #include "vu_common.h" > #include "migrate.h" > #include "repair.h" > +#include "netlink.h" > =20 > #define EPOLL_EVENTS 8 > =20 > @@ -79,6 +80,7 @@ char *epoll_type_str[] =3D { > [EPOLL_TYPE_VHOST_KICK] =3D "vhost-user kick socket", > [EPOLL_TYPE_REPAIR_LISTEN] =3D "TCP_REPAIR helper listening socket", > [EPOLL_TYPE_REPAIR] =3D "TCP_REPAIR helper socket", > + [EPOLL_TYPE_NL_NEIGH] =3D "netlink neighbour subscription socket", > }; > static_assert(ARRAY_SIZE(epoll_type_str) =3D=3D EPOLL_NUM_TYPES, > "epoll_type_str[] doesn't match enum epoll_type"); > @@ -322,6 +324,9 @@ int main(int argc, char **argv) > =20 > pcap_init(&c); > =20 > + if (nl_neigh_subscr_init(&c) < 0) > + warn("Failed to subscribe to RTMGRP_NEIGH"); I'd be inclined to put the error message inside nl_neigh_subscr_init(), where it can be a bit more detailed as to what went wrong where. > + > if (!c.foreground) { > if ((devnull_fd =3D open("/dev/null", O_RDWR | O_CLOEXEC)) < 0) > die_perror("Failed to open /dev/null"); > @@ -414,6 +419,9 @@ loop: > case EPOLL_TYPE_REPAIR: > repair_handler(&c, eventmask); > break; > + case EPOLL_TYPE_NL_NEIGH: > + nl_neigh_subscr_handler(&c); > + break; > default: > /* Can't happen */ > ASSERT(0); > --=20 > 2.50.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --65YdbpEDdAfxF2e6 Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjaCyIACgkQzQJF27ox 2GdxXg//U7D9ND6eWvBkujBv2HuH7LXNXWo0rh6awwWan013fJ/Q820AF4tXKj77 S0F7GB4IbNz9rG4KvkHy0qDe1V8ZLQViMOSpDcJRsHD4YosKJPDc7VOfXZiXU3GG RXJG27EHjEc2/tK/ovqfF6lGXAl/yES8+LQQa+SqwqVs7VvSYQNvR1ZHFLe+Be0j J2RYXL2RqOQnEwsJyp7acQSHgG+dseHon6PkMaFo6dSnmhjWph727CelSI8PZDUn wPt+K4S1yvggpAUt9Hd9ltxawIxpObicWTcRBt4Pj5dlgU/HMt0rGGUvuIFxbCUu k6JQqIuq1TsC7Hh58T6aXicdbS66Afyz01dragzx606S6sPgkDi/MTeLovbptECu EBuky3vZsSfgYtyywx/2zxqQkR8KcynhCydXDNtZ5UddMneTJFMqmIRFGV/lHPUu MT2XovluugNWtf2FanQ32nVAdyt01F5fURZNFL1x3zTyQheveiN7Ze+191wHNUfd gdKGjVJzNZpDJPBpZDhTzoVU0J5MoDu4LEp2PU2T4YXLClRge3qA9WBvueHyMnrX H6QYUi2qXfxhSWTP92qi4Q+TkRuMRtIIX8sIbWsQ1R3K8Du80ZCTTNLFV/v6VhTA xURlHgnwBEcAEQvHZIAzp+mXq/4Y+OKFz2oylQtfKpxyZtQShs0= =eE+1 -----END PGP SIGNATURE----- --65YdbpEDdAfxF2e6--