From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=CJuwZafL; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id C69BD5A0276 for ; Wed, 24 Sep 2025 05:34:29 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1758684866; bh=RgZvdYeE3+ok2E8R87SfHEIrh+BG2g+XCvh1WDEXEkk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=CJuwZafLJsYnWdcL8V1Thy0A9IOh4W0OEVQzzAUzHeEKI60hLG2BxtcMtNXHKYXan nEB7PxpNkNpSye0zHv/+YhqGDeLXmG3XUvp/ewDJpUnzMvykcpD1SoGxvVhBpYbiG5 smaXhUd/qHt0BN4+H7OxF+5GpTHp0hL2LPrDYTJhdjQCPYBVcZB0gd8kvOLTdixRKh KzQkPxKMOkfa1+PMsfhbTmaQeFgK3rzfeeUixKqoVjRsrIYqQaIsttN2PIYs8Kjr01 oPxaird6ZUsZKv0M/3YWtthjPkKBhXJFf6RFw3+KxRn2QDKAyA133TIr5hhGPdKtdS UqTBGeSSAFqpA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cWjBp50R2z4wCK; Wed, 24 Sep 2025 13:34:26 +1000 (AEST) Date: Wed, 24 Sep 2025 12:47:22 +1000 From: David Gibson To: Jon Maloy Subject: Re: [PATCH v9 1/9] netlink: add subsciption on changes in NDP/ARP table Message-ID: References: <20250924011330.1168921-1-jmaloy@redhat.com> <20250924011330.1168921-2-jmaloy@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="WILFw4H5AmUcHS4K" Content-Disposition: inline In-Reply-To: <20250924011330.1168921-2-jmaloy@redhat.com> Message-ID-Hash: CQYZPIRGMKOD3DGDSQDY37SHIPTNOMVQ X-Message-ID-Hash: CQYZPIRGMKOD3DGDSQDY37SHIPTNOMVQ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --WILFw4H5AmUcHS4K Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Sep 23, 2025 at 09:13:22PM -0400, Jon Maloy wrote: > The solution to bug https://bugs.passt.top/show_bug.cgi?id=3D120 > requires the ability to translate from an IP address to its > corresponding MAC address in cases where those are present in > the ARP or NDP tables. >=20 > To keep track of the contents of these tables we add a netlink > based neighbour subscription feature. >=20 > Signed-off-by: Jon Maloy > Reviewed-by: David Gibson The change to a subscription model is large enough that I think it warrants dropping pre-existing reviews. > --- > v3: - Added an attribute contianing NDA_DST to sent message, so > that we let the kernel do the filtering of the IP address > and return only one entry. > - Added interface index to the call signature. Since the only > interface we know is the template interface, this limits > the number of hosts that will be seen as 'network segment > local' from a PASST viewpoint. > v4: - Made loop independent of attribute order. > - Ignoring L2 addresses which are not of size ETH_ALEN. > v5: - Changed return value of new function, so caller can know if > a MAC address really was found. > v6: - Removed warning printout which had ended up in the wrong > commit. > v8: - Changed to neighbour event subscription model >=20 > netlink: arp/ndp table subscription > --- > epoll_type.h | 2 + > netlink.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++ > netlink.h | 4 ++ > passt.c | 8 ++++ > 4 files changed, 128 insertions(+) >=20 > diff --git a/epoll_type.h b/epoll_type.h > index 12ac59b..a90ffb6 100644 > --- a/epoll_type.h > +++ b/epoll_type.h > @@ -44,6 +44,8 @@ enum epoll_type { > EPOLL_TYPE_REPAIR_LISTEN, > /* TCP_REPAIR helper socket */ > EPOLL_TYPE_REPAIR, > + /* Netlink neighbour subscription socket */ > + EPOLL_TYPE_NL_NEIGH, > =20 > EPOLL_NUM_TYPES, > }; > diff --git a/netlink.c b/netlink.c > index c436780..1faf3da 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -53,6 +53,7 @@ > int nl_sock =3D -1; > int nl_sock_ns =3D -1; > static int nl_seq =3D 1; > +static int nl_sock_neigh =3D -1; > =20 > /** > * nl_sock_init_do() - Set up netlink sockets in init or target namespace > @@ -84,6 +85,119 @@ static int nl_sock_init_do(void *arg) > return 0; > } > =20 > +/** > + * nl_neigh_subscr_init() - Open a NETLINK_ROUTE socket and subscribe to= neighbor events > + * > + * Return: 0 on success, -1 on failure > + */ > +int nl_neigh_subscr_init(struct ctx *c) > +{ > + struct epoll_event ev =3D { 0 }; > + union epoll_ref ref =3D { .type =3D EPOLL_TYPE_NL_NEIGH, .fd =3D 0 }; You overwrite the .fd field, so there's not need to include it in the initializer. (Also 0 is a bad placeholder, since 0 is a valid fd). > + > + struct sockaddr_nl addr =3D { > + .nl_family =3D AF_NETLINK, > + .nl_groups =3D RTMGRP_NEIGH, > + }; > + > + if (nl_sock_neigh >=3D 0) > + return 0; > + > + nl_sock_neigh =3D socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_R= OUTE); > + if (nl_sock_neigh < 0) > + return -1; > + > + if (bind(nl_sock_neigh, (struct sockaddr *)&addr, sizeof(addr)) < 0) { > + close(nl_sock_neigh); > + nl_sock_neigh =3D -1; > + return -1; > + } > + > + ref.fd =3D nl_sock_neigh; > + ev.events =3D EPOLLIN; You could set this in the initializer. > + ev.data.u64 =3D ref.u64; > + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, nl_sock_neigh, &ev) =3D=3D -1)= { > + close(nl_sock_neigh); > + nl_sock_neigh =3D -1; > + return -1; > + } > + > + return 0; > +} > + > +/** > + * nl_neigh_subscr_handler() - Non-blocking drain of pending neighbor up= dates > + * @c: Execution context > + */ > +void nl_neigh_subscr_handler(struct ctx *c) > +{ > + struct nlmsghdr *nh; > + char buf[NLBUFSIZ]; > + ssize_t n; > + > + if (nl_sock_neigh < 0) > + return; > + > + for (;;) { > + n =3D recv(nl_sock_neigh, buf, sizeof(buf), MSG_DONTWAIT); > + if (n <=3D 0) > + return; EAGAIN correctly results in a silent return, but you probably want to report other errors and EOF (n =3D=3D 0) since those are not expected. > + > + nh =3D (struct nlmsghdr *)buf; > + for (; NLMSG_OK(nh, n); nh =3D NLMSG_NEXT(nh, n)) { > + struct ndmsg *ndm =3D NLMSG_DATA(nh); > + struct rtattr *rta =3D (struct rtattr *)(ndm + 1); > + size_t na =3D NLMSG_PAYLOAD(nh, sizeof(*ndm)); > + const uint8_t *lladdr =3D NULL; > + const void *dst =3D NULL; > + size_t lladdr_len =3D 0; > + size_t dstlen =3D 0; > + > + if (nh->nlmsg_type =3D=3D NLMSG_DONE || > + nh->nlmsg_type =3D=3D NLMSG_ERROR) > + continue; IIUC, NLMSG_ERROR would also be unexpected, so should probably be reported. > + if (nh->nlmsg_type !=3D RTM_NEWNEIGH && > + nh->nlmsg_type !=3D RTM_DELNEIGH) > + continue; Should we filter on ndm->ndm_ifindex for neighbours on the template interface only? I'm not sure. It seems like we probably should filter on ndm->ndm_state. > + for (; RTA_OK(rta, na); rta =3D RTA_NEXT(rta, na)) { > + switch (rta->rta_type) { > + case NDA_DST: > + dst =3D RTA_DATA(rta); > + dstlen =3D RTA_PAYLOAD(rta); > + break; > + case NDA_LLADDR: > + lladdr =3D RTA_DATA(rta); > + lladdr_len =3D RTA_PAYLOAD(rta); > + break; > + default: > + break; > + } > + } It seems like somewhere there ought to be something specifying the address type of both DST and LLADDR, but I'm not sure exactly where. We should check those, just in case some weirdo runs us on a host with IPX over Token Ring. > + if (!dst) > + continue; > + > + if (dstlen !=3D sizeof(struct in_addr) && > + dstlen !=3D sizeof(struct in6_addr)) > + continue; > + > + char abuf[INET6_ADDRSTRLEN]; > + > + if (dstlen =3D=3D sizeof(struct in_addr)) > + inet_ntop(AF_INET, dst, abuf, sizeof(abuf)); > + else > + inet_ntop(AF_INET6, dst, abuf, sizeof(abuf)); > + > + if (nh->nlmsg_type =3D=3D RTM_NEWNEIGH) > + debug("neigh: NEW %s lladdr_len=3D%zu", abuf, lladdr_len); > + else > + debug("neigh: DEL %s", abuf); > + } > + } > +} > + > /** > * nl_sock_init() - Call nl_sock_init_do(), won't return on failure > * @c: Execution context > diff --git a/netlink.h b/netlink.h > index b51e99c..a7d3506 100644 > --- a/netlink.h > +++ b/netlink.h > @@ -17,6 +17,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src, > int s_dst, unsigned int ifi_dst, sa_family_t af); > int nl_addr_get(int s, unsigned int ifi, sa_family_t af, > void *addr, int *prefix_len, void *addr_l); > +bool nl_neigh_mac_get(int s, const union inany_addr *addr, int ifi, > + unsigned char *mac); > int nl_addr_set(int s, unsigned int ifi, sa_family_t af, > const void *addr, int prefix_len); > int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr); > @@ -28,5 +30,7 @@ int nl_link_set_mac(int s, unsigned int ifi, const void= *mac); > int nl_link_set_mtu(int s, unsigned int ifi, int mtu); > int nl_link_set_flags(int s, unsigned int ifi, > unsigned int set, unsigned int change); > +int nl_neigh_subscr_init(struct ctx *c); > +void nl_neigh_subscr_handler(struct ctx *c); > =20 > #endif /* NETLINK_H */ > diff --git a/passt.c b/passt.c > index 31fbb75..e20bbad 100644 > --- a/passt.c > +++ b/passt.c > @@ -53,6 +53,7 @@ > #include "vu_common.h" > #include "migrate.h" > #include "repair.h" > +#include "netlink.h" > =20 > #define EPOLL_EVENTS 8 > =20 > @@ -79,6 +80,7 @@ char *epoll_type_str[] =3D { > [EPOLL_TYPE_VHOST_KICK] =3D "vhost-user kick socket", > [EPOLL_TYPE_REPAIR_LISTEN] =3D "TCP_REPAIR helper listening socket", > [EPOLL_TYPE_REPAIR] =3D "TCP_REPAIR helper socket", > + [EPOLL_TYPE_NL_NEIGH] =3D "netlink neighbour subscription socket", > }; > static_assert(ARRAY_SIZE(epoll_type_str) =3D=3D EPOLL_NUM_TYPES, > "epoll_type_str[] doesn't match enum epoll_type"); > @@ -322,6 +324,9 @@ int main(int argc, char **argv) > =20 > pcap_init(&c); > =20 > + if (nl_neigh_subscr_init(&c) < 0) > + warn("Failed to subscribe to RTMGRP_NEIGH"); > + > if (!c.foreground) { > if ((devnull_fd =3D open("/dev/null", O_RDWR | O_CLOEXEC)) < 0) > die_perror("Failed to open /dev/null"); > @@ -414,6 +419,9 @@ loop: > case EPOLL_TYPE_REPAIR: > repair_handler(&c, eventmask); > break; > + case EPOLL_TYPE_NL_NEIGH: > + nl_neigh_subscr_handler(&c); > + break; > default: > /* Can't happen */ > ASSERT(0); > --=20 > 2.50.1 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --WILFw4H5AmUcHS4K Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjTW6kACgkQzQJF27ox 2Gf2Ng//ZugQUH7bFkJ7fkfgytlSLXzwnAzRHyRterkqg4D0loOxOeVjo/YcYeqe t4aVBp2x5d5IyK+San44KuIGZenJ1poJYmNdqzZ2X8CRzMZB4DHpp2zHL0vWzS+V 1yZzgBRXjDDAPjhJYdKyhPZ/o9UpqJlBtlsHFBVelZ6jJFkmgqjHExFbB8lKxh27 OS7QPQkMCJg7PF7flESzbsyrY2UoJhKdhIpEjYyUWrWbJaaoS+QqgLl7mSG90+HR ZqwRw5FfVk9A7VwqW2+2fyPxi6UsMWNXuEehKrYAEPoV/arv2GY/qDdaHjtKG9Pg 8XKm01tg+Y6w+s7lpdByq3xndnuBb7ccSD1NMQ/Tys6vSfDNiHVgJzZH0jTxye6a LSr4Bh9f7QARtu7qY+t7dLzJEwkOBF4qIAvAhBx1CdNjRIdNMs55mbLlrNuNJb0Y OuQCiHcErg5dQ4kirRhNkOCuHFgE95WZeIiJtSXbKVfwQBs3AnW0ZGDZ9DphNrtI n/nJ/fXoofyeQn66NAk0Cz+yTc17OAXC8prdrjQ9yxy+INGjGpfmcToVBCY63tHN lurG5c5+owas/REAb4sK1bhhelvyGYotbfFMRimmgDWRaK2+w6rZWv8OFRkDxh5b RheV+KhyNdO07laLeavEI//2Tnfa2WeHuzUwLGRgPwJ29ngyxGQ= =31Yg -----END PGP SIGNATURE----- --WILFw4H5AmUcHS4K--