From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=BA69pBNe; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id C6A4E5A026F for ; Tue, 30 Sep 2025 01:58:43 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759190322; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gCbLtnB1xdvKOqRbmoLv7ohRzi6aIvIoZEnstrmXz3I=; b=BA69pBNe1179WNEPf0A0CO0gyCo7gKcj88Xp8/PojSdzvUbrSq6Py2m4EcLeELP4XRLcI8 L05AChKk3JoOF5fhc92AvVAUHPDGsR7w7IO1TESxL/K7BY+4cZwT/+B1OZfdNDimUICuOP Tr+XzZmtkQtyr/AkGG8L5nQk/Yvr0dQ= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-164-FJ2x8PXjOuelbaBdrsr_tA-1; Mon, 29 Sep 2025 19:58:40 -0400 X-MC-Unique: FJ2x8PXjOuelbaBdrsr_tA-1 X-Mimecast-MFC-AGG-ID: FJ2x8PXjOuelbaBdrsr_tA_1759190319 Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-3ee1365964cso3655614f8f.2 for ; Mon, 29 Sep 2025 16:58:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759190319; x=1759795119; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=gCbLtnB1xdvKOqRbmoLv7ohRzi6aIvIoZEnstrmXz3I=; b=ckAEdQJ59Rv2Es9YQWu30LHc/80sfA+w3HRCiTNpYT0+mxWoLYu6nNzLGAck+e3c5s moa4Di0yMYWah5WO3EEuQhtwTHKyYabEmMkHWZ8MCeB7rhWkwOvUUgqXTeYUCO5TSSw4 jkEmVSqJKzs+JWnMu2BNa03b8fLtLavg1FP0wioyrAAz/cUJ3X5ImwUmIlWW0ln19yAG AwynFfiHLrfpOdSxYSQ3lrcWwrtAlJx25eHGrGnXs6xjoF93MhF6p9bobfPsj8mn6uMS dO+qBE74ogfyu4bFCjjIc2EKHtAE9oxL9bkKW1z4QQI6ADfjbO3gqmUMwa1/tKgZ9Ngd yCLw== X-Forwarded-Encrypted: i=1; AJvYcCWF9Dsw8S2UlPCmplkTr1vFHaduHxhHxj9sHUAiHBHlTuWjdc8D0PpFftUB5Nkkt20FUP73gNNuzAg=@passt.top X-Gm-Message-State: AOJu0YxX49A/9t+6zH9A5/afT1LQNaYg/WxU4NgN3Io3Ux3ruKDC7B9Q UQ47IpfdjXdZiEDOYzt+4qx7t7Lg9HXSlhg405a2YQZlUeKUHB7o9KVbkT0+fAou15O/VzcKBbZ SxodzI8M/S2RS5XP+GSMKBki39SPHGtYx1cqiGvs/UF8x8LAmVO8fgw== X-Gm-Gg: ASbGncuIjJg8IqJX8rraOVmqLAat3gKZ+LCEf1x8wn1rboMKxPyJmJ+tVfhXJq8AXxl cHy1OaN/o/3yABUcNToJVqrwS65DPhVO7ClM6mY9XpHU1iW4QiaurdTELn3LRD2UhfaNABYlXUF IO8/mhQjEEhvfpG7nVMpw3zxwqdoCB6VSycahKH1W6+x2ZM6oPjC9KFlSwLT5Zq6HO5bbVWygdV CIF6a7Hflw47gt27qTObdWri4DiBh6+Fci+bffiy23+WsdR7dKqimKVD0opI43CWiO7vtC2AM8z ObQMNnhnGcaa92eTyHaSbiufEyyL1PVZ1G4Z6Y8uPebswaE9FA3uAxJhObn71QY0/sSN X-Received: by 2002:a05:6000:3102:b0:3e0:152a:87b2 with SMTP id ffacd0b85a97d-40e429c9cdfmr17155128f8f.13.1759190318577; Mon, 29 Sep 2025 16:58:38 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE6bPRJNBWLgkrcpJu9khOpwx0BpkQKnFOO7DY/gNTy3y4j8UlEU3+Gn5OrnK5lnlIonyZFFQ== X-Received: by 2002:a05:6000:3102:b0:3e0:152a:87b2 with SMTP id ffacd0b85a97d-40e429c9cdfmr17155109f8f.13.1759190317937; Mon, 29 Sep 2025 16:58:37 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46e3bffe67asm157346775e9.5.2025.09.29.16.58.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Sep 2025 16:58:37 -0700 (PDT) Date: Tue, 30 Sep 2025 01:58:35 +0200 From: Stefano Brivio To: Jon Maloy Subject: Re: [PATCH v11 1/9] netlink: add subsciption on changes in NDP/ARP table Message-ID: <20250930015835.598d183f@elisabeth> In-Reply-To: <20250927192522.3024554-2-jmaloy@redhat.com> References: <20250927192522.3024554-1-jmaloy@redhat.com> <20250927192522.3024554-2-jmaloy@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: ozgD6vlVJslfrN-z_vBONoLFbtbLmuDOCpgAYvha9S0_1759190319 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: LC4CQQJ3MWTKTL3SRMHQPEEEOXLZ762C X-Message-ID-Hash: LC4CQQJ3MWTKTL3SRMHQPEEEOXLZ762C X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: dgibson@redhat.com, david@gibson.dropbear.id.au, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Sat, 27 Sep 2025 15:25:14 -0400 Jon Maloy wrote: > The solution to bug https://bugs.passt.top/show_bug.cgi?id=120 > requires the ability to translate from an IP address to its > corresponding MAC address in cases where those are present in > the ARP or NDP tables. > > To keep track of the contents of these tables we add a netlink > based neighbour subscription feature. > > Signed-off-by: Jon Maloy > > --- > v3: - Added an attribute contianing NDA_DST to sent message, so > that we let the kernel do the filtering of the IP address > and return only one entry. > - Added interface index to the call signature. Since the only > interface we know is the template interface, this limits > the number of hosts that will be seen as 'network segment > local' from a PASST viewpoint. > v4: - Made loop independent of attribute order. > - Ignoring L2 addresses which are not of size ETH_ALEN. > v5: - Changed return value of new function, so caller can know if > a MAC address really was found. > v6: - Removed warning printout which had ended up in the wrong > commit. > v8: - Changed to neighbour event subscription model > - netlink: arp/ndp table subscription > v10:- Updated according to David's latest comments on v8 > - Added functionaly where we initially read current > state of ARP/NDP tables > --- > epoll_type.h | 2 + > netlink.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++ > netlink.h | 4 ++ > passt.c | 8 +++ > 4 files changed, 194 insertions(+) > > diff --git a/epoll_type.h b/epoll_type.h > index 12ac59b..a90ffb6 100644 > --- a/epoll_type.h > +++ b/epoll_type.h > @@ -44,6 +44,8 @@ enum epoll_type { > EPOLL_TYPE_REPAIR_LISTEN, > /* TCP_REPAIR helper socket */ > EPOLL_TYPE_REPAIR, > + /* Netlink neighbour subscription socket */ > + EPOLL_TYPE_NL_NEIGH, > > EPOLL_NUM_TYPES, > }; > diff --git a/netlink.c b/netlink.c > index c436780..0dc4e9d 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -40,6 +41,10 @@ > #define RTNH_NEXT_AND_DEC(rtnh, attrlen) \ > ((attrlen) -= RTNH_ALIGN((rtnh)->rtnh_len), RTNH_NEXT(rtnh)) > > +/* Convenience macro borrowed from kernel */ > +#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | \ > + NUD_REACHABLE | NUD_PROBE | NUD_STALE) In general (see especially util.h) we do: #define NUD_VALID \ (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | NUD_PROBE | NUD_STALE) ...rationale: it was on one line, and now the most important part remains on the same one line. > + > /* Netlink expects a buffer of at least 8kiB or the system page size, > * whichever is larger. 32kiB is recommended for more efficient. > * Since the largest page size on any remotely common Linux setup is > @@ -53,6 +58,7 @@ > int nl_sock = -1; > int nl_sock_ns = -1; These two assignments are aligned for (debatable) eye candy. Given that nl_seq is recycled for nl_sock_neigh as well, maybe nl_sock_neigh could be moved here and you could add extra tabs to these two to align the -1 values again. > static int nl_seq = 1; > +static int nl_sock_neigh = -1; > > /** > * nl_sock_init_do() - Set up netlink sockets in init or target namespace > @@ -1103,3 +1109,177 @@ int nl_link_set_flags(int s, unsigned int ifi, > > return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req)); > } > + > +/** > + * nl_neigh_msg_read() - Interpret a neigbour state message from netlink The usual description of parameters is missing here. > + * > + */ > +static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) > +{ > + struct ndmsg *ndm = NLMSG_DATA(nh); > + struct rtattr *rta = (struct rtattr *)(ndm + 1); > + size_t na = NLMSG_PAYLOAD(nh, sizeof(*ndm)); > + char ip_str[INET6_ADDRSTRLEN]; > + char mac_str[ETH_ADDRSTRLEN]; > + const uint8_t *lladdr = NULL; > + const void *dst = NULL; > + size_t lladdr_len = 0; > + uint8_t mac[ETH_ALEN]; > + union inany_addr addr; > + size_t dstlen = 0; > + > + if (nh->nlmsg_type == NLMSG_DONE) > + return; > + > + if (nh->nlmsg_type == NLMSG_ERROR) { > + warn("NLMSG type error"); > + return; > + } > + > + if (nh->nlmsg_type != RTM_NEWNEIGH && > + nh->nlmsg_type != RTM_DELNEIGH) > + return; > + > + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) { > + switch (rta->rta_type) { > + case NDA_DST: > + dst = RTA_DATA(rta); > + dstlen = RTA_PAYLOAD(rta); > + break; > + case NDA_LLADDR: > + lladdr = RTA_DATA(rta); > + lladdr_len = RTA_PAYLOAD(rta); > + break; > + default: > + break; > + } > + } > + > + if (!dst) > + return; > + > + if (!lladdr || lladdr_len != ETH_ALEN) > + return; > + > + if (ndm->ndm_type != ARPHRD_ETHER) > + return; > + > + memcpy(mac, lladdr, ETH_ALEN); > + eth_ntop(mac, mac_str, sizeof(mac_str)); > + > + if (ndm->ndm_family == AF_INET && > + (dstlen != sizeof(struct in_addr) || > + ndm->ndm_ifindex != c->ifi4)) > + return; > + > + if (ndm->ndm_family == AF_INET6 && > + (dstlen != sizeof(struct in6_addr) || > + ndm->ndm_ifindex != c->ifi6)) > + return; > + > + if (ndm->ndm_family == AF_INET) > + addr = inany_from_v4(*(struct in_addr *) dst); > + else if (ndm->ndm_family == AF_INET6) > + addr.a6 = *(struct in6_addr *) dst; We never add spaces between cast and variable name: *(struct in6_addr *)dst; > + else > + return; > + > + inet_ntop(ndm->ndm_family, dst, ip_str, sizeof(ip_str)); > + > + if (nh->nlmsg_type == RTM_NEWNEIGH && > + ndm->ndm_state & NUD_VALID) { This fits on one line, and you can drop the curly brackets then. > + trace("neigh table update: %s / %s", ip_str, mac_str); > + } else { > + trace("neigh table delete: %s / %s", ip_str, mac_str); > + } > +} > + > +/** > + * nl_neigh_subscr_sync() - Read current contents ARP/NDP tables ...where "subscr" indicates the whole mechanism by referring to the subscription socket, that is, some kind of synecdoche, but I wonder if it's really helpful here. What about nl_neigh_sync()? > + * Usual description of arguments missing. > + */ > +static void nl_neigh_subscr_sync(const struct ctx *c, int proto, int ifi) > +{ > + struct { > + struct nlmsghdr nlh; > + struct ndmsg ndm; > + } req = { 0 }; > + struct nlmsghdr *nh; > + char buf[NLBUFSIZ]; > + ssize_t status; > + uint32_t seq; > + > + req.ndm.ndm_family = proto; > + req.ndm.ndm_ifindex = ifi; You could initialise 'req' to those right away, see other functions in netlink.c for examples (say, nl_route_get_def()). > + seq = nl_send(nl_sock_neigh, &req, RTM_GETNEIGH, NLM_F_DUMP, sizeof(req)); > + nl_foreach_oftype(nh, status, nl_sock_neigh, buf, seq, RTM_NEWNEIGH) > + nl_neigh_msg_read(c, nh); > +} > + > +/** > + * nl_neigh_subscr_handler() - Non-blocking drain of pending neighbour updates This function name brings me to a different naming suggestion. Given that you're handling notifications, what about nl_neigh_notify_handler(), and nl_neigh_notify_init()? It's a bit easier to read/write as it's not a truncated word. > + * @c: Execution context Mixed tab ^ and ^^^^^^ spaces. > + */ > +void nl_neigh_subscr_handler(struct ctx *c) Cppcheck (the guy from 'make cppcheck') says: netlink.c:1225:42: style: Parameter 'c' can be declared as pointer to const [constParameterPointer] void nl_neigh_subscr_handler(struct ctx *c) and I think it actually makes sense. > +{ > + struct nlmsghdr *nh; > + char buf[NLBUFSIZ]; > + ssize_t n; ...it also says: netlink.c:1227:19: style: The scope of the variable 'nh' can be reduced. [variableScope] struct nlmsghdr *nh; ^ netlink.c:1229:10: style: The scope of the variable 'n' can be reduced. [variableScope] ssize_t n; ^ > + > + for (;;) { > + n = recv(nl_sock_neigh, buf, sizeof(buf), MSG_DONTWAIT); > + if (n < 0) { > + if (errno != EAGAIN) > + warn_perror("Failed to read from netlink sock"); > + return; > + } Can we ever get end-of-file for whatever reason? What about returning on n <= 0 just to be on the safe side? > + nh = (struct nlmsghdr *)buf; > + for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) > + nl_neigh_msg_read(c, nh); > + } > +} > + > +/** > + * nl_neigh_subscr_init() - Open netlink sock and subscribe to neighbour events The fact we're opening the netlink socket here is kind of obvious. If you just say we subscribe to neighbour events, it fits in 80 columns. > + * * @c: Execution context > + * Return: 0 on success, -1 on failure We're always getting something nice in errno for those, I would actually suggest spending a moment right now to fetch that and return as negative value so that we can use that in the caller if ever needed. Should you do so, though, note that close() sets errno on failure. > + */ > +int nl_neigh_subscr_init(struct ctx *c) > +{ > + union epoll_ref ref = { > + .type = EPOLL_TYPE_NL_NEIGH > + }; > + struct epoll_event ev = { > + .events = EPOLLIN > + }; > + struct sockaddr_nl addr = { > + .nl_family = AF_NETLINK, > + .nl_groups = RTMGRP_NEIGH, > + }; > + > + if (nl_sock_neigh >= 0) > + return 0; > + > + nl_sock_neigh = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE); > + if (nl_sock_neigh < 0) Should we log a warn() message then? It might be relevant with LSMs (AppArmor, SELinux, etc.). > + return -1; > + > + if (bind(nl_sock_neigh, (struct sockaddr *)&addr, sizeof(addr)) < 0) { Same here. > + close(nl_sock_neigh); > + nl_sock_neigh = -1; > + return -1; > + } > + > + ref.fd = nl_sock_neigh; Static Checkers Hate This One Trick: netlink.c:1275:9: style: Redundant initialization for 'ref'. The initialized value is overwritten before it is read. [redundantInitialization] ref.fd = nl_sock_neigh; ^ netlink.c:1251:22: note: ref is initialized union epoll_ref ref = { ^ netlink.c:1275:9: note: ref is overwritten ref.fd = nl_sock_neigh; ^ > + ev.data.u64 = ref.u64; > + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, nl_sock_neigh, &ev) == -1) { > + close(nl_sock_neigh); > + nl_sock_neigh = -1; > + return -1; > + } > + > + nl_neigh_subscr_sync(c, AF_INET, c->ifi4); > + nl_neigh_subscr_sync(c, AF_INET6, c->ifi6); > + > + return 0; > +} > diff --git a/netlink.h b/netlink.h > index b51e99c..a7d3506 100644 > --- a/netlink.h > +++ b/netlink.h > @@ -17,6 +17,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src, > int s_dst, unsigned int ifi_dst, sa_family_t af); > int nl_addr_get(int s, unsigned int ifi, sa_family_t af, > void *addr, int *prefix_len, void *addr_l); > +bool nl_neigh_mac_get(int s, const union inany_addr *addr, int ifi, > + unsigned char *mac); > int nl_addr_set(int s, unsigned int ifi, sa_family_t af, > const void *addr, int prefix_len); > int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr); > @@ -28,5 +30,7 @@ int nl_link_set_mac(int s, unsigned int ifi, const void *mac); > int nl_link_set_mtu(int s, unsigned int ifi, int mtu); > int nl_link_set_flags(int s, unsigned int ifi, > unsigned int set, unsigned int change); > +int nl_neigh_subscr_init(struct ctx *c); > +void nl_neigh_subscr_handler(struct ctx *c); > > #endif /* NETLINK_H */ > diff --git a/passt.c b/passt.c > index 31fbb75..e20bbad 100644 > --- a/passt.c > +++ b/passt.c > @@ -53,6 +53,7 @@ > #include "vu_common.h" > #include "migrate.h" > #include "repair.h" > +#include "netlink.h" > > #define EPOLL_EVENTS 8 > > @@ -79,6 +80,7 @@ char *epoll_type_str[] = { > [EPOLL_TYPE_VHOST_KICK] = "vhost-user kick socket", > [EPOLL_TYPE_REPAIR_LISTEN] = "TCP_REPAIR helper listening socket", > [EPOLL_TYPE_REPAIR] = "TCP_REPAIR helper socket", > + [EPOLL_TYPE_NL_NEIGH] = "netlink neighbour subscription socket", ...or notification socket. I mean, you subscribe to events, but that doesn't describe its main function. > }; > static_assert(ARRAY_SIZE(epoll_type_str) == EPOLL_NUM_TYPES, > "epoll_type_str[] doesn't match enum epoll_type"); > @@ -322,6 +324,9 @@ int main(int argc, char **argv) > > pcap_init(&c); > > + if (nl_neigh_subscr_init(&c) < 0) > + warn("Failed to subscribe to RTMGRP_NEIGH"); > + > if (!c.foreground) { > if ((devnull_fd = open("/dev/null", O_RDWR | O_CLOEXEC)) < 0) > die_perror("Failed to open /dev/null"); > @@ -414,6 +419,9 @@ loop: > case EPOLL_TYPE_REPAIR: > repair_handler(&c, eventmask); > break; > + case EPOLL_TYPE_NL_NEIGH: > + nl_neigh_subscr_handler(&c); > + break; > default: > /* Can't happen */ > ASSERT(0); -- Stefano