From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=MBl4q9/V; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id A3D265A0271 for ; Wed, 24 Sep 2025 20:40:43 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758739242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3LYbGV+wpUypgwGznzjOqPITvYWhtNzytW8GgMt1fZ0=; b=MBl4q9/VRKpk13Xw+p0NX5jPQjMz1EDZDV504wTUbRQlLBnrpvq3xPGBq1yYGN6gv5Rsih V6i0zoPypcwKFdpeCPwxj76AsRTwHAarKnjC75W/1NBeTuYhtzWFiqS48dZcOOxRrfFeMa LdBn+tdGi9TK2tpWwhhfz2crlXJZ7oM= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-655-ZhH5so5mPiy_xoqgcOsUtg-1; Wed, 24 Sep 2025 14:40:36 -0400 X-MC-Unique: ZhH5so5mPiy_xoqgcOsUtg-1 X-Mimecast-MFC-AGG-ID: ZhH5so5mPiy_xoqgcOsUtg_1758739236 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-82968fe9e8cso60696885a.0 for ; Wed, 24 Sep 2025 11:40:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758739236; x=1759344036; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3LYbGV+wpUypgwGznzjOqPITvYWhtNzytW8GgMt1fZ0=; b=QtqudLVh3OmzpwaVyKg8TDFSqFQgd36o3runUTzBJsNt3n9BsU4H2AN4mq1hZ2NYJ8 EnzqYPjtNSKMq0OOk7ui9gX2IV/WktT8q7mVn9I6KCCdcKZv/TRsGyBRqNawpsnbVObB gcyXgMwshiQyPnABIQvJ/y1ZwP+zV2tERR+qjWClPuld9tCA8xDJR7uEaGjyjeRqbiG4 uGLiHo1LYDuDG1SatngLD7mBuWzbPP9rpLnlmEsm3sJKLmbyURCFRPfzbqTgaIkCaBZv VyO2COCgyZqoBs6AUTTFh7E3zu7yXk9R4xLFUwfyB1PanZhCsCAexx2gTDck+VwvFVgB BjKA== X-Forwarded-Encrypted: i=1; AJvYcCUF2Y19QbzTRkm5csFnStq9y1QUuJ87Ovj86Hpx0wiWTjE/fdU8hjbe0J5sdSLk5hspUhXJ0AW0SOo=@passt.top X-Gm-Message-State: AOJu0Yzh7K+IgIStb692XwEnF4/JpjU6Kw/zVT6tf3NEDOvdDM7L0J1E 9VWZpSCbsTOJ3AiApX39noz33PzkPa34hPy52GnNNnqYzO3rHCQShu7Rdao9dHeaWCtXteFTucY PPoHcsfC4atFHImH+LrX1ri0mX+XkH+c3BqmPSSAcBg+BjyeRcaCDuA== X-Gm-Gg: ASbGncvA9sPWrFM/ZZe1hqbJ83QkzdVv8dLYJ7NMVKZNEAsXSSSdmk5KSO0RCDElbyj WS8WkDg6DsBQPVK8zqt8ZjBIvt4lMsUXg5Z5PyoBdWb1bnX644NUgYKLFP8pmnIaDoI0RXfZXuc 5+4M/u4A81vC1EJcGFnXKDWhcGleeaUAmFuo4WWEw0Zeny72AgNLwnE6bCoyhC6baN/wrmZ85Nc t6LkmbZvvdaaffxpHq8p5KAoWa5ngWwFTGJUJitriLuCiS6+iSgzrZXgMiWvtlmG24eJ66SsxkZ XScL/oAdfZ5QLso/99GP0cFiCKoFYgtdnrMEOiVi0k2sN+VIuTWRGjKqZBUH0RHVUjCWrmPqzNU wPy6BrO1WuQ== X-Received: by 2002:a05:620a:172a:b0:84a:5f04:4c5d with SMTP id af79cd13be357-85ae071160dmr100023185a.29.1758739235427; Wed, 24 Sep 2025 11:40:35 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFkGLjYRhkp+NM9vAAmtrrBhNTsx6I/pz4YJr41v5MiOpBF1x1VSnDDPYpZFi79sKjUoerzJA== X-Received: by 2002:a05:620a:172a:b0:84a:5f04:4c5d with SMTP id af79cd13be357-85ae071160dmr100016185a.29.1758739234668; Wed, 24 Sep 2025 11:40:34 -0700 (PDT) Received: from ?IPV6:2001:4958:2206:8901:6025:1483:4146:72dd? ([2001:4958:2206:8901:6025:1483:4146:72dd]) by smtp.gmail.com with ESMTPSA id af79cd13be357-84bc62df395sm537432585a.40.2025.09.24.11.40.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 24 Sep 2025 11:40:34 -0700 (PDT) Message-ID: <87aafa5b-bc1c-4fe2-808d-3a7514ab11ce@redhat.com> Date: Wed, 24 Sep 2025 14:40:33 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v9 1/9] netlink: add subsciption on changes in NDP/ARP table To: David Gibson References: <20250924011330.1168921-1-jmaloy@redhat.com> <20250924011330.1168921-2-jmaloy@redhat.com> From: Jon Maloy In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: qfktIedyyrhKoImDWfN_IYGEooC3DTrEGomtGydy83w_1758739236 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Message-ID-Hash: Q6UWBPERTW7YT74EIKZYJSO5HVESRN62 X-Message-ID-Hash: Q6UWBPERTW7YT74EIKZYJSO5HVESRN62 X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 2025-09-23 22:47, David Gibson wrote: > On Tue, Sep 23, 2025 at 09:13:22PM -0400, Jon Maloy wrote: >> The solution to bug https://bugs.passt.top/show_bug.cgi?id=120 >> requires the ability to translate from an IP address to its >> corresponding MAC address in cases where those are present in >> the ARP or NDP tables. >> >> To keep track of the contents of these tables we add a netlink >> based neighbour subscription feature. >> >> Signed-off-by: Jon Maloy >> Reviewed-by: David Gibson > > The change to a subscription model is large enough that I think it > warrants dropping pre-existing reviews. > >> --- >> v3: - Added an attribute contianing NDA_DST to sent message, so >> that we let the kernel do the filtering of the IP address >> and return only one entry. >> - Added interface index to the call signature. Since the only >> interface we know is the template interface, this limits >> the number of hosts that will be seen as 'network segment >> local' from a PASST viewpoint. >> v4: - Made loop independent of attribute order. >> - Ignoring L2 addresses which are not of size ETH_ALEN. >> v5: - Changed return value of new function, so caller can know if >> a MAC address really was found. >> v6: - Removed warning printout which had ended up in the wrong >> commit. >> v8: - Changed to neighbour event subscription model >> >> netlink: arp/ndp table subscription >> --- >> epoll_type.h | 2 + >> netlink.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++ >> netlink.h | 4 ++ >> passt.c | 8 ++++ >> 4 files changed, 128 insertions(+) >> >> diff --git a/epoll_type.h b/epoll_type.h >> index 12ac59b..a90ffb6 100644 >> --- a/epoll_type.h >> +++ b/epoll_type.h >> @@ -44,6 +44,8 @@ enum epoll_type { >> EPOLL_TYPE_REPAIR_LISTEN, >> /* TCP_REPAIR helper socket */ >> EPOLL_TYPE_REPAIR, >> + /* Netlink neighbour subscription socket */ >> + EPOLL_TYPE_NL_NEIGH, >> >> EPOLL_NUM_TYPES, >> }; >> diff --git a/netlink.c b/netlink.c >> index c436780..1faf3da 100644 >> --- a/netlink.c >> +++ b/netlink.c >> @@ -53,6 +53,7 @@ >> int nl_sock = -1; >> int nl_sock_ns = -1; >> static int nl_seq = 1; >> +static int nl_sock_neigh = -1; >> >> /** >> * nl_sock_init_do() - Set up netlink sockets in init or target namespace >> @@ -84,6 +85,119 @@ static int nl_sock_init_do(void *arg) >> return 0; >> } >> >> +/** >> + * nl_neigh_subscr_init() - Open a NETLINK_ROUTE socket and subscribe to neighbor events >> + * >> + * Return: 0 on success, -1 on failure >> + */ >> +int nl_neigh_subscr_init(struct ctx *c) >> +{ >> + struct epoll_event ev = { 0 }; >> + union epoll_ref ref = { .type = EPOLL_TYPE_NL_NEIGH, .fd = 0 }; > > You overwrite the .fd field, so there's not need to include it in the > initializer. (Also 0 is a bad placeholder, since 0 is a valid fd). ok > >> + >> + struct sockaddr_nl addr = { >> + .nl_family = AF_NETLINK, >> + .nl_groups = RTMGRP_NEIGH, >> + }; >> + >> + if (nl_sock_neigh >= 0) >> + return 0; >> + >> + nl_sock_neigh = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE); >> + if (nl_sock_neigh < 0) >> + return -1; >> + >> + if (bind(nl_sock_neigh, (struct sockaddr *)&addr, sizeof(addr)) < 0) { >> + close(nl_sock_neigh); >> + nl_sock_neigh = -1; >> + return -1; >> + } >> + >> + ref.fd = nl_sock_neigh; >> + ev.events = EPOLLIN; > > You could set this in the initializer. ok > >> + ev.data.u64 = ref.u64; >> + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, nl_sock_neigh, &ev) == -1) { >> + close(nl_sock_neigh); >> + nl_sock_neigh = -1; >> + return -1; >> + } >> + >> + return 0; >> +} >> + >> +/** >> + * nl_neigh_subscr_handler() - Non-blocking drain of pending neighbor updates >> + * @c: Execution context >> + */ >> +void nl_neigh_subscr_handler(struct ctx *c) >> +{ >> + struct nlmsghdr *nh; >> + char buf[NLBUFSIZ]; >> + ssize_t n; >> + >> + if (nl_sock_neigh < 0) >> + return; >> + >> + for (;;) { >> + n = recv(nl_sock_neigh, buf, sizeof(buf), MSG_DONTWAIT); >> + if (n <= 0) >> + return; > > EAGAIN correctly results in a silent return, but you probably want to > report other errors and EOF (n == 0) since those are not expected. Ok. But EOF is never returned from netlink. I can issue a warn_perror() for all other negative return values than EAGAIN. > >> + >> + nh = (struct nlmsghdr *)buf; >> + for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) { >> + struct ndmsg *ndm = NLMSG_DATA(nh); >> + struct rtattr *rta = (struct rtattr *)(ndm + 1); >> + size_t na = NLMSG_PAYLOAD(nh, sizeof(*ndm)); >> + const uint8_t *lladdr = NULL; >> + const void *dst = NULL; >> + size_t lladdr_len = 0; >> + size_t dstlen = 0; >> + >> + if (nh->nlmsg_type == NLMSG_DONE || >> + nh->nlmsg_type == NLMSG_ERROR) >> + continue; > > IIUC, NLMSG_ERROR would also be unexpected, so should probably be > reported. > >> + if (nh->nlmsg_type != RTM_NEWNEIGH && >> + nh->nlmsg_type != RTM_DELNEIGH) >> + continue; > > Should we filter on ndm->ndm_ifindex for neighbours on the template > interface only? I'm not sure. Yes. I was wondering about this, because most events are state changes we aren't really interested in. My research indicates I can ignore all states except NUD_VALID (==up) and (INCOMPLETE | FAILED) (== down). This cannot be set in the subscription itself unless I want to involve eBPF (I don't), so the right place is to filter it here. That will save at least some cycles. > > It seems like we probably should filter on . > >> + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) { >> + switch (rta->rta_type) { >> + case NDA_DST: >> + dst = RTA_DATA(rta); >> + dstlen = RTA_PAYLOAD(rta); >> + break; >> + case NDA_LLADDR: >> + lladdr = RTA_DATA(rta); >> + lladdr_len = RTA_PAYLOAD(rta); >> + break; >> + default: >> + break; >> + } >> + } > > It seems like somewhere there ought to be something specifying the > address type of both DST and LLADDR, but I'm not sure exactly where. > We should check those, just in case some weirdo runs us on a host with > IPX over Token Ring. There is (ndm_family == AF_INET/6) for DST, and (lladdr_len == ETH_ALEN). The latter sounds like a good idea, while the fist sounds slightly paranoid. OK, we do have AF_TIPC when giving it a second thought ;-) ///jon > >> + if (!dst) >> + continue; >> + >> + if (dstlen != sizeof(struct in_addr) && >> + dstlen != sizeof(struct in6_addr)) >> + continue; >> + >> + char abuf[INET6_ADDRSTRLEN]; >> + >> + if (dstlen == sizeof(struct in_addr)) >> + inet_ntop(AF_INET, dst, abuf, sizeof(abuf)); >> + else >> + inet_ntop(AF_INET6, dst, abuf, sizeof(abuf)); >> + >> + if (nh->nlmsg_type == RTM_NEWNEIGH) >> + debug("neigh: NEW %s lladdr_len=%zu", abuf, lladdr_len); >> + else >> + debug("neigh: DEL %s", abuf); >> + } >> + } >> +} >> + >> /** >> * nl_sock_init() - Call nl_sock_init_do(), won't return on failure >> * @c: Execution context >> diff --git a/netlink.h b/netlink.h >> index b51e99c..a7d3506 100644 >> --- a/netlink.h >> +++ b/netlink.h >> @@ -17,6 +17,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src, >> int s_dst, unsigned int ifi_dst, sa_family_t af); >> int nl_addr_get(int s, unsigned int ifi, sa_family_t af, >> void *addr, int *prefix_len, void *addr_l); >> +bool nl_neigh_mac_get(int s, const union inany_addr *addr, int ifi, >> + unsigned char *mac); >> int nl_addr_set(int s, unsigned int ifi, sa_family_t af, >> const void *addr, int prefix_len); >> int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr); >> @@ -28,5 +30,7 @@ int nl_link_set_mac(int s, unsigned int ifi, const void *mac); >> int nl_link_set_mtu(int s, unsigned int ifi, int mtu); >> int nl_link_set_flags(int s, unsigned int ifi, >> unsigned int set, unsigned int change); >> +int nl_neigh_subscr_init(struct ctx *c); >> +void nl_neigh_subscr_handler(struct ctx *c); >> >> #endif /* NETLINK_H */ >> diff --git a/passt.c b/passt.c >> index 31fbb75..e20bbad 100644 >> --- a/passt.c >> +++ b/passt.c >> @@ -53,6 +53,7 @@ >> #include "vu_common.h" >> #include "migrate.h" >> #include "repair.h" >> +#include "netlink.h" >> >> #define EPOLL_EVENTS 8 >> >> @@ -79,6 +80,7 @@ char *epoll_type_str[] = { >> [EPOLL_TYPE_VHOST_KICK] = "vhost-user kick socket", >> [EPOLL_TYPE_REPAIR_LISTEN] = "TCP_REPAIR helper listening socket", >> [EPOLL_TYPE_REPAIR] = "TCP_REPAIR helper socket", >> + [EPOLL_TYPE_NL_NEIGH] = "netlink neighbour subscription socket", >> }; >> static_assert(ARRAY_SIZE(epoll_type_str) == EPOLL_NUM_TYPES, >> "epoll_type_str[] doesn't match enum epoll_type"); >> @@ -322,6 +324,9 @@ int main(int argc, char **argv) >> >> pcap_init(&c); >> >> + if (nl_neigh_subscr_init(&c) < 0) >> + warn("Failed to subscribe to RTMGRP_NEIGH"); >> + >> if (!c.foreground) { >> if ((devnull_fd = open("/dev/null", O_RDWR | O_CLOEXEC)) < 0) >> die_perror("Failed to open /dev/null"); >> @@ -414,6 +419,9 @@ loop: >> case EPOLL_TYPE_REPAIR: >> repair_handler(&c, eventmask); >> break; >> + case EPOLL_TYPE_NL_NEIGH: >> + nl_neigh_subscr_handler(&c); >> + break; >> default: >> /* Can't happen */ >> ASSERT(0); >> -- >> 2.50.1 >> >