From: Jon Maloy <jmaloy@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [RFC 07/12] netlink: Subscribe to link/address changes in namespace
Date: Mon, 15 Dec 2025 18:25:53 -0500 [thread overview]
Message-ID: <6acb974d-c0dc-4bc3-82e0-360099a67a99@redhat.com> (raw)
In-Reply-To: <aT_jqXFr_KWHW1WZ@zatzit>
On 2025-12-15 05:32, David Gibson wrote:
> On Sun, Dec 14, 2025 at 08:54:36PM -0500, Jon Maloy wrote:
>> We add subscriptions to RTMGRP_LINK, RTMGRP_IPV4_IFADDR, and
>> RTMGRP_IPV6_IFADDR, so that we can receive notifications when link
>> state or addresses change on the namespace interface.
>>
>> When addresses are discovered via netlink:
>>
>> - We mark them as non-permanent, which means they can be modified or
>> deleted by subsequent events.
>> - We apply the prefix indicated in the notification.
>> - Update addr_seen to track the new address as the active one.
>
> addr_seen isn't really about an "active" address. The expectation was
> that the guest would only use a single address, it just might not be
> the one we told it to.
>
> Now that we're aiming to allow multiple concurrent addresses, we can
> expect the guest to use all of them actively.
Right. This makes the case for having guest/tap side subscriptions,
since we now will now know all addresses he is using, so addr_seen
should become obsolete.
>
>> This provides the foundation for dynamic address monitoring,
>> and supports runtime network changes.
>>
>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
>> ---
>> epoll_type.h | 2 +
>> netlink.c | 370 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> netlink.h | 3 +
>> passt.c | 5 +
>> passt.h | 1 +
>> tap.c | 6 +-
>> 6 files changed, 384 insertions(+), 3 deletions(-)
>>
>> diff --git a/epoll_type.h b/epoll_type.h
>> index a90ffb6..0a16d94 100644
>> --- a/epoll_type.h
>> +++ b/epoll_type.h
>> @@ -46,6 +46,8 @@ enum epoll_type {
>> EPOLL_TYPE_REPAIR,
>> /* Netlink neighbour subscription socket */
>> EPOLL_TYPE_NL_NEIGH,
>> + /* Netlink link/address subscription socket */
>> + EPOLL_TYPE_NL_LINKADDR,
>>
>> EPOLL_NUM_TYPES,
>> };
>> diff --git a/netlink.c b/netlink.c
>> index 82a2f0c..7492f17 100644
>> --- a/netlink.c
>> +++ b/netlink.c
>> @@ -35,6 +35,9 @@
>> #include "passt.h"
>> #include "log.h"
>> #include "ip.h"
>> +#include "tap.h"
>> +#include "arp.h"
>> +#include "ndp.h"
>> #include "netlink.h"
>> #include "epoll_ctl.h"
>>
>> @@ -59,6 +62,7 @@
>> int nl_sock = -1;
>> int nl_sock_ns = -1;
>> static int nl_sock_neigh = -1;
>> +static int nl_sock_linkaddr = -1;
>> static int nl_seq = 1;
>>
>> /**
>> @@ -91,6 +95,372 @@ static int nl_sock_init_do(void *arg)
>> return 0;
>> }
>>
>> +/**
>> + * nl_addr4_find() - Find an IPv4 address in the address array
>> + * @c: Execution context
>> + * @addr: Address to find
>> + *
>> + * Return: index if found, -1 otherwise
>> + */
>> +static int nl_addr4_find(const struct ctx *c, const struct in_addr *addr)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < c->ip4.addr_count; i++)
>> + if (IN4_ARE_ADDR_EQUAL(&c->ip4.addrs[i].addr, addr))
>> + return (int)i;
>> +
>> + return -1;
>> +}
>> +
>> +/**
>> + * nl_addr6_find() - Find an IPv6 address in the address array
>> + * @c: Execution context
>> + * @addr: Address to find
>> + *
>> + * Return: index if found, -1 otherwise
>> + */
>> +static int nl_addr6_find(const struct ctx *c, const struct in6_addr *addr)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < c->ip6.addr_count; i++)
>> + if (IN6_ARE_ADDR_EQUAL(&c->ip6.addrs[i].addr, addr))
>> + return (int)i;
>> +
>> + return -1;
>> +}
>> +
>> +/**
>> + * nl_addr4_add() - Add a discovered IPv4 address to the address array
>> + * @c: Execution context
>> + * @addr: Address to add
>> + * @prefix_len: Prefix length
>> + *
>> + * Return: true if added or updated, false if array full or already permanent
>> + */
>> +static bool nl_addr4_add(struct ctx *c, const struct in_addr *addr,
>> + int prefix_len)
>> +{
>> + int idx = nl_addr4_find(c, addr);
>> +
>> + if (idx >= 0) {
>> + /* Address exists - if permanent, don't touch; else update */
>> + if (c->ip4.addrs[idx].permanent)
>> + return false;
>> + c->ip4.addrs[idx].prefix_len = prefix_len;
>> + return true;
>> + }
>> +
>> + /* New address - add if room */
>> + if (c->ip4.addr_count >= IP4_MAX_ADDRS) {
>> + debug("IPv4 address array full, ignoring discovered address");
>> + return false;
>> + }
>> +
>> + idx = c->ip4.addr_count++;
>> + c->ip4.addrs[idx].addr = *addr;
>> + c->ip4.addrs[idx].prefix_len = prefix_len;
>> + c->ip4.addrs[idx].permanent = 0;
>> + return true;
>> +}
>> +
>> +/**
>> + * nl_addr6_add() - Add a discovered IPv6 address to the address array
>> + * @c: Execution context
>> + * @addr: Address to add
>> + * @prefix_len: Prefix length
>> + *
>> + * Return: true if added or updated, false if array full or already permanent
>> + */
>> +static bool nl_addr6_add(struct ctx *c, const struct in6_addr *addr,
>> + int prefix_len)
>> +{
>> + int idx = nl_addr6_find(c, addr);
>> +
>> + if (idx >= 0) {
>> + /* Address exists - if permanent, don't touch; else update */
>> + if (c->ip6.addrs[idx].permanent)
>> + return false;
>> + c->ip6.addrs[idx].prefix_len = prefix_len;
>> + return true;
>> + }
>> +
>> + /* New address - add if room */
>> + if (c->ip6.addr_count >= IP6_MAX_ADDRS) {
>> + debug("IPv6 address array full, ignoring discovered address");
>> + return false;
>> + }
>> +
>> + idx = c->ip6.addr_count++;
>> + c->ip6.addrs[idx].addr = *addr;
>> + c->ip6.addrs[idx].prefix_len = prefix_len;
>> + c->ip6.addrs[idx].permanent = 0;
>> + return true;
>> +}
>> +
>> +/**
>> + * nl_addr4_del() - Remove an IPv4 address from the array if not permanent
>> + * @c: Execution context
>> + * @addr: Address to remove
>> + *
>> + * Return: true if removed, false if not found or permanent
>> + */
>> +static bool nl_addr4_del(struct ctx *c, const struct in_addr *addr)
>> +{
>> + int i, idx = nl_addr4_find(c, addr);
>> +
>> + if (idx < 0)
>> + return false;
>> +
>> + if (c->ip4.addrs[idx].permanent)
>> + return false;
>> +
>> + /* Shift remaining entries down */
>> + c->ip4.addr_count--;
>> + for (i = idx; i < c->ip4.addr_count; i++)
>> + c->ip4.addrs[i] = c->ip4.addrs[i + 1];
>> +
>> + return true;
>> +}
>> +
>> +/**
>> + * nl_addr6_del() - Remove an IPv6 address from the array if not permanent
>> + * @c: Execution context
>> + * @addr: Address to remove
>> + *
>> + * Return: true if removed, false if not found or permanent
>> + */
>> +static bool nl_addr6_del(struct ctx *c, const struct in6_addr *addr)
>> +{
>> + int i, idx = nl_addr6_find(c, addr);
>> +
>> + if (idx < 0)
>> + return false;
>> +
>> + if (c->ip6.addrs[idx].permanent)
>> + return false;
>> +
>> + /* Shift remaining entries down */
>> + c->ip6.addr_count--;
>> + for (i = idx; i < c->ip6.addr_count; i++)
>> + c->ip6.addrs[i] = c->ip6.addrs[i + 1];
>> +
>> + return true;
>> +}
>
> All the functions above are more to do with the data structure storing
> the addresses than they are to do with netlink. Better to move them
> into... maybe ip.c? And use them from conf.c as well.
Agreed. I'll fix that.
>
> Given the amount of near-duplication here, maybe it would be better to
> have a single table for v4 and v6 using inany_addr?
Yes.
>
>> +/**
>> + * nl_linkaddr_msg_read() - Parse and log a netlink link/addr message
>> + * @c: Execution context
>> + * @nh: Netlink message header
>> + */
>> +static void nl_linkaddr_msg_read(struct ctx *c, const struct nlmsghdr *nh)
>> +{
>> + if (nh->nlmsg_type == NLMSG_DONE || nh->nlmsg_type == NLMSG_ERROR)
>> + return;
>> +
>> + if (nh->nlmsg_type == RTM_NEWLINK || nh->nlmsg_type == RTM_DELLINK) {
>> + const struct ifinfomsg *ifm = NLMSG_DATA(nh);
>> + struct rtattr *rta = IFLA_RTA(ifm);
>> + size_t na = IFLA_PAYLOAD(nh);
>> + const char *name = "?";
>> + bool up = !!(ifm->ifi_flags & IFF_UP);
>> + bool running = !!(ifm->ifi_flags & IFF_RUNNING);
>> +
>> + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
>> + if (rta->rta_type == IFLA_IFNAME) {
>> + name = (const char *)RTA_DATA(rta);
>> + break;
>> + }
>> + }
>> +
>> + /* Update pasta interface UP state if this is our interface */
>> + if (c->mode == MODE_PASTA &&
>> + (unsigned int)ifm->ifi_index == c->pasta_ifi) {
>> + c->pasta_ifi_up = up;
>> + debug("Interface %s", up ? "UP" : "DOWN");
>
> This only makes sense if we're listening to netlink messages in the
> guest netns, but the address stuff only makes sense listening to
> messages in the host netns.
See previous response.>
>> + }
>> +
>> + if (nh->nlmsg_type == RTM_NEWLINK)
>> + debug("Link %s (idx=%d): %s %s", name, ifm->ifi_index,
>> + up ? "UP" : "DOWN", running ? "RUNNING" : "");
>> + else
>> + debug("Link %s (idx=%d): DELETED", name, ifm->ifi_index);
>> +
>> + return;
>> + }
>> +
>> + if (nh->nlmsg_type == RTM_NEWADDR || nh->nlmsg_type == RTM_DELADDR) {
>> + bool is_new = (nh->nlmsg_type == RTM_NEWADDR);
>> + const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
>> + char addr_str[INET6_ADDRSTRLEN];
>> + struct rtattr *rta = IFA_RTA(ifa);
>> + char ifname[IFNAMSIZ] = { 0 };
>> + size_t na = IFA_PAYLOAD(nh);
>> + void *addr = NULL;
>> + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
>> + if (ifa->ifa_family == AF_INET &&
>> + rta->rta_type == IFA_LOCAL) {
>> + addr = RTA_DATA(rta);
>> + break;
>> + } else if (ifa->ifa_family == AF_INET6 &&
>> + rta->rta_type == IFA_ADDRESS) {
>> + addr = RTA_DATA(rta);
>> + break;
>> + }
>> + }
>> +
>> + if (!addr)
>> + return;
>> +
>> + if_indextoname(ifa->ifa_index, ifname);
>> + inet_ntop(ifa->ifa_family, addr, addr_str, sizeof(addr_str));
>> +
>> + debug("%s addr on %s (index=%d): %s/%i%s",
>> + is_new ? "NEW" : "DEL", ifname, ifa->ifa_index, addr_str,
>> + ifa->ifa_prefixlen,
>> + tap_is_ready(c) ? " (tap UP)" : " (tap DOWN)");
>> +
>> + /* Only handle our pasta interface */
>> + if (c->mode != MODE_PASTA || ifa->ifa_index != c->pasta_ifi)
>> + return;
>
> Nope. This is a host netns event, so comparing to pasta_ifi makes no
> sense. We _should_ be comparing to ifi4 or ifi6 (depending on address
> family), and we should probably do that before we go parsing the
> details above.
>
> We should also probably check for --no-copy-addrs here, too.
>
> In the other direction, even for PASST mode we can store this address
> in our table, it just won't do anything until DHCP or whatever
> consults it.
ok
>
>> +
>> + if (ifa->ifa_family == AF_INET) {
>> + struct in_addr *a = (struct in_addr *)addr;
>> +
>> + if (!is_new) {
>> + nl_addr4_del(c, a);
>> + return;
>> + }
>> +
>> + if (nl_addr4_add(c, a, ifa->ifa_prefixlen)) {
>> + c->ip4.addr_seen = *a;
>> + if (c->pasta_ifi_up && c->ifi4) {
>> + debug("Sending ARP");
>> + arp_send_init_req(c);
>
> What does this ARP request do? AFAICT we haven't actually added the
> address in the guest netns yet, so the guest won't respond to the ARP.
The address was set by the guest, so why wouldn't he respond?
>
>> + }
>> + }
>> + } else if (ifa->ifa_family == AF_INET6) {
>> + struct in6_addr *a = (struct in6_addr *)addr;
>> +
>> + if (!is_new) {
>> + nl_addr6_del(c, a);
>> + return;
>> + }
>> +
>> + if (nl_addr6_add(c, a,
>> + ifa->ifa_prefixlen)) {
>> + c->ip6.addr_seen = *a;
>> + if (c->pasta_ifi_up &&
>> + c->ifi6 && !c->no_ndp) {
>> + debug("Sending NDP");
>> + ndp_send_init_req(c);
>
> Some question with this NDP.
>
>> + }
>> + }
>> + }
>> + }
>> +}
>> +
>> +/**
>> + * nl_linkaddr_notify_handler() - Handle events from link/addr notifier socket
>> + * @c: Execution context
>> + */
>> +void nl_linkaddr_notify_handler(struct ctx *c)
>> +{
>> + char buf[NLBUFSIZ];
>> +
>> + for (;;) {
>> + ssize_t n = recv(nl_sock_linkaddr, buf, sizeof(buf), MSG_DONTWAIT);
>> + struct nlmsghdr *nh = (struct nlmsghdr *)buf;
>> +
>> + if (n < 0) {
>> + if (errno == EINTR)
>> + continue;
>> + if (errno != EAGAIN)
>> + debug("recv() error: %s", strerror_(errno));
>> + break;
>> + }
>> +
>> + debug("Received %zd bytes", n);
>> +
>> + for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n))
>> + nl_linkaddr_msg_read(c, nh);
>> + }
>> +}
>> +
>> +/**
>> + * nl_linkaddr_init_do() - Actually create and bind the netlink socket
>> + * @arg: Execution context (for namespace entry) or NULL
>> + *
>> + * Return: 0 on success, -1 on failure
>> + */
>> +static int nl_linkaddr_init_do(void *arg)
>> +{
>> + struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
>> + .nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR |
>> + RTMGRP_IPV6_IFADDR };
>> +
>> + if (arg)
>> + ns_enter((struct ctx *)arg);
>> +
>> + nl_sock_linkaddr = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
>
> Is there a reason to use an additional socket, rather than adding more
> events to the neighbour listening socket?
None in particular. I'll look into it.
>
>> + if (nl_sock_linkaddr < 0) {
>> + debug("socket() failed: %s", strerror_(errno));
>> + return -1;
>> + }
>> +
>> + if (bind(nl_sock_linkaddr, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
>> + debug("bind() failed: %s", strerror_(errno));
>> + close(nl_sock_linkaddr);
>> + nl_sock_linkaddr = -1;
>> + return -1;
>> + }
>> +
>> + debug("socket fd=%d", nl_sock_linkaddr);
>> + return 0;
>> +}
>> +
>> +/**
>> + * nl_linkaddr_notify_init() - Initialize link/address change notifier
>> + * @c: Execution context
>> + *
>> + * Return: 0 on success, -1 on failure
>> + */
>> +int nl_linkaddr_notify_init(const struct ctx *c)
>> +{
>> + union epoll_ref ref = { .type = EPOLL_TYPE_NL_LINKADDR };
>> + struct epoll_event ev = { .events = EPOLLIN };
>> +
>> + if (nl_sock_linkaddr >= 0) {
>> + debug("notifier already initialized (fd=%d)", nl_sock_linkaddr);
>> + return 0;
>> + }
>> +
>> + /* Open the notifier socket in the namespace for pasta mode,
>> + * or in the init namespace otherwise.
>
> Definitely wrong. We're trying to watch host addresses so that we can
> copy them to the guest - therefore we always need to watch in the host
> netns. On pasta there might be reasons to *also* listen in the guest
> netns, but that would want a different handler to do different things.
Hmm. I think I'll wait for your feedback on the remaining patches before
I comment on this.
/jon
next prev parent reply other threads:[~2025-12-15 23:26 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-15 1:54 [RFC 00/12] Support for multiple address and late binding Jon Maloy
2025-12-15 1:54 ` [RFC 01/12] ip: Introduce multi-address data structures for IPv4 and IPv6 Jon Maloy
2025-12-15 9:40 ` David Gibson
2025-12-15 22:05 ` Jon Maloy
2025-12-16 1:58 ` Jon Maloy
2025-12-16 3:14 ` David Gibson
2025-12-15 9:46 ` David Gibson
2025-12-15 1:54 ` [RFC 02/12] ip: Add ip4_default_prefix_len() helper function for class-based prefix Jon Maloy
2025-12-15 9:41 ` David Gibson
2025-12-15 1:54 ` [RFC 03/12] conf: Allow multiple -a/--address options per address family Jon Maloy
2025-12-15 9:53 ` David Gibson
2025-12-15 1:54 ` [RFC 04/12] conf: Apply -n/--netmask to most recently added address Jon Maloy
2025-12-15 9:54 ` David Gibson
2025-12-15 22:43 ` Jon Maloy
2025-12-15 1:54 ` [RFC 05/12] fwd: Check all configured addresses in guest accessibility functions Jon Maloy
2025-12-15 10:06 ` David Gibson
2025-12-15 1:54 ` [RFC 06/12] arp: Check all configured addresses in ARP filtering Jon Maloy
2025-12-15 10:07 ` David Gibson
2025-12-15 1:54 ` [RFC 07/12] netlink: Subscribe to link/address changes in namespace Jon Maloy
2025-12-15 10:32 ` David Gibson
2025-12-15 23:25 ` Jon Maloy [this message]
2025-12-16 3:21 ` David Gibson
2025-12-15 1:54 ` [RFC 08/12] netlink: Subscribe to route " Jon Maloy
2025-12-15 10:38 ` David Gibson
2025-12-15 1:54 ` [RFC 09/12] netlink: Add host-side monitoring for late template interface binding Jon Maloy
2025-12-15 1:54 ` [RFC 10/12] netlink: Add host-side route monitoring and propagation Jon Maloy
2025-12-15 1:54 ` [RFC 11/12] netlink: Prevent host route events from overwriting guest-configured gateway Jon Maloy
2025-12-15 1:54 ` [RFC 12/12] netlink: Rename tap interface when late binding discovers template name Jon Maloy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6acb974d-c0dc-4bc3-82e0-360099a67a99@redhat.com \
--to=jmaloy@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).