From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [RFC 07/12] netlink: Subscribe to link/address changes in namespace
Date: Tue, 16 Dec 2025 14:21:24 +1100 [thread overview]
Message-ID: <aUDQNLTDEtU69r7Q@zatzit> (raw)
In-Reply-To: <6acb974d-c0dc-4bc3-82e0-360099a67a99@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 17426 bytes --]
On Mon, Dec 15, 2025 at 06:25:53PM -0500, Jon Maloy wrote:
>
>
> On 2025-12-15 05:32, David Gibson wrote:
> > On Sun, Dec 14, 2025 at 08:54:36PM -0500, Jon Maloy wrote:
> > > We add subscriptions to RTMGRP_LINK, RTMGRP_IPV4_IFADDR, and
> > > RTMGRP_IPV6_IFADDR, so that we can receive notifications when link
> > > state or addresses change on the namespace interface.
> > >
> > > When addresses are discovered via netlink:
> > >
> > > - We mark them as non-permanent, which means they can be modified or
> > > deleted by subsequent events.
> > > - We apply the prefix indicated in the notification.
> > > - Update addr_seen to track the new address as the active one.
> >
> > addr_seen isn't really about an "active" address. The expectation was
> > that the guest would only use a single address, it just might not be
> > the one we told it to.
> >
> > Now that we're aiming to allow multiple concurrent addresses, we can
> > expect the guest to use all of them actively.
>
> Right. This makes the case for having guest/tap side subscriptions, since we
> now will now know all addresses he is using, so addr_seen
> should become obsolete.
Well.. it depends what you mean by obsolete. We can store multiple
addresses, so it probably makes sense to store configured and observed
addresses in the same table (since they'll usually be the same).
Assuming we still support the guest using a non-configured address
we'll still need to be able to observe them on the wire, not just via
netlink, because that won't work for passt. Given that, it's not
clear to me that we actually gain anything by observing addresses
using guest side netlink.
> > > This provides the foundation for dynamic address monitoring,
> > > and supports runtime network changes.
> > >
> > > Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> > > ---
> > > epoll_type.h | 2 +
> > > netlink.c | 370 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > netlink.h | 3 +
> > > passt.c | 5 +
> > > passt.h | 1 +
> > > tap.c | 6 +-
> > > 6 files changed, 384 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/epoll_type.h b/epoll_type.h
> > > index a90ffb6..0a16d94 100644
> > > --- a/epoll_type.h
> > > +++ b/epoll_type.h
> > > @@ -46,6 +46,8 @@ enum epoll_type {
> > > EPOLL_TYPE_REPAIR,
> > > /* Netlink neighbour subscription socket */
> > > EPOLL_TYPE_NL_NEIGH,
> > > + /* Netlink link/address subscription socket */
> > > + EPOLL_TYPE_NL_LINKADDR,
> > > EPOLL_NUM_TYPES,
> > > };
> > > diff --git a/netlink.c b/netlink.c
> > > index 82a2f0c..7492f17 100644
> > > --- a/netlink.c
> > > +++ b/netlink.c
> > > @@ -35,6 +35,9 @@
> > > #include "passt.h"
> > > #include "log.h"
> > > #include "ip.h"
> > > +#include "tap.h"
> > > +#include "arp.h"
> > > +#include "ndp.h"
> > > #include "netlink.h"
> > > #include "epoll_ctl.h"
> > > @@ -59,6 +62,7 @@
> > > int nl_sock = -1;
> > > int nl_sock_ns = -1;
> > > static int nl_sock_neigh = -1;
> > > +static int nl_sock_linkaddr = -1;
> > > static int nl_seq = 1;
> > > /**
> > > @@ -91,6 +95,372 @@ static int nl_sock_init_do(void *arg)
> > > return 0;
> > > }
> > > +/**
> > > + * nl_addr4_find() - Find an IPv4 address in the address array
> > > + * @c: Execution context
> > > + * @addr: Address to find
> > > + *
> > > + * Return: index if found, -1 otherwise
> > > + */
> > > +static int nl_addr4_find(const struct ctx *c, const struct in_addr *addr)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < c->ip4.addr_count; i++)
> > > + if (IN4_ARE_ADDR_EQUAL(&c->ip4.addrs[i].addr, addr))
> > > + return (int)i;
> > > +
> > > + return -1;
> > > +}
> > > +
> > > +/**
> > > + * nl_addr6_find() - Find an IPv6 address in the address array
> > > + * @c: Execution context
> > > + * @addr: Address to find
> > > + *
> > > + * Return: index if found, -1 otherwise
> > > + */
> > > +static int nl_addr6_find(const struct ctx *c, const struct in6_addr *addr)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < c->ip6.addr_count; i++)
> > > + if (IN6_ARE_ADDR_EQUAL(&c->ip6.addrs[i].addr, addr))
> > > + return (int)i;
> > > +
> > > + return -1;
> > > +}
> > > +
> > > +/**
> > > + * nl_addr4_add() - Add a discovered IPv4 address to the address array
> > > + * @c: Execution context
> > > + * @addr: Address to add
> > > + * @prefix_len: Prefix length
> > > + *
> > > + * Return: true if added or updated, false if array full or already permanent
> > > + */
> > > +static bool nl_addr4_add(struct ctx *c, const struct in_addr *addr,
> > > + int prefix_len)
> > > +{
> > > + int idx = nl_addr4_find(c, addr);
> > > +
> > > + if (idx >= 0) {
> > > + /* Address exists - if permanent, don't touch; else update */
> > > + if (c->ip4.addrs[idx].permanent)
> > > + return false;
> > > + c->ip4.addrs[idx].prefix_len = prefix_len;
> > > + return true;
> > > + }
> > > +
> > > + /* New address - add if room */
> > > + if (c->ip4.addr_count >= IP4_MAX_ADDRS) {
> > > + debug("IPv4 address array full, ignoring discovered address");
> > > + return false;
> > > + }
> > > +
> > > + idx = c->ip4.addr_count++;
> > > + c->ip4.addrs[idx].addr = *addr;
> > > + c->ip4.addrs[idx].prefix_len = prefix_len;
> > > + c->ip4.addrs[idx].permanent = 0;
> > > + return true;
> > > +}
> > > +
> > > +/**
> > > + * nl_addr6_add() - Add a discovered IPv6 address to the address array
> > > + * @c: Execution context
> > > + * @addr: Address to add
> > > + * @prefix_len: Prefix length
> > > + *
> > > + * Return: true if added or updated, false if array full or already permanent
> > > + */
> > > +static bool nl_addr6_add(struct ctx *c, const struct in6_addr *addr,
> > > + int prefix_len)
> > > +{
> > > + int idx = nl_addr6_find(c, addr);
> > > +
> > > + if (idx >= 0) {
> > > + /* Address exists - if permanent, don't touch; else update */
> > > + if (c->ip6.addrs[idx].permanent)
> > > + return false;
> > > + c->ip6.addrs[idx].prefix_len = prefix_len;
> > > + return true;
> > > + }
> > > +
> > > + /* New address - add if room */
> > > + if (c->ip6.addr_count >= IP6_MAX_ADDRS) {
> > > + debug("IPv6 address array full, ignoring discovered address");
> > > + return false;
> > > + }
> > > +
> > > + idx = c->ip6.addr_count++;
> > > + c->ip6.addrs[idx].addr = *addr;
> > > + c->ip6.addrs[idx].prefix_len = prefix_len;
> > > + c->ip6.addrs[idx].permanent = 0;
> > > + return true;
> > > +}
> > > +
> > > +/**
> > > + * nl_addr4_del() - Remove an IPv4 address from the array if not permanent
> > > + * @c: Execution context
> > > + * @addr: Address to remove
> > > + *
> > > + * Return: true if removed, false if not found or permanent
> > > + */
> > > +static bool nl_addr4_del(struct ctx *c, const struct in_addr *addr)
> > > +{
> > > + int i, idx = nl_addr4_find(c, addr);
> > > +
> > > + if (idx < 0)
> > > + return false;
> > > +
> > > + if (c->ip4.addrs[idx].permanent)
> > > + return false;
> > > +
> > > + /* Shift remaining entries down */
> > > + c->ip4.addr_count--;
> > > + for (i = idx; i < c->ip4.addr_count; i++)
> > > + c->ip4.addrs[i] = c->ip4.addrs[i + 1];
> > > +
> > > + return true;
> > > +}
> > > +
> > > +/**
> > > + * nl_addr6_del() - Remove an IPv6 address from the array if not permanent
> > > + * @c: Execution context
> > > + * @addr: Address to remove
> > > + *
> > > + * Return: true if removed, false if not found or permanent
> > > + */
> > > +static bool nl_addr6_del(struct ctx *c, const struct in6_addr *addr)
> > > +{
> > > + int i, idx = nl_addr6_find(c, addr);
> > > +
> > > + if (idx < 0)
> > > + return false;
> > > +
> > > + if (c->ip6.addrs[idx].permanent)
> > > + return false;
> > > +
> > > + /* Shift remaining entries down */
> > > + c->ip6.addr_count--;
> > > + for (i = idx; i < c->ip6.addr_count; i++)
> > > + c->ip6.addrs[i] = c->ip6.addrs[i + 1];
> > > +
> > > + return true;
> > > +}
> >
> > All the functions above are more to do with the data structure storing
> > the addresses than they are to do with netlink. Better to move them
> > into... maybe ip.c? And use them from conf.c as well.
>
> Agreed. I'll fix that.
>
> >
> > Given the amount of near-duplication here, maybe it would be better to
> > have a single table for v4 and v6 using inany_addr?
> Yes.
> >
> > > +/**
> > > + * nl_linkaddr_msg_read() - Parse and log a netlink link/addr message
> > > + * @c: Execution context
> > > + * @nh: Netlink message header
> > > + */
> > > +static void nl_linkaddr_msg_read(struct ctx *c, const struct nlmsghdr *nh)
> > > +{
> > > + if (nh->nlmsg_type == NLMSG_DONE || nh->nlmsg_type == NLMSG_ERROR)
> > > + return;
> > > +
> > > + if (nh->nlmsg_type == RTM_NEWLINK || nh->nlmsg_type == RTM_DELLINK) {
> > > + const struct ifinfomsg *ifm = NLMSG_DATA(nh);
> > > + struct rtattr *rta = IFLA_RTA(ifm);
> > > + size_t na = IFLA_PAYLOAD(nh);
> > > + const char *name = "?";
> > > + bool up = !!(ifm->ifi_flags & IFF_UP);
> > > + bool running = !!(ifm->ifi_flags & IFF_RUNNING);
> > > +
> > > + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
> > > + if (rta->rta_type == IFLA_IFNAME) {
> > > + name = (const char *)RTA_DATA(rta);
> > > + break;
> > > + }
> > > + }
> > > +
> > > + /* Update pasta interface UP state if this is our interface */
> > > + if (c->mode == MODE_PASTA &&
> > > + (unsigned int)ifm->ifi_index == c->pasta_ifi) {
> > > + c->pasta_ifi_up = up;
> > > + debug("Interface %s", up ? "UP" : "DOWN");
> >
> > This only makes sense if we're listening to netlink messages in the
> > guest netns, but the address stuff only makes sense listening to
> > messages in the host netns.
>
> See previous response.>
> > > + }
> > > +
> > > + if (nh->nlmsg_type == RTM_NEWLINK)
> > > + debug("Link %s (idx=%d): %s %s", name, ifm->ifi_index,
> > > + up ? "UP" : "DOWN", running ? "RUNNING" : "");
> > > + else
> > > + debug("Link %s (idx=%d): DELETED", name, ifm->ifi_index);
> > > +
> > > + return;
> > > + }
> > > +
> > > + if (nh->nlmsg_type == RTM_NEWADDR || nh->nlmsg_type == RTM_DELADDR) {
> > > + bool is_new = (nh->nlmsg_type == RTM_NEWADDR);
> > > + const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
> > > + char addr_str[INET6_ADDRSTRLEN];
> > > + struct rtattr *rta = IFA_RTA(ifa);
> > > + char ifname[IFNAMSIZ] = { 0 };
> > > + size_t na = IFA_PAYLOAD(nh);
> > > + void *addr = NULL;
> > > + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
> > > + if (ifa->ifa_family == AF_INET &&
> > > + rta->rta_type == IFA_LOCAL) {
> > > + addr = RTA_DATA(rta);
> > > + break;
> > > + } else if (ifa->ifa_family == AF_INET6 &&
> > > + rta->rta_type == IFA_ADDRESS) {
> > > + addr = RTA_DATA(rta);
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (!addr)
> > > + return;
> > > +
> > > + if_indextoname(ifa->ifa_index, ifname);
> > > + inet_ntop(ifa->ifa_family, addr, addr_str, sizeof(addr_str));
> > > +
> > > + debug("%s addr on %s (index=%d): %s/%i%s",
> > > + is_new ? "NEW" : "DEL", ifname, ifa->ifa_index, addr_str,
> > > + ifa->ifa_prefixlen,
> > > + tap_is_ready(c) ? " (tap UP)" : " (tap DOWN)");
> > > +
> > > + /* Only handle our pasta interface */
> > > + if (c->mode != MODE_PASTA || ifa->ifa_index != c->pasta_ifi)
> > > + return;
> >
> > Nope. This is a host netns event, so comparing to pasta_ifi makes no
> > sense. We _should_ be comparing to ifi4 or ifi6 (depending on address
> > family), and we should probably do that before we go parsing the
> > details above.
> >
> > We should also probably check for --no-copy-addrs here, too.
> >
> > In the other direction, even for PASST mode we can store this address
> > in our table, it just won't do anything until DHCP or whatever
> > consults it.
>
> ok
This comment no longer makes sense, now that I understand this is
intentionally observing the guest not the host.
>
> >
> > > +
> > > + if (ifa->ifa_family == AF_INET) {
> > > + struct in_addr *a = (struct in_addr *)addr;
> > > +
> > > + if (!is_new) {
> > > + nl_addr4_del(c, a);
> > > + return;
> > > + }
> > > +
> > > + if (nl_addr4_add(c, a, ifa->ifa_prefixlen)) {
> > > + c->ip4.addr_seen = *a;
> > > + if (c->pasta_ifi_up && c->ifi4) {
> > > + debug("Sending ARP");
> > > + arp_send_init_req(c);
> >
> > What does this ARP request do? AFAICT we haven't actually added the
> > address in the guest netns yet, so the guest won't respond to the ARP.
>
> The address was set by the guest, so why wouldn't he respond?
Again, I was assuming this was meant to be observing host addresses.
> > > + }
> > > + }
> > > + } else if (ifa->ifa_family == AF_INET6) {
> > > + struct in6_addr *a = (struct in6_addr *)addr;
> > > +
> > > + if (!is_new) {
> > > + nl_addr6_del(c, a);
> > > + return;
> > > + }
> > > +
> > > + if (nl_addr6_add(c, a,
> > > + ifa->ifa_prefixlen)) {
> > > + c->ip6.addr_seen = *a;
> > > + if (c->pasta_ifi_up &&
> > > + c->ifi6 && !c->no_ndp) {
> > > + debug("Sending NDP");
> > > + ndp_send_init_req(c);
> >
> > Some question with this NDP.
> >
> > > + }
> > > + }
> > > + }
> > > + }
> > > +}
> > > +
> > > +/**
> > > + * nl_linkaddr_notify_handler() - Handle events from link/addr notifier socket
> > > + * @c: Execution context
> > > + */
> > > +void nl_linkaddr_notify_handler(struct ctx *c)
> > > +{
> > > + char buf[NLBUFSIZ];
> > > +
> > > + for (;;) {
> > > + ssize_t n = recv(nl_sock_linkaddr, buf, sizeof(buf), MSG_DONTWAIT);
> > > + struct nlmsghdr *nh = (struct nlmsghdr *)buf;
> > > +
> > > + if (n < 0) {
> > > + if (errno == EINTR)
> > > + continue;
> > > + if (errno != EAGAIN)
> > > + debug("recv() error: %s", strerror_(errno));
> > > + break;
> > > + }
> > > +
> > > + debug("Received %zd bytes", n);
> > > +
> > > + for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n))
> > > + nl_linkaddr_msg_read(c, nh);
> > > + }
> > > +}
> > > +
> > > +/**
> > > + * nl_linkaddr_init_do() - Actually create and bind the netlink socket
> > > + * @arg: Execution context (for namespace entry) or NULL
> > > + *
> > > + * Return: 0 on success, -1 on failure
> > > + */
> > > +static int nl_linkaddr_init_do(void *arg)
> > > +{
> > > + struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
> > > + .nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR |
> > > + RTMGRP_IPV6_IFADDR };
> > > +
> > > + if (arg)
> > > + ns_enter((struct ctx *)arg);
> > > +
> > > + nl_sock_linkaddr = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
> >
> > Is there a reason to use an additional socket, rather than adding more
> > events to the neighbour listening socket?
>
> None in particular. I'll look into it.
>
> >
> > > + if (nl_sock_linkaddr < 0) {
> > > + debug("socket() failed: %s", strerror_(errno));
> > > + return -1;
> > > + }
> > > +
> > > + if (bind(nl_sock_linkaddr, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
> > > + debug("bind() failed: %s", strerror_(errno));
> > > + close(nl_sock_linkaddr);
> > > + nl_sock_linkaddr = -1;
> > > + return -1;
> > > + }
> > > +
> > > + debug("socket fd=%d", nl_sock_linkaddr);
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * nl_linkaddr_notify_init() - Initialize link/address change notifier
> > > + * @c: Execution context
> > > + *
> > > + * Return: 0 on success, -1 on failure
> > > + */
> > > +int nl_linkaddr_notify_init(const struct ctx *c)
> > > +{
> > > + union epoll_ref ref = { .type = EPOLL_TYPE_NL_LINKADDR };
> > > + struct epoll_event ev = { .events = EPOLLIN };
> > > +
> > > + if (nl_sock_linkaddr >= 0) {
> > > + debug("notifier already initialized (fd=%d)", nl_sock_linkaddr);
> > > + return 0;
> > > + }
> > > +
> > > + /* Open the notifier socket in the namespace for pasta mode,
> > > + * or in the init namespace otherwise.
> >
> > Definitely wrong. We're trying to watch host addresses so that we can
> > copy them to the guest - therefore we always need to watch in the host
> > netns. On pasta there might be reasons to *also* listen in the guest
> > netns, but that would want a different handler to do different things.
>
> Hmm. I think I'll wait for your feedback on the remaining patches before I
> comment on this.
So, as discussed in the call, this makes a bit more sense to me now
that I know this is intending to observe what the guest is doing, not
the host. That said, it still doesn't make sense to listen in
different places based on the passt/pasta mode.
For both passt and pasta we want something listening on the host, that
will behave pretty similarly. For pasta we might want an additional
listener on the guest side, but this will be separate from the former,
and I'm not yet convinced it's actually helpful.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-12-16 3:21 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-15 1:54 [RFC 00/12] Support for multiple address and late binding Jon Maloy
2025-12-15 1:54 ` [RFC 01/12] ip: Introduce multi-address data structures for IPv4 and IPv6 Jon Maloy
2025-12-15 9:40 ` David Gibson
2025-12-15 22:05 ` Jon Maloy
2025-12-16 1:58 ` Jon Maloy
2025-12-16 3:14 ` David Gibson
2025-12-15 9:46 ` David Gibson
2025-12-15 1:54 ` [RFC 02/12] ip: Add ip4_default_prefix_len() helper function for class-based prefix Jon Maloy
2025-12-15 9:41 ` David Gibson
2025-12-15 1:54 ` [RFC 03/12] conf: Allow multiple -a/--address options per address family Jon Maloy
2025-12-15 9:53 ` David Gibson
2025-12-15 1:54 ` [RFC 04/12] conf: Apply -n/--netmask to most recently added address Jon Maloy
2025-12-15 9:54 ` David Gibson
2025-12-15 22:43 ` Jon Maloy
2025-12-15 1:54 ` [RFC 05/12] fwd: Check all configured addresses in guest accessibility functions Jon Maloy
2025-12-15 10:06 ` David Gibson
2025-12-15 1:54 ` [RFC 06/12] arp: Check all configured addresses in ARP filtering Jon Maloy
2025-12-15 10:07 ` David Gibson
2025-12-15 1:54 ` [RFC 07/12] netlink: Subscribe to link/address changes in namespace Jon Maloy
2025-12-15 10:32 ` David Gibson
2025-12-15 23:25 ` Jon Maloy
2025-12-16 3:21 ` David Gibson [this message]
2025-12-15 1:54 ` [RFC 08/12] netlink: Subscribe to route " Jon Maloy
2025-12-15 10:38 ` David Gibson
2025-12-15 1:54 ` [RFC 09/12] netlink: Add host-side monitoring for late template interface binding Jon Maloy
2025-12-15 1:54 ` [RFC 10/12] netlink: Add host-side route monitoring and propagation Jon Maloy
2025-12-15 1:54 ` [RFC 11/12] netlink: Prevent host route events from overwriting guest-configured gateway Jon Maloy
2025-12-15 1:54 ` [RFC 12/12] netlink: Rename tap interface when late binding discovers template name Jon Maloy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUDQNLTDEtU69r7Q@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=dgibson@redhat.com \
--cc=jmaloy@redhat.com \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).