From: David Gibson <david@gibson.dropbear.id.au>
To: Jon Maloy <jmaloy@redhat.com>
Cc: sbrivio@redhat.com, dgibson@redhat.com, passt-dev@passt.top
Subject: Re: [RFC 09/12] netlink: Add host-side monitoring for late template interface binding
Date: Thu, 18 Dec 2025 15:44:31 +1100
Message-ID: <aUOGrz_VtyBiTYjn@zatzit>
In-Reply-To: <20251215015441.887736-10-jmaloy@redhat.com>
On Sun, Dec 14, 2025 at 08:54:38PM -0500, Jon Maloy wrote:
> When pasta starts without an active template interface (e.g., WiFi
> not yet connected), it falls back to local mode. This change adds
> support for late binding: when the template interface gets an address
> later, pasta detects this via a host-side netlink socket and
> propagates the configuration to the namespace.
>
> Late binding occurs when:
> - A specific interface is given via -I and it later gets an address
Should this be "-i"? -I sets the guest-side interface name.
> - No interface is specified, and any interface gets an address.
> In the latter case the first discovered interface is adopted as
> template.
>
> The key changes we make in this commit are:
> - We add a host-side netlink socket (nl_sock_linkaddr_host) to
> monitor link and address changes on the template interface.
> - We add a nl_linkaddr_host_handler() to process these events
> and propagate addresses to the namespace.
> - We add support for late binding: when ifi4/ifi6 are unset, we
> adopt the interface that receives an address - either the one
> specified via -I, or the first one to get an address if -I was
> not given.
> - We bring the interface UP after first address is added via late
> binding.
> - We retain CAP_NET_ADMIN in isolate_prefork() for pasta mode to allow
> dynamic interface configuration after sandboxing.
>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
> epoll_type.h | 4 +-
> isolation.c | 4 +
> netlink.c | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++
> netlink.h | 3 +
> passt.c | 4 +
> 5 files changed, 328 insertions(+), 1 deletion(-)
>
> diff --git a/epoll_type.h b/epoll_type.h
> index 0a16d94..8dc6b8a 100644
> --- a/epoll_type.h
> +++ b/epoll_type.h
> @@ -46,8 +46,10 @@ enum epoll_type {
> EPOLL_TYPE_REPAIR,
> /* Netlink neighbour subscription socket */
> EPOLL_TYPE_NL_NEIGH,
> - /* Netlink link/address subscription socket */
> + /* Netlink link/address subscription socket (namespace) */
> EPOLL_TYPE_NL_LINKADDR,
> + /* Netlink link/address subscription socket (host, for template) */
> + EPOLL_TYPE_NL_LINKADDR_HOST,
>
> EPOLL_NUM_TYPES,
> };
> diff --git a/isolation.c b/isolation.c
> index b25f349..633c396 100644
> --- a/isolation.c
> +++ b/isolation.c
> @@ -356,6 +356,10 @@ int isolate_prefork(const struct ctx *c)
> if (c->mode == MODE_PASTA) {
> /* Keep CAP_SYS_ADMIN, so we can enter the netns */
> ns_caps |= BIT(CAP_SYS_ADMIN);
> + /* Keep CAP_NET_ADMIN for dynamic interface configuration
> + * (late binding when template interface comes up after start)
> + */
> + ns_caps |= BIT(CAP_NET_ADMIN);
> /* Keep CAP_NET_BIND_SERVICE, so we can splice
> * outbound connections to low port numbers
> */
> diff --git a/netlink.c b/netlink.c
> index a8d3116..583ada8 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -41,6 +41,9 @@
> #include "netlink.h"
> #include "epoll_ctl.h"
>
> +/* Default namespace interface name from conf.c */
> +extern const char *pasta_default_ifn;
> +
> /* Same as RTA_NEXT() but for nexthops: RTNH_NEXT() doesn't take 'attrlen' */
> #define RTNH_NEXT_AND_DEC(rtnh, attrlen) \
> ((attrlen) -= RTNH_ALIGN((rtnh)->rtnh_len), RTNH_NEXT(rtnh))
> @@ -63,6 +66,7 @@ int nl_sock = -1;
> int nl_sock_ns = -1;
> static int nl_sock_neigh = -1;
> static int nl_sock_linkaddr = -1;
> +static int nl_sock_linkaddr_host = -1;
> static int nl_seq = 1;
>
> /**
> @@ -249,6 +253,175 @@ static bool nl_addr6_del(struct ctx *c, const struct in6_addr *addr)
> return true;
> }
>
> +/**
> + * nl_linkaddr_host_msg_read() - Handle host-side link/addr changes
> + * @c: Execution context
> + * @nh: Netlink message header
> + *
> + * Monitor template interface changes and propagate to namespace.
> + * Supports late binding: if no template was detected at startup,
> + * adopt the interface specified by -I when it gets an address.
> + */
> +static void nl_linkaddr_host_msg_read(struct ctx *c, const struct nlmsghdr *nh)
> +{
> + if (nh->nlmsg_type == NLMSG_DONE || nh->nlmsg_type == NLMSG_ERROR)
> + return;
> +
> + if (nh->nlmsg_type == RTM_NEWADDR || nh->nlmsg_type == RTM_DELADDR) {
> + bool is_new = (nh->nlmsg_type == RTM_NEWADDR);
> + const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
> + struct rtattr *rta = IFA_RTA(ifa);
> + size_t na = IFA_PAYLOAD(nh);
> + bool late_binding = false;
> + unsigned int template_ifi;
> + char ifname[IFNAMSIZ];
> + void *addr = NULL;
> + bool is_default;
> + bool is_match;
> + bool unbound;
> +
> + /* Get interface name for this message */
> + if (!if_indextoname(ifa->ifa_index, ifname))
> + snprintf(ifname, sizeof(ifname), "?");
> +
> + /* Get template interface index, handling late binding.
> + * Late binding occurs when ifi4/ifi6 <= 0 (local mode) and either:
> + * - pasta_ifn is set and matches this interface, or
> + * - pasta_ifn contains the default name
I think this is trying to infer that we're in local mode from the
interface name. That seems roundabout and fragile to me; it would be
better to add a flag to struct ctx if we need it. (If nothing else,
using -I would break this, wouldn't it?)
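A minimal sketch of the flag-based check, to make the suggestion concrete. Everything here is hypothetical: struct ctx_sketch stands in for struct ctx, and the fields local4, local6 and host_ifn (the name given with -i) are assumptions, not existing passt fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <net/if.h>
#include <sys/socket.h>

/* Hypothetical subset of struct ctx, for illustration only */
struct ctx_sketch {
	bool local4;			/* IPv4 started in local mode, no template yet */
	bool local6;			/* Same, for IPv6 */
	char host_ifn[IF_NAMESIZE];	/* Host interface requested with -i, "" if none */
};

/* Should a host address event on @ifname trigger late binding for @af? */
static bool late_bind_wanted(const struct ctx_sketch *c, sa_family_t af,
			     const char *ifname)
{
	bool local;

	assert(c && ifname);

	if (af == AF_INET)
		local = c->local4;
	else if (af == AF_INET6)
		local = c->local6;
	else
		return false;

	/* Explicit state, nothing inferred from the guest interface name */
	if (!local)
		return false;

	/* No -i given: adopt whichever interface gets an address first */
	return !c->host_ifn[0] || !strcmp(c->host_ifn, ifname);
}
```

With something along these lines, the guest-side name set by -I never enters the decision, so changing it can't affect whether late binding happens.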
> + */
> + if (ifa->ifa_family == AF_INET)
> + template_ifi = c->ifi4;
> + else if (ifa->ifa_family == AF_INET6)
> + template_ifi = c->ifi6;
> + else
> + return;
> +
> + /* Check for late binding conditions */
> + is_default = !strcmp(c->pasta_ifn, pasta_default_ifn);
> + is_match = !strcmp(ifname, c->pasta_ifn);
> + unbound = (ifa->ifa_family == AF_INET) ?
> + (int)c->ifi4 <= 0 : (int)c->ifi6 <= 0;
> +
> + if (unbound && (is_default || is_match)) {
> + debug("Late binding: using %s as %s template", ifname,
> + ifa->ifa_family == AF_INET ? "IPv4" : "IPv6");
> +
> + if (ifa->ifa_family == AF_INET) {
> + c->ifi4 = ifa->ifa_index;
> + template_ifi = c->ifi4;
> + } else {
> + c->ifi6 = ifa->ifa_index;
> + template_ifi = c->ifi6;
> + }
> + late_binding = true;
> +
> + if (is_default)
> + snprintf(c->pasta_ifn, sizeof(c->pasta_ifn),
> + "%s", ifname);
> + }
> +
> + if (ifa->ifa_index != template_ifi)
> + return;
> +
> + /* Re-initialize rta/na for attribute parsing */
> + rta = IFA_RTA(ifa);
> + na = IFA_PAYLOAD(nh);
> +
> + for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
> + if (ifa->ifa_family == AF_INET &&
> + rta->rta_type == IFA_LOCAL) {
> + addr = RTA_DATA(rta);
> + break;
> + } else if (ifa->ifa_family == AF_INET6 &&
> + rta->rta_type == IFA_ADDRESS) {
> + addr = RTA_DATA(rta);
> + break;
> + }
> + }
> +
> + if (!addr) {
> + info("No addr found in netlink linkaddr message");
Maybe best to check for this *before* touching important state
variables like c->ifi4?
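One way to get that ordering is to pull the attribute lookup into a helper that runs before anything in the context is modified. The following is only a sketch: ifa_msg_addr() is a hypothetical name, and the length checks are an addition that the quoted patch does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Return a pointer to the address carried by an RTM_NEWADDR/RTM_DELADDR
 * message, or NULL if there is none. Reads only, changes no state.
 */
static const void *ifa_msg_addr(const struct nlmsghdr *nh)
{
	const struct ifaddrmsg *ifa;
	const struct rtattr *rta;
	unsigned short want;
	size_t alen;
	int na;

	assert(nh);

	if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifa)))
		return NULL;

	ifa = NLMSG_DATA(nh);
	if (ifa->ifa_family == AF_INET) {
		want = IFA_LOCAL;	/* as in the patch: IFA_LOCAL for IPv4 */
		alen = 4;		/* size of an IPv4 address */
	} else if (ifa->ifa_family == AF_INET6) {
		want = IFA_ADDRESS;	/* as in the patch: IFA_ADDRESS for IPv6 */
		alen = 16;		/* size of an IPv6 address */
	} else {
		return NULL;
	}

	rta = IFA_RTA(ifa);
	na = IFA_PAYLOAD(nh);
	for (; RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) {
		if (rta->rta_type == want && (size_t)RTA_PAYLOAD(rta) >= alen)
			return RTA_DATA(rta);
	}

	return NULL;
}
```

The handler could then call this first and return early on NULL, so that c->ifi4, c->ifi6 and c->pasta_ifn are only updated once the message is known to carry a usable address.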
> + return;
> + }
> +
> + if (ifa->ifa_family == AF_INET) {
> + struct in_addr *a = (struct in_addr *)addr;
> + char buf[INET_ADDRSTRLEN];
> + int rc;
> +
> + inet_ntop(AF_INET, a, buf, sizeof(buf));
> +
> + if (!is_new) {
> + nl_addr4_del(c, a);
> + nl_addr_del(nl_sock_ns, c->pasta_ifi,
> + AF_INET, a, ifa->ifa_prefixlen);
We should only actually touch the guest's configuration if
c->pasta_conf_ns is set.
> + return;
> + }
> + rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
> + AF_INET, a,
> + ifa->ifa_prefixlen);
> + if (rc < 0) {
> + debug("Failed to add %s/%u to ns: %s", buf,
> + ifa->ifa_prefixlen, strerror_(-rc));
> + } else {
> + nl_addr4_add(c, a, ifa->ifa_prefixlen);
What's nl_addr4_add() and how does it differ from nl_addr_set()?
> + c->ip4.addr_seen = *a;
This is a host side event, so we shouldn't be updating addr_seen.
> + debug("Added %s/%u to namespace",
> + buf, ifa->ifa_prefixlen);
> +
> + /* Bring interface UP on late binding */
> + if (late_binding && !c->pasta_ifi_up) {
> + nl_link_set_flags(nl_sock_ns,
> + c->pasta_ifi,
> + IFF_UP, IFF_UP);
> + c->pasta_ifi_up = 1;
> + debug("Brought interface up");
> + }
> + if (late_binding || c->pasta_ifi_up)
c->pasta_ifi_up must always be true at this point, no?
> + arp_send_init_req(c);
> + }
> + } else if (ifa->ifa_family == AF_INET6) {
> + struct in6_addr *a = (struct in6_addr *)addr;
> + char buf[INET6_ADDRSTRLEN];
> + int rc;
> +
> + inet_ntop(AF_INET6, a, buf, sizeof(buf));
> +
> + if (!is_new) {
> + nl_addr6_del(c, a);
> + nl_addr_del(nl_sock_ns, c->pasta_ifi,
> + AF_INET6, a, ifa->ifa_prefixlen);
> + return;
> + }
> + rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
> + AF_INET6, a, ifa->ifa_prefixlen);
> + if (rc < 0) {
> + debug("Failed to add %s/%u to ns: %s",
> + buf, ifa->ifa_prefixlen,
> + strerror_(-rc));
> + } else {
> + nl_addr6_add(c, a, ifa->ifa_prefixlen);
> + c->ip6.addr_seen = *a;
> + debug("Added %s/%u to namespace",
> + buf, ifa->ifa_prefixlen);
> +
> + /* Bring interface UP on late binding */
> + if (late_binding && !c->pasta_ifi_up) {
> + nl_link_set_flags(nl_sock_ns,
> + c->pasta_ifi,
> + IFF_UP, IFF_UP);
> + c->pasta_ifi_up = 1;
> + debug("Brought interface up");
> + }
> + if ((late_binding || c->pasta_ifi_up) &&
> + !c->no_ndp)
> + ndp_send_init_req(c);
> + }
> + }
> + return;
> + }
> +}
> +
> /**
> * nl_linkaddr_msg_read() - Parse and log a netlink link/addr message
> * @c: Execution context
> @@ -432,6 +605,36 @@ void nl_linkaddr_notify_handler(struct ctx *c)
> }
> }
>
> +/**
> + * nl_linkaddr_host_handler() - Handle events from host link/addr notifier
> + * @c: Execution context
> + *
> + * Monitor template interface changes and propagate to namespace
> + */
> +void nl_linkaddr_host_handler(struct ctx *c)
> +{
> + char buf[NLBUFSIZ];
> +
> + for (;;) {
> + ssize_t n = recv(nl_sock_linkaddr_host, buf, sizeof(buf),
> + MSG_DONTWAIT);
> + struct nlmsghdr *nh = (struct nlmsghdr *)buf;
> +
> + if (n < 0) {
> + if (errno == EINTR)
> + continue;
> + if (errno != EAGAIN)
> + info("Host recv() error: %s", strerror_(errno));
> + break;
> + }
> +
> + info("Host netlink: received %zd bytes", n);
> +
> + for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n))
> + nl_linkaddr_host_msg_read(c, nh);
> + }
> +}
> +
> /**
> * nl_linkaddr_init_do() - Actually create and bind the netlink socket
> * @arg: Execution context (for namespace entry) or NULL
> @@ -464,6 +667,38 @@ static int nl_linkaddr_init_do(void *arg)
> return 0;
> }
>
> +/**
> + * nl_linkaddr_host_init_do() - Create host-side link/addr notifier socket
> + * @arg: Unused
> + *
> + * Return: 0 on success, -1 on failure
> + */
> +static int nl_linkaddr_host_init_do(void *arg)
Why the void *? This is host side, so you don't need an NS_CALL().
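Roughly what a plain host-side version might look like, as a sketch only. nl_host_notifier_open() is a hypothetical name, and returning the fd instead of setting a global is a choice made for this example, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket in the current (host) namespace, subscribed
 * to the multicast @groups. Return: socket fd on success, -1 on failure
 * with errno set.
 */
static int nl_host_notifier_open(uint32_t groups)
{
	struct sockaddr_nl addr;
	int s;

	/* Subscribing to no group at all would never deliver an event */
	assert(groups);

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = groups;

	s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
	if (s < 0)
		return -1;

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}

	return s;
}
```

The caller would pass RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR, as the patch does, and store the result in nl_sock_linkaddr_host. No namespace entry is involved, so there is no need for the NS_CALL()-style void * signature.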
> +{
> + struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
> + .nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR };
> +
> + (void)arg;
> +
> + nl_sock_linkaddr_host = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
> + NETLINK_ROUTE);
> + if (nl_sock_linkaddr_host < 0) {
> + debug("socket() failed for host: %s", strerror_(errno));
> + return -1;
> + }
> +
> + if (bind(nl_sock_linkaddr_host, (struct sockaddr *)&addr,
> + sizeof(addr)) < 0) {
> + debug("bind() failed for host: %s", strerror_(errno));
> + close(nl_sock_linkaddr_host);
> + nl_sock_linkaddr_host = -1;
> + return -1;
> + }
> +
> + debug("host socket fd=%d", nl_sock_linkaddr_host);
> + return 0;
> +}
> +
> /**
> * nl_linkaddr_notify_init() - Initialize link/address change notifier
> * @c: Execution context
> @@ -502,6 +737,33 @@ int nl_linkaddr_notify_init(const struct ctx *c)
> return -1;
> }
>
> + debug("namespace socket fd=%d", nl_sock_linkaddr);
This looks like it belongs in an earlier patch. Also, "namespace
socket fd" isn't very specific.
> +
> + /* In PASTA mode, also create a host-side socket to monitor
> + * template interface changes
> + */
> + if (c->mode == MODE_PASTA) {
> + nl_linkaddr_host_init_do(NULL);
> +
> + if (nl_sock_linkaddr_host < 0) {
> + warn("Failed to create host link/addr notifier socket");
> + /* Non-fatal - continue without host monitoring */
> + } else {
> + ref.type = EPOLL_TYPE_NL_LINKADDR_HOST;
> + ev.data.u64 = ref.u64;
> + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD,
> + nl_sock_linkaddr_host, &ev) == -1) {
> + warn("epoll_ctl() failed on host notifier: %s",
> + strerror_(errno));
> + close(nl_sock_linkaddr_host);
> + nl_sock_linkaddr_host = -1;
> + } else {
> + info("Host netlink socket fd=%d, pasta_ifn=%s",
> + nl_sock_linkaddr_host, c->pasta_ifn);
> + }
> + }
> + }
> +
> return 0;
> }
> /**
> @@ -1340,6 +1602,58 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
> return nl_do(s, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
> }
>
> +/**
> + * nl_addr_del() - Delete IP address from given interface
> + * @s: Netlink socket
> + * @ifi: Interface index
> + * @af: Address family
> + * @addr: Address to delete
> + * @prefix_len: Prefix length
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int nl_addr_del(int s, unsigned int ifi, sa_family_t af,
> + const void *addr, int prefix_len)
> +{
> + struct req_t {
> + struct nlmsghdr nlh;
> + struct ifaddrmsg ifa;
> + union {
> + struct {
> + struct rtattr rta_l;
> + struct in_addr l;
> + } a4;
> + struct {
> + struct rtattr rta_l;
> + struct in6_addr l;
> + } a6;
> + } del;
> + } req = {
> + .ifa.ifa_family = af,
> + .ifa.ifa_index = ifi,
> + .ifa.ifa_prefixlen = prefix_len,
> + };
> + ssize_t len;
> +
> + if (af == AF_INET6) {
> + size_t rta_len = RTA_LENGTH(sizeof(req.del.a6.l));
> +
> + len = offsetof(struct req_t, del.a6) + sizeof(req.del.a6);
> + memcpy(&req.del.a6.l, addr, sizeof(req.del.a6.l));
> + req.del.a6.rta_l.rta_len = rta_len;
> + req.del.a6.rta_l.rta_type = IFA_LOCAL;
> + } else {
> + size_t rta_len = RTA_LENGTH(sizeof(req.del.a4.l));
> +
> + len = offsetof(struct req_t, del.a4) + sizeof(req.del.a4);
> + memcpy(&req.del.a4.l, addr, sizeof(req.del.a4.l));
> + req.del.a4.rta_l.rta_len = rta_len;
> + req.del.a4.rta_l.rta_type = IFA_LOCAL;
> + }
> +
> + return nl_do(s, &req, RTM_DELADDR, 0, len);
> +}
> +
> /**
> * nl_addr_dup() - Copy IP addresses for given interface and address family
> * @s_src: Netlink socket in source network namespace
> diff --git a/netlink.h b/netlink.h
> index 1796a72..f65ae10 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -35,5 +35,8 @@ void nl_neigh_notify_handler(const struct ctx *c);
>
> int nl_linkaddr_notify_init(const struct ctx *c);
> void nl_linkaddr_notify_handler(struct ctx *c);
> +void nl_linkaddr_host_handler(struct ctx *c);
> +int nl_addr_del(int s, unsigned int ifi, sa_family_t af,
> + const void *addr, int prefix_len);
>
> #endif /* NETLINK_H */
> diff --git a/passt.c b/passt.c
> index f274858..438dac8 100644
> --- a/passt.c
> +++ b/passt.c
> @@ -81,6 +81,7 @@ char *epoll_type_str[] = {
> [EPOLL_TYPE_REPAIR] = "TCP_REPAIR helper socket",
> [EPOLL_TYPE_NL_NEIGH] = "netlink neighbour notifier socket",
> [EPOLL_TYPE_NL_LINKADDR] = "netlink link/address notifier socket",
> + [EPOLL_TYPE_NL_LINKADDR_HOST] = "netlink host link/address notifier socket",
> };
> static_assert(ARRAY_SIZE(epoll_type_str) == EPOLL_NUM_TYPES,
> "epoll_type_str[] doesn't match enum epoll_type");
> @@ -308,6 +309,9 @@ static void passt_worker(void *opaque, int nfds, struct epoll_event *events)
> case EPOLL_TYPE_NL_LINKADDR:
> nl_linkaddr_notify_handler(c);
> break;
> + case EPOLL_TYPE_NL_LINKADDR_HOST:
> + nl_linkaddr_host_handler(c);
> + break;
> default:
> /* Can't happen */
> ASSERT(0);
> --
> 2.51.1
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson