public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH] netlink: Fetch most specific (longest prefix) address in nl_addr_get()
@ 2023-12-27 15:53 Stefano Brivio
  0 siblings, 0 replies; only message in thread
From: Stefano Brivio @ 2023-12-27 15:53 UTC (permalink / raw)
  To: passt-dev; +Cc: Yalan Zhang, David Gibson

This happened in most cases implicitly before commit eff3bcb24547
("netlink: Split nl_addr() into separate operation functions"): while
going through results from netlink, we would only copy an address
into the provided return buffer if no address had been picked yet.

Because of the insertion logic in the kernel (ipv6_link_dev_addr()),
the first returned address would also be the one added last, and, in
case of a Linux guest using a DHCPv6 client as well as SLAAC, that
would be the address assigned via DHCPv6, because SLAAC happens
before the DHCPv6 exchange.

The effect of, instead, picking the last returned address (first
assigned) is visible when passt or pasta runs nested, given that, by
default, they advertise a prefix for SLAAC usage, plus an address via
DHCPv6.

The first level (L1 guest) would get a /64 address by means of SLAAC,
and a /128 address via DHCPv6, the latter matching the address on the
host.

The second level (L2 guest) would also get two addresses: a /64 via
SLAAC (same prefix as the host), and a /128 via DHCPv6, matching the
the L1 SLAAC-assigned address, not the one obtained via DHCPv6. That
is, none of the L2 addresses would match the address on the host. The
whole point of having a DHCPv6 server is to avoid (implicit) NAT when
possible, though.

Fix this in a more explicit way than the behaviour we initially had:
pick the first address among the set of most specific ones, by
comparing prefix lengths. Do this for IPv4 and for link-local
addresses, too, to match in any case the implementation of the
default source address selection.

Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: eff3bcb24547 ("netlink: Split nl_addr() into separate operation functions")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 netlink.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/netlink.c b/netlink.c
index b221c79..6e40d28 100644
--- a/netlink.c
+++ b/netlink.c
@@ -527,7 +527,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 }
 
 /**
- * nl_addr_get() - Get IP address for given interface and address family
+ * nl_addr_get() - Get most specific global address, given interface and family
  * @s:		Netlink socket
  * @ifi:	Interface index in outer network namespace
  * @af:		Address family
@@ -540,6 +540,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l)
 {
+	uint8_t prefix_max = 0, prefix_max_ll = 0;
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct ifaddrmsg ifa;
@@ -566,17 +567,25 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 			if (rta->rta_type != IFA_ADDRESS)
 				continue;
 
-			if (af == AF_INET) {
+			if (af == AF_INET && ifa->ifa_prefixlen > prefix_max) {
 				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
-				*prefix_len = ifa->ifa_prefixlen;
+
+				prefix_max = *prefix_len = ifa->ifa_prefixlen;
 			} else if (af == AF_INET6 && addr &&
-				   ifa->ifa_scope == RT_SCOPE_UNIVERSE) {
+				   ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
+				   ifa->ifa_prefixlen > prefix_max) {
 				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
+
+				prefix_max = ifa->ifa_prefixlen;
 			}
 
 			if (addr_l &&
-			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK)
+			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
+			    ifa->ifa_prefixlen > prefix_max_ll) {
 				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
+
+				prefix_max_ll = ifa->ifa_prefixlen;
+			}
 		}
 	}
 	return status;
-- 
@@ -527,7 +527,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 }
 
 /**
- * nl_addr_get() - Get IP address for given interface and address family
+ * nl_addr_get() - Get most specific global address, given interface and family
  * @s:		Netlink socket
  * @ifi:	Interface index in outer network namespace
  * @af:		Address family
@@ -540,6 +540,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l)
 {
+	uint8_t prefix_max = 0, prefix_max_ll = 0;
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct ifaddrmsg ifa;
@@ -566,17 +567,25 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 			if (rta->rta_type != IFA_ADDRESS)
 				continue;
 
-			if (af == AF_INET) {
+			if (af == AF_INET && ifa->ifa_prefixlen > prefix_max) {
 				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
-				*prefix_len = ifa->ifa_prefixlen;
+
+				prefix_max = *prefix_len = ifa->ifa_prefixlen;
 			} else if (af == AF_INET6 && addr &&
-				   ifa->ifa_scope == RT_SCOPE_UNIVERSE) {
+				   ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
+				   ifa->ifa_prefixlen > prefix_max) {
 				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
+
+				prefix_max = ifa->ifa_prefixlen;
 			}
 
 			if (addr_l &&
-			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK)
+			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
+			    ifa->ifa_prefixlen > prefix_max_ll) {
 				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
+
+				prefix_max_ll = ifa->ifa_prefixlen;
+			}
 		}
 	}
 	return status;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] only message in thread

only message in thread, other threads:[~2023-12-27 15:53 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-27 15:53 [PATCH] netlink: Fetch most specific (longest prefix) address in nl_addr_get() Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).