public inbox for passt-dev@passt.top
* [PATCH 00/17] netlink fixes and cleanups
@ 2023-07-24  6:09 David Gibson
  2023-07-24  6:09 ` [PATCH 01/17] netlink: Split up functionality of nl_link() David Gibson
                   ` (16 more replies)
  0 siblings, 17 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We've had several bugs in the past that were quite tricky to debug,
but would have been much easier if we'd known that a netlink operation
had failed.  So, it would be desirable to actually detect and report
failures of netlink operations.  While working on that, I discovered
that there are a number of other issues ranging from very small to
medium sized with the way we use netlink.  This series addresses many
of them.
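
To make "detect and report failures" concrete, here is a rough standalone
sketch (not code from this series) of how a failure shows up in a netlink
response: the kernel answers with an NLMSG_ERROR message whose payload
carries a negative errno, or zero for a plain ACK.  The helper name
nl_status_sketch() is made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Hypothetical helper: scan @n bytes of netlink responses in @buf.
 * Return: 0 if no failure was reported, negative errno otherwise
 */
static int nl_status_sketch(const void *buf, ssize_t n)
{
	const struct nlmsghdr *nh;

	for (nh = buf; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		const struct nlmsgerr *err;

		if (nh->nlmsg_type != NLMSG_ERROR)
			continue;

		/* Too short to hold an error payload: treat as malformed */
		if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
			return -EBADMSG;

		err = NLMSG_DATA(nh);
		if (err->error)		/* 0 is an ACK, not a failure */
			return err->error;
	}

	return 0;
}
```

A caller would pass the buffer and byte count it got back from recv() and
log a warning, or bail out, on a negative return value.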

Link: https://bugs.passt.top/show_bug.cgi?id=60
Link: https://bugs.passt.top/show_bug.cgi?id=67

David Gibson (17):
  netlink: Split up functionality of nl_link()
  netlink: Split nl_addr() into separate operation functions
  netlink: Split nl_route() into separate operation functions
  netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t
  netlink: Explicitly pass netlink sockets to operations
  netlink: Make nl_*_dup() use a separate datagram for each request
  netlink: Start sequence number from 1 instead of 0
  netlink: Treat send() or recv() errors as fatal
  netlink: Fill in netlink header fields from nl_req()
  netlink: Add nl_do() helper for simple operations with error checking
  netlink: Clearer reasoning about the netlink response buffer size
  netlink: Split nl_req() to allow processing multiple response
    datagrams
  netlink: Add nl_foreach_oftype to filter response message types
  netlink: Propagate errors for "set" operations
  netlink: Always process all responses to a netlink request
  netlink: Propagate errors for "dump" operations
  netlink: Propagate errors for "dup" operations

 conf.c    |  66 ++++-
 netlink.c | 844 ++++++++++++++++++++++++++++++++++--------------------
 netlink.h |  27 +-
 pasta.c   |  75 +++--
 4 files changed, 659 insertions(+), 353 deletions(-)

-- 
2.41.0



* [PATCH 01/17] netlink: Split up functionality of nl_link()
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:47   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 02/17] netlink: Split nl_addr() into separate operation functions David Gibson
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

nl_link() performs a number of functions: it can bring links up, set the MAC
address and MTU, and also retrieve the existing MAC.  This makes for a small
number of lines of code, but high conceptual complexity: it's quite hard
to follow what's going on in nl_link() itself, and it's also not very
obvious which function its callers are intending to use.

Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
and nl_link_get_mac().  The first brings up a link, optionally setting the
MTU, the others get or set the MAC address.

This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
However, it only actually does so in the !c->pasta_conf_ns case: the fact
that we set up==1 means we would only ever set, never get, the MAC in the
nl_link() call in the other path.  We get away with this because the MAC
will quickly be discovered once we receive packets on the tap interface.
Still, it's neater to always get the MAC address here.
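
The intended behaviour boils down to a single decision on the configured
guest MAC, sketched below in standalone form.  The enum and function names
are hypothetical; in the patch itself the decision is an if/else around
nl_link_get_mac() and nl_link_set_mac().

```c
#include <string.h>
#include <linux/if_ether.h>

enum mac_action { MAC_ACTION_GET, MAC_ACTION_SET };

/* Hypothetical helper: an all-zero MAC means "not configured", so read
 * it back from the interface; anything else is applied to the interface.
 */
static enum mac_action guest_mac_action(const unsigned char *mac)
{
	static const unsigned char zero[ETH_ALEN];

	return memcmp(mac, zero, ETH_ALEN) ? MAC_ACTION_SET : MAC_ACTION_GET;
}
```

The point is that the decision no longer depends on whether the namespace
is being configured, only on whether a MAC was given.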

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    |   4 +-
 netlink.c | 143 +++++++++++++++++++++++++++++++-----------------------
 netlink.h |   4 +-
 pasta.c   |  12 +++--
 4 files changed, 96 insertions(+), 67 deletions(-)

diff --git a/conf.c b/conf.c
index 78eaf2d..2ff9e2a 100644
--- a/conf.c
+++ b/conf.c
@@ -670,7 +670,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
 
 	if (MAC_IS_ZERO(mac))
-		nl_link(0, ifi, mac, 0, 0);
+		nl_link_get_mac(0, ifi, mac);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
 	    MAC_IS_ZERO(mac))
@@ -711,7 +711,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
 
 	if (MAC_IS_ZERO(mac))
-		nl_link(0, ifi, mac, 0, 0);
+		nl_link_get_mac(0, ifi, mac);
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
 	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
diff --git a/netlink.c b/netlink.c
index e15e23f..4b1f75e 100644
--- a/netlink.c
+++ b/netlink.c
@@ -486,83 +486,44 @@ next:
 }
 
 /**
- * nl_link() - Get/set link attributes
+ * nl_link_get_mac() - Get link MAC address
  * @ns:		Use netlink socket in namespace
  * @ifi:	Interface index
- * @mac:	MAC address to fill, if passed as zero, to set otherwise
- * @up:		If set, bring up the link
- * @mtu:	If non-zero, set interface MTU
+ * @mac:	Fill with current MAC address
  */
-void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
+void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
 {
-	int change = !MAC_IS_ZERO(mac) || up || mtu;
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct ifinfomsg ifm;
-		struct rtattr rta;
-		union {
-			unsigned char mac[ETH_ALEN];
-			struct {
-				unsigned int mtu;
-			} mtu;
-		} set;
 	} req = {
-		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
-		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
+		.nlh.nlmsg_type	  = RTM_GETLINK,
+		.nlh.nlmsg_len	  = sizeof(req),
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
 		.nlh.nlmsg_seq	  = nl_seq++,
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
-		.ifm.ifi_flags	  = up ? IFF_UP : 0,
-		.ifm.ifi_change	  = up ? IFF_UP : 0,
 	};
-	struct ifinfomsg *ifm;
 	struct nlmsghdr *nh;
-	struct rtattr *rta;
 	char buf[NLBUFSIZ];
 	ssize_t n;
-	size_t na;
-
-	if (!MAC_IS_ZERO(mac)) {
-		req.nlh.nlmsg_len = sizeof(req);
-		memcpy(req.set.mac, mac, ETH_ALEN);
-		req.rta.rta_type = IFLA_ADDRESS;
-		req.rta.rta_len = RTA_LENGTH(ETH_ALEN);
-		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
-			return;
-
-		up = 0;
-	}
-
-	if (mtu) {
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.mtu)
-			+ sizeof(req.set.mtu);
-		req.set.mtu.mtu = mtu;
-		req.rta.rta_type = IFLA_MTU;
-		req.rta.rta_len = RTA_LENGTH(sizeof(unsigned int));
-		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
-			return;
-
-		up = 0;
-	}
-
-	if (up && nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
-		return;
-
-	if (change)
-		return;
 
-	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0)
+	n = nl_req(ns, buf, &req, sizeof(req));
+	if (n < 0)
 		return;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+	     nh = NLMSG_NEXT(nh, n)) {
+		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
 
-	nh = (struct nlmsghdr *)buf;
-	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
 		if (nh->nlmsg_type != RTM_NEWLINK)
-			goto next;
-
-		ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
+			continue;
 
-		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
+		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
+		     RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
 			if (rta->rta_type != IFLA_ADDRESS)
 				continue;
@@ -570,8 +531,70 @@ void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
 			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
 			break;
 		}
-next:
-		if (nh->nlmsg_type == NLMSG_DONE)
-			break;
 	}
 }
+
+/**
+ * nl_link_set_mac() - Set link MAC address
+ * @ns:		Use netlink socket in namespace
+ * @ifi:	Interface index
+ * @mac:	MAC address to set
+ */
+void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+		struct rtattr rta;
+		unsigned char mac[ETH_ALEN];
+	} req = {
+		.nlh.nlmsg_type	  = RTM_NEWLINK,
+		.nlh.nlmsg_len	  = sizeof(req),
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
+		.nlh.nlmsg_seq	  = nl_seq++,
+		.ifm.ifi_family	  = AF_UNSPEC,
+		.ifm.ifi_index	  = ifi,
+		.rta.rta_type	  = IFLA_ADDRESS,
+		.rta.rta_len	  = RTA_LENGTH(ETH_ALEN),
+	};
+	char buf[NLBUFSIZ];
+
+	memcpy(req.mac, mac, ETH_ALEN);
+
+	nl_req(ns, buf, &req, sizeof(req));
+}
+
+/**
+ * nl_link_up() - Bring link up
+ * @ns:		Use netlink socket in namespace
+ * @ifi:	Interface index
+ * @mtu:	If non-zero, set interface MTU
+ */
+void nl_link_up(int ns, unsigned int ifi, int mtu)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+		struct rtattr rta;
+		unsigned int mtu;
+	} req = {
+		.nlh.nlmsg_type   = RTM_NEWLINK,
+		.nlh.nlmsg_len    = sizeof(req),
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
+		.nlh.nlmsg_seq	  = nl_seq++,
+		.ifm.ifi_family	  = AF_UNSPEC,
+		.ifm.ifi_index	  = ifi,
+		.ifm.ifi_flags	  = IFF_UP,
+		.ifm.ifi_change	  = IFF_UP,
+		.rta.rta_type	  = IFLA_MTU,
+		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
+		.mtu		  = mtu,
+	};
+	char buf[NLBUFSIZ];
+
+	if (!mtu)
+		/* Shorten request to drop MTU attribute */
+		req.nlh.nlmsg_len = offsetof(struct req_t, rta);
+
+	nl_req(ns, buf, &req, req.nlh.nlmsg_len);
+}
diff --git a/netlink.h b/netlink.h
index cd0e666..980ac44 100644
--- a/netlink.h
+++ b/netlink.h
@@ -18,6 +18,8 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 	      sa_family_t af, void *gw);
 void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
-void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
+void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
+void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
+void nl_link_up(int ns, unsigned int ifi, int mtu);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index 8c85546..3b5537d 100644
--- a/pasta.c
+++ b/pasta.c
@@ -272,13 +272,19 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
  */
 void pasta_ns_conf(struct ctx *c)
 {
-	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
+	nl_link_up(1, 1 /* lo */, 0);
+
+	/* Get or set guest MAC */
+	if (MAC_IS_ZERO(c->mac_guest))
+		nl_link_get_mac(1, c->pasta_ifi, c->mac_guest);
+	else
+		nl_link_set_mac(1, c->pasta_ifi, c->mac_guest);
 
 	if (c->pasta_conf_ns) {
 		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
 		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
 
-		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
+		nl_link_up(1, c->pasta_ifi, c->mtu);
 
 		if (c->ifi4) {
 			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
@@ -294,8 +300,6 @@ void pasta_ns_conf(struct ctx *c)
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
-	} else {
-		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);
 	}
 
 	proto_update_l2_buf(c->mac_guest, NULL, NULL);
-- 
2.41.0



* [PATCH 02/17] netlink: Split nl_addr() into separate operation functions
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
  2023-07-24  6:09 ` [PATCH 01/17] netlink: Split up functionality of nl_link() David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:47   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 03/17] netlink: Split nl_route() " David Gibson
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

nl_addr() can perform three quite different operations based on the 'op'
parameter, each of which uses a different subset of the parameters.  Split
them up into a function for each operation.  This does use more lines of
code, but the overlap wasn't that great, and the separated logic is much
easier to follow.

It's also clearer in the callers what we expect each netlink operation to
do, and what information it uses.
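
For reference, the core of the "get" operation is a walk over the
attributes of each RTM_NEWADDR message in the dump.  Below is a standalone
sketch of that walk for a single message.  The helper name is made up, and
unlike the code in this patch it uses IFA_PAYLOAD() and checks the
attribute length against the caller's buffer.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: copy the IFA_ADDRESS attribute of one RTM_NEWADDR
 * message for interface @ifi into @addr, and its prefix into @prefix_len.
 * Return: 0 on success, -1 if the message doesn't match
 */
static int addr_from_msg(struct nlmsghdr *nh, unsigned int ifi,
			 void *addr, size_t addr_len, int *prefix_len)
{
	struct ifaddrmsg *ifa = NLMSG_DATA(nh);
	struct rtattr *rta;
	int na;

	if (nh->nlmsg_type != RTM_NEWADDR ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifa)))
		return -1;

	if (ifa->ifa_index != ifi)
		return -1;

	for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		if (rta->rta_type != IFA_ADDRESS)
			continue;

		/* Don't overrun @addr on an unexpected address size */
		if (RTA_PAYLOAD(rta) != addr_len)
			return -1;

		memcpy(addr, RTA_DATA(rta), addr_len);
		*prefix_len = ifa->ifa_prefixlen;
		return 0;
	}

	return -1;
}
```

A full "get" would call something like this for every message of the dump,
stopping at NLMSG_DONE.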

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    |  12 ++-
 netlink.c | 232 ++++++++++++++++++++++++++++++++----------------------
 netlink.h |   6 +-
 pasta.c   |  17 ++--
 4 files changed, 159 insertions(+), 108 deletions(-)

diff --git a/conf.c b/conf.c
index 2ff9e2a..2057028 100644
--- a/conf.c
+++ b/conf.c
@@ -650,10 +650,8 @@ static unsigned int conf_ip4(unsigned int ifi,
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
 		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
-		nl_addr(NL_GET, ifi, 0, AF_INET,
-			&ip4->addr, &ip4->prefix_len, NULL);
-	}
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
+		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
 
 	if (!ip4->prefix_len) {
 		in_addr_t addr = ntohl(ip4->addr.s_addr);
@@ -703,9 +701,9 @@ static unsigned int conf_ip6(unsigned int ifi,
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
 		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
 
-	nl_addr(NL_GET, ifi, 0, AF_INET6,
-		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
-		&prefix_len, &ip6->addr_ll);
+	nl_addr_get(ifi, AF_INET6,
+		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
+		    &prefix_len, &ip6->addr_ll);
 
 	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
 	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
diff --git a/netlink.c b/netlink.c
index 4b1f75e..269d738 100644
--- a/netlink.c
+++ b/netlink.c
@@ -334,17 +334,76 @@ next:
 }
 
 /**
- * nl_addr() - Get/set/copy IP addresses for given interface and address family
- * @op:		Requested operation
+ * nl_addr_get() - Get IP address for given interface and address family
  * @ifi:	Interface index in outer network namespace
- * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
  * @af:		Address family
- * @addr:	Global address to fill on NL_GET, to set on NL_SET
- * @prefix_len:	Mask or prefix length, set or fetched (for IPv4)
- * @addr_l:	Link-scoped address to fill on NL_GET
+ * @addr:	Global address to fill
+ * @prefix_len:	Mask or prefix length, to fill (for IPv4)
+ * @addr_l:	Link-scoped address to fill (for IPv6)
+ */
+void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
+		 int *prefix_len, void *addr_l)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifaddrmsg ifa;
+	} req = {
+		.nlh.nlmsg_type    = RTM_GETADDR,
+		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		.nlh.nlmsg_len     = sizeof(req),
+		.nlh.nlmsg_seq     = nl_seq++,
+
+		.ifa.ifa_family    = af,
+		.ifa.ifa_index     = ifi,
+	};
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	ssize_t n;
+
+	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
+		return;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+	     nh = NLMSG_NEXT(nh, n)) {
+		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
+
+		if (nh->nlmsg_type != RTM_NEWADDR)
+			continue;
+
+		if (ifa->ifa_index != ifi)
+			continue;
+
+		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
+		     rta = RTA_NEXT(rta, na)) {
+			if (rta->rta_type != IFA_ADDRESS)
+				continue;
+
+			if (af == AF_INET) {
+				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
+				*prefix_len = ifa->ifa_prefixlen;
+			} else if (af == AF_INET6 && addr &&
+				   ifa->ifa_scope == RT_SCOPE_UNIVERSE) {
+				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
+			}
+
+			if (addr_l &&
+			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK)
+				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
+		}
+	}
+}
+
+/**
+ * nl_addr_set() - Set IP addresses for given interface and address family
+ * @ifi:	Interface index
+ * @af:		Address family
+ * @addr:	Global address to set
+ * @prefix_len:	Mask or prefix length to set
  */
-void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
-	     sa_family_t af, void *addr, int *prefix_len, void *addr_l)
+void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -364,125 +423,112 @@ void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 			} a6;
 		} set;
 	} req = {
-		.nlh.nlmsg_type    = op == NL_SET ? RTM_NEWADDR : RTM_GETADDR,
-		.nlh.nlmsg_flags   = NLM_F_REQUEST,
+		.nlh.nlmsg_type    = RTM_NEWADDR,
+		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK |
+				     NLM_F_CREATE | NLM_F_EXCL,
 		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
 		.nlh.nlmsg_seq     = nl_seq++,
 
 		.ifa.ifa_family    = af,
-		.ifa.ifa_index     = op == NL_SET ? ifi_ns : ifi,
-		.ifa.ifa_prefixlen = op == NL_SET ? *prefix_len : 0,
+		.ifa.ifa_index     = ifi,
+		.ifa.ifa_prefixlen = prefix_len,
+		.ifa.ifa_scope	   = RT_SCOPE_UNIVERSE,
 	};
-	ssize_t n, nlmsgs_size;
-	struct ifaddrmsg *ifa;
-	struct nlmsghdr *nh;
-	struct rtattr *rta;
 	char buf[NLBUFSIZ];
-	size_t na;
 
-	if (op == NL_SET) {
-		if (af == AF_INET6) {
-			size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
+	if (af == AF_INET6) {
+		size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
 
-			/* By default, strictly speaking, it's duplicated */
-			req.ifa.ifa_flags = IFA_F_NODAD;
+		/* By default, strictly speaking, it's duplicated */
+		req.ifa.ifa_flags = IFA_F_NODAD;
 
-			req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
-				+ sizeof(req.set.a6);
+		req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
+			+ sizeof(req.set.a6);
 
-			memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
-			req.set.a6.rta_l.rta_len = rta_len;
-			req.set.a4.rta_l.rta_type = IFA_LOCAL;
-			memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
-			req.set.a6.rta_a.rta_len = rta_len;
-			req.set.a6.rta_a.rta_type = IFA_ADDRESS;
-		} else {
-			size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
-
-			req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
-				+ sizeof(req.set.a4);
+		memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
+		req.set.a6.rta_l.rta_len = rta_len;
+		req.set.a6.rta_l.rta_type = IFA_LOCAL;
+		memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
+		req.set.a6.rta_a.rta_len = rta_len;
+		req.set.a6.rta_a.rta_type = IFA_ADDRESS;
+	} else {
+		size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
 
-			req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
-			req.set.a4.rta_l.rta_len = rta_len;
-			req.set.a4.rta_l.rta_type = IFA_LOCAL;
-			req.set.a4.rta_a.rta_len = rta_len;
-			req.set.a4.rta_a.rta_type = IFA_ADDRESS;
-		}
+		req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
+			+ sizeof(req.set.a4);
 
-		req.ifa.ifa_scope = RT_SCOPE_UNIVERSE;
-		req.nlh.nlmsg_flags |= NLM_F_CREATE | NLM_F_ACK | NLM_F_EXCL;
-	} else {
-		req.nlh.nlmsg_flags |= NLM_F_DUMP;
+		req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
+		req.set.a4.rta_l.rta_len = rta_len;
+		req.set.a4.rta_l.rta_type = IFA_LOCAL;
+		req.set.a4.rta_a.rta_len = rta_len;
+		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
 	}
 
-	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
-		return;
+	nl_req(1, buf, &req, req.nlh.nlmsg_len);
+}
 
-	if (op == NL_SET)
+/**
+ * nl_addr_dup() - Copy IP addresses for given interface and address family
+ * @ifi:	Interface index in outer network namespace
+ * @ifi_ns:	Interface index in target namespace
+ * @af:		Address family
+ */
+void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifaddrmsg ifa;
+	} req = {
+		.nlh.nlmsg_type    = RTM_GETADDR,
+		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_DUMP,
+		.nlh.nlmsg_len     = sizeof(req),
+		.nlh.nlmsg_seq     = nl_seq++,
+
+		.ifa.ifa_family    = af,
+		.ifa.ifa_index     = ifi,
+		.ifa.ifa_prefixlen = 0,
+	};
+	char buf[NLBUFSIZ], resp[NLBUFSIZ];
+	ssize_t n, nlmsgs_size;
+	struct nlmsghdr *nh;
+
+	if ((n = nl_req(0, buf, &req, sizeof(req))) < 0)
 		return;
 
-	nh = (struct nlmsghdr *)buf;
 	nlmsgs_size = n;
 
-	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+	     nh = NLMSG_NEXT(nh, n)) {
+		struct ifaddrmsg *ifa;
+		struct rtattr *rta;
+		size_t na;
+
 		if (nh->nlmsg_type != RTM_NEWADDR)
-			goto next;
+			continue;
 
-		if (op == NL_DUP) {
-			nh->nlmsg_seq = nl_seq++;
-			nh->nlmsg_pid = 0;
-			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
-			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
-					   NLM_F_CREATE;
-		}
+		nh->nlmsg_seq = nl_seq++;
+		nh->nlmsg_pid = 0;
+		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
+		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
 
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 
-		if (op == NL_DUP && (ifa->ifa_scope == RT_SCOPE_LINK ||
-				     ifa->ifa_index != ifi)) {
+		if (ifa->ifa_scope == RT_SCOPE_LINK || ifa->ifa_index != ifi) {
 			ifa->ifa_family = AF_UNSPEC;
-			goto next;
+			continue;
 		}
 
-		if (ifa->ifa_index != ifi)
-			goto next;
-
-		if (op == NL_DUP)
-			ifa->ifa_index = ifi_ns;
+		ifa->ifa_index = ifi_ns;
 
 		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			if (op == NL_DUP && rta->rta_type == IFA_LABEL)
+			if (rta->rta_type == IFA_LABEL)
 				rta->rta_type = IFA_UNSPEC;
-
-			if (op == NL_DUP || rta->rta_type != IFA_ADDRESS)
-				continue;
-
-			if (af == AF_INET && addr && !*(uint32_t *)addr) {
-				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
-				*prefix_len = ifa->ifa_prefixlen;
-			} else if (af == AF_INET6 && addr &&
-				 ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
-				 IN6_IS_ADDR_UNSPECIFIED(addr)) {
-				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
-			}
-
-			if (addr_l &&
-			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
-			    IN6_IS_ADDR_UNSPECIFIED(addr_l))
-				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
 		}
-next:
-		if (nh->nlmsg_type == NLMSG_DONE)
-			break;
 	}
 
-	if (op == NL_DUP) {
-		char resp[NLBUFSIZ];
-
-		nh = (struct nlmsghdr *)buf;
-		nl_req(1, resp, nh, nlmsgs_size);
-	}
+	nl_req(1, resp, buf, nlmsgs_size);
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index 980ac44..5ac972d 100644
--- a/netlink.h
+++ b/netlink.h
@@ -16,8 +16,10 @@ void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(sa_family_t af);
 void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 	      sa_family_t af, void *gw);
-void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
-	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
+void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
+		 int *prefix_len, void *addr_l);
+void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len);
+void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
 void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
 void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
 void nl_link_up(int ns, unsigned int ifi, int mtu);
diff --git a/pasta.c b/pasta.c
index 3b5537d..1a8f09c 100644
--- a/pasta.c
+++ b/pasta.c
@@ -282,21 +282,26 @@ void pasta_ns_conf(struct ctx *c)
 
 	if (c->pasta_conf_ns) {
 		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
-		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
 
 		nl_link_up(1, c->pasta_ifi, c->mtu);
 
 		if (c->ifi4) {
-			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
-				&c->ip4.addr, &c->ip4.prefix_len, NULL);
+			if (c->no_copy_addrs)
+				nl_addr_set(c->pasta_ifi, AF_INET,
+					    &c->ip4.addr, c->ip4.prefix_len);
+			else
+				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET);
+
 			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
-			int prefix_len = 64;
-			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
-				&c->ip6.addr, &prefix_len, NULL);
+			if (c->no_copy_addrs)
+				nl_addr_set(c->pasta_ifi, AF_INET6, &c->ip6.addr, 64);
+			else
+				nl_addr_dup(c->ifi6, c->pasta_ifi, AF_INET6);
+
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
-- 
2.41.0



* [PATCH 03/17] netlink: Split nl_route() into separate operation functions
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
  2023-07-24  6:09 ` [PATCH 01/17] netlink: Split up functionality of nl_link() David Gibson
  2023-07-24  6:09 ` [PATCH 02/17] netlink: Split nl_addr() into separate operation functions David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:47   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 04/17] netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t David Gibson
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

nl_route() can perform 3 quite different operations based on the 'op'
parameter.  Split this into separate functions for each one.  This requires
more lines of code, but makes the internal logic of each operation much
easier to follow.
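
As a rough standalone sketch of what the "get default route" operation
looks for in each message of the dump: a route with a zero-length
destination prefix that carries an RTA_GATEWAY attribute.  The helper
name below is made up for illustration, and the length check against the
caller's buffer is an addition of this sketch.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: if @nh describes a default route with a gateway,
 * copy the gateway address into @gw.
 * Return: 0 on success, -1 if the message is not a usable default route
 */
static int default_gw_from_msg(struct nlmsghdr *nh, void *gw, size_t gw_len)
{
	struct rtmsg *rtm = NLMSG_DATA(nh);
	struct rtattr *rta;
	int na;

	if (nh->nlmsg_type != RTM_NEWROUTE ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*rtm)))
		return -1;

	if (rtm->rtm_dst_len)	/* Non-zero prefix: not a default route */
		return -1;

	for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		if (rta->rta_type != RTA_GATEWAY)
			continue;

		/* Don't overrun @gw on an unexpected address size */
		if (RTA_PAYLOAD(rta) != gw_len)
			return -1;

		memcpy(gw, RTA_DATA(rta), gw_len);
		return 0;
	}

	return -1;
}
```

The caller would pass 4 or 16 as @gw_len depending on the address family.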

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    |   4 +-
 netlink.c | 238 ++++++++++++++++++++++++++++++++++--------------------
 netlink.h |  11 +--
 pasta.c   |  16 ++--
 4 files changed, 164 insertions(+), 105 deletions(-)

diff --git a/conf.c b/conf.c
index 2057028..66958d4 100644
--- a/conf.c
+++ b/conf.c
@@ -648,7 +648,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
-		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
+		nl_route_get_def(ifi, AF_INET, &ip4->gw);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
@@ -699,7 +699,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
-		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
+		nl_route_get_def(ifi, AF_INET6, &ip6->gw);
 
 	nl_addr_get(ifi, AF_INET6,
 		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
diff --git a/netlink.c b/netlink.c
index 269d738..346eb3a 100644
--- a/netlink.c
+++ b/netlink.c
@@ -185,15 +185,71 @@ unsigned int nl_get_ext_if(sa_family_t af)
 }
 
 /**
- * nl_route() - Get/set/copy routes for given interface and address family
- * @op:		Requested operation
- * @ifi:	Interface index in outer network namespace
- * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
+ * nl_route_get_def() - Get default route for given interface and address family
+ * @ifi:	Interface index
+ * @af:		Address family
+ * @gw:		Default gateway to fill
+ */
+void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct rtmsg rtm;
+		struct rtattr rta;
+		unsigned int ifi;
+	} req = {
+		.nlh.nlmsg_type	  = RTM_GETROUTE,
+		.nlh.nlmsg_len	  = sizeof(req),
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
+		.nlh.nlmsg_seq	  = nl_seq++,
+
+		.rtm.rtm_family	  = af,
+		.rtm.rtm_table	  = RT_TABLE_MAIN,
+		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
+		.rtm.rtm_type	  = RTN_UNICAST,
+
+		.rta.rta_type	  = RTA_OIF,
+		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
+		.ifi		  = ifi,
+	};
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	ssize_t n;
+
+	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
+		return;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+	     nh = NLMSG_NEXT(nh, n)) {
+		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
+
+		if (nh->nlmsg_type != RTM_NEWROUTE)
+			continue;
+
+		if (rtm->rtm_dst_len)
+			continue;
+
+		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
+		     rta = RTA_NEXT(rta, na)) {
+			if (rta->rta_type != RTA_GATEWAY)
+				continue;
+
+			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
+			return;
+		}
+	}
+}
+
+/**
+ * nl_route_set_def() - Set default route for given interface and address family
+ * @ifi:	Interface index in target namespace
  * @af:		Address family
- * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
+ * @gw:		Default gateway to set
  */
-void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
-	      sa_family_t af, void *gw)
+void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -215,122 +271,126 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 			} r4;
 		} set;
 	} req = {
-		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
-		.nlh.nlmsg_flags  = NLM_F_REQUEST,
+		.nlh.nlmsg_type	  = RTM_NEWROUTE,
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK |
+				    NLM_F_CREATE | NLM_F_EXCL,
 		.nlh.nlmsg_seq	  = nl_seq++,
 
 		.rtm.rtm_family	  = af,
 		.rtm.rtm_table	  = RT_TABLE_MAIN,
 		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
 		.rtm.rtm_type	  = RTN_UNICAST,
+		.rtm.rtm_protocol = RTPROT_BOOT,
 
 		.rta.rta_type	  = RTA_OIF,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
-		.ifi		  = op == NL_SET ? ifi_ns : ifi,
+		.ifi		  = ifi,
 	};
-	unsigned dup_routes = 0;
-	ssize_t n, nlmsgs_size;
-	struct nlmsghdr *nh;
-	struct rtattr *rta;
 	char buf[NLBUFSIZ];
-	struct rtmsg *rtm;
-	size_t na;
-
-	if (op == NL_SET) {
-		if (af == AF_INET6) {
-			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
 
-			req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
-				+ sizeof(req.set.r6);
+	if (af == AF_INET6) {
+		size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
 
-			req.set.r6.rta_dst.rta_type = RTA_DST;
-			req.set.r6.rta_dst.rta_len = rta_len;
+		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
+			+ sizeof(req.set.r6);
 
-			memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
-			req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
-			req.set.r6.rta_gw.rta_len = rta_len;
-		} else {
-			size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
+		req.set.r6.rta_dst.rta_type = RTA_DST;
+		req.set.r6.rta_dst.rta_len = rta_len;
 
-			req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
-				+ sizeof(req.set.r4);
+		memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
+		req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
+		req.set.r6.rta_gw.rta_len = rta_len;
+	} else {
+		size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
 
-			req.set.r4.rta_dst.rta_type = RTA_DST;
-			req.set.r4.rta_dst.rta_len = rta_len;
+		req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
+			+ sizeof(req.set.r4);
 
-			req.set.r4.a = *(uint32_t *)gw;
-			req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
-			req.set.r4.rta_gw.rta_len = rta_len;
-		}
+		req.set.r4.rta_dst.rta_type = RTA_DST;
+		req.set.r4.rta_dst.rta_len = rta_len;
 
-		req.rtm.rtm_protocol = RTPROT_BOOT;
-		req.nlh.nlmsg_flags |= NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
-	} else {
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6);
-		req.nlh.nlmsg_flags |= NLM_F_DUMP;
+		req.set.r4.a = *(uint32_t *)gw;
+		req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
+		req.set.r4.rta_gw.rta_len = rta_len;
 	}
 
-	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
-		return;
+	nl_req(1, buf, &req, req.nlh.nlmsg_len);
+}
 
-	if (op == NL_SET)
+/**
+ * nl_route_dup() - Copy routes for given interface and address family
+ * @ifi:	Interface index in outer network namespace
+ * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
+ * @af:		Address family
+ */
+void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct rtmsg rtm;
+		struct rtattr rta;
+		unsigned int ifi;
+	} req = {
+		.nlh.nlmsg_type	  = RTM_GETROUTE,
+		.nlh.nlmsg_len	  = sizeof(req),
+		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
+		.nlh.nlmsg_seq	  = nl_seq++,
+
+		.rtm.rtm_family	  = af,
+		.rtm.rtm_table	  = RT_TABLE_MAIN,
+		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
+		.rtm.rtm_type	  = RTN_UNICAST,
+
+		.rta.rta_type	  = RTA_OIF,
+		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
+		.ifi		  = ifi,
+	};
+	char buf[NLBUFSIZ], resp[NLBUFSIZ];
+	unsigned dup_routes = 0;
+	ssize_t n, nlmsgs_size;
+	struct nlmsghdr *nh;
+	unsigned i;
+
+	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
 		return;
 
-	nh = (struct nlmsghdr *)buf;
 	nlmsgs_size = n;
 
-	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
-		if (nh->nlmsg_type != RTM_NEWROUTE)
-			goto next;
-
-		if (op == NL_DUP) {
-			nh->nlmsg_seq = nl_seq++;
-			nh->nlmsg_pid = 0;
-			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
-			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
-					   NLM_F_CREATE;
-			dup_routes++;
-		}
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+	     nh = NLMSG_NEXT(nh, n)) {
+		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
 
-		rtm = (struct rtmsg *)NLMSG_DATA(nh);
-		if (op == NL_GET && rtm->rtm_dst_len)
+		if (nh->nlmsg_type != RTM_NEWROUTE)
 			continue;
 
+		nh->nlmsg_seq = nl_seq++;
+		nh->nlmsg_pid = 0;
+		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
+		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
+			NLM_F_CREATE;
+		dup_routes++;
+
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			if (op == NL_GET) {
-				if (rta->rta_type != RTA_GATEWAY)
-					continue;
-
-				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
-				return;
-			}
-
-			if (op == NL_DUP && rta->rta_type == RTA_OIF)
+			if (rta->rta_type == RTA_OIF)
 				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
 		}
-
-next:
-		if (nh->nlmsg_type == NLMSG_DONE)
-			break;
 	}
 
-	if (op == NL_DUP) {
-		char resp[NLBUFSIZ];
-		unsigned i;
-
-		nh = (struct nlmsghdr *)buf;
-		/* Routes might have dependencies between each other, and the
-		 * kernel processes RTM_NEWROUTE messages sequentially. For n
-		 * valid routes, we might need to send up to n requests to get
-		 * all of them inserted. Routes that have been already inserted
-		 * won't cause the whole request to fail, so we can simply
-		 * repeat the whole request. This approach avoids the need to
-		 * calculate dependencies: let the kernel do that.
-		 */
-		for (i = 0; i < dup_routes; i++)
-			nl_req(1, resp, nh, nlmsgs_size);
-	}
+	nh = (struct nlmsghdr *)buf;
+	/* Routes might have dependencies between each other, and the
+	 * kernel processes RTM_NEWROUTE messages sequentially. For n
+	 * valid routes, we might need to send up to n requests to get
+	 * all of them inserted. Routes that have been already
+	 * inserted won't cause the whole request to fail, so we can
+	 * simply repeat the whole request. This approach avoids the
+	 * need to calculate dependencies: let the kernel do that.
+	 */
+	for (i = 0; i < dup_routes; i++)
+		nl_req(1, resp, nh, nlmsgs_size);
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index 5ac972d..36bbf9f 100644
--- a/netlink.h
+++ b/netlink.h
@@ -6,16 +6,11 @@
 #ifndef NETLINK_H
 #define NETLINK_H
 
-enum nl_op {
-	NL_GET,
-	NL_SET,
-	NL_DUP,
-};
-
 void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(sa_family_t af);
-void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
-	      sa_family_t af, void *gw);
+void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw);
+void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw);
+void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
 void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
 		 int *prefix_len, void *addr_l);
 void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len);
diff --git a/pasta.c b/pasta.c
index 1a8f09c..14ecc71 100644
--- a/pasta.c
+++ b/pasta.c
@@ -281,8 +281,6 @@ void pasta_ns_conf(struct ctx *c)
 		nl_link_set_mac(1, c->pasta_ifi, c->mac_guest);
 
 	if (c->pasta_conf_ns) {
-		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
-
 		nl_link_up(1, c->pasta_ifi, c->mtu);
 
 		if (c->ifi4) {
@@ -292,8 +290,11 @@ void pasta_ns_conf(struct ctx *c)
 			else
 				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET);
 
-			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
-				 &c->ip4.gw);
+			if (c->no_copy_routes)
+				nl_route_set_def(c->pasta_ifi, AF_INET,
+						 &c->ip4.gw);
+			else
+				nl_route_dup(c->ifi4, c->pasta_ifi, AF_INET);
 		}
 
 		if (c->ifi6) {
@@ -302,8 +303,11 @@ void pasta_ns_conf(struct ctx *c)
 			else
 				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET6);
 
-			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
-				 &c->ip6.gw);
+			if (c->no_copy_routes)
+				nl_route_set_def(c->pasta_ifi, AF_INET6,
+						 &c->ip6.gw);
+			else
+				nl_route_dup(c->ifi6, c->pasta_ifi, AF_INET6);
 		}
 	}
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 04/17] netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (2 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 03/17] netlink: Split nl_route() " David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 05/17] netlink: Explicitly pass netlink sockets to operations David Gibson
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

This improves consistency with IPv6 and makes it harder to misuse these as
some other sort of value.
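
As a rough illustration of the idea (not code from this patch), the sketch
below stores an IPv4 gateway as a struct in_addr and copies it from an untyped
pointer with memcpy(), rather than dereferencing a cast to uint32_t *.  The
names gw4_attr and gw4_fill() are hypothetical, and the size check assumes the
usual Linux layout where struct in_addr is exactly four bytes.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the IPv4 part of a route request */
struct gw4_attr {
	struct in_addr d;	/* Destination, 0.0.0.0 for a default route */
	struct in_addr a;	/* Gateway address */
};

/**
 * gw4_fill() - Fill IPv4 default route attributes (illustrative only)
 * @attr:	Attributes to fill
 * @gw:		Gateway address, pointing to a struct in_addr
 */
static void gw4_fill(struct gw4_attr *attr, const void *gw)
{
	/* Assumption: same size on the wire as the old uint32_t field */
	assert(sizeof(attr->a) == sizeof(uint32_t));

	memset(attr, 0, sizeof(*attr));
	/* memcpy() assumes nothing about alignment or aliasing of @gw */
	memcpy(&attr->a, gw, sizeof(attr->a));
}
```

With the field typed as struct in_addr, assigning a plain integer to it no
longer compiles, which is the kind of misuse the change is meant to prevent.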

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/netlink.c b/netlink.c
index 346eb3a..75d5988 100644
--- a/netlink.c
+++ b/netlink.c
@@ -265,9 +265,9 @@ void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
 			} r6;
 			struct {
 				struct rtattr rta_dst;
-				uint32_t d;
+				struct in_addr d;
 				struct rtattr rta_gw;
-				uint32_t a;
+				struct in_addr a;
 			} r4;
 		} set;
 	} req = {
@@ -309,7 +309,7 @@ void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
 		req.set.r4.rta_dst.rta_type = RTA_DST;
 		req.set.r4.rta_dst.rta_len = rta_len;
 
-		req.set.r4.a = *(uint32_t *)gw;
+		memcpy(&req.set.r4.a, gw, sizeof(req.set.r4.a));
 		req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
 		req.set.r4.rta_gw.rta_len = rta_len;
 	}
@@ -471,9 +471,9 @@ void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
 		union {
 			struct {
 				struct rtattr rta_l;
-				uint32_t l;
+				struct in_addr l;
 				struct rtattr rta_a;
-				uint32_t a;
+				struct in_addr a;
 			} a4;
 			struct {
 				struct rtattr rta_l;
@@ -517,7 +517,7 @@ void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
 		req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
 			+ sizeof(req.set.a4);
 
-		req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
+		memcpy(&req.set.a4.l, addr, sizeof(req.set.a4.l));
 		req.set.a4.rta_l.rta_len = rta_len;
 		req.set.a4.rta_l.rta_type = IFA_LOCAL;
 		req.set.a4.rta_a.rta_len = rta_len;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 05/17] netlink: Explicitly pass netlink sockets to operations
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (3 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 04/17] netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 06/17] netlink: Make nl_*_dup() use a separate datagram for each request David Gibson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

All the netlink operations currently implicitly use one of the two global
netlink sockets, sometimes depending on an 'ns' parameter.  Change them
all to explicitly take the socket to use (or two sockets to use in the case
of the *_dup() functions).  As well as making these functions strictly more
general, it makes the callers easier to follow because we're passing a
socket variable with a name rather than an unexplained '0' or '1' for the
ns parameter.
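
A minimal sketch of the calling convention, assuming a hypothetical helper
demo_getlink_send() that is not part of this patch: the caller names the
socket to use, so the helper never has to pick between globals based on a
flag.  The request layout and the seq parameter are simplifications for
illustration only.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout, for illustration only */
struct demo_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/**
 * demo_getlink_send() - Send RTM_GETLINK on a caller-supplied socket
 * @s:		Socket to send on, chosen by the caller
 * @ifi:	Interface index
 * @seq:	Sequence number for the request
 *
 * Return: number of bytes sent, -1 on failure
 */
static ssize_t demo_getlink_send(int s, unsigned int ifi, unsigned int seq)
{
	struct demo_req req;

	assert(s >= 0);		/* An unset socket is a caller bug */

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len   = sizeof(req);
	req.nlh.nlmsg_type  = RTM_GETLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST;
	req.nlh.nlmsg_seq   = seq;
	req.ifm.ifi_family  = AF_UNSPEC;
	req.ifm.ifi_index   = (int)ifi;

	return send(s, &req, sizeof(req), 0);
}
```

A call such as demo_getlink_send(nl_sock_ns, ifi, seq) then states which
namespace is meant at the call site, where a bare 1 would not.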

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    | 15 ++++-----
 netlink.c | 94 ++++++++++++++++++++++++++++++++-----------------------
 netlink.h | 28 ++++++++++-------
 pasta.c   | 35 ++++++++++++---------
 4 files changed, 100 insertions(+), 72 deletions(-)

diff --git a/conf.c b/conf.c
index 66958d4..2e6e03f 100644
--- a/conf.c
+++ b/conf.c
@@ -640,7 +640,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 			     struct ip4_ctx *ip4, unsigned char *mac)
 {
 	if (!ifi)
-		ifi = nl_get_ext_if(AF_INET);
+		ifi = nl_get_ext_if(nl_sock, AF_INET);
 
 	if (!ifi) {
 		warn("No external routable interface for IPv4");
@@ -648,10 +648,11 @@ static unsigned int conf_ip4(unsigned int ifi,
 	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
-		nl_route_get_def(ifi, AF_INET, &ip4->gw);
+		nl_route_get_def(nl_sock, ifi, AF_INET, &ip4->gw);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
-		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
+		nl_addr_get(nl_sock, ifi, AF_INET,
+			    &ip4->addr, &ip4->prefix_len, NULL);
 
 	if (!ip4->prefix_len) {
 		in_addr_t addr = ntohl(ip4->addr.s_addr);
@@ -668,7 +669,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
 
 	if (MAC_IS_ZERO(mac))
-		nl_link_get_mac(0, ifi, mac);
+		nl_link_get_mac(nl_sock, ifi, mac);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
 	    MAC_IS_ZERO(mac))
@@ -691,7 +692,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	int prefix_len = 0;
 
 	if (!ifi)
-		ifi = nl_get_ext_if(AF_INET6);
+		ifi = nl_get_ext_if(nl_sock, AF_INET6);
 
 	if (!ifi) {
 		warn("No external routable interface for IPv6");
@@ -699,9 +700,9 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
-		nl_route_get_def(ifi, AF_INET6, &ip6->gw);
+		nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->gw);
 
-	nl_addr_get(ifi, AF_INET6,
+	nl_addr_get(nl_sock, ifi, AF_INET6,
 		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
 		    &prefix_len, &ip6->addr_ll);
 
diff --git a/netlink.c b/netlink.c
index 75d5988..72044cd 100644
--- a/netlink.c
+++ b/netlink.c
@@ -38,8 +38,8 @@
 #define NLBUFSIZ	(8192 * sizeof(struct nlmsghdr)) /* See netlink(7) */
 
 /* Socket in init, in target namespace, sequence (just needs to be monotonic) */
-static int nl_sock	= -1;
-static int nl_sock_ns	= -1;
+int nl_sock	= -1;
+int nl_sock_ns	= -1;
 static int nl_seq;
 
 /**
@@ -98,17 +98,17 @@ fail:
 
 /**
  * nl_req() - Send netlink request and read response
- * @ns:		Use netlink socket in namespace
+ * @s:		Netlink socket
  * @buf:	Buffer for response (at least NLBUFSIZ long)
  * @req:	Request with netlink header
  * @len:	Request length
  *
  * Return: received length on success, negative error code on failure
  */
-static int nl_req(int ns, char *buf, const void *req, ssize_t len)
+static int nl_req(int s, char *buf, const void *req, ssize_t len)
 {
-	int s = ns ? nl_sock_ns : nl_sock, done = 0;
 	char flush[NLBUFSIZ];
+	int done = 0;
 	ssize_t n;
 
 	while (!done && (n = recv(s, flush, sizeof(flush), MSG_DONTWAIT)) > 0) {
@@ -133,12 +133,13 @@ static int nl_req(int ns, char *buf, const void *req, ssize_t len)
 
 /**
  * nl_get_ext_if() - Get interface index supporting IP version being probed
+ * @s:	Netlink socket
  * @af:	Address family (AF_INET or AF_INET6) to look for connectivity
  *      for.
  *
  * Return: interface index, 0 if not found
  */
-unsigned int nl_get_ext_if(sa_family_t af)
+unsigned int nl_get_ext_if(int s, sa_family_t af)
 {
 	struct { struct nlmsghdr nlh; struct rtmsg rtm; } req = {
 		.nlh.nlmsg_type	 = RTM_GETROUTE,
@@ -157,7 +158,7 @@ unsigned int nl_get_ext_if(sa_family_t af)
 	ssize_t n;
 	size_t na;
 
-	if ((n = nl_req(0, buf, &req, sizeof(req))) < 0)
+	if ((n = nl_req(s, buf, &req, sizeof(req))) < 0)
 		return 0;
 
 	nh = (struct nlmsghdr *)buf;
@@ -186,11 +187,12 @@ unsigned int nl_get_ext_if(sa_family_t af)
 
 /**
  * nl_route_get_def() - Get default route for given interface and address family
+ * @s:		Netlink socket
  * @ifi:	Interface index
  * @af:		Address family
  * @gw:		Default gateway to fill on NL_GET
  */
-void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
+void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -216,7 +218,7 @@ void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
+	if ((n = nl_req(s, buf, &req, req.nlh.nlmsg_len)) < 0)
 		return;
 
 	for (nh = (struct nlmsghdr *)buf;
@@ -245,11 +247,12 @@ void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
 
 /**
  * nl_route_set_def() - Set default route for given interface and address family
+ * @s:		Netlink socket
  * @ifi:	Interface index in target namespace
  * @af:		Address family
  * @gw:		Default gateway to set
  */
-void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
+void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -314,16 +317,19 @@ void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
 		req.set.r4.rta_gw.rta_len = rta_len;
 	}
 
-	nl_req(1, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, req.nlh.nlmsg_len);
 }
 
 /**
  * nl_route_dup() - Copy routes for given interface and address family
- * @ifi:	Interface index in outer network namespace
- * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
+ * @s_src:	Netlink socket in source namespace
+ * @ifi_src:	Source interface index
+ * @s_dst:	Netlink socket in destination namespace
+ * @ifi_dst:	Interface index in destination namespace
  * @af:		Address family
  */
-void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
+void nl_route_dup(int s_src, unsigned int ifi_src,
+		  int s_dst, unsigned int ifi_dst, sa_family_t af)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -343,7 +349,7 @@ void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 
 		.rta.rta_type	  = RTA_OIF,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
-		.ifi		  = ifi,
+		.ifi		  = ifi_src,
 	};
 	char buf[NLBUFSIZ], resp[NLBUFSIZ];
 	unsigned dup_routes = 0;
@@ -351,7 +357,7 @@ void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 	struct nlmsghdr *nh;
 	unsigned i;
 
-	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
+	if ((n = nl_req(s_src, buf, &req, req.nlh.nlmsg_len)) < 0)
 		return;
 
 	nlmsgs_size = n;
@@ -376,7 +382,7 @@ void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
 			if (rta->rta_type == RTA_OIF)
-				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
+				*(unsigned int *)RTA_DATA(rta) = ifi_dst;
 		}
 	}
 
@@ -390,19 +396,20 @@ void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 	 * need to calculate dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++)
-		nl_req(1, resp, nh, nlmsgs_size);
+		nl_req(s_dst, resp, nh, nlmsgs_size);
 }
 
 /**
  * nl_addr_get() - Get IP address for given interface and address family
+ * @s:		Netlink socket
  * @ifi:	Interface index in outer network namespace
  * @af:		Address family
  * @addr:	Global address to fill
  * @prefix_len:	Mask or prefix length, to fill (for IPv4)
  * @addr_l:	Link-scoped address to fill (for IPv6)
  */
-void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
-		 int *prefix_len, void *addr_l)
+void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
+		 void *addr, int *prefix_len, void *addr_l)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -420,7 +427,7 @@ void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
+	if ((n = nl_req(s, buf, &req, req.nlh.nlmsg_len)) < 0)
 		return;
 
 	for (nh = (struct nlmsghdr *)buf;
@@ -458,12 +465,14 @@ void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
 
 /**
  * nl_add_set() - Set IP addresses for given interface and address family
+ * @s:		Netlink socket
  * @ifi:	Interface index
  * @af:		Address family
  * @addr:	Global address to set
  * @prefix_len:	Mask or prefix length to set
  */
-void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
+void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
+		 void *addr, int prefix_len)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -524,16 +533,19 @@ void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
 		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
 	}
 
-	nl_req(1, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, req.nlh.nlmsg_len);
 }
 
 /**
  * nl_addr_dup() - Copy IP addresses for given interface and address family
- * @ifi:	Interface index in outer network namespace
- * @ifi_ns:	Interface index in target namespace
+ * @s_src:	Netlink socket in source network namespace
+ * @ifi_src:	Interface index in source network namespace
+ * @s_dst:	Netlink socket in destination network namespace
+ * @ifi_dst:	Interface index in destination namespace
  * @af:		Address family
  */
-void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
+void nl_addr_dup(int s_src, unsigned int ifi_src,
+		 int s_dst, unsigned int ifi_dst, sa_family_t af)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -545,14 +557,14 @@ void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 		.nlh.nlmsg_seq     = nl_seq++,
 
 		.ifa.ifa_family    = af,
-		.ifa.ifa_index     = ifi,
+		.ifa.ifa_index     = ifi_src,
 		.ifa.ifa_prefixlen = 0,
 	};
 	char buf[NLBUFSIZ], resp[NLBUFSIZ];
 	ssize_t n, nlmsgs_size;
 	struct nlmsghdr *nh;
 
-	if ((n = nl_req(0, buf, &req, sizeof(req))) < 0)
+	if ((n = nl_req(s_src, buf, &req, sizeof(req))) < 0)
 		return;
 
 	nlmsgs_size = n;
@@ -574,12 +586,13 @@ void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 
-		if (ifa->ifa_scope == RT_SCOPE_LINK || ifa->ifa_index != ifi) {
+		if (ifa->ifa_scope == RT_SCOPE_LINK ||
+		    ifa->ifa_index != ifi_src) {
 			ifa->ifa_family = AF_UNSPEC;
 			continue;
 		}
 
-		ifa->ifa_index = ifi_ns;
+		ifa->ifa_index = ifi_dst;
 
 		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
@@ -588,16 +601,16 @@ void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
 		}
 	}
 
-	nl_req(1, resp, buf, nlmsgs_size);
+	nl_req(s_dst, resp, buf, nlmsgs_size);
 }
 
 /**
  * nl_link_get_mac() - Get link MAC address
- * @ns:		Use netlink socket in namespace
+ * @s:		Netlink socket
  * @ifi:	Interface index
  * @mac:	Fill with current MAC address
  */
-void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
+void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -614,10 +627,10 @@ void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	n = nl_req(ns, buf, &req, sizeof(req));
+	n = nl_req(s, buf, &req, sizeof(req));
 	if (n < 0)
 		return;
-	
+
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
@@ -642,11 +655,12 @@ void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
 
 /**
  * nl_link_set_mac() - Set link MAC address
+ * @s:		Netlink socket
  * @ns:		Use netlink socket in namespace
  * @ifi:	Interface index
  * @mac:	MAC address to set
  */
-void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
+void nl_link_set_mac(int s, unsigned int ifi, void *mac)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -667,16 +681,16 @@ void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
 
 	memcpy(req.mac, mac, ETH_ALEN);
 
-	nl_req(ns, buf, &req, sizeof(req));
+	nl_req(s, buf, &req, sizeof(req));
 }
 
 /**
  * nl_link_up() - Bring link up
- * @ns:		Use netlink socket in namespace
+ * @s:		Netlink socket
  * @ifi:	Interface index
  * @mtu:	If non-zero, set interface MTU
  */
-void nl_link_up(int ns, unsigned int ifi, int mtu)
+void nl_link_up(int s, unsigned int ifi, int mtu)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -702,5 +716,5 @@ void nl_link_up(int ns, unsigned int ifi, int mtu)
 		/* Shorten request to drop MTU attribute */
 		req.nlh.nlmsg_len = offsetof(struct req_t, rta);
 
-	nl_req(ns, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, req.nlh.nlmsg_len);
 }
diff --git a/netlink.h b/netlink.h
index 36bbf9f..5ca17c6 100644
--- a/netlink.h
+++ b/netlink.h
@@ -6,17 +6,23 @@
 #ifndef NETLINK_H
 #define NETLINK_H
 
+extern int nl_sock;
+extern int nl_sock_ns;
+
 void nl_sock_init(const struct ctx *c, bool ns);
-unsigned int nl_get_ext_if(sa_family_t af);
-void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw);
-void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw);
-void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
-void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
-		 int *prefix_len, void *addr_l);
-void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len);
-void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
-void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
-void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
-void nl_link_up(int ns, unsigned int ifi, int mtu);
+unsigned int nl_get_ext_if(int s, sa_family_t af);
+void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw);
+void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw);
+void nl_route_dup(int s_src, unsigned int ifi_src,
+		  int s_dst, unsigned int ifi_dst, sa_family_t af);
+void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
+		 void *addr, int *prefix_len, void *addr_l);
+void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
+		 void *addr, int prefix_len);
+void nl_addr_dup(int s_src, unsigned int ifi_src,
+		 int s_dst, unsigned int ifi_dst, sa_family_t af);
+void nl_link_get_mac(int s, unsigned int ifi, void *mac);
+void nl_link_set_mac(int s, unsigned int ifi, void *mac);
+void nl_link_up(int s, unsigned int ifi, int mtu);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index 14ecc71..3380475 100644
--- a/pasta.c
+++ b/pasta.c
@@ -272,42 +272,49 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
  */
 void pasta_ns_conf(struct ctx *c)
 {
-	nl_link_up(1, 1 /* lo */, 0);
+	nl_link_up(nl_sock_ns, 1 /* lo */, 0);
 
 	/* Get or set guest MAC */
 	if (MAC_IS_ZERO(c->mac_guest))
-		nl_link_get_mac(1, c->pasta_ifi, c->mac_guest);
+		nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
 	else
-		nl_link_set_mac(1, c->pasta_ifi, c->mac_guest);
+		nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
 
 	if (c->pasta_conf_ns) {
-		nl_link_up(1, c->pasta_ifi, c->mtu);
+		nl_link_up(nl_sock_ns, c->pasta_ifi, c->mtu);
 
 		if (c->ifi4) {
 			if (c->no_copy_addrs)
-				nl_addr_set(c->pasta_ifi, AF_INET, 
+				nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET,
 					    &c->ip4.addr, c->ip4.prefix_len);
 			else
-				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET);
+				nl_addr_dup(nl_sock, c->ifi4,
+					    nl_sock_ns, c->pasta_ifi, AF_INET);
 
 			if (c->no_copy_routes)
-				nl_route_set_def(c->pasta_ifi, AF_INET,
-						 &c->ip4.gw);
+				nl_route_set_def(nl_sock_ns, c->pasta_ifi,
+						 AF_INET, &c->ip4.gw);
 			else
-				nl_route_dup(c->ifi4, c->pasta_ifi, AF_INET);
+				nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
+					     c->pasta_ifi, AF_INET);
 		}
 
 		if (c->ifi6) {
 			if (c->no_copy_addrs)
-				nl_addr_set(c->pasta_ifi, AF_INET6, &c->ip6.addr, 64);
+				nl_addr_set(nl_sock_ns, c->pasta_ifi,
+					    AF_INET6, &c->ip6.addr, 64);
 			else
-				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET6);
+				nl_addr_dup(nl_sock, c->ifi4,
+					    nl_sock_ns, c->pasta_ifi,
+					    AF_INET6);
 
 			if (c->no_copy_routes)
-				nl_route_set_def(c->pasta_ifi, AF_INET6,
-						 &c->ip6.gw);
+				nl_route_set_def(nl_sock_ns, c->pasta_ifi,
+						 AF_INET6, &c->ip6.gw);
 			else
-				nl_route_dup(c->ifi6, c->pasta_ifi, AF_INET6);
+				nl_route_dup(nl_sock, c->ifi6,
+					     nl_sock_ns, c->pasta_ifi,
+					     AF_INET6);
 		}
 	}
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 06/17] netlink: Make nl_*_dup() use a separate datagram for each request
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (4 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 05/17] netlink: Explicitly pass netlink sockets to operations David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 07/17] netlink: Start sequence number from 1 instead of 0 David Gibson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

nl_req() is designed to handle a single netlink request message: it only
receives a single reply datagram for the request, and, before sending, it
only waits for a single NLMSG_DONE or NLMSG_ERROR message to flush leftover
replies from previous requests.

However, in both nl_addr_dup() and nl_route_dup() we can send multiple
request messages as a single datagram, with a single nl_req() call.
This can easily mean that the replies nl_req() collects get out of
sync with requests.  We only get away with this because after we call
these functions we don't make any netlink calls where we need to parse
the replies.

This is fragile, so alter nl_*_dup() to make an nl_req() call for each
address or route it is adding in the target namespace.

For nl_route_dup() this fixes an additional minor problem: because
routes can have dependencies, some of the route add requests might
fail on the first attempt, so we need to repeat the requests a number
of times.  When we did that, we weren't updating the sequence number
on each new attempt.  This works, but not updating the sequence number
for each new request isn't ideal.  Now that we're making the requests
one at a time, it's easier to make sure we update the sequence number
each time.
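
As a rough illustration of the per-message approach described above, the
following standalone sketch walks a buffer of netlink messages and hands
each matching one to a sender individually, giving it a fresh sequence
number.  The names put_msg(), fake_send_one(), resend_each() and struct
sent_log are hypothetical and not part of passt; fake_send_one() merely
records what a real nl_req() call would have sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical log standing in for the kernel side of nl_req() */
struct sent_log {
	unsigned count;
	__u32 seq[8];
	__u32 len[8];
};

/* Hypothetical helper: append an empty message of @type at offset @off */
static size_t put_msg(char *buf, size_t off, __u16 type, size_t payload)
{
	struct nlmsghdr *nh = (struct nlmsghdr *)(buf + off);

	memset(nh, 0, NLMSG_SPACE(payload));
	nh->nlmsg_type = type;
	nh->nlmsg_len = NLMSG_LENGTH(payload);

	return off + NLMSG_ALIGN(nh->nlmsg_len);
}

/* Stand-in for one nl_req() call: one message, one datagram */
static void fake_send_one(struct sent_log *log, const struct nlmsghdr *nh)
{
	assert(log->count < 8);
	log->seq[log->count] = nh->nlmsg_seq;
	log->len[log->count] = nh->nlmsg_len;
	log->count++;
}

/* Send each message of @type separately, with its own sequence number */
static unsigned resend_each(char *buf, ssize_t len, __u16 type,
			    __u32 *seq, struct sent_log *log)
{
	struct nlmsghdr *nh;
	unsigned sent = 0;

	for (nh = (struct nlmsghdr *)buf;
	     NLMSG_OK(nh, len) && nh->nlmsg_type != NLMSG_DONE;
	     nh = NLMSG_NEXT(nh, len)) {
		if (nh->nlmsg_type != type)
			continue;

		nh->nlmsg_seq = (*seq)++;	/* new number per request */
		fake_send_one(log, nh);		/* length is this message only */
		sent++;
	}

	return sent;
}
```

The point of the sketch is only that the length passed along is
nh->nlmsg_len for the current message, not the size of the whole dump.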

Link: https://bugs.passt.top/show_bug.cgi?id=67

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 50 +++++++++++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/netlink.c b/netlink.c
index 72044cd..bd76098 100644
--- a/netlink.c
+++ b/netlink.c
@@ -351,18 +351,16 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
-	char buf[NLBUFSIZ], resp[NLBUFSIZ];
 	unsigned dup_routes = 0;
 	ssize_t n, nlmsgs_size;
 	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
 	unsigned i;
 
-	if ((n = nl_req(s_src, buf, &req, req.nlh.nlmsg_len)) < 0)
+	if ((nlmsgs_size = nl_req(s_src, buf, &req, req.nlh.nlmsg_len)) < 0)
 		return;
 
-	nlmsgs_size = n;
-
-	for (nh = (struct nlmsghdr *)buf;
+	for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
@@ -372,7 +370,6 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		if (nh->nlmsg_type != RTM_NEWROUTE)
 			continue;
 
-		nh->nlmsg_seq = nl_seq++;
 		nh->nlmsg_pid = 0;
 		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
 		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
@@ -386,17 +383,27 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	nh = (struct nlmsghdr *)buf;
 	/* Routes might have dependencies between each other, and the
 	 * kernel processes RTM_NEWROUTE messages sequentially. For n
-	 * valid routes, we might need to send up to n requests to get
-	 * all of them inserted. Routes that have been already
-	 * inserted won't cause the whole request to fail, so we can
-	 * simply repeat the whole request. This approach avoids the
-	 * need to calculate dependencies: let the kernel do that.
+	 * routes, we might need to send the requests up to n times to
+	 * get all of them inserted. Routes that have been already
+	 * inserted will return -EEXIST, but we can safely ignore that
+	 * and repeat the requests. This avoids the need to calculate
+	 * dependencies: let the kernel do that.
 	 */
-	for (i = 0; i < dup_routes; i++)
-		nl_req(s_dst, resp, nh, nlmsgs_size);
+	for (i = 0; i < dup_routes; i++) {
+		for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
+		     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
+		     nh = NLMSG_NEXT(nh, n)) {
+			char resp[NLBUFSIZ];
+
+			if (nh->nlmsg_type != RTM_NEWROUTE)
+				continue;
+
+			nh->nlmsg_seq = nl_seq++;
+			nl_req(s_dst, resp, nh, nh->nlmsg_len);
+		}
+	}
 }
 
 /**
@@ -560,19 +567,18 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 		.ifa.ifa_index     = ifi_src,
 		.ifa.ifa_prefixlen = 0,
 	};
-	char buf[NLBUFSIZ], resp[NLBUFSIZ];
-	ssize_t n, nlmsgs_size;
+	char buf[NLBUFSIZ];
 	struct nlmsghdr *nh;
+	ssize_t n;
 
 	if ((n = nl_req(s_src, buf, &req, sizeof(req))) < 0)
 		return;
 
-	nlmsgs_size = n;
-
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
 		struct ifaddrmsg *ifa;
+		char resp[NLBUFSIZ];
 		struct rtattr *rta;
 		size_t na;
 
@@ -587,10 +593,8 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 
 		if (ifa->ifa_scope == RT_SCOPE_LINK ||
-		    ifa->ifa_index != ifi_src) {
-			ifa->ifa_family = AF_UNSPEC;
+		    ifa->ifa_index != ifi_src)
 			continue;
-		}
 
 		ifa->ifa_index = ifi_dst;
 
@@ -599,9 +603,9 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 			if (rta->rta_type == IFA_LABEL)
 				rta->rta_type = IFA_UNSPEC;
 		}
-	}
 
-	nl_req(s_dst, resp, buf, nlmsgs_size);
+		nl_req(s_dst, resp, nh, nh->nlmsg_len);
+	}
 }
 
 /**
-- 
2.41.0



* [PATCH 07/17] netlink: Start sequence number from 1 instead of 0
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (5 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 06/17] netlink: Make nl_*_dup() use a separate datagram for each request David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 08/17] netlink: Treat send() or recv() errors as fatal David Gibson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Netlink messages have a sequence number that's used to match requests to
responses.  It mostly doesn't matter what it is as long as it monotonically
increases, so we just use a global counter which we advance with each
request.

However, we start this counter at 0, so our very first request has sequence
number 0, which is usually reserved for asynchronous messages from the
kernel which aren't in response to a specific request. Since we don't (for
now) use such async messages, this doesn't really matter, but it's not
good practice.  So start the sequence at 1 instead.
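
A minimal sketch of the idea, assuming a counter and a reply check that
are purely illustrative (next_seq() and is_reply_to() are hypothetical
names, not functions in netlink.c): because the counter starts at 1, a
message carrying sequence number 0 can never be mistaken for the reply
to one of our requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical global counter: starts at 1, so 0 is never handed out */
static __u32 seq_counter = 1;

/* Return the sequence number to use for the next request */
static __u32 next_seq(void)
{
	return seq_counter++;
}

/* Does @resp answer the request that was sent with @req_seq? */
static bool is_reply_to(const struct nlmsghdr *resp, __u32 req_seq)
{
	assert(req_seq != 0);	/* requests never use 0 in this sketch */

	return resp->nlmsg_seq == req_seq;
}
```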

Link: https://bugs.passt.top/show_bug.cgi?id=67

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index bd76098..3620fd6 100644
--- a/netlink.c
+++ b/netlink.c
@@ -40,7 +40,7 @@
 /* Socket in init, in target namespace, sequence (just needs to be monotonic) */
 int nl_sock	= -1;
 int nl_sock_ns	= -1;
-static int nl_seq;
+static int nl_seq = 1;
 
 /**
  * nl_sock_init_do() - Set up netlink sockets in init or target namespace
-- 
2.41.0



* [PATCH 08/17] netlink: Treat send() or recv() errors as fatal
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (6 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 07/17] netlink: Start sequence number from 1 instead of 0 David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:47   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 09/17] netlink: Fill in netlink header fields from nl_req() David Gibson
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Errors on send() or recv() calls on a netlink socket don't indicate errors
with the netlink operations we're attempting, but rather that something's
gone wrong with the mechanics of netlink itself.  We don't really expect
this to ever happen, and if it does, it's not clear what we could do to
recover.

So, treat errors from these calls as fatal, rather than returning the error
up the stack.  This makes handling failures in the callers of nl_req()
simpler.
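
The pattern can be sketched outside of passt as follows.  This is only an
illustration under stated assumptions: fatal() is a local stand-in for
passt's die(), req_or_die() is a hypothetical name, and any datagram
socket will do for trying it out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Local stand-in for die(): report and terminate */
static void fatal(const char *what, int err)
{
	if (err)
		fprintf(stderr, "netlink: %s: %s\n", what, strerror(err));
	else
		fprintf(stderr, "netlink: %s\n", what);
	exit(EXIT_FAILURE);
}

/* Send @req, read one reply datagram: either succeeds or terminates */
static ssize_t req_or_die(int s, const void *req, size_t len,
			  void *buf, size_t buflen)
{
	ssize_t n;

	assert(req && buf);

	n = send(s, req, len, 0);
	if (n < 0)
		fatal("Failed to send()", errno);
	else if ((size_t)n < len)
		fatal("Short send", 0);

	n = recv(s, buf, buflen, 0);
	if (n < 0)
		fatal("Failed to recv()", errno);

	return n;	/* never negative: callers need no error branch */
}
```

Since the return value is never negative, callers can use it directly as
the length of the received data.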

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/netlink.c b/netlink.c
index 3620fd6..826c926 100644
--- a/netlink.c
+++ b/netlink.c
@@ -103,9 +103,9 @@ fail:
  * @req:	Request with netlink header
  * @len:	Request length
  *
- * Return: received length on success, negative error code on failure
+ * Return: received length on success, terminates on error
  */
-static int nl_req(int s, char *buf, const void *req, ssize_t len)
+static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
 {
 	char flush[NLBUFSIZ];
 	int done = 0;
@@ -124,11 +124,17 @@ static int nl_req(int s, char *buf, const void *req, ssize_t len)
 		}
 	}
 
-	if ((send(s, req, len, 0) < len) ||
-	    (len = recv(s, buf, NLBUFSIZ, 0)) < 0)
-		return -errno;
+	n = send(s, req, len, 0);
+	if (n < 0)
+		die("netlink: Failed to send(): %s", strerror(errno));
+	else if (n < len)
+		die("netlink: Short send");
+
+	n = recv(s, buf, NLBUFSIZ, 0);
+	if (n < 0)
+		die("netlink: Failed to recv(): %s", strerror(errno));
 
-	return len;
+	return n;
 }
 
 /**
@@ -158,8 +164,7 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 	ssize_t n;
 	size_t na;
 
-	if ((n = nl_req(s, buf, &req, sizeof(req))) < 0)
-		return 0;
+	n = nl_req(s, buf, &req, sizeof(req));
 
 	nh = (struct nlmsghdr *)buf;
 
@@ -218,8 +223,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	if ((n = nl_req(s, buf, &req, req.nlh.nlmsg_len)) < 0)
-		return;
+	n = nl_req(s, buf, &req, req.nlh.nlmsg_len);
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -357,8 +361,7 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 	char buf[NLBUFSIZ];
 	unsigned i;
 
-	if ((nlmsgs_size = nl_req(s_src, buf, &req, req.nlh.nlmsg_len)) < 0)
-		return;
+	nlmsgs_size = nl_req(s_src, buf, &req, req.nlh.nlmsg_len);
 
 	for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -434,8 +437,7 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	if ((n = nl_req(s, buf, &req, req.nlh.nlmsg_len)) < 0)
-		return;
+	n = nl_req(s, buf, &req, req.nlh.nlmsg_len);
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -571,8 +573,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	struct nlmsghdr *nh;
 	ssize_t n;
 
-	if ((n = nl_req(s_src, buf, &req, sizeof(req))) < 0)
-		return;
+	n = nl_req(s_src, buf, &req, sizeof(req));
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -632,9 +633,6 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 	ssize_t n;
 
 	n = nl_req(s, buf, &req, sizeof(req));
-	if (n < 0)
-		return;
-
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
-- 
2.41.0



* [PATCH 09/17] netlink: Fill in netlink header fields from nl_req()
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (7 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 08/17] netlink: Treat send() or recv() errors as fatal David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking David Gibson
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently netlink functions need to fill in a full netlink header, as well
as a payload, then call nl_req() to submit that to the kernel.  It makes
things a bit terser if we just give the relevant header fields as
parameters to nl_req() and have it complete the header.
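
A rough standalone sketch of that division of labour, assuming a
hypothetical fill_hdr() helper and struct route_req (in the patch itself
the equivalent assignments live inside nl_req(), which also sends the
request): the caller initialises only the payload, and the helper
completes the header.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Example request layout: netlink header followed by a route message */
struct route_req {
	struct nlmsghdr nlh;
	struct rtmsg rtm;
};

/* Hypothetical helper: complete the netlink header at the start of @req */
static void fill_hdr(void *req, __u16 type, __u16 flags, size_t len,
		     __u32 *seq)
{
	struct nlmsghdr *nh = req;

	assert(len >= sizeof(*nh));

	nh->nlmsg_type = type;
	/* NLM_F_REQUEST and NLM_F_ACK are always set, @flags are extras */
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	nh->nlmsg_len = len;
	nh->nlmsg_seq = (*seq)++;
	nh->nlmsg_pid = 0;
}
```

A caller would then declare something like
struct route_req req = { .rtm.rtm_table = RT_TABLE_MAIN }; and pass the
type, extra flags and length as arguments rather than as initialisers.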

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 126 ++++++++++++++++++------------------------------------
 1 file changed, 42 insertions(+), 84 deletions(-)

diff --git a/netlink.c b/netlink.c
index 826c926..3170344 100644
--- a/netlink.c
+++ b/netlink.c
@@ -97,25 +97,29 @@ fail:
 }
 
 /**
- * nl_req() - Send netlink request and read response
+ * nl_req() - Prepare and send netlink request, read response
  * @s:		Netlink socket
  * @buf:	Buffer for response (at least NLBUFSIZ long)
- * @req:	Request with netlink header
+ * @req:	Request (will fill netlink header)
+ * @type:	Request type
+ * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
  * @len:	Request length
  *
  * Return: received length on success, terminates on error
  */
-static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
+static ssize_t nl_req(int s, char *buf, void *req,
+		      uint16_t type, uint16_t flags, ssize_t len)
 {
 	char flush[NLBUFSIZ];
+	struct nlmsghdr *nh;
 	int done = 0;
 	ssize_t n;
 
 	while (!done && (n = recv(s, flush, sizeof(flush), MSG_DONTWAIT)) > 0) {
-		struct nlmsghdr *nh = (struct nlmsghdr *)flush;
 		size_t nm = n;
 
-		for ( ; NLMSG_OK(nh, nm); nh = NLMSG_NEXT(nh, nm)) {
+		for (nh = (struct nlmsghdr *)flush;
+		     NLMSG_OK(nh, nm); nh = NLMSG_NEXT(nh, nm)) {
 			if (nh->nlmsg_type == NLMSG_DONE ||
 			    nh->nlmsg_type == NLMSG_ERROR) {
 				done = 1;
@@ -124,6 +128,13 @@ static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
 		}
 	}
 
+	nh = (struct nlmsghdr *)req;
+	nh->nlmsg_type = type;
+	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
+	nh->nlmsg_len = len;
+	nh->nlmsg_seq = nl_seq++;
+	nh->nlmsg_pid = 0;
+
 	n = send(s, req, len, 0);
 	if (n < 0)
 		die("netlink: Failed to send(): %s", strerror(errno));
@@ -148,11 +159,6 @@ static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
 unsigned int nl_get_ext_if(int s, sa_family_t af)
 {
 	struct { struct nlmsghdr nlh; struct rtmsg rtm; } req = {
-		.nlh.nlmsg_type	 = RTM_GETROUTE,
-		.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
-		.nlh.nlmsg_len	 = NLMSG_LENGTH(sizeof(struct rtmsg)),
-		.nlh.nlmsg_seq	 = nl_seq++,
-
 		.rtm.rtm_table	 = RT_TABLE_MAIN,
 		.rtm.rtm_scope	 = RT_SCOPE_UNIVERSE,
 		.rtm.rtm_type	 = RTN_UNICAST,
@@ -164,7 +170,7 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 	ssize_t n;
 	size_t na;
 
-	n = nl_req(s, buf, &req, sizeof(req));
+	n = nl_req(s, buf, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
 
 	nh = (struct nlmsghdr *)buf;
 
@@ -205,11 +211,6 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		struct rtattr rta;
 		unsigned int ifi;
 	} req = {
-		.nlh.nlmsg_type	  = RTM_GETROUTE,
-		.nlh.nlmsg_len	  = sizeof(req),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
-		.nlh.nlmsg_seq	  = nl_seq++,
-
 		.rtm.rtm_family	  = af,
 		.rtm.rtm_table	  = RT_TABLE_MAIN,
 		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
@@ -223,7 +224,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	n = nl_req(s, buf, &req, req.nlh.nlmsg_len);
+	n = nl_req(s, buf, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -278,11 +279,6 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 			} r4;
 		} set;
 	} req = {
-		.nlh.nlmsg_type	  = RTM_NEWROUTE,
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK |
-				    NLM_F_CREATE | NLM_F_EXCL,
-		.nlh.nlmsg_seq	  = nl_seq++,
-
 		.rtm.rtm_family	  = af,
 		.rtm.rtm_table	  = RT_TABLE_MAIN,
 		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
@@ -294,12 +290,12 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		.ifi		  = ifi,
 	};
 	char buf[NLBUFSIZ];
+	ssize_t len;
 
 	if (af == AF_INET6) {
 		size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
 
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
-			+ sizeof(req.set.r6);
+		len = offsetof(struct req_t, set.r6) + sizeof(req.set.r6);
 
 		req.set.r6.rta_dst.rta_type = RTA_DST;
 		req.set.r6.rta_dst.rta_len = rta_len;
@@ -310,8 +306,7 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 	} else {
 		size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
 
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
-			+ sizeof(req.set.r4);
+		len = offsetof(struct req_t, set.r4) + sizeof(req.set.r4);
 
 		req.set.r4.rta_dst.rta_type = RTA_DST;
 		req.set.r4.rta_dst.rta_len = rta_len;
@@ -321,7 +316,7 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		req.set.r4.rta_gw.rta_len = rta_len;
 	}
 
-	nl_req(s, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -341,11 +336,6 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		struct rtattr rta;
 		unsigned int ifi;
 	} req = {
-		.nlh.nlmsg_type	  = RTM_GETROUTE,
-		.nlh.nlmsg_len	  = sizeof(req),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
-		.nlh.nlmsg_seq	  = nl_seq++,
-
 		.rtm.rtm_family	  = af,
 		.rtm.rtm_table	  = RT_TABLE_MAIN,
 		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
@@ -361,7 +351,8 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 	char buf[NLBUFSIZ];
 	unsigned i;
 
-	nlmsgs_size = nl_req(s_src, buf, &req, req.nlh.nlmsg_len);
+	nlmsgs_size = nl_req(s_src, buf, &req,
+			     RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
 
 	for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -373,10 +364,6 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		if (nh->nlmsg_type != RTM_NEWROUTE)
 			continue;
 
-		nh->nlmsg_pid = 0;
-		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
-		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
-			NLM_F_CREATE;
 		dup_routes++;
 
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
@@ -398,13 +385,15 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
 		     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 		     nh = NLMSG_NEXT(nh, n)) {
+			uint16_t flags = nh->nlmsg_flags;
 			char resp[NLBUFSIZ];
 
 			if (nh->nlmsg_type != RTM_NEWROUTE)
 				continue;
 
-			nh->nlmsg_seq = nl_seq++;
-			nl_req(s_dst, resp, nh, nh->nlmsg_len);
+			nl_req(s_dst, resp, nh, RTM_NEWROUTE,
+			       (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
+			       nh->nlmsg_len);
 		}
 	}
 }
@@ -425,11 +414,6 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		struct nlmsghdr nlh;
 		struct ifaddrmsg ifa;
 	} req = {
-		.nlh.nlmsg_type    = RTM_GETADDR,
-		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-		.nlh.nlmsg_len     = sizeof(req),
-		.nlh.nlmsg_seq     = nl_seq++,
-
 		.ifa.ifa_family    = af,
 		.ifa.ifa_index     = ifi,
 	};
@@ -437,7 +421,7 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	n = nl_req(s, buf, &req, req.nlh.nlmsg_len);
+	n = nl_req(s, buf, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -501,18 +485,13 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 			} a6;
 		} set;
 	} req = {
-		.nlh.nlmsg_type    = RTM_NEWADDR,
-		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK |
-				     NLM_F_CREATE | NLM_F_EXCL,
-		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
-		.nlh.nlmsg_seq     = nl_seq++,
-
 		.ifa.ifa_family    = af,
 		.ifa.ifa_index     = ifi,
 		.ifa.ifa_prefixlen = prefix_len,
 		.ifa.ifa_scope	   = RT_SCOPE_UNIVERSE,
 	};
 	char buf[NLBUFSIZ];
+	ssize_t len;
 
 	if (af == AF_INET6) {
 		size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
@@ -520,8 +499,7 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		/* By default, strictly speaking, it's duplicated */
 		req.ifa.ifa_flags = IFA_F_NODAD;
 
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
-			+ sizeof(req.set.a6);
+		len = offsetof(struct req_t, set.a6) + sizeof(req.set.a6);
 
 		memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
 		req.set.a6.rta_l.rta_len = rta_len;
@@ -532,8 +510,7 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 	} else {
 		size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
 
-		req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
-			+ sizeof(req.set.a4);
+		len = offsetof(struct req_t, set.a4) + sizeof(req.set.a4);
 
 		memcpy(&req.set.a4.l, addr, sizeof(req.set.a4.l));
 		req.set.a4.rta_l.rta_len = rta_len;
@@ -542,7 +519,7 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
 	}
 
-	nl_req(s, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -560,11 +537,6 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 		struct nlmsghdr nlh;
 		struct ifaddrmsg ifa;
 	} req = {
-		.nlh.nlmsg_type    = RTM_GETADDR,
-		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_DUMP,
-		.nlh.nlmsg_len     = sizeof(req),
-		.nlh.nlmsg_seq     = nl_seq++,
-
 		.ifa.ifa_family    = af,
 		.ifa.ifa_index     = ifi_src,
 		.ifa.ifa_prefixlen = 0,
@@ -573,7 +545,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	struct nlmsghdr *nh;
 	ssize_t n;
 
-	n = nl_req(s_src, buf, &req, sizeof(req));
+	n = nl_req(s_src, buf, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
 
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
@@ -586,11 +558,6 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 		if (nh->nlmsg_type != RTM_NEWADDR)
 			continue;
 
-		nh->nlmsg_seq = nl_seq++;
-		nh->nlmsg_pid = 0;
-		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
-		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
-
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 
 		if (ifa->ifa_scope == RT_SCOPE_LINK ||
@@ -605,7 +572,9 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 				rta->rta_type = IFA_UNSPEC;
 		}
 
-		nl_req(s_dst, resp, nh, nh->nlmsg_len);
+		nl_req(s_dst, resp, nh, RTM_NEWADDR,
+		       (nh->nlmsg_flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
+		       nh->nlmsg_len);
 	}
 }
 
@@ -621,10 +590,6 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 		struct nlmsghdr nlh;
 		struct ifinfomsg ifm;
 	} req = {
-		.nlh.nlmsg_type	  = RTM_GETLINK,
-		.nlh.nlmsg_len	  = sizeof(req),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
-		.nlh.nlmsg_seq	  = nl_seq++,
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
 	};
@@ -632,7 +597,7 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 	char buf[NLBUFSIZ];
 	ssize_t n;
 
-	n = nl_req(s, buf, &req, sizeof(req));
+	n = nl_req(s, buf, &req, RTM_GETLINK, 0, sizeof(req));
 	for (nh = (struct nlmsghdr *)buf;
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
@@ -670,10 +635,6 @@ void nl_link_set_mac(int s, unsigned int ifi, void *mac)
 		struct rtattr rta;
 		unsigned char mac[ETH_ALEN];
 	} req = {
-		.nlh.nlmsg_type	  = RTM_NEWLINK,
-		.nlh.nlmsg_len	  = sizeof(req),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
-		.nlh.nlmsg_seq	  = nl_seq++,
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
 		.rta.rta_type	  = IFLA_ADDRESS,
@@ -683,7 +644,7 @@ void nl_link_set_mac(int s, unsigned int ifi, void *mac)
 
 	memcpy(req.mac, mac, ETH_ALEN);
 
-	nl_req(s, buf, &req, sizeof(req));
+	nl_req(s, buf, &req, RTM_NEWLINK, 0, sizeof(req));
 }
 
 /**
@@ -700,10 +661,6 @@ void nl_link_up(int s, unsigned int ifi, int mtu)
 		struct rtattr rta;
 		unsigned int mtu;
 	} req = {
-		.nlh.nlmsg_type   = RTM_NEWLINK,
-		.nlh.nlmsg_len    = sizeof(req),
-		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
-		.nlh.nlmsg_seq	  = nl_seq++,
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
 		.ifm.ifi_flags	  = IFF_UP,
@@ -712,11 +669,12 @@ void nl_link_up(int s, unsigned int ifi, int mtu)
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.mtu		  = mtu,
 	};
+	ssize_t len = sizeof(req);
 	char buf[NLBUFSIZ];
 
 	if (!mtu)
 		/* Shorten request to drop MTU attribute */
-		req.nlh.nlmsg_len = offsetof(struct req_t, rta);
+		len = offsetof(struct req_t, rta);
 
-	nl_req(s, buf, &req, req.nlh.nlmsg_len);
+	nl_req(s, buf, &req, RTM_NEWLINK, 0, len);
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (8 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 09/17] netlink: Fill in netlink header fields from nl_req() David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:48   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size David Gibson
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

So far we never checked for errors reported on netlink operations via
NLMSG_ERROR messages.  This has led to several subtle and tricky to debug
situations which would have been obvious if we knew that certain netlink
operations had failed.

Introduce a nl_do() helper that performs netlink "do" operations (that is,
making a single change without retrieving complex information) with much
more thorough error checking.  As well as returning an error code if we
get an NLMSG_ERROR message, we also check for unexpected behaviour in
several places.  That way if we've made a mistake in our assumptions about
how netlink works it should result in a clear error rather than some subtle
misbehaviour.
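
As a rough illustration of the kind of check described above (not the code
from this patch), the sketch below scans one response datagram for the
NLMSG_ERROR message that acknowledges a request.  The names fake_ack,
make_ack() and ack_status() are hypothetical, and returning -EPROTO on a
sequence mismatch is an assumption made for the example; the patch itself
calls die() in that case.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical stand-in for an acknowledgement as the kernel would send it:
 * a netlink header followed by a struct nlmsgerr payload.
 */
struct fake_ack {
	struct nlmsghdr nh;
	struct nlmsgerr err;
};

/* Build a fake acknowledgement for request @seq carrying @error
 * (0 for success, negative errno value for failure).
 */
static struct fake_ack make_ack(__u32 seq, int error)
{
	struct fake_ack a;

	memset(&a, 0, sizeof(a));
	a.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct nlmsgerr));
	a.nh.nlmsg_type = NLMSG_ERROR;
	a.nh.nlmsg_seq = seq;
	a.err.error = error;
	return a;
}

/* Hypothetical helper: look for the acknowledgement of request @seq in one
 * response datagram of @n bytes at @buf.
 *
 * Return: 0 on a clean ack, negative errno value reported by the kernel,
 *         -EPROTO on a sequence number mismatch (assumption for this
 *         sketch), 1 if the datagram contains no acknowledgement at all
 */
static int ack_status(void *buf, ssize_t n, __u32 seq)
{
	struct nlmsghdr *nh;

	for (nh = buf; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		if (nh->nlmsg_seq != seq)
			return -EPROTO;

		if (nh->nlmsg_type == NLMSG_ERROR) {
			struct nlmsgerr *e = NLMSG_DATA(nh);

			return e->error;
		}
	}

	return 1;
}
```

A caller would treat 0 as success and log or propagate anything negative,
which is the information we were previously throwing away.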

We update those calls to nl_req() that can use the new wrapper to do so.
We will extend those to better handle errors in future.  We don't touch
non-"do" operations for now, those are a bit trickier.

Link: https://bugs.passt.top/show_bug.cgi?id=60

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/netlink.c b/netlink.c
index 3170344..cdd65c0 100644
--- a/netlink.c
+++ b/netlink.c
@@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
 	return n;
 }
 
+/**
+ * nl_do() - Send netlink "do" request, and wait for acknowledgement
+ * @s:		Netlink socket
+ * @req:	Request (will fill netlink header)
+ * @type:	Request type
+ * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
+ * @len:	Request length
+ *
+ * Return: 0 on success, negative error code on error
+ */
+static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
+{
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	uint16_t seq;
+	ssize_t n;
+
+	n = nl_req(s, buf, req, type, flags, len);
+	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
+		struct nlmsgerr *errmsg;
+
+		if (nh->nlmsg_seq != seq)
+			die("netlink: Unexpected response sequence number");
+
+		switch (nh->nlmsg_type) {
+		case NLMSG_DONE:
+			return 0;
+		case NLMSG_ERROR:
+			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
+			return errmsg->error;
+		default:
+			warn("netlink: Unexpected response message");
+		}
+	}
+
+	die("netlink: Missing acknowledgement of request");
+}
+
 /**
  * nl_get_ext_if() - Get interface index supporting IP version being probed
  * @s:	Netlink socket
@@ -289,7 +330,6 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi,
 	};
-	char buf[NLBUFSIZ];
 	ssize_t len;
 
 	if (af == AF_INET6) {
@@ -316,7 +356,7 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		req.set.r4.rta_gw.rta_len = rta_len;
 	}
 
-	nl_req(s, buf, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
+	nl_do(s, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -386,12 +426,11 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 		     nh = NLMSG_NEXT(nh, n)) {
 			uint16_t flags = nh->nlmsg_flags;
-			char resp[NLBUFSIZ];
 
 			if (nh->nlmsg_type != RTM_NEWROUTE)
 				continue;
 
-			nl_req(s_dst, resp, nh, RTM_NEWROUTE,
+			nl_do(s_dst, nh, RTM_NEWROUTE,
 			       (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
 			       nh->nlmsg_len);
 		}
@@ -490,7 +529,6 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		.ifa.ifa_prefixlen = prefix_len,
 		.ifa.ifa_scope	   = RT_SCOPE_UNIVERSE,
 	};
-	char buf[NLBUFSIZ];
 	ssize_t len;
 
 	if (af == AF_INET6) {
@@ -519,7 +557,7 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
 	}
 
-	nl_req(s, buf, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
+	nl_do(s, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -551,7 +589,6 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
 	     nh = NLMSG_NEXT(nh, n)) {
 		struct ifaddrmsg *ifa;
-		char resp[NLBUFSIZ];
 		struct rtattr *rta;
 		size_t na;
 
@@ -572,7 +609,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 				rta->rta_type = IFA_UNSPEC;
 		}
 
-		nl_req(s_dst, resp, nh, RTM_NEWADDR,
+		nl_do(s_dst, nh, RTM_NEWADDR,
 		       (nh->nlmsg_flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
 		       nh->nlmsg_len);
 	}
@@ -640,11 +677,10 @@ void nl_link_set_mac(int s, unsigned int ifi, void *mac)
 		.rta.rta_type	  = IFLA_ADDRESS,
 		.rta.rta_len	  = RTA_LENGTH(ETH_ALEN),
 	};
-	char buf[NLBUFSIZ];
 
 	memcpy(req.mac, mac, ETH_ALEN);
 
-	nl_req(s, buf, &req, RTM_NEWLINK, 0, sizeof(req));
+	nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
 }
 
 /**
@@ -670,11 +706,10 @@ void nl_link_up(int s, unsigned int ifi, int mtu)
 		.mtu		  = mtu,
 	};
 	ssize_t len = sizeof(req);
-	char buf[NLBUFSIZ];
 
 	if (!mtu)
 		/* Shorten request to drop MTU attribute */
 		len = offsetof(struct req_t, rta);
 
-	nl_req(s, buf, &req, RTM_NEWLINK, 0, len);
+	nl_do(s, &req, RTM_NEWLINK, 0, len);
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (9 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:48   ` Stefano Brivio
  2023-07-24  6:09 ` [PATCH 12/17] netlink: Split nl_req() to allow processing multiple response datagrams David Gibson
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently we set NLBUFSIZ large enough for 8192 netlink headers (128kiB in
total), and reference netlink(7).  However, netlink(7) says nothing about
response buffer sizes, and the documents which do mention them reference
8192 *bytes*, not 8192 headers.

Update NLBUFSIZ to 64kiB with a more detailed rationale.
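
For reference, the arithmetic behind those numbers can be checked with a
small sketch like the one below.  OLD_NLBUFSIZ, NEW_NLBUFSIZ and
nl_min_bufsiz() are hypothetical names used only for this illustration;
nl_min_bufsiz() simply encodes the "8kiB or the page size, whichever is
larger" rule quoted in the new comment.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Old definition: 8192 headers of 16 bytes each, that is, 128kiB */
#define OLD_NLBUFSIZ	(8192 * sizeof(struct nlmsghdr))

/* New definition: 64kiB */
#define NEW_NLBUFSIZ	65536

/* Hypothetical helper: minimum response buffer size for a given page size,
 * following the rule "at least 8kiB or the page size, whichever is larger"
 */
static size_t nl_min_bufsiz(size_t page_size)
{
	return page_size > 8192 ? page_size : 8192;
}
```

With 4kiB pages the minimum is 8kiB, and with 64kiB pages it is exactly the
new NLBUFSIZ, so the new value covers both ends of the range.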

Link: https://bugs.passt.top/show_bug.cgi?id=67

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index cdd65c0..d553ddd 100644
--- a/netlink.c
+++ b/netlink.c
@@ -35,7 +35,14 @@
 #include "log.h"
 #include "netlink.h"
 
-#define NLBUFSIZ	(8192 * sizeof(struct nlmsghdr)) /* See netlink(7) */
+/* Netlink expects a buffer of at least 8kiB or the system page size,
+ * whichever is larger.  32kiB is recommended for more efficient operation.
+ * Since the largest page size on any remotely common Linux setup is
+ * 64kiB (ppc64), that should cover it.
+ *
+ * https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#buffer-sizing
+ */
+#define NLBUFSIZ 65536
 
 /* Socket in init, in target namespace, sequence (just needs to be monotonic) */
 int nl_sock	= -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/17] netlink: Split nl_req() to allow processing multiple response datagrams
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (10 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 13/17] netlink: Add nl_foreach_oftype to filter response message types David Gibson
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently nl_req() sends the request, and receives a single response
datagram which we then process.  However, a single request can result in
multiple response datagrams.  That happens nearly all the time for DUMP
requests, where the 'DONE' message usually comes in a second datagram after
the NEW{LINK|ADDR|ROUTE} messages.  It can also happen if there are just
too many objects to dump in a single datagram.

Allow our netlink code to process multiple response datagrams by splitting
nl_req() into three different helpers: nl_send() just sends a request,
without getting a response.  nl_status() checks a single message to see if
it indicates the end of the responses for our request.  nl_next() moves onto
the next response message, whether it's in a datagram we already received
or we need to recv() a new one.  We also add a 'for'-style macro to use
these to step through every response message to a request across multiple
datagrams.
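
To show the iteration pattern in isolation, here is a minimal sketch that
replaces the socket with a queue of pre-built datagrams, so it runs without
any netlink access.  Everything in it (struct dgram, struct fake_sock,
hdr(), next_msg(), count_routes()) is hypothetical and only loosely modelled
on nl_next() and nl_status(); in particular it returns NULL where the real
code would block in recv(), and the messages are header-only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* One pre-built response datagram holding up to two header-only messages */
struct dgram {
	struct nlmsghdr msg[2];
	ssize_t len;		/* Bytes actually used in @msg */
};

/* Stand-in for a socket: a queue of datagrams still to be "received" */
struct fake_sock {
	const struct dgram *queue;
	size_t count, next;
};

/* Build a header-only message of @type answering request @seq */
static struct nlmsghdr hdr(__u16 type, __u32 seq)
{
	struct nlmsghdr nh = {
		.nlmsg_len = NLMSG_HDRLEN,
		.nlmsg_type = type,
		.nlmsg_seq = seq,
	};

	return nh;
}

/* Step to the next message: first within the current datagram, otherwise
 * by fetching the next datagram from the queue into @buf.
 */
static struct nlmsghdr *next_msg(struct fake_sock *s, struct nlmsghdr *buf,
				 struct nlmsghdr *nh, ssize_t *n)
{
	if (nh) {
		nh = NLMSG_NEXT(nh, *n);
		if (NLMSG_OK(nh, *n))
			return nh;
	}

	if (s->next >= s->count)
		return NULL;	/* A real recv() would block here instead */

	*n = s->queue[s->next].len;
	memcpy(buf, s->queue[s->next].msg, *n);
	s->next++;

	return NLMSG_OK(buf, *n) ? buf : NULL;
}

/* Count RTM_NEWROUTE responses to request @seq up to NLMSG_DONE.
 *
 * Return: number of routes, -1 if responses ran out or @seq didn't match
 */
static int count_routes(struct fake_sock *s, __u32 seq)
{
	struct nlmsghdr buf[2], *nh = NULL;
	ssize_t n = 0;
	int routes = 0;

	while ((nh = next_msg(s, buf, nh, &n))) {
		if (nh->nlmsg_seq != seq)
			return -1;
		if (nh->nlmsg_type == NLMSG_DONE)
			return routes;
		if (nh->nlmsg_type == RTM_NEWROUTE)
			routes++;
	}

	return -1;
}
```

With two RTM_NEWROUTE messages in the first datagram and NLMSG_DONE in a
second one, the loop only sees the end of the dump after crossing the
datagram boundary, which is the case a single recv() misses.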

While we're at it, be more thorough with checking that our sequence
numbers are in sync.

Link: https://bugs.passt.top/show_bug.cgi?id=67

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 181 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 113 insertions(+), 68 deletions(-)

diff --git a/netlink.c b/netlink.c
index d553ddd..9293e2b 100644
--- a/netlink.c
+++ b/netlink.c
@@ -104,18 +104,17 @@ fail:
 }
 
 /**
- * nl_req() - Prepare and send netlink request, read response
+ * nl_send() - Prepare and send netlink request
  * @s:		Netlink socket
- * @buf:	Buffer for response (at least NLBUFSIZ long)
  * @req:	Request (will fill netlink header)
  * @type:	Request type
  * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
  * @len:	Request length
  *
- * Return: received length on success, terminates on error
+ * Return: sequence number of request on success, terminates on error
  */
-static ssize_t nl_req(int s, char *buf, void *req,
-		      uint16_t type, uint16_t flags, ssize_t len)
+static uint16_t nl_send(int s, void *req, uint16_t type,
+		       uint16_t flags, ssize_t len)
 {
 	char flush[NLBUFSIZ];
 	struct nlmsghdr *nh;
@@ -148,13 +147,80 @@ static ssize_t nl_req(int s, char *buf, void *req,
 	else if (n < len)
 		die("netlink: Short send");
 
-	n = recv(s, buf, NLBUFSIZ, 0);
-	if (n < 0)
-		die("netlink: Failed to recv(): %s", strerror(errno));
+	return nh->nlmsg_seq;
+}
+
+/**
+ * nl_status() - Check status given by a netlink response
+ * @nh:		Netlink response header
+ * @n:		Remaining space in response buffer from @nh
+ * @seq:	Request sequence number we expect a response to
+ *
+ * Return: 0 if @nh indicated successful completion,
+ *         < 0, negative error code if @nh indicated failure
+ *         > 0 @n if there are more responses to request @seq
+ *     terminates if sequence numbers are out of sync
+ */
+static int nl_status(const struct nlmsghdr *nh, ssize_t n, uint16_t seq)
+{
+	ASSERT(NLMSG_OK(nh, n));
+
+	if (nh->nlmsg_seq != seq)
+		die("netlink: Unexpected sequence number (%hu != %hu)",
+		    nh->nlmsg_seq, seq);
+
+	if (nh->nlmsg_type == NLMSG_DONE) {
+		return 0;
+	}
+	if (nh->nlmsg_type == NLMSG_ERROR) {
+		struct nlmsgerr *errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
+		return errmsg->error;
+	}
 
 	return n;
 }
 
+/**
+ * nl_next() - Get next netlink response message, recv()ing if necessary
+ * @s:		Netlink socket
+ * @buf:	Buffer for responses (at least NLBUFSIZ long)
+ * @nh:		Previous message, or NULL if there are none
+ * @n:		Variable with remaining unread bytes in buffer (updated)
+ *
+ * Return: pointer to next unread netlink response message (may block)
+ */
+static struct nlmsghdr *nl_next(int s, char *buf, struct nlmsghdr *nh, ssize_t *n)
+{
+	if (nh) {
+		nh = NLMSG_NEXT(nh, *n);
+		if (NLMSG_OK(nh, *n))
+			return nh;
+	}
+
+	*n = recv(s, buf, NLBUFSIZ, 0);
+	if (*n < 0)
+		die("netlink: Failed to recv(): %s", strerror(errno));
+
+	nh = (struct nlmsghdr *)buf;
+	if (!NLMSG_OK(nh, *n))
+		die("netlink: Response datagram with no message");
+
+	return nh;
+}
+
+/**
+ * nl_foreach - 'for' type macro to step through netlink response messages
+ * @nh:		Steps through each response header (struct nlmsghdr *)
+ * @status:	When loop exits indicates if there was an error (ssize_t)
+ * @s:		Netlink socket
+ * @buf:	Buffer for responses (at least NLBUFSIZ long)
+ * @seq:	Sequence number of request we're getting responses for
+  */
+#define nl_foreach(nh, status, s, buf, seq)				\
+	for ((nh) = nl_next((s), (buf), NULL, &(status));		\
+	     ((status) = nl_status((nh), (status), (seq))) > 0;		\
+	     (nh) = nl_next((s), (buf), (nh), &(status)))
+
 /**
  * nl_do() - Send netlink "do" request, and wait for acknowledgement
  * @s:		Netlink socket
@@ -169,31 +235,14 @@ static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
 {
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
+	ssize_t status;
 	uint16_t seq;
-	ssize_t n;
-
-	n = nl_req(s, buf, req, type, flags, len);
-	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
 
-	for (nh = (struct nlmsghdr *)buf;
-	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
-		struct nlmsgerr *errmsg;
+	seq = nl_send(s, req, type, flags, len);
+	nl_foreach(nh, status, s, buf, seq)
+		warn("netlink: Unexpected response message");
 
-		if (nh->nlmsg_seq != seq)
-			die("netlink: Unexpected response sequence number");
-
-		switch (nh->nlmsg_type) {
-		case NLMSG_DONE:
-			return 0;
-		case NLMSG_ERROR:
-			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
-			return errmsg->error;
-		default:
-			warn("netlink: Unexpected response message");
-		}
-	}
-
-	die("netlink: Missing acknowledgement of request");
+	return status;
 }
 
 /**
@@ -215,14 +264,12 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 	struct nlmsghdr *nh;
 	struct rtattr *rta;
 	char buf[NLBUFSIZ];
-	ssize_t n;
+	ssize_t status;
+	uint16_t seq;
 	size_t na;
 
-	n = nl_req(s, buf, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
-
-	nh = (struct nlmsghdr *)buf;
-
-	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
+	seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
+	nl_foreach(nh, status, s, buf, seq) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 
 		if (rtm->rtm_dst_len || rtm->rtm_family != af)
@@ -270,13 +317,11 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 	};
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
-	ssize_t n;
-
-	n = nl_req(s, buf, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
+	ssize_t status;
+	uint16_t seq;
 
-	for (nh = (struct nlmsghdr *)buf;
-	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-	     nh = NLMSG_NEXT(nh, n)) {
+	seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
+	nl_foreach(nh, status, s, buf, seq) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -392,18 +437,23 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi_src,
 	};
+	ssize_t nlmsgs_size, status;
 	unsigned dup_routes = 0;
-	ssize_t n, nlmsgs_size;
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
+	uint16_t seq;
 	unsigned i;
 
-	nlmsgs_size = nl_req(s_src, buf, &req,
-			     RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
+	seq = nl_send(s_src, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
 
-	for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
-	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-	     nh = NLMSG_NEXT(nh, n)) {
+	/* nl_foreach() will step through multiple response datagrams,
+	 * which we don't want here because we need to have all the
+	 * routes in the buffer at once.
+	 */
+	nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
+	for (status = nlmsgs_size;
+	     NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
+	     nh = NLMSG_NEXT(nh, status)) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -429,9 +479,9 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 	 * dependencies: let the kernel do that.
 	 */
 	for (i = 0; i < dup_routes; i++) {
-		for (nh = (struct nlmsghdr *)buf, n = nlmsgs_size;
-		     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-		     nh = NLMSG_NEXT(nh, n)) {
+		for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
+		     NLMSG_OK(nh, status);
+		     nh = NLMSG_NEXT(nh, status)) {
 			uint16_t flags = nh->nlmsg_flags;
 
 			if (nh->nlmsg_type != RTM_NEWROUTE)
@@ -465,13 +515,11 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 	};
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
-	ssize_t n;
-
-	n = nl_req(s, buf, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	ssize_t status;
+	uint16_t seq;
 
-	for (nh = (struct nlmsghdr *)buf;
-	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-	     nh = NLMSG_NEXT(nh, n)) {
+	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	nl_foreach(nh, status, s, buf, seq) {
 		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
@@ -588,13 +636,11 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	};
 	char buf[NLBUFSIZ];
 	struct nlmsghdr *nh;
-	ssize_t n;
-
-	n = nl_req(s_src, buf, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	ssize_t status;
+	uint16_t seq;
 
-	for (nh = (struct nlmsghdr *)buf;
-	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-	     nh = NLMSG_NEXT(nh, n)) {
+	seq = nl_send(s_src, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	nl_foreach(nh, status, s_src, buf, seq) {
 		struct ifaddrmsg *ifa;
 		struct rtattr *rta;
 		size_t na;
@@ -639,12 +685,11 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 	};
 	struct nlmsghdr *nh;
 	char buf[NLBUFSIZ];
-	ssize_t n;
+	ssize_t status;
+	uint16_t seq;
 
-	n = nl_req(s, buf, &req, RTM_GETLINK, 0, sizeof(req));
-	for (nh = (struct nlmsghdr *)buf;
-	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
-	     nh = NLMSG_NEXT(nh, n)) {
+	seq = nl_send(s, &req, RTM_GETLINK, 0, sizeof(req));
+	nl_foreach(nh, status, s, buf, seq) {
 		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
-- 
2.41.0



* [PATCH 13/17] netlink: Add nl_foreach_oftype to filter response message types
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (11 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 12/17] netlink: Split nl_req() to allow processing multiple response datagrams David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 14/17] netlink: Propagate errors for "set" operations David Gibson
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

In most cases when processing response messages, we expect only one type
of message (excepting NLMSG_DONE or NLMSG_ERROR), and so we need a test
and continue to skip anything else.  Add a helper macro to do this.
This also fixes a bug in nl_get_ext_if() where we didn't have such a test
and if we got a message other than RTM_NEWROUTE we would have parsed
its contents as nonsense.

Also add a warning message if we get such an unexpected message type, which
could be useful for debugging if we ever hit it.
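
The filtering idiom can be illustrated outside netlink.c.  What follows is
a minimal sketch, not the code from this patch: struct msg, msg_foreach()
and sum_oftype() are hypothetical stand-ins for struct nlmsghdr,
nl_foreach() and a caller, so that it builds without a netlink socket.
The point it shows is that the caller's loop body becomes the 'else'
branch of the macro's 'if', so it only runs for messages of the wanted
type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct nlmsghdr: only the type matters here */
struct msg {
	int type;
	int payload;
};

/* Plain iteration over an array (hypothetical stand-in for nl_foreach()) */
#define msg_foreach(m, msgs, count)					\
	for ((m) = (msgs); (m) < (msgs) + (count); (m)++)

/* Filtered iteration: the statement following the macro is the 'else'
 * branch, so it is skipped for messages of any other type.
 */
#define msg_foreach_oftype(m, msgs, count, wanted)			\
	msg_foreach((m), (msgs), (count))				\
		if ((m)->type != (wanted)) {				\
			fprintf(stderr, "Unexpected message type\n");	\
		} else

/* Sum the payloads of all messages of type @wanted, warning on the rest */
static int sum_oftype(const struct msg *msgs, size_t count, int wanted)
{
	const struct msg *m;
	int sum = 0;

	assert(msgs != NULL);

	msg_foreach_oftype(m, msgs, count, wanted) {
		sum += m->payload;
	}

	return sum;
}
```

Because the body still sits directly inside the 'for' statement, 'continue'
and 'break' in the body behave as they would in an ordinary loop.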

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/netlink.c b/netlink.c
index 9293e2b..1d40856 100644
--- a/netlink.c
+++ b/netlink.c
@@ -210,17 +210,25 @@ static struct nlmsghdr *nl_next(int s, char *buf, struct nlmsghdr *nh, ssize_t *
 
 /**
  * nl_foreach - 'for' type macro to step through netlink response messages
+ * nl_foreach_oftype - as above, but only messages of expected type
  * @nh:		Steps through each response header (struct nlmsghdr *)
  * @status:	When loop exits indicates if there was an error (ssize_t)
  * @s:		Netlink socket
  * @buf:	Buffer for responses (at least NLBUFSIZ long)
  * @seq:	Sequence number of request we're getting responses for
-  */
+ * @type:	Type of netlink message to process
+ */
 #define nl_foreach(nh, status, s, buf, seq)				\
 	for ((nh) = nl_next((s), (buf), NULL, &(status));		\
 	     ((status) = nl_status((nh), (status), (seq))) > 0;		\
 	     (nh) = nl_next((s), (buf), (nh), &(status)))
 
+#define nl_foreach_oftype(nh, status, s, buf, seq, type)		\
+	nl_foreach((nh), (status), (s), (buf), (seq))			\
+		if ((nh)->nlmsg_type != (type)) {			\
+			warn("netlink: Unexpected message type");	\
+		} else
+
 /**
  * nl_do() - Send netlink "do" request, and wait for acknowledgement
  * @s:		Netlink socket
@@ -269,7 +277,7 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 	size_t na;
 
 	seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
-	nl_foreach(nh, status, s, buf, seq) {
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 
 		if (rtm->rtm_dst_len || rtm->rtm_family != af)
@@ -321,14 +329,11 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 	uint16_t seq;
 
 	seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
-	nl_foreach(nh, status, s, buf, seq) {
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
 
-		if (nh->nlmsg_type != RTM_NEWROUTE)
-			continue;
-
 		if (rtm->rtm_dst_len)
 			continue;
 
@@ -519,14 +524,11 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 	uint16_t seq;
 
 	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
-	nl_foreach(nh, status, s, buf, seq) {
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
 		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
 
-		if (nh->nlmsg_type != RTM_NEWADDR)
-			continue;
-
 		if (ifa->ifa_index != ifi)
 			continue;
 
@@ -640,7 +642,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	uint16_t seq;
 
 	seq = nl_send(s_src, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
-	nl_foreach(nh, status, s_src, buf, seq) {
+	nl_foreach_oftype(nh, status, s_src, buf, seq, RTM_NEWADDR) {
 		struct ifaddrmsg *ifa;
 		struct rtattr *rta;
 		size_t na;
@@ -689,14 +691,11 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 	uint16_t seq;
 
 	seq = nl_send(s, &req, RTM_GETLINK, 0, sizeof(req));
-	nl_foreach(nh, status, s, buf, seq) {
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWLINK) {
 		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
 		struct rtattr *rta;
 		size_t na;
 
-		if (nh->nlmsg_type != RTM_NEWLINK)
-			continue;
-
 		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
 		     RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-- 
2.41.0



* [PATCH 14/17] netlink: Propagate errors for "set" operations
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (12 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 13/17] netlink: Add nl_foreach_oftype to filter response message types David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 15/17] netlink: Always process all responses to a netlink request David Gibson
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently, if anything goes wrong while we're configuring the namespace
network with --config-net, we'll just ignore it and carry on.  This might
lead to a silently unconfigured or misconfigured namespace environment.

For simple "set" operations based on nl_do() we can now detect failures
reported via netlink.  Propagate those errors up to pasta_ns_conf() and
report them usefully.
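
The convention relied on here is "0 on success, negative error code on
failure", with the caller negating the value before passing it to
strerror().  Below is a minimal sketch of that convention, under stated
assumptions: fake_link_up() and report_failure() are hypothetical names
invented for illustration, and the message is formatted into a buffer
rather than passed to die() so the sketch has no side effects.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "set" operation: 0 on success, negative errno on failure */
static int fake_link_up(unsigned int ifi)
{
	if (ifi == 0)
		return -ENODEV;	/* Pretend interface index 0 doesn't exist */

	return 0;
}

/* If @rc is negative, describe the failure in @out and return 1;
 * otherwise leave @out empty and return 0.
 */
static int report_failure(int rc, const char *what, char *out, size_t len)
{
	assert(out != NULL && len > 0);

	if (rc >= 0) {
		out[0] = '\0';
		return 0;
	}

	/* The error code is negative, strerror() wants the positive errno */
	snprintf(out, len, "Couldn't %s: %s", what, strerror(-rc));
	return 1;
}
```

A caller would write something like
report_failure(fake_link_up(ifi), "bring up interface", msg, sizeof(msg))
and act on the return value.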

Link: https://bugs.passt.top/show_bug.cgi?id=60

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 26 +++++++++++++++++---------
 netlink.h | 10 +++++-----
 pasta.c   | 42 ++++++++++++++++++++++++++++++++----------
 3 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/netlink.c b/netlink.c
index 1d40856..4932f07 100644
--- a/netlink.c
+++ b/netlink.c
@@ -354,8 +354,10 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
  * @ifi:	Interface index in target namespace
  * @af:		Address family
  * @gw:		Default gateway to set
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
+int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -413,7 +415,7 @@ void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		req.set.r4.rta_gw.rta_len = rta_len;
 	}
 
-	nl_do(s, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
+	return nl_do(s, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -559,9 +561,11 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
  * @af:		Address family
  * @addr:	Global address to set
  * @prefix_len:	Mask or prefix length to set
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
-		 void *addr, int prefix_len)
+int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
+		void *addr, int prefix_len)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -614,7 +618,7 @@ void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
 	}
 
-	nl_do(s, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
+	return nl_do(s, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
 }
 
 /**
@@ -714,8 +718,10 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
  * @ns:		Use netlink socket in namespace
  * @ifi:	Interface index
  * @mac:	MAC address to set
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_link_set_mac(int s, unsigned int ifi, void *mac)
+int nl_link_set_mac(int s, unsigned int ifi, void *mac)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -731,7 +737,7 @@ void nl_link_set_mac(int s, unsigned int ifi, void *mac)
 
 	memcpy(req.mac, mac, ETH_ALEN);
 
-	nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
+	return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
 }
 
 /**
@@ -739,8 +745,10 @@ void nl_link_set_mac(int s, unsigned int ifi, void *mac)
  * @s:		Netlink socket
  * @ifi:	Interface index
  * @mtu:	If non-zero, set interface MTU
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_link_up(int s, unsigned int ifi, int mtu)
+int nl_link_up(int s, unsigned int ifi, int mtu)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -762,5 +770,5 @@ void nl_link_up(int s, unsigned int ifi, int mtu)
 		/* Shorten request to drop MTU attribute */
 		len = offsetof(struct req_t, rta);
 
-	nl_do(s, &req, RTM_NEWLINK, 0, len);
+	return nl_do(s, &req, RTM_NEWLINK, 0, len);
 }
diff --git a/netlink.h b/netlink.h
index 5ca17c6..977244b 100644
--- a/netlink.h
+++ b/netlink.h
@@ -12,17 +12,17 @@ extern int nl_sock_ns;
 void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(int s, sa_family_t af);
 void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw);
-void nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw);
+int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw);
 void nl_route_dup(int s_src, unsigned int ifi_src,
 		  int s_dst, unsigned int ifi_dst, sa_family_t af);
 void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		 void *addr, int *prefix_len, void *addr_l);
-void nl_addr_set(int s, unsigned int ifi, sa_family_t af,
-		 void *addr, int prefix_len);
+int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
+		void *addr, int prefix_len);
 void nl_addr_dup(int s_src, unsigned int ifi_src,
 		 int s_dst, unsigned int ifi_dst, sa_family_t af);
 void nl_link_get_mac(int s, unsigned int ifi, void *mac);
-void nl_link_set_mac(int s, unsigned int ifi, void *mac);
-void nl_link_up(int s, unsigned int ifi, int mtu);
+int nl_link_set_mac(int s, unsigned int ifi, void *mac);
+int nl_link_up(int s, unsigned int ifi, int mtu);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index 3380475..ed6fda3 100644
--- a/pasta.c
+++ b/pasta.c
@@ -272,49 +272,71 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
  */
 void pasta_ns_conf(struct ctx *c)
 {
-	nl_link_up(nl_sock_ns, 1 /* lo */, 0);
+	int rc = 0;
+
+	rc = nl_link_up(nl_sock_ns, 1 /* lo */, 0);
+	if (rc < 0)
+		die("Couldn't bring up loopback interface in namespace: %s",
+		    strerror(-rc));
 
 	/* Get or set guest MAC */
 	if (MAC_IS_ZERO(c->mac_guest))
 		nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
 	else
-		nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
+		rc = nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
+	if (rc < 0)
+		die("Couldn't set MAC address in namespace: %s",
+		    strerror(-rc));
 
 	if (c->pasta_conf_ns) {
 		nl_link_up(nl_sock_ns, c->pasta_ifi, c->mtu);
 
 		if (c->ifi4) {
 			if (c->no_copy_addrs)
-				nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET,
-					    &c->ip4.addr, c->ip4.prefix_len);
+				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
+						 AF_INET,
+						 &c->ip4.addr,
+						 c->ip4.prefix_len);
 			else
 				nl_addr_dup(nl_sock, c->ifi4,
 					    nl_sock_ns, c->pasta_ifi, AF_INET);
+			if (rc < 0)
+				die("Couldn't set IPv4 address(es) in namespace: %s",
+				    strerror(-rc));
 
 			if (c->no_copy_routes)
-				nl_route_set_def(nl_sock_ns, c->pasta_ifi,
-						 AF_INET, &c->ip4.gw);
+				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
+						      AF_INET, &c->ip4.gw);
 			else
 				nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
 					     c->pasta_ifi, AF_INET);
+			if (rc < 0)
+				die("Couldn't set IPv4 route(s) in guest: %s",
+				    strerror(-rc));
 		}
 
 		if (c->ifi6) {
 			if (c->no_copy_addrs)
-				nl_addr_set(nl_sock_ns, c->pasta_ifi,
-					    AF_INET6, &c->ip6.addr, 64);
+				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
+						 AF_INET6, &c->ip6.addr, 64);
 			else
 				nl_addr_dup(nl_sock, c->ifi4,
 					    nl_sock_ns, c->pasta_ifi,
 					    AF_INET6);
+			if (rc < 0)
+				die("Couldn't set IPv6 address(es) in namespace: %s",
+				    strerror(-rc));
 
 			if (c->no_copy_routes)
-				nl_route_set_def(nl_sock_ns, c->pasta_ifi,
-						 AF_INET6, &c->ip6.gw);
+				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
+						      AF_INET6, &c->ip6.gw);
 			else
 				nl_route_dup(nl_sock, c->ifi6,
 					     nl_sock_ns, c->pasta_ifi,
 					     AF_INET6);
+			if (rc < 0)
+				die("Couldn't set IPv6 route(s) in guest: %s",
+				    strerror(-rc));
 		}
 	}
 
-- 
2.41.0



* [PATCH 15/17] netlink: Always process all responses to a netlink request
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (13 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 14/17] netlink: Propagate errors for "set" operations David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 16/17] netlink: Propagate errors for "dump" operations David Gibson
  2023-07-24  6:09 ` [PATCH 17/17] netlink: Propagate errors for "dup" operations David Gibson
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

A single netlink request can result in multiple response datagrams.  We
process multiple response datagrams in some circumstances, but there are
cases where we exit early and will leave remaining datagrams in the queue.
These will be flushed in nl_send() before we send another request.

This is confusing, and not what we need to reliably check for errors from
netlink operations.  So, instead, make sure we always process all the
response datagrams whenever we send a request (excepting fatal errors).

In most cases this is just a matter of avoiding early exits from nl_foreach
loops.  nl_route_dup() is a bit trickier, because we need to retain all the
routes we're going to try to copy in a single buffer.  Here we instead use
a secondary buffer to flush any remaining datagrams, and report an error
if there are any additional routes in those datagrams.
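
Avoiding early exits amounts to replacing "return on first match" with a
flag that makes later iterations consume messages without acting on them.
A minimal sketch of that shape follows; struct queue and
first_even_drain() are hypothetical stand-ins for a netlink socket and a
"get" operation, chosen only so the sketch is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response queue standing in for a netlink socket */
struct queue {
	const int *vals;	/* Pending response values */
	size_t len;		/* Total number of responses */
	size_t pos;		/* Index of next unread response */
};

/* Return the first even value (0 if there is none), but keep reading
 * until the queue is empty so nothing is left for the next request.
 */
static int first_even_drain(struct queue *q)
{
	bool found = false;
	int result = 0;

	assert(q != NULL);

	while (q->pos < q->len) {
		int v = q->vals[q->pos++];

		if (found || v % 2)
			continue;	/* Consume the response, but ignore it */

		result = v;
		found = true;
	}

	return result;
}
```

The cost is a few extra iterations after the match; the benefit is that
the queue is always empty when the function returns.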

Link: https://bugs.passt.top/show_bug.cgi?id=67
Link: https://bugs.passt.top/show_bug.cgi?id=60

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/netlink.c b/netlink.c
index 4932f07..c57ee70 100644
--- a/netlink.c
+++ b/netlink.c
@@ -116,24 +116,9 @@ fail:
 static uint16_t nl_send(int s, void *req, uint16_t type,
 		       uint16_t flags, ssize_t len)
 {
-	char flush[NLBUFSIZ];
 	struct nlmsghdr *nh;
-	int done = 0;
 	ssize_t n;
 
-	while (!done && (n = recv(s, flush, sizeof(flush), MSG_DONTWAIT)) > 0) {
-		size_t nm = n;
-
-		for (nh = (struct nlmsghdr *)flush;
-		     NLMSG_OK(nh, nm); nh = NLMSG_NEXT(nh, nm)) {
-			if (nh->nlmsg_type == NLMSG_DONE ||
-			    nh->nlmsg_type == NLMSG_ERROR) {
-				done = 1;
-				break;
-			}
-		}
-	}
-
 	nh = (struct nlmsghdr *)req;
 	nh->nlmsg_type = type;
 	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
@@ -269,6 +254,7 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 		.rtm.rtm_type	 = RTN_UNICAST,
 		.rtm.rtm_family	 = af,
 	};
+	unsigned int ifi = 0;
 	struct nlmsghdr *nh;
 	struct rtattr *rta;
 	char buf[NLBUFSIZ];
@@ -280,23 +266,19 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) {
 		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
 
-		if (rtm->rtm_dst_len || rtm->rtm_family != af)
+		if (ifi || rtm->rtm_dst_len || rtm->rtm_family != af)
 			continue;
 
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			unsigned int ifi;
-
 			if (rta->rta_type != RTA_OIF)
 				continue;
 
 			ifi = *(unsigned int *)RTA_DATA(rta);
-
-			return ifi;
 		}
 	}
 
-	return 0;
+	return ifi;
 }
 
 /**
@@ -324,6 +306,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		.ifi		  = ifi,
 	};
 	struct nlmsghdr *nh;
+	bool found = false;
 	char buf[NLBUFSIZ];
 	ssize_t status;
 	uint16_t seq;
@@ -334,7 +317,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 		struct rtattr *rta;
 		size_t na;
 
-		if (rtm->rtm_dst_len)
+		if (found || rtm->rtm_dst_len)
 			continue;
 
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
@@ -343,7 +326,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 				continue;
 
 			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
-			return;
+			found = true;
 		}
 	}
 }
@@ -477,6 +460,22 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
+	if (!NLMSG_OK(nh, status) || status > 0) {
+		/* Process any remaining datagrams in a different
+		 * buffer so we don't overwrite the first one.
+		 */
+		char tail[NLBUFSIZ];
+		unsigned extra = 0;
+
+		nl_foreach_oftype(nh, status, s_src, tail, seq, RTM_NEWROUTE)
+			extra++;
+
+		if (extra) {
+			err("netlink: Too many routes to duplicate");
+			return;
+		}
+	}
+
 	/* Routes might have dependencies between each other, and the
 	 * kernel processes RTM_NEWROUTE messages sequentially. For n
 	 * routes, we might need to send the requests up to n times to
@@ -707,7 +706,6 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 				continue;
 
 			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
-			break;
 		}
 	}
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 16/17] netlink: Propagate errors for "dump" operations
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (14 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 15/17] netlink: Always process all responses to a netlink request David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-07-24  6:09 ` [PATCH 17/17] netlink: Propagate errors for "dup" operations David Gibson
  16 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Currently if we receive any netlink errors while discovering network
configuration from the host, we'll just ignore it and carry on.  This
might lead to cryptic error messages later on, or even silent
misconfiguration.

We now have the mechanisms to detect errors from get/dump netlink
operations.  Propagate these errors up to the callers and report them usefully.
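
As a rough illustration of the convention used here (0 on success, a
negative errno value on failure, reported by the caller with
strerror(-rc)), here is a minimal standalone sketch.  fake_nl_get() and
discover() are hypothetical stand-ins, not functions from passt:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a netlink "get" helper: returns 0 on
 * success, or a negative errno value on failure.
 */
static int fake_nl_get(int simulated_errno)
{
	return simulated_errno ? -simulated_errno : 0;
}

/* Hypothetical caller: format an error message and return 0 on failure,
 * 1 on success.  The message is left empty if nothing went wrong.
 */
static unsigned int discover(int simulated_errno, char *msg, size_t len)
{
	int rc = fake_nl_get(simulated_errno);

	assert(len > 0);
	msg[0] = '\0';

	if (rc < 0) {
		/* strerror() expects a positive errno value, hence -rc */
		snprintf(msg, len, "Couldn't discover MAC: %s", strerror(-rc));
		return 0;
	}

	return 1;
}
```

The point of returning the negated errno rather than -1 is that the
caller can print the specific reason without relying on the global errno
still being intact.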

Link: https://bugs.passt.top/show_bug.cgi?id=60

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    | 65 +++++++++++++++++++++++++++++++++++++++++++------------
 netlink.c | 19 ++++++++++++----
 netlink.h |  8 +++----
 3 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/conf.c b/conf.c
index 2e6e03f..14ec9f3 100644
--- a/conf.c
+++ b/conf.c
@@ -647,12 +647,24 @@ static unsigned int conf_ip4(unsigned int ifi,
 		return 0;
 	}
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
-		nl_route_get_def(nl_sock, ifi, AF_INET, &ip4->gw);
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw)) {
+		int rc = nl_route_get_def(nl_sock, ifi, AF_INET, &ip4->gw);
+		if (rc < 0) {
+			err("Couldn't discover IPv4 gateway address: %s",
+			    strerror(-rc));
+			return 0;
+		}
+	}
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
-		nl_addr_get(nl_sock, ifi, AF_INET,
-			    &ip4->addr, &ip4->prefix_len, NULL);
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
+		int rc = nl_addr_get(nl_sock, ifi, AF_INET,
+				     &ip4->addr, &ip4->prefix_len, NULL);
+		if (rc < 0) {
+			err("Couldn't discover IPv4 address: %s",
+			    strerror(-rc));
+			return 0;
+		}
+	}
 
 	if (!ip4->prefix_len) {
 		in_addr_t addr = ntohl(ip4->addr.s_addr);
@@ -668,8 +680,15 @@ static unsigned int conf_ip4(unsigned int ifi,
 
 	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
 
-	if (MAC_IS_ZERO(mac))
-		nl_link_get_mac(nl_sock, ifi, mac);
+	if (MAC_IS_ZERO(mac)) {
+		int rc = nl_link_get_mac(nl_sock, ifi, mac);
+		if (rc < 0) {
+			char ifname[IFNAMSIZ];
+			err("Couldn't discover MAC for %s: %s",
+			    if_indextoname(ifi, ifname), strerror(-rc));
+			return 0;
+		}
+	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
 	    MAC_IS_ZERO(mac))
@@ -690,6 +709,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 			     struct ip6_ctx *ip6, unsigned char *mac)
 {
 	int prefix_len = 0;
+	int rc;
 
 	if (!ifi)
 		ifi = nl_get_ext_if(nl_sock, AF_INET6);
@@ -699,18 +719,35 @@ static unsigned int conf_ip6(unsigned int ifi,
 		return 0;
 	}
 
-	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
-		nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->gw);
+	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw)) {
+		rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->gw);
+		if (rc < 0) {
+			err("Couldn't discover IPv6 gateway address: %s",
+			    strerror(-rc));
+			return 0;
+		}
+	}
 
-	nl_addr_get(nl_sock, ifi, AF_INET6,
-		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
-		    &prefix_len, &ip6->addr_ll);
+	rc = nl_addr_get(nl_sock, ifi, AF_INET6,
+			 IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
+			 &prefix_len, &ip6->addr_ll);
+	if (rc < 0) {
+		err("Couldn't discover IPv6 address: %s", strerror(-rc));
+		return 0;
+	}
 
 	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
 	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
 
-	if (MAC_IS_ZERO(mac))
-		nl_link_get_mac(nl_sock, ifi, mac);
+	if (MAC_IS_ZERO(mac)) {
+		rc = nl_link_get_mac(nl_sock, ifi, mac);
+		if (rc < 0) {
+			char ifname[IFNAMSIZ];
+			err("Couldn't discover MAC for %s: %s",
+			    if_indextoname(ifi, ifname), strerror(-rc));
+			return 0;
+		}
+	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
 	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
diff --git a/netlink.c b/netlink.c
index c57ee70..9e72b16 100644
--- a/netlink.c
+++ b/netlink.c
@@ -277,6 +277,8 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
 			ifi = *(unsigned int *)RTA_DATA(rta);
 		}
 	}
+	if (status < 0)
+		warn("netlink: RTM_GETROUTE failed: %s", strerror(-status));
 
 	return ifi;
 }
@@ -287,8 +289,10 @@ unsigned int nl_get_ext_if(int s, sa_family_t af)
  * @ifi:	Interface index
  * @af:		Address family
  * @gw:		Default gateway to fill on NL_GET
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
+int nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -329,6 +333,7 @@ void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
 			found = true;
 		}
 	}
+	return status;
 }
 
 /**
@@ -508,9 +513,11 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
  * @addr:	Global address to fill
  * @prefix_len:	Mask or prefix length, to fill (for IPv4)
  * @addr_l:	Link-scoped address to fill (for IPv6)
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
-		 void *addr, int *prefix_len, void *addr_l)
+int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
+		void *addr, int *prefix_len, void *addr_l)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -551,6 +558,7 @@ void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
 		}
 	}
+	return status;
 }
 
 /**
@@ -678,8 +686,10 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
  * @s:		Netlink socket
  * @ifi:	Interface index
  * @mac:	Fill with current MAC address
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_link_get_mac(int s, unsigned int ifi, void *mac)
+int nl_link_get_mac(int s, unsigned int ifi, void *mac)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -708,6 +718,7 @@ void nl_link_get_mac(int s, unsigned int ifi, void *mac)
 			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
 		}
 	}
+	return status;
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index 977244b..b831405 100644
--- a/netlink.h
+++ b/netlink.h
@@ -11,17 +11,17 @@ extern int nl_sock_ns;
 
 void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(int s, sa_family_t af);
-void nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw);
+int nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw);
 int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw);
 void nl_route_dup(int s_src, unsigned int ifi_src,
 		  int s_dst, unsigned int ifi_dst, sa_family_t af);
-void nl_addr_get(int s, unsigned int ifi, sa_family_t af,
-		 void *addr, int *prefix_len, void *addr_l);
+int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
+		void *addr, int *prefix_len, void *addr_l);
 int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int prefix_len);
 void nl_addr_dup(int s_src, unsigned int ifi_src,
 		 int s_dst, unsigned int ifi_dst, sa_family_t af);
-void nl_link_get_mac(int s, unsigned int ifi, void *mac);
+int nl_link_get_mac(int s, unsigned int ifi, void *mac);
 int nl_link_set_mac(int s, unsigned int ifi, void *mac);
 int nl_link_up(int s, unsigned int ifi, int mtu);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 17/17] netlink: Propagate errors for "dup" operations
  2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
                   ` (15 preceding siblings ...)
  2023-07-24  6:09 ` [PATCH 16/17] netlink: Propagate errors for "dump" operations David Gibson
@ 2023-07-24  6:09 ` David Gibson
  2023-08-02 22:48   ` Stefano Brivio
  16 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-07-24  6:09 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

We now detect errors on netlink "set" operations while configuring the
pasta namespace with --config-net.  However in many cases rather than
a simple "set" we use a more complex "dup" function to copy
configuration from the host to the namespace.  We're not yet properly
detecting and reporting netlink errors for that case.

Change the "dup" operations to propagate netlink errors to their
caller, pasta_ns_conf() and report them there.
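
For the route case, the diff below tolerates -ENETUNREACH and -EEXIST
from individual requests, because routes may depend on each other and
are re-sent over several passes.  A minimal standalone sketch of that
filtering follows; dup_routes(), demo_add() and struct demo_state are
hypothetical names used only for illustration, with demo_add() standing
in for a call to nl_do():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type standing in for nl_do() on one route */
typedef int (*add_route_fn)(size_t idx, void *arg);

/* Try to add n routes, making up to n passes so that routes depending
 * on other routes eventually succeed.  "Not reachable yet" and "already
 * there" are expected along the way; anything else is a real error.
 */
static int dup_routes(size_t n, add_route_fn add, void *arg)
{
	size_t pass, i;

	for (pass = 0; pass < n; pass++) {
		for (i = 0; i < n; i++) {
			int rc = add(i, arg);

			if (rc < 0 && rc != -ENETUNREACH && rc != -EEXIST)
				return rc;
		}
	}

	return 0;
}

/* Simulated destination: route 0 needs route 1 to be installed first */
struct demo_state {
	bool installed[2];
	int fail_idx;		/* index that fails with -EPERM, or -1 */
};

static int demo_add(size_t idx, void *arg)
{
	struct demo_state *st = arg;

	if ((int)idx == st->fail_idx)
		return -EPERM;
	if (st->installed[idx])
		return -EEXIST;
	if (idx == 0 && !st->installed[1])
		return -ENETUNREACH;

	st->installed[idx] = true;
	return 0;
}
```

In the simulated case, route 0 fails with -ENETUNREACH on the first
pass, route 1 is installed, and route 0 then succeeds on the second
pass, while route 1 reports -EEXIST and is skipped.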

Link: https://bugs.passt.top/show_bug.cgi?id=60

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 40 ++++++++++++++++++++++++++++------------
 netlink.h |  8 ++++----
 pasta.c   | 15 ++++++++-------
 3 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/netlink.c b/netlink.c
index 9e72b16..cdc18c0 100644
--- a/netlink.c
+++ b/netlink.c
@@ -413,9 +413,11 @@ int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
  * @s_dst:	Netlink socket in destination namespace
  * @ifi_dst:	Interface index in destination namespace
  * @af:		Address family
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_route_dup(int s_src, unsigned int ifi_src,
-		  int s_dst, unsigned int ifi_dst, sa_family_t af)
+int nl_route_dup(int s_src, unsigned int ifi_src,
+		 int s_dst, unsigned int ifi_dst, sa_family_t af)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -477,9 +479,11 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 
 		if (extra) {
 			err("netlink: Too many routes to duplicate");
-			return;
+			return -E2BIG;
 		}
 	}
+	if (status < 0)
+		return status;
 
 	/* Routes might have dependencies between each other, and the
 	 * kernel processes RTM_NEWROUTE messages sequentially. For n
@@ -494,15 +498,20 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
 		     NLMSG_OK(nh, status);
 		     nh = NLMSG_NEXT(nh, status)) {
 			uint16_t flags = nh->nlmsg_flags;
+			int rc;
 
 			if (nh->nlmsg_type != RTM_NEWROUTE)
 				continue;
 
-			nl_do(s_dst, nh, RTM_NEWROUTE,
-			       (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
-			       nh->nlmsg_len);
+			rc = nl_do(s_dst, nh, RTM_NEWROUTE,
+				   (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
+				   nh->nlmsg_len);
+			if (rc < 0 && rc != -ENETUNREACH && rc != -EEXIST)
+				return rc;
 		}
 	}
+
+	return 0;
 }
 
 /**
@@ -635,9 +644,11 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
  * @s_dst:	Netlink socket in destination network namespace
  * @ifi_dst:	Interface index in destination namespace
  * @af:		Address family
+ *
+ * Return: 0 on success, negative error code on failure
  */
-void nl_addr_dup(int s_src, unsigned int ifi_src,
-		 int s_dst, unsigned int ifi_dst, sa_family_t af)
+int nl_addr_dup(int s_src, unsigned int ifi_src,
+		int s_dst, unsigned int ifi_dst, sa_family_t af)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -651,6 +662,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 	struct nlmsghdr *nh;
 	ssize_t status;
 	uint16_t seq;
+	int rc = 0;
 
 	seq = nl_send(s_src, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
 	nl_foreach_oftype(nh, status, s_src, buf, seq, RTM_NEWADDR) {
@@ -663,7 +675,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
 
-		if (ifa->ifa_scope == RT_SCOPE_LINK ||
+		if (rc < 0 || ifa->ifa_scope == RT_SCOPE_LINK ||
 		    ifa->ifa_index != ifi_src)
 			continue;
 
@@ -675,10 +687,14 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
 				rta->rta_type = IFA_UNSPEC;
 		}
 
-		nl_do(s_dst, nh, RTM_NEWADDR,
-		       (nh->nlmsg_flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
-		       nh->nlmsg_len);
+		rc = nl_do(s_dst, nh, RTM_NEWADDR,
+			   (nh->nlmsg_flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
+			   nh->nlmsg_len);
 	}
+	if (status < 0)
+		return status;
+
+	return rc;
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index b831405..9f4f8f4 100644
--- a/netlink.h
+++ b/netlink.h
@@ -13,14 +13,14 @@ void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(int s, sa_family_t af);
 int nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw);
 int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw);
-void nl_route_dup(int s_src, unsigned int ifi_src,
-		  int s_dst, unsigned int ifi_dst, sa_family_t af);
+int nl_route_dup(int s_src, unsigned int ifi_src,
+		 int s_dst, unsigned int ifi_dst, sa_family_t af);
 int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l);
 int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int prefix_len);
-void nl_addr_dup(int s_src, unsigned int ifi_src,
-		 int s_dst, unsigned int ifi_dst, sa_family_t af);
+int nl_addr_dup(int s_src, unsigned int ifi_src,
+		int s_dst, unsigned int ifi_dst, sa_family_t af);
 int nl_link_get_mac(int s, unsigned int ifi, void *mac);
 int nl_link_set_mac(int s, unsigned int ifi, void *mac);
 int nl_link_up(int s, unsigned int ifi, int mtu);
diff --git a/pasta.c b/pasta.c
index ed6fda3..b94423c 100644
--- a/pasta.c
+++ b/pasta.c
@@ -298,8 +298,9 @@ void pasta_ns_conf(struct ctx *c)
 						 &c->ip4.addr,
 						 c->ip4.prefix_len);
 			else
-				nl_addr_dup(nl_sock, c->ifi4,
-					    nl_sock_ns, c->pasta_ifi, AF_INET);
+				rc = nl_addr_dup(nl_sock, c->ifi4,
+						 nl_sock_ns, c->pasta_ifi,
+						 AF_INET);
 			if (rc < 0)
 				die("Couldn't set IPv4 address(es) in namespace: %s",
 				    strerror(-rc));
@@ -308,7 +309,7 @@ void pasta_ns_conf(struct ctx *c)
 				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
 						      AF_INET, &c->ip4.gw);
 			else
-				nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
+				rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
 					     c->pasta_ifi, AF_INET);
 			if (rc < 0)
 				die("Couldn't set IPv4 route(s) in guest: %s",
@@ -320,9 +321,9 @@ void pasta_ns_conf(struct ctx *c)
 				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
 						 AF_INET6, &c->ip6.addr, 64);
 			else
-				nl_addr_dup(nl_sock, c->ifi4,
-					    nl_sock_ns, c->pasta_ifi,
-					    AF_INET6);
+				rc = nl_addr_dup(nl_sock, c->ifi4,
+						 nl_sock_ns, c->pasta_ifi,
+						 AF_INET6);
 			if (rc < 0)
 				die("Couldn't set IPv6 address(es) in namespace: %s",
 				    strerror(-rc));
@@ -331,7 +332,7 @@ void pasta_ns_conf(struct ctx *c)
 				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
 						      AF_INET6, &c->ip6.gw);
 			else
-				nl_route_dup(nl_sock, c->ifi6,
+				rc = nl_route_dup(nl_sock, c->ifi6,
 					     nl_sock_ns, c->pasta_ifi,
 					     AF_INET6);
 			if (rc < 0)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/17] netlink: Split up functionality if nl_link()
  2023-07-24  6:09 ` [PATCH 01/17] netlink: Split up functionality if nl_link() David Gibson
@ 2023-08-02 22:47   ` Stefano Brivio
  2023-08-03  2:09     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:47 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

In the subject: s/if/of/.

On Mon, 24 Jul 2023 16:09:20 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> nl_link() performs a number of functions: it can bring links up, set MAC
> address and MTU and also retrieve the existing MAC.  This makes for a small
> number of lines of code, but high conceptual complexity: it's quite hard
> to follow what's going on both in nl_link() itself and it's also not very
> obvious which function its callers are intending to use.

Actually I don't find nl_link() *that* bad, but for consistency with the
next patches this definitely makes sense.

> Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
> and nl_link_get_mac().  The first brings up a link, optionally setting the
> MTU, the others get or set the MAC address.
> 
> This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
> intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
> However, it only actually does so in the !c->pasta_conf_ns case: the fact
> that we set up==1 means we would only ever set, never get, the MAC in the
> nl_link() call in the other path.  We get away with this because the MAC
> will quickly be discovered once we receive packets on the tap interface.
> Still, it's neater to always get the MAC address here.

Actually, the intention wasn't to always retrieve the namespaced MAC
address: I thought I'd do that only if we don't configure the
interface, because we want NDP and DHCP to be "ready". But that's not
really relevant... I guess yes, it's more consistent if we fetch it in
any case (as long as we don't configure it).

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c    |   4 +-
>  netlink.c | 143 +++++++++++++++++++++++++++++++-----------------------
>  netlink.h |   4 +-
>  pasta.c   |  12 +++--
>  4 files changed, 96 insertions(+), 67 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 78eaf2d..2ff9e2a 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -670,7 +670,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
>  
>  	if (MAC_IS_ZERO(mac))
> -		nl_link(0, ifi, mac, 0, 0);
> +		nl_link_get_mac(0, ifi, mac);
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
>  	    MAC_IS_ZERO(mac))
> @@ -711,7 +711,7 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
>  
>  	if (MAC_IS_ZERO(mac))
> -		nl_link(0, ifi, mac, 0, 0);
> +		nl_link_get_mac(0, ifi, mac);
>  
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
>  	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
> diff --git a/netlink.c b/netlink.c
> index e15e23f..4b1f75e 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -486,83 +486,44 @@ next:
>  }
>  
>  /**
> - * nl_link() - Get/set link attributes
> + * nl_link_get_mac() - Get link MAC address
>   * @ns:		Use netlink socket in namespace
>   * @ifi:	Interface index
> - * @mac:	MAC address to fill, if passed as zero, to set otherwise
> - * @up:		If set, bring up the link
> - * @mtu:	If non-zero, set interface MTU
> + * @mac:	Fill with current MAC address
>   */
> -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> +void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
>  {
> -	int change = !MAC_IS_ZERO(mac) || up || mtu;
>  	struct req_t {
>  		struct nlmsghdr nlh;
>  		struct ifinfomsg ifm;
> -		struct rtattr rta;
> -		union {
> -			unsigned char mac[ETH_ALEN];
> -			struct {
> -				unsigned int mtu;
> -			} mtu;
> -		} set;
>  	} req = {
> -		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
> -		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> -		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
> +		.nlh.nlmsg_type	  = RTM_GETLINK,
> +		.nlh.nlmsg_len	  = sizeof(req),

I don't think there's a practical issue with this, but there were two
reasons why I used NLMSG_LENGTH(sizeof(struct ifinfomsg)) instead:

- NLMSG_LENGTH() aligns to 4 bytes, not to whatever
  architecture-dependent alignment we might have: the message might
  actually be smaller

- I see that this works with gcc and clang, but, strictly
  speaking, is the size of the struct known "before"
  (sequence-point-wise) we're done initialising it? I have a very vague
  memory of this not working with gcc 2.9 or suchlike -- which is not a
  problem, as long as our new friend C11 actually supports this (but
  I'm not entirely sure).

Then, in 9/17, NLMSG_LENGTH() could be conveniently used by nl_req().
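
To make the first point concrete, here is a small sketch comparing the
two ways of computing the length, using a request laid out like the
set-MAC one further down.  struct set_mac_req and the two helpers are
hypothetical names for illustration only, and the padding noted in the
comments assumes a common ABI where the struct is 4-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_ether.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout, modelled on the set-MAC request */
struct set_mac_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	struct rtattr rta;
	unsigned char mac[ETH_ALEN];
};

/* Length as the compiler sees it, including any trailing struct padding */
static size_t req_len_sizeof(void)
{
	return sizeof(struct set_mac_req);
}

/* Length built from the netlink macros: aligned header plus ifinfomsg,
 * followed by a single attribute carrying ETH_ALEN bytes of payload
 */
static size_t req_len_macros(void)
{
	return NLMSG_LENGTH(sizeof(struct ifinfomsg)) + RTA_LENGTH(ETH_ALEN);
}
```

With the usual 16-byte nlmsghdr and ifinfomsg and a 4-byte rtattr, the
macro version comes to 42 bytes, while sizeof() is typically rounded up
to 44 by trailing padding.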

> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
>  		.nlh.nlmsg_seq	  = nl_seq++,
>  		.ifm.ifi_family	  = AF_UNSPEC,
>  		.ifm.ifi_index	  = ifi,
> -		.ifm.ifi_flags	  = up ? IFF_UP : 0,
> -		.ifm.ifi_change	  = up ? IFF_UP : 0,
>  	};
> -	struct ifinfomsg *ifm;
>  	struct nlmsghdr *nh;
> -	struct rtattr *rta;
>  	char buf[NLBUFSIZ];
>  	ssize_t n;
> -	size_t na;
> -
> -	if (!MAC_IS_ZERO(mac)) {
> -		req.nlh.nlmsg_len = sizeof(req);
> -		memcpy(req.set.mac, mac, ETH_ALEN);
> -		req.rta.rta_type = IFLA_ADDRESS;
> -		req.rta.rta_len = RTA_LENGTH(ETH_ALEN);
> -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> -			return;
> -
> -		up = 0;
> -	}
> -
> -	if (mtu) {
> -		req.nlh.nlmsg_len = offsetof(struct req_t, set.mtu)
> -			+ sizeof(req.set.mtu);
> -		req.set.mtu.mtu = mtu;
> -		req.rta.rta_type = IFLA_MTU;
> -		req.rta.rta_len = RTA_LENGTH(sizeof(unsigned int));
> -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> -			return;
> -
> -		up = 0;
> -	}
> -
> -	if (up && nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> -		return;
> -
> -	if (change)
> -		return;
>  
> -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0)
> +	n = nl_req(ns, buf, &req, sizeof(req));
> +	if (n < 0)
>  		return;
> +	
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> +	     nh = NLMSG_NEXT(nh, n)) {
> +		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> +		struct rtattr *rta;
> +		size_t na;
>  
> -	nh = (struct nlmsghdr *)buf;
> -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
>  		if (nh->nlmsg_type != RTM_NEWLINK)
> -			goto next;
> -
> -		ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> +			continue;
>  
> -		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> +		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
> +		     RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
>  			if (rta->rta_type != IFLA_ADDRESS)
>  				continue;
> @@ -570,8 +531,70 @@ void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
>  			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
>  			break;
>  		}
> -next:
> -		if (nh->nlmsg_type == NLMSG_DONE)
> -			break;
>  	}
>  }
> +
> +/**
> + * nl_link_set_mac() - Set link MAC address
> + * @ns:		Use netlink socket in namespace
> + * @ifi:	Interface index
> + * @mac:	MAC address to set
> + */
> +void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct ifinfomsg ifm;
> +		struct rtattr rta;
> +		unsigned char mac[ETH_ALEN];
> +	} req = {
> +		.nlh.nlmsg_type	  = RTM_NEWLINK,
> +		.nlh.nlmsg_len	  = sizeof(req),

Same here.

> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> +		.nlh.nlmsg_seq	  = nl_seq++,
> +		.ifm.ifi_family	  = AF_UNSPEC,
> +		.ifm.ifi_index	  = ifi,
> +		.rta.rta_type	  = IFLA_ADDRESS,
> +		.rta.rta_len	  = RTA_LENGTH(ETH_ALEN),
> +	};
> +	char buf[NLBUFSIZ];
> +
> +	memcpy(req.mac, mac, ETH_ALEN);
> +
> +	nl_req(ns, buf, &req, sizeof(req));
> +}
> +
> +/**
> + * nl_link_up() - Bring link up
> + * @ns:		Use netlink socket in namespace
> + * @ifi:	Interface index
> + * @mtu:	If non-zero, set interface MTU
> + */
> +void nl_link_up(int ns, unsigned int ifi, int mtu)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct ifinfomsg ifm;
> +		struct rtattr rta;
> +		unsigned int mtu;
> +	} req = {
> +		.nlh.nlmsg_type   = RTM_NEWLINK,
> +		.nlh.nlmsg_len    = sizeof(req),

And here.

> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> +		.nlh.nlmsg_seq	  = nl_seq++,
> +		.ifm.ifi_family	  = AF_UNSPEC,
> +		.ifm.ifi_index	  = ifi,
> +		.ifm.ifi_flags	  = IFF_UP,
> +		.ifm.ifi_change	  = IFF_UP,
> +		.rta.rta_type	  = IFLA_MTU,
> +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> +		.mtu		  = mtu,
> +	};
> +	char buf[NLBUFSIZ];
> +
> +	if (!mtu)
> +		/* Shorten request to drop MTU attribute */
> +		req.nlh.nlmsg_len = offsetof(struct req_t, rta);

Pre-existing issue I see now: we should probably use NLMSG_LENGTH()
here, in any case.

> +
> +	nl_req(ns, buf, &req, req.nlh.nlmsg_len);
> +}
> diff --git a/netlink.h b/netlink.h
> index cd0e666..980ac44 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -18,6 +18,8 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  	      sa_family_t af, void *gw);
>  void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
> -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
> +void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
> +void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
> +void nl_link_up(int ns, unsigned int ifi, int mtu);
>  
>  #endif /* NETLINK_H */
> diff --git a/pasta.c b/pasta.c
> index 8c85546..3b5537d 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -272,13 +272,19 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
>   */
>  void pasta_ns_conf(struct ctx *c)
>  {
> -	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
> +	nl_link_up(1, 1 /* lo */, 0);
> +
> +	/* Get or set guest MAC */

I know it's called mac_guest, my bad, but what about "MAC address in
the target namespace"?

-- 
Stefano


* Re: [PATCH 03/17] netlink: Split nl_route() into separate operation functions
  2023-07-24  6:09 ` [PATCH 03/17] netlink: Split nl_route() " David Gibson
@ 2023-08-02 22:47   ` Stefano Brivio
  2023-08-03  2:18     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:47 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:22 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> nl_route() can perform 3 quite different operations based on the 'op'
> parameter.  Split this into separate functions for each one.  This requires
> more lines of code, but makes the internal logic of each operation much
> easier to follow.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c    |   4 +-
>  netlink.c | 238 ++++++++++++++++++++++++++++++++++--------------------
>  netlink.h |  11 +--
>  pasta.c   |  16 ++--
>  4 files changed, 164 insertions(+), 105 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 2057028..66958d4 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -648,7 +648,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	}
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
> -		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
> +		nl_route_get_def(ifi, AF_INET, &ip4->gw);
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
>  		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> @@ -699,7 +699,7 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	}
>  
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
> -		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
> +		nl_route_get_def(ifi, AF_INET6, &ip6->gw);
>  
>  	nl_addr_get(ifi, AF_INET6,
>  		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> diff --git a/netlink.c b/netlink.c
> index 269d738..346eb3a 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -185,15 +185,71 @@ unsigned int nl_get_ext_if(sa_family_t af)
>  }
>  
>  /**
> - * nl_route() - Get/set/copy routes for given interface and address family
> - * @op:		Requested operation
> - * @ifi:	Interface index in outer network namespace
> - * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
> + * nl_route_get_def() - Get default route for given interface and address family
> + * @ifi:	Interface index
> + * @af:		Address family
> + * @gw:		Default gateway to fill on NL_GET
> + */
> +void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct rtmsg rtm;
> +		struct rtattr rta;
> +		unsigned int ifi;
> +	} req = {
> +		.nlh.nlmsg_type	  = RTM_GETROUTE,
> +		.nlh.nlmsg_len	  = sizeof(req),
> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
> +		.nlh.nlmsg_seq	  = nl_seq++,
> +
> +		.rtm.rtm_family	  = af,
> +		.rtm.rtm_table	  = RT_TABLE_MAIN,
> +		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
> +		.rtm.rtm_type	  = RTN_UNICAST,
> +
> +		.rta.rta_type	  = RTA_OIF,
> +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> +		.ifi		  = ifi,
> +	};
> +	struct nlmsghdr *nh;
> +	char buf[NLBUFSIZ];
> +	ssize_t n;
> +
> +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
> +		return;
> +
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> +	     nh = NLMSG_NEXT(nh, n)) {
> +		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
> +		struct rtattr *rta;
> +		size_t na;
> +
> +		if (nh->nlmsg_type != RTM_NEWROUTE)
> +			continue;
> +
> +		if (rtm->rtm_dst_len)
> +			continue;
> +
> +		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> +		     rta = RTA_NEXT(rta, na)) {
> +			if (rta->rta_type != RTA_GATEWAY)
> +				continue;
> +
> +			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +			return;
> +		}
> +	}
> +}
> +
> +/**
> + * nl_route_set_def() - Set default route for given interface and address family
> + * @ifi:	Interface index in target namespace
>   * @af:		Address family
> - * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
> + * @gw:		Default gateway to set
>   */
> -void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> -	      sa_family_t af, void *gw)
> +void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
>  {
>  	struct req_t {
>  		struct nlmsghdr nlh;
> @@ -215,122 +271,126 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  			} r4;
>  		} set;
>  	} req = {
> -		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
> -		.nlh.nlmsg_flags  = NLM_F_REQUEST,
> +		.nlh.nlmsg_type	  = RTM_NEWROUTE,
> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK |
> +				    NLM_F_CREATE | NLM_F_EXCL,
>  		.nlh.nlmsg_seq	  = nl_seq++,
>  
>  		.rtm.rtm_family	  = af,
>  		.rtm.rtm_table	  = RT_TABLE_MAIN,
>  		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
>  		.rtm.rtm_type	  = RTN_UNICAST,
> +		.rtm.rtm_protocol = RTPROT_BOOT,
>  
>  		.rta.rta_type	  = RTA_OIF,
>  		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> -		.ifi		  = op == NL_SET ? ifi_ns : ifi,
> +		.ifi		  = ifi,
>  	};
> -	unsigned dup_routes = 0;
> -	ssize_t n, nlmsgs_size;
> -	struct nlmsghdr *nh;
> -	struct rtattr *rta;
>  	char buf[NLBUFSIZ];
> -	struct rtmsg *rtm;
> -	size_t na;
> -
> -	if (op == NL_SET) {
> -		if (af == AF_INET6) {
> -			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
>  
> -			req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
> -				+ sizeof(req.set.r6);
> +	if (af == AF_INET6) {
> +		size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
>  
> -			req.set.r6.rta_dst.rta_type = RTA_DST;
> -			req.set.r6.rta_dst.rta_len = rta_len;
> +		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
> +			+ sizeof(req.set.r6);
>  
> -			memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
> -			req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
> -			req.set.r6.rta_gw.rta_len = rta_len;
> -		} else {
> -			size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
> +		req.set.r6.rta_dst.rta_type = RTA_DST;
> +		req.set.r6.rta_dst.rta_len = rta_len;
>  
> -			req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
> -				+ sizeof(req.set.r4);
> +		memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
> +		req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
> +		req.set.r6.rta_gw.rta_len = rta_len;
> +	} else {
> +		size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
>  
> -			req.set.r4.rta_dst.rta_type = RTA_DST;
> -			req.set.r4.rta_dst.rta_len = rta_len;
> +		req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
> +			+ sizeof(req.set.r4);
>  
> -			req.set.r4.a = *(uint32_t *)gw;
> -			req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
> -			req.set.r4.rta_gw.rta_len = rta_len;
> -		}
> +		req.set.r4.rta_dst.rta_type = RTA_DST;
> +		req.set.r4.rta_dst.rta_len = rta_len;
>  
> -		req.rtm.rtm_protocol = RTPROT_BOOT;
> -		req.nlh.nlmsg_flags |= NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
> -	} else {
> -		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6);
> -		req.nlh.nlmsg_flags |= NLM_F_DUMP;
> +		req.set.r4.a = *(uint32_t *)gw;
> +		req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
> +		req.set.r4.rta_gw.rta_len = rta_len;
>  	}
>  
> -	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> -		return;
> +	nl_req(1, buf, &req, req.nlh.nlmsg_len);
> +}
>  
> -	if (op == NL_SET)
> +/**
> + * nl_route_dup() - Copy routes for given interface and address family
> + * @ifi:	Interface index in outer network namespace
> + * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
> + * @af:		Address family
> + */
> +void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct rtmsg rtm;
> +		struct rtattr rta;
> +		unsigned int ifi;
> +	} req = {
> +		.nlh.nlmsg_type	  = RTM_GETROUTE,
> +		.nlh.nlmsg_len	  = sizeof(req),
> +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
> +		.nlh.nlmsg_seq	  = nl_seq++,
> +
> +		.rtm.rtm_family	  = af,
> +		.rtm.rtm_table	  = RT_TABLE_MAIN,
> +		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
> +		.rtm.rtm_type	  = RTN_UNICAST,
> +
> +		.rta.rta_type	  = RTA_OIF,
> +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> +		.ifi		  = ifi,
> +	};
> +	char buf[NLBUFSIZ], resp[NLBUFSIZ];
> +	unsigned dup_routes = 0;
> +	ssize_t n, nlmsgs_size;
> +	struct nlmsghdr *nh;
> +	unsigned i;
> +
> +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
>  		return;
>  
> -	nh = (struct nlmsghdr *)buf;
>  	nlmsgs_size = n;
>  
> -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> -		if (nh->nlmsg_type != RTM_NEWROUTE)
> -			goto next;
> -
> -		if (op == NL_DUP) {
> -			nh->nlmsg_seq = nl_seq++;
> -			nh->nlmsg_pid = 0;
> -			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> -			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> -					   NLM_F_CREATE;
> -			dup_routes++;
> -		}
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> +	     nh = NLMSG_NEXT(nh, n)) {
> +		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
> +		struct rtattr *rta;
> +		size_t na;
>  
> -		rtm = (struct rtmsg *)NLMSG_DATA(nh);
> -		if (op == NL_GET && rtm->rtm_dst_len)
> +		if (nh->nlmsg_type != RTM_NEWROUTE)
>  			continue;
>  
> +		nh->nlmsg_seq = nl_seq++;
> +		nh->nlmsg_pid = 0;
> +		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> +		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> +			NLM_F_CREATE;
> +		dup_routes++;
> +
>  		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
> -			if (op == NL_GET) {
> -				if (rta->rta_type != RTA_GATEWAY)
> -					continue;
> -
> -				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> -				return;
> -			}
> -
> -			if (op == NL_DUP && rta->rta_type == RTA_OIF)
> +			if (rta->rta_type == RTA_OIF)
>  				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
>  		}
> -
> -next:
> -		if (nh->nlmsg_type == NLMSG_DONE)
> -			break;
>  	}
>  
> -	if (op == NL_DUP) {
> -		char resp[NLBUFSIZ];
> -		unsigned i;
> -
> -		nh = (struct nlmsghdr *)buf;
> -		/* Routes might have dependencies between each other, and the
> -		 * kernel processes RTM_NEWROUTE messages sequentially. For n
> -		 * valid routes, we might need to send up to n requests to get
> -		 * all of them inserted. Routes that have been already inserted
> -		 * won't cause the whole request to fail, so we can simply
> -		 * repeat the whole request. This approach avoids the need to
> -		 * calculate dependencies: let the kernel do that.
> -		 */
> -		for (i = 0; i < dup_routes; i++)
> -			nl_req(1, resp, nh, nlmsgs_size);
> -	}
> +	nh = (struct nlmsghdr *)buf;
> +	/* Routes might have dependencies between each other, and the
> +	 * kernel processes RTM_NEWROUTE messages sequentially. For n
> +	 * valid routes, we might need to send up to n requests to get
> +	 * all of them inserted. Routes that have been already
> +	 * inserted won't cause the whole request to fail, so we can
> +	 * simply repeat the whole request. This approach avoids the
> +	 * need to calculate dependencies: let the kernel do that.
> +	 */

Or:

	/* Routes might have dependencies between each other, and the kernel
	 * processes RTM_NEWROUTE messages sequentially. For n valid routes, we
	 * might need to send up to n requests to get all of them inserted.
	 * Routes that have been already inserted won't cause the whole request
	 * to fail, so we can simply repeat the whole request. This approach
	 * avoids the need to calculate dependencies: let the kernel do that.
	 */

(can also be "fixed" in 6/17).
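
As an aside, the reasoning in that comment can be checked with a toy
model. This is not netlink code: struct toy_route, toy_pass(), toy_dup()
and toy_worst_case() are all invented for the illustration, and "needs"
is a simplified stand-in for whatever makes the kernel reject a route
until another one exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: route i can be inserted only once route needs[i] is present,
 * -1 means no dependency.
 */
struct toy_route {
	int needs;
	bool inserted;
};

/* toy_pass() - One whole "request": try all routes sequentially, in order
 * Return: number of routes present after this pass
 */
static unsigned toy_pass(struct toy_route *r, unsigned n)
{
	unsigned i, present = 0;

	for (i = 0; i < n; i++) {
		assert(r[i].needs < (int)n);

		if (!r[i].inserted &&
		    (r[i].needs < 0 || r[r[i].needs].inserted))
			r[i].inserted = true;

		present += r[i].inserted;
	}

	return present;
}

/* toy_dup() - Repeat the whole request, without sorting dependencies
 * Return: number of routes present at the end
 */
static unsigned toy_dup(struct toy_route *r, unsigned n, unsigned passes)
{
	unsigned i, present = 0;

	for (i = 0; i < passes; i++)
		present = toy_pass(r, n);

	return present;
}

/* toy_worst_case() - Chain in the worst order: 0 needs 1, 1 needs 2 */
static unsigned toy_worst_case(unsigned passes)
{
	struct toy_route r[] = { { 1, false }, { 2, false }, { -1, false } };

	return toy_dup(r, 3, passes);
}
```

With the chain in reverse order, each pass gets exactly one more route
in, so three routes need three passes: that's the "up to n requests"
bound from the comment.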

-- 
Stefano


* Re: [PATCH 02/17] netlink: Split nl_addr() into separate operation functions
  2023-07-24  6:09 ` [PATCH 02/17] netlink: Split nl_addr() into separate operation functions David Gibson
@ 2023-08-02 22:47   ` Stefano Brivio
  2023-08-03  2:11     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:47 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:21 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> nl_addr() can perform three quite different operations based on the 'op'
> parameter, each of which uses a different subset of the parameters.  Split
> them up into a function for each operation.  This does use more lines of
> code, but the overlap wasn't that great, and the separated logic is much
> easier to follow.
> 
> It's also clearer in the callers what we expect the netlink operations to
> do, and what information it uses.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c    |  12 ++-
>  netlink.c | 232 ++++++++++++++++++++++++++++++++----------------------
>  netlink.h |   6 +-
>  pasta.c   |  17 ++--
>  4 files changed, 159 insertions(+), 108 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 2ff9e2a..2057028 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -650,10 +650,8 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
>  		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
>  
> -	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
> -		nl_addr(NL_GET, ifi, 0, AF_INET,
> -			&ip4->addr, &ip4->prefix_len, NULL);
> -	}
> +	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
> +		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
>  
>  	if (!ip4->prefix_len) {
>  		in_addr_t addr = ntohl(ip4->addr.s_addr);
> @@ -703,9 +701,9 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
>  		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
>  
> -	nl_addr(NL_GET, ifi, 0, AF_INET6,
> -		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> -		&prefix_len, &ip6->addr_ll);
> +	nl_addr_get(ifi, AF_INET6,
> +		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> +		    &prefix_len, &ip6->addr_ll);
>  
>  	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
>  	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
> diff --git a/netlink.c b/netlink.c
> index 4b1f75e..269d738 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -334,17 +334,76 @@ next:
>  }
>  
>  /**
> - * nl_addr() - Get/set/copy IP addresses for given interface and address family
> - * @op:		Requested operation
> + * nl_addr_get() - Get IP address for given interface and address family
>   * @ifi:	Interface index in outer network namespace
> - * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
>   * @af:		Address family
> - * @addr:	Global address to fill on NL_GET, to set on NL_SET
> - * @prefix_len:	Mask or prefix length, set or fetched (for IPv4)
> - * @addr_l:	Link-scoped address to fill on NL_GET
> + * @addr:	Global address to fill
> + * @prefix_len:	Mask or prefix length, to fill (for IPv4)
> + * @addr_l:	Link-scoped address to fill (for IPv6)
> + */
> +void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
> +		 int *prefix_len, void *addr_l)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct ifaddrmsg ifa;
> +	} req = {
> +		.nlh.nlmsg_type    = RTM_GETADDR,
> +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
> +		.nlh.nlmsg_len     = sizeof(req),
> +		.nlh.nlmsg_seq     = nl_seq++,
> +
> +		.ifa.ifa_family    = af,
> +		.ifa.ifa_index     = ifi,
> +	};
> +	struct nlmsghdr *nh;
> +	char buf[NLBUFSIZ];
> +	ssize_t n;
> +
> +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
> +		return;
> +
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> +	     nh = NLMSG_NEXT(nh, n)) {
> +		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> +		struct rtattr *rta;
> +		size_t na;
> +
> +		if (nh->nlmsg_type != RTM_NEWADDR)
> +			continue;
> +
> +		if (ifa->ifa_index != ifi)
> +			continue;
> +
> +		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> +		     rta = RTA_NEXT(rta, na)) {
> +			if (rta->rta_type != IFA_ADDRESS)
> +				continue;
> +
> +			if (af == AF_INET) {
> +				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +				*prefix_len = ifa->ifa_prefixlen;
> +			} else if (af == AF_INET6 && addr &&
> +				   ifa->ifa_scope == RT_SCOPE_UNIVERSE) {
> +				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +			}
> +
> +			if (addr_l &&
> +			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK)
> +				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +		}
> +	}
> +}
> +
> +/**
> + * nl_addr_set() - Set IP addresses for given interface and address family
> + * @ifi:	Interface index
> + * @af:		Address family
> + * @addr:	Global address to set
> + * @prefix_len:	Mask or prefix length to set
>   */
> -void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> -	     sa_family_t af, void *addr, int *prefix_len, void *addr_l)
> +void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
>  {
>  	struct req_t {
>  		struct nlmsghdr nlh;
> @@ -364,125 +423,112 @@ void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  			} a6;
>  		} set;
>  	} req = {
> -		.nlh.nlmsg_type    = op == NL_SET ? RTM_NEWADDR : RTM_GETADDR,
> -		.nlh.nlmsg_flags   = NLM_F_REQUEST,
> +		.nlh.nlmsg_type    = RTM_NEWADDR,
> +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK |
> +				     NLM_F_CREATE | NLM_F_EXCL,
>  		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
>  		.nlh.nlmsg_seq     = nl_seq++,
>  
>  		.ifa.ifa_family    = af,
> -		.ifa.ifa_index     = op == NL_SET ? ifi_ns : ifi,
> -		.ifa.ifa_prefixlen = op == NL_SET ? *prefix_len : 0,
> +		.ifa.ifa_index     = ifi,
> +		.ifa.ifa_prefixlen = prefix_len,
> +		.ifa.ifa_scope	   = RT_SCOPE_UNIVERSE,
>  	};
> -	ssize_t n, nlmsgs_size;
> -	struct ifaddrmsg *ifa;
> -	struct nlmsghdr *nh;
> -	struct rtattr *rta;
>  	char buf[NLBUFSIZ];
> -	size_t na;
>  
> -	if (op == NL_SET) {
> -		if (af == AF_INET6) {
> -			size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
> +	if (af == AF_INET6) {
> +		size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
>  
> -			/* By default, strictly speaking, it's duplicated */
> -			req.ifa.ifa_flags = IFA_F_NODAD;
> +		/* By default, strictly speaking, it's duplicated */
> +		req.ifa.ifa_flags = IFA_F_NODAD;
>  
> -			req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
> -				+ sizeof(req.set.a6);
> +		req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
> +			+ sizeof(req.set.a6);
>  
> -			memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
> -			req.set.a6.rta_l.rta_len = rta_len;
> -			req.set.a4.rta_l.rta_type = IFA_LOCAL;
> -			memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
> -			req.set.a6.rta_a.rta_len = rta_len;
> -			req.set.a6.rta_a.rta_type = IFA_ADDRESS;
> -		} else {
> -			size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
> -
> -			req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
> -				+ sizeof(req.set.a4);
> +		memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
> +		req.set.a6.rta_l.rta_len = rta_len;
> +		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> +		memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
> +		req.set.a6.rta_a.rta_len = rta_len;
> +		req.set.a6.rta_a.rta_type = IFA_ADDRESS;
> +	} else {
> +		size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
>  
> -			req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
> -			req.set.a4.rta_l.rta_len = rta_len;
> -			req.set.a4.rta_l.rta_type = IFA_LOCAL;
> -			req.set.a4.rta_a.rta_len = rta_len;
> -			req.set.a4.rta_a.rta_type = IFA_ADDRESS;
> -		}
> +		req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
> +			+ sizeof(req.set.a4);
>  
> -		req.ifa.ifa_scope = RT_SCOPE_UNIVERSE;
> -		req.nlh.nlmsg_flags |= NLM_F_CREATE | NLM_F_ACK | NLM_F_EXCL;
> -	} else {
> -		req.nlh.nlmsg_flags |= NLM_F_DUMP;
> +		req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
> +		req.set.a4.rta_l.rta_len = rta_len;
> +		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> +		req.set.a4.rta_a.rta_len = rta_len;
> +		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
>  	}
>  
> -	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> -		return;
> +	nl_req(1, buf, &req, req.nlh.nlmsg_len);
> +}
>  
> -	if (op == NL_SET)
> +/**
> + * nl_addr_dup() - Copy IP addresses for given interface and address family
> + * @ifi:	Interface index in outer network namespace
> + * @ifi_ns:	Interface index in target namespace
> + * @af:		Address family
> + */
> +void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
> +{
> +	struct req_t {
> +		struct nlmsghdr nlh;
> +		struct ifaddrmsg ifa;
> +	} req = {
> +		.nlh.nlmsg_type    = RTM_GETADDR,
> +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_DUMP,
> +		.nlh.nlmsg_len     = sizeof(req),
> +		.nlh.nlmsg_seq     = nl_seq++,
> +
> +		.ifa.ifa_family    = af,
> +		.ifa.ifa_index     = ifi,
> +		.ifa.ifa_prefixlen = 0,
> +	};
> +	char buf[NLBUFSIZ], resp[NLBUFSIZ];
> +	ssize_t n, nlmsgs_size;
> +	struct nlmsghdr *nh;
> +
> +	if ((n = nl_req(0, buf, &req, sizeof(req))) < 0)
>  		return;
>  
> -	nh = (struct nlmsghdr *)buf;
>  	nlmsgs_size = n;
>  
> -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> +	     nh = NLMSG_NEXT(nh, n)) {
> +		struct ifaddrmsg *ifa;
> +		struct rtattr *rta;
> +		size_t na;
> +
>  		if (nh->nlmsg_type != RTM_NEWADDR)
> -			goto next;
> +			continue;
>  
> -		if (op == NL_DUP) {
> -			nh->nlmsg_seq = nl_seq++;
> -			nh->nlmsg_pid = 0;
> -			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> -			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> -					   NLM_F_CREATE;
> -		}
> +		nh->nlmsg_seq = nl_seq++;
> +		nh->nlmsg_pid = 0;
> +		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> +		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
>  
>  		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
>  
> -		if (op == NL_DUP && (ifa->ifa_scope == RT_SCOPE_LINK ||
> -				     ifa->ifa_index != ifi)) {
> +		if (ifa->ifa_scope == RT_SCOPE_LINK || ifa->ifa_index != ifi) {
>  			ifa->ifa_family = AF_UNSPEC;
> -			goto next;
> +			continue;
>  		}
>  
> -		if (ifa->ifa_index != ifi)
> -			goto next;
> -
> -		if (op == NL_DUP)
> -			ifa->ifa_index = ifi_ns;
> +		ifa->ifa_index = ifi_ns;
>  
>  		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
> -			if (op == NL_DUP && rta->rta_type == IFA_LABEL)
> +			if (rta->rta_type == IFA_LABEL)
>  				rta->rta_type = IFA_UNSPEC;
> -
> -			if (op == NL_DUP || rta->rta_type != IFA_ADDRESS)
> -				continue;
> -
> -			if (af == AF_INET && addr && !*(uint32_t *)addr) {
> -				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> -				*prefix_len = ifa->ifa_prefixlen;
> -			} else if (af == AF_INET6 && addr &&
> -				 ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
> -				 IN6_IS_ADDR_UNSPECIFIED(addr)) {
> -				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> -			}
> -
> -			if (addr_l &&
> -			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
> -			    IN6_IS_ADDR_UNSPECIFIED(addr_l))
> -				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
>  		}
> -next:
> -		if (nh->nlmsg_type == NLMSG_DONE)
> -			break;
>  	}
>  
> -	if (op == NL_DUP) {
> -		char resp[NLBUFSIZ];
> -
> -		nh = (struct nlmsghdr *)buf;
> -		nl_req(1, resp, nh, nlmsgs_size);
> -	}
> +	nl_req(1, resp, buf, nlmsgs_size);
>  }
>  
>  /**
> diff --git a/netlink.h b/netlink.h
> index 980ac44..5ac972d 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -16,8 +16,10 @@ void nl_sock_init(const struct ctx *c, bool ns);
>  unsigned int nl_get_ext_if(sa_family_t af);
>  void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  	      sa_family_t af, void *gw);
> -void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> -	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
> +void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
> +		 int *prefix_len, void *addr_l);
> +void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len);
> +void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
>  void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
>  void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
>  void nl_link_up(int ns, unsigned int ifi, int mtu);
> diff --git a/pasta.c b/pasta.c
> index 3b5537d..1a8f09c 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -282,21 +282,26 @@ void pasta_ns_conf(struct ctx *c)
>  
>  	if (c->pasta_conf_ns) {
>  		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
> -		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
>  
>  		nl_link_up(1, c->pasta_ifi, c->mtu);
>  
>  		if (c->ifi4) {
> -			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
> -				&c->ip4.addr, &c->ip4.prefix_len, NULL);
> +			if (c->no_copy_addrs)
> +				nl_addr_set(c->pasta_ifi, AF_INET, 
> +					    &c->ip4.addr, c->ip4.prefix_len);
> +			else
> +				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET);
> +
>  			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
>  				 &c->ip4.gw);
>  		}
>  
>  		if (c->ifi6) {
> -			int prefix_len = 64;
> -			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
> -				&c->ip6.addr, &prefix_len, NULL);
> +			if (c->no_copy_addrs)
> +				nl_addr_set(c->pasta_ifi, AF_INET6, &c->ip6.addr, 64);
> +			else
> +				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET6);

I guess this should be c->ifi6 (also in 17/17).

-- 
Stefano


* Re: [PATCH 08/17] netlink: Treat send() or recv() errors as fatal
  2023-07-24  6:09 ` [PATCH 08/17] netlink: Treat send() or recv() errors as fatal David Gibson
@ 2023-08-02 22:47   ` Stefano Brivio
  2023-08-03  2:19     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:47 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:27 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Errors on send() or recv() calls on a netlink socket don't indicate errors
> with the netlink operations we're attempting, but rather that something's
> gone wrong with the mechanics of netlink itself.  We don't really expect
> this to ever happen, and if it does, it's not clear what we could do to
> recover.
> 
> So, treat errors from these calls as fatal, rather than returning the error
> up the stack.  This makes handling failures in the callers of nl_req()
> simpler.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  netlink.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
> 
> diff --git a/netlink.c b/netlink.c
> index 3620fd6..826c926 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -103,9 +103,9 @@ fail:
>   * @req:	Request with netlink header
>   * @len:	Request length
>   *
> - * Return: received length on success, negative error code on failure
> + * Return: received length on success, terminates on error
>   */
> -static int nl_req(int s, char *buf, const void *req, ssize_t len)
> +static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
>  {
>  	char flush[NLBUFSIZ];
>  	int done = 0;
> @@ -124,11 +124,17 @@ static int nl_req(int s, char *buf, const void *req, ssize_t len)
>  		}
>  	}
>  
> -	if ((send(s, req, len, 0) < len) ||
> -	    (len = recv(s, buf, NLBUFSIZ, 0)) < 0)
> -		return -errno;
> +	n = send(s, req, len, 0);
> +	if (n < 0)
> +		die("netlink: Failed to send(): %s", strerror(errno));
> +	else if (n < len)
> +		die("netlink: Short send");

If you respin, probably worth doing:

		die("netlink: Short send (%li out of %li bytes)", n, len);
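
One caveat on the format string: n and len are ssize_t here, and %li
assumes that's a long. On builds where it isn't, this would probably
trigger a format warning, so %zd might be the safer choice. A small
sketch, with short_send_msg() as a made-up stand-in (die() itself isn't
reproduced):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* short_send_msg() - Format the suggested message into a static buffer
 * (hypothetical helper, only here to show the conversion specifier)
 */
static const char *short_send_msg(ssize_t n, ssize_t len)
{
	static char buf[64];

	/* %zd takes the signed counterpart of size_t, that is, ssize_t */
	snprintf(buf, sizeof(buf),
		 "netlink: Short send (%zd out of %zd bytes)", n, len);

	return buf;
}
```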

-- 
Stefano


* Re: [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking
  2023-07-24  6:09 ` [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking David Gibson
@ 2023-08-02 22:48   ` Stefano Brivio
  2023-08-03  2:24     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:48 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:29 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> So far we never checked for errors reported on netlink operations via
> NLMSG_ERROR messages.  This has led to several subtle and tricky to debug
> situations which would have been obvious if we knew that certain netlink
> operations had failed.
> 
> Introduce a nl_do() helper that performs netlink "do" operations (that is
> making a single change without retrieving complex information) with much
> more thorough error checking.  As well as returning an error code if we
> get an NLMSG_ERROR message, we also check for unexpected behaviour in
> several places.  That way if we've made a mistake in our assumptions about
> how netlink works it should result in a clear error rather than some subtle
> misbehaviour.
> 
> We update those calls to nl_req() that can use the new wrapper to do so.
> We will extend those to better handle errors in future.  We don't touch
> non-"do" operations for now, those are a bit trickier.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=60
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/netlink.c b/netlink.c
> index 3170344..cdd65c0 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
>  	return n;
>  }
>  
> +/**
> + * nl_do() - Send netlink "do" request, and wait for acknowledgement
> + * @s:		Netlink socket
> + * @req:	Request (will fill netlink header)
> + * @type:	Request type
> + * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
> + * @len:	Request length
> + *
> + * Return: 0 on success, negative error code on error
> + */
> +static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
> +{
> +	struct nlmsghdr *nh;
> +	char buf[NLBUFSIZ];
> +	uint16_t seq;
> +	ssize_t n;
> +
> +	n = nl_req(s, buf, req, type, flags, len);
> +	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
> +
> +	for (nh = (struct nlmsghdr *)buf;
> +	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> +		struct nlmsgerr *errmsg;
> +
> +		if (nh->nlmsg_seq != seq)
> +			die("netlink: Unexpected response sequence number");
> +
> +		switch (nh->nlmsg_type) {
> +		case NLMSG_DONE:
> +			return 0;
> +		case NLMSG_ERROR:
> +			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
> +			return errmsg->error;

This is a negative errno value: we should probably print it here... and,
now that I've read 14/17 and 16/17, doing so would save the repeated
strerror() calls there. On the other hand, the current approach has the
advantage of a single error message instead of two, but... hmm.
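
For reference, a rough sketch of the "print it here" variant. It's not
what the series does: nl_status() and nl_status_demo() are invented
names, and reporting from the helper is just one possible choice. The
only thing taken from netlink itself is that the error field of struct
nlmsgerr carries a negative errno, or 0 for a plain acknowledgement.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* nl_status() - Scan a response for NLMSG_ERROR, report it once (sketch)
 * @buf:	Response buffer
 * @n:		Bytes received
 *
 * Return: 0 on ACK or if no error message is found, negative errno otherwise
 */
static int nl_status(char *buf, ssize_t n)
{
	struct nlmsghdr *nh;

	for (nh = (struct nlmsghdr *)buf;
	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		struct nlmsgerr *errmsg;

		if (nh->nlmsg_type != NLMSG_ERROR)
			continue;

		errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
		if (errmsg->error)	/* 0 is a plain ACK */
			fprintf(stderr, "netlink: %s\n",
				strerror(-errmsg->error));

		return errmsg->error;
	}

	return 0;
}

/* nl_status_demo() - Feed nl_status() a hand-built NLMSG_ERROR response */
static int nl_status_demo(int err)
{
	struct {
		struct nlmsghdr nlh;
		struct nlmsgerr e;
	} resp = {
		.nlh.nlmsg_len	= sizeof(resp),
		.nlh.nlmsg_type	= NLMSG_ERROR,
		.e.error	= err,
	};

	return nl_status((char *)&resp, sizeof(resp));
}
```

The drawback is the one mentioned above: callers that add their own
context would then log twice for the same failure.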

-- 
Stefano


* Re: [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size
  2023-07-24  6:09 ` [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size David Gibson
@ 2023-08-02 22:48   ` Stefano Brivio
  2023-08-03  2:22     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:48 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:30 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Currently we set NLBUFSIZ large enough for 8192 netlink headers (128kiB in
> total), and reference netlink(7).  However netlink(7) says nothing about
> response buffer sizes, and the documents which do reference 8192 *bytes*, not
> 8192 headers.

Oops.

> Update NLBUFSIZ to 64kiB with a more detailed rationale.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=67
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  netlink.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/netlink.c b/netlink.c
> index cdd65c0..d553ddd 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -35,7 +35,14 @@
>  #include "log.h"
>  #include "netlink.h"
>  
> -#define NLBUFSIZ	(8192 * sizeof(struct nlmsghdr)) /* See netlink(7) */
> +/* Netlink expects a buffer of at least 8kiB or the system page size,
> + * whichever is larger.  32kiB is recommended for more efficient handling.
> + * Since the largest page size on any remotely common Linux setup is
> + * 64kiB (ppc64), that should cover it.
> + *
> + * https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#buffer-sizing
> + */
> +#define NLBUFSIZ 65536

I'm fine with this, but we also have PAGE_SIZE and MAX() defined. Or
maybe it's more reasonable to keep this constant. I'm not sure.

-- 
Stefano


* Re: [PATCH 17/17] netlink: Propagate errors for "dup" operations
  2023-07-24  6:09 ` [PATCH 17/17] netlink: Propagate errors for "dup" operations David Gibson
@ 2023-08-02 22:48   ` Stefano Brivio
  2023-08-03  2:26     ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2023-08-02 22:48 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 24 Jul 2023 16:09:36 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> We now detect errors on netlink "set" operations while configuring the
> pasta namespace with --config-net.  However in many cases rather than
> a simple "set" we use a more complex "dup" function to copy
> configuration from the host to the namespace.  We're not yet properly
> detecting and reporting netlink errors for that case.
> 
> Change the "dup" operations to propagate netlink errors to their
> caller, pasta_ns_conf() and report them there.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=60
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  netlink.c | 40 ++++++++++++++++++++++++++++------------
>  netlink.h |  8 ++++----
>  pasta.c   | 15 ++++++++-------
>  3 files changed, 40 insertions(+), 23 deletions(-)
> 
> diff --git a/netlink.c b/netlink.c
> index 9e72b16..cdc18c0 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -413,9 +413,11 @@ int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
>   * @s_dst:	Netlink socket in destination namespace
>   * @ifi_dst:	Interface index in destination namespace
>   * @af:		Address family
> + *
> + * Return: 0 on success, negative error code on failure
>   */
> -void nl_route_dup(int s_src, unsigned int ifi_src,
> -		  int s_dst, unsigned int ifi_dst, sa_family_t af)
> +int nl_route_dup(int s_src, unsigned int ifi_src,
> +		 int s_dst, unsigned int ifi_dst, sa_family_t af)
>  {
>  	struct req_t {
>  		struct nlmsghdr nlh;
> @@ -477,9 +479,11 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
>  
>  		if (extra) {
>  			err("netlink: Too many routes to duplicate");
> -			return;
> +			return -E2BIG;

This is "Argument list too long", and... I don't have much better
ideas. I would instinctively use ENOSPC or ENOMEM in this case, but
both are slightly misleading in different ways, too.

>  		}
>  	}
> +	if (status < 0)
> +		return status;
>  
>  	/* Routes might have dependencies between each other, and the
>  	 * kernel processes RTM_NEWROUTE messages sequentially. For n
> @@ -494,15 +498,20 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
>  		     NLMSG_OK(nh, status);
>  		     nh = NLMSG_NEXT(nh, status)) {
>  			uint16_t flags = nh->nlmsg_flags;
> +			int rc;
>  
>  			if (nh->nlmsg_type != RTM_NEWROUTE)
>  				continue;
>  
> -			nl_do(s_dst, nh, RTM_NEWROUTE,
> -			       (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
> -			       nh->nlmsg_len);
> +			rc = nl_do(s_dst, nh, RTM_NEWROUTE,
> +				   (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
> +				   nh->nlmsg_len);
> +			if (rc < 0 && rc != -ENETUNREACH && rc != -EEXIST)
> +				return rc;
>  		}
>  	}
> +
> +	return 0;
>  }
>  
>  /**
> @@ -635,9 +644,11 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
>   * @s_dst:	Netlink socket in destination network namespace
>   * @ifi_dst:	Interface index in destination namespace
>   * @af:		Address family
> + *
> + * Return: 0 on success, negative error code on failure
>   */
> -void nl_addr_dup(int s_src, unsigned int ifi_src,
> -		 int s_dst, unsigned int ifi_dst, sa_family_t af)
> +int nl_addr_dup(int s_src, unsigned int ifi_src,
> +		int s_dst, unsigned int ifi_dst, sa_family_t af)
>  {
>  	struct req_t {
>  		struct nlmsghdr nlh;
> @@ -651,6 +662,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
>  	struct nlmsghdr *nh;
>  	ssize_t status;
>  	uint16_t seq;
> +	int rc= 0;

Missing whitespace.

-- 
Stefano


* Re: [PATCH 01/17] netlink: Split up functionality if nl_link()
  2023-08-02 22:47   ` Stefano Brivio
@ 2023-08-03  2:09     ` David Gibson
  2023-08-03  4:29       ` David Gibson
  0 siblings, 1 reply; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:09 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:47:29AM +0200, Stefano Brivio wrote:
> In the subject: s/if/of/.
> 
> On Mon, 24 Jul 2023 16:09:20 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > nl_link() performs a number of functions: it can bring links up, set MAC
> > address and MTU and also retrieve the existing MAC.  This makes for a small
> > number of lines of code, but high conceptual complexity: it's quite hard
> > > to follow what's going on in nl_link() itself, and it's also not very
> > > obvious which function its callers are intending to use.
> 
> Actually I don't find nl_link() *that* bad, but for consistency with the
> next patches this definitely makes sense.

Eh.

> > Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
> > and nl_link_get_mac().  The first brings up a link, optionally setting the
> > MTU, the others get or set the MAC address.
> > 
> > This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
> > intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
> > However, it only actually does so in the !c->pasta_conf_ns case: the fact
> > that we set up==1 means we would only ever set, never get, the MAC in the
> > nl_link() call in the other path.  We get away with this because the MAC
> > will quickly be discovered once we receive packets on the tap interface.
> > Still, it's neater to always get the MAC address here.
> 
> Actually, the intention wasn't to always retrieve the namespaced MAC
> address: I thought I'd do that only if we don't configure the
> interface, because we want NDP and DHCP to be "ready".

Huh, ok.  Still very hard to follow though, because a policy decision
of the caller is being implemented by the subtle interactions of
parameters to nl_link() itself, making the intent clear in neither
location.

> But that's not
> really relevant... I guess yes, it's more consistent if we fetch it in
> any case (as long as we don't configure it).
> 
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c    |   4 +-
> >  netlink.c | 143 +++++++++++++++++++++++++++++++-----------------------
> >  netlink.h |   4 +-
> >  pasta.c   |  12 +++--
> >  4 files changed, 96 insertions(+), 67 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 78eaf2d..2ff9e2a 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -670,7 +670,7 @@ static unsigned int conf_ip4(unsigned int ifi,
> >  	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
> >  
> >  	if (MAC_IS_ZERO(mac))
> > -		nl_link(0, ifi, mac, 0, 0);
> > +		nl_link_get_mac(0, ifi, mac);
> >  
> >  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
> >  	    MAC_IS_ZERO(mac))
> > @@ -711,7 +711,7 @@ static unsigned int conf_ip6(unsigned int ifi,
> >  	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
> >  
> >  	if (MAC_IS_ZERO(mac))
> > -		nl_link(0, ifi, mac, 0, 0);
> > +		nl_link_get_mac(0, ifi, mac);
> >  
> >  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
> >  	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
> > diff --git a/netlink.c b/netlink.c
> > index e15e23f..4b1f75e 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -486,83 +486,44 @@ next:
> >  }
> >  
> >  /**
> > - * nl_link() - Get/set link attributes
> > + * nl_link_get_mac() - Get link MAC address
> >   * @ns:		Use netlink socket in namespace
> >   * @ifi:	Interface index
> > - * @mac:	MAC address to fill, if passed as zero, to set otherwise
> > - * @up:		If set, bring up the link
> > - * @mtu:	If non-zero, set interface MTU
> > + * @mac:	Fill with current MAC address
> >   */
> > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
> >  {
> > -	int change = !MAC_IS_ZERO(mac) || up || mtu;
> >  	struct req_t {
> >  		struct nlmsghdr nlh;
> >  		struct ifinfomsg ifm;
> > -		struct rtattr rta;
> > -		union {
> > -			unsigned char mac[ETH_ALEN];
> > -			struct {
> > -				unsigned int mtu;
> > -			} mtu;
> > -		} set;
> >  	} req = {
> > -		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
> > -		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > -		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
> > +		.nlh.nlmsg_type	  = RTM_GETLINK,
> > +		.nlh.nlmsg_len	  = sizeof(req),
> 
> I don't think there's a practical issue with this, but there were two
> reasons why I used NLMSG_LENGTH(sizeof(struct ifinfomsg)) instead:
> 
> - NLMSG_LENGTH() aligns to 4 bytes, not to whatever
>   architecture-dependent alignment we might have: the message might
>   actually be smaller

Oof... so.  On the one hand, I see the issue; if these are different,
I'm not sure what the effect will be.  On the other hand, if we use
NLMSG_LENGTH and it *is* longer than the structure size, we'll be
saying that this message is longer than the datagram containing it.
I'm not sure what the effect of that will be either.

Not really sure what to do about this.

> - I see that this works with gcc and clang, but, strictly
>   speaking, is the size of the struct known "before"
>   (sequence-point-wise) we're done initialising it? I have a very vague
>   memory of this not working with gcc 2.9 or suchlike -- which is not a
>   problem, as long as our new friend C11 actually supports this (but
>   I'm not entirely sure).

I'm pretty sure it's ok, regardless of C11 state.  It's not really a
question of sequence points: those are about the ordering of run time
operations.  Even though the structure is being defined inline,
determining its size and layout will still happen at compile time,
whereas the initialization is obviously a runtime event.

> Then, in 9/17, NLMSG_LENGTH() could be conveniently used by nl_req().
> 
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> >  		.nlh.nlmsg_seq	  = nl_seq++,
> >  		.ifm.ifi_family	  = AF_UNSPEC,
> >  		.ifm.ifi_index	  = ifi,
> > -		.ifm.ifi_flags	  = up ? IFF_UP : 0,
> > -		.ifm.ifi_change	  = up ? IFF_UP : 0,
> >  	};
> > -	struct ifinfomsg *ifm;
> >  	struct nlmsghdr *nh;
> > -	struct rtattr *rta;
> >  	char buf[NLBUFSIZ];
> >  	ssize_t n;
> > -	size_t na;
> > -
> > -	if (!MAC_IS_ZERO(mac)) {
> > -		req.nlh.nlmsg_len = sizeof(req);
> > -		memcpy(req.set.mac, mac, ETH_ALEN);
> > -		req.rta.rta_type = IFLA_ADDRESS;
> > -		req.rta.rta_len = RTA_LENGTH(ETH_ALEN);
> > -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > -			return;
> > -
> > -		up = 0;
> > -	}
> > -
> > -	if (mtu) {
> > -		req.nlh.nlmsg_len = offsetof(struct req_t, set.mtu)
> > -			+ sizeof(req.set.mtu);
> > -		req.set.mtu.mtu = mtu;
> > -		req.rta.rta_type = IFLA_MTU;
> > -		req.rta.rta_len = RTA_LENGTH(sizeof(unsigned int));
> > -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > -			return;
> > -
> > -		up = 0;
> > -	}
> > -
> > -	if (up && nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > -		return;
> > -
> > -	if (change)
> > -		return;
> >  
> > -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0)
> > +	n = nl_req(ns, buf, &req, sizeof(req));
> > +	if (n < 0)
> >  		return;
> > +	
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > +	     nh = NLMSG_NEXT(nh, n)) {
> > +		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> > +		struct rtattr *rta;
> > +		size_t na;
> >  
> > -	nh = (struct nlmsghdr *)buf;
> > -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> >  		if (nh->nlmsg_type != RTM_NEWLINK)
> > -			goto next;
> > -
> > -		ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> > +			continue;
> >  
> > -		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> > +		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
> > +		     RTA_OK(rta, na);
> >  		     rta = RTA_NEXT(rta, na)) {
> >  			if (rta->rta_type != IFLA_ADDRESS)
> >  				continue;
> > @@ -570,8 +531,70 @@ void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> >  			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
> >  			break;
> >  		}
> > -next:
> > -		if (nh->nlmsg_type == NLMSG_DONE)
> > -			break;
> >  	}
> >  }
> > +
> > +/**
> > + * nl_link_set_mac() - Set link MAC address
> > + * @ns:		Use netlink socket in namespace
> > + * @ifi:	Interface index
> > + * @mac:	MAC address to set
> > + */
> > +void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct ifinfomsg ifm;
> > +		struct rtattr rta;
> > +		unsigned char mac[ETH_ALEN];
> > +	} req = {
> > +		.nlh.nlmsg_type	  = RTM_NEWLINK,
> > +		.nlh.nlmsg_len	  = sizeof(req),
> 
> Same here.
> 
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> > +		.nlh.nlmsg_seq	  = nl_seq++,
> > +		.ifm.ifi_family	  = AF_UNSPEC,
> > +		.ifm.ifi_index	  = ifi,
> > +		.rta.rta_type	  = IFLA_ADDRESS,
> > +		.rta.rta_len	  = RTA_LENGTH(ETH_ALEN),
> > +	};
> > +	char buf[NLBUFSIZ];
> > +
> > +	memcpy(req.mac, mac, ETH_ALEN);
> > +
> > +	nl_req(ns, buf, &req, sizeof(req));
> > +}
> > +
> > +/**
> > + * nl_link_up() - Bring link up
> > + * @ns:		Use netlink socket in namespace
> > + * @ifi:	Interface index
> > + * @mtu:	If non-zero, set interface MTU
> > + */
> > +void nl_link_up(int ns, unsigned int ifi, int mtu)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct ifinfomsg ifm;
> > +		struct rtattr rta;
> > +		unsigned int mtu;
> > +	} req = {
> > +		.nlh.nlmsg_type   = RTM_NEWLINK,
> > +		.nlh.nlmsg_len    = sizeof(req),
> 
> And here.
> 
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> > +		.nlh.nlmsg_seq	  = nl_seq++,
> > +		.ifm.ifi_family	  = AF_UNSPEC,
> > +		.ifm.ifi_index	  = ifi,
> > +		.ifm.ifi_flags	  = IFF_UP,
> > +		.ifm.ifi_change	  = IFF_UP,
> > +		.rta.rta_type	  = IFLA_MTU,
> > +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> > +		.mtu		  = mtu,
> > +	};
> > +	char buf[NLBUFSIZ];
> > +
> > +	if (!mtu)
> > +		/* Shorten request to drop MTU attribute */
> > +		req.nlh.nlmsg_len = offsetof(struct req_t, rta);
> 
> Pre-existing issue I see now: we should probably use NLMSG_LENGTH()
> here, in any case.

Well.. if NLMSG_LENGTH() really is different here, we're (by
definition) including some of req.rta in the message, which isn't our
intention.  So.. if we trust the rta member to be aligned properly for
the case where we *do* include it, can't we also trust it for the case
where we don't?

> > +
> > +	nl_req(ns, buf, &req, req.nlh.nlmsg_len);
> > +}
> > diff --git a/netlink.h b/netlink.h
> > index cd0e666..980ac44 100644
> > --- a/netlink.h
> > +++ b/netlink.h
> > @@ -18,6 +18,8 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> >  	      sa_family_t af, void *gw);
> >  void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> >  	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
> > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
> > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
> > +void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
> > +void nl_link_up(int ns, unsigned int ifi, int mtu);
> >  
> >  #endif /* NETLINK_H */
> > diff --git a/pasta.c b/pasta.c
> > index 8c85546..3b5537d 100644
> > --- a/pasta.c
> > +++ b/pasta.c
> > @@ -272,13 +272,19 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
> >   */
> >  void pasta_ns_conf(struct ctx *c)
> >  {
> > -	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
> > +	nl_link_up(1, 1 /* lo */, 0);
> > +
> > +	/* Get or set guest MAC */
> 
> I know it's called mac_guest, my bad, but what about "MAC address in
> the target namespace"?

Good idea, changed.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 02/17] netlink: Split nl_addr() into separate operation functions
  2023-08-02 22:47   ` Stefano Brivio
@ 2023-08-03  2:11     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:11 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:47:50AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:21 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > nl_addr() can perform three quite different operations based on the 'op'
> > parameter, each of which uses a different subset of the parameters.  Split
> > them up into a function for each operation.  This does use more lines of
> > code, but the overlap wasn't that great, and the separated logic is much
> > easier to follow.
> > 
> > It's also clearer in the callers what we expect the netlink operations to
> > do, and what information it uses.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c    |  12 ++-
> >  netlink.c | 232 ++++++++++++++++++++++++++++++++----------------------
> >  netlink.h |   6 +-
> >  pasta.c   |  17 ++--
> >  4 files changed, 159 insertions(+), 108 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 2ff9e2a..2057028 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -650,10 +650,8 @@ static unsigned int conf_ip4(unsigned int ifi,
> >  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
> >  		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
> >  
> > -	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
> > -		nl_addr(NL_GET, ifi, 0, AF_INET,
> > -			&ip4->addr, &ip4->prefix_len, NULL);
> > -	}
> > +	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
> > +		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> >  
> >  	if (!ip4->prefix_len) {
> >  		in_addr_t addr = ntohl(ip4->addr.s_addr);
> > @@ -703,9 +701,9 @@ static unsigned int conf_ip6(unsigned int ifi,
> >  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
> >  		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
> >  
> > -	nl_addr(NL_GET, ifi, 0, AF_INET6,
> > -		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> > -		&prefix_len, &ip6->addr_ll);
> > +	nl_addr_get(ifi, AF_INET6,
> > +		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> > +		    &prefix_len, &ip6->addr_ll);
> >  
> >  	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
> >  	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
> > diff --git a/netlink.c b/netlink.c
> > index 4b1f75e..269d738 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -334,17 +334,76 @@ next:
> >  }
> >  
> >  /**
> > - * nl_addr() - Get/set/copy IP addresses for given interface and address family
> > - * @op:		Requested operation
> > + * nl_addr_get() - Get IP address for given interface and address family
> >   * @ifi:	Interface index in outer network namespace
> > - * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
> >   * @af:		Address family
> > - * @addr:	Global address to fill on NL_GET, to set on NL_SET
> > - * @prefix_len:	Mask or prefix length, set or fetched (for IPv4)
> > - * @addr_l:	Link-scoped address to fill on NL_GET
> > + * @addr:	Global address to fill
> > + * @prefix_len:	Mask or prefix length, to fill (for IPv4)
> > + * @addr_l:	Link-scoped address to fill (for IPv6)
> > + */
> > +void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
> > +		 int *prefix_len, void *addr_l)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct ifaddrmsg ifa;
> > +	} req = {
> > +		.nlh.nlmsg_type    = RTM_GETADDR,
> > +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
> > +		.nlh.nlmsg_len     = sizeof(req),
> > +		.nlh.nlmsg_seq     = nl_seq++,
> > +
> > +		.ifa.ifa_family    = af,
> > +		.ifa.ifa_index     = ifi,
> > +	};
> > +	struct nlmsghdr *nh;
> > +	char buf[NLBUFSIZ];
> > +	ssize_t n;
> > +
> > +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
> > +		return;
> > +
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > +	     nh = NLMSG_NEXT(nh, n)) {
> > +		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> > +		struct rtattr *rta;
> > +		size_t na;
> > +
> > +		if (nh->nlmsg_type != RTM_NEWADDR)
> > +			continue;
> > +
> > +		if (ifa->ifa_index != ifi)
> > +			continue;
> > +
> > +		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> > +		     rta = RTA_NEXT(rta, na)) {
> > +			if (rta->rta_type != IFA_ADDRESS)
> > +				continue;
> > +
> > +			if (af == AF_INET) {
> > +				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > +				*prefix_len = ifa->ifa_prefixlen;
> > +			} else if (af == AF_INET6 && addr &&
> > +				   ifa->ifa_scope == RT_SCOPE_UNIVERSE) {
> > +				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > +			}
> > +
> > +			if (addr_l &&
> > +			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK)
> > +				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > +		}
> > +	}
> > +}
> > +
> > +/**
> > > + * nl_addr_set() - Set IP addresses for given interface and address family
> > + * @ifi:	Interface index
> > + * @af:		Address family
> > + * @addr:	Global address to set
> > + * @prefix_len:	Mask or prefix length to set
> >   */
> > -void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> > -	     sa_family_t af, void *addr, int *prefix_len, void *addr_l)
> > +void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len)
> >  {
> >  	struct req_t {
> >  		struct nlmsghdr nlh;
> > @@ -364,125 +423,112 @@ void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> >  			} a6;
> >  		} set;
> >  	} req = {
> > -		.nlh.nlmsg_type    = op == NL_SET ? RTM_NEWADDR : RTM_GETADDR,
> > -		.nlh.nlmsg_flags   = NLM_F_REQUEST,
> > +		.nlh.nlmsg_type    = RTM_NEWADDR,
> > +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_ACK |
> > +				     NLM_F_CREATE | NLM_F_EXCL,
> >  		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
> >  		.nlh.nlmsg_seq     = nl_seq++,
> >  
> >  		.ifa.ifa_family    = af,
> > -		.ifa.ifa_index     = op == NL_SET ? ifi_ns : ifi,
> > -		.ifa.ifa_prefixlen = op == NL_SET ? *prefix_len : 0,
> > +		.ifa.ifa_index     = ifi,
> > +		.ifa.ifa_prefixlen = prefix_len,
> > +		.ifa.ifa_scope	   = RT_SCOPE_UNIVERSE,
> >  	};
> > -	ssize_t n, nlmsgs_size;
> > -	struct ifaddrmsg *ifa;
> > -	struct nlmsghdr *nh;
> > -	struct rtattr *rta;
> >  	char buf[NLBUFSIZ];
> > -	size_t na;
> >  
> > -	if (op == NL_SET) {
> > -		if (af == AF_INET6) {
> > -			size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
> > +	if (af == AF_INET6) {
> > +		size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
> >  
> > -			/* By default, strictly speaking, it's duplicated */
> > -			req.ifa.ifa_flags = IFA_F_NODAD;
> > +		/* By default, strictly speaking, it's duplicated */
> > +		req.ifa.ifa_flags = IFA_F_NODAD;
> >  
> > -			req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
> > -				+ sizeof(req.set.a6);
> > +		req.nlh.nlmsg_len = offsetof(struct req_t, set.a6)
> > +			+ sizeof(req.set.a6);
> >  
> > -			memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
> > -			req.set.a6.rta_l.rta_len = rta_len;
> > -			req.set.a4.rta_l.rta_type = IFA_LOCAL;
> > -			memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
> > -			req.set.a6.rta_a.rta_len = rta_len;
> > -			req.set.a6.rta_a.rta_type = IFA_ADDRESS;
> > -		} else {
> > -			size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
> > -
> > -			req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
> > -				+ sizeof(req.set.a4);
> > +		memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
> > +		req.set.a6.rta_l.rta_len = rta_len;
> > +		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> > +		memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
> > +		req.set.a6.rta_a.rta_len = rta_len;
> > +		req.set.a6.rta_a.rta_type = IFA_ADDRESS;
> > +	} else {
> > +		size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
> >  
> > -			req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
> > -			req.set.a4.rta_l.rta_len = rta_len;
> > -			req.set.a4.rta_l.rta_type = IFA_LOCAL;
> > -			req.set.a4.rta_a.rta_len = rta_len;
> > -			req.set.a4.rta_a.rta_type = IFA_ADDRESS;
> > -		}
> > +		req.nlh.nlmsg_len = offsetof(struct req_t, set.a4)
> > +			+ sizeof(req.set.a4);
> >  
> > -		req.ifa.ifa_scope = RT_SCOPE_UNIVERSE;
> > -		req.nlh.nlmsg_flags |= NLM_F_CREATE | NLM_F_ACK | NLM_F_EXCL;
> > -	} else {
> > -		req.nlh.nlmsg_flags |= NLM_F_DUMP;
> > +		req.set.a4.l = req.set.a4.a = *(uint32_t *)addr;
> > +		req.set.a4.rta_l.rta_len = rta_len;
> > +		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> > +		req.set.a4.rta_a.rta_len = rta_len;
> > +		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
> >  	}
> >  
> > -	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> > -		return;
> > +	nl_req(1, buf, &req, req.nlh.nlmsg_len);
> > +}
> >  
> > -	if (op == NL_SET)
> > +/**
> > + * nl_addr_dup() - Copy IP addresses for given interface and address family
> > + * @ifi:	Interface index in outer network namespace
> > + * @ifi_ns:	Interface index in target namespace
> > + * @af:		Address family
> > + */
> > +void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct ifaddrmsg ifa;
> > +	} req = {
> > +		.nlh.nlmsg_type    = RTM_GETADDR,
> > +		.nlh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_DUMP,
> > +		.nlh.nlmsg_len     = sizeof(req),
> > +		.nlh.nlmsg_seq     = nl_seq++,
> > +
> > +		.ifa.ifa_family    = af,
> > +		.ifa.ifa_index     = ifi,
> > +		.ifa.ifa_prefixlen = 0,
> > +	};
> > +	char buf[NLBUFSIZ], resp[NLBUFSIZ];
> > +	ssize_t n, nlmsgs_size;
> > +	struct nlmsghdr *nh;
> > +
> > +	if ((n = nl_req(0, buf, &req, sizeof(req))) < 0)
> >  		return;
> >  
> > -	nh = (struct nlmsghdr *)buf;
> >  	nlmsgs_size = n;
> >  
> > -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > +	     nh = NLMSG_NEXT(nh, n)) {
> > +		struct ifaddrmsg *ifa;
> > +		struct rtattr *rta;
> > +		size_t na;
> > +
> >  		if (nh->nlmsg_type != RTM_NEWADDR)
> > -			goto next;
> > +			continue;
> >  
> > -		if (op == NL_DUP) {
> > -			nh->nlmsg_seq = nl_seq++;
> > -			nh->nlmsg_pid = 0;
> > -			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> > -			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> > -					   NLM_F_CREATE;
> > -		}
> > +		nh->nlmsg_seq = nl_seq++;
> > +		nh->nlmsg_pid = 0;
> > +		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> > +		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
> >  
> >  		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> >  
> > -		if (op == NL_DUP && (ifa->ifa_scope == RT_SCOPE_LINK ||
> > -				     ifa->ifa_index != ifi)) {
> > +		if (ifa->ifa_scope == RT_SCOPE_LINK || ifa->ifa_index != ifi) {
> >  			ifa->ifa_family = AF_UNSPEC;
> > -			goto next;
> > +			continue;
> >  		}
> >  
> > -		if (ifa->ifa_index != ifi)
> > -			goto next;
> > -
> > -		if (op == NL_DUP)
> > -			ifa->ifa_index = ifi_ns;
> > +		ifa->ifa_index = ifi_ns;
> >  
> >  		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> >  		     rta = RTA_NEXT(rta, na)) {
> > -			if (op == NL_DUP && rta->rta_type == IFA_LABEL)
> > +			if (rta->rta_type == IFA_LABEL)
> >  				rta->rta_type = IFA_UNSPEC;
> > -
> > -			if (op == NL_DUP || rta->rta_type != IFA_ADDRESS)
> > -				continue;
> > -
> > -			if (af == AF_INET && addr && !*(uint32_t *)addr) {
> > -				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > -				*prefix_len = ifa->ifa_prefixlen;
> > -			} else if (af == AF_INET6 && addr &&
> > -				 ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
> > -				 IN6_IS_ADDR_UNSPECIFIED(addr)) {
> > -				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > -			}
> > -
> > -			if (addr_l &&
> > -			    af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
> > -			    IN6_IS_ADDR_UNSPECIFIED(addr_l))
> > -				memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
> >  		}
> > -next:
> > -		if (nh->nlmsg_type == NLMSG_DONE)
> > -			break;
> >  	}
> >  
> > -	if (op == NL_DUP) {
> > -		char resp[NLBUFSIZ];
> > -
> > -		nh = (struct nlmsghdr *)buf;
> > -		nl_req(1, resp, nh, nlmsgs_size);
> > -	}
> > +	nl_req(1, resp, buf, nlmsgs_size);
> >  }
> >  
> >  /**
> > diff --git a/netlink.h b/netlink.h
> > index 980ac44..5ac972d 100644
> > --- a/netlink.h
> > +++ b/netlink.h
> > @@ -16,8 +16,10 @@ void nl_sock_init(const struct ctx *c, bool ns);
> >  unsigned int nl_get_ext_if(sa_family_t af);
> >  void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> >  	      sa_family_t af, void *gw);
> > -void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> > -	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
> > +void nl_addr_get(unsigned int ifi, sa_family_t af, void *addr,
> > +		 int *prefix_len, void *addr_l);
> > +void nl_addr_set(unsigned int ifi, sa_family_t af, void *addr, int prefix_len);
> > +void nl_addr_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af);
> >  void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
> >  void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
> >  void nl_link_up(int ns, unsigned int ifi, int mtu);
> > diff --git a/pasta.c b/pasta.c
> > index 3b5537d..1a8f09c 100644
> > --- a/pasta.c
> > +++ b/pasta.c
> > @@ -282,21 +282,26 @@ void pasta_ns_conf(struct ctx *c)
> >  
> >  	if (c->pasta_conf_ns) {
> >  		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
> > -		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
> >  
> >  		nl_link_up(1, c->pasta_ifi, c->mtu);
> >  
> >  		if (c->ifi4) {
> > -			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
> > -				&c->ip4.addr, &c->ip4.prefix_len, NULL);
> > +			if (c->no_copy_addrs)
> > +				nl_addr_set(c->pasta_ifi, AF_INET, 
> > +					    &c->ip4.addr, c->ip4.prefix_len);
> > +			else
> > +				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET);
> > +
> >  			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
> >  				 &c->ip4.gw);
> >  		}
> >  
> >  		if (c->ifi6) {
> > -			int prefix_len = 64;
> > -			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
> > -				&c->ip6.addr, &prefix_len, NULL);
> > +			if (c->no_copy_addrs)
> > +				nl_addr_set(c->pasta_ifi, AF_INET6, &c->ip6.addr, 64);
> > +			else
> > +				nl_addr_dup(c->ifi4, c->pasta_ifi, AF_INET6);
> 
> I guess this should be c->ifi6 (also in 17/17).

Oops, yes.  Fixed.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 03/17] netlink: Split nl_route() into separate operation functions
  2023-08-02 22:47   ` Stefano Brivio
@ 2023-08-03  2:18     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:18 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:47:40AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:22 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > nl_route() can perform 3 quite different operations based on the 'op'
> > parameter.  Split this into separate functions for each one.  This requires
> > more lines of code, but makes the internal logic of each operation much
> > easier to follow.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c    |   4 +-
> >  netlink.c | 238 ++++++++++++++++++++++++++++++++++--------------------
> >  netlink.h |  11 +--
> >  pasta.c   |  16 ++--
> >  4 files changed, 164 insertions(+), 105 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 2057028..66958d4 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -648,7 +648,7 @@ static unsigned int conf_ip4(unsigned int ifi,
> >  	}
> >  
> >  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
> > -		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
> > +		nl_route_get_def(ifi, AF_INET, &ip4->gw);
> >  
> >  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
> >  		nl_addr_get(ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> > @@ -699,7 +699,7 @@ static unsigned int conf_ip6(unsigned int ifi,
> >  	}
> >  
> >  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
> > -		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
> > +		nl_route_get_def(ifi, AF_INET6, &ip6->gw);
> >  
> >  	nl_addr_get(ifi, AF_INET6,
> >  		    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> > diff --git a/netlink.c b/netlink.c
> > index 269d738..346eb3a 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -185,15 +185,71 @@ unsigned int nl_get_ext_if(sa_family_t af)
> >  }
> >  
> >  /**
> > - * nl_route() - Get/set/copy routes for given interface and address family
> > - * @op:		Requested operation
> > - * @ifi:	Interface index in outer network namespace
> > - * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
> > + * nl_route_get_def() - Get default route for given interface and address family
> > + * @ifi:	Interface index
> > + * @af:		Address family
> > + * @gw:		Default gateway to fill on NL_GET
> > + */
> > +void nl_route_get_def(unsigned int ifi, sa_family_t af, void *gw)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct rtmsg rtm;
> > +		struct rtattr rta;
> > +		unsigned int ifi;
> > +	} req = {
> > +		.nlh.nlmsg_type	  = RTM_GETROUTE,
> > +		.nlh.nlmsg_len	  = sizeof(req),
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
> > +		.nlh.nlmsg_seq	  = nl_seq++,
> > +
> > +		.rtm.rtm_family	  = af,
> > +		.rtm.rtm_table	  = RT_TABLE_MAIN,
> > +		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
> > +		.rtm.rtm_type	  = RTN_UNICAST,
> > +
> > +		.rta.rta_type	  = RTA_OIF,
> > +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> > +		.ifi		  = ifi,
> > +	};
> > +	struct nlmsghdr *nh;
> > +	char buf[NLBUFSIZ];
> > +	ssize_t n;
> > +
> > +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
> > +		return;
> > +
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > +	     nh = NLMSG_NEXT(nh, n)) {
> > +		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
> > +		struct rtattr *rta;
> > +		size_t na;
> > +
> > +		if (nh->nlmsg_type != RTM_NEWROUTE)
> > +			continue;
> > +
> > +		if (rtm->rtm_dst_len)
> > +			continue;
> > +
> > +		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> > +		     rta = RTA_NEXT(rta, na)) {
> > +			if (rta->rta_type != RTA_GATEWAY)
> > +				continue;
> > +
> > +			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +/**
> > + * nl_route_set_def() - Set default route for given interface and address family
> > + * @ifi:	Interface index in target namespace
> >   * @af:		Address family
> > - * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
> > + * @gw:		Default gateway to set
> >   */
> > -void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> > -	      sa_family_t af, void *gw)
> > +void nl_route_set_def(unsigned int ifi, sa_family_t af, void *gw)
> >  {
> >  	struct req_t {
> >  		struct nlmsghdr nlh;
> > @@ -215,122 +271,126 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> >  			} r4;
> >  		} set;
> >  	} req = {
> > -		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
> > -		.nlh.nlmsg_flags  = NLM_F_REQUEST,
> > +		.nlh.nlmsg_type	  = RTM_NEWROUTE,
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK |
> > +				    NLM_F_CREATE | NLM_F_EXCL,
> >  		.nlh.nlmsg_seq	  = nl_seq++,
> >  
> >  		.rtm.rtm_family	  = af,
> >  		.rtm.rtm_table	  = RT_TABLE_MAIN,
> >  		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
> >  		.rtm.rtm_type	  = RTN_UNICAST,
> > +		.rtm.rtm_protocol = RTPROT_BOOT,
> >  
> >  		.rta.rta_type	  = RTA_OIF,
> >  		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> > -		.ifi		  = op == NL_SET ? ifi_ns : ifi,
> > +		.ifi		  = ifi,
> >  	};
> > -	unsigned dup_routes = 0;
> > -	ssize_t n, nlmsgs_size;
> > -	struct nlmsghdr *nh;
> > -	struct rtattr *rta;
> >  	char buf[NLBUFSIZ];
> > -	struct rtmsg *rtm;
> > -	size_t na;
> > -
> > -	if (op == NL_SET) {
> > -		if (af == AF_INET6) {
> > -			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
> >  
> > -			req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
> > -				+ sizeof(req.set.r6);
> > +	if (af == AF_INET6) {
> > +		size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
> >  
> > -			req.set.r6.rta_dst.rta_type = RTA_DST;
> > -			req.set.r6.rta_dst.rta_len = rta_len;
> > +		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6)
> > +			+ sizeof(req.set.r6);
> >  
> > -			memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
> > -			req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
> > -			req.set.r6.rta_gw.rta_len = rta_len;
> > -		} else {
> > -			size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
> > +		req.set.r6.rta_dst.rta_type = RTA_DST;
> > +		req.set.r6.rta_dst.rta_len = rta_len;
> >  
> > -			req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
> > -				+ sizeof(req.set.r4);
> > +		memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
> > +		req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
> > +		req.set.r6.rta_gw.rta_len = rta_len;
> > +	} else {
> > +		size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
> >  
> > -			req.set.r4.rta_dst.rta_type = RTA_DST;
> > -			req.set.r4.rta_dst.rta_len = rta_len;
> > +		req.nlh.nlmsg_len = offsetof(struct req_t, set.r4)
> > +			+ sizeof(req.set.r4);
> >  
> > -			req.set.r4.a = *(uint32_t *)gw;
> > -			req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
> > -			req.set.r4.rta_gw.rta_len = rta_len;
> > -		}
> > +		req.set.r4.rta_dst.rta_type = RTA_DST;
> > +		req.set.r4.rta_dst.rta_len = rta_len;
> >  
> > -		req.rtm.rtm_protocol = RTPROT_BOOT;
> > -		req.nlh.nlmsg_flags |= NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
> > -	} else {
> > -		req.nlh.nlmsg_len = offsetof(struct req_t, set.r6);
> > -		req.nlh.nlmsg_flags |= NLM_F_DUMP;
> > +		req.set.r4.a = *(uint32_t *)gw;
> > +		req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
> > +		req.set.r4.rta_gw.rta_len = rta_len;
> >  	}
> >  
> > -	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> > -		return;
> > +	nl_req(1, buf, &req, req.nlh.nlmsg_len);
> > +}
> >  
> > -	if (op == NL_SET)
> > +/**
> > + * nl_route_dup() - Copy routes for given interface and address family
> > + * @ifi:	Interface index in outer network namespace
> > + * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
> > + * @af:		Address family
> > + */
> > +void nl_route_dup(unsigned int ifi, unsigned int ifi_ns, sa_family_t af)
> > +{
> > +	struct req_t {
> > +		struct nlmsghdr nlh;
> > +		struct rtmsg rtm;
> > +		struct rtattr rta;
> > +		unsigned int ifi;
> > +	} req = {
> > +		.nlh.nlmsg_type	  = RTM_GETROUTE,
> > +		.nlh.nlmsg_len	  = sizeof(req),
> > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP,
> > +		.nlh.nlmsg_seq	  = nl_seq++,
> > +
> > +		.rtm.rtm_family	  = af,
> > +		.rtm.rtm_table	  = RT_TABLE_MAIN,
> > +		.rtm.rtm_scope	  = RT_SCOPE_UNIVERSE,
> > +		.rtm.rtm_type	  = RTN_UNICAST,
> > +
> > +		.rta.rta_type	  = RTA_OIF,
> > +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> > +		.ifi		  = ifi,
> > +	};
> > +	char buf[NLBUFSIZ], resp[NLBUFSIZ];
> > +	unsigned dup_routes = 0;
> > +	ssize_t n, nlmsgs_size;
> > +	struct nlmsghdr *nh;
> > +	unsigned i;
> > +
> > +	if ((n = nl_req(0, buf, &req, req.nlh.nlmsg_len)) < 0)
> >  		return;
> >  
> > -	nh = (struct nlmsghdr *)buf;
> >  	nlmsgs_size = n;
> >  
> > -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> > -		if (nh->nlmsg_type != RTM_NEWROUTE)
> > -			goto next;
> > -
> > -		if (op == NL_DUP) {
> > -			nh->nlmsg_seq = nl_seq++;
> > -			nh->nlmsg_pid = 0;
> > -			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> > -			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> > -					   NLM_F_CREATE;
> > -			dup_routes++;
> > -		}
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > +	     nh = NLMSG_NEXT(nh, n)) {
> > +		struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
> > +		struct rtattr *rta;
> > +		size_t na;
> >  
> > -		rtm = (struct rtmsg *)NLMSG_DATA(nh);
> > -		if (op == NL_GET && rtm->rtm_dst_len)
> > +		if (nh->nlmsg_type != RTM_NEWROUTE)
> >  			continue;
> >  
> > +		nh->nlmsg_seq = nl_seq++;
> > +		nh->nlmsg_pid = 0;
> > +		nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> > +		nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> > +			NLM_F_CREATE;
> > +		dup_routes++;
> > +
> >  		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> >  		     rta = RTA_NEXT(rta, na)) {
> > -			if (op == NL_GET) {
> > -				if (rta->rta_type != RTA_GATEWAY)
> > -					continue;
> > -
> > -				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> > -				return;
> > -			}
> > -
> > -			if (op == NL_DUP && rta->rta_type == RTA_OIF)
> > +			if (rta->rta_type == RTA_OIF)
> >  				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
> >  		}
> > -
> > -next:
> > -		if (nh->nlmsg_type == NLMSG_DONE)
> > -			break;
> >  	}
> >  
> > -	if (op == NL_DUP) {
> > -		char resp[NLBUFSIZ];
> > -		unsigned i;
> > -
> > -		nh = (struct nlmsghdr *)buf;
> > -		/* Routes might have dependencies between each other, and the
> > -		 * kernel processes RTM_NEWROUTE messages sequentially. For n
> > -		 * valid routes, we might need to send up to n requests to get
> > -		 * all of them inserted. Routes that have been already inserted
> > -		 * won't cause the whole request to fail, so we can simply
> > -		 * repeat the whole request. This approach avoids the need to
> > -		 * calculate dependencies: let the kernel do that.
> > -		 */
> > -		for (i = 0; i < dup_routes; i++)
> > -			nl_req(1, resp, nh, nlmsgs_size);
> > -	}
> > +	nh = (struct nlmsghdr *)buf;
> > +	/* Routes might have dependencies between each other, and the
> > +	 * kernel processes RTM_NEWROUTE messages sequentially. For n
> > +	 * valid routes, we might need to send up to n requests to get
> > +	 * all of them inserted. Routes that have been already
> > +	 * inserted won't cause the whole request to fail, so we can
> > +	 * simply repeat the whole request. This approach avoids the
> > +	 * need to calculate dependencies: let the kernel do that.
> > +	 */
> 
> Or:
> 
> 	/* Routes might have dependencies between each other, and the kernel
> 	 * processes RTM_NEWROUTE messages sequentially. For n valid routes, we
> 	 * might need to send up to n requests to get all of them inserted.
> 	 * Routes that have been already inserted won't cause the whole request
> 	 * to fail, so we can simply repeat the whole request. This approach
> 	 * avoids the need to calculate dependencies: let the kernel do that.
> 	 */
> 
> (can also be "fixed" in 6/17).

Huh... I just used M-q to wrap these in emacs, and it turns out the
default fill-column value is 70, not 80.  Fixed up my config and
adjusted this accordingly.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH 08/17] netlink: Treat send() or recv() errors as fatal
  2023-08-02 22:47   ` Stefano Brivio
@ 2023-08-03  2:19     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:19 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:47:59AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:27 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Errors on send() or recv() calls on a netlink socket don't indicate errors
> > with the netlink operations we're attempting, but rather that something's
> > gone wrong with the mechanics of netlink itself.  We don't really expect
> > this to ever happen, and if it does, it's not clear what we could do to
> > recover.
> > 
> > So, treat errors from these calls as fatal, rather than returning the error
> > up the stack.  This makes handling failures in the callers of nl_req()
> > simpler.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  netlink.c | 36 +++++++++++++++++-------------------
> >  1 file changed, 17 insertions(+), 19 deletions(-)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index 3620fd6..826c926 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -103,9 +103,9 @@ fail:
> >   * @req:	Request with netlink header
> >   * @len:	Request length
> >   *
> > - * Return: received length on success, negative error code on failure
> > + * Return: received length on success, terminates on error
> >   */
> > -static int nl_req(int s, char *buf, const void *req, ssize_t len)
> > +static ssize_t nl_req(int s, char *buf, const void *req, ssize_t len)
> >  {
> >  	char flush[NLBUFSIZ];
> >  	int done = 0;
> > @@ -124,11 +124,17 @@ static int nl_req(int s, char *buf, const void *req, ssize_t len)
> >  		}
> >  	}
> >  
> > -	if ((send(s, req, len, 0) < len) ||
> > -	    (len = recv(s, buf, NLBUFSIZ, 0)) < 0)
> > -		return -errno;
> > +	n = send(s, req, len, 0);
> > +	if (n < 0)
> > +		die("netlink: Failed to send(): %s", strerror(errno));
> > +	else if (n < len)
> > +		die("netlink: Short send");
> 
> If you respin, probably worth doing:
> 
> 		die("netlink: Short send (%li out of %li bytes)", n, len);

Done.
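
For illustration, a minimal sketch of what the extended check might look
like.  describe_send() is a hypothetical helper, not a function in passt,
and it formats into a buffer instead of calling die() so that it can be
exercised on its own.  It also assumes %zd in place of the suggested %li:
%zd matches ssize_t on POSIX systems, whereas %li is only right where
ssize_t happens to be long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, not part of passt: classify the result of a
 * send() of @len bytes that returned @n, with @err holding errno.
 * Returns 0 if the whole request went out, -1 otherwise, leaving a
 * diagnostic in @msg. */
static int describe_send(ssize_t n, ssize_t len, int err,
			 char *msg, size_t msglen)
{
	assert(msg && msglen > 0);
	msg[0] = '\0';

	if (n < 0) {
		snprintf(msg, msglen, "netlink: Failed to send(): %s",
			 strerror(err));
		return -1;
	}

	if (n < len) {
		/* Report both lengths so a short send can be diagnosed */
		snprintf(msg, msglen,
			 "netlink: Short send (%zd out of %zd bytes)", n, len);
		return -1;
	}

	return 0;
}
```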

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size
  2023-08-02 22:48   ` Stefano Brivio
@ 2023-08-03  2:22     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:22 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:48:14AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:30 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Currently we set NLBUFSIZ large enough for 8192 netlink headers (128kiB in
> > total), and reference netlink(7).  However netlink(7) says nothing about
> > response buffer sizes, and the documents which do reference 8192 *bytes* not
> > 8192 headers.
> 
> Oops.
> 
> > Update NLBUFSIZ to 64kiB with a more detailed rationale.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=67
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  netlink.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index cdd65c0..d553ddd 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -35,7 +35,14 @@
> >  #include "log.h"
> >  #include "netlink.h"
> >  
> > -#define NLBUFSIZ	(8192 * sizeof(struct nlmsghdr)) /* See netlink(7) */
> > +/* Netlink expects a buffer of at least 8kiB or the system page size,
> > + * whichever is larger.  32kiB is recommended for more efficient operation.
> > + * Since the largest page size on any remotely common Linux setup is
> > + * 64kiB (ppc64), that should cover it.
> > + *
> > + * https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#buffer-sizing
> > + */
> > +#define NLBUFSIZ 65536
> 
> I'm fine with this, but we also have PAGE_SIZE and MAX() defined. Or
> maybe it's more reasonable to keep this constant. I'm not sure.

Well, my thought was that this approach also works for the rare case
that the runtime page size doesn't equal the compile time page size
(e.g. built on ppc64le with 4kiB pagesize, then run on ppc64le with
64kiB pagesize).
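
To make that concrete, here is a rough sketch of the rule from the kernel
documentation (at least 8kiB or the page size, whichever is larger)
checked against the page size seen at run time instead of a compile time
PAGE_SIZE.  nl_min_bufsiz() and nl_bufsiz_ok() are hypothetical names
used only for this illustration; the patch itself just defines the
constant.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define NLBUFSIZ 65536	/* Same value as in the patch */

/* Hypothetical helper: smallest acceptable netlink receive buffer for a
 * given page size, that is, max(8kiB, page size). */
static size_t nl_min_bufsiz(long page_size)
{
	size_t min = 8192;

	assert(page_size > 0);
	if ((size_t)page_size > min)
		min = (size_t)page_size;

	return min;
}

/* Hypothetical helper: check the fixed buffer against the page size of
 * the machine we are running on, not the one we were built on. */
static int nl_bufsiz_ok(void)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return 0;

	return NLBUFSIZ >= nl_min_bufsiz(page_size);
}
```

With a fixed 64kiB buffer this check holds for 4kiB and 64kiB pages
alike, which is the case described above.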

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking
  2023-08-02 22:48   ` Stefano Brivio
@ 2023-08-03  2:24     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:24 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:48:07AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:29 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > So far we never checked for errors reported on netlink operations via
> > NLMSG_ERROR messages.  This has led to several subtle and tricky to debug
> > situations which would have been obvious if we knew that certain netlink
> > operations had failed.
> > 
> > Introduce a nl_do() helper that performs netlink "do" operations (that is
> > making a single change without retrieving complex information) with much
> > more thorough error checking.  As well as returning an error code if we
> > get an NLMSG_ERROR message, we also check for unexpected behaviour in
> > several places.  That way if we've made a mistake in our assumptions about
> > how netlink works it should result in a clear error rather than some subtle
> > misbehaviour.
> > 
> > We update those calls to nl_req() that can use the new wrapper to do so.
> > We will extend those to better handle errors in future.  We don't touch
> > non-"do" operations for now, those are a bit trickier.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=60
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 47 insertions(+), 12 deletions(-)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index 3170344..cdd65c0 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
> >  	return n;
> >  }
> >  
> > +/**
> > + * nl_do() - Send netlink "do" request, and wait for acknowledgement
> > + * @s:		Netlink socket
> > + * @req:	Request (will fill netlink header)
> > + * @type:	Request type
> > + * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
> > + * @len:	Request length
> > + *
> > + * Return: 0 on success, negative error code on error
> > + */
> > +static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
> > +{
> > +	struct nlmsghdr *nh;
> > +	char buf[NLBUFSIZ];
> > +	uint16_t seq;
> > +	ssize_t n;
> > +
> > +	n = nl_req(s, buf, req, type, flags, len);
> > +	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
> > +
> > +	for (nh = (struct nlmsghdr *)buf;
> > +	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> > +		struct nlmsgerr *errmsg;
> > +
> > +		if (nh->nlmsg_seq != seq)
> > +			die("netlink: Unexpected response sequence number");
> > +
> > +		switch (nh->nlmsg_type) {
> > +		case NLMSG_DONE:
> > +			return 0;
> > +		case NLMSG_ERROR:
> > +			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
> > +			return errmsg->error;
> 
> This is an errno, we should probably print it here ...and, now reading
> 14/17 and 16/17: saving repeated strerror() calls there. On the other
> hand this has the advantage of one single error message instead of two,
> but... hmm.

No, this is deliberate.  We use this for the "write" side of the dup
operations.  So for the routes we don't want this to print errors for
all the times we get a net unreachable, then again for all the
duplicated routes as we try repeatedly.
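
As a simplified sketch of that design (not the nl_do() from the patch):
the low-level step only extracts the status from the NLMSG_ERROR message
and hands it back, leaving it to the caller to decide whether it is
worth reporting.  fake_nl_ack() and nl_status_quiet() are hypothetical
names, and the use of -EPROTO for "no status found" is an arbitrary
choice for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build a lone NLMSG_ERROR message in @buf, as the
 * kernel sends to acknowledge (error == 0) or reject (error < 0) a
 * request.  @buf must be suitably aligned for struct nlmsghdr. */
static ssize_t fake_nl_ack(char *buf, unsigned int seq, int error)
{
	struct nlmsghdr nh = {
		.nlmsg_len  = NLMSG_LENGTH(sizeof(struct nlmsgerr)),
		.nlmsg_type = NLMSG_ERROR,
		.nlmsg_seq  = seq,
	};
	struct nlmsgerr e = { .error = error };

	memcpy(buf, &nh, sizeof(nh));
	memcpy(buf + NLMSG_HDRLEN, &e, sizeof(e));

	return nh.nlmsg_len;
}

/* Hypothetical helper: return the status carried by the first
 * NLMSG_ERROR in @buf without logging anything.  0 is a plain ack, a
 * negative errno is a failure, -EPROTO means no usable status. */
static int nl_status_quiet(char *buf, ssize_t n, unsigned int seq)
{
	struct nlmsghdr *nh;

	assert(buf);

	for (nh = (struct nlmsghdr *)buf;
	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		if (nh->nlmsg_seq != seq)
			return -EPROTO;

		if (nh->nlmsg_type == NLMSG_ERROR)
			return ((struct nlmsgerr *)NLMSG_DATA(nh))->error;
	}

	return -EPROTO;
}
```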

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH 17/17] netlink: Propagate errors for "dup" operations
  2023-08-02 22:48   ` Stefano Brivio
@ 2023-08-03  2:26     ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  2:26 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:48:20AM +0200, Stefano Brivio wrote:
> On Mon, 24 Jul 2023 16:09:36 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We now detect errors on netlink "set" operations while configuring the
> > pasta namespace with --config-net.  However in many cases rather than
> > a simple "set" we use a more complex "dup" function to copy
> > configuration from the host to the namespace.  We're not yet properly
> > detecting and reporting netlink errors for that case.
> > 
> > Change the "dup" operations to propagate netlink errors to their
> > caller, pasta_ns_conf() and report them there.
> > 
> > Link: https://bugs.passt.top/show_bug.cgi?id=60
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  netlink.c | 40 ++++++++++++++++++++++++++++------------
> >  netlink.h |  8 ++++----
> >  pasta.c   | 15 ++++++++-------
> >  3 files changed, 40 insertions(+), 23 deletions(-)
> > 
> > diff --git a/netlink.c b/netlink.c
> > index 9e72b16..cdc18c0 100644
> > --- a/netlink.c
> > +++ b/netlink.c
> > @@ -413,9 +413,11 @@ int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, void *gw)
> >   * @s_dst:	Netlink socket in destination namespace
> >   * @ifi_dst:	Interface index in destination namespace
> >   * @af:		Address family
> > + *
> > + * Return: 0 on success, negative error code on failure
> >   */
> > -void nl_route_dup(int s_src, unsigned int ifi_src,
> > -		  int s_dst, unsigned int ifi_dst, sa_family_t af)
> > +int nl_route_dup(int s_src, unsigned int ifi_src,
> > +		 int s_dst, unsigned int ifi_dst, sa_family_t af)
> >  {
> >  	struct req_t {
> >  		struct nlmsghdr nlh;
> > @@ -477,9 +479,11 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
> >  
> >  		if (extra) {
> >  			err("netlink: Too many routes to duplicate");
> > -			return;
> > +			return -E2BIG;
> 
> This is "Argument list too long", and... I don't have much better
> ideas. I would instinctively use ENOSPC or ENOMEM in this case, but
> both are slightly misleading in different ways, too.

Right.  My main reason for picking this is that it's an obscure error
that won't be generated by the underlying netlink operations so it's
unambiguous.  I'm open to better ideas.

> >  		}
> >  	}
> > +	if (status < 0)
> > +		return status;
> >  
> >  	/* Routes might have dependencies between each other, and the
> >  	 * kernel processes RTM_NEWROUTE messages sequentially. For n
> > @@ -494,15 +498,20 @@ void nl_route_dup(int s_src, unsigned int ifi_src,
> >  		     NLMSG_OK(nh, status);
> >  		     nh = NLMSG_NEXT(nh, status)) {
> >  			uint16_t flags = nh->nlmsg_flags;
> > +			int rc;
> >  
> >  			if (nh->nlmsg_type != RTM_NEWROUTE)
> >  				continue;
> >  
> > -			nl_do(s_dst, nh, RTM_NEWROUTE,
> > -			       (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
> > -			       nh->nlmsg_len);
> > +			rc = nl_do(s_dst, nh, RTM_NEWROUTE,
> > +				   (flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
> > +				   nh->nlmsg_len);
> > +			if (rc < 0 && rc != -ENETUNREACH && rc != -EEXIST)
> > +				return rc;
> >  		}
> >  	}
> > +
> > +	return 0;
> >  }
> >  
> >  /**
> > @@ -635,9 +644,11 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
> >   * @s_dst:	Netlink socket in destination network namespace
> >   * @ifi_dst:	Interface index in destination namespace
> >   * @af:		Address family
> > + *
> > + * Return: 0 on success, negative error code on failure
> >   */
> > -void nl_addr_dup(int s_src, unsigned int ifi_src,
> > -		 int s_dst, unsigned int ifi_dst, sa_family_t af)
> > +int nl_addr_dup(int s_src, unsigned int ifi_src,
> > +		int s_dst, unsigned int ifi_dst, sa_family_t af)
> >  {
> >  	struct req_t {
> >  		struct nlmsghdr nlh;
> > @@ -651,6 +662,7 @@ void nl_addr_dup(int s_src, unsigned int ifi_src,
> >  	struct nlmsghdr *nh;
> >  	ssize_t status;
> >  	uint16_t seq;
> > +	int rc= 0;
> 
> Missing whitespace.

Fixed.
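
For reference, the error filtering in the nl_route_dup() hunk above can
be restated as a small predicate.  route_dup_fatal() is a hypothetical
name used only here; the patch open-codes the test.  It also shows why a
locally generated -E2BIG stays unambiguous: it is not one of the codes
the retry loop expects and ignores.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: decide whether a status from one RTM_NEWROUTE
 * attempt should abort the duplication.  -ENETUNREACH (a route this one
 * depends on is not there yet) and -EEXIST (inserted on an earlier
 * pass) are expected while the whole set is being retried. */
static int route_dup_fatal(int rc)
{
	assert(rc <= 0);

	return rc < 0 && rc != -ENETUNREACH && rc != -EEXIST;
}
```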

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [PATCH 01/17] netlink: Split up functionality if nl_link()
  2023-08-03  2:09     ` David Gibson
@ 2023-08-03  4:29       ` David Gibson
  2023-08-03  5:39         ` David Gibson
  2023-08-03  5:40         ` Stefano Brivio
  0 siblings, 2 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  4:29 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

On Thu, Aug 03, 2023 at 12:09:16PM +1000, David Gibson wrote:
> On Thu, Aug 03, 2023 at 12:47:29AM +0200, Stefano Brivio wrote:
[snip]
> > > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> > > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
> > >  {
> > > -	int change = !MAC_IS_ZERO(mac) || up || mtu;
> > >  	struct req_t {
> > >  		struct nlmsghdr nlh;
> > >  		struct ifinfomsg ifm;
> > > -		struct rtattr rta;
> > > -		union {
> > > -			unsigned char mac[ETH_ALEN];
> > > -			struct {
> > > -				unsigned int mtu;
> > > -			} mtu;
> > > -		} set;
> > >  	} req = {
> > > -		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
> > > -		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > > -		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
> > > +		.nlh.nlmsg_type	  = RTM_GETLINK,
> > > +		.nlh.nlmsg_len	  = sizeof(req),
> > 
> > I don't think there's a practical issue with this, but there were two
> > reasons why I used NLMSG_LENGTH(sizeof(struct ifinfomsg)) instead:
> > 
> > - NLMSG_LENGTH() aligns to 4 bytes, not to whatever
> >   architecture-dependent alignment we might have: the message might
> >   actually be smaller
> 
> Oof... so.  On the one hand, I see the issue; if these are different,
> I'm not sure what the effect will be.  On the other hand, if we use
> NLMSG_LENGTH and it *is* longer than the structure size, we'll be
> saying that this message is longer than the datagram containing it.
> I'm not sure what the effect of that will be either.

Duh, sorry, I realized I had this backwards.  NLMSG_LENGTH() is the
non-aligned length, sizeof() may include alignment.  I'll rework based
on that understanding.
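
A small sketch of the difference, using a request shaped like the one in
nl_link_set_mac() above.  struct mac_req and the two helpers are
hypothetical names for this example.  NLMSG_LENGTH() adds the aligned
header length to the payload length without padding the end, while
sizeof() on the structure may include trailing padding; on common ABIs
that should be 42 against 44 bytes here, though the exact sizeof() is up
to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout, mirroring the set-MAC request */
struct mac_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	struct rtattr rta;
	unsigned char mac[ETH_ALEN];
};

/* Length as netlink counts it: aligned header plus unpadded payload */
static size_t mac_req_nl_len(void)
{
	return NLMSG_LENGTH(sizeof(struct ifinfomsg) + RTA_LENGTH(ETH_ALEN));
}

/* Trailing bytes that sizeof() counts but NLMSG_LENGTH() does not */
static size_t mac_req_padding(void)
{
	assert(sizeof(struct mac_req) >= mac_req_nl_len());

	return sizeof(struct mac_req) - mac_req_nl_len();
}
```

Either value should be acceptable as nlmsg_len as long as the datagram
actually sent is at least that long; that is an assumption about how the
kernel validates lengths, not something established in this thread.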

> Not really sure what to do about this.
> 
> > - I see that this works with gcc and clang, but, strictly
> >   speaking, is the size of the struct known "before"
> >   (sequence-point-wise) we're done initialising it? I have a very vague
> >   memory of this not working with gcc 2.9 or suchlike -- which is not a
> >   problem, as long as our new friend C11 actually supports this (but
> >   I'm not entirely sure).
> 
> I'm pretty sure it's ok, regardless of C11 state.  It's not really a
> question of sequence points: those are about the ordering of run time
> operations.  Even though the structure is being defined inline,
> > determining its size and layout will still happen at compile time,
> whereas the initialization is obviously a runtime event.
> 
> > Then, in 9/17, NLMSG_LENGTH() could be conveniently used by nl_req().
> > 
> > > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> > >  		.nlh.nlmsg_seq	  = nl_seq++,
> > >  		.ifm.ifi_family	  = AF_UNSPEC,
> > >  		.ifm.ifi_index	  = ifi,
> > > -		.ifm.ifi_flags	  = up ? IFF_UP : 0,
> > > -		.ifm.ifi_change	  = up ? IFF_UP : 0,
> > >  	};
> > > -	struct ifinfomsg *ifm;
> > >  	struct nlmsghdr *nh;
> > > -	struct rtattr *rta;
> > >  	char buf[NLBUFSIZ];
> > >  	ssize_t n;
> > > -	size_t na;
> > > -
> > > -	if (!MAC_IS_ZERO(mac)) {
> > > -		req.nlh.nlmsg_len = sizeof(req);
> > > -		memcpy(req.set.mac, mac, ETH_ALEN);
> > > -		req.rta.rta_type = IFLA_ADDRESS;
> > > -		req.rta.rta_len = RTA_LENGTH(ETH_ALEN);
> > > -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > > -			return;
> > > -
> > > -		up = 0;
> > > -	}
> > > -
> > > -	if (mtu) {
> > > -		req.nlh.nlmsg_len = offsetof(struct req_t, set.mtu)
> > > -			+ sizeof(req.set.mtu);
> > > -		req.set.mtu.mtu = mtu;
> > > -		req.rta.rta_type = IFLA_MTU;
> > > -		req.rta.rta_len = RTA_LENGTH(sizeof(unsigned int));
> > > -		if (nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > > -			return;
> > > -
> > > -		up = 0;
> > > -	}
> > > -
> > > -	if (up && nl_req(ns, buf, &req, req.nlh.nlmsg_len) < 0)
> > > -		return;
> > > -
> > > -	if (change)
> > > -		return;
> > >  
> > > -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0)
> > > +	n = nl_req(ns, buf, &req, sizeof(req));
> > > +	if (n < 0)
> > >  		return;
> > > +	
> > > +	for (nh = (struct nlmsghdr *)buf;
> > > +	     NLMSG_OK(nh, n) && nh->nlmsg_type != NLMSG_DONE;
> > > +	     nh = NLMSG_NEXT(nh, n)) {
> > > +		struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> > > +		struct rtattr *rta;
> > > +		size_t na;
> > >  
> > > -	nh = (struct nlmsghdr *)buf;
> > > -	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
> > >  		if (nh->nlmsg_type != RTM_NEWLINK)
> > > -			goto next;
> > > -
> > > -		ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
> > > +			continue;
> > >  
> > > -		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
> > > +		for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
> > > +		     RTA_OK(rta, na);
> > >  		     rta = RTA_NEXT(rta, na)) {
> > >  			if (rta->rta_type != IFLA_ADDRESS)
> > >  				continue;
> > > @@ -570,8 +531,70 @@ void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> > >  			memcpy(mac, RTA_DATA(rta), ETH_ALEN);
> > >  			break;
> > >  		}
> > > -next:
> > > -		if (nh->nlmsg_type == NLMSG_DONE)
> > > -			break;
> > >  	}
> > >  }
> > > +
> > > +/**
> > > + * nl_link_set_mac() - Set link MAC address
> > > + * @ns:		Use netlink socket in namespace
> > > + * @ifi:	Interface index
> > > + * @mac:	MAC address to set
> > > + */
> > > +void nl_link_set_mac(int ns, unsigned int ifi, void *mac)
> > > +{
> > > +	struct req_t {
> > > +		struct nlmsghdr nlh;
> > > +		struct ifinfomsg ifm;
> > > +		struct rtattr rta;
> > > +		unsigned char mac[ETH_ALEN];
> > > +	} req = {
> > > +		.nlh.nlmsg_type	  = RTM_NEWLINK,
> > > +		.nlh.nlmsg_len	  = sizeof(req),
> > 
> > Same here.
> > 
> > > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> > > +		.nlh.nlmsg_seq	  = nl_seq++,
> > > +		.ifm.ifi_family	  = AF_UNSPEC,
> > > +		.ifm.ifi_index	  = ifi,
> > > +		.rta.rta_type	  = IFLA_ADDRESS,
> > > +		.rta.rta_len	  = RTA_LENGTH(ETH_ALEN),
> > > +	};
> > > +	char buf[NLBUFSIZ];
> > > +
> > > +	memcpy(req.mac, mac, ETH_ALEN);
> > > +
> > > +	nl_req(ns, buf, &req, sizeof(req));
> > > +}
> > > +
> > > +/**
> > > + * nl_link_up() - Bring link up
> > > + * @ns:		Use netlink socket in namespace
> > > + * @ifi:	Interface index
> > > + * @mtu:	If non-zero, set interface MTU
> > > + */
> > > +void nl_link_up(int ns, unsigned int ifi, int mtu)
> > > +{
> > > +	struct req_t {
> > > +		struct nlmsghdr nlh;
> > > +		struct ifinfomsg ifm;
> > > +		struct rtattr rta;
> > > +		unsigned int mtu;
> > > +	} req = {
> > > +		.nlh.nlmsg_type   = RTM_NEWLINK,
> > > +		.nlh.nlmsg_len    = sizeof(req),
> > 
> > And here.
> > 
> > > +		.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK,
> > > +		.nlh.nlmsg_seq	  = nl_seq++,
> > > +		.ifm.ifi_family	  = AF_UNSPEC,
> > > +		.ifm.ifi_index	  = ifi,
> > > +		.ifm.ifi_flags	  = IFF_UP,
> > > +		.ifm.ifi_change	  = IFF_UP,
> > > +		.rta.rta_type	  = IFLA_MTU,
> > > +		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
> > > +		.mtu		  = mtu,
> > > +	};
> > > +	char buf[NLBUFSIZ];
> > > +
> > > +	if (!mtu)
> > > +		/* Shorten request to drop MTU attribute */
> > > +		req.nlh.nlmsg_len = offsetof(struct req_t, rta);
> > 
> > Pre-existing issue I see now: we should probably use NLMSG_LENGTH()
> > here, in any case.
> 
> Well.. if NLMSG_LENGTH() really is different here, we're (by
> definition) including some of req.rta in the message, which isn't our
> intention.  So.. if we trust the rta member to be aligned properly for
> the case where we *do* include it, can't we also trust it for the case
> where we don't?
> 
> > > +
> > > +	nl_req(ns, buf, &req, req.nlh.nlmsg_len);
> > > +}
> > > diff --git a/netlink.h b/netlink.h
> > > index cd0e666..980ac44 100644
> > > --- a/netlink.h
> > > +++ b/netlink.h
> > > @@ -18,6 +18,8 @@ void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> > >  	      sa_family_t af, void *gw);
> > >  void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> > >  	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
> > > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
> > > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac);
> > > +void nl_link_set_mac(int ns, unsigned int ifi, void *mac);
> > > +void nl_link_up(int ns, unsigned int ifi, int mtu);
> > >  
> > >  #endif /* NETLINK_H */
> > > diff --git a/pasta.c b/pasta.c
> > > index 8c85546..3b5537d 100644
> > > --- a/pasta.c
> > > +++ b/pasta.c
> > > @@ -272,13 +272,19 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
> > >   */
> > >  void pasta_ns_conf(struct ctx *c)
> > >  {
> > > -	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
> > > +	nl_link_up(1, 1 /* lo */, 0);
> > > +
> > > +	/* Get or set guest MAC */
> > 
> > I know it's called mac_guest, my bad, but what about "MAC address in
> > the target namespace"?
> 
> Good idea, changed.
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 01/17] netlink: Split up functionality if nl_link()
  2023-08-03  4:29       ` David Gibson
@ 2023-08-03  5:39         ` David Gibson
  2023-08-03  5:40         ` Stefano Brivio
  1 sibling, 0 replies; 35+ messages in thread
From: David Gibson @ 2023-08-03  5:39 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev


On Thu, Aug 03, 2023 at 02:29:28PM +1000, David Gibson wrote:
> On Thu, Aug 03, 2023 at 12:09:16PM +1000, David Gibson wrote:
> > On Thu, Aug 03, 2023 at 12:47:29AM +0200, Stefano Brivio wrote:
> [snip]
> > > > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> > > > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
> > > >  {
> > > > -	int change = !MAC_IS_ZERO(mac) || up || mtu;
> > > >  	struct req_t {
> > > >  		struct nlmsghdr nlh;
> > > >  		struct ifinfomsg ifm;
> > > > -		struct rtattr rta;
> > > > -		union {
> > > > -			unsigned char mac[ETH_ALEN];
> > > > -			struct {
> > > > -				unsigned int mtu;
> > > > -			} mtu;
> > > > -		} set;
> > > >  	} req = {
> > > > -		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
> > > > -		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > > > -		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
> > > > +		.nlh.nlmsg_type	  = RTM_GETLINK,
> > > > +		.nlh.nlmsg_len	  = sizeof(req),
> > > 
> > > I don't think there's a practical issue with this, but there were two
> > > reasons why I used NLMSG_LENGTH(sizeof(struct ifinfomsg)) instead:
> > > 
> > > - NLMSG_LENGTH() aligns to 4 bytes, not to whatever
> > >   architecture-dependent alignment we might have: the message might
> > >   actually be smaller
> > 
> > Oof... so.  On the one hand, I see the issue; if these are different,
> > I'm not sure what the effect will be.  On the other hand, if we use
> > NLMSG_LENGTH and it *is* longer than the structure size, we'll be
> > saying that this message is longer than the datagram containing it.
> > I'm not sure what the effect of that will be either.
> 
> Duh, sorry, I realized I had this backwards.  NLMSG_LENGTH() is the
> non-aligned length, sizeof() may include alignment.  I'll rework based
> on that understanding.

Uhhh... then I realized I don't really see what that entails either.

The basic problem is this, given a structure:
	struct req {
		struct nlmsghdr nlh;
		a_t a;
		b_t b;
		c_t c;
	} req;

then, what's the correct req.nlh.nlmsg_len?

sizeof(req) will probably work in practice, but as you say could be an
overestimate if struct req has end padding.

So try:
	payload_len = sizeof(req) - offsetof(struct req, a);
	NLMSG_LENGTH(payload_len)
But.. that still relies on sizeof(req) so will overestimate in exactly
the same circumstances.

So try:
	NLMSG_LENGTH(sizeof(a_t) + sizeof(b_t) + sizeof(c_t))
For one thing that's bulky and annoying... but also it will
*under*estimate the length if struct req has any mid-structure
padding.

Ok, so try:
	struct req {
		struct nlmsghdr nlh;
		struct payload {
			a_t a;
			b_t b;
			c_t c;
		} payload;
	} req;

	NLMSG_LENGTH(sizeof(struct payload))

Well, if struct req has end padding, but struct payload doesn't, then
this will be correct, but struct payload isn't guaranteed to lack end
padding any more than struct req.

At that point I'm out of ideas.

So, I'm inclined to stick with sizeof(req) for its simplicity.  This
has the implicit requirement that we're always careful and ABI aware
about the construction of our structures so that they don't have
unexpected end padding.  But.. then we're also relying on the
structures having mid-padding only at the places that netlink expects
it, which already requires some awareness of the ABI, so I'm not sure
that avoiding the NLMSG_LENGTH() really loses us anything.
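
A minimal sketch of the comparison above, not code from this series:
struct req_get, struct req_mac, MAC_LEN and the len_*() helpers are
hypothetical names, modelled loosely on the request layouts in the
patch.  It only computes the candidate values for nlmsg_len so they can
be compared on a given ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define MAC_LEN 6	/* assumption: same value as ETH_ALEN */

/* Hypothetical request with no attribute, like a link "get" */
struct req_get {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Hypothetical request carrying a 6-byte MAC attribute */
struct req_mac {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	struct rtattr rta;
	unsigned char mac[MAC_LEN];
};

/* Length as the compiler sees it, including any end padding */
size_t len_sizeof_get(void) { return sizeof(struct req_get); }
size_t len_sizeof_mac(void) { return sizeof(struct req_mac); }

/* Length as the netlink macros compute it, from the payload only */
size_t len_macro_get(void)
{
	return NLMSG_LENGTH(sizeof(struct ifinfomsg));
}

size_t len_macro_mac(void)
{
	return NLMSG_LENGTH(sizeof(struct ifinfomsg) + RTA_LENGTH(MAC_LEN));
}

/* The macro length rounded up to netlink's 4-byte alignment */
size_t len_macro_mac_aligned(void)
{
	return NLMSG_ALIGN(len_macro_mac());
}

/* Abort if the two notions of length disagree for the simple request */
void check_get_lengths(void)
{
	assert(len_sizeof_get() == len_macro_get());
}
```

On x86_64 I'd expect the two lengths to agree for struct req_get, and
to differ by the two bytes of end padding for struct req_mac.  In that
second case the padding happens to coincide with the 4-byte rounding
that NLMSG_ALIGN() applies, but nothing guarantees that for every
structure or every ABI.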

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 01/17] netlink: Split up functionality if nl_link()
  2023-08-03  4:29       ` David Gibson
  2023-08-03  5:39         ` David Gibson
@ 2023-08-03  5:40         ` Stefano Brivio
  1 sibling, 0 replies; 35+ messages in thread
From: Stefano Brivio @ 2023-08-03  5:40 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Thu, 3 Aug 2023 14:29:28 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Aug 03, 2023 at 12:09:16PM +1000, David Gibson wrote:
> > On Thu, Aug 03, 2023 at 12:47:29AM +0200, Stefano Brivio wrote:  
> [snip]
> > > > -void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu)
> > > > +void nl_link_get_mac(int ns, unsigned int ifi, void *mac)
> > > >  {
> > > > -	int change = !MAC_IS_ZERO(mac) || up || mtu;
> > > >  	struct req_t {
> > > >  		struct nlmsghdr nlh;
> > > >  		struct ifinfomsg ifm;
> > > > -		struct rtattr rta;
> > > > -		union {
> > > > -			unsigned char mac[ETH_ALEN];
> > > > -			struct {
> > > > -				unsigned int mtu;
> > > > -			} mtu;
> > > > -		} set;
> > > >  	} req = {
> > > > -		.nlh.nlmsg_type   = change ? RTM_NEWLINK : RTM_GETLINK,
> > > > -		.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > > > -		.nlh.nlmsg_flags  = NLM_F_REQUEST | (change ? NLM_F_ACK : 0),
> > > > +		.nlh.nlmsg_type	  = RTM_GETLINK,
> > > > +		.nlh.nlmsg_len	  = sizeof(req),  
> > > 
> > > I don't think there's a practical issue with this, but there were two
> > > reasons why I used NLMSG_LENGTH(sizeof(struct ifinfomsg)) instead:
> > > 
> > > - NLMSG_LENGTH() aligns to 4 bytes, not to whatever
> > >   architecture-dependent alignment we might have: the message might
> > >   actually be smaller  
> > 
> > Oof... so.  On the one hand, I see the issue; if these are different,
> > I'm not sure what the effect will be.  On the other hand, if we use
> > NLMSG_LENGTH and it *is* longer than the structure size, we'll be
> > saying that this message is longer than the datagram containing it.
> > I'm not sure what the effect of that will be either.  
> 
> Duh, sorry, I realized I had this backwards.  NLMSG_LENGTH() is the
> non-aligned length, sizeof() may include alignment.  I'll rework based
> on that understanding.

Right, I was about to write you that... or rather, NLMSG_LENGTH() is
the (presumably) lesser-aligned length. Also, if you check pretty much
any example in iproute2, nlmsg_len is always set like that, using
NLMSG_LENGTH() on the payload.

> > Not really sure what to do about this.
> >   
> > > - I see that this works with gcc and clang, but, strictly
> > >   speaking, is the size of the struct known "before"
> > >   (sequence-point-wise) we're done initialising it? I have a very vague
> > >   memory of this not working with gcc 2.9 or suchlike -- which is not a
> > >   problem, as long as our new friend C11 actually supports this (but
> > >   I'm not entirely sure).  
> > 
> > I'm pretty sure it's ok, regardless of C11 state.  It's not really a
> > question of sequence points: those are about the ordering of run time
> > operations.  Even though the structure is being defined inline,
> > determining its size and layout will still happen at compile time,
> > whereas the initialization is obviously a runtime event.

Ah, sorry, yes, of course. Still I remember that failing spectacularly
in a distant past. But you just checked with gcc 4-ish I guess, so I
guess it would have been fine anyway.
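
For reference, a minimal sketch of the construct in question, using a
hypothetical struct hdr and self_sized_len() rather than the real
netlink types: the structure type is complete at its closing brace, and
sizeof does not evaluate its operand, so the size is available inside
the object's own initialiser.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nlmsghdr, fixed-width on purpose */
struct hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

/* Build a request whose length field refers to the object itself */
uint32_t self_sized_len(void)
{
	struct req_t {
		struct hdr h;
		uint32_t index;
	} req = {
		/* req is in scope here, and its type is already complete */
		.h.len	= sizeof(req),
		.h.type	= 18,	/* arbitrary illustrative value */
		.index	= 1,
	};

	return req.h.len;
}
```

This relies on C99 designated initialisers, so it says nothing about
what much older compilers accepted.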

-- 
Stefano



Thread overview: 35+ messages
2023-07-24  6:09 [PATCH 00/17] netlink fixes and cleanups David Gibson
2023-07-24  6:09 ` [PATCH 01/17] netlink: Split up functionality if nl_link() David Gibson
2023-08-02 22:47   ` Stefano Brivio
2023-08-03  2:09     ` David Gibson
2023-08-03  4:29       ` David Gibson
2023-08-03  5:39         ` David Gibson
2023-08-03  5:40         ` Stefano Brivio
2023-07-24  6:09 ` [PATCH 02/17] netlink: Split nl_addr() into separate operation functions David Gibson
2023-08-02 22:47   ` Stefano Brivio
2023-08-03  2:11     ` David Gibson
2023-07-24  6:09 ` [PATCH 03/17] netlink: Split nl_route() " David Gibson
2023-08-02 22:47   ` Stefano Brivio
2023-08-03  2:18     ` David Gibson
2023-07-24  6:09 ` [PATCH 04/17] netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t David Gibson
2023-07-24  6:09 ` [PATCH 05/17] netlink: Explicitly pass netlink sockets to operations David Gibson
2023-07-24  6:09 ` [PATCH 06/17] netlink: Make nl_*_dup() use a separate datagram for each request David Gibson
2023-07-24  6:09 ` [PATCH 07/17] netlink: Start sequence number from 1 instead of 0 David Gibson
2023-07-24  6:09 ` [PATCH 08/17] netlink: Treat send() or recv() errors as fatal David Gibson
2023-08-02 22:47   ` Stefano Brivio
2023-08-03  2:19     ` David Gibson
2023-07-24  6:09 ` [PATCH 09/17] netlink: Fill in netlink header fields from nl_req() David Gibson
2023-07-24  6:09 ` [PATCH 10/17] netlink: Add nl_do() helper for simple operations with error checking David Gibson
2023-08-02 22:48   ` Stefano Brivio
2023-08-03  2:24     ` David Gibson
2023-07-24  6:09 ` [PATCH 11/17] netlink: Clearer reasoning about the netlink response buffer size David Gibson
2023-08-02 22:48   ` Stefano Brivio
2023-08-03  2:22     ` David Gibson
2023-07-24  6:09 ` [PATCH 12/17] netlink: Split nl_req() to allow processing multiple response datagrams David Gibson
2023-07-24  6:09 ` [PATCH 13/17] netlink: Add nl_foreach_oftype to filter response message types David Gibson
2023-07-24  6:09 ` [PATCH 14/17] netlink: Propagate errors for "set" operations David Gibson
2023-07-24  6:09 ` [PATCH 15/17] netlink: Always process all responses to a netlink request David Gibson
2023-07-24  6:09 ` [PATCH 16/17] netlink: Propagate errors for "dump" operations David Gibson
2023-07-24  6:09 ` [PATCH 17/17] netlink: Propagate errors for "dup" operations David Gibson
2023-08-02 22:48   ` Stefano Brivio
2023-08-03  2:26     ` David Gibson
