From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Callum Parsey <callum@neoninteger.au>,
	me@yawnt.com, David Gibson <david@gibson.dropbear.id.au>,
	lemmi@nerd2nerd.org, Andrea Arcangeli <aarcange@redhat.com>
Subject: [PATCH v3 03/10] netlink: Add functionality to copy routes from outer namespace
Date: Mon, 22 May 2023 19:46:00 +0200	[thread overview]
Message-ID: <20230522174607.2824220-4-sbrivio@redhat.com> (raw)
In-Reply-To: <20230522174607.2824220-1-sbrivio@redhat.com>

Instead of just fetching the default gateway and configuring a single
equivalent route in the target namespace, on 'pasta --config-net', it
might be desirable in some cases to copy the whole set of routes
corresponding to a given output interface.

For instance, in:
  https://github.com/containers/podman/issues/18539
  IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes

configuring the default gateway won't work without a gateway-less
route (specifying the output interface only), because the default
gateway is, somewhat dubiously, not on the same subnet as the
container.

This is a similar case to the one covered by commit 7656a6f88882
("conf: Adjust netmask on mismatch between IPv4 address/netmask and
gateway"), and I'm not exactly proud of that workaround.

We also have:
  https://bugs.passt.top/show_bug.cgi?id=49
  pasta does not work with tap-style interface

for which, eventually, we should be able to configure a gateway-less
route in the target namespace.

Introduce different operation modes for nl_route(), including a new
NL_DUP one, not exposed yet, which simply parrots back to the kernel
the route dump for a given interface from the outer namespace, fixing
up flags and interface indices on the way, and requesting to add the
same routes in the target namespace, on the interface we manage.
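
As a rough illustration of the fix-up step described above, here is a
minimal sketch for a single dumped message. The function name
fixup_dumped_route() and its seq parameter are made up for this example
and are not part of the patch, which does the same thing inline in
nl_route() using its own sequence counter:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper, not part of passt: turn one RTM_NEWROUTE message taken
 * from a route dump into a request that adds the same route on another
 * interface. Returns the number of RTA_OIF attributes rewritten, or -1 if the
 * message is not a route.
 */
static int fixup_dumped_route(struct nlmsghdr *nh, uint32_t seq,
			      unsigned int ifi_ns)
{
	struct rtattr *rta;
	struct rtmsg *rtm;
	int na, changed = 0;

	assert(nh);

	if (nh->nlmsg_type != RTM_NEWROUTE)
		return -1;

	/* Make the dump reply look like a request coming from us */
	nh->nlmsg_seq = seq;
	nh->nlmsg_pid = 0;
	nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
	nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;

	/* Point the route at the interface index in the target namespace */
	rtm = (struct rtmsg *)NLMSG_DATA(nh);
	for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		if (rta->rta_type != RTA_OIF)
			continue;

		memcpy(RTA_DATA(rta), &ifi_ns, sizeof(ifi_ns));
		changed++;
	}

	return changed;
}
```

Everything else in the message (destination, prefix length, gateway,
metric) is left untouched, which is what makes this a copy rather than a
reconstruction of the route.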

For n routes we want to duplicate, send n identical netlink requests
including the full dump: routes might depend on each other and the
kernel processes RTM_NEWROUTE messages sequentially, not atomically,
and repeating the full dump naturally resolves dependencies without
the need to actually calculate them.
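
To see why n repetitions are enough, consider this toy model. It does not
talk to the kernel: struct toy_route and both functions are invented for
the illustration, and "dependency" is reduced to a single index. As long
as dependencies are not circular, each pass over the full batch inserts at
least one more route, so n passes cover even the worst ordering:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a route, not a kernel or passt structure */
struct toy_route {
	int dep;	/* Index of route that must exist first, -1 if none */
	bool present;	/* Already inserted in the simulated table */
};

/* Simulate one full batch, processed sequentially and in order: an entry
 * fails if its dependency is still missing, and is a harmless no-op if it
 * was inserted by an earlier pass. Returns routes inserted by this pass.
 */
static unsigned toy_send_batch(struct toy_route *r, size_t n)
{
	unsigned added = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(r[i].dep < (int)n);

		if (r[i].present)
			continue;	/* Already there, doesn't fail the batch */

		if (r[i].dep >= 0 && !r[r[i].dep].present)
			continue;	/* Dependency missing, fails for now */

		r[i].present = true;
		added++;
	}

	return added;
}

/* Send the same batch n times, return the total number of routes inserted */
static unsigned toy_dup_routes(struct toy_route *r, size_t n)
{
	unsigned total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += toy_send_batch(r, n);

	return total;
}
```

With three routes listed in reverse dependency order (route 0 needs route
1, route 1 needs route 2), a single pass inserts only the last one, and
all three are present after three passes.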

I'm not kidding, it actually works pretty well.

Link: https://github.com/containers/podman/issues/18539
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c    |  4 ++--
 netlink.c | 71 ++++++++++++++++++++++++++++++++++++++++++-------------
 netlink.h |  9 ++++++-
 pasta.c   |  6 +++--
 4 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/conf.c b/conf.c
index 984c3ce..1f6bbef 100644
--- a/conf.c
+++ b/conf.c
@@ -646,7 +646,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
-		nl_route(0, ifi, AF_INET, &ip4->gw);
+		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
@@ -718,7 +718,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
-		nl_route(0, ifi, AF_INET6, &ip6->gw);
+		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
 
 	nl_addr(0, ifi, AF_INET6,
 		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
diff --git a/netlink.c b/netlink.c
index c07a13c..d93ecda 100644
--- a/netlink.c
+++ b/netlink.c
@@ -185,16 +185,16 @@ unsigned int nl_get_ext_if(sa_family_t af)
 }
 
 /**
- * nl_route() - Get/set default gateway for given interface and address family
- * @ns:		Use netlink socket in namespace
- * @ifi:	Interface index
+ * nl_route() - Get/set/copy routes for given interface and address family
+ * @op:		Requested operation
+ * @ifi:	Interface index in outer network namespace
+ * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
  * @af:		Address family
- * @gw:		Default gateway to fill if zero, to set if not
+ * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
  */
-void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
+void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	      sa_family_t af, void *gw)
 {
-	int set = (af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(gw)) ||
-		  (af == AF_INET && *(uint32_t *)gw);
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct rtmsg rtm;
@@ -215,7 +215,7 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 			} r4;
 		} set;
 	} req = {
-		.nlh.nlmsg_type	  = set ? RTM_NEWROUTE : RTM_GETROUTE,
+		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
 		.nlh.nlmsg_flags  = NLM_F_REQUEST,
 		.nlh.nlmsg_seq	  = nl_seq++,
 
@@ -228,14 +228,15 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi,
 	};
+	unsigned dup_routes = 0;
+	ssize_t n, nlmsgs_size;
 	struct nlmsghdr *nh;
 	struct rtattr *rta;
-	struct rtmsg *rtm;
 	char buf[NLBUFSIZ];
-	ssize_t n;
+	struct rtmsg *rtm;
 	size_t na;
 
-	if (set) {
+	if (op == NL_SET) {
 		if (af == AF_INET6) {
 			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
 
@@ -269,31 +270,67 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 		req.nlh.nlmsg_flags |= NLM_F_DUMP;
 	}
 
-	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
+	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
+		return;
+
+	if (op == NL_SET)
 		return;
 
 	nh = (struct nlmsghdr *)buf;
+	nlmsgs_size = n;
+
 	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
 		if (nh->nlmsg_type != RTM_NEWROUTE)
 			goto next;
 
+		if (op == NL_DUP) {
+			nh->nlmsg_seq = nl_seq++;
+			nh->nlmsg_pid = 0;
+			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
+			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
+					   NLM_F_CREATE;
+			dup_routes++;
+		}
+
 		rtm = (struct rtmsg *)NLMSG_DATA(nh);
-		if (rtm->rtm_dst_len)
+		if (op == NL_GET && rtm->rtm_dst_len)
 			continue;
 
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			if (rta->rta_type != RTA_GATEWAY)
-				continue;
+			if (op == NL_GET) {
+				if (rta->rta_type != RTA_GATEWAY)
+					continue;
 
-			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
-			return;
+				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
+				return;
+			}
+
+			if (op == NL_DUP && rta->rta_type == RTA_OIF)
+				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
 		}
 
 next:
 		if (nh->nlmsg_type == NLMSG_DONE)
 			break;
 	}
+
+	if (op == NL_DUP) {
+		char resp[NLBUFSIZ];
+		unsigned i;
+
+		nh = (struct nlmsghdr *)buf;
+		/* Routes might have dependencies between each other, and the
+		 * kernel processes RTM_NEWROUTE messages sequentially. For n
+		 * valid routes, we might need to send up to n requests to get
+		 * all of them inserted. Routes that have been already inserted
+		 * won't cause the whole request to fail, so we can simply
+		 * repeat the whole request. This approach avoids the need to
+		 * calculate dependencies: let the kernel do that.
+		 */
+		for (i = 0; i < dup_routes; i++)
+			nl_req(1, resp, nh, nlmsgs_size);
+	}
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index ca4d6ef..217cf1e 100644
--- a/netlink.h
+++ b/netlink.h
@@ -6,9 +6,16 @@
 #ifndef NETLINK_H
 #define NETLINK_H
 
+enum nl_op {
+	NL_GET,
+	NL_SET,
+	NL_DUP,
+};
+
 void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(sa_family_t af);
-void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw);
+void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	      sa_family_t af, void *gw);
 void nl_addr(int ns, unsigned int ifi, sa_family_t af,
 	     void *addr, int *prefix_len, void *addr_l);
 void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
diff --git a/pasta.c b/pasta.c
index 2a6fb60..01109f5 100644
--- a/pasta.c
+++ b/pasta.c
@@ -278,14 +278,16 @@ void pasta_ns_conf(struct ctx *c)
 		if (c->ifi4) {
 			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
 				&c->ip4.prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET, &c->ip4.gw);
+			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
 			int prefix_len = 64;
 			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
 				&prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET6, &c->ip6.gw);
+			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+				 &c->ip6.gw);
 		}
 	} else {
 		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);
-- 
2.39.2


