public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes
@ 2023-05-21 23:42 Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 01/10] netlink: Fix comment about response buffer size for nl_req() Stefano Brivio
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

This series, along with pseudo-related fixes, enables:

- optional copy of all routes from selected interface in outer
  namespace, to fix the issue reported by Callum at:
    https://github.com/containers/podman/issues/18539

- optional copy of all addresses, mostly for consistency. It doesn't,
  however, enable assignment of multiple addresses in the sense
  requested at:
    https://bugs.passt.top/show_bug.cgi?id=47

  because the addresses still need to be present on the host, and
  the "outer" address isn't selected depending on the address used
  inside the container

- operation without a gateway address. This is related to:
    https://bugs.passt.top/show_bug.cgi?id=49

  but Wireguard endpoints established outside the container
  can't be used yet as outbound interface (without the workaround
  reported there) for a number of reasons I'm still investigating.
  In any case, the correct route is now configured in the container,
  even without a default gateway on the corresponding host route,
  so we're a bit closer to support that configuration out of the box.

v2:

- in 3/10, repeat the netlink request once for each RTM_NEWROUTE we're
  going to send as part of the request: routes might depend on each
  other, and this is a somewhat rudimentary, but seemingly robust
  approach to insert all the routes we can insert, without explicitly
  calculating dependencies

- Cc: Andrea, reporter for the issue fixed in 4/10


Stefano Brivio (10):
  netlink: Fix comment about response buffer size for nl_req()
  pasta: Improve error handling on failure to join network namespace
  netlink: Add functionality to copy routes from outer namespace
  conf: --config-net option is for pasta mode only
  conf, pasta: With --config-net, copy all routes by default
  Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask
    and gateway"
  conf: Don't exit if sourced default route has no gateway
  netlink: Add functionality to copy addresses from outer namespace
  conf, pasta: With --config-net, copy all addresses by default
  passt.h: Fix description of pasta_ifi in struct ctx

 conf.c    |  81 +++++++++++++++++++-------------
 netlink.c | 135 +++++++++++++++++++++++++++++++++++++++++-------------
 netlink.h |  13 ++++--
 passt.1   |  25 +++++++++-
 passt.h   |   8 +++-
 pasta.c   |  26 +++++++----
 6 files changed, 207 insertions(+), 81 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 01/10] netlink: Fix comment about response buffer size for nl_req()
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 02/10] pasta: Improve error handling on failure to join network namespace Stefano Brivio
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Fixes: fde8004ab0b4 ("netlink: Use 8 KiB * netlink message header size as response buffer")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index b99af85..c07a13c 100644
--- a/netlink.c
+++ b/netlink.c
@@ -99,7 +99,7 @@ fail:
 /**
  * nl_req() - Send netlink request and read response
  * @ns:		Use netlink socket in namespace
- * @buf:	Buffer for response (at least BUFSIZ long)
+ * @buf:	Buffer for response (at least NLBUFSIZ long)
  * @req:	Request with netlink header
  * @len:	Request length
  *
-- 
@@ -99,7 +99,7 @@ fail:
 /**
  * nl_req() - Send netlink request and read response
  * @ns:		Use netlink socket in namespace
- * @buf:	Buffer for response (at least BUFSIZ long)
+ * @buf:	Buffer for response (at least NLBUFSIZ long)
  * @req:	Request with netlink header
  * @len:	Request length
  *
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 02/10] pasta: Improve error handling on failure to join network namespace
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 01/10] netlink: Fix comment about response buffer size for nl_req() Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace Stefano Brivio
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

In pasta_wait_for_ns(), open() failing with ENOENT is expected: we're
busy-looping until the network namespace appears. But any other
failure is not something we're going to recover from: return right
away if we don't get either success or ENOENT.

Now that pasta_wait_for_ns() can actually fail, handle that in
pasta_start_ns() by reporting the issue and exiting.

Looping on EPERM, when pasta doesn't actually have the permissions to
join a given namespace, isn't exactly a productive thing to do.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 pasta.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/pasta.c b/pasta.c
index b30ce70..2a6fb60 100644
--- a/pasta.c
+++ b/pasta.c
@@ -95,8 +95,11 @@ static int pasta_wait_for_ns(void *arg)
 	char ns[PATH_MAX];
 
 	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
-	do
-		while ((c->pasta_netns_fd = open(ns, flags)) < 0);
+	while ((c->pasta_netns_fd = open(ns, flags)) < 0) {
+		if (errno != ENOENT)
+			return 0;
+	}
+
 	while (setns(c->pasta_netns_fd, CLONE_NEWNET) &&
 	       !close(c->pasta_netns_fd));
 
@@ -257,6 +260,8 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
 	}
 
 	NS_CALL(pasta_wait_for_ns, c);
+	if (c->pasta_netns_fd < 0)
+		die("Failed to join network namespace");
 }
 
 /**
-- 
@@ -95,8 +95,11 @@ static int pasta_wait_for_ns(void *arg)
 	char ns[PATH_MAX];
 
 	snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid);
-	do
-		while ((c->pasta_netns_fd = open(ns, flags)) < 0);
+	while ((c->pasta_netns_fd = open(ns, flags)) < 0) {
+		if (errno != ENOENT)
+			return 0;
+	}
+
 	while (setns(c->pasta_netns_fd, CLONE_NEWNET) &&
 	       !close(c->pasta_netns_fd));
 
@@ -257,6 +260,8 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
 	}
 
 	NS_CALL(pasta_wait_for_ns, c);
+	if (c->pasta_netns_fd < 0)
+		die("Failed to join network namespace");
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 01/10] netlink: Fix comment about response buffer size for nl_req() Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 02/10] pasta: Improve error handling on failure to join network namespace Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-22  8:42   ` David Gibson
  2023-05-21 23:42 ` [PATCH v2 04/10] conf: --config-net option is for pasta mode only Stefano Brivio
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Instead of just fetching the default gateway and configuring a single
equivalent route in the target namespace, on 'pasta --config-net', it
might be desirable in some cases to copy the whole set of routes
corresponding to a given output interface.

For instance, in:
  https://github.com/containers/podman/issues/18539
  IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes

configuring the default gateway won't work without a gateway-less
route (specifying the output interface only), because the default
gateway is, somewhat dubiously, not on the same subnet as the
container.

This is a similar case to the one covered by commit 7656a6f88882
("conf: Adjust netmask on mismatch between IPv4 address/netmask and
gateway"), and I'm not exactly proud of that workaround.

We also have:
  https://bugs.passt.top/show_bug.cgi?id=49
  pasta does not work with tap-style interface

for which, eventually, we should be able to configure a gateway-less
route in the target namespace.

Introduce different operation modes for nl_route(), including a new
NL_DUP one, not exposed yet, which simply parrots back to the kernel
the route dump for a given interface from the outer namespace, fixing
up flags and interface indices on the way, and requesting to add the
same routes in the target namespace, on the interface we manage.

For n routes we want to duplicate, send n identical netlink requests
including the full dump: routes might depend on each other and the
kernel processes RTM_NEWROUTE messages sequentially, not atomically,
and repeating the full dump naturally resolves dependencies without
the need to actually calculate them.

I'm not kidding, it actually works pretty well.

Link: https://github.com/containers/podman/issues/18539
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c    |  4 ++--
 netlink.c | 71 ++++++++++++++++++++++++++++++++++++++++++-------------
 netlink.h |  9 ++++++-
 pasta.c   |  6 +++--
 4 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/conf.c b/conf.c
index 984c3ce..1f6bbef 100644
--- a/conf.c
+++ b/conf.c
@@ -646,7 +646,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
-		nl_route(0, ifi, AF_INET, &ip4->gw);
+		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
@@ -718,7 +718,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
-		nl_route(0, ifi, AF_INET6, &ip6->gw);
+		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
 
 	nl_addr(0, ifi, AF_INET6,
 		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
diff --git a/netlink.c b/netlink.c
index c07a13c..d93ecda 100644
--- a/netlink.c
+++ b/netlink.c
@@ -185,16 +185,16 @@ unsigned int nl_get_ext_if(sa_family_t af)
 }
 
 /**
- * nl_route() - Get/set default gateway for given interface and address family
- * @ns:		Use netlink socket in namespace
- * @ifi:	Interface index
+ * nl_route() - Get/set/copy routes for given interface and address family
+ * @op:		Requested operation
+ * @ifi:	Interface index in outer network namespace
+ * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
  * @af:		Address family
- * @gw:		Default gateway to fill if zero, to set if not
+ * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
  */
-void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
+void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	      sa_family_t af, void *gw)
 {
-	int set = (af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(gw)) ||
-		  (af == AF_INET && *(uint32_t *)gw);
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct rtmsg rtm;
@@ -215,7 +215,7 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 			} r4;
 		} set;
 	} req = {
-		.nlh.nlmsg_type	  = set ? RTM_NEWROUTE : RTM_GETROUTE,
+		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
 		.nlh.nlmsg_flags  = NLM_F_REQUEST,
 		.nlh.nlmsg_seq	  = nl_seq++,
 
@@ -228,14 +228,15 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.ifi		  = ifi,
 	};
+	unsigned dup_routes = 0;
+	ssize_t n, nlmsgs_size;
 	struct nlmsghdr *nh;
 	struct rtattr *rta;
-	struct rtmsg *rtm;
 	char buf[NLBUFSIZ];
-	ssize_t n;
+	struct rtmsg *rtm;
 	size_t na;
 
-	if (set) {
+	if (op == NL_SET) {
 		if (af == AF_INET6) {
 			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
 
@@ -269,31 +270,67 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
 		req.nlh.nlmsg_flags |= NLM_F_DUMP;
 	}
 
-	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
+	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
+		return;
+
+	if (op == NL_SET)
 		return;
 
 	nh = (struct nlmsghdr *)buf;
+	nlmsgs_size = n;
+
 	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
 		if (nh->nlmsg_type != RTM_NEWROUTE)
 			goto next;
 
+		if (op == NL_DUP) {
+			nh->nlmsg_seq = nl_seq++;
+			nh->nlmsg_pid = 0;
+			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
+			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
+					   NLM_F_CREATE;
+			dup_routes++;
+		}
+
 		rtm = (struct rtmsg *)NLMSG_DATA(nh);
-		if (rtm->rtm_dst_len)
+		if (op == NL_GET && rtm->rtm_dst_len)
 			continue;
 
 		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			if (rta->rta_type != RTA_GATEWAY)
-				continue;
+			if (op == NL_GET) {
+				if (rta->rta_type != RTA_GATEWAY)
+					continue;
 
-			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
-			return;
+				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
+				return;
+			}
+
+			if (op == NL_DUP && rta->rta_type == RTA_OIF)
+				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
 		}
 
 next:
 		if (nh->nlmsg_type == NLMSG_DONE)
 			break;
 	}
+
+	if (op == NL_DUP) {
+		char resp[NLBUFSIZ];
+		unsigned i;
+
+		nh = (struct nlmsghdr *)buf;
+		/* Routes might have dependencies between each other, and the
+		 * kernel processes RTM_NEWROUTE messages sequentially. For n
+		 * valid routes, we might need to send up to n requests to get
+		 * all of them inserted. Routes that have been already inserted
+		 * won't cause the whole request to fail, so we can simply
+		 * repeat the whole request. This approach avoids the need to
+		 * calculate dependencies: let the kernel do that.
+		 */
+		for (i = 0; i < dup_routes; i++)
+			nl_req(1, resp, nh, nlmsgs_size);
+	}
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index ca4d6ef..217cf1e 100644
--- a/netlink.h
+++ b/netlink.h
@@ -6,9 +6,16 @@
 #ifndef NETLINK_H
 #define NETLINK_H
 
+enum nl_op {
+	NL_GET,
+	NL_SET,
+	NL_DUP,
+};
+
 void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(sa_family_t af);
-void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw);
+void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	      sa_family_t af, void *gw);
 void nl_addr(int ns, unsigned int ifi, sa_family_t af,
 	     void *addr, int *prefix_len, void *addr_l);
 void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
diff --git a/pasta.c b/pasta.c
index 2a6fb60..01109f5 100644
--- a/pasta.c
+++ b/pasta.c
@@ -278,14 +278,16 @@ void pasta_ns_conf(struct ctx *c)
 		if (c->ifi4) {
 			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
 				&c->ip4.prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET, &c->ip4.gw);
+			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
 			int prefix_len = 64;
 			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
 				&prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET6, &c->ip6.gw);
+			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+				 &c->ip6.gw);
 		}
 	} else {
 		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);
-- 
@@ -278,14 +278,16 @@ void pasta_ns_conf(struct ctx *c)
 		if (c->ifi4) {
 			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
 				&c->ip4.prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET, &c->ip4.gw);
+			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
 			int prefix_len = 64;
 			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
 				&prefix_len, NULL);
-			nl_route(1, c->pasta_ifi, AF_INET6, &c->ip6.gw);
+			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+				 &c->ip6.gw);
 		}
 	} else {
 		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 04/10] conf: --config-net option is for pasta mode only
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (2 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default Stefano Brivio
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/conf.c b/conf.c
index 1f6bbef..3ee6ae0 100644
--- a/conf.c
+++ b/conf.c
@@ -1184,7 +1184,6 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"userns",	required_argument,	NULL,		2 },
 		{"netns",	required_argument,	NULL,		3 },
 		{"netns-only",	no_argument,		&netns_only,	1 },
-		{"config-net",	no_argument,		&c->pasta_conf_ns, 1 },
 		{"ns-mac-addr",	required_argument,	NULL,		4 },
 		{"dhcp-dns",	no_argument,		NULL,		5 },
 		{"no-dhcp-dns",	no_argument,		NULL,		6 },
@@ -1198,6 +1197,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"version",	no_argument,		NULL,		14 },
 		{"outbound-if4", required_argument,	NULL,		15 },
 		{"outbound-if6", required_argument,	NULL,		16 },
+		{"config-net",	no_argument,		NULL,		17 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -1355,6 +1355,12 @@ void conf(struct ctx *c, int argc, char **argv)
 			if (ret <= 0 || ret >= (int)sizeof(c->ip6.ifname_out))
 				die("Invalid interface name: %s", optarg);
 
+			break;
+		case 17:
+			if (c->mode != MODE_PASTA)
+				die("--config-net is for pasta mode only");
+
+			c->pasta_conf_ns = 1;
 			break;
 		case 'd':
 			if (c->debug)
-- 
@@ -1184,7 +1184,6 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"userns",	required_argument,	NULL,		2 },
 		{"netns",	required_argument,	NULL,		3 },
 		{"netns-only",	no_argument,		&netns_only,	1 },
-		{"config-net",	no_argument,		&c->pasta_conf_ns, 1 },
 		{"ns-mac-addr",	required_argument,	NULL,		4 },
 		{"dhcp-dns",	no_argument,		NULL,		5 },
 		{"no-dhcp-dns",	no_argument,		NULL,		6 },
@@ -1198,6 +1197,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"version",	no_argument,		NULL,		14 },
 		{"outbound-if4", required_argument,	NULL,		15 },
 		{"outbound-if6", required_argument,	NULL,		16 },
+		{"config-net",	no_argument,		NULL,		17 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -1355,6 +1355,12 @@ void conf(struct ctx *c, int argc, char **argv)
 			if (ret <= 0 || ret >= (int)sizeof(c->ip6.ifname_out))
 				die("Invalid interface name: %s", optarg);
 
+			break;
+		case 17:
+			if (c->mode != MODE_PASTA)
+				die("--config-net is for pasta mode only");
+
+			c->pasta_conf_ns = 1;
 			break;
 		case 'd':
 			if (c->debug)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (3 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 04/10] conf: --config-net option is for pasta mode only Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-22  8:44   ` David Gibson
  2023-05-21 23:42 ` [PATCH v2 06/10] Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway" Stefano Brivio
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Use the newly-introduced NL_DUP mode for nl_route() to copy all the
routes associated to the template interface in the outer namespace,
unless --no-copy-routes (also implied by -g) is given.

Otherwise, we can't use default gateways which are not, address-wise,
on the same subnet as the container, as reported by Callum.

Reported-by: Callum Parsey <callum@neoninteger.au>
Link: https://github.com/containers/podman/issues/18539
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c  | 14 ++++++++++++++
 passt.1 | 10 ++++++++++
 passt.h |  4 +++-
 pasta.c |  6 ++++--
 4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/conf.c b/conf.c
index 3ee6ae0..7541261 100644
--- a/conf.c
+++ b/conf.c
@@ -923,6 +923,7 @@ pasta_opts:
 	info(   "  --no-netns-quit	Don't quit if filesystem-bound target");
 	info(   "  			network namespace is deleted");
 	info(   "  --config-net		Configure tap interface in namespace");
+	info(   "  --no-copy-routes	Don't copy all routes to namespace");
 	info(   "  --ns-mac-addr ADDR	Set MAC address on tap interface");
 
 	exit(EXIT_FAILURE);
@@ -1198,6 +1199,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"outbound-if4", required_argument,	NULL,		15 },
 		{"outbound-if6", required_argument,	NULL,		16 },
 		{"config-net",	no_argument,		NULL,		17 },
+		{"no-copy-routes", no_argument,		NULL,		18 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -1362,6 +1364,12 @@ void conf(struct ctx *c, int argc, char **argv)
 
 			c->pasta_conf_ns = 1;
 			break;
+		case 18:
+			if (c->mode != MODE_PASTA)
+				die("--no-copy-routes is for pasta mode only");
+
+			c->no_copy_routes = 1;
+			break;
 		case 'd':
 			if (c->debug)
 				die("Multiple --debug options given");
@@ -1510,6 +1518,9 @@ void conf(struct ctx *c, int argc, char **argv)
 			}
 			break;
 		case 'g':
+			if (c->mode == MODE_PASTA)
+				c->no_copy_routes = 1;
+
 			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)		&&
 			    inet_pton(AF_INET6, optarg, &c->ip6.gw)	&&
 			    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)	&&
@@ -1644,6 +1655,9 @@ void conf(struct ctx *c, int argc, char **argv)
 	if (*c->sock_path && c->fd_tap >= 0)
 		die("Options --socket and --fd are mutually exclusive");
 
+	if (c->mode == MODE_PASTA && c->no_copy_routes && !c->pasta_conf_ns)
+		die("Option --no-copy-routes needs --config-net");
+
 	if (!ifi4 && *c->ip4.ifname_out)
 		ifi4 = if_nametoindex(c->ip4.ifname_out);
 
diff --git a/passt.1 b/passt.1
index 19e01d5..10c96ae 100644
--- a/passt.1
+++ b/passt.1
@@ -546,6 +546,16 @@ NAME are given as target), do not exit once the network namespace is deleted.
 Configure networking in the namespace: set up addresses and routes as configured
 or sourced from the host, and bring up the tap interface.
 
+.TP
+.BR \-\-no-copy-routes
+With \-\-config-net, do not copy all the routes associated to the interface we
+derive addresses and routes from: set up only the default gateway. Implied by
+-g, \-\-gateway.
+
+Default is to copy all the routing entries from the interface in the outer
+namespace to the target namespace, translating the output interface attribute to
+the outbound interface in the namespace.
+
 .TP
 .BR \-\-ns-mac-addr " " \fIaddr
 Configure MAC address \fIaddr\fR on the tap interface in the namespace.
diff --git a/passt.h b/passt.h
index 73fe808..d314596 100644
--- a/passt.h
+++ b/passt.h
@@ -181,7 +181,8 @@ struct ip6_ctx {
  * @ip6:		IPv6 configuration
  * @pasta_ifn:		Name of namespace interface for pasta
  * @pasta_ifn:		Index of namespace interface for pasta
- * @pasta_conf_ns:	Configure namespace interface after creating it
+ * @pasta_conf_ns:	Configure namespace after creating it
+ * @no_copy_routes:	Don't copy all routes when configuring target namespace
  * @no_tcp:		Disable TCP operation
  * @tcp:		Context for TCP protocol handler
  * @no_tcp:		Disable UDP operation
@@ -240,6 +241,7 @@ struct ctx {
 	char pasta_ifn[IF_NAMESIZE];
 	unsigned int pasta_ifi;
 	int pasta_conf_ns;
+	int no_copy_routes;
 
 	int no_tcp;
 	struct tcp_ctx tcp;
diff --git a/pasta.c b/pasta.c
index 01109f5..b546c93 100644
--- a/pasta.c
+++ b/pasta.c
@@ -273,12 +273,14 @@ void pasta_ns_conf(struct ctx *c)
 	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
 
 	if (c->pasta_conf_ns) {
+		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
+
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
 			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
 				&c->ip4.prefix_len, NULL);
-			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
 		}
 
@@ -286,7 +288,7 @@ void pasta_ns_conf(struct ctx *c)
 			int prefix_len = 64;
 			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
 				&prefix_len, NULL);
-			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
 	} else {
-- 
@@ -273,12 +273,14 @@ void pasta_ns_conf(struct ctx *c)
 	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
 
 	if (c->pasta_conf_ns) {
+		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
+
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
 			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
 				&c->ip4.prefix_len, NULL);
-			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
 		}
 
@@ -286,7 +288,7 @@ void pasta_ns_conf(struct ctx *c)
 			int prefix_len = 64;
 			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
 				&prefix_len, NULL);
-			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
 	} else {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 06/10] Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway"
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (4 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway Stefano Brivio
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

This reverts commit 7656a6f8888237b9e23d63666e921528b6aaf950: now, by
default, we copy all the routes associated to the outbound interface
into the routing table of the container, so there's no need for this
horrible workaround anymore.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/conf.c b/conf.c
index 7541261..1392da5 100644
--- a/conf.c
+++ b/conf.c
@@ -634,9 +634,6 @@ static int conf_ip4_prefix(const char *arg)
 static unsigned int conf_ip4(unsigned int ifi,
 			     struct ip4_ctx *ip4, unsigned char *mac)
 {
-	in_addr_t addr, gw;
-	int shift;
-
 	if (!ifi)
 		ifi = nl_get_ext_if(AF_INET);
 
@@ -651,10 +648,8 @@ static unsigned int conf_ip4(unsigned int ifi,
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
 
-	addr = ntohl(ip4->addr.s_addr);
-	gw = ntohl(ip4->gw.s_addr);
-
 	if (!ip4->prefix_len) {
+		in_addr_t addr = ntohl(ip4->addr.s_addr);
 		if (IN_CLASSA(addr))
 			ip4->prefix_len = (32 - IN_CLASSA_NSHIFT);
 		else if (IN_CLASSB(addr))
@@ -665,24 +660,6 @@ static unsigned int conf_ip4(unsigned int ifi,
 			ip4->prefix_len = 32;
 	}
 
-	/* We might get an address with a netmask that makes the default
-	 * gateway unreachable, and in that case we would fail to configure
-	 * the default route, with --config-net, or presumably a DHCP client
-	 * in the guest or container would face the same issue.
-	 *
-	 * The host might have another route, to the default gateway itself,
-	 * fixing the situation, but we only read default routes.
-	 *
-	 * Fix up the mask to allow reaching the default gateway from our
-	 * configured address, if needed, and only if we find a non-zero
-	 * mask that makes the gateway reachable.
-	 */
-	shift = 32 - ip4->prefix_len;
-	while (shift < 32 && addr >> shift != gw >> shift)
-		shift++;
-	if (shift < 32)
-		ip4->prefix_len = 32 - shift;
-
 	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
 
 	if (MAC_IS_ZERO(mac))
-- 
@@ -634,9 +634,6 @@ static int conf_ip4_prefix(const char *arg)
 static unsigned int conf_ip4(unsigned int ifi,
 			     struct ip4_ctx *ip4, unsigned char *mac)
 {
-	in_addr_t addr, gw;
-	int shift;
-
 	if (!ifi)
 		ifi = nl_get_ext_if(AF_INET);
 
@@ -651,10 +648,8 @@ static unsigned int conf_ip4(unsigned int ifi,
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
 
-	addr = ntohl(ip4->addr.s_addr);
-	gw = ntohl(ip4->gw.s_addr);
-
 	if (!ip4->prefix_len) {
+		in_addr_t addr = ntohl(ip4->addr.s_addr);
 		if (IN_CLASSA(addr))
 			ip4->prefix_len = (32 - IN_CLASSA_NSHIFT);
 		else if (IN_CLASSB(addr))
@@ -665,24 +660,6 @@ static unsigned int conf_ip4(unsigned int ifi,
 			ip4->prefix_len = 32;
 	}
 
-	/* We might get an address with a netmask that makes the default
-	 * gateway unreachable, and in that case we would fail to configure
-	 * the default route, with --config-net, or presumably a DHCP client
-	 * in the guest or container would face the same issue.
-	 *
-	 * The host might have another route, to the default gateway itself,
-	 * fixing the situation, but we only read default routes.
-	 *
-	 * Fix up the mask to allow reaching the default gateway from our
-	 * configured address, if needed, and only if we find a non-zero
-	 * mask that makes the gateway reachable.
-	 */
-	shift = 32 - ip4->prefix_len;
-	while (shift < 32 && addr >> shift != gw >> shift)
-		shift++;
-	if (shift < 32)
-		ip4->prefix_len = 32 - shift;
-
 	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
 
 	if (MAC_IS_ZERO(mac))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (5 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 06/10] Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway" Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-22  8:48   ` David Gibson
  2023-05-21 23:42 ` [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace Stefano Brivio
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

If we use a template interface without a gateway on the default
route, we can still offer almost complete functionality, except that,
of course, we can't map the gateway address to the outer namespace or
host, and that we have no obvious server address or identifier for
use in DHCP's siaddr and option 54 (Server identifier, mandatory).

Continue, if we have a default route but no default gateway, and
imply --no-map-gw and --no-dhcp in that case. NDP responder and
DHCPv6 should be able to work as usual because we require a
link-local address to be present, and we'll fall back to that.

Together with the previous commits implementing an actual copy of
routes from the outer namespace, this should finally fix the
operation of 'pasta --config-net' for cases where we have a default
route on the host, but no default gateway, as it's the case for
tap-style routes, including typical Wireguard endpoints.

Reported-by: me@yawnt.com
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c  | 10 +++++++---
 passt.1 |  6 ++++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/conf.c b/conf.c
index 1392da5..c07b697 100644
--- a/conf.c
+++ b/conf.c
@@ -665,8 +665,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 	if (MAC_IS_ZERO(mac))
 		nl_link(0, ifi, mac, 0, 0);
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw) ||
-	    IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
 	    MAC_IS_ZERO(mac))
 		return 0;
 
@@ -708,7 +707,6 @@ static unsigned int conf_ip6(unsigned int ifi,
 		nl_link(0, ifi, mac, 0, 0);
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw) ||
-	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
 	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
 	    MAC_IS_ZERO(mac))
 		return 0;
@@ -1658,6 +1656,12 @@ void conf(struct ctx *c, int argc, char **argv)
 	    (*c->ip6.ifname_out && !c->ifi6))
 		die("External interface not usable");
 
+	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
+		c->no_map_gw = c->no_dhcp = 1;
+
+	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
+		c->no_map_gw = 1;
+
 	/* Inbound port options can be parsed now (after IPv4/IPv6 settings) */
 	optind = 1;
 	do {
diff --git a/passt.1 b/passt.1
index 10c96ae..f965c34 100644
--- a/passt.1
+++ b/passt.1
@@ -281,7 +281,8 @@ guest or target namespace will be silently dropped.
 .TP
 .BR \-\-no-dhcp
 Disable the DHCP server. DHCP client requests coming from guest or target
-namespace will be silently dropped.
+namespace will be silently dropped. Implied if there is no gateway on the
+selected IPv4 default route.
 
 .TP
 .BR \-\-no-ndp
@@ -301,7 +302,8 @@ namespace will be ignored.
 .TP
 .BR \-\-no-map-gw
 Don't remap TCP connections and untracked UDP traffic, with the gateway address
-as destination, to the host.
+as destination, to the host. Implied if there is no gateway on the selected
+default route for any of the enabled address families.
 
 .TP
 .BR \-4 ", " \-\-ipv4-only
-- 
@@ -281,7 +281,8 @@ guest or target namespace will be silently dropped.
 .TP
 .BR \-\-no-dhcp
 Disable the DHCP server. DHCP client requests coming from guest or target
-namespace will be silently dropped.
+namespace will be silently dropped. Implied if there is no gateway on the
+selected IPv4 default route.
 
 .TP
 .BR \-\-no-ndp
@@ -301,7 +302,8 @@ namespace will be ignored.
 .TP
 .BR \-\-no-map-gw
 Don't remap TCP connections and untracked UDP traffic, with the gateway address
-as destination, to the host.
+as destination, to the host. Implied if there is no gateway on the selected
+default route for any of the enabled address families.
 
 .TP
 .BR \-4 ", " \-\-ipv4-only
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (6 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-22  8:51   ` David Gibson
  2023-05-21 23:42 ` [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default Stefano Brivio
  2023-05-21 23:42 ` [PATCH v2 10/10] passt.h: Fix description of pasta_ifi in struct ctx Stefano Brivio
  9 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Similarly to what we've just done with routes, support NL_DUP for
addresses (currently not exposed): nl_addr() can optionally copy
mulitple addresses to the target namespace, by fixing up data from
the dump with appropriate flags and interface index, and repeating
it back to the kernel on the socket opened in the target namespace.

Link-local addresses are not copied: the family is set to AF_UNSPEC,
which means the kernel will ignore them. Same for addresses from a
mismatching address (pre-4.19 kernels without support for
NETLINK_GET_STRICT_CHK).

Ignore IFA_LABEL attributes by changing their type to IFA_UNSPEC,
because in general they will report mismatching names, and we don't
really need to use labels as we already know the interface index.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c    |  8 ++++---
 netlink.c | 62 +++++++++++++++++++++++++++++++++++++++++--------------
 netlink.h |  4 ++--
 pasta.c   |  8 +++----
 4 files changed, 58 insertions(+), 24 deletions(-)

diff --git a/conf.c b/conf.c
index c07b697..1ffd05c 100644
--- a/conf.c
+++ b/conf.c
@@ -645,8 +645,10 @@ static unsigned int conf_ip4(unsigned int ifi,
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
 		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
-		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
+		nl_addr(NL_GET, ifi, 0, AF_INET,
+			&ip4->addr, &ip4->prefix_len, NULL);
+	}
 
 	if (!ip4->prefix_len) {
 		in_addr_t addr = ntohl(ip4->addr.s_addr);
@@ -696,7 +698,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
 		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
 
-	nl_addr(0, ifi, AF_INET6,
+	nl_addr(NL_GET, ifi, 0, AF_INET6,
 		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
 		&prefix_len, &ip6->addr_ll);
 
diff --git a/netlink.c b/netlink.c
index d93ecda..bc5b2bf 100644
--- a/netlink.c
+++ b/netlink.c
@@ -334,19 +334,18 @@ next:
 }
 
 /**
- * nl_addr() - Get/set IP addresses
- * @ns:		Use netlink socket in namespace
- * @ifi:	Interface index
+ * nl_addr() - Get/set/copy IP addresses for given interface and address family
+ * @op:		Requested operation
+ * @ifi:	Interface index in outer network namespace
+ * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
  * @af:		Address family
- * @addr:	Global address to fill if zero, to set if not, ignored if NULL
+ * @addr:	Global address to fill on NL_GET, to set on NL_SET
  * @prefix_len:	Mask or prefix length, set or fetched (for IPv4)
- * @addr_l:	Link-scoped address to fill, NULL if not requested
+ * @addr_l:	Link-scoped address to fill on NL_GET
  */
-void nl_addr(int ns, unsigned int ifi, sa_family_t af,
-	     void *addr, int *prefix_len, void *addr_l)
+void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	     sa_family_t af, void *addr, int *prefix_len, void *addr_l)
 {
-	int set = addr && ((af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(addr)) ||
-			   (af == AF_INET && *(uint32_t *)addr));
 	struct req_t {
 		struct nlmsghdr nlh;
 		struct ifaddrmsg ifa;
@@ -365,23 +364,23 @@ void nl_addr(int ns, unsigned int ifi, sa_family_t af,
 			} a6;
 		} set;
 	} req = {
-		.nlh.nlmsg_type    = set ? RTM_NEWADDR : RTM_GETADDR,
+		.nlh.nlmsg_type    = op == NL_SET ? RTM_NEWADDR : RTM_GETADDR,
 		.nlh.nlmsg_flags   = NLM_F_REQUEST,
 		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
 		.nlh.nlmsg_seq     = nl_seq++,
 
 		.ifa.ifa_family    = af,
 		.ifa.ifa_index     = ifi,
-		.ifa.ifa_prefixlen = *prefix_len,
+		.ifa.ifa_prefixlen = op == NL_SET ? *prefix_len : 0,
 	};
+	ssize_t n, nlmsgs_size;
 	struct ifaddrmsg *ifa;
 	struct nlmsghdr *nh;
 	struct rtattr *rta;
 	char buf[NLBUFSIZ];
-	ssize_t n;
 	size_t na;
 
-	if (set) {
+	if (op == NL_SET) {
 		if (af == AF_INET6) {
 			size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
 
@@ -416,21 +415,47 @@ void nl_addr(int ns, unsigned int ifi, sa_family_t af,
 		req.nlh.nlmsg_flags |= NLM_F_DUMP;
 	}
 
-	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
+	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
+		return;
+
+	if (op == NL_SET)
 		return;
 
 	nh = (struct nlmsghdr *)buf;
+	nlmsgs_size = n;
+
 	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
 		if (nh->nlmsg_type != RTM_NEWADDR)
 			goto next;
 
+		if (op == NL_DUP) {
+			nh->nlmsg_seq = nl_seq++;
+			nh->nlmsg_pid = 0;
+			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
+			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
+					   NLM_F_CREATE;
+		}
+
 		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
+
+		if (op == NL_DUP && (ifa->ifa_scope == RT_SCOPE_LINK ||
+				     ifa->ifa_index != ifi)) {
+			ifa->ifa_family = AF_UNSPEC;
+			goto next;
+		}
+
 		if (ifa->ifa_index != ifi)
 			goto next;
 
+		if (op == NL_DUP)
+			ifa->ifa_index = ifi_ns;
+
 		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
 		     rta = RTA_NEXT(rta, na)) {
-			if (rta->rta_type != IFA_ADDRESS)
+			if (op == NL_DUP && rta->rta_type == IFA_LABEL)
+				rta->rta_type = IFA_UNSPEC;
+
+			if (op == NL_DUP || rta->rta_type != IFA_ADDRESS)
 				continue;
 
 			if (af == AF_INET && addr && !*(uint32_t *)addr) {
@@ -451,6 +476,13 @@ next:
 		if (nh->nlmsg_type == NLMSG_DONE)
 			break;
 	}
+
+	if (op == NL_DUP) {
+		char resp[NLBUFSIZ];
+
+		nh = (struct nlmsghdr *)buf;
+		nl_req(1, resp, nh, nlmsgs_size);
+	}
 }
 
 /**
diff --git a/netlink.h b/netlink.h
index 217cf1e..cd0e666 100644
--- a/netlink.h
+++ b/netlink.h
@@ -16,8 +16,8 @@ void nl_sock_init(const struct ctx *c, bool ns);
 unsigned int nl_get_ext_if(sa_family_t af);
 void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
 	      sa_family_t af, void *gw);
-void nl_addr(int ns, unsigned int ifi, sa_family_t af,
-	     void *addr, int *prefix_len, void *addr_l);
+void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
+	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
 void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index b546c93..99ef3fc 100644
--- a/pasta.c
+++ b/pasta.c
@@ -278,16 +278,16 @@ void pasta_ns_conf(struct ctx *c)
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
-			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
-				&c->ip4.prefix_len, NULL);
+			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+				&c->ip4.addr, &c->ip4.prefix_len, NULL);
 			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
 			int prefix_len = 64;
-			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
-				&prefix_len, NULL);
+			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+				&c->ip6.addr, &prefix_len, NULL);
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
-- 
@@ -278,16 +278,16 @@ void pasta_ns_conf(struct ctx *c)
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
-			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
-				&c->ip4.prefix_len, NULL);
+			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+				&c->ip4.addr, &c->ip4.prefix_len, NULL);
 			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
 		}
 
 		if (c->ifi6) {
 			int prefix_len = 64;
-			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
-				&prefix_len, NULL);
+			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+				&c->ip6.addr, &prefix_len, NULL);
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
 		}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (7 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  2023-05-22  8:53   ` David Gibson
  2023-05-21 23:42 ` [PATCH v2 10/10] passt.h: Fix description of pasta_ifi in struct ctx Stefano Brivio
  9 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Use the newly-introduced NL_DUP mode for nl_addr() to copy all the
addresses associated to the template interface in the outer
namespace, unless --no-copy-addrs (also implied by -a) is given.

This is done mostly for consistency with routes. It might partially
cover the issue at:
  https://bugs.passt.top/show_bug.cgi?id=47
  Support multiple addresses per address family

for some use cases, but not the originally intended one: we'll still
use a single outbound address (unless the routing table specifies
different preferred source addresses depending on the destination),
regardless of the address used in the target namespace.

Link: https://bugs.passt.top/show_bug.cgi?id=47
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 conf.c  | 16 ++++++++++++++--
 passt.1 |  9 +++++++++
 passt.h |  2 ++
 pasta.c |  5 +++--
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/conf.c b/conf.c
index 1ffd05c..e6c68e2 100644
--- a/conf.c
+++ b/conf.c
@@ -901,6 +901,7 @@ pasta_opts:
 	info(   "  			network namespace is deleted");
 	info(   "  --config-net		Configure tap interface in namespace");
 	info(   "  --no-copy-routes	Don't copy all routes to namespace");
+	info(   "  --no-copy-addrs	Don't copy all addresses to namespace");
 	info(   "  --ns-mac-addr ADDR	Set MAC address on tap interface");
 
 	exit(EXIT_FAILURE);
@@ -1177,6 +1178,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"outbound-if6", required_argument,	NULL,		16 },
 		{"config-net",	no_argument,		NULL,		17 },
 		{"no-copy-routes", no_argument,		NULL,		18 },
+		{"no-copy-addrs", no_argument,		NULL,		19 },
 		{ 0 },
 	};
 	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
@@ -1347,6 +1349,12 @@ void conf(struct ctx *c, int argc, char **argv)
 
 			c->no_copy_routes = 1;
 			break;
+		case 19:
+			if (c->mode != MODE_PASTA)
+				die("--no-copy-addrs is for pasta mode only");
+
+			c->no_copy_addrs = 1;
+			break;
 		case 'd':
 			if (c->debug)
 				die("Multiple --debug options given");
@@ -1632,8 +1640,12 @@ void conf(struct ctx *c, int argc, char **argv)
 	if (*c->sock_path && c->fd_tap >= 0)
 		die("Options --socket and --fd are mutually exclusive");
 
-	if (c->mode == MODE_PASTA && c->no_copy_routes && !c->pasta_conf_ns)
-		die("Option --no-copy-routes needs --config-net");
+	if (c->mode == MODE_PASTA && !c->pasta_conf_ns) {
+		if (c->no_copy_routes)
+			die("Option --no-copy-routes needs --config-net");
+		if (c->no_copy_addrs)
+			die("Option --no-copy-addrs needs --config-net");
+	}
 
 	if (!ifi4 && *c->ip4.ifname_out)
 		ifi4 = if_nametoindex(c->ip4.ifname_out);
diff --git a/passt.1 b/passt.1
index f965c34..87b076d 100644
--- a/passt.1
+++ b/passt.1
@@ -558,6 +558,15 @@ Default is to copy all the routing entries from the interface in the outer
 namespace to the target namespace, translating the output interface attribute to
 the outbound interface in the namespace.
 
+.TP
+.BR \-\-no-copy-addrs
+With \-\-config-net, do not copy all the addresses associated to the interface
+we derive addresses and routes from: set up a single one. Implied by \-a,
+\-\-address.
+
+Default is to copy all the addresses, except for link-local ones, from the
+interface from the outer namespace to the target namespace.
+
 .TP
 .BR \-\-ns-mac-addr " " \fIaddr
 Configure MAC address \fIaddr\fR on the tap interface in the namespace.
diff --git a/passt.h b/passt.h
index d314596..b51a1e5 100644
--- a/passt.h
+++ b/passt.h
@@ -183,6 +183,7 @@ struct ip6_ctx {
  * @pasta_ifn:		Index of namespace interface for pasta
  * @pasta_conf_ns:	Configure namespace after creating it
  * @no_copy_routes:	Don't copy all routes when configuring target namespace
+ * @no_copy_addrs:	Don't copy all addresses when configuring namespace
  * @no_tcp:		Disable TCP operation
  * @tcp:		Context for TCP protocol handler
  * @no_tcp:		Disable UDP operation
@@ -242,6 +243,7 @@ struct ctx {
 	unsigned int pasta_ifi;
 	int pasta_conf_ns;
 	int no_copy_routes;
+	int no_copy_addrs;
 
 	int no_tcp;
 	struct tcp_ctx tcp;
diff --git a/pasta.c b/pasta.c
index 99ef3fc..4054e9a 100644
--- a/pasta.c
+++ b/pasta.c
@@ -274,11 +274,12 @@ void pasta_ns_conf(struct ctx *c)
 
 	if (c->pasta_conf_ns) {
 		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
+		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
 
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
-			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
 				&c->ip4.addr, &c->ip4.prefix_len, NULL);
 			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
@@ -286,7 +287,7 @@ void pasta_ns_conf(struct ctx *c)
 
 		if (c->ifi6) {
 			int prefix_len = 64;
-			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
 				&c->ip6.addr, &prefix_len, NULL);
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
-- 
@@ -274,11 +274,12 @@ void pasta_ns_conf(struct ctx *c)
 
 	if (c->pasta_conf_ns) {
 		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
+		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
 
 		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
 
 		if (c->ifi4) {
-			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
+			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
 				&c->ip4.addr, &c->ip4.prefix_len, NULL);
 			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
 				 &c->ip4.gw);
@@ -286,7 +287,7 @@ void pasta_ns_conf(struct ctx *c)
 
 		if (c->ifi6) {
 			int prefix_len = 64;
-			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
+			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
 				&c->ip6.addr, &prefix_len, NULL);
 			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
 				 &c->ip6.gw);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 10/10] passt.h: Fix description of pasta_ifi in struct ctx
  2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
                   ` (8 preceding siblings ...)
  2023-05-21 23:42 ` [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default Stefano Brivio
@ 2023-05-21 23:42 ` Stefano Brivio
  9 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-21 23:42 UTC (permalink / raw)
  To: passt-dev; +Cc: Callum Parsey, me, David Gibson, lemmi, Andrea Arcangeli

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 passt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/passt.h b/passt.h
index b51a1e5..96fd27b 100644
--- a/passt.h
+++ b/passt.h
@@ -180,7 +180,7 @@ struct ip6_ctx {
  * @ifi6:		Index of template interface for IPv6, 0 if IPv6 disabled
  * @ip6:		IPv6 configuration
  * @pasta_ifn:		Name of namespace interface for pasta
- * @pasta_ifn:		Index of namespace interface for pasta
+ * @pasta_ifi:		Index of namespace interface for pasta
  * @pasta_conf_ns:	Configure namespace after creating it
  * @no_copy_routes:	Don't copy all routes when configuring target namespace
  * @no_copy_addrs:	Don't copy all addresses when configuring namespace
-- 
@@ -180,7 +180,7 @@ struct ip6_ctx {
  * @ifi6:		Index of template interface for IPv6, 0 if IPv6 disabled
  * @ip6:		IPv6 configuration
  * @pasta_ifn:		Name of namespace interface for pasta
- * @pasta_ifn:		Index of namespace interface for pasta
+ * @pasta_ifi:		Index of namespace interface for pasta
  * @pasta_conf_ns:	Configure namespace after creating it
  * @no_copy_routes:	Don't copy all routes when configuring target namespace
  * @no_copy_addrs:	Don't copy all addresses when configuring namespace
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-21 23:42 ` [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace Stefano Brivio
@ 2023-05-22  8:42   ` David Gibson
  2023-05-22  9:58     ` Stefano Brivio
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-05-22  8:42 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 9164 bytes --]

On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> Instead of just fetching the default gateway and configuring a single
> equivalent route in the target namespace, on 'pasta --config-net', it
> might be desirable in some cases to copy the whole set of routes
> corresponding to a given output interface.
> 
> For instance, in:
>   https://github.com/containers/podman/issues/18539
>   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> 
> configuring the default gateway won't work without a gateway-less
> route (specifying the output interface only), because the default
> gateway is, somewhat dubiously, not on the same subnet as the
> container.
> 
> This is a similar case to the one covered by commit 7656a6f88882
> ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> gateway"), and I'm not exactly proud of that workaround.
> 
> We also have:
>   https://bugs.passt.top/show_bug.cgi?id=49
>   pasta does not work with tap-style interface
> 
> for which, eventually, we should be able to configure a gateway-less
> route in the target namespace.
> 
> Introduce different operation modes for nl_route(), including a new
> NL_DUP one, not exposed yet, which simply parrots back to the kernel
> the route dump for a given interface from the outer namespace, fixing
> up flags and interface indices on the way, and requesting to add the
> same routes in the target namespace, on the interface we manage.
> 
> For n routes we want to duplicate, send n identical netlink requests
> including the full dump: routes might depend on each other and the
> kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> and repeating the full dump naturally resolves dependencies without
> the need to actually calculate them.
> 
> I'm not kidding, it actually works pretty well.

If there's a way to detect whether the kernel rejected some of the
routes, it would be nice to cut that loop short as soon as all the
routes are inserted.  Obviously that could be a followup improvement,
though.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> 
> Link: https://github.com/containers/podman/issues/18539
> Link: https://bugs.passt.top/show_bug.cgi?id=49
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  conf.c    |  4 ++--
>  netlink.c | 71 ++++++++++++++++++++++++++++++++++++++++++-------------
>  netlink.h |  9 ++++++-
>  pasta.c   |  6 +++--
>  4 files changed, 68 insertions(+), 22 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 984c3ce..1f6bbef 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -646,7 +646,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	}
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
> -		nl_route(0, ifi, AF_INET, &ip4->gw);
> +		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
>  
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
>  		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> @@ -718,7 +718,7 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	}
>  
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
> -		nl_route(0, ifi, AF_INET6, &ip6->gw);
> +		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
>  
>  	nl_addr(0, ifi, AF_INET6,
>  		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
> diff --git a/netlink.c b/netlink.c
> index c07a13c..d93ecda 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -185,16 +185,16 @@ unsigned int nl_get_ext_if(sa_family_t af)
>  }
>  
>  /**
> - * nl_route() - Get/set default gateway for given interface and address family
> - * @ns:		Use netlink socket in namespace
> - * @ifi:	Interface index
> + * nl_route() - Get/set/copy routes for given interface and address family
> + * @op:		Requested operation
> + * @ifi:	Interface index in outer network namespace
> + * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
>   * @af:		Address family
> - * @gw:		Default gateway to fill if zero, to set if not
> + * @gw:		Default gateway to fill on NL_GET, to set on NL_SET
>   */
> -void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
> +void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	      sa_family_t af, void *gw)
>  {
> -	int set = (af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(gw)) ||
> -		  (af == AF_INET && *(uint32_t *)gw);
>  	struct req_t {
>  		struct nlmsghdr nlh;
>  		struct rtmsg rtm;
> @@ -215,7 +215,7 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  			} r4;
>  		} set;
>  	} req = {
> -		.nlh.nlmsg_type	  = set ? RTM_NEWROUTE : RTM_GETROUTE,
> +		.nlh.nlmsg_type	  = op == NL_SET ? RTM_NEWROUTE : RTM_GETROUTE,
>  		.nlh.nlmsg_flags  = NLM_F_REQUEST,
>  		.nlh.nlmsg_seq	  = nl_seq++,
>  
> @@ -228,14 +228,15 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
>  		.ifi		  = ifi,
>  	};
> +	unsigned dup_routes = 0;
> +	ssize_t n, nlmsgs_size;
>  	struct nlmsghdr *nh;
>  	struct rtattr *rta;
> -	struct rtmsg *rtm;
>  	char buf[NLBUFSIZ];
> -	ssize_t n;
> +	struct rtmsg *rtm;
>  	size_t na;
>  
> -	if (set) {
> +	if (op == NL_SET) {
>  		if (af == AF_INET6) {
>  			size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
>  
> @@ -269,31 +270,67 @@ void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw)
>  		req.nlh.nlmsg_flags |= NLM_F_DUMP;
>  	}
>  
> -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
> +	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> +		return;
> +
> +	if (op == NL_SET)
>  		return;
>  
>  	nh = (struct nlmsghdr *)buf;
> +	nlmsgs_size = n;
> +
>  	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
>  		if (nh->nlmsg_type != RTM_NEWROUTE)
>  			goto next;
>  
> +		if (op == NL_DUP) {
> +			nh->nlmsg_seq = nl_seq++;
> +			nh->nlmsg_pid = 0;
> +			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> +			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> +					   NLM_F_CREATE;
> +			dup_routes++;
> +		}
> +
>  		rtm = (struct rtmsg *)NLMSG_DATA(nh);
> -		if (rtm->rtm_dst_len)
> +		if (op == NL_GET && rtm->rtm_dst_len)
>  			continue;
>  
>  		for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
> -			if (rta->rta_type != RTA_GATEWAY)
> -				continue;
> +			if (op == NL_GET) {
> +				if (rta->rta_type != RTA_GATEWAY)
> +					continue;
>  
> -			memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> -			return;
> +				memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
> +				return;
> +			}
> +
> +			if (op == NL_DUP && rta->rta_type == RTA_OIF)
> +				*(unsigned int *)RTA_DATA(rta) = ifi_ns;
>  		}
>  
>  next:
>  		if (nh->nlmsg_type == NLMSG_DONE)
>  			break;
>  	}
> +
> +	if (op == NL_DUP) {
> +		char resp[NLBUFSIZ];
> +		unsigned i;
> +
> +		nh = (struct nlmsghdr *)buf;
> +		/* Routes might have dependencies between each other, and the
> +		 * kernel processes RTM_NEWROUTE messages sequentially. For n
> +		 * valid routes, we might need to send up to n requests to get
> +		 * all of them inserted. Routes that have been already inserted
> +		 * won't cause the whole request to fail, so we can simply
> +		 * repeat the whole request. This approach avoids the need to
> +		 * calculate dependencies: let the kernel do that.
> +		 */
> +		for (i = 0; i < dup_routes; i++)
> +			nl_req(1, resp, nh, nlmsgs_size);
> +	}
>  }
>  
>  /**
> diff --git a/netlink.h b/netlink.h
> index ca4d6ef..217cf1e 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -6,9 +6,16 @@
>  #ifndef NETLINK_H
>  #define NETLINK_H
>  
> +enum nl_op {
> +	NL_GET,
> +	NL_SET,
> +	NL_DUP,
> +};
> +
>  void nl_sock_init(const struct ctx *c, bool ns);
>  unsigned int nl_get_ext_if(sa_family_t af);
> -void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw);
> +void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	      sa_family_t af, void *gw);
>  void nl_addr(int ns, unsigned int ifi, sa_family_t af,
>  	     void *addr, int *prefix_len, void *addr_l);
>  void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
> diff --git a/pasta.c b/pasta.c
> index 2a6fb60..01109f5 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -278,14 +278,16 @@ void pasta_ns_conf(struct ctx *c)
>  		if (c->ifi4) {
>  			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
>  				&c->ip4.prefix_len, NULL);
> -			nl_route(1, c->pasta_ifi, AF_INET, &c->ip4.gw);
> +			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
> +				 &c->ip4.gw);
>  		}
>  
>  		if (c->ifi6) {
>  			int prefix_len = 64;
>  			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
>  				&prefix_len, NULL);
> -			nl_route(1, c->pasta_ifi, AF_INET6, &c->ip6.gw);
> +			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
> +				 &c->ip6.gw);
>  		}
>  	} else {
>  		nl_link(1, c->pasta_ifi, c->mac_guest, 0, 0);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default
  2023-05-21 23:42 ` [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default Stefano Brivio
@ 2023-05-22  8:44   ` David Gibson
  2023-05-22  9:59     ` Stefano Brivio
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-05-22  8:44 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 5949 bytes --]

On Mon, May 22, 2023 at 01:42:19AM +0200, Stefano Brivio wrote:
> Use the newly-introduced NL_DUP mode for nl_route() to copy all the
> routes associated to the template interface in the outer namespace,
> unless --no-copy-routes (also implied by -g) is given.
> 
> Otherwise, we can't use default gateways which are not, address-wise,
> on the same subnet as the container, as reported by Callum.
> 
> Reported-by: Callum Parsey <callum@neoninteger.au>
> Link: https://github.com/containers/podman/issues/18539
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear>

The logic looks sound, although I do have one concern noted below.

> ---
>  conf.c  | 14 ++++++++++++++
>  passt.1 | 10 ++++++++++
>  passt.h |  4 +++-
>  pasta.c |  6 ++++--
>  4 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 3ee6ae0..7541261 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -923,6 +923,7 @@ pasta_opts:
>  	info(   "  --no-netns-quit	Don't quit if filesystem-bound target");
>  	info(   "  			network namespace is deleted");
>  	info(   "  --config-net		Configure tap interface in namespace");
> +	info(   "  --no-copy-routes	Don't copy all routes to namespace");

I'm always a bit nervous about adding new options, since it's
something we then have to maintain compatibility for.  Do we have a
confirmed use case where the copy routes behaviour will cause trouble?

>  	info(   "  --ns-mac-addr ADDR	Set MAC address on tap interface");
>  
>  	exit(EXIT_FAILURE);
> @@ -1198,6 +1199,7 @@ void conf(struct ctx *c, int argc, char **argv)
>  		{"outbound-if4", required_argument,	NULL,		15 },
>  		{"outbound-if6", required_argument,	NULL,		16 },
>  		{"config-net",	no_argument,		NULL,		17 },
> +		{"no-copy-routes", no_argument,		NULL,		18 },
>  		{ 0 },
>  	};
>  	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
> @@ -1362,6 +1364,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  
>  			c->pasta_conf_ns = 1;
>  			break;
> +		case 18:
> +			if (c->mode != MODE_PASTA)
> +				die("--no-copy-routes is for pasta mode only");
> +
> +			c->no_copy_routes = 1;
> +			break;
>  		case 'd':
>  			if (c->debug)
>  				die("Multiple --debug options given");
> @@ -1510,6 +1518,9 @@ void conf(struct ctx *c, int argc, char **argv)
>  			}
>  			break;
>  		case 'g':
> +			if (c->mode == MODE_PASTA)
> +				c->no_copy_routes = 1;
> +
>  			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)		&&
>  			    inet_pton(AF_INET6, optarg, &c->ip6.gw)	&&
>  			    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)	&&
> @@ -1644,6 +1655,9 @@ void conf(struct ctx *c, int argc, char **argv)
>  	if (*c->sock_path && c->fd_tap >= 0)
>  		die("Options --socket and --fd are mutually exclusive");
>  
> +	if (c->mode == MODE_PASTA && c->no_copy_routes && !c->pasta_conf_ns)
> +		die("Option --no-copy-routes needs --config-net");
> +
>  	if (!ifi4 && *c->ip4.ifname_out)
>  		ifi4 = if_nametoindex(c->ip4.ifname_out);
>  
> diff --git a/passt.1 b/passt.1
> index 19e01d5..10c96ae 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -546,6 +546,16 @@ NAME are given as target), do not exit once the network namespace is deleted.
>  Configure networking in the namespace: set up addresses and routes as configured
>  or sourced from the host, and bring up the tap interface.
>  
> +.TP
> +.BR \-\-no-copy-routes
> +With \-\-config-net, do not copy all the routes associated to the interface we
> +derive addresses and routes from: set up only the default gateway. Implied by
> +-g, \-\-gateway.
> +
> +Default is to copy all the routing entries from the interface in the outer
> +namespace to the target namespace, translating the output interface attribute to
> +the outbound interface in the namespace.
> +
>  .TP
>  .BR \-\-ns-mac-addr " " \fIaddr
>  Configure MAC address \fIaddr\fR on the tap interface in the namespace.
> diff --git a/passt.h b/passt.h
> index 73fe808..d314596 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -181,7 +181,8 @@ struct ip6_ctx {
>   * @ip6:		IPv6 configuration
>   * @pasta_ifn:		Name of namespace interface for pasta
>   * @pasta_ifn:		Index of namespace interface for pasta
> - * @pasta_conf_ns:	Configure namespace interface after creating it
> + * @pasta_conf_ns:	Configure namespace after creating it
> + * @no_copy_routes:	Don't copy all routes when configuring target namespace
>   * @no_tcp:		Disable TCP operation
>   * @tcp:		Context for TCP protocol handler
>   * @no_tcp:		Disable UDP operation
> @@ -240,6 +241,7 @@ struct ctx {
>  	char pasta_ifn[IF_NAMESIZE];
>  	unsigned int pasta_ifi;
>  	int pasta_conf_ns;
> +	int no_copy_routes;
>  
>  	int no_tcp;
>  	struct tcp_ctx tcp;
> diff --git a/pasta.c b/pasta.c
> index 01109f5..b546c93 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -273,12 +273,14 @@ void pasta_ns_conf(struct ctx *c)
>  	nl_link(1, 1 /* lo */, MAC_ZERO, 1, 0);
>  
>  	if (c->pasta_conf_ns) {
> +		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
> +
>  		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
>  
>  		if (c->ifi4) {
>  			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
>  				&c->ip4.prefix_len, NULL);
> -			nl_route(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
> +			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
>  				 &c->ip4.gw);
>  		}
>  
> @@ -286,7 +288,7 @@ void pasta_ns_conf(struct ctx *c)
>  			int prefix_len = 64;
>  			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
>  				&prefix_len, NULL);
> -			nl_route(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
> +			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
>  				 &c->ip6.gw);
>  		}
>  	} else {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway
  2023-05-21 23:42 ` [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway Stefano Brivio
@ 2023-05-22  8:48   ` David Gibson
  2023-05-22 15:29     ` Stefano Brivio
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-05-22  8:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 3732 bytes --]

On Mon, May 22, 2023 at 01:42:21AM +0200, Stefano Brivio wrote:
> If we use a template interface without a gateway on the default
> route, we can still offer almost complete functionality, except that,
> of course, we can't map the gateway address to the outer namespace or
> host, and that we have no obvious server address or identifier for
> use in DHCP's siaddr and option 54 (Server identifier, mandatory).
> 
> Continue, if we have a default route but no default gateway, and
> imply --no-map-gw and --no-dhcp in that case. NDP responder and
> DHCPv6 should be able to work as usual because we require a
> link-local address to be present, and we'll fall back to that.

Implying (rather than requiring) --no-map-gw and --no-dhcp does worry
me a bit.  I feel like it might violate the principle of least
surprise.

> Together with the previous commits implementing an actual copy of
> routes from the outer namespace, this should finally fix the
> operation of 'pasta --config-net' for cases where we have a default
> route on the host, but no default gateway, as it's the case for
> tap-style routes, including typical Wireguard endpoints.

Logic looks sound, though, so

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> 
> Reported-by: me@yawnt.com
> Link: https://bugs.passt.top/show_bug.cgi?id=49
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  conf.c  | 10 +++++++---
>  passt.1 |  6 ++++--
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 1392da5..c07b697 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -665,8 +665,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	if (MAC_IS_ZERO(mac))
>  		nl_link(0, ifi, mac, 0, 0);
>  
> -	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw) ||
> -	    IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
> +	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr) ||
>  	    MAC_IS_ZERO(mac))
>  		return 0;
>  
> @@ -708,7 +707,6 @@ static unsigned int conf_ip6(unsigned int ifi,
>  		nl_link(0, ifi, mac, 0, 0);
>  
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw) ||
> -	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
>  	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll) ||
>  	    MAC_IS_ZERO(mac))
>  		return 0;
> @@ -1658,6 +1656,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  	    (*c->ip6.ifname_out && !c->ifi6))
>  		die("External interface not usable");
>  
> +	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
> +		c->no_map_gw = c->no_dhcp = 1;
> +
> +	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
> +		c->no_map_gw = 1;
> +
>  	/* Inbound port options can be parsed now (after IPv4/IPv6 settings) */
>  	optind = 1;
>  	do {
> diff --git a/passt.1 b/passt.1
> index 10c96ae..f965c34 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -281,7 +281,8 @@ guest or target namespace will be silently dropped.
>  .TP
>  .BR \-\-no-dhcp
>  Disable the DHCP server. DHCP client requests coming from guest or target
> -namespace will be silently dropped.
> +namespace will be silently dropped. Implied if there is no gateway on the
> +selected IPv4 default route.
>  
>  .TP
>  .BR \-\-no-ndp
> @@ -301,7 +302,8 @@ namespace will be ignored.
>  .TP
>  .BR \-\-no-map-gw
>  Don't remap TCP connections and untracked UDP traffic, with the gateway address
> -as destination, to the host.
> +as destination, to the host. Implied if there is no gateway on the selected
> +default route for any of the enabled address families.
>  
>  .TP
>  .BR \-4 ", " \-\-ipv4-only

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace
  2023-05-21 23:42 ` [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace Stefano Brivio
@ 2023-05-22  8:51   ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-05-22  8:51 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 7821 bytes --]

On Mon, May 22, 2023 at 01:42:22AM +0200, Stefano Brivio wrote:
> Similarly to what we've just done with routes, support NL_DUP for
> addresses (currently not exposed): nl_addr() can optionally copy
> mulitple addresses to the target namespace, by fixing up data from
> the dump with appropriate flags and interface index, and repeating
> it back to the kernel on the socket opened in the target namespace.
> 
> Link-local addresses are not copied: the family is set to AF_UNSPEC,
> which means the kernel will ignore them. Same for addresses from a
> mismatching address (pre-4.19 kernels without support for
> NETLINK_GET_STRICT_CHK).
> 
> Ignore IFA_LABEL attributes by changing their type to IFA_UNSPEC,
> because in general they will report mismatching names, and we don't
> really need to use labels as we already know the interface index.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  conf.c    |  8 ++++---
>  netlink.c | 62 +++++++++++++++++++++++++++++++++++++++++--------------
>  netlink.h |  4 ++--
>  pasta.c   |  8 +++----
>  4 files changed, 58 insertions(+), 24 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index c07b697..1ffd05c 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -645,8 +645,10 @@ static unsigned int conf_ip4(unsigned int ifi,
>  	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw))
>  		nl_route(NL_GET, ifi, 0, AF_INET, &ip4->gw);
>  
> -	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
> -		nl_addr(0, ifi, AF_INET, &ip4->addr, &ip4->prefix_len, NULL);
> +	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) {
> +		nl_addr(NL_GET, ifi, 0, AF_INET,
> +			&ip4->addr, &ip4->prefix_len, NULL);
> +	}
>  
>  	if (!ip4->prefix_len) {
>  		in_addr_t addr = ntohl(ip4->addr.s_addr);
> @@ -696,7 +698,7 @@ static unsigned int conf_ip6(unsigned int ifi,
>  	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw))
>  		nl_route(NL_GET, ifi, 0, AF_INET6, &ip6->gw);
>  
> -	nl_addr(0, ifi, AF_INET6,
> +	nl_addr(NL_GET, ifi, 0, AF_INET6,
>  		IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
>  		&prefix_len, &ip6->addr_ll);
>  
> diff --git a/netlink.c b/netlink.c
> index d93ecda..bc5b2bf 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -334,19 +334,18 @@ next:
>  }
>  
>  /**
> - * nl_addr() - Get/set IP addresses
> - * @ns:		Use netlink socket in namespace
> - * @ifi:	Interface index
> + * nl_addr() - Get/set/copy IP addresses for given interface and address family
> + * @op:		Requested operation
> + * @ifi:	Interface index in outer network namespace
> + * @ifi_ns:	Interface index in target namespace for NL_SET, NL_DUP
>   * @af:		Address family
> - * @addr:	Global address to fill if zero, to set if not, ignored if NULL
> + * @addr:	Global address to fill on NL_GET, to set on NL_SET
>   * @prefix_len:	Mask or prefix length, set or fetched (for IPv4)
> - * @addr_l:	Link-scoped address to fill, NULL if not requested
> + * @addr_l:	Link-scoped address to fill on NL_GET
>   */
> -void nl_addr(int ns, unsigned int ifi, sa_family_t af,
> -	     void *addr, int *prefix_len, void *addr_l)
> +void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	     sa_family_t af, void *addr, int *prefix_len, void *addr_l)
>  {
> -	int set = addr && ((af == AF_INET6 && !IN6_IS_ADDR_UNSPECIFIED(addr)) ||
> -			   (af == AF_INET && *(uint32_t *)addr));
>  	struct req_t {
>  		struct nlmsghdr nlh;
>  		struct ifaddrmsg ifa;
> @@ -365,23 +364,23 @@ void nl_addr(int ns, unsigned int ifi, sa_family_t af,
>  			} a6;
>  		} set;
>  	} req = {
> -		.nlh.nlmsg_type    = set ? RTM_NEWADDR : RTM_GETADDR,
> +		.nlh.nlmsg_type    = op == NL_SET ? RTM_NEWADDR : RTM_GETADDR,
>  		.nlh.nlmsg_flags   = NLM_F_REQUEST,
>  		.nlh.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
>  		.nlh.nlmsg_seq     = nl_seq++,
>  
>  		.ifa.ifa_family    = af,
>  		.ifa.ifa_index     = ifi,
> -		.ifa.ifa_prefixlen = *prefix_len,
> +		.ifa.ifa_prefixlen = op == NL_SET ? *prefix_len : 0,
>  	};
> +	ssize_t n, nlmsgs_size;
>  	struct ifaddrmsg *ifa;
>  	struct nlmsghdr *nh;
>  	struct rtattr *rta;
>  	char buf[NLBUFSIZ];
> -	ssize_t n;
>  	size_t na;
>  
> -	if (set) {
> +	if (op == NL_SET) {
>  		if (af == AF_INET6) {
>  			size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
>  
> @@ -416,21 +415,47 @@ void nl_addr(int ns, unsigned int ifi, sa_family_t af,
>  		req.nlh.nlmsg_flags |= NLM_F_DUMP;
>  	}
>  
> -	if ((n = nl_req(ns, buf, &req, req.nlh.nlmsg_len)) < 0 || set)
> +	if ((n = nl_req(op == NL_SET, buf, &req, req.nlh.nlmsg_len)) < 0)
> +		return;
> +
> +	if (op == NL_SET)
>  		return;
>  
>  	nh = (struct nlmsghdr *)buf;
> +	nlmsgs_size = n;
> +
>  	for ( ; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
>  		if (nh->nlmsg_type != RTM_NEWADDR)
>  			goto next;
>  
> +		if (op == NL_DUP) {
> +			nh->nlmsg_seq = nl_seq++;
> +			nh->nlmsg_pid = 0;
> +			nh->nlmsg_flags &= ~NLM_F_DUMP_FILTERED;
> +			nh->nlmsg_flags |= NLM_F_REQUEST | NLM_F_ACK |
> +					   NLM_F_CREATE;
> +		}
> +
>  		ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
> +
> +		if (op == NL_DUP && (ifa->ifa_scope == RT_SCOPE_LINK ||
> +				     ifa->ifa_index != ifi)) {
> +			ifa->ifa_family = AF_UNSPEC;
> +			goto next;
> +		}
> +
>  		if (ifa->ifa_index != ifi)
>  			goto next;
>  
> +		if (op == NL_DUP)
> +			ifa->ifa_index = ifi_ns;
> +
>  		for (rta = IFA_RTA(ifa), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
>  		     rta = RTA_NEXT(rta, na)) {
> -			if (rta->rta_type != IFA_ADDRESS)
> +			if (op == NL_DUP && rta->rta_type == IFA_LABEL)
> +				rta->rta_type = IFA_UNSPEC;
> +
> +			if (op == NL_DUP || rta->rta_type != IFA_ADDRESS)
>  				continue;
>  
>  			if (af == AF_INET && addr && !*(uint32_t *)addr) {
> @@ -451,6 +476,13 @@ next:
>  		if (nh->nlmsg_type == NLMSG_DONE)
>  			break;
>  	}
> +
> +	if (op == NL_DUP) {
> +		char resp[NLBUFSIZ];
> +
> +		nh = (struct nlmsghdr *)buf;
> +		nl_req(1, resp, nh, nlmsgs_size);
> +	}
>  }
>  
>  /**
> diff --git a/netlink.h b/netlink.h
> index 217cf1e..cd0e666 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -16,8 +16,8 @@ void nl_sock_init(const struct ctx *c, bool ns);
>  unsigned int nl_get_ext_if(sa_family_t af);
>  void nl_route(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
>  	      sa_family_t af, void *gw);
> -void nl_addr(int ns, unsigned int ifi, sa_family_t af,
> -	     void *addr, int *prefix_len, void *addr_l);
> +void nl_addr(enum nl_op op, unsigned int ifi, unsigned int ifi_ns,
> +	     sa_family_t af, void *addr, int *prefix_len, void *addr_l);
>  void nl_link(int ns, unsigned int ifi, void *mac, int up, int mtu);
>  
>  #endif /* NETLINK_H */
> diff --git a/pasta.c b/pasta.c
> index b546c93..99ef3fc 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -278,16 +278,16 @@ void pasta_ns_conf(struct ctx *c)
>  		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
>  
>  		if (c->ifi4) {
> -			nl_addr(1, c->pasta_ifi, AF_INET, &c->ip4.addr,
> -				&c->ip4.prefix_len, NULL);
> +			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
> +				&c->ip4.addr, &c->ip4.prefix_len, NULL);
>  			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
>  				 &c->ip4.gw);
>  		}
>  
>  		if (c->ifi6) {
>  			int prefix_len = 64;
> -			nl_addr(1, c->pasta_ifi, AF_INET6, &c->ip6.addr,
> -				&prefix_len, NULL);
> +			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
> +				&c->ip6.addr, &prefix_len, NULL);
>  			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
>  				 &c->ip6.gw);
>  		}

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default
  2023-05-21 23:42 ` [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default Stefano Brivio
@ 2023-05-22  8:53   ` David Gibson
  2023-05-22 15:30     ` Stefano Brivio
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-05-22  8:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 5899 bytes --]

On Mon, May 22, 2023 at 01:42:23AM +0200, Stefano Brivio wrote:
> Use the newly-introduced NL_DUP mode for nl_addr() to copy all the
> addresses associated to the template interface in the outer
> namespace, unless --no-copy-addrs (also implied by -a) is given.

Again, I'm always concerned by the addition of new command line
options.  Particularly in this case, where it looks like you can get
the same behaviour by using the right -a option.

> This is done mostly for consistency with routes. It might partially
> cover the issue at:
>   https://bugs.passt.top/show_bug.cgi?id=47
>   Support multiple addresses per address family
> 
> for some use cases, but not the originally intended one: we'll still
> use a single outbound address (unless the routing table specifies
> different preferred source addresses depending on the destination),
> regardless of the address used in the target namespace.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=47
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Logic looks sound though, so

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  conf.c  | 16 ++++++++++++++--
>  passt.1 |  9 +++++++++
>  passt.h |  2 ++
>  pasta.c |  5 +++--
>  4 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 1ffd05c..e6c68e2 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -901,6 +901,7 @@ pasta_opts:
>  	info(   "  			network namespace is deleted");
>  	info(   "  --config-net		Configure tap interface in namespace");
>  	info(   "  --no-copy-routes	Don't copy all routes to namespace");
> +	info(   "  --no-copy-addrs	Don't copy all addresses to namespace");
>  	info(   "  --ns-mac-addr ADDR	Set MAC address on tap interface");
>  
>  	exit(EXIT_FAILURE);
> @@ -1177,6 +1178,7 @@ void conf(struct ctx *c, int argc, char **argv)
>  		{"outbound-if6", required_argument,	NULL,		16 },
>  		{"config-net",	no_argument,		NULL,		17 },
>  		{"no-copy-routes", no_argument,		NULL,		18 },
> +		{"no-copy-addrs", no_argument,		NULL,		19 },
>  		{ 0 },
>  	};
>  	struct get_bound_ports_ns_arg ns_ports_arg = { .c = c };
> @@ -1347,6 +1349,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  
>  			c->no_copy_routes = 1;
>  			break;
> +		case 19:
> +			if (c->mode != MODE_PASTA)
> +				die("--no-copy-addrs is for pasta mode only");
> +
> +			c->no_copy_addrs = 1;
> +			break;
>  		case 'd':
>  			if (c->debug)
>  				die("Multiple --debug options given");
> @@ -1632,8 +1640,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  	if (*c->sock_path && c->fd_tap >= 0)
>  		die("Options --socket and --fd are mutually exclusive");
>  
> -	if (c->mode == MODE_PASTA && c->no_copy_routes && !c->pasta_conf_ns)
> -		die("Option --no-copy-routes needs --config-net");
> +	if (c->mode == MODE_PASTA && !c->pasta_conf_ns) {
> +		if (c->no_copy_routes)
> +			die("Option --no-copy-routes needs --config-net");
> +		if (c->no_copy_addrs)
> +			die("Option --no-copy-addrs needs --config-net");
> +	}
>  
>  	if (!ifi4 && *c->ip4.ifname_out)
>  		ifi4 = if_nametoindex(c->ip4.ifname_out);
> diff --git a/passt.1 b/passt.1
> index f965c34..87b076d 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -558,6 +558,15 @@ Default is to copy all the routing entries from the interface in the outer
>  namespace to the target namespace, translating the output interface attribute to
>  the outbound interface in the namespace.
>  
> +.TP
> +.BR \-\-no-copy-addrs
> +With \-\-config-net, do not copy all the addresses associated to the interface
> +we derive addresses and routes from: set up a single one. Implied by \-a,
> +\-\-address.
> +
> +Default is to copy all the addresses, except for link-local ones, from the
> +interface from the outer namespace to the target namespace.
> +
>  .TP
>  .BR \-\-ns-mac-addr " " \fIaddr
>  Configure MAC address \fIaddr\fR on the tap interface in the namespace.
> diff --git a/passt.h b/passt.h
> index d314596..b51a1e5 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -183,6 +183,7 @@ struct ip6_ctx {
>   * @pasta_ifn:		Index of namespace interface for pasta
>   * @pasta_conf_ns:	Configure namespace after creating it
>   * @no_copy_routes:	Don't copy all routes when configuring target namespace
> + * @no_copy_addrs:	Don't copy all addresses when configuring namespace
>   * @no_tcp:		Disable TCP operation
>   * @tcp:		Context for TCP protocol handler
>   * @no_tcp:		Disable UDP operation
> @@ -242,6 +243,7 @@ struct ctx {
>  	unsigned int pasta_ifi;
>  	int pasta_conf_ns;
>  	int no_copy_routes;
> +	int no_copy_addrs;
>  
>  	int no_tcp;
>  	struct tcp_ctx tcp;
> diff --git a/pasta.c b/pasta.c
> index 99ef3fc..4054e9a 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -274,11 +274,12 @@ void pasta_ns_conf(struct ctx *c)
>  
>  	if (c->pasta_conf_ns) {
>  		enum nl_op op_routes = c->no_copy_routes ? NL_SET : NL_DUP;
> +		enum nl_op op_addrs =  c->no_copy_addrs  ? NL_SET : NL_DUP;
>  
>  		nl_link(1, c->pasta_ifi, c->mac_guest, 1, c->mtu);
>  
>  		if (c->ifi4) {
> -			nl_addr(NL_SET, c->ifi4, c->pasta_ifi, AF_INET,
> +			nl_addr(op_addrs, c->ifi4, c->pasta_ifi, AF_INET,
>  				&c->ip4.addr, &c->ip4.prefix_len, NULL);
>  			nl_route(op_routes, c->ifi4, c->pasta_ifi, AF_INET,
>  				 &c->ip4.gw);
> @@ -286,7 +287,7 @@ void pasta_ns_conf(struct ctx *c)
>  
>  		if (c->ifi6) {
>  			int prefix_len = 64;
> -			nl_addr(NL_SET, c->ifi6, c->pasta_ifi, AF_INET6,
> +			nl_addr(op_addrs, c->ifi6, c->pasta_ifi, AF_INET6,
>  				&c->ip6.addr, &prefix_len, NULL);
>  			nl_route(op_routes, c->ifi6, c->pasta_ifi, AF_INET6,
>  				 &c->ip6.gw);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-22  8:42   ` David Gibson
@ 2023-05-22  9:58     ` Stefano Brivio
  2023-05-23  3:08       ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-22  9:58 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

On Mon, 22 May 2023 18:42:01 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> > Instead of just fetching the default gateway and configuring a single
> > equivalent route in the target namespace, on 'pasta --config-net', it
> > might be desirable in some cases to copy the whole set of routes
> > corresponding to a given output interface.
> > 
> > For instance, in:
> >   https://github.com/containers/podman/issues/18539
> >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > 
> > configuring the default gateway won't work without a gateway-less
> > route (specifying the output interface only), because the default
> > gateway is, somewhat dubiously, not on the same subnet as the
> > container.
> > 
> > This is a similar case to the one covered by commit 7656a6f88882
> > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > gateway"), and I'm not exactly proud of that workaround.
> > 
> > We also have:
> >   https://bugs.passt.top/show_bug.cgi?id=49
> >   pasta does not work with tap-style interface
> > 
> > for which, eventually, we should be able to configure a gateway-less
> > route in the target namespace.
> > 
> > Introduce different operation modes for nl_route(), including a new
> > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > the route dump for a given interface from the outer namespace, fixing
> > up flags and interface indices on the way, and requesting to add the
> > same routes in the target namespace, on the interface we manage.
> > 
> > For n routes we want to duplicate, send n identical netlink requests
> > including the full dump: routes might depend on each other and the
> > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > and repeating the full dump naturally resolves dependencies without
> > the need to actually calculate them.
> > 
> > I'm not kidding, it actually works pretty well.  
> 
> If there's a way to detect whether the kernel rejected some of the
> routes, it would be nice to cut that loop short as soon as all the
> routes are inserted.  Obviously that could be a followup improvement,
> though.

Yes, there's a way, but to keep things asynchronous in a simple way we
process errors from nl_req() only at the next request.

This part doesn't really need to be asynchronous, though: we could add
a flag for nl_req() saying that we want to know about NLMSG_ERROR right
away. This looks relatively straightforward, and already an improvement
in the sense you mentioned.

Actually parsing the error and finding out the offending route is a bit
more complicated, though.

-- 
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default
  2023-05-22  8:44   ` David Gibson
@ 2023-05-22  9:59     ` Stefano Brivio
  0 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-22  9:59 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

On Mon, 22 May 2023 18:44:51 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, May 22, 2023 at 01:42:19AM +0200, Stefano Brivio wrote:
> > Use the newly-introduced NL_DUP mode for nl_route() to copy all the
> > routes associated to the template interface in the outer namespace,
> > unless --no-copy-routes (also implied by -g) is given.
> > 
> > Otherwise, we can't use default gateways which are not, address-wise,
> > on the same subnet as the container, as reported by Callum.
> > 
> > Reported-by: Callum Parsey <callum@neoninteger.au>
> > Link: https://github.com/containers/podman/issues/18539
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> Reviewed-by: David Gibson <david@gibson.dropbear>
> 
> The logic looks sound, although I do have one concern noted below.
> 
> > ---
> >  conf.c  | 14 ++++++++++++++
> >  passt.1 | 10 ++++++++++
> >  passt.h |  4 +++-
> >  pasta.c |  6 ++++--
> >  4 files changed, 31 insertions(+), 3 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 3ee6ae0..7541261 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -923,6 +923,7 @@ pasta_opts:
> >  	info(   "  --no-netns-quit	Don't quit if filesystem-bound target");
> >  	info(   "  			network namespace is deleted");
> >  	info(   "  --config-net		Configure tap interface in namespace");
> > +	info(   "  --no-copy-routes	Don't copy all routes to namespace");  
> 
> I'm always a bit nervous about adding new options, since it's
> something we then have to maintain compatibility for.  Do we have a
> confirmed use case where the copy routes behaviour will cause trouble?

Not really, but I wanted to keep around the possibility of having the
old behaviour, in case one wants to skip stuff like source routing or
fallback routes with different metrics.

Compatibility-wise it doesn't look like a huge burden (besides, I think
these options could even be dropped at some point).

Same as you noticed for 9/10: this could be obtained by passing one or
two -g options, but it's not as "immediate" as "just give me one
working default gateway".

-- 
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway
  2023-05-22  8:48   ` David Gibson
@ 2023-05-22 15:29     ` Stefano Brivio
  0 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-22 15:29 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

On Mon, 22 May 2023 18:48:39 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, May 22, 2023 at 01:42:21AM +0200, Stefano Brivio wrote:
> > If we use a template interface without a gateway on the default
> > route, we can still offer almost complete functionality, except that,
> > of course, we can't map the gateway address to the outer namespace or
> > host, and that we have no obvious server address or identifier for
> > use in DHCP's siaddr and option 54 (Server identifier, mandatory).
> > 
> > Continue, if we have a default route but no default gateway, and
> > imply --no-map-gw and --no-dhcp in that case. NDP responder and
> > DHCPv6 should be able to work as usual because we require a
> > link-local address to be present, and we'll fall back to that.  
> 
> Implying (rather than requiring) --no-map-gw and --no-dhcp does worry
> me a bit.  I feel like it might violate the principle of least
> surprise.

Well, yes, but on the other hand it risks to be a lot of typing (and
changes to the Podman integration) for a relatively common setup. And
we might even figure out there's a way and a benefit to re-enable DHCP
support in this case.

I'm not fond of that either, but I still prefer it to the alternative.

-- 
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default
  2023-05-22  8:53   ` David Gibson
@ 2023-05-22 15:30     ` Stefano Brivio
  0 siblings, 0 replies; 23+ messages in thread
From: Stefano Brivio @ 2023-05-22 15:30 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

On Mon, 22 May 2023 18:53:35 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, May 22, 2023 at 01:42:23AM +0200, Stefano Brivio wrote:
> > Use the newly-introduced NL_DUP mode for nl_addr() to copy all the
> > addresses associated to the template interface in the outer
> > namespace, unless --no-copy-addrs (also implied by -a) is given.  
> 
> Again, I'm always concerned by the addition of new command line
> options.  Particularly in this case, where it looks like you can get
> the same behaviour by using the right -a option.

As discussed earlier: --no-copy-addrs is probably even less useful than
--no-copy-routes, but, even here, it would be nice to keep these
options around for a while as we're changing the default behaviour
(while "removing" nothing from it) -- especially if this causes an
issue, it's quite convenient to have a single option to restore the old
behaviour for investigation.

So let's add those options as deprecated right away, and then we can
drop them in a couple of months if everything goes as expected. I'll
respin in a bit.

-- 
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-22  9:58     ` Stefano Brivio
@ 2023-05-23  3:08       ` David Gibson
  2023-05-23  6:14         ` Stefano Brivio
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2023-05-23  3:08 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

On Mon, May 22, 2023 at 11:58:51AM +0200, Stefano Brivio wrote:
> On Mon, 22 May 2023 18:42:01 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:
> > > Instead of just fetching the default gateway and configuring a single
> > > equivalent route in the target namespace, on 'pasta --config-net', it
> > > might be desirable in some cases to copy the whole set of routes
> > > corresponding to a given output interface.
> > > 
> > > For instance, in:
> > >   https://github.com/containers/podman/issues/18539
> > >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > > 
> > > configuring the default gateway won't work without a gateway-less
> > > route (specifying the output interface only), because the default
> > > gateway is, somewhat dubiously, not on the same subnet as the
> > > container.
> > > 
> > > This is a similar case to the one covered by commit 7656a6f88882
> > > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > > gateway"), and I'm not exactly proud of that workaround.
> > > 
> > > We also have:
> > >   https://bugs.passt.top/show_bug.cgi?id=49
> > >   pasta does not work with tap-style interface
> > > 
> > > for which, eventually, we should be able to configure a gateway-less
> > > route in the target namespace.
> > > 
> > > Introduce different operation modes for nl_route(), including a new
> > > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > > the route dump for a given interface from the outer namespace, fixing
> > > up flags and interface indices on the way, and requesting to add the
> > > same routes in the target namespace, on the interface we manage.
> > > 
> > > For n routes we want to duplicate, send n identical netlink requests
> > > including the full dump: routes might depend on each other and the
> > > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > > and repeating the full dump naturally resolves dependencies without
> > > the need to actually calculate them.
> > > 
> > > I'm not kidding, it actually works pretty well.  
> > 
> > If there's a way to detect whether the kernel rejected some of the
> > routes, it would be nice to cut that loop short as soon as all the
> > routes are inserted.  Obviously that could be a followup improvement,
> > though.
> 
> Yes, there's a way, but to keep things asynchronous in a simple way we
> process errors from nl_req() only at the next request.
> 
> This part doesn't really need to be asynchronous, though: we could add
> a flag for nl_req() saying that we want to know about NLMSG_ERROR right
> away. This looks relatively straightforward, and already an improvement
> in the sense you mentioned.
> 
> Actually parsing the error and finding out the offending route is a bit
> more complicated, though.

Right, but we don't necessarily need to do that: all we need is that
if there are *no* errors we can stop the loop early.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-23  3:08       ` David Gibson
@ 2023-05-23  6:14         ` Stefano Brivio
  2023-05-27  2:06           ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Brivio @ 2023-05-23  6:14 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

On Tue, 23 May 2023 13:08:21 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, May 22, 2023 at 11:58:51AM +0200, Stefano Brivio wrote:
> > On Mon, 22 May 2023 18:42:01 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:  
> > > > Instead of just fetching the default gateway and configuring a single
> > > > equivalent route in the target namespace, on 'pasta --config-net', it
> > > > might be desirable in some cases to copy the whole set of routes
> > > > corresponding to a given output interface.
> > > > 
> > > > For instance, in:
> > > >   https://github.com/containers/podman/issues/18539
> > > >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > > > 
> > > > configuring the default gateway won't work without a gateway-less
> > > > route (specifying the output interface only), because the default
> > > > gateway is, somewhat dubiously, not on the same subnet as the
> > > > container.
> > > > 
> > > > This is a similar case to the one covered by commit 7656a6f88882
> > > > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > > > gateway"), and I'm not exactly proud of that workaround.
> > > > 
> > > > We also have:
> > > >   https://bugs.passt.top/show_bug.cgi?id=49
> > > >   pasta does not work with tap-style interface
> > > > 
> > > > for which, eventually, we should be able to configure a gateway-less
> > > > route in the target namespace.
> > > > 
> > > > Introduce different operation modes for nl_route(), including a new
> > > > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > > > the route dump for a given interface from the outer namespace, fixing
> > > > up flags and interface indices on the way, and requesting to add the
> > > > same routes in the target namespace, on the interface we manage.
> > > > 
> > > > For n routes we want to duplicate, send n identical netlink requests
> > > > including the full dump: routes might depend on each other and the
> > > > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > > > and repeating the full dump naturally resolves dependencies without
> > > > the need to actually calculate them.
> > > > 
> > > > I'm not kidding, it actually works pretty well.    
> > > 
> > > If there's a way to detect whether the kernel rejected some of the
> > > routes, it would be nice to cut that loop short as soon as all the
> > > routes are inserted.  Obviously that could be a followup improvement,
> > > though.  
> > 
> > Yes, there's a way, but to keep things asynchronous in a simple way we
> > process errors from nl_req() only at the next request.
> > 
> > This part doesn't really need to be asynchronous, though: we could add
> > a flag for nl_req() saying that we want to know about NLMSG_ERROR right
> > away. This looks relatively straightforward, and already an improvement
> > in the sense you mentioned.
> > 
> > Actually parsing the error and finding out the offending route is a bit
> > more complicated, though.  
> 
> Right, but we don't necessarily need to do that: all we need is that
> if there are *no* errors we can stop the loop early.

Yes yes, that's what I meant with the paragraph before.

By the way, note that in general we'll get EEXIST in the "extended ACK"
for any message we send, because we just inserted addresses that
already created their prefix routes.

We could think of setting the IFA_F_NOPREFIXROUTE flag on addresses, on
NL_DUP in nl_addr(), or even always, to avoid this.

-- 
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace
  2023-05-23  6:14         ` Stefano Brivio
@ 2023-05-27  2:06           ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2023-05-27  2:06 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Callum Parsey, me, lemmi, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 4130 bytes --]

On Tue, May 23, 2023 at 08:14:07AM +0200, Stefano Brivio wrote:
> On Tue, 23 May 2023 13:08:21 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, May 22, 2023 at 11:58:51AM +0200, Stefano Brivio wrote:
> > > On Mon, 22 May 2023 18:42:01 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Mon, May 22, 2023 at 01:42:17AM +0200, Stefano Brivio wrote:  
> > > > > Instead of just fetching the default gateway and configuring a single
> > > > > equivalent route in the target namespace, on 'pasta --config-net', it
> > > > > might be desirable in some cases to copy the whole set of routes
> > > > > corresponding to a given output interface.
> > > > > 
> > > > > For instance, in:
> > > > >   https://github.com/containers/podman/issues/18539
> > > > >   IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
> > > > > 
> > > > > configuring the default gateway won't work without a gateway-less
> > > > > route (specifying the output interface only), because the default
> > > > > gateway is, somewhat dubiously, not on the same subnet as the
> > > > > container.
> > > > > 
> > > > > This is a similar case to the one covered by commit 7656a6f88882
> > > > > ("conf: Adjust netmask on mismatch between IPv4 address/netmask and
> > > > > gateway"), and I'm not exactly proud of that workaround.
> > > > > 
> > > > > We also have:
> > > > >   https://bugs.passt.top/show_bug.cgi?id=49
> > > > >   pasta does not work with tap-style interface
> > > > > 
> > > > > for which, eventually, we should be able to configure a gateway-less
> > > > > route in the target namespace.
> > > > > 
> > > > > Introduce different operation modes for nl_route(), including a new
> > > > > NL_DUP one, not exposed yet, which simply parrots back to the kernel
> > > > > the route dump for a given interface from the outer namespace, fixing
> > > > > up flags and interface indices on the way, and requesting to add the
> > > > > same routes in the target namespace, on the interface we manage.
> > > > > 
> > > > > For n routes we want to duplicate, send n identical netlink requests
> > > > > including the full dump: routes might depend on each other and the
> > > > > kernel processes RTM_NEWROUTE messages sequentially, not atomically,
> > > > > and repeating the full dump naturally resolves dependencies without
> > > > > the need to actually calculate them.
> > > > > 
> > > > > I'm not kidding, it actually works pretty well.    
> > > > 
> > > > If there's a way to detect whether the kernel rejected some of the
> > > > routes, it would be nice to cut that loop short as soon as all the
> > > > routes are inserted.  Obviously that could be a followup improvement,
> > > > though.  
> > > 
> > > Yes, there's a way, but to keep things asynchronous in a simple way we
> > > process errors from nl_req() only at the next request.
> > > 
> > > This part doesn't really need to be asynchronous, though: we could add
> > > a flag for nl_req() saying that we want to know about NLMSG_ERROR right
> > > away. This looks relatively straightforward, and already an improvement
> > > in the sense you mentioned.
> > > 
> > > Actually parsing the error and finding out the offending route is a bit
> > > more complicated, though.  
> > 
> > Right, but we don't necessarily need to do that: all we need is that
> > if there are *no* errors we can stop the loop early.
> 
> Yes yes, that's what I meant with the paragraph before.
> 
> By the way, note that in general we'll get EEXIST in the "extended ACK"
> for any message we send, because we just inserted addresses that
> already created their prefix routes.

Ah, right.  That might make it more trouble than it's worth.

> We could think of setting the IFA_F_NOPREFIXROUTE flag on addresses, on
> NL_DUP in nl_addr(), or even always, to avoid this.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-05-27  2:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-21 23:42 [PATCH v2 00/10] Optionally copy all routes and addresses for pasta, allow gateway-less routes Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 01/10] netlink: Fix comment about response buffer size for nl_req() Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 02/10] pasta: Improve error handling on failure to join network namespace Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 03/10] netlink: Add functionality to copy routes from outer namespace Stefano Brivio
2023-05-22  8:42   ` David Gibson
2023-05-22  9:58     ` Stefano Brivio
2023-05-23  3:08       ` David Gibson
2023-05-23  6:14         ` Stefano Brivio
2023-05-27  2:06           ` David Gibson
2023-05-21 23:42 ` [PATCH v2 04/10] conf: --config-net option is for pasta mode only Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 05/10] conf, pasta: With --config-net, copy all routes by default Stefano Brivio
2023-05-22  8:44   ` David Gibson
2023-05-22  9:59     ` Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 06/10] Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway" Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 07/10] conf: Don't exit if sourced default route has no gateway Stefano Brivio
2023-05-22  8:48   ` David Gibson
2023-05-22 15:29     ` Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 08/10] netlink: Add functionality to copy addresses from outer namespace Stefano Brivio
2023-05-22  8:51   ` David Gibson
2023-05-21 23:42 ` [PATCH v2 09/10] conf, pasta: With --config-net, copy all addresses by default Stefano Brivio
2023-05-22  8:53   ` David Gibson
2023-05-22 15:30     ` Stefano Brivio
2023-05-21 23:42 ` [PATCH v2 10/10] passt.h: Fix description of pasta_ifi in struct ctx Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).