public inbox for passt-dev@passt.top
* [PATCH 00/22] RFC: Allow configuration of special case NATs
@ 2024-08-16  5:39 David Gibson
  2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
                   ` (23 more replies)
  0 siblings, 24 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Based on Stefano's recent patch for faster tests.

Allow the user to specify which addresses are translated when used by
the guest, rather than always being the gateway address or nothing.
We also allow this remapping to go to the host's global address (more
precisely the address assigned to the guest) rather than just host
loopback.

Suggestions for better names for the new options in patches 20 & 22
are most welcome.

Along the way to implementing that, make many changes to clarify what
the various addresses we track mean, fixing a number of small bugs as
well.

NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
tests.  I haven't managed to figure out why it's causing the problem,
or even what the exact triggering conditions are (running the single
stalling iperf alone doesn't do it).  Have to wrap up for today, so I
thought I'd get this out for review anyway.

Paul, amongst other things, I think this will allow podman to
(finally) nicely address #19213, picking an address to remap to the
host's external address with --nat-guest-addr, much like it already
uses --dns-forward.

David Gibson (22):
  treewide: Use "our address" instead of "forwarding address"
  util: Helper for formatting MAC addresses
  treewide: Rename MAC address fields for clarity
  treewide: Use struct assignment instead of memcpy() for IP addresses
  conf: Use array indices rather than pointers for DNS array slots
  conf: More accurately count entries added in get_dns()
  conf: Move DNS array bounds checks into add_dns[46]
  conf: Move adding of a nameserver from resolv.conf into subfunction
  conf: Correct setting of dns_match address in add_dns6()
  conf: Treat --dns addresses as guest visible addresses
  conf: Remove incorrect initialisation of addr_ll_seen
  util: Correct sock_l4() binding for link local addresses
  treewide: Change misleading 'addr_ll' name
  Clarify which addresses in ip[46]_ctx are meaningful where
  Initialise our_tap_ll to ip6.gw when suitable
  fwd: Helpers to clarify what host addresses aren't guest accessible
  fwd: Split notion of "our tap address" from gateway for IPv4
  Don't take "our" MAC address from the host
  conf, fwd: Split notion of gateway/router from guest-visible host
    address
  conf: Allow address remapped to host to be configured
  fwd: Distinguish translatable from untranslatable addresses on inbound
  fwd, conf: Allow NAT of the guest's assigned address

 arp.c                 |   4 +-
 conf.c                | 328 +++++++++++++++++++++++++-----------------
 dhcp.c                |  19 +--
 dhcpv6.c              |  21 +--
 flow.c                |  72 +++++-----
 flow.h                |  18 +--
 fwd.c                 | 170 +++++++++++++++++-----
 icmp.c                |   4 +-
 ndp.c                 |   9 +-
 passt.1               |  45 +++++-
 passt.c               |   2 +-
 passt.h               |  53 +++++--
 pasta.c               |  14 +-
 tap.c                 |  12 +-
 tcp.c                 |  33 ++---
 tcp_internal.h        |   2 +-
 test/lib/setup        |  11 +-
 test/passt_in_ns/dhcp |  73 ++++++++++
 test/passt_in_ns/tcp  |  38 +++--
 test/passt_in_ns/udp  |  22 +--
 test/perf/passt_tcp   |  33 ++---
 test/perf/passt_udp   |  31 ++--
 test/perf/pasta_tcp   |  29 ++--
 test/perf/pasta_udp   |  25 ++--
 test/run              |   4 +-
 udp.c                 |  12 +-
 util.c                |  22 ++-
 util.h                |   4 +-
 28 files changed, 719 insertions(+), 391 deletions(-)
 create mode 100644 test/passt_in_ns/dhcp

-- 
2.46.0



* [PATCH 01/22] treewide: Use "our address" instead of "forwarding address"
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-18 15:44   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 02/22] util: Helper for formatting MAC addresses David Gibson
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

The term "forwarding address" to indicate the local-to-passt address was
well-intentioned, but ends up being kinda confusing.  As discussed on a
recent call, let's try "our" instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 flow.c         | 72 +++++++++++++++++++++++++-------------------------
 flow.h         | 18 ++++++-------
 fwd.c          | 70 ++++++++++++++++++++++++------------------------
 icmp.c         |  4 +--
 tcp.c          | 33 ++++++++++++-----------
 tcp_internal.h |  2 +-
 udp.c          | 12 ++++-----
 7 files changed, 106 insertions(+), 105 deletions(-)

diff --git a/flow.c b/flow.c
index 93b687dc..8915e366 100644
--- a/flow.c
+++ b/flow.c
@@ -127,18 +127,18 @@ static struct timespec flow_timer_run;
  * @af:		Address family (AF_INET or AF_INET6)
  * @eaddr:	Endpoint address (pointer to in_addr or in6_addr)
  * @eport:	Endpoint port
- * @faddr:	Forwarding address (pointer to in_addr or in6_addr)
- * @fport:	Forwarding port
+ * @oaddr:	Our address (pointer to in_addr or in6_addr)
+ * @oport:	Our port
  */
 static void flowside_from_af(struct flowside *side, sa_family_t af,
 			     const void *eaddr, in_port_t eport,
-			     const void *faddr, in_port_t fport)
+			     const void *oaddr, in_port_t oport)
 {
-	if (faddr)
-		inany_from_af(&side->faddr, af, faddr);
+	if (oaddr)
+		inany_from_af(&side->oaddr, af, oaddr);
 	else
-		side->faddr = inany_any6;
-	side->fport = fport;
+		side->oaddr = inany_any6;
+	side->oport = oport;
 
 	if (eaddr)
 		inany_from_af(&side->eaddr, af, eaddr);
@@ -193,8 +193,8 @@ static int flowside_sock_splice(void *arg)
  * @tgt:	Target flowside
  * @data:	epoll reference portion for protocol handlers
  *
- * Return: socket fd of protocol @proto bound to the forwarding address and port
- *         from @tgt (if specified).
+ * Return: socket fd of protocol @proto bound to our address and port from @tgt
+ *         (if specified).
  */
 int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 		     const struct flowside *tgt, uint32_t data)
@@ -205,11 +205,11 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 
 	ASSERT(pif_is_socket(pif));
 
-	pif_sockaddr(c, &sa, &sl, pif, &tgt->faddr, tgt->fport);
+	pif_sockaddr(c, &sa, &sl, pif, &tgt->oaddr, tgt->oport);
 
 	switch (pif) {
 	case PIF_HOST:
-		if (inany_is_loopback(&tgt->faddr))
+		if (inany_is_loopback(&tgt->oaddr))
 			ifname = NULL;
 		else if (sa.sa_family == AF_INET)
 			ifname = c->ip4.ifname_out;
@@ -309,11 +309,11 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
 			  pif_name(f->pif[INISIDE]),
 			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
 			  ini->eport,
-			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
-			  ini->fport,
+			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
+			  ini->oport,
 			  pif_name(f->pif[TGTSIDE]),
-			  inany_ntop(&tgt->faddr, fstr1, sizeof(fstr1)),
-			  tgt->fport,
+			  inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)),
+			  tgt->oport,
 			  inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)),
 			  tgt->eport);
 	else if (MAX(state, oldstate) >= FLOW_STATE_INI)
@@ -321,8 +321,8 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
 			  pif_name(f->pif[INISIDE]),
 			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
 			  ini->eport,
-			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
-			  ini->fport);
+			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
+			  ini->oport);
 }
 
 /**
@@ -347,7 +347,7 @@ static void flow_initiate_(union flow *flow, uint8_t pif)
  * flow_initiate_af() - Move flow to INI, setting INISIDE details
  * @flow:	Flow to change state
  * @pif:	pif of the initiating side
- * @af:		Address family of @eaddr and @faddr
+ * @af:		Address family of @eaddr and @oaddr
  * @saddr:	Source address (pointer to in_addr or in6_addr)
  * @sport:	Endpoint port
  * @daddr:	Destination address (pointer to in_addr or in6_addr)
@@ -384,10 +384,10 @@ const struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif,
 
 	inany_from_sockaddr(&ini->eaddr, &ini->eport, ssa);
 	if (inany_v4(&ini->eaddr))
-		ini->faddr = inany_any4;
+		ini->oaddr = inany_any4;
 	else
-		ini->faddr = inany_any6;
-	ini->fport = dport;
+		ini->oaddr = inany_any6;
+	ini->oport = dport;
 	flow_initiate_(flow, pif);
 	return ini;
 }
@@ -432,8 +432,8 @@ const struct flowside *flow_target(const struct ctx *c, union flow *flow,
 			 pif_name(f->pif[INISIDE]),
 			 inany_ntop(&ini->eaddr, estr, sizeof(estr)),
 			 ini->eport,
-			 inany_ntop(&ini->faddr, fstr, sizeof(fstr)),
-			 ini->fport);
+			 inany_ntop(&ini->oaddr, fstr, sizeof(fstr)),
+			 ini->oport);
 	}
 
 	if (tgtpif == PIF_NONE)
@@ -561,12 +561,12 @@ static uint64_t flow_hash(const struct ctx *c, uint8_t proto, uint8_t pif,
 {
 	struct siphash_state state = SIPHASH_INIT(c->hash_secret);
 
-	inany_siphash_feed(&state, &side->faddr);
+	inany_siphash_feed(&state, &side->oaddr);
 	inany_siphash_feed(&state, &side->eaddr);
 
 	return siphash_final(&state, 38, (uint64_t)proto << 40 |
 			     (uint64_t)pif << 32 |
-			     (uint64_t)side->fport << 16 |
+			     (uint64_t)side->oport << 16 |
 			     (uint64_t)side->eport);
 }
 
@@ -587,7 +587,7 @@ static uint64_t flow_sidx_hash(const struct ctx *c, flow_sidx_t sidx)
 	 * information, and at least a forwarding port.
 	 */
 	ASSERT(pif != PIF_NONE && !inany_is_unspecified(&side->eaddr) &&
-	       side->eport != 0 && side->fport != 0);
+	       side->eport != 0 && side->oport != 0);
 
 	return flow_hash(c, FLOW_PROTO(f), pif, side);
 }
@@ -709,20 +709,20 @@ static flow_sidx_t flowside_lookup(const struct ctx *c, uint8_t proto,
  * @pif:	Interface of the flow
  * @af:		Address family, AF_INET or AF_INET6
  * @eaddr:	Guest side endpoint address (guest local address)
- * @faddr:	Guest side forwarding address (guest remote address)
+ * @oaddr:	Our guest side address (guest remote address)
  * @eport:	Guest side endpoint port (guest local port)
- * @fport:	Guest side forwarding port (guest remote port)
+ * @oport:	Our guest side port (guest remote port)
  *
  * Return: sidx of the matching flow & side, FLOW_SIDX_NONE if not found
  */
 flow_sidx_t flow_lookup_af(const struct ctx *c,
 			   uint8_t proto, uint8_t pif, sa_family_t af,
-			   const void *eaddr, const void *faddr,
-			   in_port_t eport, in_port_t fport)
+			   const void *eaddr, const void *oaddr,
+			   in_port_t eport, in_port_t oport)
 {
 	struct flowside side;
 
-	flowside_from_af(&side, af, eaddr, eport, faddr, fport);
+	flowside_from_af(&side, af, eaddr, eport, oaddr, oport);
 	return flowside_lookup(c, proto, pif, &side);
 }
 
@@ -732,22 +732,22 @@ flow_sidx_t flow_lookup_af(const struct ctx *c,
  * @proto:	Protocol of the flow (IP L4 protocol number)
  * @pif:	Interface of the flow
  * @esa:	Socket address of the endpoint
- * @fport:	Forwarding port number
+ * @oport:	Our port number
  *
  * Return: sidx of the matching flow & side, FLOW_SIDX_NONE if not found
  */
 flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif,
-			   const void *esa, in_port_t fport)
+			   const void *esa, in_port_t oport)
 {
 	struct flowside side = {
-		.fport = fport,
+		.oport = oport,
 	};
 
 	inany_from_sockaddr(&side.eaddr, &side.eport, esa);
 	if (inany_v4(&side.eaddr))
-		side.faddr = inany_any4;
+		side.oaddr = inany_any4;
 	else
-		side.faddr = inany_any6;
+		side.oaddr = inany_any6;
 
 	return flowside_lookup(c, proto, pif, &side);
 }
diff --git a/flow.h b/flow.h
index 078fd605..d167b654 100644
--- a/flow.h
+++ b/flow.h
@@ -140,14 +140,14 @@ extern const uint8_t flow_proto[];
 /**
  * struct flowside - Address information for one side of a flow
  * @eaddr:	Endpoint address (remote address from passt's PoV)
- * @faddr:	Forwarding address (local address from passt's PoV)
+ * @oaddr:	Our address (local address from passt's PoV)
  * @eport:	Endpoint port
- * @fport:	Forwarding port
+ * @oport:	Our port
  */
 struct flowside {
-	union inany_addr	faddr;
+	union inany_addr	oaddr;
 	union inany_addr	eaddr;
-	in_port_t		fport;
+	in_port_t		oport;
 	in_port_t		eport;
 };
 
@@ -162,8 +162,8 @@ static inline bool flowside_eq(const struct flowside *left,
 {
 	return inany_equals(&left->eaddr, &right->eaddr) &&
 	       left->eport == right->eport &&
-	       inany_equals(&left->faddr, &right->faddr) &&
-	       left->fport == right->fport;
+	       inany_equals(&left->oaddr, &right->oaddr) &&
+	       left->oport == right->oport;
 }
 
 int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
@@ -240,10 +240,10 @@ uint64_t flow_hash_insert(const struct ctx *c, flow_sidx_t sidx);
 void flow_hash_remove(const struct ctx *c, flow_sidx_t sidx);
 flow_sidx_t flow_lookup_af(const struct ctx *c,
 			   uint8_t proto, uint8_t pif, sa_family_t af,
-			   const void *eaddr, const void *faddr,
-			   in_port_t eport, in_port_t fport);
+			   const void *eaddr, const void *oaddr,
+			   in_port_t eport, in_port_t oport);
 flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif,
-			   const void *esa, in_port_t fport);
+			   const void *esa, in_port_t oport);
 
 union flow;
 
diff --git a/fwd.c b/fwd.c
index dea36f6c..b546bc41 100644
--- a/fwd.c
+++ b/fwd.c
@@ -167,7 +167,7 @@ void fwd_scan_ports_init(struct ctx *c)
 static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
 {
 	return ((proto == IPPROTO_UDP) || (proto == IPPROTO_TCP)) &&
-		((ini->fport == 53) || (ini->fport == 853));
+		((ini->oport == 53) || (ini->oport == 853));
 }
 
 /**
@@ -184,33 +184,33 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
 			 const struct flowside *ini, struct flowside *tgt)
 {
 	if (is_dns_flow(proto, ini) &&
-	    inany_equals4(&ini->faddr, &c->ip4.dns_match))
+	    inany_equals4(&ini->oaddr, &c->ip4.dns_match))
 		tgt->eaddr = inany_from_v4(c->ip4.dns_host);
 	else if (is_dns_flow(proto, ini) &&
-		   inany_equals6(&ini->faddr, &c->ip6.dns_match))
+		   inany_equals6(&ini->oaddr, &c->ip6.dns_match))
 		tgt->eaddr.a6 = c->ip6.dns_host;
-	else if (!c->no_map_gw && inany_equals4(&ini->faddr, &c->ip4.gw))
+	else if (!c->no_map_gw && inany_equals4(&ini->oaddr, &c->ip4.gw))
 		tgt->eaddr = inany_loopback4;
-	else if (!c->no_map_gw && inany_equals6(&ini->faddr, &c->ip6.gw))
+	else if (!c->no_map_gw && inany_equals6(&ini->oaddr, &c->ip6.gw))
 		tgt->eaddr = inany_loopback6;
 	else
-		tgt->eaddr = ini->faddr;
+		tgt->eaddr = ini->oaddr;
 
-	tgt->eport = ini->fport;
+	tgt->eport = ini->oport;
 
 	/* The relevant addr_out controls the host side source address.  This
 	 * may be unspecified, which allows the kernel to pick an address.
 	 */
 	if (inany_v4(&tgt->eaddr))
-		tgt->faddr = inany_from_v4(c->ip4.addr_out);
+		tgt->oaddr = inany_from_v4(c->ip4.addr_out);
 	else
-		tgt->faddr.a6 = c->ip6.addr_out;
+		tgt->oaddr.a6 = c->ip6.addr_out;
 
 	/* Let the kernel pick a host side source port */
-	tgt->fport = 0;
+	tgt->oport = 0;
 	if (proto == IPPROTO_UDP) {
 		/* But for UDP we preserve the source port */
-		tgt->fport = ini->eport;
+		tgt->oport = ini->eport;
 	}
 
 	return PIF_HOST;
@@ -230,13 +230,13 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 			    const struct flowside *ini, struct flowside *tgt)
 {
 	if (!inany_is_loopback(&ini->eaddr) ||
-	    (!inany_is_loopback(&ini->faddr) && !inany_is_unspecified(&ini->faddr))) {
+	    (!inany_is_loopback(&ini->oaddr) && !inany_is_unspecified(&ini->oaddr))) {
 		char estr[INANY_ADDRSTRLEN], fstr[INANY_ADDRSTRLEN];
 
 		debug("Non loopback address on %s: [%s]:%hu -> [%s]:%hu",
 		      pif_name(PIF_SPLICE),
 		      inany_ntop(&ini->eaddr, estr, sizeof(estr)), ini->eport,
-		      inany_ntop(&ini->faddr, fstr, sizeof(fstr)), ini->fport);
+		      inany_ntop(&ini->oaddr, fstr, sizeof(fstr)), ini->oport);
 		return PIF_NONE;
 	}
 
@@ -248,20 +248,20 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 	/* Preserve the specific loopback adddress used, but let the kernel pick
 	 * a source port on the target side
 	 */
-	tgt->faddr = ini->eaddr;
-	tgt->fport = 0;
+	tgt->oaddr = ini->eaddr;
+	tgt->oport = 0;
 
-	tgt->eport = ini->fport;
+	tgt->eport = ini->oport;
 	if (proto == IPPROTO_TCP)
 		tgt->eport += c->tcp.fwd_out.delta[tgt->eport];
 	else if (proto == IPPROTO_UDP)
 		tgt->eport += c->udp.fwd_out.delta[tgt->eport];
 
 	/* Let the kernel pick a host side source port */
-	tgt->fport = 0;
+	tgt->oport = 0;
 	if (proto == IPPROTO_UDP)
 		/* But for UDP preserve the source port */
-		tgt->fport = ini->eport;
+		tgt->oport = ini->eport;
 
 	return PIF_HOST;
 }
@@ -280,7 +280,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 			  const struct flowside *ini, struct flowside *tgt)
 {
 	/* Common for spliced and non-spliced cases */
-	tgt->eport = ini->fport;
+	tgt->eport = ini->oport;
 	if (proto == IPPROTO_TCP)
 		tgt->eport += c->tcp.fwd_in.delta[tgt->eport];
 	else if (proto == IPPROTO_UDP)
@@ -293,11 +293,11 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 		/* Preserve the specific loopback adddress used, but let the
 		 * kernel pick a source port on the target side
 		 */
-		tgt->faddr = ini->eaddr;
-		tgt->fport = 0;
+		tgt->oaddr = ini->eaddr;
+		tgt->oport = 0;
 		if (proto == IPPROTO_UDP)
 			/* But for UDP preserve the source port */
-			tgt->fport = ini->eport;
+			tgt->oport = ini->eport;
 
 		if (inany_v4(&ini->eaddr))
 			tgt->eaddr = inany_loopback4;
@@ -307,26 +307,26 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 		return PIF_SPLICE;
 	}
 
-	tgt->faddr = ini->eaddr;
-	tgt->fport = ini->eport;
+	tgt->oaddr = ini->eaddr;
+	tgt->oport = ini->eport;
 
-	if (inany_is_loopback4(&tgt->faddr) ||
-	    inany_is_unspecified4(&tgt->faddr) ||
-	    inany_equals4(&tgt->faddr, &c->ip4.addr_seen)) {
-		tgt->faddr = inany_from_v4(c->ip4.gw);
-	} else if (inany_is_loopback6(&tgt->faddr) ||
-		   inany_equals6(&tgt->faddr, &c->ip6.addr_seen) ||
-		   inany_equals6(&tgt->faddr, &c->ip6.addr)) {
+	if (inany_is_loopback4(&tgt->oaddr) ||
+	    inany_is_unspecified4(&tgt->oaddr) ||
+	    inany_equals4(&tgt->oaddr, &c->ip4.addr_seen)) {
+		tgt->oaddr = inany_from_v4(c->ip4.gw);
+	} else if (inany_is_loopback6(&tgt->oaddr) ||
+		   inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) ||
+		   inany_equals6(&tgt->oaddr, &c->ip6.addr)) {
 		if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-			tgt->faddr.a6 = c->ip6.gw;
+			tgt->oaddr.a6 = c->ip6.gw;
 		else
-			tgt->faddr.a6 = c->ip6.addr_ll;
+			tgt->oaddr.a6 = c->ip6.addr_ll;
 	}
 
-	if (inany_v4(&tgt->faddr)) {
+	if (inany_v4(&tgt->oaddr)) {
 		tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
 	} else {
-		if (inany_is_linklocal6(&tgt->faddr))
+		if (inany_is_linklocal6(&tgt->oaddr))
 			tgt->eaddr.a6 = c->ip6.addr_ll_seen;
 		else
 			tgt->eaddr.a6 = c->ip6.addr_seen;
diff --git a/icmp.c b/icmp.c
index cb81c768..f514dbc9 100644
--- a/icmp.c
+++ b/icmp.c
@@ -125,13 +125,13 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref)
 		 ini->eport, seq);
 
 	if (pingf->f.type == FLOW_PING4) {
-		const struct in_addr *saddr = inany_v4(&ini->faddr);
+		const struct in_addr *saddr = inany_v4(&ini->oaddr);
 		const struct in_addr *daddr = inany_v4(&ini->eaddr);
 
 		ASSERT(saddr && daddr); /* Must have IPv4 addresses */
 		tap_icmp4_send(c, *saddr, *daddr, buf, n);
 	} else if (pingf->f.type == FLOW_PING6) {
-		const struct in6_addr *saddr = &ini->faddr.a6;
+		const struct in6_addr *saddr = &ini->oaddr.a6;
 		const struct in6_addr *daddr = &ini->eaddr.a6;
 
 		tap_icmp6_send(c, saddr, daddr, buf, n);
diff --git a/tcp.c b/tcp.c
index c0820ce7..f01fe8f9 100644
--- a/tcp.c
+++ b/tcp.c
@@ -361,8 +361,8 @@ static const char *tcp_flag_str[] __attribute((__unused__)) = {
 static int tcp_sock_init_ext	[NUM_PORTS][IP_VERSIONS];
 static int tcp_sock_ns		[NUM_PORTS][IP_VERSIONS];
 
-/* Table of guest side forwarding addresses with very low RTT (assumed
- * to be local to the host), LRU
+/* Table of our guest side addresses with very low RTT (assumed to be local to
+ * the host), LRU
  */
 static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
 
@@ -663,7 +663,7 @@ static int tcp_rtt_dst_low(const struct tcp_tap_conn *conn)
 	int i;
 
 	for (i = 0; i < LOW_RTT_TABLE_SIZE; i++)
-		if (inany_equals(&tapside->faddr, low_rtt_dst + i))
+		if (inany_equals(&tapside->oaddr, low_rtt_dst + i))
 			return 1;
 
 	return 0;
@@ -686,7 +686,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn,
 		return;
 
 	for (i = 0; i < LOW_RTT_TABLE_SIZE; i++) {
-		if (inany_equals(&tapside->faddr, low_rtt_dst + i))
+		if (inany_equals(&tapside->oaddr, low_rtt_dst + i))
 			return;
 		if (hole == -1 && IN6_IS_ADDR_UNSPECIFIED(low_rtt_dst + i))
 			hole = i;
@@ -698,7 +698,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn,
 	if (hole == -1)
 		return;
 
-	low_rtt_dst[hole++] = tapside->faddr;
+	low_rtt_dst[hole++] = tapside->oaddr;
 	if (hole == LOW_RTT_TABLE_SIZE)
 		hole = 0;
 	inany_from_af(low_rtt_dst + hole, AF_INET6, &in6addr_any);
@@ -881,7 +881,7 @@ static void tcp_fill_header(struct tcphdr *th,
 {
 	const struct flowside *tapside = TAPFLOW(conn);
 
-	th->source = htons(tapside->fport);
+	th->source = htons(tapside->oport);
 	th->dest = htons(tapside->eport);
 	th->seq = htonl(seq);
 	th->ack_seq = htonl(conn->seq_ack_to_tap);
@@ -913,7 +913,7 @@ static size_t tcp_fill_headers4(const struct tcp_tap_conn *conn,
 				uint32_t seq)
 {
 	const struct flowside *tapside = TAPFLOW(conn);
-	const struct in_addr *src4 = inany_v4(&tapside->faddr);
+	const struct in_addr *src4 = inany_v4(&tapside->oaddr);
 	const struct in_addr *dst4 = inany_v4(&tapside->eaddr);
 	size_t l4len = dlen + sizeof(*th);
 	size_t l3len = l4len + sizeof(*iph);
@@ -957,7 +957,7 @@ static size_t tcp_fill_headers6(const struct tcp_tap_conn *conn,
 	size_t l4len = dlen + sizeof(*th);
 
 	ip6h->payload_len = htons(l4len);
-	ip6h->saddr = tapside->faddr.a6;
+	ip6h->saddr = tapside->oaddr.a6;
 	ip6h->daddr = tapside->eaddr.a6;
 
 	ip6h->hop_limit = 255;
@@ -992,7 +992,7 @@ size_t tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn,
 			       const uint16_t *check, uint32_t seq)
 {
 	const struct flowside *tapside = TAPFLOW(conn);
-	const struct in_addr *a4 = inany_v4(&tapside->faddr);
+	const struct in_addr *a4 = inany_v4(&tapside->oaddr);
 
 	if (a4) {
 		return tcp_fill_headers4(conn, iov[TCP_IOV_TAP].iov_base,
@@ -1417,15 +1417,15 @@ static void tcp_bind_outbound(const struct ctx *c,
 	socklen_t sl;
 
 
-	pif_sockaddr(c, &bind_sa, &sl, PIF_HOST, &tgt->faddr, tgt->fport);
-	if (!inany_is_unspecified(&tgt->faddr) || tgt->fport) {
+	pif_sockaddr(c, &bind_sa, &sl, PIF_HOST, &tgt->oaddr, tgt->oport);
+	if (!inany_is_unspecified(&tgt->oaddr) || tgt->oport) {
 		if (bind(s, &bind_sa.sa, sl)) {
 			char sstr[INANY_ADDRSTRLEN];
 
 			flow_dbg(conn,
 				 "Can't bind TCP outbound socket to %s:%hu: %s",
-				 inany_ntop(&tgt->faddr, sstr, sizeof(sstr)),
-				 tgt->fport, strerror(errno));
+				 inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)),
+				 tgt->oport, strerror(errno));
 		}
 	}
 
@@ -1497,12 +1497,12 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af,
 	conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
 
 	if (!inany_is_unicast(&ini->eaddr) || ini->eport == 0 ||
-	    !inany_is_unicast(&ini->faddr) || ini->fport == 0) {
+	    !inany_is_unicast(&ini->oaddr) || ini->oport == 0) {
 		char sstr[INANY_ADDRSTRLEN], dstr[INANY_ADDRSTRLEN];
 
 		debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu",
 		      inany_ntop(&ini->eaddr, sstr, sizeof(sstr)), ini->eport,
-		      inany_ntop(&ini->faddr, dstr, sizeof(dstr)), ini->fport);
+		      inany_ntop(&ini->oaddr, dstr, sizeof(dstr)), ini->oport);
 		goto cancel;
 	}
 
@@ -2100,7 +2100,8 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref,
 		goto cancel;
 
 	/* FIXME: When listening port has a specific bound address, record that
-	 * as the forwarding address */
+	 * as our address
+	 */
 	ini = flow_initiate_sa(flow, ref.tcp_listen.pif, &sa,
 			       ref.tcp_listen.port);
 
diff --git a/tcp_internal.h b/tcp_internal.h
index 8b60aabc..aa8bb64f 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -44,7 +44,7 @@
 #define TAPFLOW(conn_)	(&((conn_)->f.side[TAPSIDE(conn_)]))
 #define TAP_SIDX(conn_)	(FLOW_SIDX((conn_), TAPSIDE(conn_)))
 
-#define CONN_V4(conn)		(!!inany_v4(&TAPFLOW(conn)->faddr))
+#define CONN_V4(conn)		(!!inany_v4(&TAPFLOW(conn)->oaddr))
 #define CONN_V6(conn)		(!CONN_V4(conn))
 
 /*
diff --git a/udp.c b/udp.c
index 77312572..57dcc667 100644
--- a/udp.c
+++ b/udp.c
@@ -321,7 +321,7 @@ static void udp_splice_send(const struct ctx *c, size_t start, size_t n,
 static size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
 			      const struct flowside *toside, size_t dlen)
 {
-	const struct in_addr *src = inany_v4(&toside->faddr);
+	const struct in_addr *src = inany_v4(&toside->oaddr);
 	const struct in_addr *dst = inany_v4(&toside->eaddr);
 	size_t l4len = dlen + sizeof(bp->uh);
 	size_t l3len = l4len + sizeof(*ip4h);
@@ -333,7 +333,7 @@ static size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp,
 	ip4h->saddr = src->s_addr;
 	ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst);
 
-	bp->uh.source = htons(toside->fport);
+	bp->uh.source = htons(toside->oport);
 	bp->uh.dest = htons(toside->eport);
 	bp->uh.len = htons(l4len);
 	csum_udp4(&bp->uh, *src, *dst, bp->data, dlen);
@@ -357,15 +357,15 @@ static size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp,
 
 	ip6h->payload_len = htons(l4len);
 	ip6h->daddr = toside->eaddr.a6;
-	ip6h->saddr = toside->faddr.a6;
+	ip6h->saddr = toside->oaddr.a6;
 	ip6h->version = 6;
 	ip6h->nexthdr = IPPROTO_UDP;
 	ip6h->hop_limit = 255;
 
-	bp->uh.source = htons(toside->fport);
+	bp->uh.source = htons(toside->oport);
 	bp->uh.dest = htons(toside->eport);
 	bp->uh.len = ip6h->payload_len;
-	csum_udp6(&bp->uh, &toside->faddr.a6, &toside->eaddr.a6, bp->data, dlen);
+	csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, bp->data, dlen);
 
 	return l4len;
 }
@@ -384,7 +384,7 @@ static void udp_tap_prepare(const struct mmsghdr *mmh, unsigned idx,
 	struct udp_meta_t *bm = &udp_meta[idx];
 	size_t l4len;
 
-	if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->faddr)) {
+	if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) {
 		l4len = udp_update_hdr6(&bm->ip6h, bp, toside, mmh[idx].msg_len);
 		tap_hdr_update(&bm->taph, l4len + sizeof(bm->ip6h) +
 			       sizeof(udp6_eth_hdr));
-- 
2.46.0
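
For readers skimming the rename, here is a minimal sketch of how the
"our" and "endpoint" halves of a flowside relate to a packet seen on
the guest side.  It is not code from the patch: struct in6_addr stands
in for passt's union inany_addr, and flowside_from_guest_pkt() and
flowside_demo() are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Simplified stand-in for struct flowside: passt uses union inany_addr
 * for the address fields, plain struct in6_addr is an assumption here.
 */
struct flowside {
	struct in6_addr	oaddr;	/* Our address (local from passt's PoV) */
	struct in6_addr	eaddr;	/* Endpoint address (remote from passt's PoV) */
	in_port_t	oport;	/* Our port */
	in_port_t	eport;	/* Endpoint port */
};

/* Same comparison order as flowside_eq() in flow.h, on the simplified type */
static bool flowside_eq(const struct flowside *left,
			const struct flowside *right)
{
	return !memcmp(&left->eaddr, &right->eaddr, sizeof(left->eaddr)) &&
	       left->eport == right->eport &&
	       !memcmp(&left->oaddr, &right->oaddr, sizeof(left->oaddr)) &&
	       left->oport == right->oport;
}

/* Hypothetical helper: for a packet arriving from the guest, the guest
 * (source) is the endpoint, and the destination is "our" address.
 */
static struct flowside flowside_from_guest_pkt(const struct in6_addr *src,
					       in_port_t sport,
					       const struct in6_addr *dst,
					       in_port_t dport)
{
	struct flowside side;

	memset(&side, 0, sizeof(side));
	side.eaddr = *src;
	side.eport = sport;
	side.oaddr = *dst;
	side.oport = dport;
	return side;
}

/* Hypothetical demo using documentation-prefix addresses */
static void flowside_demo(void)
{
	struct in6_addr guest, remote;
	struct flowside a, b;
	int ok;

	ok = inet_pton(AF_INET6, "2001:db8::2", &guest);
	assert(ok == 1);
	ok = inet_pton(AF_INET6, "2001:db8::1", &remote);
	assert(ok == 1);

	a = flowside_from_guest_pkt(&guest, 40000, &remote, 80);
	b = flowside_from_guest_pkt(&guest, 40000, &remote, 80);
	assert(a.oport == 80 && a.eport == 40000);
	assert(flowside_eq(&a, &b));

	/* Swapping the roles of the two ends gives a different flowside */
	b = flowside_from_guest_pkt(&remote, 80, &guest, 40000);
	assert(!flowside_eq(&a, &b));
}
```

Calling flowside_demo() from a test program exercises the sketch.  The
point is only that "our" means the end passt itself presents on that
interface, whichever direction the flow was initiated in.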



* [PATCH 02/22] util: Helper for formatting MAC addresses
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
  2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-18 15:44   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 03/22] treewide: Rename MAC address fields for clarity David Gibson
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

There are a couple of places where we somewhat messily open-code formatting
an Ethernet-like MAC address for display.  Add an eth_ntop() helper for
this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c |  7 +++----
 dhcp.c |  5 ++---
 util.c | 19 +++++++++++++++++++
 util.h |  3 +++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/conf.c b/conf.c
index ed097bdc..830f91a6 100644
--- a/conf.c
+++ b/conf.c
@@ -921,7 +921,8 @@ pasta_opts:
  */
 static void conf_print(const struct ctx *c)
 {
-	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN], ifn[IFNAMSIZ];
+	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN];
+	char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ];
 	int i;
 
 	info("Template interface: %s%s%s%s%s",
@@ -955,9 +956,7 @@ static void conf_print(const struct ctx *c)
 		info("Namespace interface: %s", c->pasta_ifn);
 
 	info("MAC:");
-	info("    host: %02x:%02x:%02x:%02x:%02x:%02x",
-	     c->mac[0], c->mac[1], c->mac[2],
-	     c->mac[3], c->mac[4], c->mac[5]);
+	info("    host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac)));
 
 	if (c->ifi4) {
 		if (!c->no_dhcp) {
diff --git a/dhcp.c b/dhcp.c
index aa9f59da..acc5b03e 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -276,6 +276,7 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
 int dhcp(const struct ctx *c, const struct pool *p)
 {
 	size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
+	char macstr[ETH_ADDRSTRLEN];
 	const struct ethhdr *eh;
 	const struct iphdr *iph;
 	const struct udphdr *uh;
@@ -340,9 +341,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 		return -1;
 	}
 
-	info("    from %02x:%02x:%02x:%02x:%02x:%02x",
-	     m->chaddr[0], m->chaddr[1], m->chaddr[2],
-	     m->chaddr[3], m->chaddr[4], m->chaddr[5]);
+	info("    from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr)));
 
 	m->yiaddr = c->ip4.addr;
 	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
diff --git a/util.c b/util.c
index 0b414045..892358b1 100644
--- a/util.c
+++ b/util.c
@@ -676,6 +676,25 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
 	return dst;
 }
 
+/** eth_ntop() - Convert an Ethernet MAC address to text format
+ * @mac:	MAC address
+ * @dst:	output buffer, minimum ETH_ADDRSTRLEN bytes
+ * @size:	size of buffer at @dst
+ *
+ * Return: On success, a non-null pointer to @dst, NULL on failure
+ */
+const char *eth_ntop(const unsigned char *mac, char *dst, size_t size)
+{
+	int len;
+
+	len = snprintf(dst, size, "%02x:%02x:%02x:%02x:%02x:%02x",
+		       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+	if (len < 0 || (size_t)len >= size)
+		return NULL;
+
+	return dst;
+}
+
 /** str_ee_origin() - Convert socket extended error origin to a string
  * @ee:		Socket extended error structure
  *
diff --git a/util.h b/util.h
index cb4d181c..c1748074 100644
--- a/util.h
+++ b/util.h
@@ -215,9 +215,12 @@ static inline const char *af_name(sa_family_t af)
 
 #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
 
+#define ETH_ADDRSTRLEN	(ETH_ALEN * 3)
+
 struct sock_extended_err;
 
 const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size);
+const char *eth_ntop(const unsigned char *mac, char *dst, size_t size);
 const char *str_ee_origin(const struct sock_extended_err *ee);
 
 /**
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 03/22] treewide: Rename MAC address fields for clarity
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
  2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
  2024-08-16  5:39 ` [PATCH 02/22] util: Helper for formatting MAC addresses David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-18 15:45   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses David Gibson
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

c->mac isn't a great name, because it doesn't say whose MAC address it is
and it's not necessarily obvious in all the contexts we use it.  Since this
is specifically the address that we (passt/pasta) use on the tap interface,
rename it to "our_tap_mac".  Rename the "mac_guest" field to "guest_mac"
to be grammatically consistent.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arp.c    |  4 ++--
 conf.c   | 10 +++++-----
 dhcpv6.c |  6 ++++--
 ndp.c    |  4 ++--
 passt.c  |  2 +-
 passt.h  |  8 ++++----
 pasta.c  |  8 ++++----
 tap.c    | 12 ++++++------
 8 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arp.c b/arp.c
index 93b22c5d..53334dac 100644
--- a/arp.c
+++ b/arp.c
@@ -72,7 +72,7 @@ int arp(const struct ctx *c, const struct pool *p)
 
 	ah->ar_op = htons(ARPOP_REPLY);
 	memcpy(am->tha,		am->sha,	sizeof(am->tha));
-	memcpy(am->sha,		c->mac,		sizeof(am->sha));
+	memcpy(am->sha,		c->our_tap_mac,	sizeof(am->sha));
 
 	memcpy(swap,		am->tip,	sizeof(am->tip));
 	memcpy(am->tip,		am->sip,	sizeof(am->tip));
@@ -80,7 +80,7 @@ int arp(const struct ctx *c, const struct pool *p)
 
 	l2len = sizeof(*eh) + sizeof(*ah) + sizeof(*am);
 	memcpy(eh->h_dest,	eh->h_source,	sizeof(eh->h_dest));
-	memcpy(eh->h_source,	c->mac,		sizeof(eh->h_source));
+	memcpy(eh->h_source,	c->our_tap_mac,	sizeof(eh->h_source));
 
 	tap_send_single(c, eh, l2len);
 
diff --git a/conf.c b/conf.c
index 830f91a6..750fdc86 100644
--- a/conf.c
+++ b/conf.c
@@ -956,7 +956,7 @@ static void conf_print(const struct ctx *c)
 		info("Namespace interface: %s", c->pasta_ifn);
 
 	info("MAC:");
-	info("    host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac)));
+	info("    host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac)));
 
 	if (c->ifi4) {
 		if (!c->no_dhcp) {
@@ -1289,7 +1289,7 @@ void conf(struct ctx *c, int argc, char **argv)
 			if (c->mode != MODE_PASTA)
 				die("--ns-mac-addr is for pasta mode only");
 
-			parse_mac(c->mac_guest, optarg);
+			parse_mac(c->guest_mac, optarg);
 			break;
 		case 5:
 			if (c->mode != MODE_PASTA)
@@ -1500,7 +1500,7 @@ void conf(struct ctx *c, int argc, char **argv)
 
 			break;
 		case 'M':
-			parse_mac(c->mac, optarg);
+			parse_mac(c->our_tap_mac, optarg);
 			break;
 		case 'g':
 			if (inet_pton(AF_INET6, optarg, &c->ip6.gw)     &&
@@ -1629,9 +1629,9 @@ void conf(struct ctx *c, int argc, char **argv)
 
 	nl_sock_init(c, false);
 	if (!v6_only)
-		c->ifi4 = conf_ip4(ifi4, &c->ip4, c->mac);
+		c->ifi4 = conf_ip4(ifi4, &c->ip4, c->our_tap_mac);
 	if (!v4_only)
-		c->ifi6 = conf_ip6(ifi6, &c->ip6, c->mac);
+		c->ifi6 = conf_ip6(ifi6, &c->ip6, c->our_tap_mac);
 	if ((!c->ifi4 && !c->ifi6) ||
 	    (*c->ip4.ifname_out && !c->ifi4) ||
 	    (*c->ip6.ifname_out && !c->ifi6))
diff --git a/dhcpv6.c b/dhcpv6.c
index 7dcca2a7..bbed41dc 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -574,8 +574,10 @@ void dhcpv6_init(const struct ctx *c)
 	resp.server_id.duid_time		= duid_time;
 	resp_not_on_link.server_id.duid_time	= duid_time;
 
-	memcpy(resp.server_id.duid_lladdr,		c->mac, sizeof(c->mac));
-	memcpy(resp_not_on_link.server_id.duid_lladdr,	c->mac, sizeof(c->mac));
+	memcpy(resp.server_id.duid_lladdr,
+	       c->our_tap_mac, sizeof(c->our_tap_mac));
+	memcpy(resp_not_on_link.server_id.duid_lladdr,
+	       c->our_tap_mac, sizeof(c->our_tap_mac));
 
 	resp.ia_addr.addr	= c->ip6.addr;
 }
diff --git a/ndp.c b/ndp.c
index 6dcb4872..9c0fef4a 100644
--- a/ndp.c
+++ b/ndp.c
@@ -247,7 +247,7 @@ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr,
 
 		memcpy(&na.target_addr, &ns->target_addr,
 		       sizeof(na.target_addr));
-		memcpy(na.target_l2_addr.mac, c->mac, ETH_ALEN);
+		memcpy(na.target_l2_addr.mac, c->our_tap_mac, ETH_ALEN);
 
 	} else if (ih->icmp6_type == RS) {
 		size_t dns_s_len = 0;
@@ -331,7 +331,7 @@ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr,
 		}
 
 dns_done:
-		memcpy(&ra.source_ll.mac, c->mac, ETH_ALEN);
+		memcpy(&ra.source_ll.mac, c->our_tap_mac, ETH_ALEN);
 	} else {
 		return 1;
 	}
diff --git a/passt.c b/passt.c
index 4b3c306e..96374831 100644
--- a/passt.c
+++ b/passt.c
@@ -272,7 +272,7 @@ int main(int argc, char **argv)
 	if ((!c.no_udp && udp_init(&c)) || (!c.no_tcp && tcp_init(&c)))
 		exit(EXIT_FAILURE);
 
-	proto_update_l2_buf(c.mac_guest, c.mac);
+	proto_update_l2_buf(c.guest_mac, c.our_tap_mac);
 
 	if (c.ifi4 && !c.no_dhcp)
 		dhcp_init();
diff --git a/passt.h b/passt.h
index ef684037..fe3e47d2 100644
--- a/passt.h
+++ b/passt.h
@@ -172,8 +172,8 @@ struct ip6_ctx {
  * @epollfd:		File descriptor for epoll instance
  * @fd_tap_listen:	File descriptor for listening AF_UNIX socket, if any
  * @fd_tap:		AF_UNIX socket, tuntap device, or pre-opened socket
- * @mac:		Host MAC address
- * @mac_guest:		MAC address of guest or namespace, seen or configured
+ * @our_tap_mac:	Pasta/passt's MAC on the tap link
+ * @guest_mac:		MAC address of guest or namespace, seen or configured
  * @hash_secret:	128-bit secret for siphash functions
  * @ifi4:		Index of template interface for IPv4, 0 if IPv4 disabled
  * @ip:			IPv4 configuration
@@ -226,8 +226,8 @@ struct ctx {
 	int epollfd;
 	int fd_tap_listen;
 	int fd_tap;
-	unsigned char mac[ETH_ALEN];
-	unsigned char mac_guest[ETH_ALEN];
+	unsigned char our_tap_mac[ETH_ALEN];
+	unsigned char guest_mac[ETH_ALEN];
 	uint64_t hash_secret[2];
 
 	unsigned int ifi4;
diff --git a/pasta.c b/pasta.c
index 615ff7b3..3b4e8ead 100644
--- a/pasta.c
+++ b/pasta.c
@@ -294,10 +294,10 @@ void pasta_ns_conf(struct ctx *c)
 		    strerror(-rc));
 
 	/* Get or set MAC in target namespace */
-	if (MAC_IS_ZERO(c->mac_guest))
-		nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
+	if (MAC_IS_ZERO(c->guest_mac))
+		nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->guest_mac);
 	else
-		rc = nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest);
+		rc = nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->guest_mac);
 	if (rc < 0)
 		die("Couldn't set MAC address in namespace: %s",
 		    strerror(-rc));
@@ -367,7 +367,7 @@ void pasta_ns_conf(struct ctx *c)
 		}
 	}
 
-	proto_update_l2_buf(c->mac_guest, NULL);
+	proto_update_l2_buf(c->guest_mac, NULL);
 }
 
 /**
diff --git a/tap.c b/tap.c
index 87be3a6b..852d8376 100644
--- a/tap.c
+++ b/tap.c
@@ -118,8 +118,8 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
 	struct ethhdr *eh = (struct ethhdr *)buf;
 
 	/* TODO: ARP table lookup */
-	memcpy(eh->h_dest, c->mac_guest, ETH_ALEN);
-	memcpy(eh->h_source, c->mac, ETH_ALEN);
+	memcpy(eh->h_dest, c->guest_mac, ETH_ALEN);
+	memcpy(eh->h_source, c->our_tap_mac, ETH_ALEN);
 	eh->h_proto = ntohs(proto);
 	return eh + 1;
 }
@@ -946,9 +946,9 @@ void tap_add_packet(struct ctx *c, ssize_t l2len, char *p)
 
 	eh = (struct ethhdr *)p;
 
-	if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) {
-		memcpy(c->mac_guest, eh->h_source, ETH_ALEN);
-		proto_update_l2_buf(c->mac_guest, NULL);
+	if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) {
+		memcpy(c->guest_mac, eh->h_source, ETH_ALEN);
+		proto_update_l2_buf(c->guest_mac, NULL);
 	}
 
 	switch (ntohs(eh->h_proto)) {
@@ -1337,6 +1337,6 @@ void tap_sock_init(struct ctx *c)
 		 * sends us packets.  Use the broadcast address so that our
 		 * first packets will reach it.
 		 */
-		memset(&c->mac_guest, 0xff, sizeof(c->mac_guest));
+		memset(&c->guest_mac, 0xff, sizeof(c->guest_mac));
 	}
 }
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (2 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 03/22] treewide: Rename MAC address fields for clarity David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-18 15:45   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 05/22] conf: Use array indices rather than pointers for DNS array slots David Gibson
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

We rely on C11 already, so we can use clearer and more type-checkable
struct assignment instead of memcpy() for copying IP addresses around.

This exposes some "pointer could be const" warnings from cppcheck, so
address those too.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
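As an illustration of the "more type-checkable" point in the commit message,
a minimal sketch of the two styles side by side.  struct ip6_demo and both
function names are hypothetical stand-ins, not passt code.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the relevant fields of passt's struct ip6_ctx */
struct ip6_demo {
	struct in6_addr addr;
	struct in6_addr addr_seen;
};

/* Old style: memcpy() takes void pointers, so this would still compile
 * if the two fields had unrelated types or if the size were wrong.
 */
void copy_with_memcpy(struct ip6_demo *ip6)
{
	assert(ip6);
	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr_seen));
}

/* New style: the compiler rejects the assignment unless both sides have
 * compatible types, and there is no size argument to get wrong.
 */
void copy_with_assignment(struct ip6_demo *ip6)
{
	assert(ip6);
	ip6->addr_seen = ip6->addr;
}
```

Both functions leave addr_seen equal to addr; only the amount of checking the
compiler can do differs.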
 conf.c   | 12 ++++++------
 dhcpv6.c | 10 ++++++----
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/conf.c b/conf.c
index 750fdc86..9b05afeb 100644
--- a/conf.c
+++ b/conf.c
@@ -389,14 +389,14 @@ static void add_dns6(struct ctx *c,
 	/* Guest or container can only access local addresses via redirect */
 	if (IN6_IS_ADDR_LOOPBACK(addr)) {
 		if (!c->no_map_gw) {
-			memcpy(*conf, &c->ip6.gw, sizeof(**conf));
+			**conf = c->ip6.gw;
 			(*conf)++;
 
 			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
-				memcpy(&c->ip6.dns_match, addr, sizeof(*addr));
+				c->ip6.dns_match = *addr;
 		}
 	} else {
-		memcpy(*conf, addr, sizeof(**conf));
+		**conf = *addr;
 		(*conf)++;
 	}
 
@@ -632,7 +632,7 @@ static unsigned int conf_ip4(unsigned int ifi,
 			ip4->prefix_len = 32;
 	}
 
-	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
+	ip4->addr_seen = ip4->addr;
 
 	if (MAC_IS_ZERO(mac)) {
 		int rc = nl_link_get_mac(nl_sock, ifi, mac);
@@ -693,8 +693,8 @@ static unsigned int conf_ip6(unsigned int ifi,
 		return 0;
 	}
 
-	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
-	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
+	ip6->addr_seen = ip6->addr;
+	ip6->addr_ll_seen = ip6->addr_ll;
 
 	if (MAC_IS_ZERO(mac)) {
 		rc = nl_link_get_mac(nl_sock, ifi, mac);
diff --git a/dhcpv6.c b/dhcpv6.c
index bbed41dc..87b3c3eb 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -298,7 +298,8 @@ static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
 {
 	char buf[INET6_ADDRSTRLEN];
 	struct in6_addr req_addr;
-	struct opt_hdr *ia, *h;
+	const struct opt_hdr *h;
+	struct opt_hdr *ia;
 	size_t offset;
 	int ia_type;
 
@@ -312,12 +313,13 @@ ia_ta:
 		offset += sizeof(struct opt_ia_na);
 
 		while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) {
-			struct opt_ia_addr *opt_addr = (struct opt_ia_addr *)h;
+			const struct opt_ia_addr *opt_addr
+				= (const struct opt_ia_addr *)h;
 
 			if (ntohs(h->l) != OPT_VSIZE(ia_addr))
 				return NULL;
 
-			memcpy(&req_addr, &opt_addr->addr, sizeof(req_addr));
+			req_addr = opt_addr->addr;
 			if (!IN6_ARE_ADDR_EQUAL(la, &req_addr)) {
 				info("DHCPv6: requested address %s not on link",
 				     inet_ntop(AF_INET6, &req_addr,
@@ -363,7 +365,7 @@ static size_t dhcpv6_dns_fill(const struct ctx *c, char *buf, int offset)
 			srv->hdr.l = 0;
 		}
 
-		memcpy(&srv->addr[i], &c->ip6.dns[i], sizeof(srv->addr[i]));
+		srv->addr[i] = c->ip6.dns[i];
 		srv->hdr.l += sizeof(srv->addr[i]);
 		offset += sizeof(srv->addr[i]);
 	}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 05/22] conf: Use array indices rather than pointers for DNS array slots
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (3 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 06/22] conf: More accurately count entries added in get_dns() David Gibson
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Currently add_dns[46]() take a somewhat awkward double pointer to the
entry in the c->ip[46].dns array to update.  It turns out to be easier to
work with indices into that array instead.

This diff does add some lines, but they're comments, and it will allow some
future code reductions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
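To show the shape of the interface change in isolation, a minimal sketch
contrasting the double-pointer cursor with the index-based version.  struct
dns_demo, MAXNS and both function names are hypothetical; the real add_dns[46]()
also handle loopback remapping, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>

#define MAXNS	3	/* hypothetical array size, not passt's value */

struct dns_demo {
	struct in_addr dns[MAXNS];
};

/* Before: the caller passes a pointer to its cursor, and the callee
 * both writes through it and advances it.
 */
void add_by_pointer(const struct in_addr *addr, struct in_addr **conf)
{
	assert(addr && conf && *conf);
	**conf = *addr;
	(*conf)++;
}

/* After: the caller passes an index, and the callee reports how many
 * entries it filled so that the caller can advance the index itself.
 */
unsigned add_by_index(struct dns_demo *d, const struct in_addr *addr,
		      unsigned idx)
{
	assert(d && addr && idx < MAXNS);
	d->dns[idx] = *addr;
	return 1;
}
```

With the index form the caller writes "idx += add_by_index(&d, &addr, idx);",
and the position in the array is available directly rather than through
pointer subtraction.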
 conf.c | 73 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/conf.c b/conf.c
index 9b05afeb..2a52bc32 100644
--- a/conf.c
+++ b/conf.c
@@ -354,54 +354,65 @@ bind_all_fail:
  * add_dns4() - Possibly add the IPv4 address of a DNS resolver to configuration
  * @c:		Execution context
  * @addr:	Address found in /etc/resolv.conf
- * @conf:	Pointer to reference of current entry in array of IPv4 resolvers
+ * @idx:	Index of free entry in array of IPv4 resolvers
+ *
+ * Return: Number of entries added (0 or 1)
  */
-static void add_dns4(struct ctx *c, const struct in_addr *addr,
-		     struct in_addr **conf)
+static unsigned add_dns4(struct ctx *c, const struct in_addr *addr,
+			 unsigned idx)
 {
+	unsigned added = 0;
+
 	/* Guest or container can only access local addresses via redirect */
 	if (IN4_IS_ADDR_LOOPBACK(addr)) {
 		if (!c->no_map_gw) {
-			**conf = c->ip4.gw;
-			(*conf)++;
+			c->ip4.dns[idx] = c->ip4.gw;
+			added++;
 
 			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match))
 				c->ip4.dns_match = c->ip4.gw;
 		}
 	} else {
-		**conf = *addr;
-		(*conf)++;
+		c->ip4.dns[idx] = *addr;
+		added++;
 	}
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host))
 		c->ip4.dns_host = *addr;
+
+	return added;
 }
 
 /**
  * add_dns6() - Possibly add the IPv6 address of a DNS resolver to configuration
  * @c:		Execution context
  * @addr:	Address found in /etc/resolv.conf
- * @conf:	Pointer to reference of current entry in array of IPv6 resolvers
+ * @idx:	Index of free entry in array of IPv6 resolvers
+ *
+ * Return: Number of entries added (0 or 1)
  */
-static void add_dns6(struct ctx *c,
-		     struct in6_addr *addr, struct in6_addr **conf)
+static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx)
 {
+	unsigned added = 0;
+
 	/* Guest or container can only access local addresses via redirect */
 	if (IN6_IS_ADDR_LOOPBACK(addr)) {
 		if (!c->no_map_gw) {
-			**conf = c->ip6.gw;
-			(*conf)++;
+			c->ip6.dns[idx] = c->ip6.gw;
+			added++;
 
 			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
 				c->ip6.dns_match = *addr;
 		}
 	} else {
-		**conf = *addr;
-		(*conf)++;
+		c->ip6.dns[idx] = *addr;
+		added++;
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host))
 		c->ip6.dns_host = *addr;
+
+	return added;
 }
 
 /**
@@ -410,18 +421,19 @@ static void add_dns6(struct ctx *c,
  */
 static void get_dns(struct ctx *c)
 {
-	struct in6_addr *dns6 = &c->ip6.dns[0], dns6_tmp;
-	struct in_addr *dns4 = &c->ip4.dns[0], dns4_tmp;
 	int dns4_set, dns6_set, dnss_set, dns_set, fd;
+	unsigned dns4_idx = 0, dns6_idx = 0;
 	struct fqdn *s = c->dns_search;
 	struct lineread resolvconf;
+	struct in6_addr dns6_tmp;
+	struct in_addr dns4_tmp;
 	unsigned int added = 0;
 	ssize_t line_len;
 	char *line, *end;
 	const char *p;
 
-	dns4_set = !c->ifi4 || !IN4_IS_ADDR_UNSPECIFIED(dns4);
-	dns6_set = !c->ifi6 || !IN6_IS_ADDR_UNSPECIFIED(dns6);
+	dns4_set = !c->ifi4 || !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns[0]);
+	dns6_set = !c->ifi6 || !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns[0]);
 	dnss_set = !!*s->n || c->no_dns_search;
 	dns_set = (dns4_set && dns6_set) || c->no_dns;
 
@@ -442,17 +454,15 @@ static void get_dns(struct ctx *c)
 			if (end)
 				*end = 0;
 
-			if (!dns4_set &&
-			    dns4 - &c->ip4.dns[0] < ARRAY_SIZE(c->ip4.dns) - 1
+			if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1
 			    && inet_pton(AF_INET, p + 1, &dns4_tmp)) {
-				add_dns4(c, &dns4_tmp, &dns4);
+				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
 				added++;
 			}
 
-			if (!dns6_set &&
-			    dns6 - &c->ip6.dns[0] < ARRAY_SIZE(c->ip6.dns) - 1
+			if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1
 			    && inet_pton(AF_INET6, p + 1, &dns6_tmp)) {
-				add_dns6(c, &dns6_tmp, &dns6);
+				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
 				added++;
 			}
 		} else if (!dnss_set && strstr(line, "search ") == line &&
@@ -1236,8 +1246,7 @@ void conf(struct ctx *c, int argc, char **argv)
 	bool copy_addrs_opt = false, copy_routes_opt = false;
 	enum fwd_ports_mode fwd_default = FWD_NONE;
 	bool v4_only = false, v6_only = false;
-	struct in6_addr *dns6 = c->ip6.dns;
-	struct in_addr *dns4 = c->ip4.dns;
+	unsigned dns4_idx = 0, dns6_idx = 0;
 	struct fqdn *dnss = c->dns_search;
 	unsigned int ifi4 = 0, ifi6 = 0;
 	const char *logfile = NULL;
@@ -1662,13 +1671,13 @@ void conf(struct ctx *c, int argc, char **argv)
 			if (!strcmp(optarg, "none")) {
 				c->no_dns = 1;
 
-				dns4 = &c->ip4.dns[0];
+				dns4_idx = 0;
 				memset(c->ip4.dns, 0, sizeof(c->ip4.dns));
 				c->ip4.dns[0]    = (struct in_addr){ 0 };
 				c->ip4.dns_match = (struct in_addr){ 0 };
 				c->ip4.dns_host  = (struct in_addr){ 0 };
 
-				dns6 = &c->ip6.dns[0];
+				dns6_idx = 0;
 				memset(c->ip6.dns, 0, sizeof(c->ip6.dns));
 				c->ip6.dns_match = (struct in6_addr){ 0 };
 				c->ip6.dns_host  = (struct in6_addr){ 0 };
@@ -1678,15 +1687,15 @@ void conf(struct ctx *c, int argc, char **argv)
 
 			c->no_dns = 0;
 
-			if (dns4 - &c->ip4.dns[0] < ARRAY_SIZE(c->ip4.dns) &&
+			if (dns4_idx < ARRAY_SIZE(c->ip4.dns) &&
 			    inet_pton(AF_INET, optarg, &dns4_tmp)) {
-				add_dns4(c, &dns4_tmp, &dns4);
+				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
 				continue;
 			}
 
-			if (dns6 - &c->ip6.dns[0] < ARRAY_SIZE(c->ip6.dns) &&
+			if (dns6_idx < ARRAY_SIZE(c->ip6.dns) &&
 			    inet_pton(AF_INET6, optarg, &dns6_tmp)) {
-				add_dns6(c, &dns6_tmp, &dns6);
+				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
 				continue;
 			}
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 06/22] conf: More accurately count entries added in get_dns()
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (4 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 05/22] conf: Use array indices rather than pointers for DNS array slots David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 07/22] conf: Move DNS array bounds checks into add_dns[46] David Gibson
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

get_dns() counts the number of guest DNS servers it adds, and gives an
error if it couldn't add any.  However, this count ignores the fact that
add_dns[46]() may in some cases *not* add an entry.  Use the array indices
we're already tracking to get an accurate count.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
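A minimal sketch of why the separate counter could overcount, under simplified
assumptions: struct demo_ctx and demo_add_dns4() are hypothetical and keep only
the loopback/no_map_gw branch of the real add_dns4().

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define MAXNS	3	/* hypothetical array size, not passt's value */

/* Hypothetical, cut-down stand-in for the fields add_dns4() touches */
struct demo_ctx {
	bool no_map_gw;
	struct in_addr gw;
	struct in_addr dns[MAXNS];
};

/* Returns the number of entries added (0 or 1), like add_dns4() */
unsigned demo_add_dns4(struct demo_ctx *c, const struct in_addr *addr,
		       unsigned idx)
{
	/* Simplified loopback test: anything in 127.0.0.0/8 */
	bool loopback = (ntohl(addr->s_addr) >> 24) == 127;

	assert(c && idx < MAXNS);

	if (loopback) {
		if (c->no_map_gw)
			return 0;	/* nothing usable to offer the guest */

		c->dns[idx] = c->gw;	/* guest reaches the host via gateway */
		return 1;
	}

	c->dns[idx] = *addr;
	return 1;
}
```

If the caller bumps its own counter on every call, a loopback resolver with
no_map_gw set is counted even though no entry was written.  Summing the
returned values, as the index does, avoids that.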
 conf.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/conf.c b/conf.c
index 2a52bc32..d19013c1 100644
--- a/conf.c
+++ b/conf.c
@@ -427,7 +427,6 @@ static void get_dns(struct ctx *c)
 	struct lineread resolvconf;
 	struct in6_addr dns6_tmp;
 	struct in_addr dns4_tmp;
-	unsigned int added = 0;
 	ssize_t line_len;
 	char *line, *end;
 	const char *p;
@@ -455,16 +454,12 @@ static void get_dns(struct ctx *c)
 				*end = 0;
 
 			if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1
-			    && inet_pton(AF_INET, p + 1, &dns4_tmp)) {
+			    && inet_pton(AF_INET, p + 1, &dns4_tmp))
 				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
-				added++;
-			}
 
 			if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1
-			    && inet_pton(AF_INET6, p + 1, &dns6_tmp)) {
+			    && inet_pton(AF_INET6, p + 1, &dns6_tmp))
 				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
-				added++;
-			}
 		} else if (!dnss_set && strstr(line, "search ") == line &&
 			   s == c->dns_search) {
 			end = strpbrk(line, "\n");
@@ -491,7 +486,7 @@ static void get_dns(struct ctx *c)
 
 out:
 	if (!dns_set) {
-		if (!added)
+		if (!(dns4_idx + dns6_idx))
 			warn("Couldn't get any nameserver address");
 
 		if (c->no_dhcp_dns)
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 07/22] conf: Move DNS array bounds checks into add_dns[46]
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (5 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 06/22] conf: More accurately count entries added in get_dns() David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 08/22] conf: Move adding of a nameserver from resolv.conf into subfunction David Gibson
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Every time we call add_dns[46] we need to first check if there's space in
the c->ip[46].dns array for the new entry.  We might as well make that
check in add_dns[46]() itself.

In fact it looks like the calls in get_dns() had an off-by-one error, not
allowing the last entry of the array to be filled.  So, that bug is also
fixed by the change.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
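
As a rough illustration of the pattern (not code from this patch), the
bounds check can live in the helper so that callers only accumulate its
return value.  The names struct resolvers, MAX_NS and add_ns4() are made
up for this sketch, and it does not model any terminating zero entry in
the real c->ip4.dns array.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>

#define MAX_NS 3	/* hypothetical capacity, for illustration only */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct resolvers {
	struct in_addr ns[MAX_NS];
};

/* Store @addr in slot @idx if it fits; return number of entries added */
static unsigned add_ns4(struct resolvers *r, const struct in_addr *addr,
			unsigned idx)
{
	assert(r && addr);

	/* Every valid index, including the last one, is accepted */
	if (idx >= ARRAY_SIZE(r->ns))
		return 0;

	r->ns[idx] = *addr;
	return 1;
}
```

A caller can then write idx += add_ns4(&r, &addr, idx) in a loop with no
bounds test of its own: once the array is full, further calls add nothing.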

diff --git a/conf.c b/conf.c
index d19013c1..dfe417b4 100644
--- a/conf.c
+++ b/conf.c
@@ -363,6 +363,9 @@ static unsigned add_dns4(struct ctx *c, const struct in_addr *addr,
 {
 	unsigned added = 0;
 
+	if (idx >= ARRAY_SIZE(c->ip4.dns))
+		return 0;
+
 	/* Guest or container can only access local addresses via redirect */
 	if (IN4_IS_ADDR_LOOPBACK(addr)) {
 		if (!c->no_map_gw) {
@@ -395,6 +398,9 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx)
 {
 	unsigned added = 0;
 
+	if (idx >= ARRAY_SIZE(c->ip6.dns))
+		return 0;
+
 	/* Guest or container can only access local addresses via redirect */
 	if (IN6_IS_ADDR_LOOPBACK(addr)) {
 		if (!c->no_map_gw) {
@@ -453,12 +459,10 @@ static void get_dns(struct ctx *c)
 			if (end)
 				*end = 0;
 
-			if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1
-			    && inet_pton(AF_INET, p + 1, &dns4_tmp))
+			if (!dns4_set && inet_pton(AF_INET, p + 1, &dns4_tmp))
 				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
 
-			if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1
-			    && inet_pton(AF_INET6, p + 1, &dns6_tmp))
+			if (!dns6_set && inet_pton(AF_INET6, p + 1, &dns6_tmp))
 				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
 		} else if (!dnss_set && strstr(line, "search ") == line &&
 			   s == c->dns_search) {
@@ -1682,14 +1686,12 @@ void conf(struct ctx *c, int argc, char **argv)
 
 			c->no_dns = 0;
 
-			if (dns4_idx < ARRAY_SIZE(c->ip4.dns) &&
-			    inet_pton(AF_INET, optarg, &dns4_tmp)) {
+			if (inet_pton(AF_INET, optarg, &dns4_tmp)) {
 				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
 				continue;
 			}
 
-			if (dns6_idx < ARRAY_SIZE(c->ip6.dns) &&
-			    inet_pton(AF_INET6, optarg, &dns6_tmp)) {
+			if (inet_pton(AF_INET6, optarg, &dns6_tmp)) {
 				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
 				continue;
 			}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/22] conf: Move adding of a nameserver from resolv.conf into subfunction
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (6 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 07/22] conf: Move DNS array bounds checks into add_dns[46] David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 09/22] conf: Correct setting of dns_match address in add_dns6() David Gibson
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

get_dns() is already quite deeply nested, and future changes I have in
mind will add more complexity.  Prepare for this by splitting out the
adding of a single nameserver to the configuration into its own function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)
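
The calling convention used here, where a NULL index pointer means
"ignore this address family", can be sketched on its own as below.  This
is only an illustration under assumed names: count_ns() is hypothetical
and counts matches instead of storing addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Parse @s and bump whichever counter matches its address family.
 * A NULL counter means addresses of that family are ignored.
 */
static void count_ns(const char *s, unsigned *n4, unsigned *n6)
{
	struct in6_addr a6;
	struct in_addr a4;

	assert(s);

	/* inet_pton() returns 1 only if @s parses in the given family */
	if (n4 && inet_pton(AF_INET, s, &a4) == 1)
		(*n4)++;

	if (n6 && inet_pton(AF_INET6, s, &a6) == 1)
		(*n6)++;
}
```

Passing the pointer conditionally, as in "flag ? NULL : &counter", keeps
the per-family "already set" logic at the call site and out of the helper.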

diff --git a/conf.c b/conf.c
index dfe417b4..4bb06b78 100644
--- a/conf.c
+++ b/conf.c
@@ -421,6 +421,29 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx)
 	return added;
 }
 
+/**
+ * add_dns_resolv() - Possibly add ns from host resolv.conf to configuration
+ * @c:		Execution context
+ * @nameserver:	Nameserver address string from /etc/resolv.conf
+ * @idx4:	Pointer to index of current entry in array of IPv4 resolvers
+ * @idx6:	Pointer to index of current entry in array of IPv6 resolvers
+ *
+ * @idx4 or @idx6 may be NULL, in which case resolvers of the corresponding type
+ * are ignored.
+ */
+static void add_dns_resolv(struct ctx *c, const char *nameserver,
+			   unsigned *idx4, unsigned *idx6)
+{
+	struct in6_addr ns6;
+	struct in_addr ns4;
+
+	if (idx4 && inet_pton(AF_INET, nameserver, &ns4))
+		*idx4 += add_dns4(c, &ns4, *idx4);
+
+	if (idx6 && inet_pton(AF_INET6, nameserver, &ns6))
+		*idx6 += add_dns6(c, &ns6, *idx6);
+}
+
 /**
  * get_dns() - Get nameserver addresses from local /etc/resolv.conf
  * @c:		Execution context
@@ -431,8 +454,6 @@ static void get_dns(struct ctx *c)
 	unsigned dns4_idx = 0, dns6_idx = 0;
 	struct fqdn *s = c->dns_search;
 	struct lineread resolvconf;
-	struct in6_addr dns6_tmp;
-	struct in_addr dns4_tmp;
 	ssize_t line_len;
 	char *line, *end;
 	const char *p;
@@ -459,11 +480,9 @@ static void get_dns(struct ctx *c)
 			if (end)
 				*end = 0;
 
-			if (!dns4_set && inet_pton(AF_INET, p + 1, &dns4_tmp))
-				dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx);
-
-			if (!dns6_set && inet_pton(AF_INET6, p + 1, &dns6_tmp))
-				dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx);
+			add_dns_resolv(c, p + 1,
+				       dns4_set ? NULL : &dns4_idx,
+				       dns6_set ? NULL : &dns6_idx);
 		} else if (!dnss_set && strstr(line, "search ") == line &&
 			   s == c->dns_search) {
 			end = strpbrk(line, "\n");
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 09/22] conf: Correct setting of dns_match address in add_dns6()
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (7 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 08/22] conf: Move adding of a nameserver from resolv.conf into subfunction David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 10/22] conf: Treat --dns addresses as guest visible addresses David Gibson
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

add_dns6() (but not add_dns4()) has a bug setting dns_match: it sets it to
the given address, rather than the gateway address.  This is doubly wrong:
 - We've just established the given address is a host loopback address
   the guest can't access
 - We've just set ip6.dns[] to tell the guest to use the gateway address,
   so it won't use the dns_match address we're setting

Correct this to use the gateway address, like IPv4.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/conf.c b/conf.c
index 4bb06b78..bf61c143 100644
--- a/conf.c
+++ b/conf.c
@@ -408,7 +408,7 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx)
 			added++;
 
 			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
-				c->ip6.dns_match = *addr;
+				c->ip6.dns_match = c->ip6.gw;
 		}
 	} else {
 		c->ip6.dns[idx] = *addr;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 10/22] conf: Treat --dns addresses as guest visible addresses
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (8 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 09/22] conf: Correct setting of dns_match address in add_dns6() David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 11/22] conf: Remove incorrect initialisation of addr_ll_seen David Gibson
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Although it's not 100% explicit in the man page, addresses given to the
--dns option are intended to be addresses as seen by the guest.  This
differs from addresses taken from the host's /etc/resolv.conf, which must
be translated to guest-accessible versions in some cases.

Our implementation is currently inconsistent on this: when using
--dns-forward, you must usually also give --dns with the matching address,
which is meaningful only in the guest's address view.  However if you give
--dns with a loopback address, it will be translated like a host view
address.

Move the remapping logic for DNS addresses out of add_dns4() and add_dns6()
into add_dns_resolv() so that it is only applied for host nameserver
addresses, not for nameservers given explicitly with --dns.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  | 88 ++++++++++++++++++++++++++++-----------------------------
 passt.1 | 14 +++++----
 2 files changed, 52 insertions(+), 50 deletions(-)
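
To make the host-view versus guest-view distinction concrete, here is a
minimal standalone sketch of the translation that now applies only to
addresses read from resolv.conf.  ns4_host_to_guest() and its map_gw
parameter (standing in for !c->no_map_gw) are hypothetical names, and the
sketch covers IPv4 only.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Translate a host-view IPv4 nameserver into the guest's view.
 * Return false if the guest has no way to reach it.
 */
static bool ns4_host_to_guest(struct in_addr host, struct in_addr gw,
			      bool map_gw, struct in_addr *guest)
{
	assert(guest);

	/* 127.0.0.0/8: reachable from the guest only via the gateway */
	if ((ntohl(host.s_addr) >> 24) == 127) {
		if (!map_gw)
			return false;

		*guest = gw;
		return true;
	}

	/* Anything else means the same thing on both sides */
	*guest = host;
	return true;
}
```

An address given with --dns would skip this step entirely and be handed
to the guest unchanged.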

diff --git a/conf.c b/conf.c
index bf61c143..3c102bcf 100644
--- a/conf.c
+++ b/conf.c
@@ -353,7 +353,7 @@ bind_all_fail:
 /**
  * add_dns4() - Possibly add the IPv4 address of a DNS resolver to configuration
  * @c:		Execution context
- * @addr:	Address found in /etc/resolv.conf
+ * @addr:	Guest nameserver IPv4 address
  * @idx:	Index of free entry in array of IPv4 resolvers
  *
  * Return: Number of entries added (0 or 1)
@@ -361,64 +361,29 @@ bind_all_fail:
 static unsigned add_dns4(struct ctx *c, const struct in_addr *addr,
 			 unsigned idx)
 {
-	unsigned added = 0;
-
 	if (idx >= ARRAY_SIZE(c->ip4.dns))
 		return 0;
 
-	/* Guest or container can only access local addresses via redirect */
-	if (IN4_IS_ADDR_LOOPBACK(addr)) {
-		if (!c->no_map_gw) {
-			c->ip4.dns[idx] = c->ip4.gw;
-			added++;
-
-			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match))
-				c->ip4.dns_match = c->ip4.gw;
-		}
-	} else {
-		c->ip4.dns[idx] = *addr;
-		added++;
-	}
-
-	if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host))
-		c->ip4.dns_host = *addr;
-
-	return added;
+	c->ip4.dns[idx] = *addr;
+	return 1;
 }
 
 /**
  * add_dns6() - Possibly add the IPv6 address of a DNS resolver to configuration
  * @c:		Execution context
- * @addr:	Address found in /etc/resolv.conf
+ * @addr:	Guest nameserver IPv6 address
  * @idx:	Index of free entry in array of IPv6 resolvers
  *
  * Return: Number of entries added (0 or 1)
  */
-static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx)
+static unsigned add_dns6(struct ctx *c, const struct in6_addr *addr,
+			 unsigned idx)
 {
-	unsigned added = 0;
-
 	if (idx >= ARRAY_SIZE(c->ip6.dns))
 		return 0;
 
-	/* Guest or container can only access local addresses via redirect */
-	if (IN6_IS_ADDR_LOOPBACK(addr)) {
-		if (!c->no_map_gw) {
-			c->ip6.dns[idx] = c->ip6.gw;
-			added++;
-
-			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
-				c->ip6.dns_match = c->ip6.gw;
-		}
-	} else {
-		c->ip6.dns[idx] = *addr;
-		added++;
-	}
-
-	if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host))
-		c->ip6.dns_host = *addr;
-
-	return added;
+	c->ip6.dns[idx] = *addr;
+	return 1;
 }
 
 /**
@@ -437,11 +402,44 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver,
 	struct in6_addr ns6;
 	struct in_addr ns4;
 
-	if (idx4 && inet_pton(AF_INET, nameserver, &ns4))
+	if (idx4 && inet_pton(AF_INET, nameserver, &ns4)) {
+		if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host))
+			c->ip4.dns_host = ns4;
+
+		/* Guest or container can only access local addresses via
+		 * redirect
+		 */
+		if (IN4_IS_ADDR_LOOPBACK(&ns4)) {
+			if (c->no_map_gw)
+				return;
+
+			ns4 = c->ip4.gw;
+			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match))
+				c->ip4.dns_match = c->ip4.gw;
+		}
+
 		*idx4 += add_dns4(c, &ns4, *idx4);
+	}
+
+	if (idx6 && inet_pton(AF_INET6, nameserver, &ns6)) {
+		if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host))
+			c->ip6.dns_host = ns6;
+
+		/* Guest or container can only access local addresses via
+		 * redirect
+		 */
+		if (IN6_IS_ADDR_LOOPBACK(&ns6)) {
+			if (c->no_map_gw)
+				return;
+
+			ns6 = c->ip6.gw;
+
+			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
+				c->ip6.dns_match = c->ip6.gw;
+		}
 
-	if (idx6 && inet_pton(AF_INET6, nameserver, &ns6))
 		*idx6 += add_dns6(c, &ns6, *idx6);
+	}
 }
 
 /**
diff --git a/passt.1 b/passt.1
index 3062b719..dca433b6 100644
--- a/passt.1
+++ b/passt.1
@@ -236,11 +236,15 @@ interface will be chosen instead.
 
 .TP
 .BR \-D ", " \-\-dns " " \fIaddr
-Use \fIaddr\fR (IPv4 or IPv6) for DHCP, DHCPv6, NDP or DNS forwarding, as
-configured (see options \fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR,
-\fB--dns-forward\fR) instead of reading addresses from \fI/etc/resolv.conf\fR.
-This option can be specified multiple times.  Specifying \fB-D none\fR disables
-usage of DNS addresses altogether.
+Instruct the guest (via DHCP, DHCPv6 or NDP) to use \fIaddr\fR (IPv4
+or IPv6) as a nameserver, as configured (see options
+\fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR) instead of reading addresses
+from \fI/etc/resolv.conf\fR.  This option can be specified multiple
+times.  Specifying \fB-D none\fR disables usage of DNS addresses
+altogether.  Unlike addresses from \fI/etc/resolv.conf\fR, \fIaddr\fR
+is given to the guest without remapping.  For example \fB--dns
+127.0.0.1\fR will instruct the guest to use itself as nameserver, not
+the host.
 
 .TP
 .BR \-\-dns-forward " " \fIaddr
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 11/22] conf: Remove incorrect initialisation of addr_ll_seen
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (9 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 10/22] conf: Treat --dns addresses as guest visible addresses David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 12/22] util: Correct sock_l4() binding for link local addresses David Gibson
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Despite the names, addr_ll_seen does not relate to addr_ll the same way
addr_seen relates to addr.  addr_ll_seen is an observed address from the
guest, whereas addr_ll is *our* link-local address for use on the tap link
when we can't use an external endpoint address.  It's used both for
passt provided services (DHCPv6, NDP) and in some cases for connections
from addresses the guest can't access.  Initialising addr_ll_seen from
addr_ll is therefore incorrect, so remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/conf.c b/conf.c
index 3c102bcf..e5b5263f 100644
--- a/conf.c
+++ b/conf.c
@@ -720,7 +720,6 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	ip6->addr_seen = ip6->addr;
-	ip6->addr_ll_seen = ip6->addr_ll;
 
 	if (MAC_IS_ZERO(mac)) {
 		rc = nl_link_get_mac(nl_sock, ifi, mac);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 12/22] util: Correct sock_l4() binding for link local addresses
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (10 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 11/22] conf: Remove incorrect initialisation of addr_ll_seen David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-20  0:14   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 13/22] treewide: Change misleading 'addr_ll' name David Gibson
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

When binding an IPv6 socket in sock_l4() we need to supply a scope id if
the address is link-local.  We check for this by comparing the given
address to c->ip6.addr_ll.  This is correct only by accident: while
c->ip6.addr_ll is typically set to the host interface's link-local
address, its actual purpose is to provide a link-local address
for passt's private use on the tap interface.

Instead set the scope id for any link-local address we're binding to.
We're going to need something and this is what makes sense for sockets
on the host.  It doesn't make sense for PIF_SPLICE sockets, but those
should always have loopback, not link-local addresses.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 util.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
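
For readers less familiar with scoped addresses, a minimal sketch of the
rule follows: any fe80::/10 address gets a scope id, everything else is
left at zero.  fill_bind_addr6() and its ifindex parameter are
hypothetical and only loosely mirror what sock_l4() does with c->ifi6.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Fill @sa for binding to @addr and @port (host byte order).
 * @ifindex is the interface used to scope link-local addresses.
 */
static void fill_bind_addr6(struct sockaddr_in6 *sa,
			    const struct in6_addr *addr, in_port_t port,
			    unsigned ifindex)
{
	assert(sa && addr);

	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	sa->sin6_port = htons(port);
	sa->sin6_addr = *addr;

	/* A link-local address is ambiguous without an interface */
	if (IN6_IS_ADDR_LINKLOCAL(addr))
		sa->sin6_scope_id = ifindex;
}
```

The test depends only on the address itself, so it no longer matters
which link-local address happens to be stored in the context.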

diff --git a/util.c b/util.c
index 892358b1..9682e3ce 100644
--- a/util.c
+++ b/util.c
@@ -199,8 +199,7 @@ int sock_l4(const struct ctx *c, sa_family_t af, enum epoll_type type,
 		if (bind_addr) {
 			addr6.sin6_addr = *(struct in6_addr *)bind_addr;
 
-			if (!memcmp(bind_addr, &c->ip6.addr_ll,
-			    sizeof(c->ip6.addr_ll)))
+			if (IN6_IS_ADDR_LINKLOCAL(bind_addr))
 				addr6.sin6_scope_id = c->ifi6;
 		}
 		return sock_l4_sa(c, type, &addr6, sizeof(addr6), ifname,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 13/22] treewide: Change misleading 'addr_ll' name
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (11 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 12/22] util: Correct sock_l4() binding for link local addresses David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-20  0:15   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 14/22] Clarify which addresses in ip[46]_ctx are meaningful where David Gibson
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

c->ip6.addr_ll is not like c->ip6.addr.  The latter is an address for the
guest, but the former is an address for our use on the tap link.  Rename it
accordingly, to 'our_tap_ll'.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c   | 7 ++++---
 dhcpv6.c | 2 +-
 fwd.c    | 2 +-
 ndp.c    | 2 +-
 passt.h  | 4 ++--
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/conf.c b/conf.c
index e5b5263f..1130ce5d 100644
--- a/conf.c
+++ b/conf.c
@@ -713,7 +713,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 
 	rc = nl_addr_get(nl_sock, ifi, AF_INET6,
 			 IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL,
-			 &prefix_len, &ip6->addr_ll);
+			 &prefix_len, &ip6->our_tap_ll);
 	if (rc < 0) {
 		err("Couldn't discover IPv6 address: %s", strerror(-rc));
 		return 0;
@@ -735,7 +735,7 @@ static unsigned int conf_ip6(unsigned int ifi,
 	}
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
-	    IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll))
+	    IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll))
 		return 0;
 
 	return ifi;
@@ -1027,7 +1027,8 @@ static void conf_print(const struct ctx *c)
 		info("    router: %s",
 		     inet_ntop(AF_INET6, &c->ip6.gw,   buf6, sizeof(buf6)));
 		info("    our link-local: %s",
-		     inet_ntop(AF_INET6, &c->ip6.addr_ll, buf6, sizeof(buf6)));
+		     inet_ntop(AF_INET6, &c->ip6.our_tap_ll,
+			       buf6, sizeof(buf6)));
 
 dns6:
 		for (i = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns[i]); i++) {
diff --git a/dhcpv6.c b/dhcpv6.c
index 87b3c3eb..44e954e7 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -456,7 +456,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
 		src = &c->ip6.gw;
 	else
-		src = &c->ip6.addr_ll;
+		src = &c->ip6.our_tap_ll;
 
 	mh = packet_get(p, 0, sizeof(*uh), sizeof(*mh), NULL);
 	if (!mh)
diff --git a/fwd.c b/fwd.c
index b546bc41..dccc947d 100644
--- a/fwd.c
+++ b/fwd.c
@@ -320,7 +320,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 		if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
 			tgt->oaddr.a6 = c->ip6.gw;
 		else
-			tgt->oaddr.a6 = c->ip6.addr_ll;
+			tgt->oaddr.a6 = c->ip6.our_tap_ll;
 	}
 
 	if (inany_v4(&tgt->oaddr)) {
diff --git a/ndp.c b/ndp.c
index 9c0fef4a..3a76b00a 100644
--- a/ndp.c
+++ b/ndp.c
@@ -344,7 +344,7 @@ dns_done:
 	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
 		rsaddr = &c->ip6.gw;
 	else
-		rsaddr = &c->ip6.addr_ll;
+		rsaddr = &c->ip6.our_tap_ll;
 
 	if (ih->icmp6_type == NS) {
 		dlen = sizeof(struct ndp_na);
diff --git a/passt.h b/passt.h
index fe3e47d2..5e7e6a04 100644
--- a/passt.h
+++ b/passt.h
@@ -122,7 +122,7 @@ struct ip4_ctx {
 /**
  * struct ip6_ctx - IPv6 execution context
  * @addr:		IPv6 address assigned to guest
- * @addr_ll:		Link-local IPv6 address on external, routable interface
+ * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
  * @addr_seen:		Latest IPv6 global/site address seen as source from tap
  * @addr_ll_seen:	Latest IPv6 link-local address seen as source from tap
  * @gw:			Default IPv6 gateway
@@ -136,7 +136,7 @@ struct ip4_ctx {
  */
 struct ip6_ctx {
 	struct in6_addr addr;
-	struct in6_addr addr_ll;
+	struct in6_addr our_tap_ll;
 	struct in6_addr addr_seen;
 	struct in6_addr addr_ll_seen;
 	struct in6_addr gw;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 14/22] Clarify which addresses in ip[46]_ctx are meaningful where
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (12 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 13/22] treewide: Change misleading 'addr_ll' name David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 15/22] Initialise our_tap_ll to ip6.gw when suitable David Gibson
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Some are guest visible addresses and may not be valid on the host, others
are host visible addresses and may not be valid on the guest.  Rearrange
and comment the ip[46]_ctx definitions to make it clearer which is which.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 passt.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/passt.h b/passt.h
index 5e7e6a04..3b8a6283 100644
--- a/passt.h
+++ b/passt.h
@@ -104,15 +104,18 @@ enum passt_modes {
  * @no_copy_addrs:	Don't copy all addresses when configuring namespace
  */
 struct ip4_ctx {
+	/* PIF_TAP addresses */
 	struct in_addr addr;
 	struct in_addr addr_seen;
 	int prefix_len;
 	struct in_addr gw;
 	struct in_addr dns[MAXNS + 1];
 	struct in_addr dns_match;
-	struct in_addr dns_host;
 
+	/* PIF_HOST addresses */
+	struct in_addr dns_host;
 	struct in_addr addr_out;
+
 	char ifname_out[IFNAMSIZ];
 
 	bool no_copy_routes;
@@ -122,12 +125,12 @@ struct ip4_ctx {
 /**
  * struct ip6_ctx - IPv6 execution context
  * @addr:		IPv6 address assigned to guest
- * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
  * @addr_seen:		Latest IPv6 global/site address seen as source from tap
  * @addr_ll_seen:	Latest IPv6 link-local address seen as source from tap
  * @gw:			Default IPv6 gateway
  * @dns:		DNS addresses for DHCPv6 and NDP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
+ * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
  * @dns_host:		Use this DNS on the host for forwarding
  * @addr_out:		Optional source address for outbound traffic
  * @ifname_out:		Optional interface name to bind outbound sockets to
@@ -135,16 +138,19 @@ struct ip4_ctx {
  * @no_copy_addrs:	Don't copy all addresses when configuring namespace
  */
 struct ip6_ctx {
+	/* PIF_TAP addresses */
 	struct in6_addr addr;
-	struct in6_addr our_tap_ll;
 	struct in6_addr addr_seen;
 	struct in6_addr addr_ll_seen;
 	struct in6_addr gw;
 	struct in6_addr dns[MAXNS + 1];
 	struct in6_addr dns_match;
-	struct in6_addr dns_host;
+	struct in6_addr our_tap_ll;
 
+	/* PIF_HOST addresses */
+	struct in6_addr dns_host;
 	struct in6_addr addr_out;
+
 	char ifname_out[IFNAMSIZ];
 
 	bool no_copy_routes;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 15/22] Initialise our_tap_ll to ip6.gw when suitable
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (13 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 14/22] Clarify which addresses in ip[46]_ctx are meaningful where David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:39 ` [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible David Gibson
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

In every place we use our_tap_ll, we only use it as a fallback if the
IPv6 gateway address is not link-local.  We can avoid that conditional at
use time by doing it at initialisation of our_tap_ll instead.
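
As a rough illustration only (not part of the patch), the idea can be
sketched with a cut-down, hypothetical stand-in for struct ip6_ctx.  It
assumes our_tap_ll already holds some fallback link-local address before
this runs; the names ip6_sketch and ip6_init_our_ll_sketch are invented
for the example.

```c
#include <netinet/in.h>

/* Hypothetical, reduced stand-in for the relevant fields of struct ip6_ctx */
struct ip6_sketch {
	struct in6_addr gw;		/* default gateway */
	struct in6_addr our_tap_ll;	/* pre-filled with a fallback LL address */
};

/* Run once at configuration time: prefer the gateway address if it is
 * link-local, otherwise keep whatever fallback is already in our_tap_ll.
 */
static void ip6_init_our_ll_sketch(struct ip6_sketch *ip6)
{
	if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw))
		ip6->our_tap_ll = ip6->gw;
}
```

After that, every user can read our_tap_ll directly without repeating
the link-local test.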

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c   | 3 +++
 dhcpv6.c | 5 +----
 fwd.c    | 5 +----
 ndp.c    | 5 +----
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/conf.c b/conf.c
index 1130ce5d..954f20ea 100644
--- a/conf.c
+++ b/conf.c
@@ -721,6 +721,9 @@ static unsigned int conf_ip6(unsigned int ifi,
 
 	ip6->addr_seen = ip6->addr;
 
+	if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw))
+		ip6->our_tap_ll = ip6->gw;
+
 	if (MAC_IS_ZERO(mac)) {
 		rc = nl_link_get_mac(nl_sock, ifi, mac);
 		if (rc < 0) {
diff --git a/dhcpv6.c b/dhcpv6.c
index 44e954e7..69841abc 100644
--- a/dhcpv6.c
+++ b/dhcpv6.c
@@ -453,10 +453,7 @@ int dhcpv6(struct ctx *c, const struct pool *p,
 
 	c->ip6.addr_ll_seen = *saddr;
 
-	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-		src = &c->ip6.gw;
-	else
-		src = &c->ip6.our_tap_ll;
+	src = &c->ip6.our_tap_ll;
 
 	mh = packet_get(p, 0, sizeof(*uh), sizeof(*mh), NULL);
 	if (!mh)
diff --git a/fwd.c b/fwd.c
index dccc947d..75dc0151 100644
--- a/fwd.c
+++ b/fwd.c
@@ -317,10 +317,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	} else if (inany_is_loopback6(&tgt->oaddr) ||
 		   inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) ||
 		   inany_equals6(&tgt->oaddr, &c->ip6.addr)) {
-		if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-			tgt->oaddr.a6 = c->ip6.gw;
-		else
-			tgt->oaddr.a6 = c->ip6.our_tap_ll;
+		tgt->oaddr.a6 = c->ip6.our_tap_ll;
 	}
 
 	if (inany_v4(&tgt->oaddr)) {
diff --git a/ndp.c b/ndp.c
index 3a76b00a..a1ee8349 100644
--- a/ndp.c
+++ b/ndp.c
@@ -341,10 +341,7 @@ dns_done:
 	else
 		c->ip6.addr_seen = *saddr;
 
-	if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw))
-		rsaddr = &c->ip6.gw;
-	else
-		rsaddr = &c->ip6.our_tap_ll;
+	rsaddr = &c->ip6.our_tap_ll;
 
 	if (ih->icmp6_type == NS) {
 		dlen = sizeof(struct ndp_na);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (14 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 15/22] Initialise our_tap_ll to ip6.gw when suitable David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-20 19:56   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4 David Gibson
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

We usually avoid NAT, but in a few cases we need to apply address
translations.  For inbound connections that happens for addresses which
make sense to the host but are either inaccessible, or mean a different
location from the guest's point of view.

Add some helper functions to determine such addresses, and use them in
fwd_nat_from_host().  In doing so clarify some of the reasons for the
logic.  We'll also have further use for these helpers in future.

While we're there, fix one unnecessary inconsistency between IPv4 and IPv6.
We always translated the guest's observed address, but for IPv4 we didn't
translate the guest's assigned address, whereas for IPv6 we did.  Change
this to translate both in all cases for consistency.
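
For readers without the passt tree at hand, here is a standalone sketch
of the IPv4 check using only standard socket headers.  It is an
approximation for illustration: struct ip4_sketch and
guest_accessible4_sketch() are hypothetical names, and the real helper
uses passt's own IN4_* macros rather than the open-coded tests below.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical, reduced stand-in for the relevant fields of struct ip4_ctx */
struct ip4_sketch {
	struct in_addr addr;		/* address assigned to the guest */
	struct in_addr addr_seen;	/* latest source address seen from guest */
};

/* Return true if a host-side address can be shown to the guest as is */
static bool guest_accessible4_sketch(const struct ip4_sketch *ip4,
				     const struct in_addr *a)
{
	uint32_t h = ntohl(a->s_addr);

	if ((h >> 24) == 127)		/* 127.0.0.0/8 is the guest's own loopback */
		return false;

	if (h == INADDR_ANY)		/* 0.0.0.0 means different things each side */
		return false;

	/* The guest's own addresses would loop back inside the guest */
	if (a->s_addr == ip4->addr.s_addr ||
	    a->s_addr == ip4->addr_seen.s_addr)
		return false;

	return true;
}
```

Any address for which this returns false needs a substitute source
address before the flow is presented to the guest.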

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 fwd.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 87 insertions(+), 11 deletions(-)

diff --git a/fwd.c b/fwd.c
index 75dc0151..1baae338 100644
--- a/fwd.c
+++ b/fwd.c
@@ -170,6 +170,85 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
 		((ini->oport == 53) || (ini->oport == 853));
 }
 
+/**
+ * fwd_guest_accessible4() - Is IPv4 address guest accessible
+ * @c:		Execution context
+ * @addr:	Host visible IPv4 address
+ *
+ * Return: true if @addr on the host is accessible to the guest without
+ *         translation, false otherwise
+ */
+static bool fwd_guest_accessible4(const struct ctx *c,
+				    const struct in_addr *addr)
+{
+	if (IN4_IS_ADDR_LOOPBACK(addr))
+		return false;
+
+	/* In socket interfaces 0.0.0.0 generally means "any" or unspecified,
+	 * however on the wire it can mean "this host on this network".  Since
+	 * that has a different meaning for host and guest, we can't let it
+	 * through untranslated.
+	 */
+	if (IN4_IS_ADDR_UNSPECIFIED(addr))
+		return false;
+
+	/* For IPv4, addr_seen is initialised to addr, so is always a valid
+	 * address
+	 */
+	if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr) ||
+	    IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen))
+		return false;
+
+	return true;
+}
+
+/**
+ * fwd_guest_accessible6() - Is IPv6 address guest accessible
+ * @c:		Execution context
+ * @addr:	Host visible IPv6 address
+ *
+ * Return: true if @addr on the host is accessible to the guest without
+ *         translation, false otherwise
+ */
+static bool fwd_guest_accessible6(const struct ctx *c,
+				  const struct in6_addr *addr)
+{
+	if (IN6_IS_ADDR_LOOPBACK(addr))
+		return false;
+
+	if (IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr))
+		return false;
+
+	/* For IPv6, addr_seen starts unspecified, because we don't know what LL
+	 * address the guest will take until we see it.  Only check against it
+	 * if it has been set to a real address.
+	 */
+	if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen) &&
+	    IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr_seen))
+		return false;
+
+	return true;
+}
+
+/**
+ * fwd_guest_accessible() - Is IPv[46] address guest accessible
+ * @c:		Execution context
+ * @addr:	Host visible IPv[46] address
+ *
+ * Return: true if @addr on the host is accessible to the guest without
+ *         translation, false otherwise
+ */
+static bool fwd_guest_accessible(const struct ctx *c,
+				 const union inany_addr *addr)
+{
+	const struct in_addr *a4 = inany_v4(addr);
+
+	if (a4)
+		return fwd_guest_accessible4(c, a4);
+
+	return fwd_guest_accessible6(c, &addr->a6);
+}
+
 /**
  * fwd_nat_from_tap() - Determine to forward a flow from the tap interface
  * @c:		Execution context
@@ -307,18 +386,15 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 		return PIF_SPLICE;
 	}
 
-	tgt->oaddr = ini->eaddr;
-	tgt->oport = ini->eport;
-
-	if (inany_is_loopback4(&tgt->oaddr) ||
-	    inany_is_unspecified4(&tgt->oaddr) ||
-	    inany_equals4(&tgt->oaddr, &c->ip4.addr_seen)) {
-		tgt->oaddr = inany_from_v4(c->ip4.gw);
-	} else if (inany_is_loopback6(&tgt->oaddr) ||
-		   inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) ||
-		   inany_equals6(&tgt->oaddr, &c->ip6.addr)) {
-		tgt->oaddr.a6 = c->ip6.our_tap_ll;
+	if (!fwd_guest_accessible(c, &ini->eaddr)) {
+		if (inany_v4(&ini->eaddr))
+			tgt->oaddr = inany_from_v4(c->ip4.gw);
+		else
+			tgt->oaddr.a6 = c->ip6.our_tap_ll;
+	} else {
+		tgt->oaddr = ini->eaddr;
 	}
+	tgt->oport = ini->eport;
 
 	if (inany_v4(&tgt->oaddr)) {
 		tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (15 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-20 19:56   ` Stefano Brivio
  2024-08-16  5:39 ` [PATCH 18/22] Don't take "our" MAC address from the host David Gibson
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

ip4.gw conflates 3 conceptually different things, which (for now) have the
same value:
  1. The router/gateway address as seen by the guest
  2. An address to NAT to the host when --no-map-gw isn't specified
  3. An address to use as source when nothing else makes sense

Case 3 occurs in two situations:

a) for our DHCP responses - since they come from passt internally there's
   no naturally meaningful address for them to come from
b) for forwarded connections coming from an address that isn't guest
   accessible (localhost or the guest's own address).

(b) occurs even with --no-map-gw, and the expected behaviour of forwarding
local connections requires it.

For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the
same value as ip6.gw).  For future flexibility we may want to make this
"address of last resort" different from the gateway address, so split them
logically for IPv4 as well.

Specifically, add a new ip4.our_tap_addr field for the address with this
role, and initialise it to ip4.gw for now.  Unlike IPv6 where we can always
get a link-local address, we might not be able to get a (non 0.0.0.0)
address here.  In that case we have to disable DHCP and forwarding of
inbound connections with guest-inaccessible source addresses.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  |  7 ++++++-
 dhcp.c  |  4 ++--
 fwd.c   | 10 +++++++---
 passt.h |  2 ++
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/conf.c b/conf.c
index 954f20ea..9f962fc8 100644
--- a/conf.c
+++ b/conf.c
@@ -660,6 +660,8 @@ static unsigned int conf_ip4(unsigned int ifi,
 
 	ip4->addr_seen = ip4->addr;
 
+	ip4->our_tap_addr = ip4->gw;
+
 	if (MAC_IS_ZERO(mac)) {
 		int rc = nl_link_get_mac(nl_sock, ifi, mac);
 		if (rc < 0) {
@@ -1666,7 +1668,10 @@ void conf(struct ctx *c, int argc, char **argv)
 		die("External interface not usable");
 
 	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
-		c->no_map_gw = c->no_dhcp = 1;
+		c->no_map_gw = 1;
+
+	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
+		c->no_dhcp = 1;
 
 	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
 		c->no_map_gw = 1;
diff --git a/dhcp.c b/dhcp.c
index acc5b03e..a935dc94 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -347,7 +347,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
 	memcpy(opts[1].s,  &mask,        sizeof(mask));
 	memcpy(opts[3].s,  &c->ip4.gw,   sizeof(c->ip4.gw));
-	memcpy(opts[54].s, &c->ip4.gw,   sizeof(c->ip4.gw));
+	memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr));
 
 	/* If the gateway is not on the assigned subnet, send an option 121
 	 * (Classless Static Routing) adding a dummy route to it.
@@ -377,7 +377,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 		opt_set_dns_search(c, sizeof(m->o));
 
 	dlen = offsetof(struct msg, o) + fill(m);
-	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, dlen);
+	tap_udp4_send(c, c->ip4.our_tap_addr, 67, c->ip4.addr, 68, m, dlen);
 
 	return 1;
 }
diff --git a/fwd.c b/fwd.c
index 1baae338..fe618742 100644
--- a/fwd.c
+++ b/fwd.c
@@ -387,10 +387,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	}
 
 	if (!fwd_guest_accessible(c, &ini->eaddr)) {
-		if (inany_v4(&ini->eaddr))
-			tgt->oaddr = inany_from_v4(c->ip4.gw);
-		else
+		if (inany_v4(&ini->eaddr)) {
+			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
+				/* No source address we can use */
+				return PIF_NONE;
+			tgt->oaddr = inany_from_v4(c->ip4.our_tap_addr);
+		} else {
 			tgt->oaddr.a6 = c->ip6.our_tap_ll;
+		}
 	} else {
 		tgt->oaddr = ini->eaddr;
 	}
diff --git a/passt.h b/passt.h
index 3b8a6283..ecfed1e7 100644
--- a/passt.h
+++ b/passt.h
@@ -97,6 +97,7 @@ enum passt_modes {
  * @gw:			Default IPv4 gateway
  * @dns:		DNS addresses for DHCP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
+ * @our_tap_addr:	IPv4 address for passt's use on tap
  * @dns_host:		Use this DNS on the host for forwarding
  * @addr_out:		Optional source address for outbound traffic
  * @ifname_out:		Optional interface name to bind outbound sockets to
@@ -111,6 +112,7 @@ struct ip4_ctx {
 	struct in_addr gw;
 	struct in_addr dns[MAXNS + 1];
 	struct in_addr dns_match;
+	struct in_addr our_tap_addr;
 
 	/* PIF_HOST addresses */
 	struct in_addr dns_host;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 18/22] Don't take "our" MAC address from the host
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (16 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4 David Gibson
@ 2024-08-16  5:39 ` David Gibson
  2024-08-16  5:40 ` [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address David Gibson
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:39 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

When sending frames to the guest over the tap link, we need a source MAC
address.  Currently we take that from the MAC address of the main interface
on the host, but that doesn't actually make much sense:
 * We can't preserve the real MAC address of packets from anywhere
   external so there's no transparency case here
 * In fact, it's confusingly different from how we handle IP addresses:
   whereas we give the guest the same IP as the host, we're making the
   host's MAC the one MAC that the guest *can't* use for itself.
 * We already need a fallback case if the host doesn't have an Ethernet
   like MAC (e.g. if it's connected via a point to point interface, such
   as a wireguard VPN).

Change to just use an arbitrary fixed MAC address - I've picked
9a:55:9a:55:9a:55.  It's simpler and has the small advantage of making
the fact that passt/pasta is in use typically obvious from guest side
packet dumps.  This can still, of course, be overridden with the -M option.
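
For reference, the two constraints on such an address both live in the
first octet: the group bit (bit 0) must be clear for unicast, and the
locally administered bit (bit 1) must be set.  A minimal sketch of the
checks follows; the helper names and MAC_LEN_SKETCH are made up for this
example and are not part of passt.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN_SKETCH	6	/* hypothetical; real code uses ETH_ALEN */

/* I/G bit: bit 0 of the first octet, 0 = unicast, 1 = group/multicast */
static bool mac_is_unicast(const uint8_t mac[MAC_LEN_SKETCH])
{
	return !(mac[0] & 0x01);
}

/* U/L bit: bit 1 of the first octet, 1 = locally administered */
static bool mac_is_local(const uint8_t mac[MAC_LEN_SKETCH])
{
	return mac[0] & 0x02;
}
```

With a first octet of 0x9a (binary 1001 1010), bit 0 is clear and bit 1
is set, so 9a:55:9a:55:9a:55 passes both checks.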

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  | 40 +++++-----------------------------------
 passt.h |  7 +++++++
 util.h  |  1 -
 3 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/conf.c b/conf.c
index 9f962fc8..b1c58d5b 100644
--- a/conf.c
+++ b/conf.c
@@ -612,12 +612,10 @@ static int conf_ip4_prefix(const char *arg)
  * conf_ip4() - Verify or detect IPv4 support, get relevant addresses
  * @ifi:	Host interface to attempt (0 to determine one)
  * @ip4:	IPv4 context (will be written)
- * @mac:	MAC address to use (written if unset)
  *
  * Return:	Interface index for IPv4, or 0 on failure.
  */
-static unsigned int conf_ip4(unsigned int ifi,
-			     struct ip4_ctx *ip4, unsigned char *mac)
+static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4)
 {
 	if (!ifi)
 		ifi = nl_get_ext_if(nl_sock, AF_INET);
@@ -662,20 +660,6 @@ static unsigned int conf_ip4(unsigned int ifi,
 
 	ip4->our_tap_addr = ip4->gw;
 
-	if (MAC_IS_ZERO(mac)) {
-		int rc = nl_link_get_mac(nl_sock, ifi, mac);
-		if (rc < 0) {
-			char ifname[IFNAMSIZ];
-
-			err("Couldn't discover MAC address for %s: %s",
-			    if_indextoname(ifi, ifname), strerror(-rc));
-			return 0;
-		}
-
-		if (MAC_IS_ZERO(mac))
-			memcpy(mac, MAC_LAA, ETH_ALEN);
-	}
-
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		return 0;
 
@@ -686,12 +670,10 @@ static unsigned int conf_ip4(unsigned int ifi,
  * conf_ip6() - Verify or detect IPv6 support, get relevant addresses
  * @ifi:	Host interface to attempt (0 to determine one)
  * @ip6:	IPv6 context (will be written)
- * @mac:	MAC address to use (written if unset)
  *
  * Return:	Interface index for IPv6, or 0 on failure.
  */
-static unsigned int conf_ip6(unsigned int ifi,
-			     struct ip6_ctx *ip6, unsigned char *mac)
+static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6)
 {
 	int prefix_len = 0;
 	int rc;
@@ -726,19 +708,6 @@ static unsigned int conf_ip6(unsigned int ifi,
 	if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw))
 		ip6->our_tap_ll = ip6->gw;
 
-	if (MAC_IS_ZERO(mac)) {
-		rc = nl_link_get_mac(nl_sock, ifi, mac);
-		if (rc < 0) {
-			char ifname[IFNAMSIZ];
-			err("Couldn't discover MAC address for %s: %s",
-			    if_indextoname(ifi, ifname), strerror(-rc));
-			return 0;
-		}
-
-		if (MAC_IS_ZERO(mac))
-			memcpy(mac, MAC_LAA, ETH_ALEN);
-	}
-
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
 	    IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll))
 		return 0;
@@ -1289,6 +1258,7 @@ void conf(struct ctx *c, int argc, char **argv)
 
 	c->tcp.fwd_in.mode = c->tcp.fwd_out.mode = FWD_UNSET;
 	c->udp.fwd_in.mode = c->udp.fwd_out.mode = FWD_UNSET;
+	memcpy(c->our_tap_mac, MAC_OUR_LAA, ETH_ALEN);
 
 	optind = 1;
 	do {
@@ -1659,9 +1629,9 @@ void conf(struct ctx *c, int argc, char **argv)
 
 	nl_sock_init(c, false);
 	if (!v6_only)
-		c->ifi4 = conf_ip4(ifi4, &c->ip4, c->our_tap_mac);
+		c->ifi4 = conf_ip4(ifi4, &c->ip4);
 	if (!v4_only)
-		c->ifi6 = conf_ip6(ifi6, &c->ip6, c->our_tap_mac);
+		c->ifi6 = conf_ip6(ifi6, &c->ip6);
 	if ((!c->ifi4 && !c->ifi6) ||
 	    (*c->ip4.ifname_out && !c->ifi4) ||
 	    (*c->ip6.ifname_out && !c->ifi6))
diff --git a/passt.h b/passt.h
index ecfed1e7..c6c67ffc 100644
--- a/passt.h
+++ b/passt.h
@@ -26,6 +26,13 @@ union epoll_ref;
 #include "tcp.h"
 #include "udp.h"
 
+/* Default address for our end on the tap interface.  Bit 0 of byte 0 must be 0
+ * (unicast) and bit 1 of byte 0 must be 1 (locally administered).  Otherwise
+ * it's arbitrary.
+ */
+#define MAC_OUR_LAA	\
+	((uint8_t [ETH_ALEN]){0x9a, 0x55, 0x9a, 0x55, 0x9a, 0x55})
+
 /**
  * union epoll_ref - Breakdown of reference for epoll fd bookkeeping
  * @type:	Type of fd (tells us what to do with events)
diff --git a/util.h b/util.h
index c1748074..899496e3 100644
--- a/util.h
+++ b/util.h
@@ -96,7 +96,6 @@
 #define PORT_IS_EPHEMERAL(port) ((port) >= PORT_EPHEMERAL_MIN)
 
 #define MAC_ZERO		((uint8_t [ETH_ALEN]){ 0 })
-#define MAC_LAA			((uint8_t [ETH_ALEN]){ BIT(1), 0, 0, 0, 0, 0 })
 #define MAC_IS_ZERO(addr)	(!memcmp((addr), MAC_ZERO, ETH_ALEN))
 
 #ifndef __bswap_constant_16
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (17 preceding siblings ...)
  2024-08-16  5:39 ` [PATCH 18/22] Don't take "our" MAC address from the host David Gibson
@ 2024-08-16  5:40 ` David Gibson
  2024-08-20 19:56   ` Stefano Brivio
  2024-08-16  5:40 ` [PATCH 20/22] conf: Allow address remapped to host to be configured David Gibson
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:40 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default
gateway.  We use this for two quite distinct things: advertising the
gateway that the guest should use (via DHCP, NDP and/or --config-net)
and for a limited form of NAT.  So that the guest can access services
on the host, we map the gateway address within the guest to the
loopback address on the host.

The gateway address isn't necessarily the best choice for this
purpose, certainly not in all circumstances.  So, start off
by splitting the notion of these into two different values: @guest_gw
which is the gateway address the guest should use and @nat_host_loopback,
which is the guest visible address to remap to the host's loopback.

Usually nat_host_loopback will have the same value as guest_gw.  However,
when --no-map-gw is specified we leave nat_host_loopback unspecified
instead.  This means when we use nat_host_loopback, we don't need to
separately check c->no_map_gw to see if it's relevant.
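
The "unspecified means disabled" convention can be sketched in
isolation like this.  It is an illustration under assumptions, not the
code in fwd.c: nat_outbound4_sketch() is a hypothetical helper, and it
assumes the guest never uses 0.0.0.0 as a destination.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Pick the host-side destination for a guest connection to @dst.
 *
 * @nat_host_loopback is 0.0.0.0 when the mapping is disabled.  Assuming
 * @dst is never 0.0.0.0 itself, the comparison below can then never
 * match, so no separate "mapping enabled" flag is needed.
 */
static struct in_addr nat_outbound4_sketch(struct in_addr nat_host_loopback,
					   struct in_addr dst)
{
	if (dst.s_addr == nat_host_loopback.s_addr)
		dst.s_addr = htonl(INADDR_LOOPBACK);	/* host's 127.0.0.1 */

	return dst;
}
```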

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  | 60 +++++++++++++++++++++++++++++----------------------------
 dhcp.c  | 10 ++++++----
 fwd.c   |  4 ++--
 passt.h | 16 +++++++++------
 pasta.c |  6 ++++--
 5 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/conf.c b/conf.c
index b1c58d5b..26373584 100644
--- a/conf.c
+++ b/conf.c
@@ -410,12 +410,12 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver,
 		 * redirect
 		 */
 		if (IN4_IS_ADDR_LOOPBACK(&ns4)) {
-			if (c->no_map_gw)
+			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
 				return;
 
-			ns4 = c->ip4.gw;
+			ns4 = c->ip4.nat_host_loopback;
 			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match))
-				c->ip4.dns_match = c->ip4.gw;
+				c->ip4.dns_match = c->ip4.nat_host_loopback;
 		}
 
 		*idx4 += add_dns4(c, &ns4, *idx4);
@@ -429,13 +429,13 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver,
 		 * redirect
 		 */
 		if (IN6_IS_ADDR_LOOPBACK(&ns6)) {
-			if (c->no_map_gw)
+			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
 				return;
 
-			ns6 = c->ip6.gw;
+			ns6 = c->ip6.nat_host_loopback;
 
 			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
-				c->ip6.dns_match = c->ip6.gw;
+				c->ip6.dns_match = c->ip6.nat_host_loopback;
 		}
 
 		*idx6 += add_dns6(c, &ns6, *idx6);
@@ -625,8 +625,9 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4)
 		return 0;
 	}
 
-	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw)) {
-		int rc = nl_route_get_def(nl_sock, ifi, AF_INET, &ip4->gw);
+	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->guest_gw)) {
+		int rc = nl_route_get_def(nl_sock, ifi, AF_INET,
+					  &ip4->guest_gw);
 		if (rc < 0) {
 			err("Couldn't discover IPv4 gateway address: %s",
 			    strerror(-rc));
@@ -658,7 +659,7 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4)
 
 	ip4->addr_seen = ip4->addr;
 
-	ip4->our_tap_addr = ip4->gw;
+	ip4->our_tap_addr = ip4->guest_gw;
 
 	if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr))
 		return 0;
@@ -686,8 +687,8 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6)
 		return 0;
 	}
 
-	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw)) {
-		rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->gw);
+	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->guest_gw)) {
+		rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->guest_gw);
 		if (rc < 0) {
 			err("Couldn't discover IPv6 gateway address: %s",
 			    strerror(-rc));
@@ -705,8 +706,8 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6)
 
 	ip6->addr_seen = ip6->addr;
 
-	if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw))
-		ip6->our_tap_ll = ip6->gw;
+	if (IN6_IS_ADDR_LINKLOCAL(&ip6->guest_gw))
+		ip6->our_tap_ll = ip6->guest_gw;
 
 	if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ||
 	    IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll))
@@ -969,7 +970,8 @@ static void conf_print(const struct ctx *c)
 			info("    mask: %s",
 			     inet_ntop(AF_INET, &mask,        buf4, sizeof(buf4)));
 			info("    router: %s",
-			     inet_ntop(AF_INET, &c->ip4.gw,   buf4, sizeof(buf4)));
+			     inet_ntop(AF_INET, &c->ip4.guest_gw,
+				       buf4, sizeof(buf4)));
 		}
 
 		for (i = 0; !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns[i]); i++) {
@@ -999,7 +1001,7 @@ static void conf_print(const struct ctx *c)
 		info("    assign: %s",
 		     inet_ntop(AF_INET6, &c->ip6.addr, buf6, sizeof(buf6)));
 		info("    router: %s",
-		     inet_ntop(AF_INET6, &c->ip6.gw,   buf6, sizeof(buf6)));
+		     inet_ntop(AF_INET6, &c->ip6.guest_gw, buf6, sizeof(buf6)));
 		info("    our link-local: %s",
 		     inet_ntop(AF_INET6, &c->ip6.our_tap_ll,
 			       buf6, sizeof(buf6)));
@@ -1173,7 +1175,7 @@ fail:
  */
 void conf(struct ctx *c, int argc, char **argv)
 {
-	int netns_only = 0;
+	int netns_only = 0, no_map_gw = 0;
 	const struct option options[] = {
 		{"debug",	no_argument,		NULL,		'd' },
 		{"quiet",	no_argument,		NULL,		'q' },
@@ -1202,7 +1204,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-dhcpv6",	no_argument,		&c->no_dhcpv6,	1 },
 		{"no-ndp",	no_argument,		&c->no_ndp,	1 },
 		{"no-ra",	no_argument,		&c->no_ra,	1 },
-		{"no-map-gw",	no_argument,		&c->no_map_gw,	1 },
+		{"no-map-gw",	no_argument,		&no_map_gw,	1 },
 		{"ipv4-only",	no_argument,		NULL,		'4' },
 		{"ipv6-only",	no_argument,		NULL,		'6' },
 		{"one-off",	no_argument,		NULL,		'1' },
@@ -1503,18 +1505,18 @@ void conf(struct ctx *c, int argc, char **argv)
 			parse_mac(c->our_tap_mac, optarg);
 			break;
 		case 'g':
-			if (inet_pton(AF_INET6, optarg, &c->ip6.gw)     &&
-			    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)	&&
-			    !IN6_IS_ADDR_LOOPBACK(&c->ip6.gw)) {
+			if (inet_pton(AF_INET6, optarg, &c->ip6.guest_gw) &&
+			    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.guest_gw)	&&
+			    !IN6_IS_ADDR_LOOPBACK(&c->ip6.guest_gw)) {
 				if (c->mode == MODE_PASTA)
 					c->ip6.no_copy_routes = true;
 				break;
 			}
 
-			if (inet_pton(AF_INET, optarg, &c->ip4.gw)	&&
-			    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw)	&&
-			    !IN4_IS_ADDR_BROADCAST(&c->ip4.gw)		&&
-			    !IN4_IS_ADDR_LOOPBACK(&c->ip4.gw)) {
+			if (inet_pton(AF_INET, optarg, &c->ip4.guest_gw) &&
+			    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw)	&&
+			    !IN4_IS_ADDR_BROADCAST(&c->ip4.guest_gw)	&&
+			    !IN4_IS_ADDR_LOOPBACK(&c->ip4.guest_gw)) {
 				if (c->mode == MODE_PASTA)
 					c->ip4.no_copy_routes = true;
 				break;
@@ -1637,15 +1639,15 @@ void conf(struct ctx *c, int argc, char **argv)
 	    (*c->ip6.ifname_out && !c->ifi6))
 		die("External interface not usable");
 
-	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
-		c->no_map_gw = 1;
+	if (c->ifi4 && !no_map_gw)
+		c->ip4.nat_host_loopback = c->ip4.guest_gw;
+
+	if (c->ifi6 && !no_map_gw)
+		c->ip6.nat_host_loopback = c->ip6.guest_gw;
 
 	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
 		c->no_dhcp = 1;
 
-	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
-		c->no_map_gw = 1;
-
 	/* Inbound port options & DNS can be parsed now (after IPv4/IPv6
 	 * settings)
 	 */
diff --git a/dhcp.c b/dhcp.c
index a935dc94..43585888 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -346,19 +346,21 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	m->yiaddr = c->ip4.addr;
 	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
 	memcpy(opts[1].s,  &mask,        sizeof(mask));
-	memcpy(opts[3].s,  &c->ip4.gw,   sizeof(c->ip4.gw));
+	memcpy(opts[3].s,  &c->ip4.guest_gw, sizeof(c->ip4.guest_gw));
 	memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr));
 
 	/* If the gateway is not on the assigned subnet, send an option 121
 	 * (Classless Static Routing) adding a dummy route to it.
 	 */
 	if ((c->ip4.addr.s_addr & mask.s_addr)
-	    != (c->ip4.gw.s_addr & mask.s_addr)) {
+	    != (c->ip4.guest_gw.s_addr & mask.s_addr)) {
 		/* a.b.c.d/32:0.0.0.0, 0:a.b.c.d */
 		opts[121].slen = 14;
 		opts[121].s[0] = 32;
-		memcpy(opts[121].s + 1,  &c->ip4.gw, sizeof(c->ip4.gw));
-		memcpy(opts[121].s + 10, &c->ip4.gw, sizeof(c->ip4.gw));
+		memcpy(opts[121].s + 1,
+		       &c->ip4.guest_gw, sizeof(c->ip4.guest_gw));
+		memcpy(opts[121].s + 10,
+		       &c->ip4.guest_gw, sizeof(c->ip4.guest_gw));
 	}
 
 	if (c->mtu != -1) {
diff --git a/fwd.c b/fwd.c
index fe618742..779278a9 100644
--- a/fwd.c
+++ b/fwd.c
@@ -268,9 +268,9 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
 	else if (is_dns_flow(proto, ini) &&
 		   inany_equals6(&ini->oaddr, &c->ip6.dns_match))
 		tgt->eaddr.a6 = c->ip6.dns_host;
-	else if (!c->no_map_gw && inany_equals4(&ini->oaddr, &c->ip4.gw))
+	else if (inany_equals4(&ini->oaddr, &c->ip4.nat_host_loopback))
 		tgt->eaddr = inany_loopback4;
-	else if (!c->no_map_gw && inany_equals6(&ini->oaddr, &c->ip6.gw))
+	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_host_loopback))
 		tgt->eaddr = inany_loopback6;
 	else
 		tgt->eaddr = ini->oaddr;
diff --git a/passt.h b/passt.h
index c6c67ffc..20a5904a 100644
--- a/passt.h
+++ b/passt.h
@@ -101,7 +101,9 @@ enum passt_modes {
  * @addr:		IPv4 address assigned to guest
  * @addr_seen:		Latest IPv4 address seen as source from tap
  * @prefixlen:		IPv4 prefix length (netmask)
- * @gw:			Default IPv4 gateway
+ * @guest_gw:		IPv4 gateway as seen by the guest
+ * @nat_host_loopback:	Outbound connections to this address are NATted to the
+ *                      host's 127.0.0.1
  * @dns:		DNS addresses for DHCP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
  * @our_tap_addr:	IPv4 address for passt's use on tap
@@ -116,7 +118,8 @@ struct ip4_ctx {
 	struct in_addr addr;
 	struct in_addr addr_seen;
 	int prefix_len;
-	struct in_addr gw;
+	struct in_addr guest_gw;
+	struct in_addr nat_host_loopback;
 	struct in_addr dns[MAXNS + 1];
 	struct in_addr dns_match;
 	struct in_addr our_tap_addr;
@@ -136,7 +139,9 @@ struct ip4_ctx {
  * @addr:		IPv6 address assigned to guest
  * @addr_seen:		Latest IPv6 global/site address seen as source from tap
  * @addr_ll_seen:	Latest IPv6 link-local address seen as source from tap
- * @gw:			Default IPv6 gateway
+ * @guest_gw:		IPv6 gateway as seen by the guest
+ * @nat_host_loopback:	Outbound connections to this address are NATted to the
+ *                      host's [::1]
  * @dns:		DNS addresses for DHCPv6 and NDP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
  * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
@@ -151,7 +156,8 @@ struct ip6_ctx {
 	struct in6_addr addr;
 	struct in6_addr addr_seen;
 	struct in6_addr addr_ll_seen;
-	struct in6_addr gw;
+	struct in6_addr guest_gw;
+	struct in6_addr nat_host_loopback;
 	struct in6_addr dns[MAXNS + 1];
 	struct in6_addr dns_match;
 	struct in6_addr our_tap_ll;
@@ -213,7 +219,6 @@ struct ip6_ctx {
  * @no_dhcpv6:		Disable DHCPv6 server
  * @no_ndp:		Disable NDP handler altogether
  * @no_ra:		Disable router advertisements
- * @no_map_gw:		Don't map connections, untracked UDP to gateway to host
  * @low_wmem:		Low probed net.core.wmem_max
  * @low_rmem:		Low probed net.core.rmem_max
  */
@@ -273,7 +278,6 @@ struct ctx {
 	int no_dhcpv6;
 	int no_ndp;
 	int no_ra;
-	int no_map_gw;
 
 	int low_wmem;
 	int low_rmem;
diff --git a/pasta.c b/pasta.c
index 3b4e8ead..2aeaf388 100644
--- a/pasta.c
+++ b/pasta.c
@@ -324,7 +324,8 @@ void pasta_ns_conf(struct ctx *c)
 
 			if (c->ip4.no_copy_routes) {
 				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
-						      AF_INET, &c->ip4.gw);
+						      AF_INET,
+						      &c->ip4.guest_gw);
 			} else {
 				rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
 						  c->pasta_ifi, AF_INET);
@@ -353,7 +354,8 @@ void pasta_ns_conf(struct ctx *c)
 
 			if (c->ip6.no_copy_routes) {
 				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
-						      AF_INET6, &c->ip6.gw);
+						      AF_INET6,
+						      &c->ip6.guest_gw);
 			} else {
 				rc = nl_route_dup(nl_sock, c->ifi6,
 						  nl_sock_ns, c->pasta_ifi,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 20/22] conf: Allow address remapped to host to be configured
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (18 preceding siblings ...)
  2024-08-16  5:40 ` [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address David Gibson
@ 2024-08-16  5:40 ` David Gibson
  2024-08-20 19:56   ` Stefano Brivio
  2024-08-16  5:40 ` [PATCH 21/22] fwd: Distinguish translatable from untranslatable addresses on inbound David Gibson
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:40 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

Because the host and guest share the same IP address with passt/pasta, it's
not possible for the guest to directly address the host.  Therefore we
allow packets from the guest going to a special "NAT to host" address to be
redirected to the host, appearing there as though they have both source and
destination address of loopback.
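
As a rough illustration of the redirect described above (a minimal sketch,
not the code in this series): the helper name nat_dst4() and its signature
are made up for this example, and it covers only the IPv4 destination
rewrite.  The real logic lives in fwd_nat_from_tap() and works on passt's
own flow structures.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of passt: choose the host-side destination
 * for a guest packet sent to @dst.  @nat_host_loopback is the configured
 * "NAT to host" address; 0.0.0.0 is assumed to mean "nothing is mapped".
 */
static struct in_addr nat_dst4(struct in_addr dst,
			       struct in_addr nat_host_loopback)
{
	struct in_addr out = dst;

	/* Only rewrite when a mapping is configured and the guest used it */
	if (nat_host_loopback.s_addr != htonl(INADDR_ANY) &&
	    dst.s_addr == nat_host_loopback.s_addr)
		out.s_addr = htonl(INADDR_LOOPBACK);	/* 127.0.0.1 */

	return out;
}
```

Any other destination passes through unchanged, which is what lets the
gateway address stay reachable once the mapping is moved elsewhere.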

Currently that special address is always the address of the default
gateway (or none).  That can be a problem if we want that gateway to be
addressable by the guest.  Therefore, allow the special "NAT to host"
address to be overridden on the command line with a new --nat-host-loopback
option.
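
The intended semantics of the option can be sketched as follows.  This is an
illustration only: struct nat_cfg, nat_parse() and nat_default4() are
hypothetical names, not the conf_nat() added by this patch.  The sketch
assumes that 'none' should return immediately and that a rejected address
should leave the configuration untouched.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical configuration holder, for illustration only */
struct nat_cfg {
	struct in_addr host4;	/* 0.0.0.0 means "not set" */
	struct in6_addr host6;	/* :: means "not set" */
	int no_map_gw;		/* set by "none": suppress the default */
};

/* Parse one option argument; return 0 on success, -1 if unusable */
static int nat_parse(struct nat_cfg *cfg, const char *arg)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (!strcmp(arg, "none")) {
		cfg->host4.s_addr = htonl(INADDR_ANY);
		cfg->host6 = in6addr_any;
		cfg->no_map_gw = 1;
		return 0;
	}

	/* Reject unspecified, loopback and multicast in either family */
	if (inet_pton(AF_INET6, arg, &a6) == 1 &&
	    !IN6_IS_ADDR_UNSPECIFIED(&a6) &&
	    !IN6_IS_ADDR_LOOPBACK(&a6) &&
	    !IN6_IS_ADDR_MULTICAST(&a6)) {
		cfg->host6 = a6;
		return 0;
	}

	if (inet_pton(AF_INET, arg, &a4) == 1 &&
	    a4.s_addr != htonl(INADDR_ANY) &&
	    (ntohl(a4.s_addr) >> 24) != 127 &&
	    !IN_MULTICAST(ntohl(a4.s_addr))) {
		cfg->host4 = a4;
		return 0;
	}

	return -1;
}

/* Fall back to the guest's gateway unless mapping was disabled or set */
static void nat_default4(struct nat_cfg *cfg, struct in_addr guest_gw)
{
	if (!cfg->no_map_gw && cfg->host4.s_addr == htonl(INADDR_ANY))
		cfg->host4 = guest_gw;
}
```

With this shape, passing "192.0.2.1" and then "2001:db8:9a55::1" sets one
address per family, while a later "none" clears both and suppresses the
gateway default.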

In order to exercise and test it, update the passt_in_ns and perf
tests to use this option and give different mapping addresses for the
two layers of the environment.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c                | 57 +++++++++++++++++++++++++++++++--
 passt.1               | 16 ++++++++++
 test/lib/setup        | 11 +++++--
 test/passt_in_ns/dhcp | 73 +++++++++++++++++++++++++++++++++++++++++++
 test/passt_in_ns/tcp  | 38 +++++++++++-----------
 test/passt_in_ns/udp  | 22 +++++++------
 test/perf/passt_tcp   | 33 +++++++++----------
 test/perf/passt_udp   | 31 +++++++++---------
 test/perf/pasta_tcp   | 29 ++++++++---------
 test/perf/pasta_udp   | 25 ++++++++-------
 test/run              |  4 +--
 11 files changed, 244 insertions(+), 95 deletions(-)
 create mode 100644 test/passt_in_ns/dhcp

diff --git a/conf.c b/conf.c
index 26373584..c5831e82 100644
--- a/conf.c
+++ b/conf.c
@@ -817,6 +817,14 @@ static void usage(const char *name, FILE *f, int status)
 		fprintf(f, "  --no-dhcp-search	No list in DHCP/DHCPv6/NDP\n");
 
 	fprintf(f,
+		"  --nat-host-loopback ADDR	NAT ADDR to refer to host\n"
+		"    Packets from the guest to ADDR will be redirected to the\n"
+		"    host.  On the host such packets will appear to have both\n"
+		"    source and destination of loopback (127.0.0.1 or ::1).\n"
+		"    ADDR can be 'none', in which case nothing is mapped\n"
+		"    Can be specified zero to two times (for IPv4 and IPv6)\n"
+		"    default: gateway address, or none if --no-map-gw is also\n"
+		"             specified\n"
 		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
 		"    can be specified zero to two times (for IPv4 and IPv6)\n"
 		"    default: don't forward DNS queries\n"
@@ -959,6 +967,11 @@ static void conf_print(const struct ctx *c)
 	info("    host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac)));
 
 	if (c->ifi4) {
+		if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
+			info("    NAT to host 127.0.0.1: %s",
+			     inet_ntop(AF_INET, &c->ip4.nat_host_loopback,
+				       buf4, sizeof(buf4)));
+
 		if (!c->no_dhcp) {
 			uint32_t mask;
 
@@ -989,6 +1002,11 @@ static void conf_print(const struct ctx *c)
 	}
 
 	if (c->ifi6) {
+		if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
+			info("    NAT to host ::1: %s",
+			     inet_ntop(AF_INET6, &c->ip6.nat_host_loopback,
+				       buf6, sizeof(buf6)));
+
 		if (!c->no_ndp && !c->no_dhcpv6)
 			info("NDP/DHCPv6:");
 		else if (!c->no_ndp)
@@ -1122,6 +1140,35 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
 	}
 }
 
+/**
+ * conf_nat() - Parse --nat-host-loopback option
+ * @c:		Execution context
+ * @arg:	String argument to --nat-host-loopback
+ * @no_map_gw:	--no-map-gw flag, updated for "none" argument
+ */
+static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
+{
+	if (strcmp(arg, "none") == 0) {
+		c->ip4.nat_host_loopback = in4addr_any;
+		c->ip6.nat_host_loopback = in6addr_any;
+		*no_map_gw = 1;
+	}
+
+	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
+	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
+	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
+	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
+		return;
+
+	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
+	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
+	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
+	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
+		return;
+
+	die("Invalid address to remap to host: %s", arg);
+}
+
 /**
  * conf_open_files() - Open files as requested by configuration
  * @c:		Execution context
@@ -1231,6 +1278,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-copy-routes", no_argument,		NULL,		18 },
 		{"no-copy-addrs", no_argument,		NULL,		19 },
 		{"netns-only",	no_argument,		NULL,		20 },
+		{"nat-host-loopback", required_argument, NULL,		21 },
 		{ 0 },
 	};
 	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
@@ -1400,6 +1448,9 @@ void conf(struct ctx *c, int argc, char **argv)
 			netns_only = 1;
 			*userns = 0;
 			break;
+		case 21:
+			conf_nat(c, optarg, &no_map_gw);
+			break;
 		case 'd':
 			c->debug = 1;
 			c->quiet = 0;
@@ -1639,10 +1690,12 @@ void conf(struct ctx *c, int argc, char **argv)
 	    (*c->ip6.ifname_out && !c->ifi6))
 		die("External interface not usable");
 
-	if (c->ifi4 && !no_map_gw)
+	if (c->ifi4 && !no_map_gw &&
+	    IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
 		c->ip4.nat_host_loopback = c->ip4.guest_gw;
 
-	if (c->ifi6 && !no_map_gw)
+	if (c->ifi6 && !no_map_gw &&
+	    IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
 		c->ip6.nat_host_loopback = c->ip6.guest_gw;
 
 	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
diff --git a/passt.1 b/passt.1
index dca433b6..3680056a 100644
--- a/passt.1
+++ b/passt.1
@@ -327,6 +327,22 @@ namespace will be silently dropped.
 Disable Router Advertisements. Router Solicitations coming from guest or target
 namespace will be ignored.
 
+.TP
+.BR \-\-nat-host-loopback " " \fIaddr
+Translate \fIaddr\fR to refer to the host. Packets from the guest to
+\fIaddr\fR will be redirected to the host.  On the host such packets
+will appear to have both source and destination of loopback (127.0.0.1
+or ::1).
+
+If \fIaddr\fR is 'none', no address is mapped (this implies
+\fB--no-map-gw\fR).  Only one IPv4 and one IPv6 address can be
+translated; if the option is specified multiple times, the last one
+takes effect.
+
+Default is to translate the guest's default gateway address, unless
+\fB--no-map-gw\fR is also given, in which case no address is mapped by
+default.
+
 .TP
 .BR \-\-no-map-gw
 Don't remap TCP connections and untracked UDP traffic, with the gateway address
diff --git a/test/lib/setup b/test/lib/setup
index 9b39b9fe..061bf997 100755
--- a/test/lib/setup
+++ b/test/lib/setup
@@ -124,7 +124,12 @@ setup_passt_in_ns() {
 	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
 	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
 
-	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
+	__nat_host4=192.0.2.1
+	__nat_host6=2001:db8:9a55::1
+	__nat_ns4=192.0.2.2
+	__nat_ns6=2001:db8:9a55::2
+
+	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --nat-host-loopback ${__nat_host4} --nat-host-loopback ${__nat_host6} --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
 	wait_for [ -f "${STATESETUP}/pasta.pid" ]
 
 	context_setup_nstool qemu ${STATESETUP}/ns.hold
@@ -139,11 +144,11 @@ setup_passt_in_ns() {
 	if [ ${VALGRIND} -eq 1 ]; then
 		context_run passt "make clean"
 		context_run passt "make valgrind"
-		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
+		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
 	else
 		context_run passt "make clean"
 		context_run passt "make"
-		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
+		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
 	fi
 	wait_for [ -f "${STATESETUP}/passt.pid" ]
 
diff --git a/test/passt_in_ns/dhcp b/test/passt_in_ns/dhcp
new file mode 100644
index 00000000..48c7d197
--- /dev/null
+++ b/test/passt_in_ns/dhcp
@@ -0,0 +1,73 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# PASST - Plug A Simple Socket Transport
+#  for qemu/UNIX domain socket mode
+#
+# PASTA - Pack A Subtle Tap Abstraction
+#  for network namespace/tap device mode
+#
+# test/passt_in_ns/dhcp - Check DHCP and DHCPv6 functionality in passt mode
+#
+# Copyright (c) 2021 Red Hat GmbH
+# Author: Stefano Brivio <sbrivio@redhat.com>
+
+gtools	ip jq dhclient sed tr
+htools	ip jq sed tr head
+
+set	NAT_NS4 192.0.2.2
+set	NAT_NS6 2001:db8:9a55::2
+
+test	Interface name
+gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
+hout	HOST_IFNAME ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]'
+hout	HOST_IFNAME6 ip -j -6 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]'
+check	[ -n "__IFNAME__" ]
+
+test	DHCP: address
+guest	/sbin/dhclient -4 __IFNAME__
+gout	ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local'
+hout	HOST_ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__HOST_IFNAME__").addr_info[0].local'
+check	[ "__ADDR__" = "__HOST_ADDR__" ]
+
+test	DHCP: route
+gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
+hout	HOST_GW ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").gateway] | .[0]'
+check	[ "__GW__" = "__HOST_GW__" ]
+
+test	DHCP: MTU
+gout	MTU ip -j link show | jq -rM '.[] | select(.ifname == "__IFNAME__").mtu'
+check	[ __MTU__ = 65520 ]
+
+test	DHCP: DNS
+gout	DNS sed -n 's/^nameserver \([0-9]*\.\)\(.*\)/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/'
+hout	HOST_DNS sed -n 's/^nameserver \([0-9]*\.\)\(.*\)/\1\2/p' /etc/resolv.conf | head -n3 | tr '\n' ',' | sed 's/,$//;s/$/\n/'
+check	[ "__DNS__" = "__HOST_DNS__" ] || ( [ "__DNS__" = "__NAT_NS4__" ] && expr "__HOST_DNS__" : "127[.]" )
+
+# FQDNs should be terminated by dots, but the guest DHCP client might omit them:
+# strip them first
+test	DHCP: search list
+gout	SEARCH sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/'
+hout	HOST_SEARCH sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/'
+check	[ "__SEARCH__" = "__HOST_SEARCH__" ]
+
+test	DHCPv6: address
+guest	/sbin/dhclient -6 __IFNAME__
+gout	ADDR6 ip -j -6 addr show|jq -rM '[.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.prefixlen == 128).local] | .[0]'
+hout	HOST_ADDR6 ip -j -6 addr show|jq -rM '[.[] | select(.ifname == "__HOST_IFNAME6__").addr_info[] | select(.scope == "global" and .deprecated != true).local] | .[0]'
+check	[ "__ADDR6__" = "__HOST_ADDR6__" ]
+
+test	DHCPv6: route
+gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
+hout	HOST_GW6 ip -j -6 route show|jq -rM '[.[] | select(.dst == "default").gateway] | .[0]'
+check	[ "__GW6__" = "__HOST_GW6__" ]
+
+# Strip interface specifier: interface names might differ between host and guest
+test	DHCPv6: DNS
+gout	DNS6 sed -n 's/^nameserver \([^:]*:\)\([^%]*\).*/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/'
+hout	HOST_DNS6 sed -n 's/^nameserver \([^:]*:\)\([^%]*\).*/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/'
+check	[ "__DNS6__" = "__HOST_DNS6__" ] || [ "__DNS6__" = "__NAT_NS6__" -a "__HOST_DNS6__" = "::1" ]
+
+test	DHCPv6: search list
+gout	SEARCH6 sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/'
+hout	HOST_SEARCH6 sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/'
+check	[ "__SEARCH6__" = "__HOST_SEARCH6__" ]
diff --git a/test/passt_in_ns/tcp b/test/passt_in_ns/tcp
index cdb7060c..919333ca 100644
--- a/test/passt_in_ns/tcp
+++ b/test/passt_in_ns/tcp
@@ -15,6 +15,11 @@ gtools	socat ip jq
 htools	socat ip jq
 nstools	socat ip jq
 
+set	NAT_HOST4 192.0.2.1
+set	NAT_HOST6 2001:db8:9a55::1
+set	NAT_NS4 192.0.2.2
+set	NAT_NS6 2001:db8:9a55::2
+
 set	TEMP_BIG __STATEDIR__/test_big.bin
 set	TEMP_SMALL __STATEDIR__/test_small.bin
 set	TEMP_NS_BIG __STATEDIR__/test_ns_big.bin
@@ -36,16 +41,15 @@ check	cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin
 
 test	TCP/IPv4: guest to host: big transfer
 hostb	socat -u TCP4-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc
-gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
 sleep	1
-guest	socat -u OPEN:/root/big.bin TCP4:__GW__:10003
+guest	socat -u OPEN:/root/big.bin TCP4:__NAT_HOST4__:10003
 hostw
 check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 
 test	TCP/IPv4: guest to ns: big transfer
 nsb	socat -u TCP4-LISTEN:10002 OPEN:__TEMP_NS_BIG__,create,trunc
 sleep	1
-guest	socat -u OPEN:/root/big.bin TCP4:__GW__:10002
+guest	socat -u OPEN:/root/big.bin TCP4:__NAT_NS4__:10002
 nsw
 check	cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin
 
@@ -59,7 +63,7 @@ check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 test	TCP/IPv4: ns to host (via tap): big transfer
 hostb	socat -u TCP4-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/big.bin TCP4:__GW__:10003
+ns	socat -u OPEN:__BASEPATH__/big.bin TCP4:__NAT_HOST4__:10003
 hostw
 check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 
@@ -95,16 +99,15 @@ check	cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin
 
 test	TCP/IPv4: guest to host: small transfer
 hostb	socat -u TCP4-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc
-gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
 sleep	1
-guest	socat -u OPEN:/root/small.bin TCP4:__GW__:10003
+guest	socat -u OPEN:/root/small.bin TCP4:__NAT_HOST4__:10003
 hostw
 check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 
 test	TCP/IPv4: guest to ns: small transfer
 nsb	socat -u TCP4-LISTEN:10002 OPEN:__TEMP_NS_SMALL__,create,trunc
 sleep	1
-guest	socat -u OPEN:/root/small.bin TCP4:__GW__:10002
+guest	socat -u OPEN:/root/small.bin TCP4:__NAT_NS4__:10002
 nsw
 check	cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin
 
@@ -118,7 +121,7 @@ check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 test	TCP/IPv4: ns to host (via tap): small transfer
 hostb	socat -u TCP4-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/small.bin TCP4:__GW__:10003
+ns	socat -u OPEN:__BASEPATH__/small.bin TCP4:__NAT_HOST4__:10003
 hostw
 check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 
@@ -152,17 +155,15 @@ check	cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin
 
 test	TCP/IPv6: guest to host: big transfer
 hostb	socat -u TCP6-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc
-gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-guest	socat -u OPEN:/root/big.bin TCP6:[__GW6__%__IFNAME__]:10003
+guest	socat -u OPEN:/root/big.bin TCP6:[__NAT_HOST6__]:10003
 hostw
 check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 
 test	TCP/IPv6: guest to ns: big transfer
 nsb	socat -u TCP6-LISTEN:10002 OPEN:__TEMP_NS_BIG__,create,trunc
 sleep	1
-guest	socat -u OPEN:/root/big.bin TCP6:[__GW6__%__IFNAME__]:10002
+guest	socat -u OPEN:/root/big.bin TCP6:[__NAT_NS6__]:10002
 nsw
 check	cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin
 
@@ -175,9 +176,8 @@ check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 
 test	TCP/IPv6: ns to host (via tap): big transfer
 hostb	socat -u TCP6-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc
-nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/big.bin TCP6:[__GW6__%__IFNAME__]:10003
+ns	socat -u OPEN:__BASEPATH__/big.bin TCP6:[__NAT_HOST6__]:10003
 hostw
 check	cmp __TEMP_BIG__ __BASEPATH__/big.bin
 
@@ -190,6 +190,7 @@ guest	cmp test_big.bin /root/big.bin
 
 test	TCP/IPv6: ns to guest (using namespace address): big transfer
 guestb	socat -u TCP6-LISTEN:10001 OPEN:test_big.bin,create,trunc
+nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 nsout	ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local'
 sleep	1
 ns	socat -u OPEN:__BASEPATH__/big.bin TCP6:[__ADDR6__]:10001
@@ -212,17 +213,15 @@ check	cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin
 
 test	TCP/IPv6: guest to host: small transfer
 hostb	socat -u TCP6-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc
-gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-guest	socat -u OPEN:/root/small.bin TCP6:[__GW6__%__IFNAME__]:10003
+guest	socat -u OPEN:/root/small.bin TCP6:[__NAT_HOST6__]:10003
 hostw
 check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 
 test	TCP/IPv6: guest to ns: small transfer
 nsb	socat -u TCP6-LISTEN:10002 OPEN:__TEMP_NS_SMALL__
 sleep	1
-guest	socat -u OPEN:/root/small.bin TCP6:[__GW6__%__IFNAME__]:10002
+guest	socat -u OPEN:/root/small.bin TCP6:[__NAT_NS6__]:10002
 nsw
 check	cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin
 
@@ -235,9 +234,8 @@ check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 
 test	TCP/IPv6: ns to host (via tap): small transfer
 hostb	socat -u TCP6-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc
-nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/small.bin TCP6:[__GW6__%__IFNAME__]:10003
+ns	socat -u OPEN:__BASEPATH__/small.bin TCP6:[__NAT_HOST6__]:10003
 hostw
 check	cmp __TEMP_SMALL__ __BASEPATH__/small.bin
 
diff --git a/test/passt_in_ns/udp b/test/passt_in_ns/udp
index 8a025131..0e3574f5 100644
--- a/test/passt_in_ns/udp
+++ b/test/passt_in_ns/udp
@@ -15,6 +15,11 @@ gtools	socat ip jq
 nstools	socat ip jq
 htools	socat ip jq
 
+set	NAT_HOST4 192.0.2.1
+set	NAT_HOST6 2001:db8:9a55::1
+set	NAT_NS4 192.0.2.2
+set	NAT_NS6 2001:db8:9a55::2
+
 set	TEMP __STATEDIR__/test.bin
 set	TEMP_NS __STATEDIR__/test_ns.bin
 
@@ -34,16 +39,15 @@ check	cmp __TEMP_NS__ __BASEPATH__/medium.bin
 
 test	UDP/IPv4: guest to host
 hostb	socat -u UDP4-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc
-gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
 sleep	1
-guest	socat -u OPEN:/root/medium.bin UDP4:__GW__:10003,shut-null
+guest	socat -u OPEN:/root/medium.bin UDP4:__NAT_HOST4__:10003,shut-null
 hostw
 check	cmp __TEMP__ __BASEPATH__/medium.bin
 
 test	UDP/IPv4: guest to ns
 nsb	socat -u UDP4-LISTEN:10002,null-eof OPEN:__TEMP_NS__,create,trunc
 sleep	1
-guest	socat -u OPEN:/root/medium.bin UDP4:__GW__:10002,shut-null
+guest	socat -u OPEN:/root/medium.bin UDP4:__NAT_NS4__:10002,shut-null
 nsw
 check	cmp __TEMP_NS__ __BASEPATH__/medium.bin
 
@@ -57,7 +61,7 @@ check	cmp __TEMP__ __BASEPATH__/medium.bin
 test	UDP/IPv4: ns to host (via tap)
 hostb	socat -u UDP4-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/medium.bin UDP4:__GW__:10003,shut-null
+ns	socat -u OPEN:__BASEPATH__/medium.bin UDP4:__NAT_HOST4__:10003,shut-null
 hostw
 check	cmp __TEMP__ __BASEPATH__/medium.bin
 
@@ -93,17 +97,15 @@ check	cmp __TEMP_NS__ __BASEPATH__/medium.bin
 
 test	UDP/IPv6: guest to host
 hostb	socat -u UDP6-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc
-gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-guest	socat -u OPEN:/root/medium.bin UDP6:[__GW6__%__IFNAME__]:10003,shut-null
+guest	socat -u OPEN:/root/medium.bin UDP6:[__NAT_HOST6__]:10003,shut-null
 hostw
 check	cmp __TEMP__ __BASEPATH__/medium.bin
 
 test	UDP/IPv6: guest to ns
 nsb	socat -u UDP6-LISTEN:10002,null-eof OPEN:__TEMP_NS__,create,trunc
 sleep	1
-guest	socat -u OPEN:/root/medium.bin UDP6:[__GW6__%__IFNAME__]:10002,shut-null
+guest	socat -u OPEN:/root/medium.bin UDP6:[__NAT_NS6__]:10002,shut-null
 nsw
 check	cmp __TEMP_NS__ __BASEPATH__/medium.bin
 
@@ -116,9 +118,8 @@ check	cmp __TEMP__ __BASEPATH__/medium.bin
 
 test	UDP/IPv6: ns to host (via tap)
 hostb	socat -u UDP6-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc
-nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 sleep	1
-ns	socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__GW6__%__IFNAME__]:10003,shut-null
+ns	socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__NAT_HOST6__]:10003,shut-null
 hostw
 check	cmp __TEMP__ __BASEPATH__/medium.bin
 
@@ -131,6 +132,7 @@ guest	cmp test.bin /root/medium.bin
 
 test	UDP/IPv6: ns to guest (using namespace address)
 guestb	socat -u UDP6-LISTEN:10001,null-eof OPEN:test.bin,create,trunc
+nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 nsout	ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local'
 sleep	1
 ns	socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__ADDR6__]:10001,shut-null
diff --git a/test/perf/passt_tcp b/test/perf/passt_tcp
index 695479f3..ae03c7df 100644
--- a/test/perf/passt_tcp
+++ b/test/perf/passt_tcp
@@ -15,6 +15,9 @@ gtools	/sbin/sysctl ip jq nproc seq sleep iperf3 tcp_rr tcp_crr # From neper
 nstools	/sbin/sysctl ip jq nproc seq sleep iperf3 tcp_rr tcp_crr
 htools	bc head sed seq
 
+set	NAT_NS4 192.0.2.2
+set	NAT_NS6 2001:db8:9a55::2
+
 test	passt: throughput and latency
 
 guest	/sbin/sysctl -w net.core.rmem_max=536870912
@@ -29,8 +32,6 @@ ns	/sbin/sysctl -w net.ipv4.tcp_rmem="4096 524288 134217728"
 ns	/sbin/sysctl -w net.ipv4.tcp_wmem="4096 524288 134217728"
 ns	/sbin/sysctl -w net.ipv4.tcp_timestamps=0
 
-gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
 gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 
 hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1
@@ -54,16 +55,16 @@ iperf3s	ns 10002
 bw	-
 bw	-
 guest	ip link set dev __IFNAME__ mtu 1280
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.2 1.5
 guest	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.6 1.8
 guest	ip link set dev __IFNAME__ mtu 9000
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 8M
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 4.0 5.0
 guest	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 16M
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -w 16M
 bw	__BW__ 7.0 8.0
 
 iperf3k	ns
@@ -75,7 +76,7 @@ lat	-
 lat	-
 lat	-
 nsb	tcp_rr --nolog -6
-gout	LAT tcp_rr --nolog -l1 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT tcp_rr --nolog -l1 -6 -c -H __NAT_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 200 150
 
 tl	TCP CRR latency over IPv6: guest to host
@@ -85,29 +86,29 @@ lat	-
 lat	-
 lat	-
 nsb	tcp_crr --nolog -6
-gout	LAT tcp_crr --nolog -l1 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT tcp_crr --nolog -l1 -6 -c -H __NAT_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 500 400
 
 tr	TCP throughput over IPv4: guest to host
 iperf3s	ns 10002
 
 guest	ip link set dev __IFNAME__ mtu 256
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.2 0.3
 guest	ip link set dev __IFNAME__ mtu 576
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.5 0.8
 guest	ip link set dev __IFNAME__ mtu 1280
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.2 1.5
 guest	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 4M
 bw	__BW__ 1.6 1.8
 guest	ip link set dev __IFNAME__ mtu 9000
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 8M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 4.0 5.0
 guest	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -w 16M
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -w 16M
 bw	__BW__ 7.0 8.0
 
 iperf3k	ns
@@ -119,7 +120,7 @@ lat	-
 lat	-
 lat	-
 nsb	tcp_rr --nolog -4
-gout	LAT tcp_rr --nolog -l1 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT tcp_rr --nolog -l1 -4 -c -H __NAT_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 200 150
 
 tl	TCP CRR latency over IPv4: guest to host
@@ -129,7 +130,7 @@ lat	-
 lat	-
 lat	-
 nsb	tcp_crr --nolog -4
-gout	LAT tcp_crr --nolog -l1 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT tcp_crr --nolog -l1 -4 -c -H __NAT_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 500 400
 
 tr	TCP throughput over IPv6: host to guest
diff --git a/test/perf/passt_udp b/test/perf/passt_udp
index f25c9033..2160797c 100644
--- a/test/perf/passt_udp
+++ b/test/perf/passt_udp
@@ -15,6 +15,9 @@ gtools	/sbin/sysctl ip jq nproc sleep iperf3 udp_rr # From neper
 nstools	ip jq sleep iperf3 udp_rr
 htools	bc head sed
 
+set	NAT_NS4 192.0.2.2
+set	NAT_NS6 2001:db8:9a55::2
+
 test	passt: throughput and latency
 
 guest	/sbin/sysctl -w net.core.rmem_max=16777216
@@ -22,10 +25,6 @@ guest	/sbin/sysctl -w net.core.wmem_max=16777216
 guest	/sbin/sysctl -w net.core.rmem_default=16777216
 guest	/sbin/sysctl -w net.core.wmem_default=16777216
 
-gout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
-gout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
-
 hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1
 hout	FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l
 hout	FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
@@ -46,13 +45,13 @@ iperf3s	ns 10002
 
 bw	-
 bw	-
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 3G -l 1232
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -b 3G -l 1232
 bw	__BW__ 0.8 1.2
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 4G -l 1452
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -b 4G -l 1452
 bw	__BW__ 1.0 1.5
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 8G -l 8952
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -b 8G -l 8952
 bw	__BW__ 4.0 5.0
-iperf3	BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 15G -l 64372
+iperf3	BW guest __NAT_NS6__ 10002 __TIME__ __OPTS__ -b 15G -l 64372
 bw	__BW__ 4.0 5.0
 
 iperf3k	ns
@@ -64,7 +63,7 @@ lat	-
 lat	-
 lat	-
 nsb	udp_rr --nolog -6
-gout	LAT udp_rr --nolog -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT udp_rr --nolog -6 -c -H __NAT_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 200 150
 
 
@@ -72,17 +71,17 @@ tr	UDP throughput over IPv4: guest to host
 iperf3s	ns 10002
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 1G -l 228
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 1G -l 228
 bw	__BW__ 0.0 0.0
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 2G -l 548
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 2G -l 548
 bw	__BW__ 0.4 0.6
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 3G -l 1252
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 3G -l 1252
 bw	__BW__ 0.8 1.2
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 4G -l 1472
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 4G -l 1472
 bw	__BW__ 1.0 1.5
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 8G -l 8972
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 8G -l 8972
 bw	__BW__ 4.0 5.0
-iperf3	BW guest __GW__ 10002 __TIME__ __OPTS__ -b 15G -l 65492
+iperf3	BW guest __NAT_NS4__ 10002 __TIME__ __OPTS__ -b 15G -l 65492
 bw	__BW__ 4.0 5.0
 
 iperf3k	ns
@@ -94,7 +93,7 @@ lat	-
 lat	-
 lat	-
 nsb	udp_rr --nolog -4
-gout	LAT udp_rr --nolog -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+gout	LAT udp_rr --nolog -4 -c -H __NAT_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 lat	__LAT__ 200 150
 
 
diff --git a/test/perf/pasta_tcp b/test/perf/pasta_tcp
index a443f5a9..a6ea062c 100644
--- a/test/perf/pasta_tcp
+++ b/test/perf/pasta_tcp
@@ -14,6 +14,9 @@
 htools	head ip seq bc sleep iperf3 tcp_rr tcp_crr jq sed
 nstools	/sbin/sysctl nproc ip seq sleep iperf3 tcp_rr tcp_crr jq sed
 
+set	NAT_HOST4 192.0.2.1
+set	NAT_HOST6 2001:db8:9a55::1
+
 test	pasta: throughput and latency (local connections)
 
 ns	/sbin/sysctl -w net.ipv4.tcp_rmem="131072 524288 134217728"
@@ -122,8 +125,6 @@ te
 
 test	pasta: throughput and latency (connections via tap)
 
-nsout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
-nsout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 set	THREADS 2
 set	OPTS -Z -P __THREADS__ -i1 -O__OMIT__
@@ -137,16 +138,16 @@ tr	TCP throughput over IPv6: ns to host
 iperf3s	host 10003
 
 ns	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 512k
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -w 512k
 bw	__BW__ 0.2 0.4
 ns	ip link set dev __IFNAME__ mtu 4000
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 1M
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.3 0.5
 ns	ip link set dev __IFNAME__ mtu 16384
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 1.5 2.0
 ns	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 2.0 2.5
 
 iperf3k	host
@@ -156,7 +157,7 @@ lat	-
 lat	-
 lat	-
 hostb	tcp_rr --nolog -P 10003 -C 10013 -6
-nsout	LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -6 -c -H __NAT_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 150 100
 
@@ -165,7 +166,7 @@ lat	-
 lat	-
 lat	-
 hostb	tcp_crr --nolog -P 10003 -C 10013 -6
-nsout	LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -6 -c -H __NAT_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 1500 500
 
@@ -174,16 +175,16 @@ tr	TCP throughput over IPv4: ns to host
 iperf3s	host 10003
 
 ns	ip link set dev __IFNAME__ mtu 1500
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 512k
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -w 512k
 bw	__BW__ 0.2 0.4
 ns	ip link set dev __IFNAME__ mtu 4000
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 1M
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -w 1M
 bw	__BW__ 0.3 0.5
 ns	ip link set dev __IFNAME__ mtu 16384
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 1.5 2.0
 ns	ip link set dev __IFNAME__ mtu 65520
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -w 8M
 bw	__BW__ 2.0 2.5
 
 iperf3k	host
@@ -193,7 +194,7 @@ lat	-
 lat	-
 lat	-
 hostb	tcp_rr --nolog -P 10003 -C 10013 -4
-nsout	LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -4 -c -H __NAT_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 150 100
 
@@ -202,7 +203,7 @@ lat	-
 lat	-
 lat	-
 hostb	tcp_crr --nolog -P 10003 -C 10013 -4
-nsout	LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -4 -c -H __NAT_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 1500 500
 
diff --git a/test/perf/pasta_udp b/test/perf/pasta_udp
index 9fed62e4..146e41b8 100644
--- a/test/perf/pasta_udp
+++ b/test/perf/pasta_udp
@@ -14,6 +14,9 @@
 htools	bc head ip sleep iperf3 udp_rr jq sed
 nstools	ip sleep iperf3 udp_rr jq sed
 
+set	NAT_HOST4 192.0.2.1
+set	NAT_HOST6 2001:db8:9a55::1
+
 test	pasta: throughput and latency (local traffic)
 
 hout	FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1
@@ -133,8 +136,6 @@ te
 
 test	pasta: throughput and latency (traffic via tap)
 
-nsout	GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway'
-nsout	GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway'
 nsout	IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname'
 
 info	Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz
@@ -146,13 +147,13 @@ tr	UDP throughput over IPv6: ns to host
 iperf3s	host 10003
 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header
 
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 30G -l 65472
+iperf3	BW ns __NAT_HOST6__ 10003 __TIME__ __OPTS__ -b 30G -l 65472
 bw	__BW__ 6.0 7.0
 
 iperf3k	host
@@ -162,7 +163,7 @@ lat	-
 lat	-
 lat	-
 hostb	udp_rr --nolog -P 10003 -C 10013 -6
-nsout	LAT udp_rr --nolog -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT udp_rr --nolog -P 10003 -C 10013 -6 -c -H __NAT_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 200 150
 
@@ -171,13 +172,13 @@ tr	UDP throughput over IPv4: ns to host
 iperf3s	host 10003
 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header
 
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -b 8G -l 1472
 bw	__BW__ 0.3 0.5
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -b 12G -l 3972
 bw	__BW__ 0.5 0.8
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -b 20G -l 16356
 bw	__BW__ 3.0 4.0
-iperf3	BW ns __GW__ 10003 __TIME__ __OPTS__ -b 30G -l 65492
+iperf3	BW ns __NAT_HOST4__ 10003 __TIME__ __OPTS__ -b 30G -l 65492
 bw	__BW__ 6.0 7.0
 
 iperf3k	host
@@ -187,7 +188,7 @@ lat	-
 lat	-
 lat	-
 hostb	udp_rr --nolog -P 10003 -C 10013 -4
-nsout	LAT udp_rr --nolog -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p'
+nsout	LAT udp_rr --nolog -P 10003 -C 10013 -4 -c -H __NAT_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p'
 hostw
 lat	__LAT__ 200 150
 
diff --git a/test/run b/test/run
index 3b376639..cd6d7076 100755
--- a/test/run
+++ b/test/run
@@ -101,7 +101,7 @@ run() {
 	VALGRIND=1
 	setup passt_in_ns
 	test passt/ndp
-	test passt/dhcp
+	test passt_in_ns/dhcp
 	test passt_in_ns/icmp
 	test passt_in_ns/tcp
 	test passt_in_ns/udp
@@ -115,7 +115,7 @@ run() {
 	VALGRIND=0
 	setup passt_in_ns
 	test passt/ndp
-	test passt/dhcp
+	test passt_in_ns/dhcp
 	test perf/passt_tcp
 	test perf/passt_udp
 	test perf/pasta_tcp
-- 
2.46.0


* [PATCH 21/22] fwd: Distinguish translatable from untranslatable addresses on inbound
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (19 preceding siblings ...)
  2024-08-16  5:40 ` [PATCH 20/22] conf: Allow address remapped to host to be configured David Gibson
@ 2024-08-16  5:40 ` David Gibson
  2024-08-16  5:40 ` [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address David Gibson
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:40 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

fwd_nat_from_host() needs to adjust the source address for new flows coming
from an address which is not accessible to the guest.  Currently we always
use our_tap_addr or our_tap_ll.  However, where the address is
accessible to the guest via translation (i.e. via --nat-host-loopback),
it makes more sense to use that translation rather than the fallback
mapping of our_tap_*.
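
For illustration only, the selection order described above can be
sketched as a standalone, IPv4-only C fragment.  This is not the code
in fwd.c: struct nat_cfg, ip4(), guest_accessible() and
pick_guest_side_src() are hypothetical names, and guest_accessible()
here only treats 127.0.0.0/8 as inaccessible, which is a simplifying
assumption.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical, IPv4-only stand-in for the fields of struct ctx used here */
struct nat_cfg {
	struct in_addr nat_host_loopback;	/* 0.0.0.0: no mapping configured */
	struct in_addr our_tap_addr;		/* fallback source address */
};

/* Helper for this sketch: parse a dotted-quad literal */
static struct in_addr ip4(const char *s)
{
	struct in_addr a = { 0 };
	int ok = inet_pton(AF_INET, s, &a);

	assert(ok == 1);
	(void)ok;
	return a;
}

/* Assumption: only 127.0.0.0/8 counts as inaccessible to the guest here */
static bool guest_accessible(struct in_addr a)
{
	return (ntohl(a.s_addr) >> 24) != 127;
}

/* Source address the guest should see for a flow arriving from host_src */
static struct in_addr pick_guest_side_src(const struct nat_cfg *cfg,
					   struct in_addr host_src)
{
	/* Translatable: exactly 127.0.0.1, and a mapping is configured */
	if (cfg->nat_host_loopback.s_addr != htonl(INADDR_ANY) &&
	    host_src.s_addr == htonl(INADDR_LOOPBACK))
		return cfg->nat_host_loopback;

	/* Untranslatable and inaccessible: fall back to our own address */
	if (!guest_accessible(host_src))
		return cfg->our_tap_addr;

	/* Already accessible to the guest: keep it unchanged */
	return host_src;
}
```

Unlike the real function, the sketch does not handle the case where
no fallback address is available, and it ignores IPv6 entirely.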

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 fwd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fwd.c b/fwd.c
index 779278a9..7718f7e2 100644
--- a/fwd.c
+++ b/fwd.c
@@ -386,7 +386,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 		return PIF_SPLICE;
 	}
 
-	if (!fwd_guest_accessible(c, &ini->eaddr)) {
+	if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback) &&
+	    inany_equals4(&ini->eaddr, &in4addr_loopback)) {
+		/* Specifically 127.0.0.1, not 127.0.0.0/8 */
+		tgt->oaddr = inany_from_v4(c->ip4.nat_host_loopback);
+	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback) &&
+		   inany_equals6(&ini->eaddr, &in6addr_loopback)) {
+		tgt->oaddr.a6 = c->ip6.nat_host_loopback;
+	} else if (!fwd_guest_accessible(c, &ini->eaddr)) {
 		if (inany_v4(&ini->eaddr)) {
 			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
 				/* No source address we can use */
-- 
2.46.0


* [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (20 preceding siblings ...)
  2024-08-16  5:40 ` [PATCH 21/22] fwd: Distinguish translatable from untranslatable addresses on inbound David Gibson
@ 2024-08-16  5:40 ` David Gibson
  2024-08-20 19:56   ` Stefano Brivio
  2024-08-16 14:45 ` [PATCH 00/22] RFC: Allow configuration of special case NATs Paul Holzinger
  2024-08-19  8:46 ` David Gibson
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-16  5:40 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger, David Gibson

The guest is usually assigned one of the host's IP addresses.  That means
it can't access the host itself via its usual address.  The
--nat-host-loopback option (enabled by default with the gateway address)
allows the guest to contact the host.  However, connections forwarded this
way appear on the host to have originated from the loopback interface,
which isn't always desirable.

Add a new --nat-guest-addr option, which acts similarly, except that
forwarded connections go to the host's external address instead of loopback.

If '-a' is used, so the guest's address is not the same as the host's, this
will instead forward to whatever host-visible site is shadowed by the
guest's assigned address.
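
As a rough sketch of the intended behaviour (not the fwd.c code
itself), the two mappings can be modelled for IPv4 as a pair of
functions, one per direction.  struct map_cfg, ip4(), is_set(),
map_outbound_dst() and map_inbound_src() are hypothetical names, and
the explicit "mapping is configured" checks on the outbound side are
an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>

/* Hypothetical, IPv4-only stand-in for the relevant configuration */
struct map_cfg {
	struct in_addr guest_addr;		/* address assigned to the guest */
	struct in_addr nat_host_loopback;	/* 0.0.0.0: no mapping */
	struct in_addr nat_guest_addr;		/* 0.0.0.0: no mapping */
};

/* Helper for this sketch: parse a dotted-quad literal */
static struct in_addr ip4(const char *s)
{
	struct in_addr a = { 0 };
	int ok = inet_pton(AF_INET, s, &a);

	assert(ok == 1);
	(void)ok;
	return a;
}

static int is_set(struct in_addr a)
{
	return a.s_addr != htonl(INADDR_ANY);
}

/* Guest to host: where a connection to dst should really go */
static struct in_addr map_outbound_dst(const struct map_cfg *cfg,
					struct in_addr dst)
{
	if (is_set(cfg->nat_host_loopback) &&
	    dst.s_addr == cfg->nat_host_loopback.s_addr)
		return ip4("127.0.0.1");	/* host loopback */

	if (is_set(cfg->nat_guest_addr) &&
	    dst.s_addr == cfg->nat_guest_addr.s_addr)
		return cfg->guest_addr;		/* host's external address */

	return dst;
}

/* Host to guest: the reverse translation for the source address */
static struct in_addr map_inbound_src(const struct map_cfg *cfg,
				       struct in_addr src)
{
	if (is_set(cfg->nat_host_loopback) &&
	    src.s_addr == htonl(INADDR_LOOPBACK))
		return cfg->nat_host_loopback;

	if (is_set(cfg->nat_guest_addr) &&
	    src.s_addr == cfg->guest_addr.s_addr)
		return cfg->nat_guest_addr;

	return src;
}
```

With the guest assigned 203.0.113.5 and --nat-guest-addr 192.0.2.2, a
guest connection to 192.0.2.2 would reach 203.0.113.5 on the host, and
host traffic from 203.0.113.5 would appear in the guest as coming from
192.0.2.2.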

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  | 51 ++++++++++++++++++++++++++++++++++-----------------
 fwd.c   | 10 ++++++++++
 passt.1 | 15 +++++++++++++++
 passt.h |  6 ++++++
 4 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/conf.c b/conf.c
index c5831e82..d14abc63 100644
--- a/conf.c
+++ b/conf.c
@@ -825,6 +825,14 @@ static void usage(const char *name, FILE *f, int status)
 	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
 		"    default: gateway address, or none if --no-map-gw is also\n"
 		"             specified\n"
+		"  --nat-guest-addr ADDR	NAT ADDR to guest's address\n"
+		"    Packets from the guest to ADDR will be redirected to the\n"
+		"    address on the host that's the same as the guest's\n"
+		"    assigned address.  Usually that means (one of) the host's\n"
+		"    global address.\n"
+		"    ADDR can be 'none', in which case nothing is mapped\n"
+	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
+		"    default: none\n"
 		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
 		"    can be specified zero to two times (for IPv4 and IPv6)\n"
 		"    default: don't forward DNS queries\n"
@@ -1141,29 +1149,32 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
 }
 
 /**
- * conf_nat() - Parse --nat-host-loopback option
- * @c:		Execution context
- * @arg:	String argument to --nat-host-loopback
- * @no_map_gw:	--no-map-gw flag, updated for "none" argument
+ * conf_nat() - Parse --nat-host-loopback or --nat-guest-addr option
+ * @arg:	String argument to option
+ * @addr4:	IPv4 to update with parsed address
+ * @addr6:	IPv6 to update with parsed address
+ * @no_map_gw:	--no-map-gw flag, or NULL, updated for "none" argument
  */
-static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
+static void conf_nat(const char *arg, struct in_addr *addr4,
+		     struct in6_addr *addr6, int *no_map_gw)
 {
 	if (strcmp(arg, "none") == 0) {
-		c->ip4.nat_host_loopback = in4addr_any;
-		c->ip6.nat_host_loopback = in6addr_any;
-		*no_map_gw = 1;
+		*addr4 = in4addr_any;
+		*addr6 = in6addr_any;
+		if (no_map_gw)
+			*no_map_gw = 1;
 	}
 
-	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
-	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
-	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
-	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
+	if (inet_pton(AF_INET6, arg, addr6)	&&
+	    !IN6_IS_ADDR_UNSPECIFIED(addr6)	&&
+	    !IN6_IS_ADDR_LOOPBACK(addr6)	&&
+	    !IN6_IS_ADDR_MULTICAST(addr6))
 		return;
 
-	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
-	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
-	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
-	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
+	if (inet_pton(AF_INET, arg, addr4)	&&
+	    !IN4_IS_ADDR_UNSPECIFIED(addr4)	&&
+	    !IN4_IS_ADDR_LOOPBACK(addr4)	&&
+	    !IN4_IS_ADDR_MULTICAST(addr4))
 		return;
 
 	die("Invalid address to remap to host: %s", optarg);
@@ -1279,6 +1290,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-copy-addrs", no_argument,		NULL,		19 },
 		{"netns-only",	no_argument,		NULL,		20 },
 		{"nat-host-loopback", required_argument, NULL,		21 },
+		{"nat-guest-addr", required_argument,	NULL,		22 },
 		{ 0 },
 	};
 	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
@@ -1449,7 +1461,12 @@ void conf(struct ctx *c, int argc, char **argv)
 			*userns = 0;
 			break;
 		case 21:
-			conf_nat(c, optarg, &no_map_gw);
+			conf_nat(optarg, &c->ip4.nat_host_loopback,
+				 &c->ip6.nat_host_loopback, &no_map_gw);
+			break;
+		case 22:
+			conf_nat(optarg, &c->ip4.nat_guest_addr,
+				 &c->ip6.nat_guest_addr, NULL);
 			break;
 		case 'd':
 			c->debug = 1;
diff --git a/fwd.c b/fwd.c
index 7718f7e2..ff4789a2 100644
--- a/fwd.c
+++ b/fwd.c
@@ -272,6 +272,10 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
 		tgt->eaddr = inany_loopback4;
 	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_host_loopback))
 		tgt->eaddr = inany_loopback6;
+	else if (inany_equals4(&ini->oaddr, &c->ip4.nat_guest_addr))
+		tgt->eaddr = inany_from_v4(c->ip4.addr);
+	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_guest_addr))
+		tgt->eaddr.a6 = c->ip6.addr;
 	else
 		tgt->eaddr = ini->oaddr;
 
@@ -393,6 +397,12 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback) &&
 		   inany_equals6(&ini->eaddr, &in6addr_loopback)) {
 		tgt->oaddr.a6 = c->ip6.nat_host_loopback;
+	} else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_guest_addr) &&
+		   inany_equals4(&ini->eaddr, &c->ip4.addr)) {
+		tgt->oaddr = inany_from_v4(c->ip4.nat_guest_addr);
+	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_guest_addr) &&
+		   inany_equals6(&ini->eaddr, &c->ip6.addr)) {
+		tgt->oaddr.a6 = c->ip6.nat_guest_addr;
 	} else if (!fwd_guest_accessible(c, &ini->eaddr)) {
 		if (inany_v4(&ini->eaddr)) {
 			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
diff --git a/passt.1 b/passt.1
index 3680056a..7cf553cf 100644
--- a/passt.1
+++ b/passt.1
@@ -350,6 +350,21 @@ as destination, to the host. Implied if there is no gateway on the selected
 default route, or if there is no default route, for any of the enabled address
 families.
 
+.TP
+.BR \-\-nat-guest-addr " " \fIaddr
+Translate \fIaddr\fR in the guest to be equal to the guest's assigned
+address on the host.  That is, packets from the guest to \fIaddr\fR
+will be redirected to the address assigned to the guest with \fB-a\fR,
+or by default the host's global address.  This allows the guest to
+access services available on the host's global address, even though its
+own address shadows that of the host.
+
+If \fIaddr\fR is 'none', no address is mapped.  Only one IPv4 and one
+IPv6 address can be translated; if the option is specified multiple
+times, the last one for each address type takes effect.
+
+Default is no mapping.
+
 .TP
 .BR \-4 ", " \-\-ipv4-only
 Enable IPv4-only operation. IPv6 traffic will be ignored.
diff --git a/passt.h b/passt.h
index 20a5904a..586c1d05 100644
--- a/passt.h
+++ b/passt.h
@@ -104,6 +104,8 @@ enum passt_modes {
  * @guest_gw:		IPv4 gateway as seen by the guest
  * @nat_host_loopback:	Outbound connections to this address are NATted to the
  *                      host's 127.0.0.1
+ * @nat_guest_addr:	Outbound connections to this address are NATted to the
+ *                      guest's assigned address
  * @dns:		DNS addresses for DHCP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
  * @our_tap_addr:	IPv4 address for passt's use on tap
@@ -120,6 +122,7 @@ struct ip4_ctx {
 	int prefix_len;
 	struct in_addr guest_gw;
 	struct in_addr nat_host_loopback;
+	struct in_addr nat_guest_addr;
 	struct in_addr dns[MAXNS + 1];
 	struct in_addr dns_match;
 	struct in_addr our_tap_addr;
@@ -142,6 +145,8 @@ struct ip4_ctx {
  * @guest_gw:		IPv6 gateway as seen by the guest
  * @nat_host_loopback:	Outbound connections to this address are NATted to the
  *                      host's [::1]
+ * @nat_guest_addr:	Outbound connections to this address are NATted to the
+ *                      guest's assigned address
  * @dns:		DNS addresses for DHCPv6 and NDP, zero-terminated
  * @dns_match:		Forward DNS query if sent to this address
  * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
@@ -158,6 +163,7 @@ struct ip6_ctx {
 	struct in6_addr addr_ll_seen;
 	struct in6_addr guest_gw;
 	struct in6_addr nat_host_loopback;
+	struct in6_addr nat_guest_addr;
 	struct in6_addr dns[MAXNS + 1];
 	struct in6_addr dns_match;
 	struct in6_addr our_tap_ll;
-- 
2.46.0


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (21 preceding siblings ...)
  2024-08-16  5:40 ` [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address David Gibson
@ 2024-08-16 14:45 ` Paul Holzinger
  2024-08-16 15:03   ` Stefano Brivio
  2024-08-19  8:46 ` David Gibson
  23 siblings, 1 reply; 55+ messages in thread
From: Paul Holzinger @ 2024-08-16 14:45 UTC (permalink / raw)
  To: David Gibson, Stefano Brivio, passt-dev

Hi,

On 16/08/2024 07:39, David Gibson wrote:
> Based on Stefano's recent patch for faster tests.
>
> Allow the user to specify which addresses are translated when used by
> the guest, rather than always being the gateway address or nothing.
> We also allow this remapping to go to the host's global address (more
> precisely the address assigned to the guest) rather than just host
> loopback.
>
> Suggestions for better names for the new options in patches 20 & 22
> are most welcome.
>
> Along the way to implementing that make many changes to clarify what
> various addresses we track mean, fixing a number of small bugs as
> well.
>
> NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> tests.  I haven't managed to figure out why it's causing the problem,
> or even what the exact triggering conditions are (running the single
> stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> thought I'd get this out for review anyway.
>
> Paul, amongst other things, I think this will allow podman to
> (finally) nicely address #19213, picking an address to remap to the
> host's external address with --nat-guest-addr, much like it already
> uses --dns-forward.

Thanks this looks promising. I will try to test it out next week.

No strong feelings about the naming but how about s/--nat/--map/ for the 
options?

>
> David Gibson (22):
>    treewide: Use "our address" instead of "forwarding address"
>    util: Helper for formatting MAC addresses
>    treewide: Rename MAC address fields for clarity
>    treewide: Use struct assignment instead of memcpy() for IP addresses
>    conf: Use array indices rather than pointers for DNS array slots
>    conf: More accurately count entries added in get_dns()
>    conf: Move DNS array bounds checks into add_dns[46]
>    conf: Move adding of a nameserver from resolv.conf into subfunction
>    conf: Correct setting of dns_match address in add_dns6()
>    conf: Treat --dns addresses as guest visible addresses
>    conf: Remove incorrect initialisation of addr_ll_seen
>    util: Correct sock_l4() binding for link local addresses
>    treewide: Change misleading 'addr_ll' name
>    Clarify which addresses in ip[46]_ctx are meaningful where
>    Initialise our_tap_ll to ip6.gw when suitable
>    fwd: Helpers to clarify what host addresses aren't guest accessible
>    fwd: Split notion of "our tap address" from gateway for IPv4
>    Don't take "our" MAC address from the host
>    conf, fwd: Split notion of gateway/router from guest-visible host
>      address
>    conf: Allow address remapped to host to be configured
>    fwd: Distinguish translatable from untranslatable addresses on inbound
>    fwd, conf: Allow NAT of the guest's assigned address
>
>   arp.c                 |   4 +-
>   conf.c                | 328 +++++++++++++++++++++++++-----------------
>   dhcp.c                |  19 +--
>   dhcpv6.c              |  21 +--
>   flow.c                |  72 +++++-----
>   flow.h                |  18 +--
>   fwd.c                 | 170 +++++++++++++++++-----
>   icmp.c                |   4 +-
>   ndp.c                 |   9 +-
>   passt.1               |  45 +++++-
>   passt.c               |   2 +-
>   passt.h               |  53 +++++--
>   pasta.c               |  14 +-
>   tap.c                 |  12 +-
>   tcp.c                 |  33 ++---
>   tcp_internal.h        |   2 +-
>   test/lib/setup        |  11 +-
>   test/passt_in_ns/dhcp |  73 ++++++++++
>   test/passt_in_ns/tcp  |  38 +++--
>   test/passt_in_ns/udp  |  22 +--
>   test/perf/passt_tcp   |  33 ++---
>   test/perf/passt_udp   |  31 ++--
>   test/perf/pasta_tcp   |  29 ++--
>   test/perf/pasta_udp   |  25 ++--
>   test/run              |   4 +-
>   udp.c                 |  12 +-
>   util.c                |  22 ++-
>   util.h                |   4 +-
>   28 files changed, 719 insertions(+), 391 deletions(-)
>   create mode 100644 test/passt_in_ns/dhcp
>
-- 
Paul Holzinger


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-16 14:45 ` [PATCH 00/22] RFC: Allow configuration of special case NATs Paul Holzinger
@ 2024-08-16 15:03   ` Stefano Brivio
  2024-08-17  8:01     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-16 15:03 UTC (permalink / raw)
  To: Paul Holzinger; +Cc: David Gibson, passt-dev

On Fri, 16 Aug 2024 16:45:14 +0200
Paul Holzinger <pholzing@redhat.com> wrote:

> Hi,
> 
> On 16/08/2024 07:39, David Gibson wrote:
> > Based on Stefano's recent patch for faster tests.
> >
> > Allow the user to specify which addresses are translated when used by
> > the guest, rather than always being the gateway address or nothing.
> > We also allow this remapping to go to the host's global address (more
> > precisely the address assigned to the guest) rather than just host
> > loopback.
> >
> > Suggestions for better names for the new options in patches 20 & 22
> > are most welcome.
> >
> > Along the way to implementing that make many changes to clarify what
> > various addresses we track mean, fixing a number of small bugs as
> > well.
> >
> > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > tests.  I haven't managed to figure out why it's causing the problem,
> > or even what the exact triggering conditions are (running the single
> > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > thought I'd get this out for review anyway.
> >
> > Paul, amongst other things, I think this will allow podman to
> > (finally) nicely address #19213, picking an address to remap to the
> > host's external address with --nat-guest-addr, much like it already
> > uses --dns-forward.  
> 
> Thanks this looks promising. I will try to test it out next week.
> 
> No strong feelings about the naming but how about s/--nat/--map/ for the 
> options?

Exactly the same as I suggested offline a while ago. :) I think it's
easier to understand what it does, that way.

-- 
Stefano


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-16 15:03   ` Stefano Brivio
@ 2024-08-17  8:01     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-17  8:01 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Paul Holzinger, passt-dev

On Fri, Aug 16, 2024 at 05:03:22PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 16:45:14 +0200
> Paul Holzinger <pholzing@redhat.com> wrote:
> 
> > Hi,
> > 
> > On 16/08/2024 07:39, David Gibson wrote:
> > > Based on Stefano's recent patch for faster tests.
> > >
> > > Allow the user to specify which addresses are translated when used by
> > > the guest, rather than always being the gateway address or nothing.
> > > We also allow this remapping to go to the host's global address (more
> > > precisely the address assigned to the guest) rather than just host
> > > loopback.
> > >
> > > Suggestions for better names for the new options in patches 20 & 22
> > > are most welcome.
> > >
> > > Along the way to implementing that make many changes to clarify what
> > > various addresses we track mean, fixing a number of small bugs as
> > > well.
> > >
> > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > tests.  I haven't managed to figure out why it's causing the problem,
> > > or even what the exact triggering conditions are (running the single
> > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > thought I'd get this out for review anyway.
> > >
> > > Paul, amongst other things, I think this will allow podman to
> > > (finally) nicely address #19213, picking an address to remap to the
> > > host's external address with --nat-guest-addr, much like it already
> > > uses --dns-forward.  
> > 
> > Thanks this looks promising. I will try to test it out next week.
> > 
> > No strong feelings about the naming but how about s/--nat/--map/ for the 
> > options?
> 
> Exactly the same as I suggested offline a while ago. :) I think it's
> easier to understand what it does, that way.

Ok.  I think I was going to do that originally but changed it for
reasons that I've now forgotten.  --map is more consistent with
--no-map-gw too, so I'll change this.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/22] treewide: Use "our address" instead of "forwarding address"
  2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
@ 2024-08-18 15:44   ` Stefano Brivio
  2024-08-19  1:28     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-18 15:44 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:42 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> The term "forwarding address" to indicate the local-to-passt address was
> well-intentioned, but ends up being kinda confusing.  As discussed on a
> recent call, let's try "our" instead.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  flow.c         | 72 +++++++++++++++++++++++++-------------------------
>  flow.h         | 18 ++++++-------
>  fwd.c          | 70 ++++++++++++++++++++++++------------------------
>  icmp.c         |  4 +--
>  tcp.c          | 33 ++++++++++++-----------
>  tcp_internal.h |  2 +-
>  udp.c          | 12 ++++-----
>  7 files changed, 106 insertions(+), 105 deletions(-)
> 
> diff --git a/flow.c b/flow.c
> index 93b687dc..8915e366 100644
> --- a/flow.c
> +++ b/flow.c
> @@ -127,18 +127,18 @@ static struct timespec flow_timer_run;
>   * @af:		Address family (AF_INET or AF_INET6)
>   * @eaddr:	Endpoint address (pointer to in_addr or in6_addr)
>   * @eport:	Endpoint port
> - * @faddr:	Forwarding address (pointer to in_addr or in6_addr)
> - * @fport:	Forwarding port
> + * @oaddr:	Our address (pointer to in_addr or in6_addr)
> + * @oport:	Our port
>   */
>  static void flowside_from_af(struct flowside *side, sa_family_t af,
>  			     const void *eaddr, in_port_t eport,
> -			     const void *faddr, in_port_t fport)
> +			     const void *oaddr, in_port_t oport)
>  {
> -	if (faddr)
> -		inany_from_af(&side->faddr, af, faddr);
> +	if (oaddr)
> +		inany_from_af(&side->oaddr, af, oaddr);
>  	else
> -		side->faddr = inany_any6;
> -	side->fport = fport;
> +		side->oaddr = inany_any6;
> +	side->oport = oport;
>  
>  	if (eaddr)
>  		inany_from_af(&side->eaddr, af, eaddr);
> @@ -193,8 +193,8 @@ static int flowside_sock_splice(void *arg)
>   * @tgt:	Target flowside
>   * @data:	epoll reference portion for protocol handlers
>   *
> - * Return: socket fd of protocol @proto bound to the forwarding address and port
> - *         from @tgt (if specified).
> + * Return: socket fd of protocol @proto bound to our address and port from @tgt
> + *         (if specified).
>   */
>  int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
>  		     const struct flowside *tgt, uint32_t data)
> @@ -205,11 +205,11 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
>  
>  	ASSERT(pif_is_socket(pif));
>  
> -	pif_sockaddr(c, &sa, &sl, pif, &tgt->faddr, tgt->fport);
> +	pif_sockaddr(c, &sa, &sl, pif, &tgt->oaddr, tgt->oport);
>  
>  	switch (pif) {
>  	case PIF_HOST:
> -		if (inany_is_loopback(&tgt->faddr))
> +		if (inany_is_loopback(&tgt->oaddr))
>  			ifname = NULL;
>  		else if (sa.sa_family == AF_INET)
>  			ifname = c->ip4.ifname_out;
> @@ -309,11 +309,11 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
>  			  pif_name(f->pif[INISIDE]),
>  			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
>  			  ini->eport,
> -			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
> -			  ini->fport,
> +			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
> +			  ini->oport,
>  			  pif_name(f->pif[TGTSIDE]),
> -			  inany_ntop(&tgt->faddr, fstr1, sizeof(fstr1)),
> -			  tgt->fport,
> +			  inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)),
> +			  tgt->oport,
>  			  inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)),
>  			  tgt->eport);
>  	else if (MAX(state, oldstate) >= FLOW_STATE_INI)
> @@ -321,8 +321,8 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
>  			  pif_name(f->pif[INISIDE]),
>  			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
>  			  ini->eport,
> -			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
> -			  ini->fport);
> +			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
> +			  ini->oport);
>  }
>  
>  /**
> @@ -347,7 +347,7 @@ static void flow_initiate_(union flow *flow, uint8_t pif)
>   * flow_initiate_af() - Move flow to INI, setting INISIDE details
>   * @flow:	Flow to change state
>   * @pif:	pif of the initiating side
> - * @af:		Address family of @eaddr and @faddr
> + * @af:		Address family of @eaddr and @oaddr

Pre-existing, but this made me realise that flow_initiate_af() doesn't
actually take @eaddr and @faddr at all (it's @saddr and @daddr instead).

-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/22] util: Helper for formatting MAC addresses
  2024-08-16  5:39 ` [PATCH 02/22] util: Helper for formatting MAC addresses David Gibson
@ 2024-08-18 15:44   ` Stefano Brivio
  2024-08-19  1:29     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-18 15:44 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:43 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> There are a couple of places where we somewhat messily open code formatting
> an Ethernet like MAC address for display.  Add an eth_ntop() helper for
> this.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c |  7 +++----
>  dhcp.c |  5 ++---
>  util.c | 19 +++++++++++++++++++
>  util.h |  3 +++
>  4 files changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index ed097bdc..830f91a6 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -921,7 +921,8 @@ pasta_opts:
>   */
>  static void conf_print(const struct ctx *c)
>  {
> -	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN], ifn[IFNAMSIZ];
> +	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN];
> +	char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ];
>  	int i;
>  
>  	info("Template interface: %s%s%s%s%s",
> @@ -955,9 +956,7 @@ static void conf_print(const struct ctx *c)
>  		info("Namespace interface: %s", c->pasta_ifn);
>  
>  	info("MAC:");
> -	info("    host: %02x:%02x:%02x:%02x:%02x:%02x",
> -	     c->mac[0], c->mac[1], c->mac[2],
> -	     c->mac[3], c->mac[4], c->mac[5]);
> +	info("    host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac)));
>  
>  	if (c->ifi4) {
>  		if (!c->no_dhcp) {
> diff --git a/dhcp.c b/dhcp.c
> index aa9f59da..acc5b03e 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -276,6 +276,7 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
>  int dhcp(const struct ctx *c, const struct pool *p)
>  {
>  	size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
> +	char macstr[ETH_ADDRSTRLEN];
>  	const struct ethhdr *eh;
>  	const struct iphdr *iph;
>  	const struct udphdr *uh;
> @@ -340,9 +341,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
>  		return -1;
>  	}
>  
> -	info("    from %02x:%02x:%02x:%02x:%02x:%02x",
> -	     m->chaddr[0], m->chaddr[1], m->chaddr[2],
> -	     m->chaddr[3], m->chaddr[4], m->chaddr[5]);
> +	info("    from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr)));
>  
>  	m->yiaddr = c->ip4.addr;
>  	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
> diff --git a/util.c b/util.c
> index 0b414045..892358b1 100644
> --- a/util.c
> +++ b/util.c
> @@ -676,6 +676,25 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
>  	return dst;
>  }
>  
> +/** eth_ntop() - Convert an Ethernet MAC address to text format
> + * @mac:	MAC address
> + * @dst:	output buffer, minimum ETH_ADDRSTRLEN bytes
> + * @size:	size of buffer at @dst

Nit: s/output/Output, s/size/Size

> + *
> + * Return: On success, a non-null pointer to @dst, NULL on failure
> + */
> +const char *eth_ntop(const unsigned char *mac, char *dst, size_t size)
> +{
> +	int len;
> +
> +	len = snprintf(dst, size, "%02x:%02x:%02x:%02x:%02x:%02x",
> +		       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> +	if (len < 0 || (size_t)len >= size)
> +		return NULL;
> +
> +	return dst;
> +}
> +
>  /** str_ee_origin() - Convert socket extended error origin to a string
>   * @ee:		Socket extended error structure
>   *
> diff --git a/util.h b/util.h
> index cb4d181c..c1748074 100644
> --- a/util.h
> +++ b/util.h
> @@ -215,9 +215,12 @@ static inline const char *af_name(sa_family_t af)
>  
>  #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
>  
> +#define ETH_ADDRSTRLEN	(ETH_ALEN * 3)

The fact that this includes two digits plus separator for all non-last
octets of a MAC address, and two digits plus NUL terminator for the
last octet, looks a bit subtle to me.

Defining this as sizeof("00:11:22:33:44:55") wouldn't scream
"off-by-one" as much, to me. Not a strong preference.

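A minimal sketch of the suggestion above, for illustration only. The
eth_ntop() body is copied from the patch. The sizeof() form of
ETH_ADDRSTRLEN and the static assertion are assumptions about how the
respin might look, not the actual follow-up. ETH_ALEN is defined locally
here so the fragment stands alone; in the tree it would come from the
usual Ethernet header.

```c
#include <stddef.h>
#include <stdio.h>

/* Octets in an Ethernet address (defined locally for this sketch) */
#define ETH_ALEN	6

/* Text form plus terminating NUL, spelled out as a sample string */
#define ETH_ADDRSTRLEN	(sizeof("00:11:22:33:44:55"))

/* Both spellings agree: 6 * (two digits + ':' or NUL) = 18 bytes */
_Static_assert(ETH_ADDRSTRLEN == ETH_ALEN * 3,
	       "ETH_ADDRSTRLEN must cover text form and terminator");

/* Same logic as eth_ntop() in the patch: NULL if @dst is too small */
static const char *eth_ntop(const unsigned char *mac, char *dst, size_t size)
{
	int len;

	len = snprintf(dst, size, "%02x:%02x:%02x:%02x:%02x:%02x",
		       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	if (len < 0 || (size_t)len >= size)
		return NULL;

	return dst;
}
```

With either definition a buffer of ETH_ADDRSTRLEN bytes is exactly large
enough, and a buffer one byte shorter makes eth_ntop() return NULL.
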
-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/22] treewide: Rename MAC address fields for clarity
  2024-08-16  5:39 ` [PATCH 03/22] treewide: Rename MAC address fields for clarity David Gibson
@ 2024-08-18 15:45   ` Stefano Brivio
  2024-08-19  1:36     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-18 15:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:44 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> c->mac isn't a great name, because it doesn't say whose mac address it is
> and it's not necessarily obvious in all the contexts we use it.  Since this
> is specifically the address that we (passt/pasta) use on the tap interface,
> rename it to "our_tap_mac".  Rename the "mac_guest" field to "guest_mac"
> to be grammatically consistent.

Wouldn't "our_mac" suffice?

Even the day we get to support other types of link (well, "tap" for a
guest is already not a tap, I know...), or especially multiple links at
the same time, I guess we will still want to use a single MAC address.

-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses
  2024-08-16  5:39 ` [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses David Gibson
@ 2024-08-18 15:45   ` Stefano Brivio
  2024-08-19  1:38     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-18 15:45 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:45 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> We rely on C11 already, so we can use clearer and more type-checkable
> struct assignment instead of memcpy() for copying IP addresses around.
> 
> This exposes some "pointer could be const" warnings from cppcheck, so
> address those too.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c   | 12 ++++++------
>  dhcpv6.c | 10 ++++++----
>  2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 750fdc86..9b05afeb 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -389,14 +389,14 @@ static void add_dns6(struct ctx *c,
>  	/* Guest or container can only access local addresses via redirect */
>  	if (IN6_IS_ADDR_LOOPBACK(addr)) {
>  		if (!c->no_map_gw) {
> -			memcpy(*conf, &c->ip6.gw, sizeof(**conf));
> +			**conf = c->ip6.gw;
>  			(*conf)++;
>  
>  			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
> -				memcpy(&c->ip6.dns_match, addr, sizeof(*addr));
> +				c->ip6.dns_match = *addr;
>  		}
>  	} else {
> -		memcpy(*conf, addr, sizeof(**conf));
> +		**conf = *addr;
>  		(*conf)++;
>  	}
>  
> @@ -632,7 +632,7 @@ static unsigned int conf_ip4(unsigned int ifi,
>  			ip4->prefix_len = 32;
>  	}
>  
> -	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
> +	ip4->addr_seen = ip4->addr;
>  
>  	if (MAC_IS_ZERO(mac)) {
>  		int rc = nl_link_get_mac(nl_sock, ifi, mac);
> @@ -693,8 +693,8 @@ static unsigned int conf_ip6(unsigned int ifi,
>  		return 0;
>  	}
>  
> -	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
> -	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
> +	ip6->addr_seen = ip6->addr;
> +	ip6->addr_ll_seen = ip6->addr_ll;
>  
>  	if (MAC_IS_ZERO(mac)) {
>  		rc = nl_link_get_mac(nl_sock, ifi, mac);
> diff --git a/dhcpv6.c b/dhcpv6.c
> index bbed41dc..87b3c3eb 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -298,7 +298,8 @@ static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
>  {
>  	char buf[INET6_ADDRSTRLEN];
>  	struct in6_addr req_addr;
> -	struct opt_hdr *ia, *h;
> +	const struct opt_hdr *h;
> +	struct opt_hdr *ia;
>  	size_t offset;
>  	int ia_type;
>  
> @@ -312,12 +313,13 @@ ia_ta:
>  		offset += sizeof(struct opt_ia_na);
>  
>  		while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) {
> -			struct opt_ia_addr *opt_addr = (struct opt_ia_addr *)h;
> +			const struct opt_ia_addr *opt_addr
> +				= (const struct opt_ia_addr *)h;

Nit: the assignment could go on its own line, then?

>  			if (ntohs(h->l) != OPT_VSIZE(ia_addr))
>  				return NULL;
>  
> -			memcpy(&req_addr, &opt_addr->addr, sizeof(req_addr));
> +			req_addr = opt_addr->addr;
>  			if (!IN6_ARE_ADDR_EQUAL(la, &req_addr)) {
>  				info("DHCPv6: requested address %s not on link",
>  				     inet_ntop(AF_INET6, &req_addr,
> @@ -363,7 +365,7 @@ static size_t dhcpv6_dns_fill(const struct ctx *c, char *buf, int offset)
>  			srv->hdr.l = 0;
>  		}
>  
> -		memcpy(&srv->addr[i], &c->ip6.dns[i], sizeof(srv->addr[i]));
> +		srv->addr[i] = c->ip6.dns[i];
>  		srv->hdr.l += sizeof(srv->addr[i]);
>  		offset += sizeof(srv->addr[i]);
>  	}

I only reviewed up to this patch so far.

-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/22] treewide: Use "our address" instead of "forwarding address"
  2024-08-18 15:44   ` Stefano Brivio
@ 2024-08-19  1:28     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-19  1:28 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 4897 bytes --]

On Sun, Aug 18, 2024 at 05:44:51PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:42 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > The term "forwarding address" to indicate the local-to-passt address was
> > well-intentioned, but ends up being kinda confusing.  As discussed on a
> > recent call, let's try "our" instead.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  flow.c         | 72 +++++++++++++++++++++++++-------------------------
> >  flow.h         | 18 ++++++-------
> >  fwd.c          | 70 ++++++++++++++++++++++++------------------------
> >  icmp.c         |  4 +--
> >  tcp.c          | 33 ++++++++++++-----------
> >  tcp_internal.h |  2 +-
> >  udp.c          | 12 ++++-----
> >  7 files changed, 106 insertions(+), 105 deletions(-)
> > 
> > diff --git a/flow.c b/flow.c
> > index 93b687dc..8915e366 100644
> > --- a/flow.c
> > +++ b/flow.c
> > @@ -127,18 +127,18 @@ static struct timespec flow_timer_run;
> >   * @af:		Address family (AF_INET or AF_INET6)
> >   * @eaddr:	Endpoint address (pointer to in_addr or in6_addr)
> >   * @eport:	Endpoint port
> > - * @faddr:	Forwarding address (pointer to in_addr or in6_addr)
> > - * @fport:	Forwarding port
> > + * @oaddr:	Our address (pointer to in_addr or in6_addr)
> > + * @oport:	Our port
> >   */
> >  static void flowside_from_af(struct flowside *side, sa_family_t af,
> >  			     const void *eaddr, in_port_t eport,
> > -			     const void *faddr, in_port_t fport)
> > +			     const void *oaddr, in_port_t oport)
> >  {
> > -	if (faddr)
> > -		inany_from_af(&side->faddr, af, faddr);
> > +	if (oaddr)
> > +		inany_from_af(&side->oaddr, af, oaddr);
> >  	else
> > -		side->faddr = inany_any6;
> > -	side->fport = fport;
> > +		side->oaddr = inany_any6;
> > +	side->oport = oport;
> >  
> >  	if (eaddr)
> >  		inany_from_af(&side->eaddr, af, eaddr);
> > @@ -193,8 +193,8 @@ static int flowside_sock_splice(void *arg)
> >   * @tgt:	Target flowside
> >   * @data:	epoll reference portion for protocol handlers
> >   *
> > - * Return: socket fd of protocol @proto bound to the forwarding address and port
> > - *         from @tgt (if specified).
> > + * Return: socket fd of protocol @proto bound to our address and port from @tgt
> > + *         (if specified).
> >   */
> >  int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> >  		     const struct flowside *tgt, uint32_t data)
> > @@ -205,11 +205,11 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
> >  
> >  	ASSERT(pif_is_socket(pif));
> >  
> > -	pif_sockaddr(c, &sa, &sl, pif, &tgt->faddr, tgt->fport);
> > +	pif_sockaddr(c, &sa, &sl, pif, &tgt->oaddr, tgt->oport);
> >  
> >  	switch (pif) {
> >  	case PIF_HOST:
> > -		if (inany_is_loopback(&tgt->faddr))
> > +		if (inany_is_loopback(&tgt->oaddr))
> >  			ifname = NULL;
> >  		else if (sa.sa_family == AF_INET)
> >  			ifname = c->ip4.ifname_out;
> > @@ -309,11 +309,11 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
> >  			  pif_name(f->pif[INISIDE]),
> >  			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
> >  			  ini->eport,
> > -			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
> > -			  ini->fport,
> > +			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
> > +			  ini->oport,
> >  			  pif_name(f->pif[TGTSIDE]),
> > -			  inany_ntop(&tgt->faddr, fstr1, sizeof(fstr1)),
> > -			  tgt->fport,
> > +			  inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)),
> > +			  tgt->oport,
> >  			  inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)),
> >  			  tgt->eport);
> >  	else if (MAX(state, oldstate) >= FLOW_STATE_INI)
> > @@ -321,8 +321,8 @@ static void flow_set_state(struct flow_common *f, enum flow_state state)
> >  			  pif_name(f->pif[INISIDE]),
> >  			  inany_ntop(&ini->eaddr, estr0, sizeof(estr0)),
> >  			  ini->eport,
> > -			  inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)),
> > -			  ini->fport);
> > +			  inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)),
> > +			  ini->oport);
> >  }
> >  
> >  /**
> > @@ -347,7 +347,7 @@ static void flow_initiate_(union flow *flow, uint8_t pif)
> >   * flow_initiate_af() - Move flow to INI, setting INISIDE details
> >   * @flow:	Flow to change state
> >   * @pif:	pif of the initiating side
> > - * @af:		Address family of @eaddr and @faddr
> > + * @af:		Address family of @eaddr and @oaddr
> 
> Pre-existing, but this made me realise that flow_initiate_af() doesn't
> actually take @eaddr and @faddr at all (it's @saddr and @daddr instead).

Oops, yes.  I've folded a fix for that into this patch.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/22] util: Helper for formatting MAC addresses
  2024-08-18 15:44   ` Stefano Brivio
@ 2024-08-19  1:29     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-19  1:29 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 4301 bytes --]

On Sun, Aug 18, 2024 at 05:44:55PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:43 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > There are a couple of places where we somewhat messily open code formatting
> > an Ethernet like MAC address for display.  Add an eth_ntop() helper for
> > this.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c |  7 +++----
> >  dhcp.c |  5 ++---
> >  util.c | 19 +++++++++++++++++++
> >  util.h |  3 +++
> >  4 files changed, 27 insertions(+), 7 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index ed097bdc..830f91a6 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -921,7 +921,8 @@ pasta_opts:
> >   */
> >  static void conf_print(const struct ctx *c)
> >  {
> > -	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN], ifn[IFNAMSIZ];
> > +	char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN];
> > +	char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ];
> >  	int i;
> >  
> >  	info("Template interface: %s%s%s%s%s",
> > @@ -955,9 +956,7 @@ static void conf_print(const struct ctx *c)
> >  		info("Namespace interface: %s", c->pasta_ifn);
> >  
> >  	info("MAC:");
> > -	info("    host: %02x:%02x:%02x:%02x:%02x:%02x",
> > -	     c->mac[0], c->mac[1], c->mac[2],
> > -	     c->mac[3], c->mac[4], c->mac[5]);
> > +	info("    host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac)));
> >  
> >  	if (c->ifi4) {
> >  		if (!c->no_dhcp) {
> > diff --git a/dhcp.c b/dhcp.c
> > index aa9f59da..acc5b03e 100644
> > --- a/dhcp.c
> > +++ b/dhcp.c
> > @@ -276,6 +276,7 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
> >  int dhcp(const struct ctx *c, const struct pool *p)
> >  {
> >  	size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
> > +	char macstr[ETH_ADDRSTRLEN];
> >  	const struct ethhdr *eh;
> >  	const struct iphdr *iph;
> >  	const struct udphdr *uh;
> > @@ -340,9 +341,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
> >  		return -1;
> >  	}
> >  
> > -	info("    from %02x:%02x:%02x:%02x:%02x:%02x",
> > -	     m->chaddr[0], m->chaddr[1], m->chaddr[2],
> > -	     m->chaddr[3], m->chaddr[4], m->chaddr[5]);
> > +	info("    from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr)));
> >  
> >  	m->yiaddr = c->ip4.addr;
> >  	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
> > diff --git a/util.c b/util.c
> > index 0b414045..892358b1 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -676,6 +676,25 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size)
> >  	return dst;
> >  }
> >  
> > +/** eth_ntop() - Convert an Ethernet MAC address to text format
> > + * @mac:	MAC address
> > + * @dst:	output buffer, minimum ETH_ADDRSTRLEN bytes
> > + * @size:	size of buffer at @dst
> 
> Nit: s/output/Output, s/size/Size

Fixed.

> > + *
> > + * Return: On success, a non-null pointer to @dst, NULL on failure
> > + */
> > +const char *eth_ntop(const unsigned char *mac, char *dst, size_t size)
> > +{
> > +	int len;
> > +
> > +	len = snprintf(dst, size, "%02x:%02x:%02x:%02x:%02x:%02x",
> > +		       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> > +	if (len < 0 || (size_t)len >= size)
> > +		return NULL;
> > +
> > +	return dst;
> > +}
> > +
> >  /** str_ee_origin() - Convert socket extended error origin to a string
> >   * @ee:		Socket extended error structure
> >   *
> > diff --git a/util.h b/util.h
> > index cb4d181c..c1748074 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -215,9 +215,12 @@ static inline const char *af_name(sa_family_t af)
> >  
> >  #define SOCKADDR_STRLEN		MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN)
> >  
> > +#define ETH_ADDRSTRLEN	(ETH_ALEN * 3)
> 
> The fact that this includes two digits plus separator for all non-last
> octets of a MAC address, and two digits plus NUL terminator for the
> last octet, looks a bit subtle to me.
> 
> Defining this as sizeof("00:11:22:33:44:55") wouldn't scream
> "off-by-one" as much, to me. Not a strong preference.

Yeah, that makes sense.  Done.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/22] treewide: Rename MAC address fields for clarity
  2024-08-18 15:45   ` Stefano Brivio
@ 2024-08-19  1:36     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-19  1:36 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

On Sun, Aug 18, 2024 at 05:45:00PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:44 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > c->mac isn't a great name, because it doesn't say whose mac address it is
> > and it's not necessarily obvious in all the contexts we use it.  Since this
> > is specifically the address that we (passt/pasta) use on the tap interface,
> > rename it to "our_tap_mac".  Rename the "mac_guest" field to "guest_mac"
> > to be grammatically consistent.
> 
> Wouldn't "our_mac" suffice?

Maybe.  This is supposed to emphasise that this is used on PIF_TAP -
we also (usually) have a MAC address on the host interfaces, though we
don't really need to care about it.

> Even the day we get to support other types of link (well, "tap" for a
> guest is already not a tap, I know...), or especially multiple links at
> the same time, I guess we will still want to use a single MAC address.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses
  2024-08-18 15:45   ` Stefano Brivio
@ 2024-08-19  1:38     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-19  1:38 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 3766 bytes --]

On Sun, Aug 18, 2024 at 05:45:03PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:45 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We rely on C11 already, so we can use clearer and more type-checkable
> > struct assignment instead of memcpy() for copying IP addresses around.
> > 
> > This exposes some "pointer could be const" warnings from cppcheck, so
> > address those too.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c   | 12 ++++++------
> >  dhcpv6.c | 10 ++++++----
> >  2 files changed, 12 insertions(+), 10 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 750fdc86..9b05afeb 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -389,14 +389,14 @@ static void add_dns6(struct ctx *c,
> >  	/* Guest or container can only access local addresses via redirect */
> >  	if (IN6_IS_ADDR_LOOPBACK(addr)) {
> >  		if (!c->no_map_gw) {
> > -			memcpy(*conf, &c->ip6.gw, sizeof(**conf));
> > +			**conf = c->ip6.gw;
> >  			(*conf)++;
> >  
> >  			if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match))
> > -				memcpy(&c->ip6.dns_match, addr, sizeof(*addr));
> > +				c->ip6.dns_match = *addr;
> >  		}
> >  	} else {
> > -		memcpy(*conf, addr, sizeof(**conf));
> > +		**conf = *addr;
> >  		(*conf)++;
> >  	}
> >  
> > @@ -632,7 +632,7 @@ static unsigned int conf_ip4(unsigned int ifi,
> >  			ip4->prefix_len = 32;
> >  	}
> >  
> > -	memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen));
> > +	ip4->addr_seen = ip4->addr;
> >  
> >  	if (MAC_IS_ZERO(mac)) {
> >  		int rc = nl_link_get_mac(nl_sock, ifi, mac);
> > @@ -693,8 +693,8 @@ static unsigned int conf_ip6(unsigned int ifi,
> >  		return 0;
> >  	}
> >  
> > -	memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr));
> > -	memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll));
> > +	ip6->addr_seen = ip6->addr;
> > +	ip6->addr_ll_seen = ip6->addr_ll;
> >  
> >  	if (MAC_IS_ZERO(mac)) {
> >  		rc = nl_link_get_mac(nl_sock, ifi, mac);
> > diff --git a/dhcpv6.c b/dhcpv6.c
> > index bbed41dc..87b3c3eb 100644
> > --- a/dhcpv6.c
> > +++ b/dhcpv6.c
> > @@ -298,7 +298,8 @@ static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p,
> >  {
> >  	char buf[INET6_ADDRSTRLEN];
> >  	struct in6_addr req_addr;
> > -	struct opt_hdr *ia, *h;
> > +	const struct opt_hdr *h;
> > +	struct opt_hdr *ia;
> >  	size_t offset;
> >  	int ia_type;
> >  
> > @@ -312,12 +313,13 @@ ia_ta:
> >  		offset += sizeof(struct opt_ia_na);
> >  
> >  		while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) {
> > -			struct opt_ia_addr *opt_addr = (struct opt_ia_addr *)h;
> > +			const struct opt_ia_addr *opt_addr
> > +				= (const struct opt_ia_addr *)h;
> 
> Nit: the assignment could go on its own line, then?

Good point, done.

> >  			if (ntohs(h->l) != OPT_VSIZE(ia_addr))
> >  				return NULL;
> >  
> > -			memcpy(&req_addr, &opt_addr->addr, sizeof(req_addr));
> > +			req_addr = opt_addr->addr;
> >  			if (!IN6_ARE_ADDR_EQUAL(la, &req_addr)) {
> >  				info("DHCPv6: requested address %s not on link",
> >  				     inet_ntop(AF_INET6, &req_addr,
> > @@ -363,7 +365,7 @@ static size_t dhcpv6_dns_fill(const struct ctx *c, char *buf, int offset)
> >  			srv->hdr.l = 0;
> >  		}
> >  
> > -		memcpy(&srv->addr[i], &c->ip6.dns[i], sizeof(srv->addr[i]));
> > +		srv->addr[i] = c->ip6.dns[i];
> >  		srv->hdr.l += sizeof(srv->addr[i]);
> >  		offset += sizeof(srv->addr[i]);
> >  	}
> 
> I only reviewed up to this patch so far.
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
                   ` (22 preceding siblings ...)
  2024-08-16 14:45 ` [PATCH 00/22] RFC: Allow configuration of special case NATs Paul Holzinger
@ 2024-08-19  8:46 ` David Gibson
  2024-08-19  9:27   ` Stefano Brivio
  23 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-19  8:46 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 3275 bytes --]

On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:
> Based on Stefano's recent patch for faster tests.
> 
> Allow the user to specify which addresses are translated when used by
> the guest, rather than always being the gateway address or nothing.
> We also allow this remapping to go to the host's global address (more
> precisely the address assigned to the guest) rather than just host
> loopback.
> 
> Suggestions for better names for the new options in patches 20 & 22
> are most welcome.
> 
> Along the way to implementing that make many changes to clarify what
> various addresses we track mean, fixing a number of small bugs as
> well.
> 
> NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> tests.  I haven't managed to figure out why it's causing the problem,
> or even what the exact triggering conditions are (running the single
> stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> thought I'd get this out for review anyway.

I've identified the bug here.  IMO, it's a pre-existing problem that
only works by accident at the moment.  The immediate fix is pretty
obvious, but it raises some broader questions.

The problem arises because of the MTU changes we make in order to test
throughput with different packet sizes.  Specifically we change the
MTU to values < 1280, which implicitly disables IPv6 since it requires
an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
re-enabled, but some configuration has been lost in the meantime.

After the MTU is restored the guest reconfigures with NDP, but does
not re-DHCPv6.  That means the guest gets a SLAAC address in the right
prefix but not the exact /128 address we've tried to assign to it.
However, at least with the sequence of things we have in the tests,
the guest never sends any packets with the new address, so passt
doesn't update addr_seen.  When the inbound connection comes we send
it to the assigned address instead of the guest's actual address and
the guest rejects it.

This "worked" previously, because before this patch, passt would
translate the inbound connection to have source/dest as link-local
addresses.  We *do* have a current addr_ll_seen because (a) it won't
change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
the NDP traffic the guest generates will have link-local addresses
that update addr_ll_seen.  With this patch, and a global address for
--map-host-loopback, we now need to send to addr_seen instead of
addr_ll_seen, hence exposing the bug.
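
To make the failure mode concrete, here is a minimal sketch of the
address tracking described above.  It is not passt's actual code: the
struct layout and the helper names guest_ip6_saw() and guest_ip6_dest()
are assumptions for illustration, only the field names addr, addr_seen
and addr_ll_seen come from this discussion.

```c
#include <netinet/in.h>

/* Hypothetical, simplified model of the guest addresses we track.
 * Field names follow the discussion; the layout is an assumption. */
struct guest_ip6 {
	struct in6_addr addr;         /* address we tried to assign */
	struct in6_addr addr_seen;    /* last global source address seen */
	struct in6_addr addr_ll_seen; /* last link-local source address seen */
};

/* Record the source address of a packet received from the guest
 * (hypothetical helper). */
static void guest_ip6_saw(struct guest_ip6 *g, const struct in6_addr *src)
{
	if (IN6_IS_ADDR_LINKLOCAL(src))
		g->addr_ll_seen = *src;
	else
		g->addr_seen = *src;
}

/* Pick the guest address for an inbound packet, matching the scope of
 * the source address we present to the guest (hypothetical helper). */
static const struct in6_addr *guest_ip6_dest(const struct guest_ip6 *g,
					     const struct in6_addr *our_src)
{
	if (IN6_IS_ADDR_LINKLOCAL(our_src))
		return &g->addr_ll_seen;

	return &g->addr_seen;
}
```

In this model, NDP traffic sent from a link-local address after the MTU
is restored refreshes addr_ll_seen only.  addr_seen still holds the old
assigned address until the guest sends something from its new SLAAC
address, so an inbound connection with a global source goes to an
address the guest no longer has.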

In the short term, the obvious fix would be to re-run dhclient -6 in
the guest after we twiddle MTU but before running IPv6 tests.

This kind of opens a question about how hard we should try to
accommodate guests which don't configure themselves how we told them.
Personally I'd be ok with saying that nothing works if the guest
doesn't configure itself properly, thereby removing addr_seen and
addr_ll_seen entirely.  But I think, Stefano, you've been against that
idea in the past.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-19  8:46 ` David Gibson
@ 2024-08-19  9:27   ` Stefano Brivio
  2024-08-19  9:52     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-19  9:27 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Mon, 19 Aug 2024 18:46:31 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:
> > Based on Stefano's recent patch for faster tests.
> > 
> > Allow the user to specify which addresses are translated when used by
> > the guest, rather than always being the gateway address or nothing.
> > We also allow this remapping to go to the host's global address (more
> > precisely the address assigned to the guest) rather than just host
> > loopback.
> > 
> > Suggestions for better names for the new options in patches 20 & 22
> > are most welcome.
> > 
> > Along the way to implementing that make many changes to clarify what
> > various addresses we track mean, fixing a number of small bugs as
> > well.
> > 
> > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > tests.  I haven't managed to figure out why it's causing the problem,
> > or even what the exact triggering conditions are (running the single
> > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > thought I'd get this out for review anyway.  
> 
> I've identified the bug here.  IMO, it's a pre-existing problem that
> only works by accident at the moment.  The immediate fix is pretty
> obvious, but it raises some broader questions
> 
> The problem arises because of the MTU changes we make in order to test
> throughput with different packet sizes.  Specifically we change the
> MTU to values < 1280, which implicitly disables IPv6 since it requires
> an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> re-enabled, but some configuration has been lost in the meantime.
> 
> After the MTU is restored the guest reconfigures with NDP, but does
> not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> prefix but not the exact /128 address we've tried to assign to it.
> However, at least with the sequence of things we have in the tests,
> the guest never sends any packets with the new address, so passt
> doesn't update addr_seen.  When the inbound connection comes we send
> it to the assigned address instead of the guest's actual address and
> the guest rejects it.

I still have to take a closer look, but I'm fairly sure I hit a similar
issue while I was writing these tests originally. I pondered
reconfiguring the address via DHCPv6, or using the keep_addr_on_down
sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
around that time.

Then:

> This "worked" previously, because before this patch, passt would
> translate the inbound connection to have source/dest as link-local
> addresses.

...I realised that this worked and forgot about the whole issue.

> We *do* have a current addr_ll_seen because (a) it won't
> change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> the NDP traffic the guest generates will have link-local addresses
> that update addr_ll_seen.  With this patch, and a global address for
> --map-host-loopback, we now need to send to addr_seen instead of
> addr_ll_seen, hence exposing the bug.
> 
> In the short term, the obvious fix would be to re-run dhclient -6 in
> the guest after we twiddle MTU but before running IPv6 tests.

I guess setting keep_addr_on_down (even for "all" interfaces) should
work as well.

> This kind of opens a question about how hard we should try to
> accommodate guests which don't configure themselves how we told them.

There's a notable distinction between guests temporarily diverging (in
different ways) and guests we don't configure at all.

It's probably more important to ensure we use the right type of address
(security) rather than ensuring we somehow manage to deliver packets at
any time (minor glitch otherwise), also because the one you describe is
something we're unlikely to hit outside of tests.

> Personally I'd be ok with saying that nothing works if the guest
> doesn't configure itself properly, thereby removing addr_seen and
> addr_ll_seen entirely.  But I think, Stefano, you've been against that
> idea in the past.

Yes, I still think we should support guests that don't use DHCPv6 or
NDP at all, or where related exchanges fail for any reason. It improves
reliability and compatibility at a small cost. In this case, I think
it's a nice feature that we would resume communicating as soon as the
guest shows its global unicast address.

If the cost is using the wrong type of address, then no, I'm not
suggesting we do that, so I think the change from this series is
desirable, but in a general case, things just work and we don't break
anything, as far as I know.

-- 
Stefano



* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-19  9:27   ` Stefano Brivio
@ 2024-08-19  9:52     ` David Gibson
  2024-08-19 13:01       ` Stefano Brivio
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-19  9:52 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:
> On Mon, 19 Aug 2024 18:46:31 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:
> > > Based on Stefano's recent patch for faster tests.
> > > 
> > > Allow the user to specify which addresses are translated when used by
> > > the guest, rather than always being the gateway address or nothing.
> > > We also allow this remapping to go to the host's global address (more
> > > precisely the address assigned to the guest) rather than just host
> > > loopback.
> > > 
> > > Suggestions for better names for the new options in patches 20 & 22
> > > are most welcome.
> > > 
> > > Along the way to implementing that make many changes to clarify what
> > > various addresses we track mean, fixing a number of small bugs as
> > > well.
> > > 
> > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > tests.  I haven't managed to figure out why it's causing the problem,
> > > or even what the exact triggering conditions are (running the single
> > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > thought I'd get this out for review anyway.  
> > 
> > I've identified the bug here.  IMO, it's a pre-existing problem that
> > only works by accident at the moment.  The immediate fix is pretty
> > obvious, but it raises some broader questions
> > 
> > The problem arises because of the MTU changes we make in order to test
> > throughput with different packet sizes.  Specifically we change the
> > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > re-enabled, but some configuration has been lost in the meantime.
> > 
> > After the MTU is restored the guest reconfigures with NDP, but does
> > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > prefix but not the exact /128 address we've tried to assign to it.
> > However, at least with the sequence of things we have in the tests,
> > the guest never sends any packets with the new address, so passt
> > doesn't update addr_seen.  When the inbound connection comes we send
> > it to the assigned address instead of the guest's actual address and
> > the guest rejects it.
> 
> I still have to take a closer look, but I'm fairly sure I hit a similar
> issue while I was writing these tests originally. I pondered
> reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> around that time.
> 
> Then:
> 
> > This "worked" previously, because before this patch, passt would
> > translate the inbound connection to have source/dest as link-local
> > addresses.
> 
> ...I realised that this worked and forgot about the whole issue.
> 
> > We *do* have a current addr_ll_seen because (a) it won't
> > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > the NDP traffic the guest generates will have link-local addresses
> > that update addr_ll_seen.  With this patch, and a global address for
> > --map-host-loopback, we now need to send to addr_seen instead of
> > addr_ll_seen, hence exposing the bug.
> > 
> > In the short term, the obvious fix would be to re-run dhclient -6 in
> > the guest after we twiddle MTU but before running IPv6 tests.
> 
> I guess setting keep_addr_on_down (even for "all" interfaces) should
> work as well.

Sounds like it.  I wasn't aware of that one.

/me tests..  actually, no it doesn't work..

# sysctl -a | grep keep_addr_on_down
net.ipv6.conf.all.keep_addr_on_down = 1
net.ipv6.conf.default.keep_addr_on_down = 1
net.ipv6.conf.dummy0.keep_addr_on_down = 1
net.ipv6.conf.lo.keep_addr_on_down = 0
# ip addr add 2001:db8::1 dev dummy0
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8::1/128 scope global 
       valid_lft forever preferred_lft forever
# ip link set dummy0 mtu 1200
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
# ip link set dummy0 mtu 1500
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff

My guess is that IPv6 being deconfigured because of an unsuitable MTU
is considered a different event from a mere "down".

> > This kind of opens a question about how hard we should try to
> > accommodate guests which don't configure themselves how we told them.
> 
> There's a notable distinction between guests temporarily diverging (in
> different ways) and guests we don't configure at all.

I'm not really sure what you're getting at here.

> It's probably more important to ensure we use the right type of address

"type" in what sense here?

> (security) rather than ensuring we somehow manage to deliver packets at
> any time (minor glitch otherwise), also because the one you describe is
> something we're unlikely to hit outside of tests.
> 
> > Personally I'd be ok with saying that nothing works if the guest
> > doesn't configure itself properly, thereby removing addr_seen and
> > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > idea in the past.
> 
> Yes, I still think we should support guests that don't use DHCPv6 or
> NDP at all,

Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
manually configure the interface in the guest to match the address
you've configured with -a.  Just like you'd expect to have to
correctly configure your address on a real network.

> or where related exchanges fail for any reason. It improves
> reliability and compatibility at a small cost. In this case, I think
> it's a nice feature that we would resume communicating as soon as the
> guest shows its global unicast address.

Hm, maybe.  I'm not entirely convinced the cost is so small long term.
It's pretty badly incompatible with having multiple guests behind the
same passt instance: such as the initial guest bridging or routing to
nested guests.

I'm actually not sure if encountering this bug makes me more or less
in favour of addr_seen.  On the one hand I think it highlights the
flakiness of this approach; there are situations where we just won't
know the right address.  On the other hand it shows a relatively
plausible case where the guest won't get exactly the address we want
it to (it uses NDP but not DHCPv6).

Hrm... actually this also shows a potential danger in the recent
patches to disable DAD in the guest.  With DAD enabled, when the guest
grabs a new address, we'd expect it to emit DAD messages, which would
have the side effect of updating our addr_seen (although I'm pretty
sure I hit this bug before the nodad patches were applied, so that
doesn't seem to be foolproof).

We could maybe update addr_seen when we send RA messages to the guest
- assuming that it will use the same host part (low 64-bits) for both
link-local and global addresses.  Not sure if that's a widely safe
assumption or not.
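
As a rough sketch of what that guess would look like: take the upper 64
bits from the prefix we advertise in the RA and the lower 64 bits from
the guest's link-local address.  The helper name slaac_guess() is made
up for illustration, and nothing like it exists in passt as far as this
thread says.

```c
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: guess the guest's SLAAC global address from the
 * advertised /64 prefix and the interface identifier (low 64 bits) of
 * the link-local address we have seen from the guest. */
static struct in6_addr slaac_guess(const struct in6_addr *prefix,
				   const struct in6_addr *ll)
{
	struct in6_addr guess;

	memcpy(&guess.s6_addr[0], &prefix->s6_addr[0], 8); /* network part */
	memcpy(&guess.s6_addr[8], &ll->s6_addr[8], 8);     /* host part */

	return guess;
}
```

The guess only holds for guests that derive both addresses from the
same interface identifier.  Guests using temporary addresses (RFC 4941)
or stable-privacy identifiers (RFC 7217) would generally end up with a
different host part in the global address.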

> If the cost is using the wrong type of address, then no, I'm not
> suggesting we do that, so I think the change from this series is
> desirable, but in a general case, things just work and we don't break
> anything, as far as I know.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-19  9:52     ` David Gibson
@ 2024-08-19 13:01       ` Stefano Brivio
  2024-08-20  0:42         ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-19 13:01 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Mon, 19 Aug 2024 19:52:49 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:
> > On Mon, 19 Aug 2024 18:46:31 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:  
> > > > Based on Stefano's recent patch for faster tests.
> > > > 
> > > > Allow the user to specify which addresses are translated when used by
> > > > the guest, rather than always being the gateway address or nothing.
> > > > We also allow this remapping to go to the host's global address (more
> > > > precisely the address assigned to the guest) rather than just host
> > > > loopback.
> > > > 
> > > > Suggestions for better names for the new options in patches 20 & 22
> > > > are most welcome.
> > > > 
> > > > Along the way to implementing that make many changes to clarify what
> > > > various addresses we track mean, fixing a number of small bugs as
> > > > well.
> > > > 
> > > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > > tests.  I haven't managed to figure out why it's causing the problem,
> > > > or even what the exact triggering conditions are (running the single
> > > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > > thought I'd get this out for review anyway.    
> > > 
> > > I've identified the bug here.  IMO, it's a pre-existing problem that
> > > only works by accident at the moment.  The immediate fix is pretty
> > > obvious, but it raises some broader questions
> > > 
> > > The problem arises because of the MTU changes we make in order to test
> > > throughput with different packet sizes.  Specifically we change the
> > > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > > re-enabled, but some configuration has been lost in the meantime.
> > > 
> > > After the MTU is restored the guest reconfigures with NDP, but does
> > > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > > prefix but not the exact /128 address we've tried to assign to it.
> > > However, at least with the sequence of things we have in the tests,
> > > the guest never sends any packets with the new address, so passt
> > > doesn't update addr_seen.  When the inbound connection comes we send
> > > it to the assigned address instead of the guest's actual address and
> > > the guest rejects it.  
> > 
> > I still have to take a closer look, but I'm fairly sure I hit a similar
> > issue while I was writing these tests originally. I pondered
> > reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> > sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> > around that time.
> > 
> > Then:
> >   
> > > This "worked" previously, because before this patch, passt would
> > > translate the inbound connection to have source/dest as link-local
> > > addresses.  
> > 
> > ...I realised that this worked and forgot about the whole issue.
> >   
> > > We *do* have a current addr_ll_seen because (a) it won't
> > > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > > the NDP traffic the guest generates will have link-local addresses
> > > that update addr_ll_seen.  With this patch, and a global address for
> > > --map-host-loopback, we now need to send to addr_seen instead of
> > > addr_ll_seen, hence exposing the bug.
> > > 
> > > In the short term, the obvious fix would be to re-run dhclient -6 in
> > > the guest after we twiddle MTU but before running IPv6 tests.  
> > 
> > I guess setting keep_addr_on_down (even for "all" interfaces) should
> > work as well.  
> 
> Sounds like it.  I wasn't aware of that one.
> 
> /me tests..  actually, no it doesn't work..
> 
> # sysctl -a | grep keep_addr_on_down
> net.ipv6.conf.all.keep_addr_on_down = 1
> net.ipv6.conf.default.keep_addr_on_down = 1
> net.ipv6.conf.dummy0.keep_addr_on_down = 1
> net.ipv6.conf.lo.keep_addr_on_down = 0
> # ip addr add 2001:db8::1 dev dummy0
> # ip a
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
>     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
>     inet6 2001:db8::1/128 scope global 
>        valid_lft forever preferred_lft forever
> # ip link set dummy0 mtu 1200
> # ip a
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
>     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> # ip link set dummy0 mtu 1500
> # ip a
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
>     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> 
> My guess is that IPv6 being deconfigured because of an unsuitable MTU
> is considered a different event from a mere "down".

I guess it's because they're not IFA_F_PERMANENT: addrconf_notify()
has:

        case NETDEV_CHANGEMTU:
                /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
                if (dev->mtu < IPV6_MIN_MTU) {
                        addrconf_ifdown(dev, dev != net->loopback_dev);
                        break;
                }

but addrconf_ifdown() does:

                                if (!keep_addr ||
                                    !(ifa->flags & IFA_F_PERMANENT) ||
                                    addr_is_local(&ifa->addr)) {
                                        hlist_del_init_rcu(&ifa->addr_lst);
                                        goto restart;
                                }

I'm not sure about the logic behind that. We could actually set those
addresses as permanent once the DHCPv6 client configures them, if it's
cleaner.

> > > This kind of opens a question about how hard we should try to
> > > accommodate guests which don't configure themselves how we told them.  
> > 
> > There's a notable distinction between guests temporarily diverging (in
> > different ways) and guests we don't configure at all.  
> 
> I'm not really sure what you're getting at here.

In this case, it's not true that the guest doesn't configure itself in
the way we requested -- it's just a temporary divergence from that
configuration.

Those are different cases that we can handle in different ways, I
think. If it's a glitch that will only happen during testing, let's
work around that.

But if the guest really ignores DHCPv6 information, I think we should
keep that working.

> > It's probably more important to ensure we use the right type of address  
> 
> "type" in what sense here?

Global unicast instead of link-local.

> > (security) rather than ensuring we somehow manage to deliver packets at
> > any time (minor glitch otherwise), also because the one you describe is
> > something we're unlikely to hit outside of tests.
> >   
> > > Personally I'd be ok with saying that nothing works if the guest
> > > doesn't configure itself properly, thereby removing addr_seen and
> > > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > > idea in the past.  
> > 
> > Yes, I still think we should support guests that don't use DHCPv6 or
> > NDP at all,  
> 
> Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
> manually configure the interface in the guest to match the address
> you've configured with -a.  Just like you'd expect to have to
> correctly configure your address on a real network.

True, but if we make correctness as optional as possible, we'll be more
compatible (less time spent by users fixing situations that don't
necessarily need fixing, less time spent by developers to look into
reports, no matter who's at fault).

> > or where related exchanges fail for any reason. It improves
> > reliability and compatibility at a small cost. In this case, I think
> > it's a nice feature that we would resume communicating as soon as the
> > guest shows its global unicast address.  
> 
> Hm, maybe.  I'm not entirely convinced the cost is so small long term.
> It's pretty badly incompatible with having multiple guests behind the
> same passt instance: such as the initial guest bridging or routing to
> nested guests.

Why? We will need to hash the interface/guest index anyway, for
outbound flows.

And for inbound flows, if a guest steals the address of another guest,
we'll give priority to the normal 'addr' versions instead of the
'_seen' ones, to decide how to direct traffic.
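
Something along these lines, as a sketch only: the multi-guest table and
the guest_lookup() name are hypothetical, nothing like this exists yet,
and it only covers the inbound lookup, not the hashing for outbound
flows.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-guest entry for a future multi-guest setup */
struct guest {
	struct in6_addr addr;      /* address assigned to this guest */
	struct in6_addr addr_seen; /* last global address it was seen using */
};

/* Return the index of the guest that should receive traffic for dst,
 * or -1 if none matches.  Assigned addresses are checked first, so a
 * guest using another guest's address can't take over its traffic. */
static int guest_lookup(const struct guest *guests, size_t n,
			const struct in6_addr *dst)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!memcmp(&guests[i].addr, dst, sizeof(*dst)))
			return (int)i;

	for (i = 0; i < n; i++)
		if (!memcmp(&guests[i].addr_seen, dst, sizeof(*dst)))
			return (int)i;

	return -1;
}
```

A guest that diverges to an otherwise unused address is still reachable
through the second pass, which is the compatibility behaviour the
'_seen' fields are there for.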

> I'm actually not sure if encountering this bug makes me more or less
> in favour of addr_seen.  On the one hand I think it highlights the
> flakiness of this approach; there are situations where we just won't
> know the right address.

I don't understand this argument: indeed, there are such situations,
and they are annoying. Why should we make them more common?

> On the other hand it shows a relatively
> plausible case where the guest won't get exactly the address we want
> it to (it uses NDP but not DHCPv6).
> 
> Hrm... actually this also shows a potential danger in the recent
> patches to disable DAD in the guest.  With DAD enabled, when the guest
> grabs a new address, we'd expect it to emit DAD messages, which would
> have the side effect of updating our addr_seen (although I'm pretty
> sure I hit this patch before the nodad patches were applied, so that
> doesn't seem to be foolproof).

Well, but we do that for containers with --config-net only. In that
case, the addresses we configure have infinite lifetime anyway.

Besides, I don't think we need to have addr_seen updated as quickly and
correctly as possible just for the sake of it, we can also update it
when we get any other neighbour solicitation because the guest is
actually using the network. It's not meant to be perfect.

> We could maybe update addr_seen when we send RA messages to the guest
> - assuming that it will use the same host part (low 64-bits) for both
> link-local and global addresses.  Not sure if that's a widely safe
> assumption or not.

I don't understand: what case are you trying to cover with this?

-- 
Stefano



* Re: [PATCH 12/22] util: Correct sock_l4() binding for link local addresses
  2024-08-16  5:39 ` [PATCH 12/22] util: Correct sock_l4() binding for link local addresses David Gibson
@ 2024-08-20  0:14   ` Stefano Brivio
  2024-08-20  1:29     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20  0:14 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:53 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> When binding an IPv6 socket in sock_l4() we need to supply a scope id if
> the address is link-local.  We check for this by comparing the given
> address to c->ip6.addr_ll.  This is correct only by accident: while
> c->ip6.addr_ll is typically set to the hsot interface's link local
> address, the actually purpose of it is to provide a link local address

Nits: host, actual

-- 
Stefano



* Re: [PATCH 13/22] treewide: Change misleading 'addr_ll' name
  2024-08-16  5:39 ` [PATCH 13/22] treewide: Change misleading 'addr_ll' name David Gibson
@ 2024-08-20  0:15   ` Stefano Brivio
  2024-08-20  1:30     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20  0:15 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:54 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> c->ip6.addr_ll is not like c->ip6.addr.  The latter is an address for the
> guest, but the former is an address for our use on the tap link.  Rename it
> accordingly, to 'our_tap_ll'.

Same as 3/22: could this be "our_ll"? Same here, not a strong
preference.

I reviewed only up to 16/22 so far.

-- 
Stefano



* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-19 13:01       ` Stefano Brivio
@ 2024-08-20  0:42         ` David Gibson
  2024-08-20 20:39           ` Stefano Brivio
  0 siblings, 1 reply; 55+ messages in thread
From: David Gibson @ 2024-08-20  0:42 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

On Mon, Aug 19, 2024 at 03:01:00PM +0200, Stefano Brivio wrote:
> On Mon, 19 Aug 2024 19:52:49 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:
> > > On Mon, 19 Aug 2024 18:46:31 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:  
> > > > > Based on Stefano's recent patch for faster tests.
> > > > > 
> > > > > Allow the user to specify which addresses are translated when used by
> > > > > the guest, rather than always being the gateway address or nothing.
> > > > > We also allow this remapping to go to the host's global address (more
> > > > > precisely the address assigned to the guest) rather than just host
> > > > > loopback.
> > > > > 
> > > > > Suggestions for better names for the new options in patches 20 & 22
> > > > > are most welcome.
> > > > > 
> > > > > Along the way to implementing that make many changes to clarify what
> > > > > various addresses we track mean, fixing a number of small bugs as
> > > > > well.
> > > > > 
> > > > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > > > tests.  I haven't managed to figure out why it's causing the problem,
> > > > > or even what the exact triggering conditions are (running the single
> > > > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > > > thought I'd get this out for review anyway.    
> > > > 
> > > > I've identified the bug here.  IMO, it's a pre-existing problem that
> > > > only works by accident at the moment.  The immediate fix is pretty
> > > > obvious, but it raises some broader questions
> > > > 
> > > > The problem arises because of the MTU changes we make in order to test
> > > > throughput with different packet sizes.  Specifically we change the
> > > > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > > > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > > > re-enabled, but some configuration has been lost in the meantime.
> > > > 
> > > > After the MTU is restored the guest reconfigures with NDP, but does
> > > > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > > > prefix but not the exact /128 address we've tried to assign to it.
> > > > However, at least with the sequence of things we have in the tests,
> > > > the guest never sends any packets with the new address, so passt
> > > > doesn't update addr_seen.  When the inbound connection comes we send
> > > > it to the assigned address instead of the guest's actual address and
> > > > the guest rejects it.  
> > > 
> > > I still have to take a closer look, but I'm fairly sure I hit a similar
> > > issue while I was writing these tests originally. I pondered
> > > reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> > > sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> > > around that time.
> > > 
> > > Then:
> > >   
> > > > This "worked" previously, because before this patch, passt would
> > > > translate the inbound connection to have source/dest as link-local
> > > > addresses.  
> > > 
> > > ...I realised that this worked and forgot about the whole issue.
> > >   
> > > > We *do* have a current addr_ll_seen because (a) it won't
> > > > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > > > the NDP traffic the guest generates will have link-local addresses
> > > > that update addr_ll_seen.  With this patch, and a global address for
> > > > --map-host-loopback, we now need to send to addr_seen instead of
> > > > addr_ll_seen, hence exposing the bug.
> > > > 
> > > > In the short term, the obvious fix would be to re-run dhclient -6 in
> > > > the guest after we twiddle MTU but before running IPv6 tests.  
> > > 
> > > I guess setting keep_addr_on_down (even for "all" interfaces) should
> > > work as well.  
> > 
> > Sounds like it.  I wasn't aware of that one.
> > 
> > /me tests..  actually, no it doesn't work..
> > 
> > # sysctl -a | grep keep_addr_on_down
> > net.ipv6.conf.all.keep_addr_on_down = 1
> > net.ipv6.conf.default.keep_addr_on_down = 1
> > net.ipv6.conf.dummy0.keep_addr_on_down = 1
> > net.ipv6.conf.lo.keep_addr_on_down = 0
> > # ip addr add 2001:db8::1 dev dummy0
> > # ip a
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> >     inet6 2001:db8::1/128 scope global 
> >        valid_lft forever preferred_lft forever
> > # ip link set dummy0 mtu 1200
> > # ip a
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > # ip link set dummy0 mtu 1500
> > # ip a
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > 
> > My guess is that IPv6 being deconfigured because of an unsuitable MTU
> > is considered a different event from a mere "down".
> 
> I guess it's because they're not IFA_F_PERMANENT, because
> addrconf_permanent_addr() has:
> 
>         case NETDEV_CHANGEMTU:
>                 /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
>                 if (dev->mtu < IPV6_MIN_MTU) {
>                         addrconf_ifdown(dev, dev != net->loopback_dev);
>                         break;
>                 }
> 
> but addrconf_ifdown() does:
> 
>                                 if (!keep_addr ||
>                                     !(ifa->flags & IFA_F_PERMANENT) ||
>                                     addr_is_local(&ifa->addr)) {
>                                         hlist_del_init_rcu(&ifa->addr_lst);
>                                         goto restart;
>                                 }
> 
> I'm not sure about the logic behind that. We could actually set those
> addresses as permanent once the DHCPv6 client configures them, if it's
> cleaner.

Huh.  Not in the passt/VM case, though, which is where I actually
encountered this.
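
For reference, the condition quoted from addrconf_ifdown() can be
restated as a standalone predicate.  This is only a sketch: the struct
and function names are made up for illustration, is_local stands in
for the result of the kernel's addr_is_local() check, and the flag
value is copied from <linux/if_addr.h> so that it builds without
kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as IFA_F_PERMANENT in <linux/if_addr.h>, repeated here so the
 * sketch builds without kernel headers.
 */
#define SKETCH_IFA_F_PERMANENT	0x80

/* Hypothetical, simplified stand-in for the kernel's per-address state */
struct sketch_ifaddr {
	uint32_t flags;		/* IFA_F_* flags */
	bool is_local;		/* Stand-in for the result of addr_is_local() */
};

/* Returns true if the address is dropped when IPv6 is stopped on the device,
 * mirroring the three-way condition quoted above.
 */
bool sketch_addr_dropped(const struct sketch_ifaddr *ifa, bool keep_addr)
{
	return !keep_addr ||
	       !(ifa->flags & SKETCH_IFA_F_PERMANENT) ||
	       ifa->is_local;
}
```

Read this way, keep_addr_on_down can only preserve an address that is
both flagged permanent and not considered local.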

> > > > This kind of opens a question about how hard we should try to
> > > > accommodate guests which don't configure themselves how we told them.
> > > 
> > > There's a notable distinction between guests temporarily diverging (in
> > > different ways) and guests we don't configure at all.  
> > 
> > I'm not really sure what you're getting at here.
> 
> In this case, it's not true that the guest doesn't configure itself in
> the way we requested -- it's just a temporary diversion from that
> configuration.

Oh, I see.  Assuming that at some point the DHCP client will re-run.

> Those are different cases that we can handle in different ways, I
> think. If it's a glitch that will only happen during testing, let's
> work around that.
> 
> But if the guest really ignores DHCPv6 information, I think we should
> keep that working.
> 
> > > It's probably more important to ensure we use the right type of address  
> > 
> > "type" in what sense here?
> 
> Global unicast instead of link-local.

Ok.

> > > (security) rather than ensuring we somehow manage to deliver packets at
> > > any time (minor glitch otherwise), also because the one you describe is
> > > something we're unlikely to hit outside of tests.
> > >   
> > > > Personally I'd be ok with saying that nothing works if the guest
> > > > doesn't configure itself properly, thereby removing addr_seen and
> > > > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > > > idea in the past.  
> > > 
> > > Yes, I still think we should support guests that don't use DHCPv6 or
> > > NDP at all,  
> > 
> > Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
> > manually configure the interface in the guest to match the address
> > you've configured with -a.  Just like you'd expect to have to
> > correctly configure your address on a real network.
> 
> True, but if we make correctness as optional as possible, we'll be more
> compatible (less time spent by users fixing situations that don't
> necessarily need fixing, less time spent by developers to look into
> reports, no matter who's at fault).

Eh, maybe.  Unless us trying to make sense of a nonsense situation
causes some unpredictable behaviour that breaks something else.

> > > or where related exchanges fail for any reason. It improves
> > > reliability and compatibility at a small cost. In this case, I think
> > > it's a nice feature that we would resume communicating as soon as the
> > > guest shows its global unicast address.  
> > 
> > Hm, maybe.  I'm not entirely convinced the cost is so small long term.
> > It's pretty badly incompatible with having multiple guests behind the
> > same passt instance: such as the initial guest bridging or routing to
> > nested guests.
> 
> Why? We will need to hash the interface/guest index anyway, for
> outbound flows.

If we have separate interfaces for each guest, yes.  But not if we
have multiple guests behind a single tap because the initial guest
sets up a bridge or routing.  Then we have nothing but the address.

> And for inbound flows, if a guest steals the address of another guest,
> we'll give priority to the normal 'addr' versions instead of the
> '_seen' ones, to decide how to direct traffic.

I don't see how we'd know we're in this situation, so when to
prioritise which address over the other.

> > I'm actually not sure if encountering this bug makes me more or less
> > in favour of addr_seen.  On the one hand I think it highlights the
> > flakiness of this approach; there are situations where we just won't
> > know the right address.
> 
> I don't understand this argument: indeed, there are such situations,
> and they are annoying. Why should we make them more common?

Because predictability is good, and working _most_ of the time is a
failure of predictability.

> > On the other hand it shows a relatively
> > plausible case where the guest won't get exactly the address we want
> > it to (it uses NDP but not DHCPv6).
> > 
> > Hrm... actually this also shows a potential danger in the recent
> > patches to disable DAD in the guest.  With DAD enabled, when the guest
> > grabs a new address, we'd expect it to emit DAD messages, which would
> > have the side effect of updating our addr_seen (although I'm pretty
> > sure I hit this patch before the nodad patches were applied, so that
> > doesn't seem to be foolproof).
> 
> Well, but we do that for containers with --config-net only. In that
> case, the addresses we configure have infinite lifetime anyway.

Oh, good point.  Hrm... then I'm unsure why the guest wasn't re-DADing
its new address.

> Besides, I don't think we need to have addr_seen updated as quickly and
> correctly as possible just for the sake of it, we can also update it
> when we get any other neighbour solicitation because the guest is
> actually using the network. It's not meant to be perfect.

If the guest is a pure server (a common case for containers AFAICT),
then I don't know that we can expect NS messages for anything other
than the default gateway, which is (typically) link-local and so won't
help us to learn the new global address.

> > We could maybe update addr_seen when we send RA messages to the guest
> > - assuming that it will use the same host part (low 64-bits) for both
> > link-local and global addresses.  Not sure if that's a widely safe
> > assumption or not.
> 
> I don't understand: what case are you trying to cover with this?

A case just like the one in the tests: the interface bounces, and we
get NDP traffic on the link-local address, but nothing on the global
address before an inbound connection.
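
To make the idea concrete, here is a rough sketch of deriving a guess
for the guest's global address from the advertised /64 prefix and the
low 64 bits of the link-local address we have already observed.  The
helper name and its parameters are hypothetical, not existing passt
code, and it bakes in exactly the assumption questioned above.

```c
#include <string.h>
#include <netinet/in.h>

/* Hypothetical helper: combine a /64 prefix with the interface identifier
 * (low 64 bits) of an observed link-local address.
 */
void sketch_guess_global(struct in6_addr *guess,
			 const struct in6_addr *prefix,
			 const struct in6_addr *ll_seen)
{
	/* High 64 bits: the prefix advertised to the guest */
	memcpy(&guess->s6_addr[0], &prefix->s6_addr[0], 8);
	/* Low 64 bits: assumed to match the link-local address */
	memcpy(&guess->s6_addr[8], &ll_seen->s6_addr[8], 8);
}
```

A guest using temporary (privacy) addresses picks a different
interface identifier for its global address, so the guess would be
wrong there.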

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/22] util: Correct sock_l4() binding for link local addresses
  2024-08-20  0:14   ` Stefano Brivio
@ 2024-08-20  1:29     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-20  1:29 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

On Tue, Aug 20, 2024 at 02:14:59AM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:53 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > When binding an IPv6 socket in sock_l4() we need to supply a scope id if
> > the address is link-local.  We check for this by comparing the given
> > address to c->ip6.addr_ll.  This is correct only by accident: while
> > c->ip6.addr_ll is typically set to the hsot interface's link local
> > address, the actually purpose of it is to provide a link local address
> 
> Nits: host, actual

Fixed.
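
For readers following along, one way to decide whether a scope id is
needed, without comparing against any particular stored address, is to
test the address class directly.  This is a minimal sketch rather than
the code from the patch; the function name and the ifi parameter are
illustrative only.

```c
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: fill in an IPv6 socket address for bind(), adding a
 * scope id only when the address is link-local.
 */
void sketch_sockaddr6(struct sockaddr_in6 *sa, const struct in6_addr *addr,
		      in_port_t port, unsigned int ifi)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	sa->sin6_port = htons(port);
	sa->sin6_addr = *addr;

	/* Link-local addresses are ambiguous without an interface index */
	if (IN6_IS_ADDR_LINKLOCAL(addr))
		sa->sin6_scope_id = ifi;
}
```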

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/22] treewide: Change misleading 'addr_ll' name
  2024-08-20  0:15   ` Stefano Brivio
@ 2024-08-20  1:30     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-20  1:30 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Tue, Aug 20, 2024 at 02:15:03AM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:54 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > c->ip6.addr_ll is not like c->ip6.addr.  The latter is an address for the
> > guest, but the former is an address for our use on the tap link.  Rename it
> > accordingly, to 'our_tap_ll'.
> 
> Same as 3/22: could this be "our_ll"? Same here, not a strong
> preference.

Same answer here.  Maybe, but I want to emphasise that it's our
address as used on PIF_TAP.  Obviously we may use host LL addresses if
we contact external hosts on the same link as the host.

> 
> I reviewed only up to 16/22 so far.
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible
  2024-08-16  5:39 ` [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible David Gibson
@ 2024-08-20 19:56   ` Stefano Brivio
  2024-08-21  1:40     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 19:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:57 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> We usually avoid NAT, but in a few cases we need to apply address
> translations.  For inbound connections that happens for addresses which
> make sense to the host but are either inaccessible, or mean a different
> location from the guest's point of view.
> 
> Add some helper functions to determine such addresses, and use them in
> fwd_nat_from_host().  In doing so clarify some of the reasons for the
> logic.  We'll also have further use for these helpers in future.
> 
> While we're there fix one unnecessary inconsistency between IPv4 and IPv6.
> We always translated the guest's observed address, but for IPv4 we didn't
> translate the guest's assigned address, whereas for IPv6 we did.  Change
> this to translate both in all cases for consistency.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  fwd.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 87 insertions(+), 11 deletions(-)
> 
> diff --git a/fwd.c b/fwd.c
> index 75dc0151..1baae338 100644
> --- a/fwd.c
> +++ b/fwd.c
> @@ -170,6 +170,85 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
>  		((ini->oport == 53) || (ini->oport == 853));
>  }
>  
> +/**
> + * fwd_guest_accessible4() - Is IPv4 address guest accessible

Nit: I wonder if we should say "guest-accessible" in all these cases,
it's a bit easier for me to decode, but not necessarily more correct.
It's fine by me either way.

-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4
  2024-08-16  5:39 ` [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4 David Gibson
@ 2024-08-20 19:56   ` Stefano Brivio
  2024-08-21  1:56     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 19:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:39:58 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> ip4.gw conflates 3 conceptually different things, which (for now) have the
> same value:
>   1. The router/gateway address as seen by the guest
>   2. An address to NAT to the host if --no-map-gw isn't specified
>   3. An address to use as source when nothing else makes sense
> 
> Case 3 occurs in two situations:
> 
> a) for our DHCP responses - since they come from passt internally there's
>    no naturally meaningful address for them to come from
> b) for forwarded connections coming from an address that isn't guest
>    accessible (localhost or the guest's own address).
> 
> (b) occurs even with --no-map-gw, and the expected behaviour of forwarding
> local connections requires it.
> 
> For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the
> same value as ip6.gw).  For future flexibility we may want to make this
> "address of last resort" different from the gateway address, so split them
> logically for IPv4 as well.
> 
> Specifically, add a new ip4.our_tap_addr field for the address with this
> role, and initialise it to ip4.gw for now.  Unlike IPv6 where we can always
> get a link-local address, we might not be able to get a (non 0.0.0.0)
> address here.  In that case we have to disable DHCP

It's not entirely clear to me in which case we would not be able to
get any address, but at least RFC 2131 doesn't have a problem with this:

diff --git a/dhcp.c b/dhcp.c
index aa9f59d..3de8a6e 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -282,6 +282,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	struct in_addr mask;
 	unsigned int i;
 	struct msg *m;
+	struct in_addr zeroes = { 0 };
 
 	eh  = packet_get(p, 0, offset, sizeof(*eh),  NULL);
 	offset += sizeof(*eh);
@@ -378,7 +379,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
 		opt_set_dns_search(c, sizeof(m->o));
 
 	dlen = offsetof(struct msg, o) + fill(m);
-	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, dlen);
+	tap_udp4_send(c, zeroes, 67, c->ip4.addr, 68, m, dlen);
 
 	return 1;
 }

and:

$ ./pasta -p dhcp.pcap
Saving packet capture to dhcp.pcap
# dhclient
# tshark -r dhcp.pcap
Running as user "root" and group "root". This could be dangerous.
    1   0.000000           :: → ff02::16     ICMPv6 90 Multicast Listener Report Message v2
    2   0.016265      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0x75759d11
    3   0.016361      0.0.0.0 → 88.198.0.164 DHCP 342 DHCP Offer    - Transaction ID 0x75759d11
    4   0.016479      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Request  - Transaction ID 0x75759d11
    5   0.016493      0.0.0.0 → 88.198.0.164 DHCP 342 DHCP ACK      - Transaction ID 0x75759d11
[...]

so this could be a reasonable fallback.

> and forwarding of
> inbound connections with guest-inaccessible source addresses.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c  |  7 ++++++-
>  dhcp.c  |  4 ++--
>  fwd.c   | 10 +++++++---
>  passt.h |  2 ++
>  4 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 954f20ea..9f962fc8 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -660,6 +660,8 @@ static unsigned int conf_ip4(unsigned int ifi,
>  
>  	ip4->addr_seen = ip4->addr;
>  
> +	ip4->our_tap_addr = ip4->gw;
> +
>  	if (MAC_IS_ZERO(mac)) {
>  		int rc = nl_link_get_mac(nl_sock, ifi, mac);
>  		if (rc < 0) {
> @@ -1666,7 +1668,10 @@ void conf(struct ctx *c, int argc, char **argv)
>  		die("External interface not usable");
>  
>  	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
> -		c->no_map_gw = c->no_dhcp = 1;
> +		c->no_map_gw = 1;
> +
> +	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> +		c->no_dhcp = 1;
>  
>  	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
>  		c->no_map_gw = 1;
> diff --git a/dhcp.c b/dhcp.c
> index acc5b03e..a935dc94 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -347,7 +347,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
>  	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
>  	memcpy(opts[1].s,  &mask,        sizeof(mask));
>  	memcpy(opts[3].s,  &c->ip4.gw,   sizeof(c->ip4.gw));
> -	memcpy(opts[54].s, &c->ip4.gw,   sizeof(c->ip4.gw));
> +	memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr));

Nit: this was supposed to look like a table, so it would be nice to add
extra whitespace in the lines above this one.

-- 
Stefano


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address
  2024-08-16  5:40 ` [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address David Gibson
@ 2024-08-20 19:56   ` Stefano Brivio
  2024-08-21  1:59     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 19:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:40:00 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default
> gateway.  We use this for two quite distinct things: advertising the
> gateway that the guest should use (via DHCP, NDP and/or --config-net)
> and for a limited form of NAT.  So that the guest can access services
> on the host, we map the gateway address within the guest to the
> loopback address on the host.
> 
> Using the gateway address for this isn't necessarily the best choice
> for this purpose, certainly not for all circumstances.  So, start off
> by splitting the notion of these into two different values: @guest_gw
> which is the gateway address the guest should use and @nat_host_loopback,
> which is the guest visible address to remap to the host's loopback.
> 
> Usually nat_host_loopback will have the same value as guest_gw.  However
> when --no-map-gw is specified we leave them unspecified instead.  This
> means when we use nat_host_loopback, we don't need to separately check
> c->no_map_gw to see if it's relevant.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c  | 60 +++++++++++++++++++++++++++++----------------------------
>  dhcp.c  | 10 ++++++----
>  fwd.c   |  4 ++--
>  passt.h | 16 +++++++++------
>  pasta.c |  6 ++++--
>  5 files changed, 53 insertions(+), 43 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index b1c58d5b..26373584 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -410,12 +410,12 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver,
>  		 * redirect
>  		 */
>  		if (IN4_IS_ADDR_LOOPBACK(&ns4)) {
> -			if (c->no_map_gw)
> +			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))

If you change the command-line option name to use "map", it would be
good to also change these names.

-- 
Stefano


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 20/22] conf: Allow address remapped to host to be configured
  2024-08-16  5:40 ` [PATCH 20/22] conf: Allow address remapped to host to be configured David Gibson
@ 2024-08-20 19:56   ` Stefano Brivio
  2024-08-21  2:23     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 19:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:40:01 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Because the host and guest share the same IP address with passt/pasta, it's
> not possible for the guest to directly address the host.  Therefore we
> allow packets from the guest going to a special "NAT to host" address to be
> redirected to the host, appearing there as though they have both source and
> destination address of loopback.
> 
> Currently that special address is always the address of the default
> gateway (or none).  That can be a problem if we want that gateway to be
> addressable by the guest.  Therefore, allow the special "NAT to host"
> address to be overridden on the command line with a new --nat-host-loopback
> option.
> 
> In order to exercise and test it, update the passt_in_ns and perf
> tests to use this option and give different mapping addresses for the
> two layers of the environment.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c                | 57 +++++++++++++++++++++++++++++++--
>  passt.1               | 16 ++++++++++
>  test/lib/setup        | 11 +++++--
>  test/passt_in_ns/dhcp | 73 +++++++++++++++++++++++++++++++++++++++++++
>  test/passt_in_ns/tcp  | 38 +++++++++++-----------
>  test/passt_in_ns/udp  | 22 +++++++------
>  test/perf/passt_tcp   | 33 +++++++++----------
>  test/perf/passt_udp   | 31 +++++++++---------
>  test/perf/pasta_tcp   | 29 ++++++++---------
>  test/perf/pasta_udp   | 25 ++++++++-------
>  test/run              |  4 +--
>  11 files changed, 244 insertions(+), 95 deletions(-)
>  create mode 100644 test/passt_in_ns/dhcp
> 
> diff --git a/conf.c b/conf.c
> index 26373584..c5831e82 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -817,6 +817,14 @@ static void usage(const char *name, FILE *f, int status)
>  		fprintf(f, "  --no-dhcp-search	No list in DHCP/DHCPv6/NDP\n");
>  
>  	fprintf(f,
> +		"  --nat-host-loopback ADDR	NAT ADDR to refer to host\n"
> +		"    Packets from the guest to ADDR will be redirected to the\n"
> +		"    host.  On the host such packets will appear to have both\n"
> +		"    source and destination of loopback (127.0.0.1 or ::1).\n"

I would leave these three lines to the man page. The help message is
already 90 lines long. This should be a quick guide/reminder, not a
full description.

This reminds me that 127.0.0.1 isn't the only IPv4 loopback address. I
don't know if anybody will ever have a use case where they would need
a different, specific, loopback source address, but, together with
--nat-guest-addr from 22/22, I start wondering: what if we had a single
option taking, optionally, an arbitrary (within limits) source address?

Now, given that we plan to add a configurable flow table at some point
in the future, it makes no sense to make this exceedingly flexible. But
I just wanted to bring this up for consideration, in case it's doable
at a small cost (I'm really not sure):

  --map-host [source,]address

where "source" would default to 127.0.0.1, but it could also be another
loopback address, or another address altogether (and we'll fail if it's
not local, of course).

If we want to (can?) go that way and keep equivalent functionality as you
have now, we would have the additional problem that this option could
be given up to two times (one for loopback, one for non-loopback), and
not more (we don't have a data structure ready for an arbitrary number
of those), so it's not as generic as it might look, and I'm not
sure if it's a good idea. But we could also expand on it in the future.
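
Just to illustrate the syntax, parsing "[source,]address" could look
roughly like the sketch below.  It is IPv4 only, the struct and
function names are made up, and it doesn't check that the source is
actually a local address, which a real implementation would need to do.

```c
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical result of parsing "[source,]address", IPv4 only */
struct sketch_map_host {
	struct in_addr source;	/* Source address as seen on the host */
	struct in_addr addr;	/* Guest-visible address mapped to the host */
};

/* Returns true on success, false if @arg isn't "[source,]address" */
bool sketch_parse_map_host(const char *arg, struct sketch_map_host *m)
{
	const char *comma = strchr(arg, ',');
	char src[INET_ADDRSTRLEN];

	if (!comma) {
		/* No explicit source: default to 127.0.0.1 */
		m->source.s_addr = htonl(INADDR_LOOPBACK);
		return inet_pton(AF_INET, arg, &m->addr) == 1;
	}

	/* Source part too long to be a dotted-quad address */
	if ((size_t)(comma - arg) >= sizeof(src))
		return false;

	memcpy(src, arg, comma - arg);
	src[comma - arg] = '\0';

	return inet_pton(AF_INET, src, &m->source) == 1 &&
	       inet_pton(AF_INET, comma + 1, &m->addr) == 1;
}
```

With this, "192.0.2.1" maps that address with a 127.0.0.1 source,
while "127.0.0.2,192.0.2.1" picks a different loopback source.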

> +		"    ADDR can be 'none', in which case nothing is mapped\n"

This is a nice feature by the way as it should eventually allow us to
get consistent options in Podman instead of "--map-gw": Podman could
add by default '--map-host-loopback none', unless the user overrides
that with an actual address.

> +	        "    Can be specified zero to two (for IPv4 and IPv6)\n"

"can" (for consistency, but also because the subject is still the
option, this is not a separate sentence).

...times.

> +		"    default: gateway address, or none if --no-map-gw is also\n"
> +		"             specified\n"

I don't think we need to mention here that --no-map-gw implies none,
doing it in the man page is enough.

>  		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
>  		"    can be specified zero to two times (for IPv4 and IPv6)\n"
>  		"    default: don't forward DNS queries\n"
> @@ -959,6 +967,11 @@ static void conf_print(const struct ctx *c)
>  	info("    host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac)));
>  
>  	if (c->ifi4) {
> +		if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
> +			info("    NAT to host 127.0.0.1: %s",
> +			     inet_ntop(AF_INET, &c->ip4.nat_host_loopback,
> +				       buf4, sizeof(buf4)));
> +
>  		if (!c->no_dhcp) {
>  			uint32_t mask;
>  
> @@ -989,6 +1002,11 @@ static void conf_print(const struct ctx *c)
>  	}
>  
>  	if (c->ifi6) {
> +		if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
> +			info("    NAT to host ::1: %s",
> +			     inet_ntop(AF_INET6, &c->ip6.nat_host_loopback,
> +				       buf6, sizeof(buf6)));
> +
>  		if (!c->no_ndp && !c->no_dhcpv6)
>  			info("NDP/DHCPv6:");
>  		else if (!c->no_ndp)
> @@ -1122,6 +1140,35 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
>  	}
>  }
>  
> +/**
> + * conf_nat() - Parse --nat-host-loopback option
> + * @c:		Execution context
> + * @arg:	String argument to --nat-host-loopback
> + * @no_map_gw:	--no-map-gw flag, updated for "none" argument
> + */
> +static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
> +{
> +	if (strcmp(arg, "none") == 0) {
> +		c->ip4.nat_host_loopback = in4addr_any;
> +		c->ip6.nat_host_loopback = in6addr_any;
> +		*no_map_gw = 1;
> +	}
> +
> +	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
> +	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
> +	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
> +	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
> +		return;
> +
> +	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
> +	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
> +	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
> +	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
> +		return;
> +
> +	die("Invalid address to remap to host: %s", optarg);
> +}
> +
>  /**
>   * conf_open_files() - Open files as requested by configuration
>   * @c:		Execution context
> @@ -1231,6 +1278,7 @@ void conf(struct ctx *c, int argc, char **argv)
>  		{"no-copy-routes", no_argument,		NULL,		18 },
>  		{"no-copy-addrs", no_argument,		NULL,		19 },
>  		{"netns-only",	no_argument,		NULL,		20 },
> +		{"nat-host-loopback", required_argument, NULL,		21 },
>  		{ 0 },
>  	};
>  	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> @@ -1400,6 +1448,9 @@ void conf(struct ctx *c, int argc, char **argv)
>  			netns_only = 1;
>  			*userns = 0;
>  			break;
> +		case 21:
> +			conf_nat(c, optarg, &no_map_gw);
> +			break;
>  		case 'd':
>  			c->debug = 1;
>  			c->quiet = 0;
> @@ -1639,10 +1690,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  	    (*c->ip6.ifname_out && !c->ifi6))
>  		die("External interface not usable");
>  
> -	if (c->ifi4 && !no_map_gw)
> +	if (c->ifi4 && !no_map_gw &&
> +	    IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
>  		c->ip4.nat_host_loopback = c->ip4.guest_gw;
>  
> -	if (c->ifi6 && !no_map_gw)
> +	if (c->ifi6 && !no_map_gw &&
> +	    IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
>  		c->ip6.nat_host_loopback = c->ip6.guest_gw;
>  
>  	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> diff --git a/passt.1 b/passt.1
> index dca433b6..3680056a 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -327,6 +327,22 @@ namespace will be silently dropped.
>  Disable Router Advertisements. Router Solicitations coming from guest or target
>  namespace will be ignored.
>  
> +.TP
> +.BR \-\-nat-host-loopback " " \fIaddr
> +Translate \fIaddr\fR to refer to the host. Packets from the guest to
> +\fIaddr\fR will be redirected to the host.  On the host such packets
> +will appear to have both source and destination of loopback (127.0.0.1

I would skip "of loopback" and just say "127.0.0.1 or ::1", to avoid
implying that there's a single loopback address for IPv4.

> +or ::1).
> +
> +If \fIaddr\fR is 'none', no address is mapped (this implies
> +\fB--no-map-gw\fR).  Only one IPv4 and one IPv6 address can be
> +translated, if the option is specified multiple times, the last one
> +takes effect.
> +
> +Default is to translate the guest's default gateway address, unless
> +\fB--no-map-gw\fR is also given, in which case no address is mapped by

Why "also"? You're describing the default, so I guess this option is
not actually given in that case.

> +default.
> +
>  .TP
>  .BR \-\-no-map-gw
>  Don't remap TCP connections and untracked UDP traffic, with the gateway address
> diff --git a/test/lib/setup b/test/lib/setup
> index 9b39b9fe..061bf997 100755
> --- a/test/lib/setup
> +++ b/test/lib/setup
> @@ -124,7 +124,12 @@ setup_passt_in_ns() {
>  	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
>  	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
>  
> -	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
> +        __nat_host4=192.0.2.1
> +        __nat_host6=2001:db8:9a55::1
> +        __nat_ns4=192.0.2.2
> +        __nat_ns6=2001:db8:9a55::2
> +
> +	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --nat-host-loopback ${__nat_host4} --nat-host-loopback ${__nat_host6} --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
>  	wait_for [ -f "${STATESETUP}/pasta.pid" ]
>  
>  	context_setup_nstool qemu ${STATESETUP}/ns.hold
> @@ -139,11 +144,11 @@ setup_passt_in_ns() {
>  	if [ ${VALGRIND} -eq 1 ]; then
>  		context_run passt "make clean"
>  		context_run passt "make valgrind"
> -		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
> +		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
>  	else
>  		context_run passt "make clean"
>  		context_run passt "make"
> -		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
> +		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
>  	fi
>  	wait_for [ -f "${STATESETUP}/passt.pid" ]
>  
> diff --git a/test/passt_in_ns/dhcp b/test/passt_in_ns/dhcp
> new file mode 100644
> index 00000000..48c7d197
> --- /dev/null
> +++ b/test/passt_in_ns/dhcp

...how did this happen? This file already exists.

-- 
Stefano



* Re: [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address
  2024-08-16  5:40 ` [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address David Gibson
@ 2024-08-20 19:56   ` Stefano Brivio
  2024-08-21  2:28     ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 19:56 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Fri, 16 Aug 2024 15:40:03 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> The guest is usually assigned one of the host's IP addresses.  That means
> it can't access the host itself via its usual address.  The
> --nat-host-loopback option (enabled by default with the gateway address)
> allows the guest to contact the host.  However, connections forwarded this
> way appear on the host to have originated from the loopback interface,
> which isn't always desirable.
> 
> Add a new --nat-guest-addr option, which acts similarly but forwarded
> connections will go to the host's external address, instead of loopback.
> 
> If '-a' is used, so the guest's address is not the same as the host's, this
> will instead forward to whatever host-visible site is shadowed by the
> guest's assigned address.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c  | 51 ++++++++++++++++++++++++++++++++++-----------------
>  fwd.c   | 10 ++++++++++
>  passt.1 | 15 +++++++++++++++
>  passt.h |  6 ++++++
>  4 files changed, 65 insertions(+), 17 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index c5831e82..d14abc63 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -825,6 +825,14 @@ static void usage(const char *name, FILE *f, int status)
>  	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
>  		"    default: gateway address, or none if --no-map-gw is also\n"
>  		"             specified\n"
> +		"  --nat-guest-addr ADDR	NAT ADDR to guest's address\n"
> +		"    Packets from the guest to ADDR will be redirected to the\n"
> +		"    address on the host that's the same as the guest's\n"
> +		"    assigned address.  Usually that means (one of) the host's\n"
> +		"    global addresses.\n"

Same as 20/22, it's probably enough to have this in the man page.

> +		"    ADDR can be 'none', in which case nothing is mapped\n"
> +	        "    Can be specified zero to two (for IPv4 and IPv6)\n"

"can", times

> +		"    default: none\n"
>  		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
>  		"    can be specified zero to two times (for IPv4 and IPv6)\n"
>  		"    default: don't forward DNS queries\n"
> @@ -1141,29 +1149,32 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
>  }
>  
>  /**
> - * conf_nat() - Parse --nat-host-loopback option
> - * @c:		Execution context
> - * @arg:	String argument to --nat-host-loopback
> - * @no_map_gw:	--no-map-gw flag, updated for "none" argument
> + * conf_nat() - Parse --nat-host-loopback or --nat-guest-addr option
> + * @arg:	String argument to option
> + * @addr4:	IPv4 to update with parsed address
> + * @addr6:	IPv6 to update with parsed address
> + * @no_map_gw:	--no-map-gw flag, or NULL, updated for "none" argument
>   */
> -static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
> +static void conf_nat(const char *arg, struct in_addr *addr4,
> +		     struct in6_addr *addr6, int *no_map_gw)
>  {
>  	if (strcmp(arg, "none") == 0) {
> -		c->ip4.nat_host_loopback = in4addr_any;
> -		c->ip6.nat_host_loopback = in6addr_any;
> -		*no_map_gw = 1;
> +		*addr4 = in4addr_any;
> +		*addr6 = in6addr_any;
> +		if (no_map_gw)
> +			*no_map_gw = 1;
>  	}
>  
> -	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
> -	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
> -	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
> -	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
> +	if (inet_pton(AF_INET6, arg, addr6)	&&
> +	    !IN6_IS_ADDR_UNSPECIFIED(addr6)	&&
> +	    !IN6_IS_ADDR_LOOPBACK(addr6)	&&
> +	    !IN6_IS_ADDR_MULTICAST(addr6))
>  		return;
>  
> -	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
> -	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
> -	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
> -	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
> +	if (inet_pton(AF_INET, arg, addr4)	&&
> +	    !IN4_IS_ADDR_UNSPECIFIED(addr4)	&&
> +	    !IN4_IS_ADDR_LOOPBACK(addr4)	&&
> +	    !IN4_IS_ADDR_MULTICAST(addr4))
>  		return;
>  
>  	die("Invalid address to remap to host: %s", optarg);
> @@ -1279,6 +1290,7 @@ void conf(struct ctx *c, int argc, char **argv)
>  		{"no-copy-addrs", no_argument,		NULL,		19 },
>  		{"netns-only",	no_argument,		NULL,		20 },
>  		{"nat-host-loopback", required_argument, NULL,		21 },
> +		{"nat-guest-addr", required_argument,	NULL,		22 },
>  		{ 0 },
>  	};
>  	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> @@ -1449,7 +1461,12 @@ void conf(struct ctx *c, int argc, char **argv)
>  			*userns = 0;
>  			break;
>  		case 21:
> -			conf_nat(c, optarg, &no_map_gw);
> +			conf_nat(optarg, &c->ip4.nat_host_loopback,
> +				 &c->ip6.nat_host_loopback, &no_map_gw);
> +			break;
> +		case 22:
> +			conf_nat(optarg, &c->ip4.nat_guest_addr,
> +				 &c->ip6.nat_guest_addr, NULL);
>  			break;
>  		case 'd':
>  			c->debug = 1;
> diff --git a/fwd.c b/fwd.c
> index 7718f7e2..ff4789a2 100644
> --- a/fwd.c
> +++ b/fwd.c
> @@ -272,6 +272,10 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
>  		tgt->eaddr = inany_loopback4;
>  	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_host_loopback))
>  		tgt->eaddr = inany_loopback6;
> +	else if (inany_equals4(&ini->oaddr, &c->ip4.nat_guest_addr))
> +		tgt->eaddr = inany_from_v4(c->ip4.addr);
> +	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_guest_addr))
> +		tgt->eaddr.a6 = c->ip6.addr;
>  	else
>  		tgt->eaddr = ini->oaddr;
>  
> @@ -393,6 +397,12 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
>  	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback) &&
>  		   inany_equals6(&ini->eaddr, &in6addr_loopback)) {
>  		tgt->oaddr.a6 = c->ip6.nat_host_loopback;
> +	} else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_guest_addr) &&
> +		   inany_equals4(&ini->eaddr, &c->ip4.addr)) {
> +		tgt->oaddr = inany_from_v4(c->ip4.nat_guest_addr);
> +	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_guest_addr) &&
> +		   inany_equals6(&ini->eaddr, &c->ip6.addr)) {
> +		tgt->oaddr.a6 = c->ip6.nat_guest_addr;
>  	} else if (!fwd_guest_accessible(c, &ini->eaddr)) {
>  		if (inany_v4(&ini->eaddr)) {
>  			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> diff --git a/passt.1 b/passt.1
> index 3680056a..7cf553cf 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -350,6 +350,21 @@ as destination, to the host. Implied if there is no gateway on the selected
>  default route, or if there is no default route, for any of the enabled address
>  families.
>  
> +.TP
> +.BR \-\-nat-guest-addr " " \fIaddr
> +Translate \fIaddr\fR in the guest to be equal to the guest's assigned
> +address on the host.  That is, packets from the guest to \fIaddr\fR
> +will be redirected to the address assigned to the guest with \fB-a\fR,
> +or by default the host's global address.  This allows the guest to
> +access services available on the host's global address, even though its
> +own address shadows that of the host.
> +
> +If \fIaddr\fR is 'none', no address is mapped.  Only one IPv4 and one
> +IPv6 address can be translated, if the option is specified multiple

, and if

> +times, the last one for each address type takes effect.
> +
> +Default is no mapping.
> +
>  .TP
>  .BR \-4 ", " \-\-ipv4-only
>  Enable IPv4-only operation. IPv6 traffic will be ignored.
> diff --git a/passt.h b/passt.h
> index 20a5904a..586c1d05 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -104,6 +104,8 @@ enum passt_modes {
>   * @guest_gw:		IPv4 gateway as seen by the guest
>   * @nat_host_loopback:	Outbound connections to this address are NATted to the
>   *                      host's 127.0.0.1
> + * @nat_guest_addr:	Outbound connections to this address are NATted to the
> + *                      guest's assigned address
>   * @dns:		DNS addresses for DHCP, zero-terminated
>   * @dns_match:		Forward DNS query if sent to this address
>   * @our_tap_addr:	IPv4 address for passt's use on tap
> @@ -120,6 +122,7 @@ struct ip4_ctx {
>  	int prefix_len;
>  	struct in_addr guest_gw;
>  	struct in_addr nat_host_loopback;
> +	struct in_addr nat_guest_addr;
>  	struct in_addr dns[MAXNS + 1];
>  	struct in_addr dns_match;
>  	struct in_addr our_tap_addr;
> @@ -142,6 +145,8 @@ struct ip4_ctx {
>   * @guest_gw:		IPv6 gateway as seen by the guest
>   * @nat_host_loopback:	Outbound connections to this address are NATted to the
>   *                      host's [::1]
> + * @nat_guest_addr:	Outbound connections to this address are NATted to the
> + *                      guest's assigned address
>   * @dns:		DNS addresses for DHCPv6 and NDP, zero-terminated
>   * @dns_match:		Forward DNS query if sent to this address
>   * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
> @@ -158,6 +163,7 @@ struct ip6_ctx {
>  	struct in6_addr addr_ll_seen;
>  	struct in6_addr guest_gw;
>  	struct in6_addr nat_host_loopback;
> +	struct in6_addr nat_guest_addr;
>  	struct in6_addr dns[MAXNS + 1];
>  	struct in6_addr dns_match;
>  	struct in6_addr our_tap_ll;
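
For reference, the outbound half of the translation added to
fwd_nat_from_tap() can be sketched for IPv4 alone as below. This is only an
illustration under assumptions: struct nat4_conf and nat4_from_guest() are
hypothetical names, not passt's, and the explicit check for an unspecified
destination is an addition of the sketch, not something the patch does.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical, IPv4-only subset of the configuration discussed here */
struct nat4_conf {
	struct in_addr nat_host_loopback;	/* 0.0.0.0 means "not mapped" */
	struct in_addr nat_guest_addr;		/* 0.0.0.0 means "not mapped" */
	struct in_addr addr;			/* address assigned to guest */
};

/* Return the host-side destination for a guest packet sent to @dst */
static struct in_addr nat4_from_guest(const struct nat4_conf *c,
				      struct in_addr dst)
{
	struct in_addr lo = { .s_addr = htonl(INADDR_LOOPBACK) };

	assert(c);

	/* Assumption of this sketch: an unset mapping never matches */
	if (dst.s_addr == htonl(INADDR_ANY))
		return dst;

	if (dst.s_addr == c->nat_host_loopback.s_addr)
		return lo;		/* appears on the host as loopback */

	if (dst.s_addr == c->nat_guest_addr.s_addr)
		return c->addr;		/* host address shadowed by guest */

	return dst;			/* everything else is untranslated */
}
```

The loopback mapping is checked first, matching the order of the branches
in the hunk above.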

-- 
Stefano



* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-20  0:42         ` David Gibson
@ 2024-08-20 20:39           ` Stefano Brivio
  2024-08-21  2:51             ` David Gibson
  0 siblings, 1 reply; 55+ messages in thread
From: Stefano Brivio @ 2024-08-20 20:39 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Paul Holzinger

On Tue, 20 Aug 2024 10:42:17 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Aug 19, 2024 at 03:01:00PM +0200, Stefano Brivio wrote:
> > On Mon, 19 Aug 2024 19:52:49 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:  
> > > > On Mon, 19 Aug 2024 18:46:31 +1000
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:    
> > > > > > Based on Stefano's recent patch for faster tests.
> > > > > > 
> > > > > > Allow the user to specify which addresses are translated when used by
> > > > > > the guest, rather than always being the gateway address or nothing.
> > > > > > We also allow this remapping to go to the host's global address (more
> > > > > > precisely the address assigned to the guest) rather than just host
> > > > > > loopback.
> > > > > > 
> > > > > > Suggestions for better names for the new options in patches 20 & 22
> > > > > > are most welcome.
> > > > > > 
> > > > > > Along the way to implementing that make many changes to clarify what
> > > > > > various addresses we track mean, fixing a number of small bugs as
> > > > > > well.
> > > > > > 
> > > > > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > > > > tests.  I haven't managed to figure out why it's causing the problem,
> > > > > > or even what the exact triggering conditions are (running the single
> > > > > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > > > > thought I'd get this out for review anyway.      
> > > > > 
> > > > > I've identified the bug here.  IMO, it's a pre-existing problem that
> > > > > only works by accident at the moment.  The immediate fix is pretty
> > > > > obvious, but it raises some broader questions
> > > > > 
> > > > > The problem arises because of the MTU changes we make in order to test
> > > > > throughput with different packet sizes.  Specifically we change the
> > > > > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > > > > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > > > > re-enabled, but some configuration has been lost in the meantime.
> > > > > 
> > > > > After the MTU is restored the guest reconfigures with NDP, but does
> > > > > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > > > > prefix but not the exact /128 address we've tried to assign to it.
> > > > > However, at least with the sequence of things we have in the tests,
> > > > > the guest never sends any packets with the new address, so passt
> > > > > doesn't update addr_seen.  When the inbound connection comes we send
> > > > > it to the assigned address instead of the guest's actual address and
> > > > > the guest rejects it.    
> > > > 
> > > > I still have to take a closer look, but I'm fairly sure I hit a similar
> > > > issue while I was writing these tests originally. I pondered
> > > > reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> > > > sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> > > > around that time.
> > > > 
> > > > Then:
> > > >     
> > > > > This "worked" previously, because before this patch, passt would
> > > > > translate the inbound connection to have source/dest as link-local
> > > > > addresses.    
> > > > 
> > > > ...I realised that this worked and forgot about the whole issue.
> > > >     
> > > > > We *do* have a current addr_ll_seen because (a) it won't
> > > > > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > > > > the NDP traffic the guest generates will have link-local addresses
> > > > > that update addr_ll_seen.  With this patch, and a global address for
> > > > > --nat-host-loopback, we now need to send to addr_seen instead of
> > > > > addr_ll_seen, hence exposing the bug.
> > > > > 
> > > > > In the short term, the obvious fix would be to re-run dhclient -6 in
> > > > > the guest after we twiddle MTU but before running IPv6 tests.    
> > > > 
> > > > I guess setting keep_addr_on_down (even for "all" interfaces) should
> > > > work as well.    
> > > 
> > > Sounds like it.  I wasn't aware of that one.
> > > 
> > > /me tests..  actually, no it doesn't work..
> > > 
> > > # sysctl -a | grep keep_addr_on_down
> > > net.ipv6.conf.all.keep_addr_on_down = 1
> > > net.ipv6.conf.default.keep_addr_on_down = 1
> > > net.ipv6.conf.dummy0.keep_addr_on_down = 1
> > > net.ipv6.conf.lo.keep_addr_on_down = 0
> > > # ip addr add 2001:db8::1 dev dummy0
> > > # ip a
> > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > >     inet6 2001:db8::1/128 scope global 
> > >        valid_lft forever preferred_lft forever
> > > # ip link set dummy0 mtu 1200
> > > # ip a
> > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > 2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
> > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > > # ip link set dummy0 mtu 1500
> > > # ip a
> > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > > 
> > > My guess is that IPv6 being deconfigured because of an unsuitable MTU
> > > is considered a different event from a mere "down".  
> > 
> > I guess it's because they're not IFA_F_PERMANENT, because
> > addrconf_permanent_addr() has:
> > 
> >         case NETDEV_CHANGEMTU:
> >                 /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
> >                 if (dev->mtu < IPV6_MIN_MTU) {
> >                         addrconf_ifdown(dev, dev != net->loopback_dev);
> >                         break;
> >                 }
> > 
> > but addrconf_ifdown() does:
> > 
> >                                 if (!keep_addr ||
> >                                     !(ifa->flags & IFA_F_PERMANENT) ||
> >                                     addr_is_local(&ifa->addr)) {
> >                                         hlist_del_init_rcu(&ifa->addr_lst);
> >                                         goto restart;
> >                                 }
> > 
> > I'm not sure about the logic behind that. We could actually set those
> > addresses as permanent once the DHCPv6 client configures them, if it's
> > cleaner.  
> 
> Huh.  Not in the passt/VM case, though, which is where I actually
> encountered this.

I meant using ip(8) from the test script itself, but it doesn't
actually make sense:

# ip address change 2a01:4f8:222:904:c800:94ff:fe29:a8d/64 permanent dev eth0
Warning: permanent option is not mutable from userspace

because (RFC 3549):

   IFA_F_PERMANENT  For a permanent address set by the user.
                    When this is not set, it means the address
                    was dynamically created (e.g., by stateless
                    autoconfiguration).

So the address you used in your test _should_ have IFA_F_PERMANENT. The
plot thickens.

I just tried this, which confirms your hypothesis that bringing the
link down is a different event:

# ip addr add 2001:db8::1 dev dummy0
# ip link set dummy0 down
# ip addr show dev dummy0
5: dummy0: <BROADCAST,NOARP> mtu 1280 qdisc noqueue state DOWN group default qlen 1000
    link/ether 02:59:00:28:1b:5f brd ff:ff:ff:ff:ff:ff
    inet 1.2.3.1/24 scope global dummy0
       valid_lft forever preferred_lft forever
    inet6 2001:db8::1/128 scope global 
       valid_lft forever preferred_lft forever
# ip link set dummy0 mtu 1279
# ip addr show dev dummy0
5: dummy0: <BROADCAST,NOARP> mtu 1279 qdisc noqueue state DOWN group default qlen 1000
    link/ether 02:59:00:28:1b:5f brd ff:ff:ff:ff:ff:ff
    inet 1.2.3.1/24 scope global dummy0
       valid_lft forever preferred_lft forever

...I just can't see that from the code.

> > > > > This kind of opens a question about how hard we should try to
> > > > > accommodate guests which don't configure themselves how we told them.    
> > > > 
> > > > There's a notable distinction between guests temporarily diverging (in
> > > > different ways) and guests we don't configure at all.    
> > > 
> > > I'm not really sure what you're getting at here.  
> > 
> > In this case, it's not true that the guest doesn't configure itself in
> > the way we requested -- it's just a temporary diversion from that
> > configuration.  
> 
> Oh, I see.  Assuming that at some point the DHCP client will re-run.
> 
> > Those are different cases that we can handle in different ways, I
> > think. If it's a glitch that will only happen during testing, let's
> > work around that.
> > 
> > But if the guest really ignores DHCPv6 information, I think we should
> > keep that working.
> >   
> > > > It's probably more important to ensure we use the right type of address    
> > > 
> > > "type" in what sense here?  
> > 
> > Global unicast instead of link-local.  
> 
> Ok.
> 
> > > > (security) rather than ensuring we somehow manage to deliver packets at
> > > > any time (minor glitch otherwise), also because the one you describe is
> > > > something we're unlikely to hit outside of tests.
> > > >     
> > > > > Personally I'd be ok with saying that nothing works if the guest
> > > > > doesn't configure itself properly, thereby removing addr_seen and
> > > > > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > > > > idea in the past.    
> > > > 
> > > > Yes, I still think we should support guests that don't use DHCPv6 or
> > > > NDP at all,    
> > > 
> > > Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
> > > manually configure the interface in the guest to match the address
> > > you've configured with -a.  Just like you'd expect to have to
> > > correctly configure your address on a real network.  
> > 
> > True, but if we make correctness as optional as possible, we'll be more
> > compatible (less time spent by users fixing situations that don't
> > necessarily need fixing, less time spent by developers to look into
> > reports, no matter who's at fault).  
> 
> Eh, maybe.  Unless us trying to make sense of a nonsense situation
> causes some unpredictable behaviour that breaks something else.
> 
> > > > or where related exchanges fail for any reason. It improves
> > > > reliability and compatibility at a small cost. In this case, I think
> > > > it's a nice feature that we would resume communicating as soon as the
> > > > guest shows its global unicast address.    
> > > 
> > > Hm, maybe.  I'm not entirely convinced the cost is so small long term.
> > > It's pretty badly incompatible with having multiple guests behind the
> > > same passt instance: such as the initial guest bridging or routing to
> > > nested guests.  
> > 
> > Why? We will need to hash the interface/guest index anyway, for
> > outbound flows.  
> 
> If we have separate interfaces for each guest, yes.  But not if we
> have multiple guests behind a single tap because the initial guest
> sets up a bridge or routing.  Then we have nothing but the address.

...but then we should have multiple addresses anyway. By the way, I'm
not sure we'll ever be able to support that kind of configuration. How
does a guest set up a bridge and use passt at the same time?

> > And for inbound flows, if a guest steals the address of another guest,
> > we'll give priority to the normal 'addr' versions instead of the
> > '_seen' ones, to decide how to direct traffic.  
> 
> I don't see how we'd know we're in this situation, so when to
> prioritise which address over the other.

In the set of all addr_seen and addr values, at least one value would be
non-unique. Or, practically speaking, we should refuse to set
addr_seen if it matches addr for another guest.
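
A rough sketch of that refusal, assuming a hypothetical array of per-guest
state (passt tracks a single guest today, so struct guest and
update_addr_seen() are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical per-guest state, not an existing passt structure */
struct guest {
	struct in6_addr addr;		/* address we assigned */
	struct in6_addr addr_seen;	/* address last observed in use */
};

/**
 * update_addr_seen() - Record an observed source address for one guest
 * @g:		Array of guests
 * @n:		Number of entries in @g
 * @idx:	Index of the guest the packet came from
 * @saddr:	Source address observed
 *
 * Return: true if addr_seen was updated, false if @saddr is assigned to a
 *	   different guest and the update was refused
 */
static bool update_addr_seen(struct guest *g, size_t n, size_t idx,
			     const struct in6_addr *saddr)
{
	size_t i;

	assert(g && saddr && idx < n);

	for (i = 0; i < n; i++) {
		if (i == idx)
			continue;

		/* Somebody else owns this address: don't learn it */
		if (!memcmp(&g[i].addr, saddr, sizeof(*saddr)))
			return false;
	}

	g[idx].addr_seen = *saddr;
	return true;
}
```

With this, inbound traffic for an assigned address keeps going to the guest
it was assigned to, whatever another guest claims.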

> > > I'm actually not sure if encountering this bug makes me more or less
> > > in favour of addr_seen.  On the one hand I think it highlights the
> > > flakiness of this approach; there are situations where we just won't
> > > know the right address.  
> > 
> > I don't understand this argument: indeed, there are such situations,
> > and they are annoying. Why should we make them more common?  
> 
> Because predictability is good, and working _most_ of the time is a
> failure of predictability.

It avoids substantial effort and frustration for everybody involved
though. The practical problem with a lack of predictability is that it
can make things harder to debug, I guess, which shouldn't be the case here.

> > > On the other hand it shows a relatively
> > > plausible case where the guest won't get exactly the address we want
> > > it to (it uses NDP but not DHCPv6)
> > > 
> > > Hrm... actually this also shows a potential danger in the recent
> > > patches to disable DAD in the guest.  With DAD enabled, when the guest
> > > grabs a new address, we'd expect it to emit DAD messages, which would
> > > have the side effect of updating our addr_seen (although I'm pretty
> > > sure I hit this patch before the nodad patches were applied, so that
> > > doesn't seem to be foolproof).  
> > 
> > Well, but we do that for containers with --config-net only. In that
> > case, the addresses we configure have infinite lifetime anyway.  
> 
> Oh, good point.  Hrm... then I'm unsure why the guest wasn't re-DADing
> its new address.

It probably did, but we ignored that anyway because DAD is done by
sending neighbour solicitations with an unspecified address as source,
for example (the "change" here drops "nodad"):

$ ./pasta --config-net -p dad.pcap
Saving packet capture to dad.pcap
# ip addr change dev enp9s0 fe80::3882:b5ff:fe01:e9a1/64
# tshark -r dad.pcap |grep Neigh
Running as user "root" and group "root". This could be dangerous.
   10   2.642467           :: → ff02::1:ff01:e9a1 ICMPv6 86 Neighbor Solicitation for fe80::3882:b5ff:fe01:e9a1

and in tap6_handler() we do:

                } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){
                        c->ip6.addr_seen = *saddr;
                }

...then, in ndp():

                if (IN6_IS_ADDR_UNSPECIFIED(saddr))
                        return 1;

we could set addr_seen by looking at the *target* address of the
neighbour solicitation when the source address is ::, but it's not
implemented yet.
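
Something along these lines might do, as a minimal sketch only: struct
ns_view, struct seen and learn_from_dad() are hypothetical names, and the
real thing would have to live in ndp() or tap6_handler() and use the
actual packet layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>

/* Hypothetical view of a neighbour solicitation: only what we need here */
struct ns_view {
	struct in6_addr saddr;	/* IPv6 source address of the packet */
	struct in6_addr target;	/* Target Address field of the NS */
};

/* Hypothetical stand-in for the two "seen" addresses in struct ip6_ctx */
struct seen {
	struct in6_addr addr_seen;	/* last global address seen */
	struct in6_addr addr_ll_seen;	/* last link-local address seen */
};

/**
 * learn_from_dad() - Record the tentative address from a DAD probe
 * @s:		Seen addresses to update
 * @ns:		Neighbour solicitation received from the guest
 *
 * Return: true if @s was updated, false if @ns isn't a usable DAD probe
 */
static bool learn_from_dad(struct seen *s, const struct ns_view *ns)
{
	assert(s && ns);

	/* DAD probes come from ::, anything else is handled elsewhere */
	if (!IN6_IS_ADDR_UNSPECIFIED(&ns->saddr))
		return false;

	/* The target must be a plausible unicast address */
	if (IN6_IS_ADDR_UNSPECIFIED(&ns->target) ||
	    IN6_IS_ADDR_MULTICAST(&ns->target))
		return false;

	if (IN6_IS_ADDR_LINKLOCAL(&ns->target))
		s->addr_ll_seen = ns->target;
	else
		s->addr_seen = ns->target;

	return true;
}
```

Note that the target of a DAD probe is still tentative at that point, so
this would record an address slightly before the guest is entitled to use
it.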

> > Besides, I don't think we need to have addr_seen updated as quickly and
> > correctly as possible just for the sake of it, we can also update it
> > when we get any other neighbour solicitation because the guest is
> > actually using the network. It's not meant to be perfect.  
> 
> If the guest is a pure server (a common case for containers AFAICT),
> then I don't know that we can expect NS messages for anything other
> than the default gateway, which is (typically) link-local and so won't
> help us to learn the new global address.

Containers running actual applications are noisy. I've only seen this
kind of problem (addr_seen not set/matching) in particularly crafted
test environments.

> > > We could maybe update addr_seen when we send RA messages to the guest
> > > - assuming that it will use the same host part (low 64-bits) for both
> > > link-local and global addresses.  Not sure if that's a widely safe
> > > assumption or not.  
> > 
> > I don't understand: what case are you trying to cover with this?  
> 
> A case just like the one in the tests: the interface bounces, and we
> get NDP traffic on the link-local address, but nothing on the global
> address before an inbound connection.

Oh, I see. That makes sense: we'd set addr_seen a bit too early, but
not early enough to be a practical issue, I think.
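
As a sketch of what that guess could look like, assuming a /64 prefix and
that the guest reuses its link-local interface identifier (guess_global()
is a hypothetical helper, nothing like it exists in passt yet):

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/**
 * guess_global() - Combine a /64 prefix with a link-local interface ID
 * @prefix:	Prefix advertised in the RA (only the upper 64 bits are used)
 * @ll:		Link-local address seen from the guest (lower 64 bits used)
 * @out:	Guessed global address, set on return
 */
static void guess_global(const struct in6_addr *prefix,
			 const struct in6_addr *ll, struct in6_addr *out)
{
	assert(prefix && ll && out);

	/* Upper half: network prefix from the router advertisement */
	memcpy(&out->s6_addr[0], &prefix->s6_addr[0], 8);

	/* Lower half: interface identifier taken from addr_ll_seen */
	memcpy(&out->s6_addr[8], &ll->s6_addr[8], 8);
}
```

The guess is wrong for guests that pick a different interface identifier
per prefix, or that use temporary addresses, so it should only ever seed
addr_seen until real traffic corrects it.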

-- 
Stefano



* Re: [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible
  2024-08-20 19:56   ` Stefano Brivio
@ 2024-08-21  1:40     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  1:40 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger


On Tue, Aug 20, 2024 at 09:56:18PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:57 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > We usually avoid NAT, but in a few cases we need to apply address
> > translations.  For inbound connections that happens for addresses which
> > make sense to the host but are either inaccessible, or mean a different
> > location from the guest's point of view.
> > 
> > Add some helper functions to determine such addresses, and use them in
> > fwd_nat_from_host().  In doing so clarify some of the reasons for the
> > logic.  We'll also have further use for these helpers in future.
> > 
> > > While we're there fix one unnecessary inconsistency between IPv4 and IPv6.
> > We always translated the guest's observed address, but for IPv4 we didn't
> > translate the guest's assigned address, whereas for IPv6 we did.  Change
> > this to translate both in all cases for consistency.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  fwd.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 87 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fwd.c b/fwd.c
> > index 75dc0151..1baae338 100644
> > --- a/fwd.c
> > +++ b/fwd.c
> > @@ -170,6 +170,85 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
> >  		((ini->oport == 53) || (ini->oport == 853));
> >  }
> >  
> > +/**
> > + * fwd_guest_accessible4() - Is IPv4 address guest accessible
> 
> Nit: I wonder if we should say "guest-accessible" in all these cases,
> it's a bit easier for me to decode, but not necessarily more correct.
> It's fine by me either way.

Just adding the hyphen?  Sure, done.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson



* Re: [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4
  2024-08-20 19:56   ` Stefano Brivio
@ 2024-08-21  1:56     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  1:56 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger


On Tue, Aug 20, 2024 at 09:56:24PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:39:58 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > ip4.gw conflates 3 conceptually different things, which (for now) have the
> > same value:
> >   1. The router/gateway address as seen by the guest
> > >   2. An address to NAT to the host if --no-map-gw isn't specified
> >   3. An address to use as source when nothing else makes sense
> > 
> > Case 3 occurs in two situations:
> > 
> > a) for our DHCP responses - since they come from passt internally there's
> >    no naturally meaningful address for them to come from
> > b) for forwarded connections coming from an address that isn't guest
> >    accessible (localhost or the guest's own address).
> > 
> > (b) occurs even with --no-map-gw, and the expected behaviour of forwarding
> > local connections requires it.
> > 
> > For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the
> > same value as ip6.gw).  For future flexibility we may want to make this
> > "address of last resort" different from the gateway address, so split them
> > logically for IPv4 as well.
> > 
> > Specifically, add a new ip4.our_tap_addr field for the address with this
> > role, and initialise it to ip4.gw for now.  Unlike IPv6 where we can always
> > get a link-local address, we might not be able to get a (non 0.0.0.0)
> > address here.  In that case we have to disable DHCP
> 
> It's not entirely clear to me in which case we would not be able to
> get any address,

Currently, when we don't have a gateway address on the host: no
connectivity, or a point-to-point link with no gateway, or the like.
We used to absolutely require it, but that restriction has been eased
and may ease further in future.

> but at least RFC 2131 doesn't have a problem with this:
> 
> diff --git a/dhcp.c b/dhcp.c
> index aa9f59d..3de8a6e 100644
> --- a/dhcp.c
> +++ b/dhcp.c
> @@ -282,6 +282,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
>  	struct in_addr mask;
>  	unsigned int i;
>  	struct msg *m;
> +	struct in_addr zeroes = { 0 };
>  
>  	eh  = packet_get(p, 0, offset, sizeof(*eh),  NULL);
>  	offset += sizeof(*eh);
> @@ -378,7 +379,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
>  		opt_set_dns_search(c, sizeof(m->o));
>  
>  	dlen = offsetof(struct msg, o) + fill(m);
> -	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, dlen);
> +	tap_udp4_send(c, zeroes, 67, c->ip4.addr, 68, m, dlen);
>  
>  	return 1;
>  }
> 
> and:
> 
> $ ./pasta -p dhcp.pcap
> Saving packet capture to dhcp.pcap
> # dhclient
> # tshark -r dhcp.pcap
> Running as user "root" and group "root". This could be dangerous.
>     1   0.000000           :: → ff02::16     ICMPv6 90 Multicast Listener Report Message v2
>     2   0.016265      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0x75759d11
>     3   0.016361      0.0.0.0 → 88.198.0.164 DHCP 342 DHCP Offer    - Transaction ID 0x75759d11
>     4   0.016479      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Request  - Transaction ID 0x75759d11
>     5   0.016493      0.0.0.0 → 88.198.0.164 DHCP 342 DHCP ACK      - Transaction ID 0x75759d11
> [...]
> 
> so this could be a reasonable fallback.

Fair point.  I've removed the disabling of DHCP in this case.


> 
> > and forwarding of
> > inbound connections with guest-inaccessible source addresses.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c  |  7 ++++++-
> >  dhcp.c  |  4 ++--
> >  fwd.c   | 10 +++++++---
> >  passt.h |  2 ++
> >  4 files changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 954f20ea..9f962fc8 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -660,6 +660,8 @@ static unsigned int conf_ip4(unsigned int ifi,
> >  
> >  	ip4->addr_seen = ip4->addr;
> >  
> > +	ip4->our_tap_addr = ip4->gw;
> > +
> >  	if (MAC_IS_ZERO(mac)) {
> >  		int rc = nl_link_get_mac(nl_sock, ifi, mac);
> >  		if (rc < 0) {
> > @@ -1666,7 +1668,10 @@ void conf(struct ctx *c, int argc, char **argv)
> >  		die("External interface not usable");
> >  
> >  	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw))
> > -		c->no_map_gw = c->no_dhcp = 1;
> > +		c->no_map_gw = 1;
> > +
> > +	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> > +		c->no_dhcp = 1;
> >  
> >  	if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw))
> >  		c->no_map_gw = 1;
> > diff --git a/dhcp.c b/dhcp.c
> > index acc5b03e..a935dc94 100644
> > --- a/dhcp.c
> > +++ b/dhcp.c
> > @@ -347,7 +347,7 @@ int dhcp(const struct ctx *c, const struct pool *p)
> >  	mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len));
> >  	memcpy(opts[1].s,  &mask,        sizeof(mask));
> >  	memcpy(opts[3].s,  &c->ip4.gw,   sizeof(c->ip4.gw));
> > -	memcpy(opts[54].s, &c->ip4.gw,   sizeof(c->ip4.gw));
> > +	memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr));
> 
> Nit: this was supposed to look like a table, so it would be nice to add
> extra whitespace in the lines above this one.

Makes sense, done.
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address
  2024-08-20 19:56   ` Stefano Brivio
@ 2024-08-21  1:59     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  1:59 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

On Tue, Aug 20, 2024 at 09:56:31PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:40:00 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default
> > gateway.  We use this for two quite distinct things: advertising the
> > gateway that the guest should use (via DHCP, NDP and/or --config-net)
> > and for a limited form of NAT.  So that the guest can access services
> > on the host, we map the gateway address within the guest to the
> > loopback address on the host.
> > 
> > Using the gateway address for this isn't necessarily the best choice
> > for this purpose, certainly not for all circumstances.  So, start off
> > by splitting the notion of these into two different values: @guest_gw
> > which is the gateway address the guest should use and @nat_host_loopback,
> > which is the guest visible address to remap to the host's loopback.
> > 
> > Usually nat_host_loopback will have the same value as guest_gw.  However
> > when --no-map-gw is specified we leave them unspecified instead.  This
> > means when we use nat_host_loopback, we don't need to separately check
> > c->no_map_gw to see if it's relevant.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c  | 60 +++++++++++++++++++++++++++++----------------------------
> >  dhcp.c  | 10 ++++++----
> >  fwd.c   |  4 ++--
> >  passt.h | 16 +++++++++------
> >  pasta.c |  6 ++++--
> >  5 files changed, 53 insertions(+), 43 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index b1c58d5b..26373584 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -410,12 +410,12 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver,
> >  		 * redirect
> >  		 */
> >  		if (IN4_IS_ADDR_LOOPBACK(&ns4)) {
> > -			if (c->no_map_gw)
> > +			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
> 
> If you change the command-line option name to use "map", it would be
> good to also change these names.

Will do.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 20/22] conf: Allow address remapped to host to be configured
  2024-08-20 19:56   ` Stefano Brivio
@ 2024-08-21  2:23     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  2:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 13602 bytes --]

On Tue, Aug 20, 2024 at 09:56:34PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:40:01 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Because the host and guest share the same IP address with passt/pasta, it's
> > not possible for the guest to directly address the host.  Therefore we
> > allow packets from the guest going to a special "NAT to host" address to be
> > redirected to the host, appearing there as though they have both source and
> > destination address of loopback.
> > 
> > Currently that special address is always the address of the default
> > gateway (or none).  That can be a problem if we want that gateway to be
> > addressable by the guest.  Therefore, allow the special "NAT to host"
> > address to be overridden on the command line with a new --nat-host-loopback
> > option.
> > 
> > In order to exercise and test it, update the passt_in_ns and perf
> > tests to use this option and give different mapping addresses for the
> > two layers of the environment.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c                | 57 +++++++++++++++++++++++++++++++--
> >  passt.1               | 16 ++++++++++
> >  test/lib/setup        | 11 +++++--
> >  test/passt_in_ns/dhcp | 73 +++++++++++++++++++++++++++++++++++++++++++
> >  test/passt_in_ns/tcp  | 38 +++++++++++-----------
> >  test/passt_in_ns/udp  | 22 +++++++------
> >  test/perf/passt_tcp   | 33 +++++++++----------
> >  test/perf/passt_udp   | 31 +++++++++---------
> >  test/perf/pasta_tcp   | 29 ++++++++---------
> >  test/perf/pasta_udp   | 25 ++++++++-------
> >  test/run              |  4 +--
> >  11 files changed, 244 insertions(+), 95 deletions(-)
> >  create mode 100644 test/passt_in_ns/dhcp
> > 
> > diff --git a/conf.c b/conf.c
> > index 26373584..c5831e82 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -817,6 +817,14 @@ static void usage(const char *name, FILE *f, int status)
> >  		fprintf(f, "  --no-dhcp-search	No list in DHCP/DHCPv6/NDP\n");
> >  
> >  	fprintf(f,
> > +		"  --nat-host-loopback ADDR	NAT ADDR to refer to host\n"
> > +		"    Packets from the guest to ADDR will be redirected to the\n"
> > +		"    host.  On the host such packets will appear to have both\n"
> > +		"    source and destination of loopback (127.0.0.1 or ::1).\n"
> 
> I would leave these three lines to the man page. The help message is
> already 90 lines long. This should be a quick guide/reminder, not a
> full description.

Good idea, done.

> This reminds me that 127.0.0.1 isn't the only IPv4 loopback address. I
> don't know if anybody will ever have a use case where they would need
> a different, specific, loopback source address, but, together with

This is primarily about translation of outbound connections, so
loopback is more the destination address than the source here.

> --nat-guest-addr from 22/22, I start wondering: what if we had a single
> option taking, optionally, an arbitrary (within limits) source address?

I'd like to see that, but it's a more complex exercise - we'd need a
table of NATs to step through.  This series is just aiming to handle
the most common cases for now.

> Now, given that we plan to add a configurable flow table at some point
> in the future, it makes no sense to make this exceedingly flexible. But
> I just wanted to bring this up for consideration, in case it's doable
> at a small cost (I'm really not sure):
> 
>   --map-host [source,]address
> 
> where "source" would default to 127.0.0.1, but it could also be another
> loopback address, or another address altogether (and we'll fail if it's
> not local, of course).

There's no particular reason it has to fail if non-local.  Even if we
have this in future, I think --map-guest-addr would still be useful
because it avoids the user having to spell out what host address they
expect the guest to take.
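
For illustration only, parsing an argument of the proposed
"[source,]address" form could look roughly like the sketch below.  It
is IPv4-only, and struct map_host4, map_host4_parse() and the field
names are hypothetical, not anything in this series.  The first
component is treated as the host-side address and defaults to
127.0.0.1 when omitted; no locality check is made on it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical result of parsing "[source,]address" (IPv4 only) */
struct map_host4 {
	struct in_addr host;	/* Host-side address, 127.0.0.1 by default */
	struct in_addr addr;	/* Guest-visible address to be remapped */
};

/* Returns 0 on success, -1 on malformed input.  Sketch, not passt code. */
static int map_host4_parse(const char *arg, struct map_host4 *m)
{
	char buf[2 * INET_ADDRSTRLEN + 2];
	const char *addr;
	char *comma;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	/* Default when no "source," prefix is given */
	m->host.s_addr = htonl(INADDR_LOOPBACK);

	comma = strchr(buf, ',');
	if (comma) {
		/* Split in place: buf is the source, the rest the address */
		*comma = '\0';
		if (inet_pton(AF_INET, buf, &m->host) != 1)
			return -1;
		addr = comma + 1;
	} else {
		addr = buf;
	}

	return inet_pton(AF_INET, addr, &m->addr) == 1 ? 0 : -1;
}
```

Handling more than one such mapping per address family would still
need the table mentioned above.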

> If we want (can?) go that way and keep equivalent functionality as you
> have now, we would have the additional problem that this option could
> be given up to two times (one for loopback, one for non-loopback), and
> not more (we don't have a data structure ready for an arbitrary number
> of those), so it's not as generic as it might look like, and I'm not
> sure if it's a good idea. But we could also expand on it in the future.

Yeah, I see this more as a future extension.

> > +		"    ADDR can be 'none', in which case nothing is mapped\n"
> 
> This is a nice feature by the way as it should eventually allow us to
> get consistent options in Podman instead of "--map-gw": Podman could
> add by default '--map-host-loopback none', unless the user overrides
> that with an actual address.

Exactly.  The idea here is that we can eventually deprecate
--no-map-gw in favour of --map-host-loopback=none.
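
As a rough sketch of how the effective mapping falls out of the
options (IPv4 only, modelled on the hunk at the end of conf() in this
patch): an explicitly configured address wins, otherwise --no-map-gw
or 'none' leaves the mapping unspecified, otherwise the guest's
gateway is used.  map_host_loopback_resolve() is a hypothetical helper
name used only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Effective "map to host loopback" address; 0.0.0.0 means no mapping.
 * @configured:	Address given on the command line, 0.0.0.0 if none
 * @guest_gw:	Gateway address as seen by the guest
 * @no_map_gw:	Set by --no-map-gw, or by an argument of 'none'
 */
static struct in_addr map_host_loopback_resolve(struct in_addr configured,
						struct in_addr guest_gw,
						int no_map_gw)
{
	struct in_addr unspec = { .s_addr = htonl(INADDR_ANY) };

	if (configured.s_addr != unspec.s_addr)
		return configured;	/* Explicit address always wins */

	if (no_map_gw)
		return unspec;		/* Nothing is mapped */

	return guest_gw;		/* Default: the gateway address */
}
```

With this shape, callers only need to test the result against 0.0.0.0
and never look at no_map_gw again.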

> 
> > +	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
> 
> "can" (for consistency, but also because the subject is still the
> option, this is not a separate sentence).

Done.

> ...times.

And done.

> 
> > +		"    default: gateway address, or none if --no-map-gw is also\n"
> > +		"             specified\n"
> 
> I don't think we need to mention here that --no-map-gw implies none,
> doing it in the man page is enough.

Done.

> 
> >  		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
> >  		"    can be specified zero to two times (for IPv4 and IPv6)\n"
> >  		"    default: don't forward DNS queries\n"
> > @@ -959,6 +967,11 @@ static void conf_print(const struct ctx *c)
> >  	info("    host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac)));
> >  
> >  	if (c->ifi4) {
> > +		if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
> > +			info("    NAT to host 127.0.0.1: %s",
> > +			     inet_ntop(AF_INET, &c->ip4.nat_host_loopback,
> > +				       buf4, sizeof(buf4)));
> > +
> >  		if (!c->no_dhcp) {
> >  			uint32_t mask;
> >  
> > @@ -989,6 +1002,11 @@ static void conf_print(const struct ctx *c)
> >  	}
> >  
> >  	if (c->ifi6) {
> > +		if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
> > +			info("    NAT to host ::1: %s",
> > +			     inet_ntop(AF_INET6, &c->ip6.nat_host_loopback,
> > +				       buf6, sizeof(buf6)));
> > +
> >  		if (!c->no_ndp && !c->no_dhcpv6)
> >  			info("NDP/DHCPv6:");
> >  		else if (!c->no_ndp)
> > @@ -1122,6 +1140,35 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
> >  	}
> >  }
> >  
> > +/**
> > + * conf_nat() - Parse --nat-host-loopback option
> > + * @c:		Execution context
> > + * @arg:	String argument to --nat-host-loopback
> > + * @no_map_gw:	--no-map-gw flag, updated for "none" argument
> > + */
> > +static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
> > +{
> > +	if (strcmp(arg, "none") == 0) {
> > +		c->ip4.nat_host_loopback = in4addr_any;
> > +		c->ip6.nat_host_loopback = in6addr_any;
> > +		*no_map_gw = 1;
> > +	}
> > +
> > +	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
> > +	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
> > +	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
> > +	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
> > +		return;
> > +
> > +	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
> > +	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
> > +	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
> > +	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
> > +		return;
> > +
> > +	die("Invalid address to remap to host: %s", optarg);
> > +}
> > +
> >  /**
> >   * conf_open_files() - Open files as requested by configuration
> >   * @c:		Execution context
> > @@ -1231,6 +1278,7 @@ void conf(struct ctx *c, int argc, char **argv)
> >  		{"no-copy-routes", no_argument,		NULL,		18 },
> >  		{"no-copy-addrs", no_argument,		NULL,		19 },
> >  		{"netns-only",	no_argument,		NULL,		20 },
> > +		{"nat-host-loopback", required_argument, NULL,		21 },
> >  		{ 0 },
> >  	};
> >  	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> > @@ -1400,6 +1448,9 @@ void conf(struct ctx *c, int argc, char **argv)
> >  			netns_only = 1;
> >  			*userns = 0;
> >  			break;
> > +		case 21:
> > +			conf_nat(c, optarg, &no_map_gw);
> > +			break;
> >  		case 'd':
> >  			c->debug = 1;
> >  			c->quiet = 0;
> > @@ -1639,10 +1690,12 @@ void conf(struct ctx *c, int argc, char **argv)
> >  	    (*c->ip6.ifname_out && !c->ifi6))
> >  		die("External interface not usable");
> >  
> > -	if (c->ifi4 && !no_map_gw)
> > +	if (c->ifi4 && !no_map_gw &&
> > +	    IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback))
> >  		c->ip4.nat_host_loopback = c->ip4.guest_gw;
> >  
> > -	if (c->ifi6 && !no_map_gw)
> > +	if (c->ifi6 && !no_map_gw &&
> > +	    IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback))
> >  		c->ip6.nat_host_loopback = c->ip6.guest_gw;
> >  
> >  	if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> > diff --git a/passt.1 b/passt.1
> > index dca433b6..3680056a 100644
> > --- a/passt.1
> > +++ b/passt.1
> > @@ -327,6 +327,22 @@ namespace will be silently dropped.
> >  Disable Router Advertisements. Router Solicitations coming from guest or target
> >  namespace will be ignored.
> >  
> > +.TP
> > +.BR \-\-nat-host-loopback " " \fIaddr
> > +Translate \fIaddr\fR to refer to the host. Packets from the guest to
> > +\fIaddr\fR will be redirected to the host.  On the host such packets
> > +will appear to have both source and destination of loopback (127.0.0.1
> 
> I would skip "of loopback" and just say "127.0.0.1 or ::1", to avoid
> implying that there's a single loopback address for IPv4.

Done.

> > +or ::1).
> > +
> > +If \fIaddr\fR is 'none', no address is mapped (this implies
> > +\fB--no-map-gw\fR).  Only one IPv4 and one IPv6 address can be
> > +translated, if the option is specified multiple times, the last one
> > +takes effect.
> > +
> > +Default is to translate the guest's default gateway address, unless
> > +\fB--no-map-gw\fR is also given, in which case no address is mapped by
> 
> Why "also"? You're describing the default, so I guess this option is
> not actually given in that case.

Good point, fixed.

> > +default.
> > +
> >  .TP
> >  .BR \-\-no-map-gw
> >  Don't remap TCP connections and untracked UDP traffic, with the gateway address
> > diff --git a/test/lib/setup b/test/lib/setup
> > index 9b39b9fe..061bf997 100755
> > --- a/test/lib/setup
> > +++ b/test/lib/setup
> > @@ -124,7 +124,12 @@ setup_passt_in_ns() {
> >  	[ ${DEBUG} -eq 1 ] && __opts="${__opts} -d"
> >  	[ ${TRACE} -eq 1 ] && __opts="${__opts} --trace"
> >  
> > -	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
> > +        __nat_host4=192.0.2.1
> > +        __nat_host6=2001:db8:9a55::1
> > +        __nat_ns4=192.0.2.2
> > +        __nat_ns6=2001:db8:9a55::2
> > +
> > +	context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --nat-host-loopback ${__nat_host4} --nat-host-loopback ${__nat_host6} --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold"
> >  	wait_for [ -f "${STATESETUP}/pasta.pid" ]
> >  
> >  	context_setup_nstool qemu ${STATESETUP}/ns.hold
> > @@ -139,11 +144,11 @@ setup_passt_in_ns() {
> >  	if [ ${VALGRIND} -eq 1 ]; then
> >  		context_run passt "make clean"
> >  		context_run passt "make valgrind"
> > -		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
> > +		context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
> >  	else
> >  		context_run passt "make clean"
> >  		context_run passt "make"
> > -		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid"
> > +		context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --nat-host-loopback ${__nat_ns4} --nat-host-loopback ${__nat_ns6}"
> >  	fi
> >  	wait_for [ -f "${STATESETUP}/passt.pid" ]
> >  
> > diff --git a/test/passt_in_ns/dhcp b/test/passt_in_ns/dhcp
> > new file mode 100644
> > index 00000000..48c7d197
> > --- /dev/null
> > +++ b/test/passt_in_ns/dhcp
> 
> ...how did this happen? This file already exists.

No, it didn't.  Previously we reused passt/dhcp for the passt_in_ns
tests.  With the change to the tests exercising the new option that
doesn't work any more, because we need slightly different checks for
DHCP to match what we expect when --map-host-loopback is used.
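
For reference, the validation done by conf_nat() can be sketched
standalone as below.  This is not the code from the patch:
nat_addr_parse() and its return convention are made up for the
example, it returns right after handling 'none', and it parses into
temporaries so a rejected argument doesn't clobber an earlier value.
Both of those are choices of the sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Returns AF_INET or AF_INET6 for a usable address, 0 for "none",
 * -1 for anything else.  Hypothetical helper, IPv6 is tried first.
 */
static int nat_addr_parse(const char *arg, struct in_addr *addr4,
			  struct in6_addr *addr6)
{
	struct in6_addr a6;
	struct in_addr a4;

	if (strcmp(arg, "none") == 0) {
		addr4->s_addr = htonl(INADDR_ANY);
		*addr6 = in6addr_any;
		return 0;
	}

	if (inet_pton(AF_INET6, arg, &a6) == 1) {
		if (IN6_IS_ADDR_UNSPECIFIED(&a6) ||
		    IN6_IS_ADDR_LOOPBACK(&a6) ||
		    IN6_IS_ADDR_MULTICAST(&a6))
			return -1;
		*addr6 = a6;
		return AF_INET6;
	}

	if (inet_pton(AF_INET, arg, &a4) == 1) {
		uint32_t h = ntohl(a4.s_addr);

		/* Reject 0.0.0.0, 127.0.0.0/8 and 224.0.0.0/4 */
		if (h == INADDR_ANY || (h >> 24) == 127 || (h >> 28) == 0xe)
			return -1;
		*addr4 = a4;
		return AF_INET;
	}

	return -1;
}
```

A caller would treat -1 as fatal, the way conf_nat() calls die().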

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address
  2024-08-20 19:56   ` Stefano Brivio
@ 2024-08-21  2:28     ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  2:28 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 10009 bytes --]

On Tue, Aug 20, 2024 at 09:56:40PM +0200, Stefano Brivio wrote:
> On Fri, 16 Aug 2024 15:40:03 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > The guest is usually assigned one of the host's IP addresses.  That means
> > it can't access the host itself via its usual address.  The
> > --nat-host-loopback option (enabled by default with the gateway address)
> > allows the guest to contact the host.  However, connections forwarded this
> > way appear on the host to have originated from the loopback interface,
> > which isn't always desirable.
> > 
> > Add a new --nat-guest-addr option, which acts similarly but forwarded
> > connections will go to the host's external address, instead of loopback.
> > 
> > If '-a' is used, so the guest's address is not the same as the host's, this
> > will instead forward to whatever host-visible site is shadowed by the
> > guest's assigned address.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c  | 51 ++++++++++++++++++++++++++++++++++-----------------
> >  fwd.c   | 10 ++++++++++
> >  passt.1 | 15 +++++++++++++++
> >  passt.h |  6 ++++++
> >  4 files changed, 65 insertions(+), 17 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index c5831e82..d14abc63 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -825,6 +825,14 @@ static void usage(const char *name, FILE *f, int status)
> >  	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
> >  		"    default: gateway address, or none if --no-map-gw is also\n"
> >  		"             specified\n"
> > +		"  --nat-guest-addr ADDR	NAT ADDR to guest's address\n"
> > +		"    Packets from the guest to ADDR will be redirected to the\n"
> > +		"    address on the host that's the same as the guest's\n"
> > +		"    assigned address.  Usually that means (one of) the host's\n"
> > +		"    global address.\n"
> 
> Same as 20/22, it's probably enough to have this in the man page.
> 
> > +		"    ADDR can be 'none', in which case nothing is mapped\n"
> > +	        "    Can be specified zero to two (for IPv4 and IPv6)\n"
> 
> "can", times

Done.

> > +		"    default: none\n"
> >  		"  --dns-forward ADDR	Forward DNS queries sent to ADDR\n"
> >  		"    can be specified zero to two times (for IPv4 and IPv6)\n"
> >  		"    default: don't forward DNS queries\n"
> > @@ -1141,29 +1149,32 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid)
> >  }
> >  
> >  /**
> > - * conf_nat() - Parse --nat-host-loopback option
> > - * @c:		Execution context
> > - * @arg:	String argument to --nat-host-loopback
> > - * @no_map_gw:	--no-map-gw flag, updated for "none" argument
> > + * conf_nat() - Parse --nat-host-loopback or --nat-guest-addr option
> > + * @arg:	String argument to option
> > + * @addr4:	IPv4 to update with parsed address
> > + * @addr6:	IPv6 to update with parsed address
> > + * @no_map_gw:	--no-map-gw flag, or NULL, updated for "none" argument
> >   */
> > -static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw)
> > +static void conf_nat(const char *arg, struct in_addr *addr4,
> > +		     struct in6_addr *addr6, int *no_map_gw)
> >  {
> >  	if (strcmp(arg, "none") == 0) {
> > -		c->ip4.nat_host_loopback = in4addr_any;
> > -		c->ip6.nat_host_loopback = in6addr_any;
> > -		*no_map_gw = 1;
> > +		*addr4 = in4addr_any;
> > +		*addr6 = in6addr_any;
> > +		if (no_map_gw)
> > +			*no_map_gw = 1;
> >  	}
> >  
> > -	if (inet_pton(AF_INET6, arg, &c->ip6.nat_host_loopback) &&
> > -	    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback)	&&
> > -	    !IN6_IS_ADDR_LOOPBACK(&c->ip6.nat_host_loopback)	&&
> > -	    !IN6_IS_ADDR_MULTICAST(&c->ip6.nat_host_loopback))
> > +	if (inet_pton(AF_INET6, arg, addr6)	&&
> > +	    !IN6_IS_ADDR_UNSPECIFIED(addr6)	&&
> > +	    !IN6_IS_ADDR_LOOPBACK(addr6)	&&
> > +	    !IN6_IS_ADDR_MULTICAST(addr6))
> >  		return;
> >  
> > -	if (inet_pton(AF_INET, arg, &c->ip4.nat_host_loopback)	&&
> > -	    !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_host_loopback)	&&
> > -	    !IN4_IS_ADDR_LOOPBACK(&c->ip4.nat_host_loopback)	&&
> > -	    !IN4_IS_ADDR_MULTICAST(&c->ip4.nat_host_loopback))
> > +	if (inet_pton(AF_INET, arg, addr4)	&&
> > +	    !IN4_IS_ADDR_UNSPECIFIED(addr4)	&&
> > +	    !IN4_IS_ADDR_LOOPBACK(addr4)	&&
> > +	    !IN4_IS_ADDR_MULTICAST(addr4))
> >  		return;
> >  
> >  	die("Invalid address to remap to host: %s", optarg);
> > @@ -1279,6 +1290,7 @@ void conf(struct ctx *c, int argc, char **argv)
> >  		{"no-copy-addrs", no_argument,		NULL,		19 },
> >  		{"netns-only",	no_argument,		NULL,		20 },
> >  		{"nat-host-loopback", required_argument, NULL,		21 },
> > +		{"nat-guest-addr", required_argument,	NULL,		22 },
> >  		{ 0 },
> >  	};
> >  	const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> > @@ -1449,7 +1461,12 @@ void conf(struct ctx *c, int argc, char **argv)
> >  			*userns = 0;
> >  			break;
> >  		case 21:
> > -			conf_nat(c, optarg, &no_map_gw);
> > +			conf_nat(optarg, &c->ip4.nat_host_loopback,
> > +				 &c->ip6.nat_host_loopback, &no_map_gw);
> > +			break;
> > +		case 22:
> > +			conf_nat(optarg, &c->ip4.nat_guest_addr,
> > +				 &c->ip6.nat_guest_addr, NULL);
> >  			break;
> >  		case 'd':
> >  			c->debug = 1;
> > diff --git a/fwd.c b/fwd.c
> > index 7718f7e2..ff4789a2 100644
> > --- a/fwd.c
> > +++ b/fwd.c
> > @@ -272,6 +272,10 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto,
> >  		tgt->eaddr = inany_loopback4;
> >  	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_host_loopback))
> >  		tgt->eaddr = inany_loopback6;
> > +	else if (inany_equals4(&ini->oaddr, &c->ip4.nat_guest_addr))
> > +		tgt->eaddr = inany_from_v4(c->ip4.addr);
> > +	else if (inany_equals6(&ini->oaddr, &c->ip6.nat_guest_addr))
> > +		tgt->eaddr.a6 = c->ip6.addr;
> >  	else
> >  		tgt->eaddr = ini->oaddr;
> >  
> > @@ -393,6 +397,12 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> >  	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_host_loopback) &&
> >  		   inany_equals6(&ini->eaddr, &in6addr_loopback)) {
> >  		tgt->oaddr.a6 = c->ip6.nat_host_loopback;
> > +	} else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.nat_guest_addr) &&
> > +		   inany_equals4(&ini->eaddr, &c->ip4.addr)) {
> > +		tgt->oaddr = inany_from_v4(c->ip4.nat_guest_addr);
> > +	} else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.nat_guest_addr) &&
> > +		   inany_equals6(&ini->eaddr, &c->ip6.addr)) {
> > +		tgt->oaddr.a6 = c->ip6.nat_guest_addr;
> >  	} else if (!fwd_guest_accessible(c, &ini->eaddr)) {
> >  		if (inany_v4(&ini->eaddr)) {
> >  			if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr))
> > diff --git a/passt.1 b/passt.1
> > index 3680056a..7cf553cf 100644
> > --- a/passt.1
> > +++ b/passt.1
> > @@ -350,6 +350,21 @@ as destination, to the host. Implied if there is no gateway on the selected
> >  default route, or if there is no default route, for any of the enabled address
> >  families.
> >  
> > +.TP
> > +.BR \-\-nat-guest-loopback " " \fIaddr
> > +Translate \fIaddr\fR in the guest to be equal to the guest's assigned
> > +address on the host.  That is, packets from the guest to \fIaddr\fR
> > +will be redirected to the address assigned to the guest with \fB-a\fR,
> > +or by default the host's global address.  This allows the guest to
> > +access services available on the host's global address, even though its
> > +own address shadows that of the host.
> > +
> > +If \fIaddr\fR is 'none', no address is mapped.  Only one IPv4 and one
> > +IPv6 address can be translated, if the option is specified multiple
> 
> , and if

Done.

Also fixed the fact I incorrectly called it --nat-guest-loopback
instead of --map-guest-addr above.
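
To make the two directions of translation concrete, here is a
simplified IPv4-only sketch of what the fwd.c hunks above do.  It is
not the code from the patch: struct nat4_conf and the nat4_*()
functions are invented for the example, the field names follow the
posted patch rather than the planned --map-* naming, and the final
fallback to our_tap_addr for other guest-inaccessible sources is left
out.  The explicit "unspecified" checks on the outbound path are an
addition of the sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

struct nat4_conf {
	struct in_addr addr;			/* Assigned to the guest */
	struct in_addr nat_host_loopback;	/* 0.0.0.0: no mapping */
	struct in_addr nat_guest_addr;		/* 0.0.0.0: no mapping */
};

static int in4_unspec(struct in_addr a)
{
	return a.s_addr == htonl(INADDR_ANY);
}

/* Guest -> host: pick the destination used on the host side */
static struct in_addr nat4_from_guest(const struct nat4_conf *c,
				      struct in_addr dst)
{
	struct in_addr lo = { .s_addr = htonl(INADDR_LOOPBACK) };

	if (!in4_unspec(c->nat_host_loopback) &&
	    dst.s_addr == c->nat_host_loopback.s_addr)
		return lo;		/* Host's loopback */

	if (!in4_unspec(c->nat_guest_addr) &&
	    dst.s_addr == c->nat_guest_addr.s_addr)
		return c->addr;		/* Address shadowed by the guest */

	return dst;			/* No translation */
}

/* Host -> guest: pick the source address the guest will see */
static struct in_addr nat4_from_host(const struct nat4_conf *c,
				     struct in_addr src)
{
	if (!in4_unspec(c->nat_host_loopback) &&
	    src.s_addr == htonl(INADDR_LOOPBACK))
		return c->nat_host_loopback;

	if (!in4_unspec(c->nat_guest_addr) &&
	    src.s_addr == c->addr.s_addr)
		return c->nat_guest_addr;

	return src;
}
```

The two functions are meant to be inverses of each other for the
mapped addresses, so replies to a translated connection carry the
address the guest originally used.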

> > +times, the last one for each address type takes effect.
> > +
> > +Default is no mapping.
> > +
> >  .TP
> >  .BR \-4 ", " \-\-ipv4-only
> >  Enable IPv4-only operation. IPv6 traffic will be ignored.
> > diff --git a/passt.h b/passt.h
> > index 20a5904a..586c1d05 100644
> > --- a/passt.h
> > +++ b/passt.h
> > @@ -104,6 +104,8 @@ enum passt_modes {
> >   * @guest_gw:		IPv4 gateway as seen by the guest
> >   * @nat_host_loopback:	Outbound connections to this address are NATted to the
> >   *                      host's 127.0.0.1
> > + * @nat_guest_addr:	Outbound connections to this address are NATted to the
> > + *                      guest's assigned address
> >   * @dns:		DNS addresses for DHCP, zero-terminated
> >   * @dns_match:		Forward DNS query if sent to this address
> >   * @our_tap_addr:	IPv4 address for passt's use on tap
> > @@ -120,6 +122,7 @@ struct ip4_ctx {
> >  	int prefix_len;
> >  	struct in_addr guest_gw;
> >  	struct in_addr nat_host_loopback;
> > +	struct in_addr nat_guest_addr;
> >  	struct in_addr dns[MAXNS + 1];
> >  	struct in_addr dns_match;
> >  	struct in_addr our_tap_addr;
> > @@ -142,6 +145,8 @@ struct ip4_ctx {
> >   * @guest_gw:		IPv6 gateway as seen by the guest
> >   * @nat_host_loopback:	Outbound connections to this address are NATted to the
> >   *                      host's [::1]
> > + * @nat_guest_addr:	Outbound connections to this address are NATted to the
> > + *                      guest's assigned address
> >   * @dns:		DNS addresses for DHCPv6 and NDP, zero-terminated
> >   * @dns_match:		Forward DNS query if sent to this address
> >   * @our_tap_ll:		Link-local IPv6 address for passt's use on tap
> > @@ -158,6 +163,7 @@ struct ip6_ctx {
> >  	struct in6_addr addr_ll_seen;
> >  	struct in6_addr guest_gw;
> >  	struct in6_addr nat_host_loopback;
> > +	struct in6_addr nat_guest_addr;
> >  	struct in6_addr dns[MAXNS + 1];
> >  	struct in6_addr dns_match;
> >  	struct in6_addr our_tap_ll;
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: [PATCH 00/22] RFC: Allow configuration of special case NATs
  2024-08-20 20:39           ` Stefano Brivio
@ 2024-08-21  2:51             ` David Gibson
  0 siblings, 0 replies; 55+ messages in thread
From: David Gibson @ 2024-08-21  2:51 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Paul Holzinger

[-- Attachment #1: Type: text/plain, Size: 18550 bytes --]

On Tue, Aug 20, 2024 at 10:39:26PM +0200, Stefano Brivio wrote:
> On Tue, 20 Aug 2024 10:42:17 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Aug 19, 2024 at 03:01:00PM +0200, Stefano Brivio wrote:
> > > On Mon, 19 Aug 2024 19:52:49 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Mon, Aug 19, 2024 at 11:27:49AM +0200, Stefano Brivio wrote:  
> > > > > On Mon, 19 Aug 2024 18:46:31 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Fri, Aug 16, 2024 at 03:39:41PM +1000, David Gibson wrote:    
> > > > > > > Based on Stefano's recent patch for faster tests.
> > > > > > > 
> > > > > > > Allow the user to specify which addresses are translated when used by
> > > > > > > the guest, rather than always being the gateway address or nothing.
> > > > > > > We also allow this remapping to go to the host's global address (more
> > > > > > > precisely the address assigned to the guest) rather than just host
> > > > > > > loopback.
> > > > > > > 
> > > > > > > Suggestions for better names for the new options in patches 20 & 22
> > > > > > > are most welcome.
> > > > > > > 
> > > > > > > Along the way to implementing that make many changes to clarify what
> > > > > > > various addresses we track mean, fixing a number of small bugs as
> > > > > > > well.
> > > > > > > 
> > > > > > > NOTE: there is a bug in 21/22 which breaks some of the passt_tcp perf
> > > > > > > tests.  I haven't managed to figure out why it's causing the problem,
> > > > > > > or even what the exact triggering conditions are (running the single
> > > > > > > stalling iperf alone doesn't do it).  Have to wrap up for today, so I
> > > > > > > thought I'd get this out for review anyway.      
> > > > > > 
> > > > > > I've identified the bug here.  IMO, it's a pre-existing problem that
> > > > > > only works by accident at the moment.  The immediate fix is pretty
> > > > > > > obvious, but it raises some broader questions.
> > > > > > 
> > > > > > The problem arises because of the MTU changes we make in order to test
> > > > > > throughput with different packet sizes.  Specifically we change the
> > > > > > MTU to values < 1280, which implicitly disables IPv6 since it requires
> > > > > > an MTU >= 1280.  When we change the MTU back to a larger value IPv6 is
> > > > > > re-enabled, but some configuration has been lost in the meantime.
> > > > > > 
> > > > > > After the MTU is restored the guest reconfigures with NDP, but does
> > > > > > not re-DHCPv6.  That means the guest gets a SLAAC address in the right
> > > > > > prefix but not the exact /128 address we've tried to assign to it.
> > > > > > However, at least with the sequence of things we have in the tests,
> > > > > > the guest never sends any packets with the new address, so passt
> > > > > > doesn't update addr_seen.  When the inbound connection comes we send
> > > > > > it to the assigned address instead of the guest's actual address and
> > > > > > the guest rejects it.    
> > > > > 
> > > > > I still have to take a closer look, but I'm fairly sure I hit a similar
> > > > > issue while I was writing these tests originally. I pondered
> > > > > reconfiguring the address via DHCPv6, or using the keep_addr_on_down
> > > > > sysctl (net.ipv6.conf.<interface>.keep_addr_on_down), which was added
> > > > > around that time.
> > > > > 
> > > > > Then:
> > > > >     
> > > > > > This "worked" previously, because before this patch, passt would
> > > > > > translate the inbound connection to have source/dest as link-local
> > > > > > addresses.    
> > > > > 
> > > > > ...I realised that this worked and forgot about the whole issue.
> > > > >     
> > > > > > We *do* have a current addr_ll_seen because (a) it won't
> > > > > > change if the guest doesn't change MAC and (b) when IPv6 is re-enabled
> > > > > > the NDP traffic the guest generates will have link-local addresses
> > > > > > that update addr_ll_seen.  With this patch, and a global address for
> > > > > > --map-host-loopback, we now need to send to addr_seen instead of
> > > > > > addr_ll_seen, hence exposing the bug.
> > > > > > 
> > > > > > In the short term, the obvious fix would be to re-run dhclient -6 in
> > > > > > the guest after we twiddle MTU but before running IPv6 tests.    
> > > > > 
> > > > > I guess setting keep_addr_on_down (even for "all" interfaces) should
> > > > > work as well.    
> > > > 
> > > > Sounds like it.  I wasn't aware of that one.
> > > > 
> > > > /me tests..  actually, no it doesn't work..
> > > > 
> > > > # sysctl -a | grep keep_addr_on_down
> > > > net.ipv6.conf.all.keep_addr_on_down = 1
> > > > net.ipv6.conf.default.keep_addr_on_down = 1
> > > > net.ipv6.conf.dummy0.keep_addr_on_down = 1
> > > > net.ipv6.conf.lo.keep_addr_on_down = 0
> > > > # ip addr add 2001:db8::1 dev dummy0
> > > > # ip a
> > > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > > >     inet6 2001:db8::1/128 scope global 
> > > >        valid_lft forever preferred_lft forever
> > > > # ip link set dummy0 mtu 1200
> > > > # ip a
> > > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > 2: dummy0: <BROADCAST,NOARP> mtu 1200 qdisc noop state DOWN group default qlen 1000
> > > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > > > # ip link set dummy0 mtu 1500
> > > > # ip a
> > > > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
> > > >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > > >     link/ether c2:02:f2:79:f9:94 brd ff:ff:ff:ff:ff:ff
> > > > 
> > > > My guess is that IPv6 being deconfigured because of an unsuitable MTU
> > > > is considered a different event from a mere "down".  
> > > 
> > > I guess it's because they're not IFA_F_PERMANENT, because
> > > addrconf_permanent_addr() has:
> > > 
> > >         case NETDEV_CHANGEMTU:
> > >                 /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
> > >                 if (dev->mtu < IPV6_MIN_MTU) {
> > >                         addrconf_ifdown(dev, dev != net->loopback_dev);
> > >                         break;
> > >                 }
> > > 
> > > but addrconf_ifdown() does:
> > > 
> > >                                 if (!keep_addr ||
> > >                                     !(ifa->flags & IFA_F_PERMANENT) ||
> > >                                     addr_is_local(&ifa->addr)) {
> > >                                         hlist_del_init_rcu(&ifa->addr_lst);
> > >                                         goto restart;
> > >                                 }
> > > 
> > > I'm not sure about the logic behind that. We could actually set those
> > > addresses as permanent once the DHCPv6 client configures them, if it's
> > > cleaner.  
> > 
> > Huh.  Not in the passt/VM case, though, which is where I actually
> > encountered this.
> 
> I meant using ip(8) from the test script itself, but it doesn't
> actually make sense:
> 
> # ip address change 2a01:4f8:222:904:c800:94ff:fe29:a8d/64 permanent dev eth0
> Warning: permanent option is not mutable from userspace
> 
> because (RFC 3549):
> 
>    IFA_F_PERMANENT  For a permanent address set by the user.
>                     When this is not set, it means the address
>                     was dynamically created (e.g., by stateless
>                     autoconfiguration).
> 
> So the address you used in your test _should_ have IFA_F_PERMANENT. The
> plot thickens.
> 
> I just tried this, which confirms your hypothesis that bringing the
> link down is a different event:
> 
> # ip addr add 2001:db8::1 dev dummy0
> # ip link set dummy0 down
> # ip addr show dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1280 qdisc noqueue state DOWN group default qlen 1000
>     link/ether 02:59:00:28:1b:5f brd ff:ff:ff:ff:ff:ff
>     inet 1.2.3.1/24 scope global dummy0
>        valid_lft forever preferred_lft forever
>     inet6 2001:db8::1/128 scope global 
>        valid_lft forever preferred_lft forever
> # ip link set dummy0 mtu 1279
> # ip addr show dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1279 qdisc noqueue state DOWN group default qlen 1000
>     link/ether 02:59:00:28:1b:5f brd ff:ff:ff:ff:ff:ff
>     inet 1.2.3.1/24 scope global dummy0
>        valid_lft forever preferred_lft forever
> 
> ...I just can't see that from the code.

Ok.

> > > > > > This kind of opens a question about how hard we should try to
> > > > > > > accommodate guests which don't configure themselves how we told them.    
> > > > > 
> > > > > There's a notable distinction between guests temporarily diverging (in
> > > > > different ways) and guests we don't configure at all.    
> > > > 
> > > > I'm not really sure what you're getting at here.  
> > > 
> > > In this case, it's not true that the guest doesn't configure itself in
> > > the way we requested -- it's just a temporary diversion from that
> > > configuration.  
> > 
> > Oh, I see.  Assuming that at some point the DHCP client will re-run.
> > 
> > > Those are different cases that we can handle in different ways, I
> > > think. If it's a glitch that will only happen during testing, let's
> > > work around that.
> > > 
> > > But if the guest really ignores DHCPv6 information, I think we should
> > > keep that working.
> > >   
> > > > > It's probably more important to ensure we use the right type of address    
> > > > 
> > > > "type" in what sense here?  
> > > 
> > > Global unicast instead of link-local.  
> > 
> > Ok.
> > 
> > > > > (security) rather than ensuring we somehow manage to deliver packets at
> > > > > any time (minor glitch otherwise), also because the one you describe is
> > > > > something we're unlikely to hit outside of tests.
> > > > >     
> > > > > > Personally I'd be ok with saying that nothing works if the guest
> > > > > > doesn't configure itself properly, thereby removing addr_seen and
> > > > > > addr_ll_seen entirely.  But I think, Stefano, you've been against that
> > > > > > idea in the past.    
> > > > > 
> > > > > Yes, I still think we should support guests that don't use DHCPv6 or
> > > > > NDP at all,    
> > > > 
> > > > Well, you still wouldn't *need* DHCPv6 or NDP, but you'd have to
> > > > manually configure the interface in the guest to match the address
> > > > you've configured with -a.  Just like you'd expect to have to
> > > > correctly configure your address on a real network.  
> > > 
> > > True, but if we make correctness as optional as possible, we'll be more
> > > compatible (less time spent by users fixing situations that don't
> > > necessarily need fixing, less time spent by developers to look into
> > > reports, no matter who's at fault).  
> > 
> > Eh, maybe.  Unless us trying to make sense of a nonsense situation
> > causes some unpredictable behaviour that breaks something else.
> > 
> > > > > or where related exchanges fail for any reason. It improves
> > > > > reliability and compatibility at a small cost. In this case, I think
> > > > > it's a nice feature that we would resume communicating as soon as the
> > > > > guest shows its global unicast address.    
> > > > 
> > > > Hm, maybe.  I'm not entirely convinced the cost is so small long term.
> > > > It's pretty badly incompatible with having multiple guests behind the
> > > > same passt instance: such as the initial guest bridging or routing to
> > > > nested guests.  
> > > 
> > > Why? We will need to hash the interface/guest index anyway, for
> > > outbound flows.  
> > 
> > If we have separate interfaces for each guest, yes.  But not if we
> > have multiple guests behind a single tap because the initial guest
> > sets up a bridge or routing.  Then we have nothing but the address.
> 
> ...but then we should have multiple addresses anyway.

Yes.. that's kind of my point.

> By the way, I'm
> not sure we'll ever be able to support that kind of configuration.

I don't see why not.  It would require configuration so that it's
clear what each inbound forward targets.  But I don't see any inherent
problem here, though there are a number of current implementation
details which prevent it (addr_seen is one, replying to all ARP requests is
another).

> How
> does a guest set up a bridge and use passt at the same time?

I'm not thinking of a bridge shared with the host, but a bridge (or
routing) between nested guests or namespaces.  This is essentially the
"private switch with pasta uplink" case we've discussed occasionally
before.  It doesn't technically have to be nested guests - the guest
could bridge between its uplink and a tunnel, but nested guests is the
likely use case.

> > > And for inbound flows, if a guest steals the address of another guest,
> > > we'll give priority to the normal 'addr' versions instead of the
> > > '_seen' ones, to decide how to direct traffic.  
> > 
> > I don't see how we'd know we're in this situation, so when to
> > prioritise which address over the other.
> 
> In the set of all addr_seen and addr, we would have at least a
> non-unique value. Or, practically speaking, we should refuse to set
> addr_seen if it matches addr for another guest.

Ah, ok.  So again, assuming a static configuration of known guests,
rather than a local bridge established by a guest at runtime.
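
To make that concrete, here is a minimal sketch of the "refuse to set
addr_seen if it matches addr for another guest" rule. It assumes a
static table of known guests; 'struct guest' and
guest_update_addr_seen() are hypothetical names, not existing passt
code, which currently tracks a single guest.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-guest state, for illustration only */
struct guest {
	struct in6_addr addr;		/* address we assigned to the guest */
	struct in6_addr addr_seen;	/* last source address we observed */
};

/* Record @saddr as addr_seen for guest @idx, unless it is unspecified or it
 * is the assigned address of a different guest. Returns true if recorded.
 */
static bool guest_update_addr_seen(struct guest *guests, size_t n,
				   size_t idx, const struct in6_addr *saddr)
{
	size_t i;

	if (idx >= n || IN6_IS_ADDR_UNSPECIFIED(saddr))
		return false;

	for (i = 0; i < n; i++) {
		/* Assigned addresses take priority over observed ones */
		if (i != idx &&
		    !memcmp(&guests[i].addr, saddr, sizeof(*saddr)))
			return false;
	}

	guests[idx].addr_seen = *saddr;
	return true;
}
```

This only works because the table is known up front: with a bridge set
up by a guest at runtime there is no 'idx' to look up in the first
place.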

> > > > I'm actually not sure if encountering this bug makes me more or less
> > > > in favour of addr_seen.  On the one hand I think it highlights the
> > > > flakiness of this approach; there are situations where we just won't
> > > > know the right address.  
> > > 
> > > I don't understand this argument: indeed, there are such situations,
> > > and they are annoying. Why should we make them more common?  
> > 
> > Because predictability is good, and working _most_ of the time is a
> > failure of predictability.
> 
> It avoids substantial effort and frustration for everybody involved
> though. The practical problem with lacking predictability is if it
> makes things harder to debug, I guess, which shouldn't be the case here.
> 
> > > > On the other hand it shows a relatively
> > > > plausible case where the guest won't get exactly the address we want
> > > > it to (it uses NDP but not DHCPv6).
> > > > 
> > > > Hrm... actually this also shows a potential danger in the recent
> > > > patches to disable DAD in the guest.  With DAD enabled, when the guest
> > > > grabs a new address, we'd expect it to emit DAD messages, which would
> > > > have the side effect of updating our addr_seen (although I'm pretty
> > > > sure I hit this bug before the nodad patches were applied, so that
> > > > doesn't seem to be foolproof).  
> > > 
> > > Well, but we do that for containers with --config-net only. In that
> > > case, the addresses we configure have infinite lifetime anyway.  
> > 
> > Oh, good point.  Hrm... then I'm unsure why the guest wasn't re-DADing
> > its new address.
> 
> It probably did, but we ignored that anyway because DAD is done by
> sending neighbour solicitations with an unspecified address as source,
> for example (the "change" here drops "nodad"):
> 
> $ ./pasta --config-net -p dad.pcap
> Saving packet capture to dad.pcap
> # ip addr change dev enp9s0 fe80::3882:b5ff:fe01:e9a1/64
> # tshark -r dad.pcap |grep Neigh
> Running as user "root" and group "root". This could be dangerous.
>    10   2.642467           :: → ff02::1:ff01:e9a1 ICMPv6 86 Neighbor Solicitation for fe80::3882:b5ff:fe01:e9a1
> 
> and in tap6_handler() we do:
> 
>                 } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){
>                         c->ip6.addr_seen = *saddr;
>                 }
> 
> ...then, in ndp():
> 
>                 if (IN6_IS_ADDR_UNSPECIFIED(saddr))
>                         return 1;
> 
> we could set addr_seen by looking at the *target* address of the
> neighbour solicitation when the source address is ::, but it's not
> implemented yet.

Right.  I forgot the NS went out with :: as source.  Snooping the NS
that way again assumes that there's only one logical machine on the
guest side.  But since this is for addr_seen which fundamentally
assumes that anyway, I guess it doesn't make anything worse.
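
Roughly what I have in mind, as a standalone sketch rather than a
patch: ndp_snoop_dad() is a hypothetical helper, and the two output
pointers stand in for c->ip6.addr_seen and c->ip6.addr_ll_seen. It
uses the standard definitions from <netinet/icmp6.h> and is not wired
into tap6_handler() or ndp().

```c
#include <netinet/in.h>
#include <netinet/icmp6.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: if @ns is a DAD probe, that is, a neighbour
 * solicitation whose IPv6 source @saddr is ::, record its target as the
 * address the guest is about to use. Returns true if something was recorded.
 */
static bool ndp_snoop_dad(const struct in6_addr *saddr,
			  const struct nd_neighbor_solicit *ns, size_t len,
			  struct in6_addr *addr_seen,
			  struct in6_addr *addr_ll_seen)
{
	const struct in6_addr *target;

	if (len < sizeof(*ns) || ns->nd_ns_type != ND_NEIGHBOR_SOLICIT)
		return false;

	/* Not DAD: the usual source address tracking already covers this */
	if (!IN6_IS_ADDR_UNSPECIFIED(saddr))
		return false;

	target = &ns->nd_ns_target;
	if (IN6_IS_ADDR_UNSPECIFIED(target) || IN6_IS_ADDR_MULTICAST(target))
		return false;

	if (IN6_IS_ADDR_LINKLOCAL(target))
		*addr_ll_seen = *target;
	else
		*addr_seen = *target;

	return true;
}
```

Note this records the address before DAD completes, so if the guest
finds a duplicate and abandons the address we'd be left with a stale
value until it sends something else.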

> > > Besides, I don't think we need to have addr_seen updated as quickly and
> > > correctly as possible just for the sake of it, we can also update it
> > > when we get any other neighbour solicitation because the guest is
> > > actually using the network. It's not meant to be perfect.  
> > 
> > If the guest is a pure server (a common case for containers AFAICT),
> > then I don't know that we can expect NS messages for anything other
> > than the default gateway, which is (typically) link-local and so won't
> > help us to learn the new global address.
> 
> Containers running actual applications are noisy. I've only seen this
> kind of problem (addr_seen not set/matching) in particularly crafted
> test environments.
> 
> > > > We could maybe update addr_seen when we send RA messages to the guest
> > > > - assuming that it will use the same host part (low 64-bits) for both
> > > > link-local and global addresses.  Not sure if that's a widely safe
> > > > assumption or not.  
> > > 
> > > I don't understand: what case are you trying to cover with this?  
> > 
> > A case just like the one in the tests: the interface bounces, and we
> > get NDP traffic on the link-local address, but nothing on the global
> > address before an inbound connection.
> 
> Oh, I see. I think it makes sense, even though we'll set addr_seen a
> bit too early, but not enough to be a practical issue, I think.

Yes, but I think snooping the NS from DAD is probably a better idea.
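
For comparison, the RA-time guess would amount to something like the
sketch below. guess_global_addr() is a hypothetical helper, and it
assumes a /64 prefix and a guest that reuses its link-local interface
identifier for its global address. That won't hold for guests using
temporary or otherwise randomised interface identifiers.

```c
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: combine the /64 prefix we advertise with the
 * interface identifier (low 64 bits) of the guest's observed link-local
 * address. The result is only a guess at the guest's global address.
 */
static struct in6_addr guess_global_addr(const struct in6_addr *prefix,
					 const struct in6_addr *addr_ll_seen)
{
	struct in6_addr guess;

	memcpy(&guess.s6_addr[0], &prefix->s6_addr[0], 8);
	memcpy(&guess.s6_addr[8], &addr_ll_seen->s6_addr[8], 8);

	return guess;
}
```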

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

end of thread, other threads:[~2024-08-21  2:51 UTC | newest]

Thread overview: 55+ messages
2024-08-16  5:39 [PATCH 00/22] RFC: Allow configuration of special case NATs David Gibson
2024-08-16  5:39 ` [PATCH 01/22] treewide: Use "our address" instead of "forwarding address" David Gibson
2024-08-18 15:44   ` Stefano Brivio
2024-08-19  1:28     ` David Gibson
2024-08-16  5:39 ` [PATCH 02/22] util: Helper for formatting MAC addresses David Gibson
2024-08-18 15:44   ` Stefano Brivio
2024-08-19  1:29     ` David Gibson
2024-08-16  5:39 ` [PATCH 03/22] treewide: Rename MAC address fields for clarity David Gibson
2024-08-18 15:45   ` Stefano Brivio
2024-08-19  1:36     ` David Gibson
2024-08-16  5:39 ` [PATCH 04/22] treewide: Use struct assignment instead of memcpy() for IP addresses David Gibson
2024-08-18 15:45   ` Stefano Brivio
2024-08-19  1:38     ` David Gibson
2024-08-16  5:39 ` [PATCH 05/22] conf: Use array indices rather than pointers for DNS array slots David Gibson
2024-08-16  5:39 ` [PATCH 06/22] conf: More accurately count entries added in get_dns() David Gibson
2024-08-16  5:39 ` [PATCH 07/22] conf: Move DNS array bounds checks into add_dns[46] David Gibson
2024-08-16  5:39 ` [PATCH 08/22] conf: Move adding of a nameserver from resolv.conf into subfunction David Gibson
2024-08-16  5:39 ` [PATCH 09/22] conf: Correct setting of dns_match address in add_dns6() David Gibson
2024-08-16  5:39 ` [PATCH 10/22] conf: Treat --dns addresses as guest visible addresses David Gibson
2024-08-16  5:39 ` [PATCH 11/22] conf: Remove incorrect initialisation of addr_ll_seen David Gibson
2024-08-16  5:39 ` [PATCH 12/22] util: Correct sock_l4() binding for link local addresses David Gibson
2024-08-20  0:14   ` Stefano Brivio
2024-08-20  1:29     ` David Gibson
2024-08-16  5:39 ` [PATCH 13/22] treewide: Change misleading 'addr_ll' name David Gibson
2024-08-20  0:15   ` Stefano Brivio
2024-08-20  1:30     ` David Gibson
2024-08-16  5:39 ` [PATCH 14/22] Clarify which addresses in ip[46]_ctx are meaningful where David Gibson
2024-08-16  5:39 ` [PATCH 15/22] Initialise our_tap_ll to ip6.gw when suitable David Gibson
2024-08-16  5:39 ` [PATCH 16/22] fwd: Helpers to clarify what host addresses aren't guest accessible David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:40     ` David Gibson
2024-08-16  5:39 ` [PATCH 17/22] fwd: Split notion of "our tap address" from gateway for IPv4 David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:56     ` David Gibson
2024-08-16  5:39 ` [PATCH 18/22] Don't take "our" MAC address from the host David Gibson
2024-08-16  5:40 ` [PATCH 19/22] conf, fwd: Split notion of gateway/router from guest-visible host address David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  1:59     ` David Gibson
2024-08-16  5:40 ` [PATCH 20/22] conf: Allow address remapped to host to be configured David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  2:23     ` David Gibson
2024-08-16  5:40 ` [PATCH 21/22] fwd: Distinguish translatable from untranslatable addresses on inbound David Gibson
2024-08-16  5:40 ` [PATCH 22/22] fwd, conf: Allow NAT of the guest's assigned address David Gibson
2024-08-20 19:56   ` Stefano Brivio
2024-08-21  2:28     ` David Gibson
2024-08-16 14:45 ` [PATCH 00/22] RFC: Allow configuration of special case NATs Paul Holzinger
2024-08-16 15:03   ` Stefano Brivio
2024-08-17  8:01     ` David Gibson
2024-08-19  8:46 ` David Gibson
2024-08-19  9:27   ` Stefano Brivio
2024-08-19  9:52     ` David Gibson
2024-08-19 13:01       ` Stefano Brivio
2024-08-20  0:42         ` David Gibson
2024-08-20 20:39           ` Stefano Brivio
2024-08-21  2:51             ` David Gibson
