public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: passt-dev@passt.top, Stefano Brivio <sbrivio@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v5 7/7] fwd: Direct inbound spliced forwards to the guest's external address
Date: Fri, 18 Oct 2024 12:35:56 +1100
Message-ID: <20241018013556.1266295-8-david@gibson.dropbear.id.au>
In-Reply-To: <20241018013556.1266295-1-david@gibson.dropbear.id.au>

In pasta mode, where addressing permits, we "splice" connections, forwarding
directly from a host socket to a guest/container socket without any L2 or L3
processing.  This gives us a very large performance improvement when it's
possible.

Since the traffic is from a local socket within the guest, it will go over
the guest's 'lo' interface, and accordingly we set the guest side address
to be the loopback address.  However, this has a surprising side effect:
sometimes guests will run services that are only supposed to be used within
the guest and are therefore bound to only 127.0.0.1 and/or ::1.  pasta's
forwarding exposes those services to the host, which isn't generally what
we want.
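
As a rough illustration of why the destination address matters, here is a
minimal sketch (not part of this patch) of a TCP listener bound to a single
address.  The helper names listen_on() and can_connect() are made up for
this example.  On Linux, where all of 127.0.0.0/8 is local, a listener bound
to 127.0.0.1 accepts connections addressed to 127.0.0.1, while a connection
addressed to a different local address (127.0.0.2 is used here as a stand-in
for the guest's external address) is expected to be refused.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on @ip with a kernel-chosen port.
 * Returns the listening fd (or -1) and stores the port in *port.
 */
int listen_on(const char *ip, uint16_t *port)
{
	struct sockaddr_in sa;
	socklen_t sl = sizeof(sa);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));	/* sin_port 0: kernel picks a port */
	sa.sin_family = AF_INET;
	if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
	    bind(s, (struct sockaddr *)&sa, sizeof(sa)) ||
	    listen(s, 1) ||
	    getsockname(s, (struct sockaddr *)&sa, &sl)) {
		close(s);
		return -1;
	}

	*port = ntohs(sa.sin_port);
	return s;
}

/* Hypothetical helper: 1 if a TCP connection to @ip:@port succeeds */
int can_connect(const char *ip, uint16_t port)
{
	struct sockaddr_in sa;
	int s = socket(AF_INET, SOCK_STREAM, 0), ok;

	if (s < 0)
		return 0;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	ok = inet_pton(AF_INET, ip, &sa.sin_addr) == 1 &&
	     !connect(s, (struct sockaddr *)&sa, sizeof(sa));
	close(s);
	return ok;
}
```

With a listener from listen_on("127.0.0.1", &port), can_connect("127.0.0.1",
port) should succeed, and on Linux can_connect("127.0.0.2", port) should
fail.  This is why the guest side destination address of a spliced flow
determines which guest services the host can reach.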

Correct this by instead forwarding inbound "splice" flows to the guest's
external address.

Link: https://github.com/containers/podman/issues/24045

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c  |  9 +++++++++
 fwd.c   | 31 +++++++++++++++++++++++--------
 passt.1 | 23 +++++++++++++++++++----
 passt.h |  2 ++
 4 files changed, 53 insertions(+), 12 deletions(-)
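
Note for readers: the destination address selection made in the fwd.c hunk
below can be sketched, for IPv4 only, roughly as follows.  This is a
simplified stand-in and not the code from the patch: struct demo_ctx and
demo_splice_target() are hypothetical names, and the real code works on
union inany_addr and also handles IPv6 and the source address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>

/* Hypothetical, IPv4-only stand-in for the relevant parts of struct ctx */
struct demo_ctx {
	int host_lo_to_ns_lo;		/* Set by --host-lo-to-ns-lo */
	struct in_addr addr_seen;	/* Guest's external address */
};

/* Destination address in the guest for an inbound spliced flow */
struct in_addr demo_splice_target(const struct demo_ctx *c)
{
	struct in_addr lo = { .s_addr = htonl(INADDR_LOOPBACK) };

	assert(c);

	/* Old behaviour only on request, otherwise the external address */
	return c->host_lo_to_ns_lo ? lo : c->addr_seen;
}
```

With host_lo_to_ns_lo unset, the sketch returns the guest's observed
external address, so a service bound only to 127.0.0.1 in the guest is no
longer the target of the forward.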

diff --git a/conf.c b/conf.c
index c631019..b3b5342 100644
--- a/conf.c
+++ b/conf.c
@@ -912,6 +912,9 @@ pasta_opts:
 		"  -U, --udp-ns SPEC	UDP port forwarding to init namespace\n"
 		"    SPEC is as described above\n"
 		"    default: auto\n"
+		"  --host-lo-to-ns-lo	DEPRECATED:\n"
+		"			Translate host-loopback forwards to\n"
+		"			namespace loopback\n"
 		"  --userns NSPATH 	Target user namespace to join\n"
 		"  --netns PATH|NAME	Target network namespace to join\n"
 		"  --netns-only		Don't join existing user namespace\n"
@@ -1289,6 +1292,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"netns-only",	no_argument,		NULL,		20 },
 		{"map-host-loopback", required_argument, NULL,		21 },
 		{"map-guest-addr", required_argument,	NULL,		22 },
+		{"host-lo-to-ns-lo", no_argument, 	NULL,		23 },
 		{"dns-host",	required_argument,	NULL,		24 },
 		{ 0 },
 	};
@@ -1467,6 +1471,11 @@ void conf(struct ctx *c, int argc, char **argv)
 			conf_nat(optarg, &c->ip4.map_guest_addr,
 				 &c->ip6.map_guest_addr, NULL);
 			break;
+		case 23:
+			if (c->mode != MODE_PASTA)
+				die("--host-lo-to-ns-lo is for pasta mode only");
+			c->host_lo_to_ns_lo = 1;
+			break;
 		case 24:
 			if (inet_pton(AF_INET6, optarg, &c->ip6.dns_host) &&
 			    !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host))
diff --git a/fwd.c b/fwd.c
index a505098..c71f5e1 100644
--- a/fwd.c
+++ b/fwd.c
@@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
 		/* spliceable */
 
-		/* Preserve the specific loopback adddress used, but let the
-		 * kernel pick a source port on the target side
+		/* The traffic will go over the guest's 'lo' interface, but by
+		 * default use its external address, so we don't inadvertently
+		 * expose services that listen only on the guest's loopback
+		 * address.  That can be overridden by --host-lo-to-ns-lo which
+		 * will instead forward to the loopback address in the guest.
+		 *
+		 * In either case, let the kernel pick the source address to
+		 * match.
 		 */
-		tgt->oaddr = ini->eaddr;
+		if (inany_v4(&ini->eaddr)) {
+			if (c->host_lo_to_ns_lo)
+				tgt->eaddr = inany_loopback4;
+			else
+				tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
+			tgt->oaddr = inany_any4;
+		} else {
+			if (c->host_lo_to_ns_lo)
+				tgt->eaddr = inany_loopback6;
+			else
+				tgt->eaddr.a6 = c->ip6.addr_seen;
+			tgt->oaddr = inany_any6;
+		}
+
+		/* Let the kernel pick source port */
 		tgt->oport = 0;
 		if (proto == IPPROTO_UDP)
 			/* But for UDP preserve the source port */
 			tgt->oport = ini->eport;
 
-		if (inany_v4(&ini->eaddr))
-			tgt->eaddr = inany_loopback4;
-		else
-			tgt->eaddr = inany_loopback6;
-
 		return PIF_SPLICE;
 	}
 
diff --git a/passt.1 b/passt.1
index 46100e2..f084978 100644
--- a/passt.1
+++ b/passt.1
@@ -605,6 +605,13 @@ Configure UDP port forwarding from target namespace to init namespace.
 
 Default is \fBauto\fR.
 
+.TP
+.BR \-\-host-lo-to-ns-lo " " (DEPRECATED)
+If specified, connections forwarded with \fB\-t\fR and \fB\-u\fR from
+the host's loopback address will appear on the loopback address in the
+guest as well.  Without this option such forwarded packets will appear
+to come from the guest's public address.
+
 .TP
 .BR \-\-userns " " \fIspec
 Target user namespace to join, as a path. If PID is given, without this option,
@@ -893,8 +900,9 @@ interfaces, and it would also be impossible for guest or target
 namespace to route answers back.
 
 For convenience, the source address on these packets is translated to
-the address specified by the \fB\-\-map-host-loopback\fR option.  If
-not specified this defaults, somewhat arbitrarily, to the address of
+the address specified by the \fB\-\-map-host-loopback\fR option (with
+some exceptions in pasta mode, see next section below).  If not
+specified this defaults, somewhat arbitrarily, to the address of
 default IPv4 or IPv6 gateway (if any) -- this is known to be an
 existing, valid address on the same subnet.  If \fB\-\-no-map-gw\fR or
 \fB\-\-map-host-loopback none\fR are specified this translation is
@@ -931,8 +939,15 @@ and the new socket using the \fBsplice\fR(2) system call, and for UDP, a pair
 of \fBrecvmmsg\fR(2) and \fBsendmmsg\fR(2) system calls deals with packet
 transfers.
 
-This bypass only applies to local connections and traffic, because it's not
-possible to bind sockets to foreign addresses.
+Because it's not possible to bind sockets to foreign addresses, this
+bypass only applies to local connections and traffic.  It also means
+that the address translation differs slightly from passt mode.
+Connections from loopback to loopback on the host will appear to come
+from the target namespace's public address within the guest, unless
+\fB\-\-host-lo-to-ns-lo\fR is specified, in which case they will
+appear to come from loopback in the namespace as well.  The latter
+behaviour used to be the default, but is usually undesirable, since it
+can unintentionally expose namespace local services to the host.
 
 .SS Binding to low numbered ports (well-known or system ports, up to 1023)
 
diff --git a/passt.h b/passt.h
index 4908ed9..72c7f72 100644
--- a/passt.h
+++ b/passt.h
@@ -225,6 +225,7 @@ struct ip6_ctx {
  * @no_dhcpv6:		Disable DHCPv6 server
  * @no_ndp:		Disable NDP handler altogether
  * @no_ra:		Disable router advertisements
+ * @host_lo_to_ns_lo:	Map host loopback addresses to ns loopback addresses
  * @freebind:		Allow binding of non-local addresses for forwarding
  * @low_wmem:		Low probed net.core.wmem_max
  * @low_rmem:		Low probed net.core.rmem_max
@@ -285,6 +286,7 @@ struct ctx {
 	int no_dhcpv6;
 	int no_ndp;
 	int no_ra;
+	int host_lo_to_ns_lo;
 	int freebind;
 
 	int low_wmem;
-- 
2.47.0


Thread overview: 9+ messages
2024-10-18  1:35 [PATCH v5 0/7] Don't expose container loopback services to the host David Gibson
2024-10-18  1:35 ` [PATCH v5 1/7] arp: Fix a handful of small warts David Gibson
2024-10-18  1:35 ` [PATCH v5 2/7] test: Explicitly wait for DAD to complete on SLAAC addresses David Gibson
2024-10-18  1:35 ` [PATCH v5 3/7] test: Wait for DAD on DHCPv6 addresses David Gibson
2024-10-18  1:35 ` [PATCH v5 4/7] passt.1: Mark --stderr as deprecated more prominently David Gibson
2024-10-18  1:35 ` [PATCH v5 5/7] passt.1: Clarify and update "Handling of local addresses" section David Gibson
2024-10-18  1:35 ` [PATCH v5 6/7] test: Clarify test for spliced inbound transfers David Gibson
2024-10-18  1:35 ` David Gibson [this message]
2024-10-18 19:06 ` [PATCH v5 0/7] Don't expose container loopback services to the host Stefano Brivio
