From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202408 header.b=XFmQYD9Q; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 204385A004C for ; Wed, 02 Oct 2024 03:30:17 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202408; t=1727832613; bh=YhZW4BaxZDbuluqUE9HtCOQezKsW/S5kO4dwkVHPzPo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=XFmQYD9QqhOWrSB2zXKxL5DyPzyxMh9B2ArNnvb6cEXzEhtAlSPrN6E5VczLpHUsP wpqnLaBOdhlvICDsQcg+wRahOwETKvgtttBlOsfM0uos0gImjOBPJDzwyExVOLueL+ b7rSlXuj5VD2g/Bd8/6y9848hnhXcFLrS1yZT+qsgHuo/FNzCg4lXacUcaskT/tFQ2 5poNgnAIcK2aprdO3IqocCZfWDlZKSE2YGEEjVhlAIq3tcEoeexWKPwF9ijUjhTw4W jSbLob799aJDFvrmunN+ijgnmRB39NFI890qU5aQAv031XiisuYFXbkZ+L++nCtX4D X5HJ3JwFmZPvA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4XJHLF3p4Jz4wnw; Wed, 2 Oct 2024 11:30:13 +1000 (AEST) Date: Wed, 2 Oct 2024 11:29:19 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2 4/4] fwd: Direct inbound spliced forwards to the guest's external address Message-ID: References: <20241001062502.1345449-1-david@gibson.dropbear.id.au> <20241001062502.1345449-5-david@gibson.dropbear.id.au> <20241001202220.7683c732@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="c4tx1JSsSQIQw54H" Content-Disposition: inline In-Reply-To: <20241001202220.7683c732@elisabeth> Message-ID-Hash: 3OUHRXOK6PKGGTHL7HP36B3XRQDR5FSQ X-Message-ID-Hash: 3OUHRXOK6PKGGTHL7HP36B3XRQDR5FSQ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --c4tx1JSsSQIQw54H Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 01, 2024 at 08:22:20PM +0200, Stefano Brivio wrote: > Some nits, but the series looks overall good to me and I would also > apply it like this: >=20 > On Tue, 1 Oct 2024 16:25:02 +1000 > David Gibson wrote: >=20 > > In pasta mode, where addressing permits we "splice" connections, forwar= ding > > directly from host socket to guest/container socket without any L2 or L3 > > processing. This gives us a very large performance improvement when it= 's > > possible. > >=20 > > Since the traffic is from a local socket within the guest, it will go o= ver > > the guest's 'lo' interface, and accordingly we set the guest side addre= ss > > to be the loopback address. However this has a surprising side effect: > > sometimes guests will run services that are only supposed to be used wi= thin > > the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's > > forwarding exposes those services to the host, which isn't generally wh= at > > we want. > >=20 > > Correct this by instead forwarding inbound "splice" flows to the guest's > > external address. > >=20 > > Link: https://github.com/containers/podman/issues/24045 > >=20 > > Signed-off-by: David Gibson > > --- > > conf.c | 9 +++++++++ > > fwd.c | 31 +++++++++++++++++++++++-------- > > passt.1 | 23 +++++++++++++++++++---- > > passt.h | 1 + > > 4 files changed, 52 insertions(+), 12 deletions(-) > >=20 > > diff --git a/conf.c b/conf.c > > index 6e62510..b5318f3 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -908,6 +908,9 @@ pasta_opts: > > " -U, --udp-ns SPEC UDP port forwarding to init namespace\n" > > " SPEC is as described above\n" > > " default: auto\n" > > + " --host-lo-to-ns-lo DEPRECATED:\n" >=20 > For me, this is harder to type (and remember) than --expose-ns-lo or > similar. It doesn't matter so much as it's a deprecated option anyway. I'm not especially attached to the "host-lo-to-ns-lo" name, but I also don't love "expose-ns-lo". I was aiming to give a bit more of a hint of exactly *when* the ns lo is exposed. > --help lines are precious, I wouldn't use one to say DEPRECATED. On the > other hand, it's deprecated :) so I don't care that much (and it has > the advantage of making it very clear for users). Right, I copied the style to be consistent with --no-copy-routes and --no-copy-addrs. >=20 > > + " Translate host-loopback forwards to\n" > > + " namespace loopback\n" > > " --userns NSPATH Target user namespace to join\n" > > " --netns PATH|NAME Target network namespace to join\n" > > " --netns-only Don't join existing user namespace\n" > > @@ -1284,6 +1287,7 @@ void conf(struct ctx *c, int argc, char **argv) > > {"netns-only", no_argument, NULL, 20 }, > > {"map-host-loopback", required_argument, NULL, 21 }, > > {"map-guest-addr", required_argument, NULL, 22 }, > > + {"host-lo-to-ns-lo", no_argument, NULL, 23 }, > > { 0 }, > > }; > > const char *logname =3D (c->mode =3D=3D MODE_PASTA) ? "pasta" : "pass= t"; > > @@ -1461,6 +1465,11 @@ void conf(struct ctx *c, int argc, char **argv) > > conf_nat(optarg, &c->ip4.map_guest_addr, > > &c->ip6.map_guest_addr, NULL); > > break; > > + case 23: > > + if (c->mode !=3D MODE_PASTA) > > + die("--host-lo-to-ns-lo is for pasta mode only"); > > + c->host_lo_to_ns_lo =3D 1; > > + break; > > case 'd': > > c->debug =3D 1; > > c->quiet =3D 0; > > diff --git a/fwd.c b/fwd.c > > index a505098..c71f5e1 100644 > > --- a/fwd.c > > +++ b/fwd.c > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, ui= nt8_t proto, > > (proto =3D=3D IPPROTO_TCP || proto =3D=3D IPPROTO_UDP)) { > > /* spliceable */ > > =20 > > - /* Preserve the specific loopback adddress used, but let the > > - * kernel pick a source port on the target side > > + /* The traffic will go over the guest's 'lo' interface, but by > > + * default use its external address, so we don't inadvertently > > + * expose services that listen only on the guest's loopback > > + * address. That can be overridden by --host-lo-to-ns-lo which > > + * will instead forward to the loopback address in the guest. > > + * > > + * In either case, let the kernel pick the source address to > > + * match. > > */ > > - tgt->oaddr =3D ini->eaddr; > > + if (inany_v4(&ini->eaddr)) { > > + if (c->host_lo_to_ns_lo) > > + tgt->eaddr =3D inany_loopback4; > > + else > > + tgt->eaddr =3D inany_from_v4(c->ip4.addr_seen); > > + tgt->oaddr =3D inany_any4; > > + } else { > > + if (c->host_lo_to_ns_lo) > > + tgt->eaddr =3D inany_loopback6; > > + else > > + tgt->eaddr.a6 =3D c->ip6.addr_seen; > > + tgt->oaddr =3D inany_any6; > > + } > > + > > + /* Let the kernel pick source port */ > > tgt->oport =3D 0; > > if (proto =3D=3D IPPROTO_UDP) > > /* But for UDP preserve the source port */ > > tgt->oport =3D ini->eport; > > =20 > > - if (inany_v4(&ini->eaddr)) > > - tgt->eaddr =3D inany_loopback4; > > - else > > - tgt->eaddr =3D inany_loopback6; > > - > > return PIF_SPLICE; > > } > > =20 > > diff --git a/passt.1 b/passt.1 > > index 11104e1..332384c 100644 > > --- a/passt.1 > > +++ b/passt.1 > > @@ -586,6 +586,13 @@ Configure UDP port forwarding from target namespac= e to init namespace. > > =20 > > Default is \fBauto\fR. > > =20 > > +.TP > > +.BR \-\-host-lo-to-ns-lo " " (DEPRECATED) > > +If specified, connections forwarded with \fB\-t\fR and \fB\-u\fR from > > +the host's loopback address will appear on the loopback address in the > > +guest as well. Without this option such forwarded packets will appear > > +to come from the guest's public address. > > + > > .TP > > .BR \-\-userns " " \fIspec > > Target user namespace to join, as a path. If PID is given, without thi= s option, > > @@ -874,8 +881,9 @@ interfaces, and it would also be impossible for gue= st or target > > namespace to route answers back. > > =20 > > For convenience, the source address on these packets is translated to > > -the address specified by the \fB\-\-map-host-loopback\fR option. If > > -not specified this defaults, somewhat arbitrarily, to the address of > > +the address specified by the \fB\-\-map-host-loopback\fR option (with > > +some exceptions in pasta mode, see next section below). If not > > +specified this defaults, somewhat arbitrarily, to the address of > > default IPv4 or IPv6 gateway (if any) -- this is known to be an > > existing, valid address on the same subnet. If \fB\-\-no-map-gw\fR or > > \fB\-\-map-host-loopback none\fR are specified this translation is > > @@ -912,8 +920,15 @@ and the new socket using the \fBsplice\fR(2) syste= m call, and for UDP, a pair > > of \fBrecvmmsg\fR(2) and \fBsendmmsg\fR(2) system calls deals with pac= ket > > transfers. > > =20 > > -This bypass only applies to local connections and traffic, because it'= s not > > -possible to bind sockets to foreign addresses. > > +Because it's not possible to bind sockets to foreign addresses, this > > +bypass only applies to local connections and traffic. It also means > > +that the address translation differs slightly from passt mode. > > +Connections from loopback to loopback on the host will appear to come > > +from the target namespace's public address within the guest, unless > > +\fB\-\-host-lo-to-ns-lo\fR is specified, in which case they will > > +appear to come from loopback in the namespace as well. The latter > > +behaviour used to be the default, but is usually undesirable, since it > > +can unintentionally expose namespace local services to the host. > > =20 > > .SS Binding to low numbered ports (well-known or system ports, up to 1= 023) > > =20 > > diff --git a/passt.h b/passt.h > > index 031c9b6..3b7b281 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -284,6 +284,7 @@ struct ctx { > > int no_dhcpv6; > > int no_ndp; > > int no_ra; > > + int host_lo_to_ns_lo; >=20 > This is missing from the struct comment, but... again, it's deprecated, > so it doesn't matter much. Oops. That I've added. I could repost immediately, or wait and see if we can brainstorm a better option name. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --c4tx1JSsSQIQw54H Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmb8oeoACgkQzQJF27ox 2GcxcA/+M6j5O1xX4iDlJUcxEqeOgnglD+twyueEjqWT29ryd8Pduw7Bd0h2DbJN 9r0asXS8O23+hjoz5ESPVYnzlZ30x+l0qG2hmysQcROjcl/7OIUAPzFS6bgdyh70 Jdh2jCpTc8yo0wWDElLZ54Ld52A1Ua6TpGLcOw/4Z+yfmOGY8vNW5BJ+ah1H8Bsi 6Pr3ffu8BozfLezeidl3nq51wlLASCK7NSVt5ji4vx+oa7tUzLPe8pLf7aNQpiUL 6BINzeCXh3YUM5m5qDNXizyFstFgFOn647Jc5NIBa4sFYzRcvuPou1L5+P8MzTqa T2JJte5M0Qxqgla39p8RAJjNNJirSJ5q6O8MsxFcazdKiXjb5qb8NIU+Dx1ODHWo dmXcaWbi/fegBpso4ViXTtlbd0VUSMsf0DITkUXC3k7f681eHd6BosF7b6DnZYa/ Q29jkwc9z7q6ZbIvawwM5Pf4YGKbDUhmaBiQTYJwlfZubQkilfOw9WshOhVncm0o 7ZRRkjGMv9wWErinkG1UgwpVamFGDxNeKIFgSJp8tYyX4efKypRsrHik2ptvZnYp j3k1X6GVmr5daRMZyvUJjdJTZ5jvWtC62HSBn96c/ZvLa61waa+yI80lqw+AU8pL aci/n+RtG4tkDCzToWR3G66sAu/og4pwjHHTEAab76hEtMzppik= =QuQK -----END PGP SIGNATURE----- --c4tx1JSsSQIQw54H--