On Tue, Oct 01, 2024 at 08:22:20PM +0200, Stefano Brivio wrote: > Some nits, but the series looks overall good to me and I would also > apply it like this: > > On Tue, 1 Oct 2024 16:25:02 +1000 > David Gibson wrote: > > > In pasta mode, where addressing permits we "splice" connections, forwarding > > directly from host socket to guest/container socket without any L2 or L3 > > processing. This gives us a very large performance improvement when it's > > possible. > > > > Since the traffic is from a local socket within the guest, it will go over > > the guest's 'lo' interface, and accordingly we set the guest side address > > to be the loopback address. However this has a surprising side effect: > > sometimes guests will run services that are only supposed to be used within > > the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's > > forwarding exposes those services to the host, which isn't generally what > > we want. > > > > Correct this by instead forwarding inbound "splice" flows to the guest's > > external address. > > > > Link: https://github.com/containers/podman/issues/24045 > > > > Signed-off-by: David Gibson > > --- > > conf.c | 9 +++++++++ > > fwd.c | 31 +++++++++++++++++++++++-------- > > passt.1 | 23 +++++++++++++++++++---- > > passt.h | 1 + > > 4 files changed, 52 insertions(+), 12 deletions(-) > > > > diff --git a/conf.c b/conf.c > > index 6e62510..b5318f3 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -908,6 +908,9 @@ pasta_opts: > > " -U, --udp-ns SPEC UDP port forwarding to init namespace\n" > > " SPEC is as described above\n" > > " default: auto\n" > > + " --host-lo-to-ns-lo DEPRECATED:\n" > > For me, this is harder to type (and remember) than --expose-ns-lo or > similar. It doesn't matter so much as it's a deprecated option anyway. I'm not especially attached to the "host-lo-to-ns-lo" name, but I also don't love "expose-ns-lo". I was aiming to give a bit more of a hint of exactly *when* the ns lo is exposed. > --help lines are precious, I wouldn't use one to say DEPRECATED. On the > other hand, it's deprecated :) so I don't care that much (and it has > the advantage of making it very clear for users). Right, I copied the style to be consistent with --no-copy-routes and --no-copy-addrs. > > > + " Translate host-loopback forwards to\n" > > + " namespace loopback\n" > > " --userns NSPATH Target user namespace to join\n" > > " --netns PATH|NAME Target network namespace to join\n" > > " --netns-only Don't join existing user namespace\n" > > @@ -1284,6 +1287,7 @@ void conf(struct ctx *c, int argc, char **argv) > > {"netns-only", no_argument, NULL, 20 }, > > {"map-host-loopback", required_argument, NULL, 21 }, > > {"map-guest-addr", required_argument, NULL, 22 }, > > + {"host-lo-to-ns-lo", no_argument, NULL, 23 }, > > { 0 }, > > }; > > const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; > > @@ -1461,6 +1465,11 @@ void conf(struct ctx *c, int argc, char **argv) > > conf_nat(optarg, &c->ip4.map_guest_addr, > > &c->ip6.map_guest_addr, NULL); > > break; > > + case 23: > > + if (c->mode != MODE_PASTA) > > + die("--host-lo-to-ns-lo is for pasta mode only"); > > + c->host_lo_to_ns_lo = 1; > > + break; > > case 'd': > > c->debug = 1; > > c->quiet = 0; > > diff --git a/fwd.c b/fwd.c > > index a505098..c71f5e1 100644 > > --- a/fwd.c > > +++ b/fwd.c > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > > (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) { > > /* spliceable */ > > > > - /* Preserve the specific loopback adddress used, but let the > > - * kernel pick a source port on the target side > > + /* The traffic will go over the guest's 'lo' interface, but by > > + * default use its external address, so we don't inadvertently > > + * expose services that listen only on the guest's loopback > > + * address. That can be overridden by --host-lo-to-ns-lo which > > + * will instead forward to the loopback address in the guest. > > + * > > + * In either case, let the kernel pick the source address to > > + * match. > > */ > > - tgt->oaddr = ini->eaddr; > > + if (inany_v4(&ini->eaddr)) { > > + if (c->host_lo_to_ns_lo) > > + tgt->eaddr = inany_loopback4; > > + else > > + tgt->eaddr = inany_from_v4(c->ip4.addr_seen); > > + tgt->oaddr = inany_any4; > > + } else { > > + if (c->host_lo_to_ns_lo) > > + tgt->eaddr = inany_loopback6; > > + else > > + tgt->eaddr.a6 = c->ip6.addr_seen; > > + tgt->oaddr = inany_any6; > > + } > > + > > + /* Let the kernel pick source port */ > > tgt->oport = 0; > > if (proto == IPPROTO_UDP) > > /* But for UDP preserve the source port */ > > tgt->oport = ini->eport; > > > > - if (inany_v4(&ini->eaddr)) > > - tgt->eaddr = inany_loopback4; > > - else > > - tgt->eaddr = inany_loopback6; > > - > > return PIF_SPLICE; > > } > > > > diff --git a/passt.1 b/passt.1 > > index 11104e1..332384c 100644 > > --- a/passt.1 > > +++ b/passt.1 > > @@ -586,6 +586,13 @@ Configure UDP port forwarding from target namespace to init namespace. > > > > Default is \fBauto\fR. > > > > +.TP > > +.BR \-\-host-lo-to-ns-lo " " (DEPRECATED) > > +If specified, connections forwarded with \fB\-t\fR and \fB\-u\fR from > > +the host's loopback address will appear on the loopback address in the > > +guest as well. Without this option such forwarded packets will appear > > +to come from the guest's public address. > > + > > .TP > > .BR \-\-userns " " \fIspec > > Target user namespace to join, as a path. If PID is given, without this option, > > @@ -874,8 +881,9 @@ interfaces, and it would also be impossible for guest or target > > namespace to route answers back. > > > > For convenience, the source address on these packets is translated to > > -the address specified by the \fB\-\-map-host-loopback\fR option. If > > -not specified this defaults, somewhat arbitrarily, to the address of > > +the address specified by the \fB\-\-map-host-loopback\fR option (with > > +some exceptions in pasta mode, see next section below). If not > > +specified this defaults, somewhat arbitrarily, to the address of > > default IPv4 or IPv6 gateway (if any) -- this is known to be an > > existing, valid address on the same subnet. If \fB\-\-no-map-gw\fR or > > \fB\-\-map-host-loopback none\fR are specified this translation is > > @@ -912,8 +920,15 @@ and the new socket using the \fBsplice\fR(2) system call, and for UDP, a pair > > of \fBrecvmmsg\fR(2) and \fBsendmmsg\fR(2) system calls deals with packet > > transfers. > > > > -This bypass only applies to local connections and traffic, because it's not > > -possible to bind sockets to foreign addresses. > > +Because it's not possible to bind sockets to foreign addresses, this > > +bypass only applies to local connections and traffic. It also means > > +that the address translation differs slightly from passt mode. > > +Connections from loopback to loopback on the host will appear to come > > +from the target namespace's public address within the guest, unless > > +\fB\-\-host-lo-to-ns-lo\fR is specified, in which case they will > > +appear to come from loopback in the namespace as well. The latter > > +behaviour used to be the default, but is usually undesirable, since it > > +can unintentionally expose namespace local services to the host. > > > > .SS Binding to low numbered ports (well-known or system ports, up to 1023) > > > > diff --git a/passt.h b/passt.h > > index 031c9b6..3b7b281 100644 > > --- a/passt.h > > +++ b/passt.h > > @@ -284,6 +284,7 @@ struct ctx { > > int no_dhcpv6; > > int no_ndp; > > int no_ra; > > + int host_lo_to_ns_lo; > > This is missing from the struct comment, but... again, it's deprecated, > so it doesn't matter much. Oops. That I've added. I could repost immediately, or wait and see if we can brainstorm a better option name. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson