From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=h22n/u32; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id A40305A004E for ; Wed, 02 Oct 2024 07:32:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1727847173; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Jp03B+AWbkmt4MvLPpvX/cr9UhHnjpfM1geN/5AiRpI=; b=h22n/u32xqrEGa+lRjOYSfha/FxASrxcwZEtMf+i4ZtFAoxXQp9kxNeTCpA8ZtaUtaN81A UqC1KSKldig4hvgmzqKdLWeFl/IEZcFpLEo/RC/bz5PSgYopl8OvbEWAUeTsAVMxbS6G6C 5cR4Yi+d9Wb9ClZzd9VxnuC9dZUX2ac= Received: from mail-pj1-f69.google.com (mail-pj1-f69.google.com [209.85.216.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-586-e5mPVAVkO0yxWfissOIIQw-1; Wed, 02 Oct 2024 01:32:52 -0400 X-MC-Unique: e5mPVAVkO0yxWfissOIIQw-1 Received: by mail-pj1-f69.google.com with SMTP id 98e67ed59e1d1-2db470aa646so5965035a91.3 for ; Tue, 01 Oct 2024 22:32:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727847170; x=1728451970; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=Jp03B+AWbkmt4MvLPpvX/cr9UhHnjpfM1geN/5AiRpI=; b=fB4myhDO5Y5ieZ3c72p2/iLsNh5SExh+huwgjYQyZQQUeeKwf+lBPCP+VH3LUdzfwb FmgD0N7izvo9dW9ix2JrNIhmS1sSJzCr78HUBwWYqTz0rNl8fFiY+Q41P3EmTs+gHHF8 JQ1JwEV/XjeWpboW5PSW6xpZC+yCfHr45HJjzPCAx7jQHuFjEBW+kBwmT0SqJTFrpHob mxbANZwdav7Quto3tiDZw+5JdqdAWKi2eTyY3SSIaxWr7vMf6p7i3NTu9PUVmgBrUXun xCMdEhy+hqZ1rmbCsmxDfkzvMfv12ITusWRmD7sB3c1frFK4rcG9/EbYmUtHMMhzwWNy nRIw== X-Gm-Message-State: AOJu0Yw6LlLrEQ3sHe3wK2jxZmcR/vaaacgi3c6BXjlRCGqADYD07OcH FdY/TcncAtOtf1ntkKNVamBFuG4LmGw+zoAkZC/4+Mnx8lWEGY04TUe02TjOtHyeSm6LJWFPyee DDmKlA2VkMgPA4YlVyrz1uTMK0m2TreHl254gP1szxr4zKvgGYVwy42loCA== X-Received: by 2002:a17:90b:3a85:b0:2c9:df1c:4a58 with SMTP id 98e67ed59e1d1-2e1846a6a16mr2852790a91.23.1727847170240; Tue, 01 Oct 2024 22:32:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFcjqhldwhv6icwEGcUX/Ez5shIs0O2BVOcd6Iq7NqYqIirDZQEDEyTiQlXyeLlhNthmAvMsg== X-Received: by 2002:a17:90b:3a85:b0:2c9:df1c:4a58 with SMTP id 98e67ed59e1d1-2e1846a6a16mr2852765a91.23.1727847169678; Tue, 01 Oct 2024 22:32:49 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2e18f89d99esm661888a91.30.2024.10.01.22.32.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2024 22:32:49 -0700 (PDT) Date: Wed, 2 Oct 2024 07:32:46 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH v2 4/4] fwd: Direct inbound spliced forwards to the guest's external address Message-ID: <20241002073246.06b6f903@elisabeth> In-Reply-To: References: <20241001062502.1345449-1-david@gibson.dropbear.id.au> <20241001062502.1345449-5-david@gibson.dropbear.id.au> <20241001202220.7683c732@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: 6DYSVWRGSAUXLI5OPEOOHZHCWAP2VXOS X-Message-ID-Hash: 6DYSVWRGSAUXLI5OPEOOHZHCWAP2VXOS X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, 2 Oct 2024 11:29:19 +1000 David Gibson wrote: > On Tue, Oct 01, 2024 at 08:22:20PM +0200, Stefano Brivio wrote: > > Some nits, but the series looks overall good to me and I would also > > apply it like this: > > > > On Tue, 1 Oct 2024 16:25:02 +1000 > > David Gibson wrote: > > > > > In pasta mode, where addressing permits we "splice" connections, forwarding > > > directly from host socket to guest/container socket without any L2 or L3 > > > processing. This gives us a very large performance improvement when it's > > > possible. > > > > > > Since the traffic is from a local socket within the guest, it will go over > > > the guest's 'lo' interface, and accordingly we set the guest side address > > > to be the loopback address. However this has a surprising side effect: > > > sometimes guests will run services that are only supposed to be used within > > > the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's > > > forwarding exposes those services to the host, which isn't generally what > > > we want. > > > > > > Correct this by instead forwarding inbound "splice" flows to the guest's > > > external address. > > > > > > Link: https://github.com/containers/podman/issues/24045 > > > > > > Signed-off-by: David Gibson > > > --- > > > conf.c | 9 +++++++++ > > > fwd.c | 31 +++++++++++++++++++++++-------- > > > passt.1 | 23 +++++++++++++++++++---- > > > passt.h | 1 + > > > 4 files changed, 52 insertions(+), 12 deletions(-) > > > > > > diff --git a/conf.c b/conf.c > > > index 6e62510..b5318f3 100644 > > > --- a/conf.c > > > +++ b/conf.c > > > @@ -908,6 +908,9 @@ pasta_opts: > > > " -U, --udp-ns SPEC UDP port forwarding to init namespace\n" > > > " SPEC is as described above\n" > > > " default: auto\n" > > > + " --host-lo-to-ns-lo DEPRECATED:\n" > > > > For me, this is harder to type (and remember) than --expose-ns-lo or > > similar. It doesn't matter so much as it's a deprecated option anyway. > > I'm not especially attached to the "host-lo-to-ns-lo" name, but I also > don't love "expose-ns-lo". I was aiming to give a bit more of a hint > of exactly *when* the ns lo is exposed. Ah, right, fair enough. > > > --help lines are precious, I wouldn't use one to say DEPRECATED. On the > > other hand, it's deprecated :) so I don't care that much (and it has > > the advantage of making it very clear for users). > > Right, I copied the style to be consistent with --no-copy-routes and > --no-copy-addrs. > > > > > > + " Translate host-loopback forwards to\n" > > > + " namespace loopback\n" > > > " --userns NSPATH Target user namespace to join\n" > > > " --netns PATH|NAME Target network namespace to join\n" > > > " --netns-only Don't join existing user namespace\n" > > > @@ -1284,6 +1287,7 @@ void conf(struct ctx *c, int argc, char **argv) > > > {"netns-only", no_argument, NULL, 20 }, > > > {"map-host-loopback", required_argument, NULL, 21 }, > > > {"map-guest-addr", required_argument, NULL, 22 }, > > > + {"host-lo-to-ns-lo", no_argument, NULL, 23 }, > > > { 0 }, > > > }; > > > const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; > > > @@ -1461,6 +1465,11 @@ void conf(struct ctx *c, int argc, char **argv) > > > conf_nat(optarg, &c->ip4.map_guest_addr, > > > &c->ip6.map_guest_addr, NULL); > > > break; > > > + case 23: > > > + if (c->mode != MODE_PASTA) > > > + die("--host-lo-to-ns-lo is for pasta mode only"); > > > + c->host_lo_to_ns_lo = 1; > > > + break; > > > case 'd': > > > c->debug = 1; > > > c->quiet = 0; > > > diff --git a/fwd.c b/fwd.c > > > index a505098..c71f5e1 100644 > > > --- a/fwd.c > > > +++ b/fwd.c > > > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, > > > (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) { > > > /* spliceable */ > > > > > > - /* Preserve the specific loopback adddress used, but let the > > > - * kernel pick a source port on the target side > > > + /* The traffic will go over the guest's 'lo' interface, but by > > > + * default use its external address, so we don't inadvertently > > > + * expose services that listen only on the guest's loopback > > > + * address. That can be overridden by --host-lo-to-ns-lo which > > > + * will instead forward to the loopback address in the guest. > > > + * > > > + * In either case, let the kernel pick the source address to > > > + * match. > > > */ > > > - tgt->oaddr = ini->eaddr; > > > + if (inany_v4(&ini->eaddr)) { > > > + if (c->host_lo_to_ns_lo) > > > + tgt->eaddr = inany_loopback4; > > > + else > > > + tgt->eaddr = inany_from_v4(c->ip4.addr_seen); > > > + tgt->oaddr = inany_any4; > > > + } else { > > > + if (c->host_lo_to_ns_lo) > > > + tgt->eaddr = inany_loopback6; > > > + else > > > + tgt->eaddr.a6 = c->ip6.addr_seen; > > > + tgt->oaddr = inany_any6; > > > + } > > > + > > > + /* Let the kernel pick source port */ > > > tgt->oport = 0; > > > if (proto == IPPROTO_UDP) > > > /* But for UDP preserve the source port */ > > > tgt->oport = ini->eport; > > > > > > - if (inany_v4(&ini->eaddr)) > > > - tgt->eaddr = inany_loopback4; > > > - else > > > - tgt->eaddr = inany_loopback6; > > > - > > > return PIF_SPLICE; > > > } > > > > > > diff --git a/passt.1 b/passt.1 > > > index 11104e1..332384c 100644 > > > --- a/passt.1 > > > +++ b/passt.1 > > > @@ -586,6 +586,13 @@ Configure UDP port forwarding from target namespace to init namespace. > > > > > > Default is \fBauto\fR. > > > > > > +.TP > > > +.BR \-\-host-lo-to-ns-lo " " (DEPRECATED) > > > +If specified, connections forwarded with \fB\-t\fR and \fB\-u\fR from > > > +the host's loopback address will appear on the loopback address in the > > > +guest as well. Without this option such forwarded packets will appear > > > +to come from the guest's public address. > > > + > > > .TP > > > .BR \-\-userns " " \fIspec > > > Target user namespace to join, as a path. If PID is given, without this option, > > > @@ -874,8 +881,9 @@ interfaces, and it would also be impossible for guest or target > > > namespace to route answers back. > > > > > > For convenience, the source address on these packets is translated to > > > -the address specified by the \fB\-\-map-host-loopback\fR option. If > > > -not specified this defaults, somewhat arbitrarily, to the address of > > > +the address specified by the \fB\-\-map-host-loopback\fR option (with > > > +some exceptions in pasta mode, see next section below). If not > > > +specified this defaults, somewhat arbitrarily, to the address of > > > default IPv4 or IPv6 gateway (if any) -- this is known to be an > > > existing, valid address on the same subnet. If \fB\-\-no-map-gw\fR or > > > \fB\-\-map-host-loopback none\fR are specified this translation is > > > @@ -912,8 +920,15 @@ and the new socket using the \fBsplice\fR(2) system call, and for UDP, a pair > > > of \fBrecvmmsg\fR(2) and \fBsendmmsg\fR(2) system calls deals with packet > > > transfers. > > > > > > -This bypass only applies to local connections and traffic, because it's not > > > -possible to bind sockets to foreign addresses. > > > +Because it's not possible to bind sockets to foreign addresses, this > > > +bypass only applies to local connections and traffic. It also means > > > +that the address translation differs slightly from passt mode. > > > +Connections from loopback to loopback on the host will appear to come > > > +from the target namespace's public address within the guest, unless > > > +\fB\-\-host-lo-to-ns-lo\fR is specified, in which case they will > > > +appear to come from loopback in the namespace as well. The latter > > > +behaviour used to be the default, but is usually undesirable, since it > > > +can unintentionally expose namespace local services to the host. > > > > > > .SS Binding to low numbered ports (well-known or system ports, up to 1023) > > > > > > diff --git a/passt.h b/passt.h > > > index 031c9b6..3b7b281 100644 > > > --- a/passt.h > > > +++ b/passt.h > > > @@ -284,6 +284,7 @@ struct ctx { > > > int no_dhcpv6; > > > int no_ndp; > > > int no_ra; > > > + int host_lo_to_ns_lo; > > > > This is missing from the struct comment, but... again, it's deprecated, > > so it doesn't matter much. > > Oops. That I've added. I could repost immediately, or wait and see > if we can brainstorm a better option name. I don't really have better ideas, let's keep it that way then. -- Stefano