From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 6C1015A0265 for ; Fri, 14 Oct 2022 08:39:05 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665729544; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4tHeIIYf9SPU00nHeqHidbexi1Pi2iolboPGBu8j8Yk=; b=aPN0mJrNGSSLh5csJUESLLYUV64ig8EYu+4NskxIAQFTwlPzHbq7VWvZo+++hpeA3Xd/Sb UOGRbvgLn9On0I2bc8g3ZIfXSNLuMPy6coxtEfO71nfzzsxERrTo00F8NN03CSc6Vqf8l+ bZpTQgNgOCYB+AkBX2B6NoJ2LQ9L90c= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-464-Y5FZMlEXOKqSkoGU9T2_9w-1; Fri, 14 Oct 2022 02:38:55 -0400 X-MC-Unique: Y5FZMlEXOKqSkoGU9T2_9w-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6BF988027EC; Fri, 14 Oct 2022 06:38:55 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3]) by smtp.corp.redhat.com (Postfix) with ESMTPS id DCF4C54F3E3; Fri, 14 Oct 2022 06:38:54 +0000 (UTC) Date: Fri, 14 Oct 2022 08:38:51 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH] conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user() Message-ID: <20221014083851.3b7a3571@elisabeth> In-Reply-To: References: <20221013163406.3727136-1-sbrivio@redhat.com> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: WQU2K47DZTSABWEOY6O5MGV6X3HIIBY7 X-Message-ID-Hash: WQU2K47DZTSABWEOY6O5MGV6X3HIIBY7 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 14 Oct 2022 14:12:26 +1100 David Gibson wrote: > On Thu, Oct 13, 2022 at 06:34:06PM +0200, Stefano Brivio wrote: > > Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in > > the target user namespace as we isolate the process, which means > > we're unable to bind to low ports at that point. > > > > Bind inbound ports, and only those, before isolate_user(). Keep the > > handling of outbound ports (for pasta mode only) after the setup of > > the namespace, because that's where we'll bind them. > > > > To this end, initialise the netlink socket for the init namespace > > before isolate_user() as well, as we actually need to know the > > addresses of the upstream interface before binding ports, in case > > they're not explicitly passed by the user. > > > > As we now call nl_sock_init() twice, checking its return code from > > conf() twice looks a bit heavy: make it exit(), instead, as we > > can't do much if we don't have netlink sockets. > > > > While at it: > > > > - move the v4_only && v6_only options check just after the first > > option processing loop, as this is more strictly related to > > option parsing proper > > > > - update the man page, explaining that CAP_NET_BIND_SERVICE is > > *not* the preferred way to bind ports, because passt and pasta > > can be abused to allow other processes to make effective usage > > of it. Add a note about the recommended sysctl instead > > > > Reported-by: David Gibson > > Signed-off-by: Stefano Brivio > > --- > > This is based on David's patchset "Fixes and cleanups for capability > > handling". > > > > conf.c | 71 +++++++++++++++++++++++++++---------------------------- > > netlink.c | 20 +++++++++------- > > netlink.h | 2 +- > > passt.1 | 45 +++++++++++++++++++++++++++++------ > > tap.c | 1 + > > 5 files changed, 87 insertions(+), 52 deletions(-) > > > > diff --git a/conf.c b/conf.c > > index a22ef48..e1f42f1 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -1530,6 +1530,11 @@ void conf(struct ctx *c, int argc, char **argv) > > } > > } while (name != -1); > > > > + if (v4_only && v6_only) { > > + err("Options ipv4-only and ipv6-only are mutually exclusive"); > > + usage(argv[0]); > > + } > > + > > ret = conf_ugid(runas, &uid, &gid); > > if (ret) > > usage(argv[0]); > > @@ -1539,6 +1544,30 @@ void conf(struct ctx *c, int argc, char **argv) > > logfile, logsize); > > } > > > > + nl_sock_init(c, false); > > + if (!v6_only) > > + c->ifi4 = conf_ip4(ifi, &c->ip4, c->mac); > > + if (!v4_only) > > + c->ifi6 = conf_ip6(ifi, &c->ip6, c->mac); > > + if (!c->ifi4 && !c->ifi6) { > > + err("External interface not usable"); > > + exit(EXIT_FAILURE); > > + } > > + > > + /* Inbound port options can be parsed now (after IPv4/IPv6 settings) */ > > + optind = 1; > > + do { > > + struct port_fwd *fwd; > > + > > + name = getopt_long(argc, argv, optstring, options, NULL); > > + > > + if ((name == 't' && (fwd = &c->tcp.fwd_in)) || > > + (name == 'u' && (fwd = &c->udp.fwd_in.f))) { > > + if (!optarg || conf_ports(c, name, optarg, fwd)) > > + usage(argv[0]); > > + } > > + } while (name != -1); > > + > > if (c->mode == MODE_PASTA) { > > if (conf_pasta_ns(&netns_only, userns, netns, > > optind, argc, argv) < 0) > > @@ -1561,50 +1590,20 @@ void conf(struct ctx *c, int argc, char **argv) > > } > > } > > > > - if (nl_sock_init(c)) { > > - err("Failed to get netlink socket"); > > - exit(EXIT_FAILURE); > > - } > > - > > - if (v4_only && v6_only) { > > - err("Options ipv4-only and ipv6-only are mutually exclusive"); > > - usage(argv[0]); > > - } > > - if (!v6_only) > > - c->ifi4 = conf_ip4(ifi, &c->ip4, c->mac); > > - if (!v4_only) > > - c->ifi6 = conf_ip6(ifi, &c->ip6, c->mac); > > - if (!c->ifi4 && !c->ifi6) { > > - err("External interface not usable"); > > - exit(EXIT_FAILURE); > > - } > > + if (c->mode == MODE_PASTA) > > + nl_sock_init(c, true); > > > > - /* Now we can process port configuration options */ > > + /* ...and outbound port options now that namespaces are set up. */ > > optind = 1; > > do { > > - struct port_fwd *fwd = NULL; > > + struct port_fwd *fwd; > > > > name = getopt_long(argc, argv, optstring, options, NULL); > > - switch (name) { > > - case 't': > > - case 'u': > > - case 'T': > > - case 'U': > > - if (name == 't') > > - fwd = &c->tcp.fwd_in; > > - else if (name == 'T') > > - fwd = &c->tcp.fwd_out; > > - else if (name == 'u') > > - fwd = &c->udp.fwd_in.f; > > - else if (name == 'U') > > - fwd = &c->udp.fwd_out.f; > > > > + if ((name == 'T' && (fwd = &c->tcp.fwd_out)) || > > + (name == 'U' && (fwd = &c->udp.fwd_out.f))) { > > if (!optarg || conf_ports(c, name, optarg, fwd)) > > usage(argv[0]); > > - > > - break; > > - default: > > - break; > > } > > } while (name != -1); > > > > diff --git a/netlink.c b/netlink.c > > index 6e5a96b..3ee4d42 100644 > > --- a/netlink.c > > +++ b/netlink.c > > @@ -19,6 +19,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -71,25 +72,28 @@ ns: > > } > > > > /** > > - * nl_sock_init() - Call nl_sock_init_do() and check for failures > > + * nl_sock_init() - Call nl_sock_init_do(), won't return on failure > > * @c: Execution context > > - * > > - * Return: -EIO if sockets couldn't be set up, 0 otherwise > > + * @ns: Get socket in namespace, not in init > > */ > > -int nl_sock_init(const struct ctx *c) > > +void nl_sock_init(const struct ctx *c, bool ns) > > { > > - if (c->mode == MODE_PASTA) { > > + if (c->mode == MODE_PASTA && ns) { > > No need for the mode check here: ns is only ever true when in pasta > mode. Since you're also calling nl_sock_init() twice explicitly, the > nasty goto in nl_sock_init_do() isn't really needed any more, and we > could make it just get one socket per call. But maybe that cleanup's > not in scope for this patch. Hmm, yeah, why not, changed in v2. On Fri, 14 Oct 2022 14:20:44 +1100 David Gibson wrote: > On Fri, Oct 14, 2022 at 02:12:26PM +1100, David Gibson wrote: > > On Thu, Oct 13, 2022 at 06:34:06PM +0200, Stefano Brivio wrote: > > > Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in > > > the target user namespace as we isolate the process, which means > > > we're unable to bind to low ports at that point. > > > > > > Bind inbound ports, and only those, before isolate_user(). Keep the > > > handling of outbound ports (for pasta mode only) after the setup of > > > the namespace, because that's where we'll bind them. > > > > > > To this end, initialise the netlink socket for the init namespace > > > before isolate_user() as well, as we actually need to know the > > > addresses of the upstream interface before binding ports, in case > > > they're not explicitly passed by the user. > > > > > > As we now call nl_sock_init() twice, checking its return code from > > > conf() twice looks a bit heavy: make it exit(), instead, as we > > > can't do much if we don't have netlink sockets. > > > > > > While at it: > > > > > > - move the v4_only && v6_only options check just after the first > > > option processing loop, as this is more strictly related to > > > option parsing proper > > > > > > - update the man page, explaining that CAP_NET_BIND_SERVICE is > > > *not* the preferred way to bind ports, because passt and pasta > > > can be abused to allow other processes to make effective usage > > > of it. Add a note about the recommended sysctl instead > > > > > > Reported-by: David Gibson > > > Signed-off-by: Stefano Brivio > > Sorry, sent the previous reply before I'd finished. > > > > -If the port forwarding configuration requires binding to port numbers lower than > > > -1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, but will fail if not > > > -running as root, or without the \fICAP_NET_BIND_SERVICE\fR Linux capability, see > > > -\fBservices\fR(5) and \fBcapabilities\fR(7). To grant the > > > -\fICAP_NET_BIND_SERVICE\fR capability to passt, you can issue, as root: > > > +If the port forwarding configuration requires binding to ports with numbers > > > +lower than 1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, but will > > > +fail, unless, either: > > > + > > > +.IP \(bu 2 > > > +the \fIsys.net.ipv4.ip_unprivileged_port_start\fR sysctl is set to the number > > > +of the lowest port \fBpasst\fR and \fBpasta\fR need. For example, as root: > > > + > > > +.nf > > > + sysctl -w net.ipv4.ip_unprivileged_port_start=443 > > > +.fi > > > + > > > +\fBNote\fR: this is the recommended way of enabling \fBpasst\fR and \fBpasta\fR > > > +to bind to ports with numbers below 1024. > > > + > > > +.IP \(bu > > > +or the \fICAP_NET_BIND_SERVICE\fR Linux capability is granted, see > > > +\fBservices\fR(5) and \fBcapabilities\fR(7). > > > + > > > +This is, in general, \fBnot the recommended way\fR, because \fBpasst\fR and > > > +\fBpasta\fR might be used as vector to effectively use this capability from > > > +another process. > > > + > > > +However, if your environment is sufficiently controlled by an LSM (Linux > > > +Security Module) such as \fIAppArmor\fR, \fISELinux\fR, \fISmack\fR or > > > +\fITOMOYO\fR, and no other processes can interact in such a way in virtue of > > > +this, granting this capability to \fBpasst\fR and \fBpasta\fR only can > > > +effectively prevent other processes from utilising it. > > > + > > > +Note that this will not work for automatic detection and forwarding of ports > > > +with \fBpasta\fR, because \fBpasta\fR will relinquish this capability at > > > +runtime. > > > + > > > +To grant this capability, you can issue, as root: > > > + > > > +.nf > > > + setcap 'cap_net_bind_service=+ep' $(which passt) > > > +.fi > > > > > > -.RS > > > -setcap 'cap_net_bind_service=+ep' $(which passt) > > > .RE > > These likely won't be enough, since for most users the caps on the > passt.avx2 binary are the ones that will matter. ...I didn't want to make the man page arch-specific, but on the other hand it's very likely somebody will struggle with this. Replaced with a while loop on which(1) output, that's pretty much the only way to keep stderr clean and the man page nicely displayed in small terminals. -- Stefano