From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 3E05E5A0269 for ; Fri, 14 Oct 2022 06:25:43 +0200 (CEST) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4MpYGW3ld6z4xGt; Fri, 14 Oct 2022 15:25:39 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1665721539; bh=E0i29RBegTDwW7KooOlMi22Hzuph3mjRC0lNW9mlffY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=QDKS1vvPpebrVGQSerYFfDm1pwhsBoUButi9okxaPNRAU14JBt/pGiYTDjgdcpSF6 XtnQCdeNSF3jzlBdqSn4iXHWaAx4detut5JLpBvFh3vf66ErWgnzRr+JVLWqaIOQON spnh0rtoo6Nur7qKRtK4D4X7IjJpX542WbBri+Og= Date: Fri, 14 Oct 2022 14:12:26 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user() Message-ID: References: <20221013163406.3727136-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="xu2j1AZ/LPU5jOo+" Content-Disposition: inline In-Reply-To: <20221013163406.3727136-1-sbrivio@redhat.com> Message-ID-Hash: TLWYFLDE3G3QWHEQYE3OU2ULPDYEVT46 X-Message-ID-Hash: TLWYFLDE3G3QWHEQYE3OU2ULPDYEVT46 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --xu2j1AZ/LPU5jOo+ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 13, 2022 at 06:34:06PM +0200, Stefano Brivio wrote: > Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in > the target user namespace as we isolate the process, which means > we're unable to bind to low ports at that point. >=20 > Bind inbound ports, and only those, before isolate_user(). Keep the > handling of outbound ports (for pasta mode only) after the setup of > the namespace, because that's where we'll bind them. >=20 > To this end, initialise the netlink socket for the init namespace > before isolate_user() as well, as we actually need to know the > addresses of the upstream interface before binding ports, in case > they're not explicitly passed by the user. >=20 > As we now call nl_sock_init() twice, checking its return code from > conf() twice looks a bit heavy: make it exit(), instead, as we > can't do much if we don't have netlink sockets. >=20 > While at it: >=20 > - move the v4_only && v6_only options check just after the first > option processing loop, as this is more strictly related to > option parsing proper >=20 > - update the man page, explaining that CAP_NET_BIND_SERVICE is > *not* the preferred way to bind ports, because passt and pasta > can be abused to allow other processes to make effective usage > of it. Add a note about the recommended sysctl instead >=20 > Reported-by: David Gibson > Signed-off-by: Stefano Brivio > --- > This is based on David's patchset "Fixes and cleanups for capability > handling". >=20 > conf.c | 71 +++++++++++++++++++++++++++---------------------------- > netlink.c | 20 +++++++++------- > netlink.h | 2 +- > passt.1 | 45 +++++++++++++++++++++++++++++------ > tap.c | 1 + > 5 files changed, 87 insertions(+), 52 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index a22ef48..e1f42f1 100644 > --- a/conf.c > +++ b/conf.c > @@ -1530,6 +1530,11 @@ void conf(struct ctx *c, int argc, char **argv) > } > } while (name !=3D -1); > =20 > + if (v4_only && v6_only) { > + err("Options ipv4-only and ipv6-only are mutually exclusive"); > + usage(argv[0]); > + } > + > ret =3D conf_ugid(runas, &uid, &gid); > if (ret) > usage(argv[0]); > @@ -1539,6 +1544,30 @@ void conf(struct ctx *c, int argc, char **argv) > logfile, logsize); > } > =20 > + nl_sock_init(c, false); > + if (!v6_only) > + c->ifi4 =3D conf_ip4(ifi, &c->ip4, c->mac); > + if (!v4_only) > + c->ifi6 =3D conf_ip6(ifi, &c->ip6, c->mac); > + if (!c->ifi4 && !c->ifi6) { > + err("External interface not usable"); > + exit(EXIT_FAILURE); > + } > + > + /* Inbound port options can be parsed now (after IPv4/IPv6 settings) */ > + optind =3D 1; > + do { > + struct port_fwd *fwd; > + > + name =3D getopt_long(argc, argv, optstring, options, NULL); > + > + if ((name =3D=3D 't' && (fwd =3D &c->tcp.fwd_in)) || > + (name =3D=3D 'u' && (fwd =3D &c->udp.fwd_in.f))) { > + if (!optarg || conf_ports(c, name, optarg, fwd)) > + usage(argv[0]); > + } > + } while (name !=3D -1); > + > if (c->mode =3D=3D MODE_PASTA) { > if (conf_pasta_ns(&netns_only, userns, netns, > optind, argc, argv) < 0) > @@ -1561,50 +1590,20 @@ void conf(struct ctx *c, int argc, char **argv) > } > } > =20 > - if (nl_sock_init(c)) { > - err("Failed to get netlink socket"); > - exit(EXIT_FAILURE); > - } > - > - if (v4_only && v6_only) { > - err("Options ipv4-only and ipv6-only are mutually exclusive"); > - usage(argv[0]); > - } > - if (!v6_only) > - c->ifi4 =3D conf_ip4(ifi, &c->ip4, c->mac); > - if (!v4_only) > - c->ifi6 =3D conf_ip6(ifi, &c->ip6, c->mac); > - if (!c->ifi4 && !c->ifi6) { > - err("External interface not usable"); > - exit(EXIT_FAILURE); > - } > + if (c->mode =3D=3D MODE_PASTA) > + nl_sock_init(c, true); > =20 > - /* Now we can process port configuration options */ > + /* ...and outbound port options now that namespaces are set up. */ > optind =3D 1; > do { > - struct port_fwd *fwd =3D NULL; > + struct port_fwd *fwd; > =20 > name =3D getopt_long(argc, argv, optstring, options, NULL); > - switch (name) { > - case 't': > - case 'u': > - case 'T': > - case 'U': > - if (name =3D=3D 't') > - fwd =3D &c->tcp.fwd_in; > - else if (name =3D=3D 'T') > - fwd =3D &c->tcp.fwd_out; > - else if (name =3D=3D 'u') > - fwd =3D &c->udp.fwd_in.f; > - else if (name =3D=3D 'U') > - fwd =3D &c->udp.fwd_out.f; > =20 > + if ((name =3D=3D 'T' && (fwd =3D &c->tcp.fwd_out)) || > + (name =3D=3D 'U' && (fwd =3D &c->udp.fwd_out.f))) { > if (!optarg || conf_ports(c, name, optarg, fwd)) > usage(argv[0]); > - > - break; > - default: > - break; > } > } while (name !=3D -1); > =20 > diff --git a/netlink.c b/netlink.c > index 6e5a96b..3ee4d42 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -71,25 +72,28 @@ ns: > } > =20 > /** > - * nl_sock_init() - Call nl_sock_init_do() and check for failures > + * nl_sock_init() - Call nl_sock_init_do(), won't return on failure > * @c: Execution context > - * > - * Return: -EIO if sockets couldn't be set up, 0 otherwise > + * @ns: Get socket in namespace, not in init > */ > -int nl_sock_init(const struct ctx *c) > +void nl_sock_init(const struct ctx *c, bool ns) > { > - if (c->mode =3D=3D MODE_PASTA) { > + if (c->mode =3D=3D MODE_PASTA && ns) { No need for the mode check here: ns is only ever true when in pasta mode. Since you're also calling nl_sock_init() twice explicitly, the nasty goto in nl_sock_init_do() isn't really needed any more, and we could make it just get one socket per call. But maybe that cleanup's not in scope for this patch. > NS_CALL(nl_sock_init_do, c); > if (nl_sock_ns =3D=3D -1) > - return -EIO; > + goto fail; > } else { > nl_sock_init_do(NULL); > } > =20 > if (nl_sock =3D=3D -1) > - return -EIO; > + goto fail; > =20 > - return 0; > + return; > + > +fail: > + err("Failed to get netlink socket"); > + exit(EXIT_FAILURE); > } > =20 > /** > diff --git a/netlink.h b/netlink.h > index 5ce5037..3c991cc 100644 > --- a/netlink.h > +++ b/netlink.h > @@ -6,7 +6,7 @@ > #ifndef NETLINK_H > #define NETLINK_H > =20 > -int nl_sock_init(const struct ctx *c); > +void nl_sock_init(const struct ctx *c, bool ns); > unsigned int nl_get_ext_if(sa_family_t af); > void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw); > void nl_addr(int ns, unsigned int ifi, sa_family_t af, > diff --git a/passt.1 b/passt.1 > index 7d113f2..2cdba5d 100644 > --- a/passt.1 > +++ b/passt.1 > @@ -771,14 +771,45 @@ possible to bind sockets to foreign addresses. > =20 > .SS Binding to low numbered ports (well-known or system ports, up to 102= 3) > =20 > -If the port forwarding configuration requires binding to port numbers lo= wer than > -1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, but will fai= l if not > -running as root, or without the \fICAP_NET_BIND_SERVICE\fR Linux capabil= ity, see > -\fBservices\fR(5) and \fBcapabilities\fR(7). To grant the > -\fICAP_NET_BIND_SERVICE\fR capability to passt, you can issue, as root: > +If the port forwarding configuration requires binding to ports with numb= ers > +lower than 1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, b= ut will > +fail, unless, either: > + > +.IP \(bu 2 > +the \fIsys.net.ipv4.ip_unprivileged_port_start\fR sysctl is set to the n= umber > +of the lowest port \fBpasst\fR and \fBpasta\fR need. For example, as roo= t: > + > +.nf > + sysctl -w net.ipv4.ip_unprivileged_port_start=3D443 > +.fi > + > +\fBNote\fR: this is the recommended way of enabling \fBpasst\fR and \fBp= asta\fR > +to bind to ports with numbers below 1024. > + > +.IP \(bu > +or the \fICAP_NET_BIND_SERVICE\fR Linux capability is granted, see > +\fBservices\fR(5) and \fBcapabilities\fR(7). > + > +This is, in general, \fBnot the recommended way\fR, because \fBpasst\fR = and > +\fBpasta\fR might be used as vector to effectively use this capability f= rom > +another process. > + > +However, if your environment is sufficiently controlled by an LSM (Linux > +Security Module) such as \fIAppArmor\fR, \fISELinux\fR, \fISmack\fR or > +\fITOMOYO\fR, and no other processes can interact in such a way in virtu= e of > +this, granting this capability to \fBpasst\fR and \fBpasta\fR only can > +effectively prevent other processes from utilising it. > + > +Note that this will not work for automatic detection and forwarding of p= orts > +with \fBpasta\fR, because \fBpasta\fR will relinquish this capability at > +runtime. > + > +To grant this capability, you can issue, as root: > + > +.nf > + setcap 'cap_net_bind_service=3D+ep' $(which passt) > +.fi > =20 > -.RS > -setcap 'cap_net_bind_service=3D+ep' $(which passt) > .RE > =20 > .SS ICMP/ICMPv6 Echo sockets > diff --git a/tap.c b/tap.c > index 77e513c..8b6d9bc 100644 > --- a/tap.c > +++ b/tap.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include > #include > #include --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --xu2j1AZ/LPU5jOo+ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmNI05MACgkQgypY4gEw YSJ7DA/+PMbgnoKPXFsOVU/RwamsMVeTBdYpYmbZQygPSPAqgTmhPh/h7jd5rTx0 vGcc9HdNLDNEDkLFkJylBh0sbTodw+g3YPtobWj9rLZ185eomzEbd4dq8zA8/Rvy q3ZH9dVmy5PssngALt4MjEMm9sQJ+DiyhHg3Bec5WSe4jBppZ5AZufLn2oEjnxdI bqmJnDmfoUaRnf6SP4WuIPAriM5FOMNzNKI5L+xS22KY7yGXHDGlc3oppkG0vJCh a2L9eRg4JcHASPbsqpsb5G5cT/pXXhAotM6AcZgt1/w1dRxfqgiud8fb36iS9OHf iAPxAZCcKOn3qnHZmkjPBrYjmkzX4SoxVVsy2CONkb/XRu60/3uexy4l8mBcOU5I bHjRtkv91N+/1sSelAEStdjAX0TZQ57qN1Sk56JMtCD//HlxkqCxlWPhGj4q0iTz rRPYVyc5I0VJoWfDl0b928YfDnf5EXXoHjn9BAeVgl5JmQoKsieIXQWmG3bmlGAT nUvA1BTI6Wm0OsyKXtO2aghtQ+IWZplV2sZibFTXXjEQb548XfhxEY6VByGdCjz0 DEIhwZal3A+dcM450jQXISuFdgcgvFecWqV8Z00u7JRBnWHxUBr4D/yMNXvWnGBc 1NEad5VyxgJJNBiwxmHg6qrfuzOPfKh6ItHW+VeuowCQirT1zgU= =mcwq -----END PGP SIGNATURE----- --xu2j1AZ/LPU5jOo+--