From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id D15085A005E for ; Mon, 17 Oct 2022 06:26:32 +0200 (CEST) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4MrP854CCrz4xG5; Mon, 17 Oct 2022 15:26:29 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1665980789; bh=H5R6eG9znF71F26cfLua9DYC0YF6SdgXez44nFjKHKI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=O9xapVMfAKZ1+5S2lDAhk+7Kz9qEscQFsdyBRP/AIf+62mJxDiq/IpGdbM7C6p6/H ALIrPkDxRbJDCeQNM3KlZAydxIdErE7QjVUMC4IpEsPZH5WF4KBQzecvFtLgAHArVg HKBvb8GbjLIfNPU0DmUZWGUQlf90lYEv0NkOHn8A= Date: Mon, 17 Oct 2022 15:26:25 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2] conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user() Message-ID: References: <20221014063624.327051-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="5BV0m6PxPc2HK0N/" Content-Disposition: inline In-Reply-To: <20221014063624.327051-1-sbrivio@redhat.com> Message-ID-Hash: LNDBP4A2XCCZ2G2KLFEVDMJNA72T6XTN X-Message-ID-Hash: LNDBP4A2XCCZ2G2KLFEVDMJNA72T6XTN X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --5BV0m6PxPc2HK0N/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 14, 2022 at 08:36:24AM +0200, Stefano Brivio wrote: > Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in > the target user namespace as we isolate the process, which means > we're unable to bind to low ports at that point. >=20 > Bind inbound ports, and only those, before isolate_user(). Keep the > handling of outbound ports (for pasta mode only) after the setup of > the namespace, because that's where we'll bind them. >=20 > To this end, initialise the netlink socket for the init namespace > before isolate_user() as well, as we actually need to know the > addresses of the upstream interface before binding ports, in case > they're not explicitly passed by the user. >=20 > As we now call nl_sock_init() twice, checking its return code from > conf() twice looks a bit heavy: make it exit(), instead, as we > can't do much if we don't have netlink sockets. >=20 > While at it: >=20 > - move the v4_only && v6_only options check just after the first > option processing loop, as this is more strictly related to > option parsing proper >=20 > - update the man page, explaining that CAP_NET_BIND_SERVICE is > *not* the preferred way to bind ports, because passt and pasta > can be abused to allow other processes to make effective usage > of it. Add a note about the recommended sysctl instead >=20 > - simplify nl_sock_init_do() now that it's called once for each > case >=20 > Reported-by: David Gibson > Signed-off-by: Stefano Brivio Reviewed-by: David Gibson > --- > v2: Simplify nl_sock_init_do(), add setcap for passt.avx2 in > man page (David Gibson) >=20 > conf.c | 71 +++++++++++++++++++++++++++---------------------------- > netlink.c | 40 ++++++++++++++++--------------- > netlink.h | 2 +- > passt.1 | 47 ++++++++++++++++++++++++++++++------ > tap.c | 1 + > 5 files changed, 98 insertions(+), 63 deletions(-) >=20 > diff --git a/conf.c b/conf.c > index a22ef48..e1f42f1 100644 > --- a/conf.c > +++ b/conf.c > @@ -1530,6 +1530,11 @@ void conf(struct ctx *c, int argc, char **argv) > } > } while (name !=3D -1); > =20 > + if (v4_only && v6_only) { > + err("Options ipv4-only and ipv6-only are mutually exclusive"); > + usage(argv[0]); > + } > + > ret =3D conf_ugid(runas, &uid, &gid); > if (ret) > usage(argv[0]); > @@ -1539,6 +1544,30 @@ void conf(struct ctx *c, int argc, char **argv) > logfile, logsize); > } > =20 > + nl_sock_init(c, false); > + if (!v6_only) > + c->ifi4 =3D conf_ip4(ifi, &c->ip4, c->mac); > + if (!v4_only) > + c->ifi6 =3D conf_ip6(ifi, &c->ip6, c->mac); > + if (!c->ifi4 && !c->ifi6) { > + err("External interface not usable"); > + exit(EXIT_FAILURE); > + } > + > + /* Inbound port options can be parsed now (after IPv4/IPv6 settings) */ > + optind =3D 1; > + do { > + struct port_fwd *fwd; > + > + name =3D getopt_long(argc, argv, optstring, options, NULL); > + > + if ((name =3D=3D 't' && (fwd =3D &c->tcp.fwd_in)) || > + (name =3D=3D 'u' && (fwd =3D &c->udp.fwd_in.f))) { > + if (!optarg || conf_ports(c, name, optarg, fwd)) > + usage(argv[0]); > + } > + } while (name !=3D -1); > + > if (c->mode =3D=3D MODE_PASTA) { > if (conf_pasta_ns(&netns_only, userns, netns, > optind, argc, argv) < 0) > @@ -1561,50 +1590,20 @@ void conf(struct ctx *c, int argc, char **argv) > } > } > =20 > - if (nl_sock_init(c)) { > - err("Failed to get netlink socket"); > - exit(EXIT_FAILURE); > - } > - > - if (v4_only && v6_only) { > - err("Options ipv4-only and ipv6-only are mutually exclusive"); > - usage(argv[0]); > - } > - if (!v6_only) > - c->ifi4 =3D conf_ip4(ifi, &c->ip4, c->mac); > - if (!v4_only) > - c->ifi6 =3D conf_ip6(ifi, &c->ip6, c->mac); > - if (!c->ifi4 && !c->ifi6) { > - err("External interface not usable"); > - exit(EXIT_FAILURE); > - } > + if (c->mode =3D=3D MODE_PASTA) > + nl_sock_init(c, true); > =20 > - /* Now we can process port configuration options */ > + /* ...and outbound port options now that namespaces are set up. */ > optind =3D 1; > do { > - struct port_fwd *fwd =3D NULL; > + struct port_fwd *fwd; > =20 > name =3D getopt_long(argc, argv, optstring, options, NULL); > - switch (name) { > - case 't': > - case 'u': > - case 'T': > - case 'U': > - if (name =3D=3D 't') > - fwd =3D &c->tcp.fwd_in; > - else if (name =3D=3D 'T') > - fwd =3D &c->tcp.fwd_out; > - else if (name =3D=3D 'u') > - fwd =3D &c->udp.fwd_in.f; > - else if (name =3D=3D 'U') > - fwd =3D &c->udp.fwd_out.f; > =20 > + if ((name =3D=3D 'T' && (fwd =3D &c->tcp.fwd_out)) || > + (name =3D=3D 'U' && (fwd =3D &c->udp.fwd_out.f))) { > if (!optarg || conf_ports(c, name, optarg, fwd)) > usage(argv[0]); > - > - break; > - default: > - break; > } > } while (name !=3D -1); > =20 > diff --git a/netlink.c b/netlink.c > index 6e5a96b..0850cbe 100644 > --- a/netlink.c > +++ b/netlink.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -39,57 +40,58 @@ static int nl_sock_ns =3D -1; > static int nl_seq; > =20 > /** > - * nl_sock_init_do() - Set up netlink sockets in init and target namespa= ce > - * @arg: Execution context > + * nl_sock_init_do() - Set up netlink sockets in init or target namespace > + * @arg: Execution context, if running from namespace, NULL otherwise > * > * Return: 0 > */ > static int nl_sock_init_do(void *arg) > { > struct sockaddr_nl addr =3D { .nl_family =3D AF_NETLINK, }; > - int *s =3D &nl_sock; > + int *s =3D arg ? &nl_sock_ns : &nl_sock; > #ifdef NETLINK_GET_STRICT_CHK > int y =3D 1; > #endif > =20 > -ns: > + if (arg) > + ns_enter((struct ctx *)arg); > + > if (((*s) =3D socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)) < 0 || > - bind(*s, (struct sockaddr *)&addr, sizeof(addr))) > + bind(*s, (struct sockaddr *)&addr, sizeof(addr))) { > *s =3D -1; > - > - if (*s =3D=3D -1 || !arg || s =3D=3D &nl_sock_ns) > return 0; > + } > =20 > #ifdef NETLINK_GET_STRICT_CHK > if (setsockopt(*s, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &y, sizeof(y))) > debug("netlink: cannot set NETLINK_GET_STRICT_CHK on %i", *s); > #endif > - > - ns_enter((struct ctx *)arg); > - s =3D &nl_sock_ns; > - goto ns; > + return 0; > } > =20 > /** > - * nl_sock_init() - Call nl_sock_init_do() and check for failures > + * nl_sock_init() - Call nl_sock_init_do(), won't return on failure > * @c: Execution context > - * > - * Return: -EIO if sockets couldn't be set up, 0 otherwise > + * @ns: Get socket in namespace, not in init > */ > -int nl_sock_init(const struct ctx *c) > +void nl_sock_init(const struct ctx *c, bool ns) > { > - if (c->mode =3D=3D MODE_PASTA) { > + if (ns) { > NS_CALL(nl_sock_init_do, c); > if (nl_sock_ns =3D=3D -1) > - return -EIO; > + goto fail; > } else { > nl_sock_init_do(NULL); > } > =20 > if (nl_sock =3D=3D -1) > - return -EIO; > + goto fail; > =20 > - return 0; > + return; > + > +fail: > + err("Failed to get netlink socket"); > + exit(EXIT_FAILURE); > } > =20 > /** > diff --git a/netlink.h b/netlink.h > index 5ce5037..3c991cc 100644 > --- a/netlink.h > +++ b/netlink.h > @@ -6,7 +6,7 @@ > #ifndef NETLINK_H > #define NETLINK_H > =20 > -int nl_sock_init(const struct ctx *c); > +void nl_sock_init(const struct ctx *c, bool ns); > unsigned int nl_get_ext_if(sa_family_t af); > void nl_route(int ns, unsigned int ifi, sa_family_t af, void *gw); > void nl_addr(int ns, unsigned int ifi, sa_family_t af, > diff --git a/passt.1 b/passt.1 > index 7d113f2..4848087 100644 > --- a/passt.1 > +++ b/passt.1 > @@ -771,14 +771,47 @@ possible to bind sockets to foreign addresses. > =20 > .SS Binding to low numbered ports (well-known or system ports, up to 102= 3) > =20 > -If the port forwarding configuration requires binding to port numbers lo= wer than > -1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, but will fai= l if not > -running as root, or without the \fICAP_NET_BIND_SERVICE\fR Linux capabil= ity, see > -\fBservices\fR(5) and \fBcapabilities\fR(7). To grant the > -\fICAP_NET_BIND_SERVICE\fR capability to passt, you can issue, as root: > +If the port forwarding configuration requires binding to ports with numb= ers > +lower than 1024, \fBpasst\fR and \fBpasta\fR will try to bind to them, b= ut will > +fail, unless, either: > + > +.IP \(bu 2 > +the \fIsys.net.ipv4.ip_unprivileged_port_start\fR sysctl is set to the n= umber > +of the lowest port \fBpasst\fR and \fBpasta\fR need. For example, as roo= t: > + > +.nf > + sysctl -w net.ipv4.ip_unprivileged_port_start=3D443 > +.fi > + > +\fBNote\fR: this is the recommended way of enabling \fBpasst\fR and \fBp= asta\fR > +to bind to ports with numbers below 1024. > + > +.IP \(bu > +or the \fICAP_NET_BIND_SERVICE\fR Linux capability is granted, see > +\fBservices\fR(5) and \fBcapabilities\fR(7). > + > +This is, in general, \fBnot the recommended way\fR, because \fBpasst\fR = and > +\fBpasta\fR might be used as vector to effectively use this capability f= rom > +another process. > + > +However, if your environment is sufficiently controlled by an LSM (Linux > +Security Module) such as \fIAppArmor\fR, \fISELinux\fR, \fISmack\fR or > +\fITOMOYO\fR, and no other processes can interact in such a way in virtu= e of > +this, granting this capability to \fBpasst\fR and \fBpasta\fR only can > +effectively prevent other processes from utilising it. > + > +Note that this will not work for automatic detection and forwarding of p= orts > +with \fBpasta\fR, because \fBpasta\fR will relinquish this capability at > +runtime. > + > +To grant this capability, you can issue, as root: > + > +.nf > + for p in $(which passt passt.avx2); do > + setcap 'cap_net_bind_service=3D+ep' "${p}" > + done > +.fi > =20 > -.RS > -setcap 'cap_net_bind_service=3D+ep' $(which passt) > .RE > =20 > .SS ICMP/ICMPv6 Echo sockets > diff --git a/tap.c b/tap.c > index 77e513c..8b6d9bc 100644 > --- a/tap.c > +++ b/tap.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include > #include > #include --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --5BV0m6PxPc2HK0N/ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmNM2WsACgkQgypY4gEw YSJcQBAAv/J+tUuPi1X6kVGcowGs18leNPce/xMjwkriE+Aq9Uiu8NG98PEABDIx PINLtwlTRuWdTaacDCpV3DwHe9Ksm+b/8wd+NErVy2s4f19wfpWH+IL39jZfp/au rGuNST/LBUoT47T0jfEWEhMCQsm8LHjgZV+Ag/Ocx4myALiQNsu85WgBRTwojIKV EIcQq3LpfSZe7la7/2cXGufFrZeHUOEVYd8MxC9waD9GUxZpYYNP/UziGijlffz+ kj+wMb6Qh+wLiSWGD8u6UwJwX2jxn5XabCcd97tRnthUcRWhqggFz/cmxTel8Nq4 n/LzrHld0KuVDH/SRACp5zX8qyVOuq9wZ4yDmGwBW6B6YDhK8qKTM/erJR3SFVUT q0o4Jpxk+iTKPi4rlNBks5NfL+plFbTusfTap4eAF20xlpKVSlGcAYVrWHjObcC/ fD+A0to3iiltDaOZPRJOON/jQhEQr2msoAhIJ/c0lSCKzMvTRAgQUEkCvNhAk6cm XxIg1ukSO3ZPAfHFkmmOKxPpfmYJKH25m/tWOVk523a7uosMuQVLKN4Z/3gL0uCT 6bQwf9Ltof9m5FS49Ur01v5JFCWZahApIlC44G+y+yNICWNzKZO1EEb/V7lpLsQG EdCzVvPgnVytn34OLUDIOsrFrVKUIH0yNDGgZtjWIkd/Q+MYeDs= =jNvN -----END PGP SIGNATURE----- --5BV0m6PxPc2HK0N/--