From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 3E3515A005E for ; Fri, 14 Oct 2022 01:42:46 +0200 (CEST) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4MpQzr0dvCz4xGw; Fri, 14 Oct 2022 10:42:32 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1665704552; bh=vmFHqmKZkTsBuYW3XNXGkZ9j2Ho1JKn+2lrxT5jPvk4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=XxH+X7M3g2A7ejQNDNp06JtaS+FIW1e43TQ/LmzgrXqnjdfBJshYAFKFz850egdTy Dy4AM8EYW1SrRXuzgOlSaY5jAPSd++XCrav4Rky99RSnx7B96Jau0XmAqebSGlWD9V l6uJznIHH8AJHvtI05X7PLYIOZUKqGV9YJ1a8GcI= Date: Fri, 14 Oct 2022 10:42:18 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 07/10] isolation: Replace drop_caps() with a version that actually does something Message-ID: References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-8-david@gibson.dropbear.id.au> <20221013060119.48d51493@elisabeth> <20221013150802.356cf532@elisabeth> <20221013183705.56e28681@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="cEgB+JZlnCBb5sD3" Content-Disposition: inline In-Reply-To: <20221013183705.56e28681@elisabeth> Message-ID-Hash: JOSW2Z75SYEHAXQBOJFEMGECYF3QLTR3 X-Message-ID-Hash: JOSW2Z75SYEHAXQBOJFEMGECYF3QLTR3 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --cEgB+JZlnCBb5sD3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 13, 2022 at 06:37:05PM +0200, Stefano Brivio wrote: > On Thu, 13 Oct 2022 15:08:02 +0200 > Stefano Brivio wrote: >=20 > > On Thu, 13 Oct 2022 06:01:19 +0200 > > Stefano Brivio wrote: > >=20 > > > On Tue, 11 Oct 2022 16:40:15 +1100 > > > David Gibson wrote: > > > =20 > > > > @@ -251,7 +275,19 @@ int isolate_prefork(struct ctx *c) > > > > return -errno; > > > > } > > > > =20 > > > > - drop_caps(); /* Relative to the new user namespace this time. */ > > > > + /* Drop capabilites in our new userns */ > > > > + if (c->mode =3D=3D MODE_PASTA) { > > > > + /* Keep CAP_SYS_ADMIN, so that we can setns() to the > > > > + * netns when we need to act upon it > > > > + */ > > > > + ns_caps |=3D 1UL << CAP_SYS_ADMIN; > > > > + /* Keep CAP_NET_BIND_SERVICE, so we can splice > > > > + * outbound connections to low port numbers > > > > + */ > > > > + ns_caps |=3D 1UL << CAP_NET_BIND_SERVICE; > > > > + } > > > > + > > > > + drop_caps_ep_except(ns_caps); =20 > > >=20 > > > Hmm, I didn't really look into this yet, but there seems to be an iss= ue > > > with filesystem-bound network namespaces now. Running something like: > > >=20 > > > pasta --config-net --netns /run/user/1000/netns/netns-6466ff4b-1efc= -2b58-685b-cbc12feb9ccc > > >=20 > > > (from Podman), this happens: > > >=20 > > > [...] > > > > > > [pid 1763223] setns(7, CLONE_NEWNET) =3D -1 EPERM (Operation not p= ermitted) =20 > >=20 > > Ah, "of course". Podman calls us with UID 0 in the user namespace it > > just created, so if we drop CAP_SYS_ADMIN in isolate_initial() we can't > > join the network namespace, and if we drop CAP_NET_ADMIN we can't > > configure it. > >=20 > > So for that case (and only for that, I suppose), we need something like > > (tested): > >=20 > > diff --git a/isolation.c b/isolation.c > > index 1769180..fee6dbd 100644 > > --- a/isolation.c > > +++ b/isolation.c > > @@ -190,7 +190,7 @@ void isolate_initial(void) > > * namespace if we have it, so that we can forward low ports > > * into the guest/namespace > > */ > > - drop_caps_ep_except((1UL << CAP_NET_BIND_SERVICE)); > > + drop_caps_ep_except(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN)); > > } > >=20 > > ...which is a bit pointless. Better than *any* capability, but not by > > far. > >=20 > > So, if we make this totally independent from configuration, we need > > those two capabilities. > >=20 > > We could add a "postconf" stage and cover a tiny bit more of conf.c. > >=20 > > Or we could have a special path in isolate_initial() for the case we > > know we're not in the init namespace. > >=20 > > I'm not sure. If you have a specific preference/strong opinion I would > > actually be happier. :) >=20 > Further on, if we are started as root, we'll fail to drop to 'nobody' > or any other user, if we lose CAP_SETUID and CAP_SETGID here. I have > tested this version of isolate_initial(): >=20 > drop_caps_ep_except(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | > BIT(CAP_SETGID) | BIT(CAP_SETUID) | > BIT(CAP_NET_BIND_SERVICE)); >=20 > for any use case I can reasonably think of. Yes, it's a lot -- we > should make it really clear that those are not the capabilities we > actually use at "run time". Ah, right. I didn't think through the --netns-only case. I think what we need to do in the short term at least is: * Add all those caps and a comment in isolate_initial() (or even just remove the drop_caps there) * In isolate user drop the caps we can at that point (SETGID and SETUID) - whether or not we've joined a new userns * Remove the rest of the "runtime" caps in isolate_prefork() like we do now. I'll rework accordingly. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --cEgB+JZlnCBb5sD3 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmNIolMACgkQgypY4gEw YSK0sRAAg7FUG9ZYEMinskk0MK84cSJiR0KMZw/4WDGjD7IOyjWSkAY7hbXb4be5 S5SMAHv1+fRr59BlFa5ZGCDdZTx+ILbkn1xabujwqjFmOf3FVhRJb+QNzSkWKyfx quhJ8qmzSPezxqfdDiy/i5xRwjZTE7eeuBCeBpISFYU+pb9sQNl+2UTSeAVzISnz zpjBrfTcVkdfFNLOy9bodchspOOYlLz23SW9QJ5RSp5esFhbrcvB078v1R/T2YnH HGx8GEOeMMupdD8Ji0UB90BwhanSrEweWnGLtjz5sfA+MwiLDLfXvA6lXxX8YdmF h8mf7W8+lE0W+4EPW6lhV0xDoL2SLeIbfUUDtNWx4O4qpIJBYu4Ua1FmLhVp1L1Z F4eS/uaKip0pLCd2GmfTuYZr0RDBlW/gx0LkSApdBmJhyak3q/ArOzHo/+oHU6Rf cZotl0RI6u+b7eqrlIZuOi6prWBd+18e2mcFayAfVo2spE/lT1zPVG4clrTErbL4 sD/2U1azu2UaXHXPZrM1pt2VhquxjLBD9ftgjlbhenFfQZQm4uYQ5h9iaKPDOgAJ Ln57PGtqQYh9Vh6XZ/caswpHQ6T+wWwQyGPhcvbrWkBuoqsurwQ/iWzLAdVq5405 ACbUxIYe7l8czeRgZ40AVreYIzPonefnVgC6Iop2fLxC2caGgN4= =ShrN -----END PGP SIGNATURE----- --cEgB+JZlnCBb5sD3--