From: David Gibson
To: Stefano Brivio
Cc: passt-dev@passt.top
Date: Thu, 13 Oct 2022 20:44:58 +1100
Subject: Re: [PATCH 07/10] isolation: Replace drop_caps() with a version that actually does something
References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-8-david@gibson.dropbear.id.au> <20221013041713.16db5ad1@elisabeth>
In-Reply-To: <20221013041713.16db5ad1@elisabeth>
List-Id: Development discussion and patches for passt

On Thu, Oct 13, 2022 at 04:18:24AM +0200, Stefano Brivio wrote:
> Well, this drop_caps() is pretty much the same as patch 8/10, so it
> actually did something. :)

Yes, but not what we wanted :).

> On Tue, 11 Oct 2022 16:40:15 +1100
> David Gibson wrote:
> 
> > The current implementation of drop_caps() doesn't really work because it
> > attempts to drop capabilities from the bounding set.  That's not the set
> > that really matters: the bounding set is about limiting the capabilities
> > of things we might later exec(), rather than our own capabilities.
> > In addition, altering the bounding set requires CAP_SETPCAP, which we
> > won't usually have.
> > 
> > Replace it with a new version which uses capset(2) to drop capabilities
> > from the effective and permitted sets, which is what actually matters for
> > most purposes.  For now we leave the inheritable set alone, since we don't
> > want to preclude the user from passing inheritable capabilities to the
> > command spawned by pasta.
> > 
> > Correctly dropping caps reveals that we actually need CAP_SYS_ADMIN within
> > the userns we create/join in pasta mode, so that we can later setns() to
> > the netns within it.
> > 
> > Signed-off-by: David Gibson
> > ---
> >  isolation.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 44 insertions(+), 8 deletions(-)
> > 
> > diff --git a/isolation.c b/isolation.c
> > index 4aa75e6..2468f84 100644
> > --- a/isolation.c
> > +++ b/isolation.c
> > @@ -86,18 +86,37 @@
> >  #include "passt.h"
> >  #include "isolation.h"
> >  
> > +#define CAP_VERSION	_LINUX_CAPABILITY_VERSION_3
> > +#define CAP_WORDS	_LINUX_CAPABILITY_U32S_3
> > +
> >  /**
> > - * drop_caps() - Drop capabilities we might have except for CAP_NET_BIND_SERVICE
> > + * drop_caps_ep_except() - Drop capabilities from effective & permitted sets
> > + * @keep:	Capabilities to keep
> >   */
> > -static void drop_caps(void)
> > +static void drop_caps_ep_except(uint64_t keep)
> >  {
> > +	struct __user_cap_header_struct hdr = {
> > +		.version = CAP_VERSION,
> > +		.pid = 0,
> > +	};
> > +	struct __user_cap_data_struct data[CAP_WORDS];
> >  	int i;
> >  
> > -	for (i = 0; i < 64; i++) {
> > -		if (i == CAP_NET_BIND_SERVICE)
> > -			continue;
> > +	if (syscall(SYS_capget, &hdr, data)) {
> > +		err("Couldn't get current capabilities: %s", strerror(errno));
> > +		exit(EXIT_FAILURE);
> > +	}
> > +
> > +	for (i = 0; i < CAP_WORDS; i++) {
> > +		uint32_t mask = keep >> (32 * i);
> > +
> > +		data[i].effective &= mask;
> > +		data[i].permitted &= mask;
> > +	}
> >  
> > -		prctl(PR_CAPBSET_DROP, i, 0, 0, 0);
> > +	if (syscall(SYS_capset, &hdr, data)) {
> > +		err("Couldn't drop capabilities: %s", strerror(errno));
> > +		exit(EXIT_FAILURE);
> >  	}
> >  }
> >  
> > @@ -111,7 +130,11 @@ static void drop_caps(void)
> >   */
> >  void isolate_initial(void)
> >  {
> > -	drop_caps();
> > +	/* We want to keep CAP_NET_BIND_SERVICE in the initial
> > +	 * namespace if we have it, so that we can forward low ports
> > +	 * into the guest/namespace
> > +	 */
> > +	drop_caps_ep_except((1UL << CAP_NET_BIND_SERVICE));
> 
> You could use BIT() (util.h) here,

Ah, yes,
done.

> >  }
> >  
> >  /**
> > @@ -211,6 +234,7 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns)
> >  int isolate_prefork(struct ctx *c)
> >  {
> >  	int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS;
> > +	uint64_t ns_caps = 0;
> >  
> >  	/* If we run in foreground, we have no chance to actually move to a new
> >  	 * PID namespace. For passt, use CLONE_NEWPID anyway, in case somebody
> > @@ -251,7 +275,19 @@ int isolate_prefork(struct ctx *c)
> >  		return -errno;
> >  	}
> >  
> > -	drop_caps(); /* Relative to the new user namespace this time. */
> > +	/* Drop capabilites in our new userns */
> > +	if (c->mode == MODE_PASTA) {
> > +		/* Keep CAP_SYS_ADMIN, so that we can setns() to the
> > +		 * netns when we need to act upon it
> > +		 */
> > +		ns_caps |= 1UL << CAP_SYS_ADMIN;
> 
> here,
> 
> > +		/* Keep CAP_NET_BIND_SERVICE, so we can splice
> > +		 * outbound connections to low port numbers
> > +		 */
> > +		ns_caps |= 1UL << CAP_NET_BIND_SERVICE;
> 
> and here.
> 
> > +	}
> > +
> > +	drop_caps_ep_except(ns_caps);
> >  
> >  	return 0;
> >  }
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
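
For reference, the capget()/capset() approach discussed in this thread can be sketched as a standalone file. This is only a minimal sketch, not the code from the passt tree: it assumes Linux with <linux/capability.h> available, returns an error instead of calling err()/exit(), and introduces two hypothetical helpers, CAP_BIT() (standing in for the BIT() macro from util.h, whose definition is not shown in this thread) and cap_mask_word().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

#define CAP_VERSION	_LINUX_CAPABILITY_VERSION_3
#define CAP_WORDS	_LINUX_CAPABILITY_U32S_3

/* Hypothetical stand-in for BIT() from util.h: 64-bit wide, so that
 * capability numbers of 32 and above are representable everywhere
 */
#define CAP_BIT(n)	(UINT64_C(1) << (n))

/* Hypothetical helper: 32-bit word @i of the 64-bit @keep mask, matching
 * the layout of the data[] array used by capget()/capset()
 */
static uint32_t cap_mask_word(uint64_t keep, int i)
{
	assert(i >= 0 && i < CAP_WORDS);

	return (uint32_t)(keep >> (32 * i));
}

/* Clear every capability outside @keep from the effective and permitted
 * sets of the calling thread, leaving the inheritable set untouched.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int drop_caps_ep_except(uint64_t keep)
{
	struct __user_cap_header_struct hdr = {
		.version = CAP_VERSION,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[CAP_WORDS];
	int i;

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	for (i = 0; i < CAP_WORDS; i++) {
		uint32_t mask = cap_mask_word(keep, i);

		data[i].effective &= mask;
		data[i].permitted &= mask;
	}

	return syscall(SYS_capset, &hdr, data) ? -1 : 0;
}
```

With this, drop_caps_ep_except(CAP_BIT(CAP_NET_BIND_SERVICE)) would correspond to the isolate_initial() call in the patch. Building the mask from a 64-bit constant rather than 1UL keeps the shift well-defined for capability numbers of 32 and above on targets where long is 32 bits wide.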