From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTP id 00FBB5A0265
	for ; Thu, 13 Oct 2022 04:18:48 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1665627528;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=bWhw6ZYH/ChdUFidBTGELL8g+gFr8wd96/lW8a3hN/s=;
	b=AHzUBM/MJ5npAFIGPc8DRLIvIgaHSgjfsLyhBp8TanAi2rRsRCRwRdds+KTlPPh0XkwY3X
	 9BiAnehbPxtnBw6+8W8UHMTnGZjKjSITznTiWhIt4QJ8y5HqNeUgvZm07XDEhoM8fWOEx5
	 hiEcpfuyt2TgkHBLEM7aJgwhR/R6p2o=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88])
	by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
	id us-mta-393-G9HIOTyAOSu2rCBHdiUyzg-1; Wed, 12 Oct 2022 22:18:44 -0400
X-MC-Unique: G9HIOTyAOSu2rCBHdiUyzg-1
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 07078811E87;
	Thu, 13 Oct 2022 02:18:34 +0000 (UTC)
Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 22F7C40C2064;
	Thu, 13 Oct 2022 02:18:29 +0000 (UTC)
Date: Thu, 13 Oct 2022 04:18:24 +0200
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 07/10] isolation: Replace drop_caps() with a version that actually does something
Message-ID: <20221013041713.16db5ad1@elisabeth>
In-Reply-To: <20221011054018.1449506-8-david@gibson.dropbear.id.au>
References: <20221011054018.1449506-1-david@gibson.dropbear.id.au>
	<20221011054018.1449506-8-david@gibson.dropbear.id.au>
Organization: Red Hat
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID-Hash: NDUHAW7IRQERF2KKYXSUQG7PFOXYZZEB
X-Message-ID-Hash: NDUHAW7IRQERF2KKYXSUQG7PFOXYZZEB
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.3
Precedence: list
List-Id: Development discussion and patches for passt

Well, this drop_caps() is pretty much the same as patch 8/10, so it
actually did something. :)

On Tue, 11 Oct 2022 16:40:15 +1100
David Gibson wrote:

> The current implementation of drop_caps() doesn't really work because it
> attempts to drop capabilities from the bounding set. That's not the set
> that really matters: the bounding set is about limiting the abilities of
> things we might later exec() rather than our own capabilities.
> In addition, altering the bounding set requires CAP_SETPCAP which we won't
> usually have.
>
> Replace it with a new version which uses capset(2) to drop capabilities
> from the effective and permitted sets, which is what actually matters for
> most purposes. For now we leave the inheritable set alone, since we don't
> want to preclude the user from passing inheritable capabilities to the
> command spawned by pasta.
>
> Correctly dropping caps reveals that we actually need CAP_SYS_ADMIN within
> the userns we create/join in pasta mode, so that we can later setns() to
> the netns within it.
>
> Signed-off-by: David Gibson
> ---
>  isolation.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 44 insertions(+), 8 deletions(-)
>
> diff --git a/isolation.c b/isolation.c
> index 4aa75e6..2468f84 100644
> --- a/isolation.c
> +++ b/isolation.c
> @@ -86,18 +86,37 @@
>  #include "passt.h"
>  #include "isolation.h"
>
> +#define CAP_VERSION	_LINUX_CAPABILITY_VERSION_3
> +#define CAP_WORDS	_LINUX_CAPABILITY_U32S_3
> +
>  /**
> - * drop_caps() - Drop capabilities we might have except for CAP_NET_BIND_SERVICE
> + * drop_caps_ep_except() - Drop capabilities from effective & permitted sets
> + * @keep:	Capabilities to keep
>   */
> -static void drop_caps(void)
> +static void drop_caps_ep_except(uint64_t keep)
>  {
> +	struct __user_cap_header_struct hdr = {
> +		.version = CAP_VERSION,
> +		.pid = 0,
> +	};
> +	struct __user_cap_data_struct data[CAP_WORDS];
>  	int i;
>
> -	for (i = 0; i < 64; i++) {
> -		if (i == CAP_NET_BIND_SERVICE)
> -			continue;
> +	if (syscall(SYS_capget, &hdr, data)) {
> +		err("Couldn't get current capabilities: %s", strerror(errno));
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	for (i = 0; i < CAP_WORDS; i++) {
> +		uint32_t mask = keep >> (32 * i);
> +
> +		data[i].effective &= mask;
> +		data[i].permitted &= mask;
> +	}
>
> -		prctl(PR_CAPBSET_DROP, i, 0, 0, 0);
> +	if (syscall(SYS_capset, &hdr, data)) {
> +		err("Couldn't drop capabilities: %s", strerror(errno));
> +		exit(EXIT_FAILURE);
>  	}
>  }
>
> @@ -111,7 +130,11 @@ static void drop_caps(void)
>   */
>  void isolate_initial(void)
>  {
> -	drop_caps();
> +	/* We want to keep CAP_NET_BIND_SERVICE in the initial
> +	 * namespace if we have it, so that we can forward low ports
> +	 * into the guest/namespace
> +	 */
> +	drop_caps_ep_except((1UL << CAP_NET_BIND_SERVICE));

You could use BIT() (util.h) here,

>  }
>
>  /**
> @@ -211,6 +234,7 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns)
>  int isolate_prefork(struct ctx *c)
>  {
>  	int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS;
> +	uint64_t ns_caps = 0;
>
>  	/* If we run in foreground, we have no chance to actually move to a new
>  	 * PID namespace. For passt, use CLONE_NEWPID anyway, in case somebody
> @@ -251,7 +275,19 @@ int isolate_prefork(struct ctx *c)
>  		return -errno;
>  	}
>
> -	drop_caps(); /* Relative to the new user namespace this time. */
> +	/* Drop capabilities in our new userns */
> +	if (c->mode == MODE_PASTA) {
> +		/* Keep CAP_SYS_ADMIN, so that we can setns() to the
> +		 * netns when we need to act upon it
> +		 */
> +		ns_caps |= 1UL << CAP_SYS_ADMIN;

here,

> +		/* Keep CAP_NET_BIND_SERVICE, so we can splice
> +		 * outbound connections to low port numbers
> +		 */
> +		ns_caps |= 1UL << CAP_NET_BIND_SERVICE;

and here.

> +	}
> +
> +	drop_caps_ep_except(ns_caps);
>
>  	return 0;
>  }

-- 
Stefano
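
To illustrate the bit-mask handling discussed above, here is a minimal sketch of how a 64-bit "keep" mask can be built from capability numbers and then split across the two 32-bit words used by the version 3 capability data. CAP_BIT() and keep_mask_word() are hypothetical names introduced only for this example: they are not part of the patch, and CAP_BIT() is not the BIT() macro from util.h, whose definition is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/capability.h>	/* CAP_* numbers, _LINUX_CAPABILITY_U32S_3 */

/* Hypothetical helper for this sketch: a 64-bit wide single-bit mask, so
 * that capability numbers of 32 and above also fit, regardless of the
 * width of unsigned long.
 */
#define CAP_BIT(n)	(UINT64_C(1) << (n))

#define CAP_WORDS	_LINUX_CAPABILITY_U32S_3

/* Hypothetical helper: return the 32-bit slice of @keep that applies to
 * capability data word @word, the same computation as
 * "keep >> (32 * i)" in the quoted loop.
 */
static uint32_t keep_mask_word(uint64_t keep, int word)
{
	assert(word >= 0 && word < CAP_WORDS);

	return (uint32_t)(keep >> (32 * word));
}
```

With keep set to CAP_BIT(CAP_SYS_ADMIN) | CAP_BIT(CAP_NET_BIND_SERVICE), both bits land in word 0 and word 1 stays empty, so ANDing each word of the effective and permitted sets with its slice leaves at most those two capabilities.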