From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 8A2555A0265 for ; Thu, 13 Oct 2022 04:18:22 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665627501; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OTlku5Gr515w2Emks9tRPuPxqG1NE1HloXnTO/05TGQ=; b=SoiBiNUbSGEuphUEKCDd8rCJTZVyoQ4GyuCMBYw+M6xJ6kab3PvxUZkSNbNIGWUX3uwSmx A6tYNO7efQnivwnSTKPRuBERPOXlB91VQRuJ2aMDkCEu8DQHvHJgpGMsXvYeCGQfUSjZ2E OeaWUcNRpZfuF6t0a0TT3FaFbM1tgfg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-362-Epcs3a6_NoGeuqSXCp9eUg-1; Wed, 12 Oct 2022 22:18:18 -0400 X-MC-Unique: Epcs3a6_NoGeuqSXCp9eUg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4557C8039A6; Thu, 13 Oct 2022 02:18:18 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D66772166BA9; Thu, 13 Oct 2022 02:18:17 +0000 (UTC) Date: Thu, 13 Oct 2022 04:17:30 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 08/10] isolation: Prevent any child processes gaining capabilities Message-ID: <20221013041730.6d75759a@elisabeth> In-Reply-To: <20221011054018.1449506-9-david@gibson.dropbear.id.au> References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-9-david@gibson.dropbear.id.au> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: PSCOJSLF4YFQHAOXNHS4YJ5ZJ43UB5NM X-Message-ID-Hash: PSCOJSLF4YFQHAOXNHS4YJ5ZJ43UB5NM X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 11 Oct 2022 16:40:16 +1100 David Gibson wrote: > We drop our own capabilities, but it's possible that processes we exec() > could gain extra privilege via file capabilities. It shouldn't be possible > for us to exec() anyway due to seccomp() and our filesystem isolation. But > just in case, zero the bounding and inheritable capability sets to prevent > any such child from gainin privilege. > > Note that we do this *after* spawning the pasta shell/command (if any), > because we do want the user to be able to give that privilege if they want. > > Signed-off-by: David Gibson > --- > isolation.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/isolation.c b/isolation.c > index 2468f84..e1a024d 100644 > --- a/isolation.c > +++ b/isolation.c > @@ -120,6 +120,61 @@ static void drop_caps_ep_except(uint64_t keep) > } > } > > +/** > + * clamp_caps() - Prevent any children from gaining caps "clamp" doesn't sound very specific or clear. caps_drop_inherit_bound() would actually tell me what the function does, but it's a bit of a mouthful in comparison. I'm fine with both, really, but if you have a better idea... > + * > + * This drops all capabilities from both the inheritable and the > + * bounding set. This means that any exec()ed processes can't gain > + * capabilities, even if they have file capabilities which would grant > + * them. We shouldn't ever exec() in any case, but this provides an > + * additional layer of protection. Executing this requires > + * CAP_SETPCAP, which we will have within our userns. > + * > + * Note that dropping capabilites from the bounding set limits > + * exec()ed processes, but does not remove them from the effective or > + * permitted sets, so it doesn't reduce our own capabilities. > + */ > +static void clamp_caps(void) > +{ > + struct __user_cap_header_struct hdr = { > + .version = CAP_VERSION, > + .pid = 0, > + }; > + struct __user_cap_data_struct data[CAP_WORDS]; For consistency, I'd move this before hdr. > + int i; > + > + for (i = 0; i < 64; i++) { > + /* Some errors can be ignored: > + * - EINVAL, we'll get this for all values in 0..63 > + * that are not actually allocated caps > + * - EPERM, we'll get this if we don't have > + * CAP_SETPCAP, which can happen if using > + * --netns-only. We don't need CAP_SETPCAP for > + * normal operation, so carry on without it. > + */ > + if (prctl(PR_CAPBSET_DROP, i, 0, 0, 0) && > + errno != EINVAL && errno != EPERM) { > + err("Couldn't drop cap %i from bounding set: %s", > + i, strerror(errno)); > + exit(EXIT_FAILURE); > + } > + } > + > + if (syscall(SYS_capget, &hdr, data)) { > + err("Couldn't get current capabilities: %s", strerror(errno)); > + exit(EXIT_FAILURE); > + } > + > + for (i = 0; i < CAP_WORDS; i++) > + data[i].inheritable = 0; Any specific reason why? Initialisers can have variable sizes to some extent, but if there's a reason why it can't be done, perhaps that would warrant a comment here. > + > + if (syscall(SYS_capset, &hdr, data)) { > + err("Couldn't drop inheritable capabilities: %s", > + strerror(errno)); > + exit(EXIT_FAILURE); > + } > +} > + > /** > * isolate_initial() - Early, config independent self isolation > * > @@ -287,6 +342,7 @@ int isolate_prefork(struct ctx *c) > ns_caps |= 1UL << CAP_NET_BIND_SERVICE; > } > > + clamp_caps(); > drop_caps_ep_except(ns_caps); > > return 0; -- Stefano