From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 487095A005E for ; Thu, 13 Oct 2022 11:50:57 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665654656; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LRXUijdismf9nkqJPJBaLnRKpy5H3+SMkH2Xg7iXBdM=; b=Hij43nIwEP/yMkxxS3jL47OgnfDk1va6JozwdOgKAyCA/cXEM7rLnLrjMgtmoN9ysg/eQx 48ePEAiuUMpgzcJVG1BXVcLEuVlotvnBYxQqU+SCYBZtvb05ovQE3Fmx5hwZ4VeNjt38zz MmZNSnPhGO1ITZxzQj6YXn2n82HYNvs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-312-2yE7fwU-OJeDm6GjP6cwpw-1; Thu, 13 Oct 2022 05:50:55 -0400 X-MC-Unique: 2yE7fwU-OJeDm6GjP6cwpw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F32ED811E67; Thu, 13 Oct 2022 09:50:54 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 8D96C1121314; Thu, 13 Oct 2022 09:50:54 +0000 (UTC) Date: Thu, 13 Oct 2022 11:50:51 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 08/10] isolation: Prevent any child processes gaining capabilities Message-ID: <20221013115051.6f48f50a@elisabeth> In-Reply-To: References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-9-david@gibson.dropbear.id.au> <20221013041730.6d75759a@elisabeth> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: JTA22DYJBFVBVHAHHIM4GSQKNZQNYBFM X-Message-ID-Hash: JTA22DYJBFVBVHAHHIM4GSQKNZQNYBFM X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 13 Oct 2022 20:33:34 +1100 David Gibson wrote: > On Thu, Oct 13, 2022 at 04:17:30AM +0200, Stefano Brivio wrote: > > On Tue, 11 Oct 2022 16:40:16 +1100 > > David Gibson wrote: > > > > > We drop our own capabilities, but it's possible that processes we exec() > > > could gain extra privilege via file capabilities. It shouldn't be possible > > > for us to exec() anyway due to seccomp() and our filesystem isolation. But > > > just in case, zero the bounding and inheritable capability sets to prevent > > > any such child from gainin privilege. > > > > > > Note that we do this *after* spawning the pasta shell/command (if any), > > > because we do want the user to be able to give that privilege if they want. > > > > > > Signed-off-by: David Gibson > > > --- > > > isolation.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 56 insertions(+) > > > > > > diff --git a/isolation.c b/isolation.c > > > index 2468f84..e1a024d 100644 > > > --- a/isolation.c > > > +++ b/isolation.c > > > @@ -120,6 +120,61 @@ static void drop_caps_ep_except(uint64_t keep) > > > } > > > } > > > > > > +/** > > > + * clamp_caps() - Prevent any children from gaining caps > > > > "clamp" doesn't sound very specific or clear. caps_drop_inherit_bound() > > would actually tell me what the function does, but it's a bit of a > > mouthful in comparison. I'm fine with both, really, but if you have a > > better idea... > > Yeah, I couldn't think of something that was both brief and specific, > so I went with brief. > > > > + * > > > + * This drops all capabilities from both the inheritable and the > > > + * bounding set. This means that any exec()ed processes can't gain > > > + * capabilities, even if they have file capabilities which would grant > > > + * them. We shouldn't ever exec() in any case, but this provides an > > > + * additional layer of protection. Executing this requires > > > + * CAP_SETPCAP, which we will have within our userns. > > > + * > > > + * Note that dropping capabilites from the bounding set limits > > > + * exec()ed processes, but does not remove them from the effective or > > > + * permitted sets, so it doesn't reduce our own capabilities. > > > + */ > > > +static void clamp_caps(void) > > > +{ > > > + struct __user_cap_header_struct hdr = { > > > + .version = CAP_VERSION, > > > + .pid = 0, > > > + }; > > > + struct __user_cap_data_struct data[CAP_WORDS]; > > > > For consistency, I'd move this before hdr. > > Ok. > > > > + int i; > > > + > > > + for (i = 0; i < 64; i++) { > > > + /* Some errors can be ignored: > > > + * - EINVAL, we'll get this for all values in 0..63 > > > + * that are not actually allocated caps > > > + * - EPERM, we'll get this if we don't have > > > + * CAP_SETPCAP, which can happen if using > > > + * --netns-only. We don't need CAP_SETPCAP for > > > + * normal operation, so carry on without it. > > > + */ > > > + if (prctl(PR_CAPBSET_DROP, i, 0, 0, 0) && > > > + errno != EINVAL && errno != EPERM) { > > > + err("Couldn't drop cap %i from bounding set: %s", > > > + i, strerror(errno)); > > > + exit(EXIT_FAILURE); > > > + } > > > + } > > > + > > > + if (syscall(SYS_capget, &hdr, data)) { > > > + err("Couldn't get current capabilities: %s", strerror(errno)); > > > + exit(EXIT_FAILURE); > > > + } > > > + > > > + for (i = 0; i < CAP_WORDS; i++) > > > + data[i].inheritable = 0; > > > > Any specific reason why? Initialisers can have variable sizes to some > > extent, but if there's a reason why it can't be done, perhaps that > > would warrant a comment here. > > Why what? We're not trying to alter the permitted or effective sets > here, so we're doing a capget() first, zeroing the inheritable field, > then setting it back again. Oops, never mind, of course, I missed the capget() for a moment. -- Stefano