From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top, David Gibson
Subject: [PATCH 08/10] isolation: Prevent any child processes gaining capabilities
Date: Tue, 11 Oct 2022 16:40:16 +1100
Message-Id: <20221011054018.1449506-9-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221011054018.1449506-1-david@gibson.dropbear.id.au>
References: <20221011054018.1449506-1-david@gibson.dropbear.id.au>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

We drop our own capabilities, but it's possible that processes we
exec() could gain extra privilege via file capabilities.
It shouldn't be possible for us to exec() anyway due to seccomp() and
our filesystem isolation. But just in case, zero the bounding and
inheritable capability sets to prevent any such child from gaining
privilege.

Note that we do this *after* spawning the pasta shell/command (if any),
because we do want the user to be able to give that privilege if they
want.

Signed-off-by: David Gibson
---
 isolation.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/isolation.c b/isolation.c
index 2468f84..e1a024d 100644
--- a/isolation.c
+++ b/isolation.c
@@ -120,6 +120,61 @@ static void drop_caps_ep_except(uint64_t keep)
 	}
 }
 
+/**
+ * clamp_caps() - Prevent any children from gaining caps
+ *
+ * This drops all capabilities from both the inheritable and the
+ * bounding set. This means that any exec()ed processes can't gain
+ * capabilities, even if they have file capabilities which would grant
+ * them. We shouldn't ever exec() in any case, but this provides an
+ * additional layer of protection. Executing this requires
+ * CAP_SETPCAP, which we will have within our userns.
+ *
+ * Note that dropping capabilities from the bounding set limits
+ * exec()ed processes, but does not remove them from the effective or
+ * permitted sets, so it doesn't reduce our own capabilities.
+ */
+static void clamp_caps(void)
+{
+	struct __user_cap_header_struct hdr = {
+		.version = CAP_VERSION,
+		.pid = 0,
+	};
+	struct __user_cap_data_struct data[CAP_WORDS];
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		/* Some errors can be ignored:
+		 * - EINVAL, we'll get this for all values in 0..63
+		 *   that are not actually allocated caps
+		 * - EPERM, we'll get this if we don't have
+		 *   CAP_SETPCAP, which can happen if using
+		 *   --netns-only. We don't need CAP_SETPCAP for
+		 *   normal operation, so carry on without it.
+		 */
+		if (prctl(PR_CAPBSET_DROP, i, 0, 0, 0) &&
+		    errno != EINVAL && errno != EPERM) {
+			err("Couldn't drop cap %i from bounding set: %s",
+			    i, strerror(errno));
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	if (syscall(SYS_capget, &hdr, data)) {
+		err("Couldn't get current capabilities: %s", strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	for (i = 0; i < CAP_WORDS; i++)
+		data[i].inheritable = 0;
+
+	if (syscall(SYS_capset, &hdr, data)) {
+		err("Couldn't drop inheritable capabilities: %s",
+		    strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+}
+
 /**
  * isolate_initial() - Early, config independent self isolation
  *
@@ -287,6 +342,7 @@ int isolate_prefork(struct ctx *c)
 		ns_caps |= 1UL << CAP_NET_BIND_SERVICE;
 	}
 
+	clamp_caps();
 	drop_caps_ep_except(ns_caps);
 
 	return 0;
-- 
2.37.3
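
For anyone wanting to check the effect of clamp_caps() by hand, here is
a rough, read-only sketch that is not part of this patch. The helper
names inheritable_nonzero() and bounding_count() are hypothetical, and
it uses the kernel's _LINUX_CAPABILITY_VERSION_3 and
_LINUX_CAPABILITY_U32S_3 macros directly, on the assumption that these
match what CAP_VERSION and CAP_WORDS stand for in isolation.c.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if any inheritable capability bit is
 * set for the calling process, 0 if none is set, -1 if capget() fails.
 */
static int inheritable_nonzero(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling process */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	int i;

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	for (i = 0; i < _LINUX_CAPABILITY_U32S_3; i++) {
		if (data[i].inheritable)
			return 1;
	}

	return 0;
}

/* Hypothetical helper: counts the capabilities still present in the
 * bounding set. PR_CAPBSET_READ returns 1 if the capability is in the
 * set, 0 if it is not, and -1 (EINVAL) for unallocated numbers, so
 * only a return value of 1 is counted.
 */
static int bounding_count(void)
{
	int i, n = 0;

	for (i = 0; i < 64; i++) {
		if (prctl(PR_CAPBSET_READ, (unsigned long)i,
			  0UL, 0UL, 0UL) == 1)
			n++;
	}

	return n;
}
```

If clamp_caps() succeeded with CAP_SETPCAP available, one would expect
inheritable_nonzero() to return 0 and bounding_count() to return 0 when
called afterwards. Where the EPERM case from the patch applies (for
example with --netns-only), the bounding set may be left unchanged.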