From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by passt.top (Postfix, from userid 1000) id 10C205A0284; Mon, 22 May 2023 10:52:05 +0200 (CEST)
From: Stefano Brivio
To: passt-dev@passt.top
Subject: [PATCH v2 3/3] isolation: Initially Keep CAP_SETFCAP if running as UID 0 in non-init
Date: Mon, 22 May 2023 10:52:05 +0200
Message-Id: <20230522085205.2803560-4-sbrivio@redhat.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230522085205.2803560-1-sbrivio@redhat.com>
References: <20230522085205.2803560-1-sbrivio@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
CC: David Gibson
List-Id: Development discussion and patches for passt

If pasta spawns a child process while running as UID 0, which is only
allowed from a non-init namespace, we need to keep CAP_SETFCAP before
pasta_start_ns() is called: otherwise, starting from Linux 5.12, we
won't be able to update /proc/self/uid_map with the intended mapping
(from 0 to 0). See user_namespaces(7).

Signed-off-by: Stefano Brivio
Reviewed-by: David Gibson
---
 isolation.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/isolation.c b/isolation.c
index 5f89047..19932bf 100644
--- a/isolation.c
+++ b/isolation.c
@@ -177,6 +177,8 @@ static void clamp_caps(void)
  */
 void isolate_initial(void)
 {
+	uint64_t keep;
+
 	/* We want to keep CAP_NET_BIND_SERVICE in the initial
 	 * namespace if we have it, so that we can forward low ports
 	 * into the guest/namespace
@@ -193,9 +195,18 @@ void isolate_initial(void)
 	 * further capabilites in isolate_user() and
 	 * isolate_prefork().
 	 */
-	drop_caps_ep_except(BIT(CAP_NET_BIND_SERVICE) |
-			    BIT(CAP_SETUID) | BIT(CAP_SETGID) |
-			    BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN));
+	keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) |
+	       BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN);
+
+	/* Since Linux 5.12, if we want to update /proc/self/uid_map to create
+	 * a mapping from UID 0, which only happens with pasta spawning a child
+	 * from a non-init user namespace (pasta can't run as root), we need to
+	 * retain CAP_SETFCAP too.
+	 */
+	if (!ns_is_init() && !geteuid())
+		keep |= BIT(CAP_SETFCAP);
+
+	drop_caps_ep_except(keep);
 }
 
 /**
-- 
2.39.2
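
For illustration only, the mask selection done by the patch can be looked at in isolation. The following is a minimal sketch, not code from the tree: initial_keep_mask() and CAP_BIT() are hypothetical names, and the two inputs stand in for the results of ns_is_init() and geteuid(), so that the decision can be checked without touching the real capability sets.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <linux/capability.h>	/* CAP_* capability numbers */

/* Hypothetical stand-in for the BIT() macro used in isolation.c */
#define CAP_BIT(n)	(UINT64_C(1) << (n))

/* Hypothetical helper, not part of the patch: return the capabilities to
 * retain at the initial isolation step, given whether we are in the init
 * user namespace and what our effective UID is.
 */
static uint64_t initial_keep_mask(bool in_init_userns, uid_t euid)
{
	uint64_t keep = CAP_BIT(CAP_NET_BIND_SERVICE) | CAP_BIT(CAP_SETUID) |
			CAP_BIT(CAP_SETGID) | CAP_BIT(CAP_SYS_ADMIN) |
			CAP_BIT(CAP_NET_ADMIN);

	/* Only UID 0 in a non-init user namespace needs CAP_SETFCAP, to be
	 * able to write a uid_map entry that maps UID 0 (Linux 5.12 and later)
	 */
	if (!in_init_userns && euid == 0)
		keep |= CAP_BIT(CAP_SETFCAP);

	return keep;
}
```

With this split, a caller would pass the returned mask to drop_caps_ep_except(), as the patch does with 'keep'. Keeping the decision in a function without side effects is just one way to make it easy to test, it's not something the patch itself needs.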