From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTP id E68A85A005E
	for ; Thu, 13 Oct 2022 18:37:26 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1665679045;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=kJxuHocXio1ScG5k4zd4F//snbFihHQ5Vjb9OdPuTGE=;
	b=huyWfdHx1mgXHJJaI2qaMDWo1sp6/7XaK3Bg1tNgj6aP2CEorJIT07ctwLjGD/xVyeRGLM
	EpK44wwuJ8moCwr+rz0SyEhtSS3lpNgdFrVRdnDYriL7wkb3OkKG+zXwyzPYWqqaiNgBZa
	i7cYe6IQAGufiuC5g8875V9z8KYBs+w=
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73])
	by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2,
	cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
	id us-mta-617-Azgi6nDLMjuKqHdPh4Vvuw-1; Thu, 13 Oct 2022 12:37:22 -0400
X-MC-Unique: Azgi6nDLMjuKqHdPh4Vvuw-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B50531C075A6;
	Thu, 13 Oct 2022 16:37:16 +0000 (UTC)
Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 4B3D89D48F;
	Thu, 13 Oct 2022 16:37:09 +0000 (UTC)
Date: Thu, 13 Oct 2022 18:37:05 +0200
From: Stefano Brivio
To: David Gibson
Subject: Re: [PATCH 07/10] isolation: Replace drop_caps() with a version that actually does something
Message-ID: <20221013183705.56e28681@elisabeth>
In-Reply-To: <20221013150802.356cf532@elisabeth>
References: <20221011054018.1449506-1-david@gibson.dropbear.id.au>
	<20221011054018.1449506-8-david@gibson.dropbear.id.au>
	<20221013060119.48d51493@elisabeth>
	<20221013150802.356cf532@elisabeth>
Organization: Red Hat
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID-Hash: IF55KIC6Y6F6LRGIWO6JFVKWIRMTZ3TT
X-Message-ID-Hash: IF55KIC6Y6F6LRGIWO6JFVKWIRMTZ3TT
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop;
	banned-address; member-moderation; nonmember-moderation; administrivia;
	implicit-dest; max-recipients; max-size; news-moderation; no-subject;
	digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.3
Precedence: list
List-Id: Development discussion and patches for passt
Archived-At: <>
Archived-At:
List-Archive: <>
List-Archive:
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:

On Thu, 13 Oct 2022 15:08:02 +0200
Stefano Brivio wrote:

> On Thu, 13 Oct 2022 06:01:19 +0200
> Stefano Brivio wrote:
>
> > On Tue, 11 Oct 2022 16:40:15 +1100
> > David Gibson wrote:
> >
> > > @@ -251,7 +275,19 @@ int isolate_prefork(struct ctx *c)
> > >  		return -errno;
> > >  	}
> > >
> > > -	drop_caps(); /* Relative to the new user namespace this time. */
> > > +	/* Drop capabilites in our new userns */
> > > +	if (c->mode == MODE_PASTA) {
> > > +		/* Keep CAP_SYS_ADMIN, so that we can setns() to the
> > > +		 * netns when we need to act upon it
> > > +		 */
> > > +		ns_caps |= 1UL << CAP_SYS_ADMIN;
> > > +		/* Keep CAP_NET_BIND_SERVICE, so we can splice
> > > +		 * outbound connections to low port numbers
> > > +		 */
> > > +		ns_caps |= 1UL << CAP_NET_BIND_SERVICE;
> > > +	}
> > > +
> > > +	drop_caps_ep_except(ns_caps);
> >
> > Hmm, I didn't really look into this yet, but there seems to be an issue
> > with filesystem-bound network namespaces now. Running something like:
> >
> >   pasta --config-net --netns /run/user/1000/netns/netns-6466ff4b-1efc-2b58-685b-cbc12feb9ccc
> >
> > (from Podman), this happens:
> >
> > [...]
> >
> > [pid 1763223] setns(7, CLONE_NEWNET) = -1 EPERM (Operation not permitted)
>
> Ah, "of course". Podman calls us with UID 0 in the user namespace it
> just created, so if we drop CAP_SYS_ADMIN in isolate_initial() we can't
> join the network namespace, and if we drop CAP_NET_ADMIN we can't
> configure it.
>
> So for that case (and only for that, I suppose), we need something like
> (tested):
>
> diff --git a/isolation.c b/isolation.c
> index 1769180..fee6dbd 100644
> --- a/isolation.c
> +++ b/isolation.c
> @@ -190,7 +190,7 @@ void isolate_initial(void)
>  	 * namespace if we have it, so that we can forward low ports
>  	 * into the guest/namespace
>  	 */
> -	drop_caps_ep_except((1UL << CAP_NET_BIND_SERVICE));
> +	drop_caps_ep_except(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN));
>  }
>
> ...which is a bit pointless. Better than *any* capability, but not by
> far.
>
> So, if we make this totally independent from configuration, we need
> those two capabilities.
>
> We could add a "postconf" stage and cover a tiny bit more of conf.c.
>
> Or we could have a special path in isolate_initial() for the case we
> know we're not in the init namespace.
>
> I'm not sure. If you have a specific preference/strong opinion I would
> actually be happier. :)

Further on, if we are started as root, we'll fail to drop to 'nobody'
or any other user, if we lose CAP_SETUID and CAP_SETGID here.

I have tested this version of isolate_initial():

	drop_caps_ep_except(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) |
			    BIT(CAP_SETGID) | BIT(CAP_SETUID) |
			    BIT(CAP_NET_BIND_SERVICE));

for any use case I can reasonably think of.

Yes, it's a lot -- we should make it really clear that those are not
the capabilities we actually use at "run time".

-- 
Stefano
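
For reference, a minimal sketch of how the keep-mask passed to
drop_caps_ep_except() above is composed, and how one could check which
capabilities survive it. This is only an illustration, not code from
the series: cap_bit(), initial_keep_mask() and cap_kept() are
hypothetical names, cap_bit() is assumed to behave like the usual
BIT(n) == 1 << n macro, and the capability numbers are copied from
<linux/capability.h> so that the snippet builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Capability numbers, copied from <linux/capability.h> */
#define CAP_SETGID		6
#define CAP_SETUID		7
#define CAP_NET_BIND_SERVICE	10
#define CAP_NET_ADMIN		12
#define CAP_SYS_PTRACE		19
#define CAP_SYS_ADMIN		21

/* Hypothetical stand-in for BIT(): one bit per capability number */
static uint64_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return UINT64_C(1) << cap;
}

/* Hypothetical helper: the set kept by the tested isolate_initial() */
static uint64_t initial_keep_mask(void)
{
	return cap_bit(CAP_SYS_ADMIN) | cap_bit(CAP_NET_ADMIN) |
	       cap_bit(CAP_SETGID) | cap_bit(CAP_SETUID) |
	       cap_bit(CAP_NET_BIND_SERVICE);
}

/* Hypothetical helper: non-zero if 'cap' is part of 'keep' */
static int cap_kept(uint64_t keep, int cap)
{
	return (keep & cap_bit(cap)) != 0;
}
```

With these definitions, initial_keep_mask() evaluates to 0x2014c0:
cap_kept() reports CAP_SETUID and CAP_SYS_ADMIN as kept, while anything
outside the list, such as CAP_SYS_PTRACE, would be dropped.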