From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id 0D6955A0265 for ; Thu, 13 Oct 2022 04:17:15 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665627434; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PoVu1/uxWavVmypt+mKo2cxm3aordCVrzaCmTx/Sdxs=; b=QG6XNx5kO93+aPzhXQHqSlRwVcQNSoPMrLxN68jQpTSigPIoqdQHgs7LLcOXlZsxnXVdDK mI8UpI0ty3TsM6WA3tPENq8i0p9EInLxap33+okc30UOkN8qneugJRUNeomwvAijwFy3OR eboUGWUf13g1psG5no+NsCUJcUsrc5M= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-178-ANIb9urxMq6HUEpS_3-Z3Q-1; Wed, 12 Oct 2022 22:17:13 -0400 X-MC-Unique: ANIb9urxMq6HUEpS_3-Z3Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 05AB885A59D; Thu, 13 Oct 2022 02:17:12 +0000 (UTC) Received: from maya.cloud.tilaa.com (ovpn-208-3.brq.redhat.com [10.40.208.3]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 42F97C04483; Thu, 13 Oct 2022 02:17:10 +0000 (UTC) Date: Thu, 13 Oct 2022 04:17:05 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 05/10] Clarify various self-isolation steps Message-ID: <20221013041705.31f63c5c@elisabeth> In-Reply-To: <20221011054018.1449506-6-david@gibson.dropbear.id.au> References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-6-david@gibson.dropbear.id.au> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: FCFGSBSJXXKVWQMYBSJJ4BNASNHNWNL5 X-Message-ID-Hash: FCFGSBSJXXKVWQMYBSJJ4BNASNHNWNL5 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: I think this, in particular, is really a notable improvement. Same here, only nits: On Tue, 11 Oct 2022 16:40:13 +1100 David Gibson wrote: > We have a number of steps of self-isolation scattered across our code. > Improve function names and add comments to make it clearer what the self > isolation model is, what the steps do, and why they happen at the points > they happen. > > Signed-off-by: David Gibson > --- > isolation.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++---- > isolation.h | 6 ++-- > passt.c | 8 ++--- > 3 files changed, 86 insertions(+), 13 deletions(-) > > diff --git a/isolation.c b/isolation.c > index 124dea4..10cef05 100644 > --- a/isolation.c > +++ b/isolation.c > @@ -12,6 +12,48 @@ > * Author: Stefano Brivio > * Author: David Gibson > */ > +/** > + * DOC: Theory of Operation > + * > + * For security the passt/pasta process performs a number of > + * self-isolations steps, dropping capabilities, setting namespaces > + * and otherwise minimizing the impact we can have on the system at > + * large if we were compromised. I would try to be consistent with the BrE spelling we used everywhere else: minimising, daemonising. > + * > + * Obviously we can't isolate ourselves from resources before we've > + * done anything we need to do with those resources, so we have > + * multiple stages of self-isolation. In order these are: > + * > + * 1. isolate_initial() > + * ==================== > + * > + * Executed immediately after startup, drops capabilities we don't > + * need at any point during execution (or which we gain back when we > + * need by joining other namespaces). > + * > + * 2. isolate_user() > + * ================= > + * > + * Executed once we know what user and user namespace we want to > + * operate in. Sets our final UID & GID, and enters the correct user > + * namespace. > + * > + * 3. isolate_prefork() > + * ==================== > + * > + * Executed after all setup, but before daemonizing (fork()ing into > + * the background). Uses mount namespace and pivot_root() to remove > + * our access to the filesystem(). filesystem() is not a function I know about. :) > + * > + * 4. isolate_postfork() > + * ===================== > + * > + * Executed immediately after daemonizing, but before entering the > + * actual packet forwarding phase of operation. Or, if not > + * daemonizing, immediately after isolate_prefork(). Uses seccomp() > + * to restrict ourselves to the handful of syscalls we need during > + * runtime operation. > + */ > > #include > #include > @@ -47,7 +89,7 @@ > /** > * drop_caps() - Drop capabilities we might have except for CAP_NET_BIND_SERVICE > */ > -void drop_caps(void) > +static void drop_caps(void) > { > int i; > > @@ -59,12 +101,31 @@ void drop_caps(void) > } > } > > +/** > + * isolate_initial() - Early, config independent self isolation > + * > + * Should: > + * - drop unneeded capabilities > + * Musn't: > + * - remove filessytem access (we need to access files during setup) > + */ > +void isolate_initial(void) > +{ > + drop_caps(); > +} > + > /** > * isolate_user() - Switch to final UID/GID and move into userns > * @uid: User ID to run as (in original userns) > * @gid: Group ID to run as (in original userns) > * @use_userns: Whether to join or create a userns > * @userns: userns path to enter, may be empty > + * > + * Should: > + * - set our final UID and GID > + * - enter our final user namespace > + * Mustn't: > + * - remove filesystem access (we need that for further setup) > */ > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns) > { > @@ -134,11 +195,19 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns) > } > > /** > - * sandbox() - Unshare IPC, mount, PID, UTS, and user namespaces, "unmount" root > + * isolate_prefork() - Self isolation before daemonizing > + * @c: Execution context > + * What does it return? > + * Should: > + * - Moves us to our own IPC and UTS namespaces > + * - Moves us to a mount namespace with only an empty directory > + * - Drops unneeded capabilities (in the new user namespace) - move us ... - drop ... now, this will not move us into a new PID namespace, so it's a bit difficult to summarise in a very short form what it does with regard to that, and the comment below is already descriptive enough -- unless you can think of something. > + * Mustn't: > + * - Remove syscalls we need to daemonize "remove", "daemonise", for consistency. > * > * Return: negative error code on failure, zero on success > */ > -int sandbox(struct ctx *c) > +int isolate_prefork(struct ctx *c) > { > int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS; > > @@ -187,13 +256,19 @@ int sandbox(struct ctx *c) > } > > /** > - * seccomp() - Set up seccomp filters depending on mode, won't return on failure > + * isolate_postfork() - Self isolation after daemonizing > * @c: Execution context > + * > + * Should: > + * - disable core dumps > + * - limit to a minimal set of syscalls > */ > -void seccomp(const struct ctx *c) > +void isolate_postfork(const struct ctx *c) > { > struct sock_fprog prog; > > + prctl(PR_SET_DUMPABLE, 0); > + > if (c->mode == MODE_PASST) { > prog.len = (unsigned short)ARRAY_SIZE(filter_passt); > prog.filter = filter_passt; > diff --git a/isolation.h b/isolation.h > index 2c73df7..70a38f9 100644 > --- a/isolation.h > +++ b/isolation.h > @@ -7,9 +7,9 @@ > #ifndef ISOLATION_H > #define ISOLATION_H > > -void drop_caps(void); > +void isolate_initial(void); > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns); > -int sandbox(struct ctx *c); > -void seccomp(const struct ctx *c); > +int isolate_prefork(struct ctx *c); > +void isolate_postfork(const struct ctx *c); > > #endif /* ISOLATION_H */ > diff --git a/passt.c b/passt.c > index 2c4a986..46f80a0 100644 > --- a/passt.c > +++ b/passt.c > @@ -184,7 +184,7 @@ int main(int argc, char **argv) > > arch_avx2_exec(argv); > > - drop_caps(); > + isolate_initial(); > > c.pasta_netns_fd = c.fd_tap = c.fd_tap_listen = -1; > > @@ -287,7 +287,7 @@ int main(int argc, char **argv) > } > } > > - if (sandbox(&c)) { > + if (isolate_prefork(&c)) { > err("Failed to sandbox process, exiting\n"); > exit(EXIT_FAILURE); > } > @@ -297,9 +297,7 @@ int main(int argc, char **argv) > else > write_pidfile(pidfile_fd, getpid()); > > - prctl(PR_SET_DUMPABLE, 0); > - > - seccomp(&c); > + isolate_postfork(&c); > > timer_init(&c, &now); > -- Stefano