From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 569E65A005E for ; Thu, 13 Oct 2022 11:37:26 +0200 (CEST) Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Mp4Dg17fZz4xGy; Thu, 13 Oct 2022 20:37:23 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=201602; t=1665653843; bh=YtLHFlxD5fmNGfdOXEMn/6s4oL3qpiA/auLY0qCecRY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=DLZYjt5DyN2nLMRF9Xt9qiNYl5f+iUNekR//Cr66n4sQk5/3eFq+pRoWvolhCwgP0 r1lvwM1q7UFVZO1Z/Bw8wr1WsAtCJxqsxaotBirjb5VNyYpu5TW80QJO6Y3aXgfiit WNVHB4ptlWmeTRcHNKNA6kWLcpSoPH5ozQlzBZJk= Date: Thu, 13 Oct 2022 19:31:15 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 05/10] Clarify various self-isolation steps Message-ID: References: <20221011054018.1449506-1-david@gibson.dropbear.id.au> <20221011054018.1449506-6-david@gibson.dropbear.id.au> <20221013041705.31f63c5c@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Sj+oxTSrWkI+kXuM" Content-Disposition: inline In-Reply-To: <20221013041705.31f63c5c@elisabeth> Message-ID-Hash: CA3O3WOFZHZE63EE7ZMONBTOBDLYSLQL X-Message-ID-Hash: CA3O3WOFZHZE63EE7ZMONBTOBDLYSLQL X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.3 Precedence: list List-Id: Development discussion and patches for passt Archived-At: <> Archived-At: List-Archive: <> List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Sj+oxTSrWkI+kXuM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 13, 2022 at 04:17:05AM +0200, Stefano Brivio wrote: > I think this, in particular, is really a notable improvement. Same here, > only nits: >=20 > On Tue, 11 Oct 2022 16:40:13 +1100 > David Gibson wrote: >=20 > > We have a number of steps of self-isolation scattered across our code. > > Improve function names and add comments to make it clearer what the self > > isolation model is, what the steps do, and why they happen at the points > > they happen. > >=20 > > Signed-off-by: David Gibson > > --- > > isolation.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++---- > > isolation.h | 6 ++-- > > passt.c | 8 ++--- > > 3 files changed, 86 insertions(+), 13 deletions(-) > >=20 > > diff --git a/isolation.c b/isolation.c > > index 124dea4..10cef05 100644 > > --- a/isolation.c > > +++ b/isolation.c > > @@ -12,6 +12,48 @@ > > * Author: Stefano Brivio > > * Author: David Gibson > > */ > > +/** > > + * DOC: Theory of Operation > > + * > > + * For security the passt/pasta process performs a number of > > + * self-isolations steps, dropping capabilities, setting namespaces > > + * and otherwise minimizing the impact we can have on the system at > > + * large if we were compromised. >=20 > I would try to be consistent with the BrE spelling we used everywhere > else: minimising, daemonising. Ah, sure. Here in australia we mostly use british spelling, but tend to use 'izing' rather than 'ising'. I've updated it. > > + * > > + * Obviously we can't isolate ourselves from resources before we've > > + * done anything we need to do with those resources, so we have > > + * multiple stages of self-isolation. In order these are: > > + * > > + * 1. isolate_initial() > > + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > + * > > + * Executed immediately after startup, drops capabilities we don't > > + * need at any point during execution (or which we gain back when we > > + * need by joining other namespaces). > > + * > > + * 2. isolate_user() > > + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > + * > > + * Executed once we know what user and user namespace we want to > > + * operate in. Sets our final UID & GID, and enters the correct user > > + * namespace. > > + * > > + * 3. isolate_prefork() > > + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > + * > > + * Executed after all setup, but before daemonizing (fork()ing into > > + * the background). Uses mount namespace and pivot_root() to remove > > + * our access to the filesystem(). >=20 > filesystem() is not a function I know about. :) Heh, oops. > > + * > > + * 4. isolate_postfork() > > + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > + * > > + * Executed immediately after daemonizing, but before entering the > > + * actual packet forwarding phase of operation. Or, if not > > + * daemonizing, immediately after isolate_prefork(). Uses seccomp() > > + * to restrict ourselves to the handful of syscalls we need during > > + * runtime operation. > > + */ > > =20 > > #include > > #include > > @@ -47,7 +89,7 @@ > > /** > > * drop_caps() - Drop capabilities we might have except for CAP_NET_BI= ND_SERVICE > > */ > > -void drop_caps(void) > > +static void drop_caps(void) > > { > > int i; > > =20 > > @@ -59,12 +101,31 @@ void drop_caps(void) > > } > > } > > =20 > > +/** > > + * isolate_initial() - Early, config independent self isolation > > + * > > + * Should: > > + * - drop unneeded capabilities > > + * Musn't: > > + * - remove filessytem access (we need to access files during setup) > > + */ > > +void isolate_initial(void) > > +{ > > + drop_caps(); > > +} > > + > > /** > > * isolate_user() - Switch to final UID/GID and move into userns > > * @uid: User ID to run as (in original userns) > > * @gid: Group ID to run as (in original userns) > > * @use_userns: Whether to join or create a userns > > * @userns: userns path to enter, may be empty > > + * > > + * Should: > > + * - set our final UID and GID > > + * - enter our final user namespace > > + * Mustn't: > > + * - remove filesystem access (we need that for further setup) > > */ > > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *u= serns) > > { > > @@ -134,11 +195,19 @@ void isolate_user(uid_t uid, gid_t gid, bool use_= userns, const char *userns) > > } > > =20 > > /** > > - * sandbox() - Unshare IPC, mount, PID, UTS, and user namespaces, "unm= ount" root > > + * isolate_prefork() - Self isolation before daemonizing > > + * @c: Execution context > > + * >=20 > What does it return? I had that below, I've moved it up here. > > + * Should: > > + * - Moves us to our own IPC and UTS namespaces > > + * - Moves us to a mount namespace with only an empty directory > > + * - Drops unneeded capabilities (in the new user namespace) >=20 > - move us ... > - drop ... Fixed. > now, this will not move us into a new PID namespace, so it's a bit > difficult to summarise in a very short form what it does with regard to > that, and the comment below is already descriptive enough -- unless you > can think of something. Right. These lists aren't supposed to be totally exhaustive, just covering the key points. > > + * Mustn't: > > + * - Remove syscalls we need to daemonize >=20 > "remove", "daemonise", for consistency. Adjusted. > > * > > * Return: negative error code on failure, zero on success > > */ > > -int sandbox(struct ctx *c) > > +int isolate_prefork(struct ctx *c) > > { > > int flags =3D CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS; > > =20 > > @@ -187,13 +256,19 @@ int sandbox(struct ctx *c) > > } > > =20 > > /** > > - * seccomp() - Set up seccomp filters depending on mode, won't return = on failure > > + * isolate_postfork() - Self isolation after daemonizing > > * @c: Execution context > > + * > > + * Should: > > + * - disable core dumps > > + * - limit to a minimal set of syscalls > > */ > > -void seccomp(const struct ctx *c) > > +void isolate_postfork(const struct ctx *c) > > { > > struct sock_fprog prog; > > =20 > > + prctl(PR_SET_DUMPABLE, 0); > > + > > if (c->mode =3D=3D MODE_PASST) { > > prog.len =3D (unsigned short)ARRAY_SIZE(filter_passt); > > prog.filter =3D filter_passt; > > diff --git a/isolation.h b/isolation.h > > index 2c73df7..70a38f9 100644 > > --- a/isolation.h > > +++ b/isolation.h > > @@ -7,9 +7,9 @@ > > #ifndef ISOLATION_H > > #define ISOLATION_H > > =20 > > -void drop_caps(void); > > +void isolate_initial(void); > > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *u= serns); > > -int sandbox(struct ctx *c); > > -void seccomp(const struct ctx *c); > > +int isolate_prefork(struct ctx *c); > > +void isolate_postfork(const struct ctx *c); > > =20 > > #endif /* ISOLATION_H */ > > diff --git a/passt.c b/passt.c > > index 2c4a986..46f80a0 100644 > > --- a/passt.c > > +++ b/passt.c > > @@ -184,7 +184,7 @@ int main(int argc, char **argv) > > =20 > > arch_avx2_exec(argv); > > =20 > > - drop_caps(); > > + isolate_initial(); > > =20 > > c.pasta_netns_fd =3D c.fd_tap =3D c.fd_tap_listen =3D -1; > > =20 > > @@ -287,7 +287,7 @@ int main(int argc, char **argv) > > } > > } > > =20 > > - if (sandbox(&c)) { > > + if (isolate_prefork(&c)) { > > err("Failed to sandbox process, exiting\n"); > > exit(EXIT_FAILURE); > > } > > @@ -297,9 +297,7 @@ int main(int argc, char **argv) > > else > > write_pidfile(pidfile_fd, getpid()); > > =20 > > - prctl(PR_SET_DUMPABLE, 0); > > - > > - seccomp(&c); > > + isolate_postfork(&c); > > =20 > > timer_init(&c, &now); > > =20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --Sj+oxTSrWkI+kXuM Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEoULxWu4/Ws0dB+XtgypY4gEwYSIFAmNHzMsACgkQgypY4gEw YSLhbQ//a1XCLZx496ExqDAAIEFR0yZ39j4wYehY6h2YMgkleU991CDvzqUNrZ53 Spd1VmchjB7v60B1AqjPZ1bdlTjUpsN7HIH8L8SBRKc11MGVRdVFL/+Vs5guZ/kB KgbMLmI/BFsOfCAx4nUhnirmpikzDo97lgahtKtaWw2TA+HjaFPsd8LRPp524GsZ Tg+aNVgWixohnHUWyZ1QHD8Hg7g3zyiOI4hhLHsYzFUL3StdXohR8O2gwd/UQ7PK G2Wg9TG/hDFkb4CA6WpPW/eZ5SWDyyVXNC7cFazeQaMX+fAf0PajaIBdCAQ5gDZD jvXsb9/J3NGHpuME7ciM0geQsDNWvqs2MnIFYPRfhNiU/7esNO80iH37w2QghMXY EbfEfeM+J3iIwX81ajJKaOw+7h1swy6iv3lMtMI9jccF0hythJuWctUOJtC8IoZe N+PTdnXb/pNxUg/Bhgt24ScjTNCM5ZiQmxIKbdzVBLLYJJXevEkSOdddn8+QTrYg m8UEXEkBdcvfc2H/hbYRoVl5HQmLGF/jk62zSS3Eond1xf9Gb10RIHkh09oVV+WR O4F/M+MlEF5aQn7YvK18qanCcqRcM+UQUuvxPD/TmEZl29yQIeVaTSvGp3UOWakq TBAWpzgQ7u4mg5OBsKsrdvRdI3F8yJXVRZqibyldAzL9sZgvQTw= =+OCb -----END PGP SIGNATURE----- --Sj+oxTSrWkI+kXuM--