From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=F2DUVGtr; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 8CFDE5A0262 for ; Sat, 04 Jul 2026 17:27:15 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1783178834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VBe/ptY2otspCvjdd6pBngzOFKfFzmr6x88fhRt5Dak=; b=F2DUVGtrSQt/9h4HoXgTzfUwl/keAir2tN1MHMk/S73gyeMBZRqYNB4fw7yfXQii9vcSr0 nDnYDYpnmQmayJ/d/T223MH48CaqJ1chOtFmVejuaIs0WE3iw+CwKIGVFZqI++86PoNhpf //nT6/lDAXd2o7YDsAm57ZbYHYDxRww= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-149-6JNWNDxCOlSDFDUUKkavkw-1; Sat, 04 Jul 2026 11:27:12 -0400 X-MC-Unique: 6JNWNDxCOlSDFDUUKkavkw-1 X-Mimecast-MFC-AGG-ID: 6JNWNDxCOlSDFDUUKkavkw_1783178831 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-47162f83c75so1781555f8f.1 for ; Sat, 04 Jul 2026 08:27:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1783178831; x=1783783631; h=date:content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VBe/ptY2otspCvjdd6pBngzOFKfFzmr6x88fhRt5Dak=; b=Sv5331jYW/WLBlvmx3GWhRqUetzfhdV5ZEgDaBQ1tOHgubFMYstRneq8DoFlszKqi0 /zg07CxfF8WFcLoIOf9pDS272XqK4JCvL+yKsuj83pf0nlUCHmY0t9PQ8hzoxzXhMNOX 68A34AoTiPFAmMmsynugkFHxztLmqNhe3olZffTk50DCd28y8YaaO49G9YMjt7LwhngE bZwydPLGF68MNY5XtPOo3GOaaf0zstqfaLatV/IF2icSZvmmillAm0LGkhQQw2IAcKvC lAGaAMWIE0DzirzMBlc1JdsH+Ylz/EAgZEUDCNHAPJth8Ef9EF8K2pv96gcyUzLhiPY7 u7Fw== X-Gm-Message-State: AOJu0Yw876CzQ1A97hPgzdXofOr9twcJ0EQQTdxVyzuSc9USfR/HPVWA RvscsRx+anhKugNYWXyDDaKzvLw30h233HRA0aaHRdRSPYEy7w8NG8+S85Vu9K4d3uHQed/v4iP sMvMvpud45aVCDTOjCFgpxG7xUc6WjnvH1s5mnvCmJNz30UxmUrEZqGt5P90eOQ== X-Gm-Gg: AfdE7cnKH3aseNo6230DkxwLex0OPcM2SkQYxVyKx/2dN9Hg3zKWIB6qzV7g6Vvs+Xk UtoNGm7ZU5ucSGFn/h5AZ3M5vDPeJev1R4oBD+PXI/3aTwSYPMxp37F0VQ51FxQ0eejFoArejDX VmzGUusgbIoW3uOMdYEk8+Vah5pzRzAAUP9A5j1KXHjQwLSeWxqETuKnP7NCJ1jSfDehDhgq/dl veh6xg394jFOFcxZUz3gTxypXqb9rQfJooSuDND0frg9PG6RYK66kzh/pvNYNdjyVywgT1Wyy7J ynZnrz5yBDgGHXPnvC86lcjSnrVHxwYKGF7dF9U213K9nm+a8zCLPwdgJqnq9JybapN2pZyq8x1 9i6SJ07QnMWo7PhHyeL16tA== X-Received: by 2002:a05:6000:288b:b0:475:b34a:dc1b with SMTP id ffacd0b85a97d-47abc8b7f0bmr3771864f8f.21.1783178831187; Sat, 04 Jul 2026 08:27:11 -0700 (PDT) X-Received: by 2002:a05:6000:288b:b0:475:b34a:dc1b with SMTP id ffacd0b85a97d-47abc8b7f0bmr3771810f8f.21.1783178830575; Sat, 04 Jul 2026 08:27:10 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-47a9e3e2702sm9102229f8f.9.2026.07.04.08.27.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 04 Jul 2026 08:27:09 -0700 (PDT) From: Stefano Brivio To: Mateusz Andrzejewski Subject: Re: [PATCH v2] isolation: Add --chroot-fallback option Message-ID: <20260704172708.6f884d5c@elisabeth> In-Reply-To: <20260702071526.2460400-1-mateusz.andrzejewski@mikronika.com.pl> References: <20260702071526.2460400-1-mateusz.andrzejewski@mikronika.com.pl> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 Date: Sat, 04 Jul 2026 17:27:09 +0200 (CEST) X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: DP-Y1rMrC7_1eoSZWjykJTNlODeEZB4ZhHslRp4Hx0o_1783178831 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: PYIW44KNYEXD6Q3BYW5GEGGLO26VOUM6 X-Message-ID-Hash: PYIW44KNYEXD6Q3BYW5GEGGLO26VOUM6 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, piotr.bzdrega@mikronika.com.pl, mateusz.andrzejewski@mikronika.com.pl X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Mateusz, thanks for following up. I see a few very minor issues, reported inline below, that I can fix up on merge if you agree (but you can re-post, as you wish). On Thu, 2 Jul 2026 09:13:31 +0200 Mateusz Andrzejewski wrote: > For integrations, which use rootfs on tmpfs or initramfs, it is not > allowed to use pivot_root(). It results with invalid argument (EINVAL) > error. Introduce --chroot-fallback option as a workaround with > MS_MOVE + chroot(). > > Due to weaker isolation of chroot() method (we don't unmount old root), > user has tu explicitly enable fallback with the new option. First, > always try to sandbox with pivot_root(). In both cases the new root is > placed into an empty tmpfs. > > For the solution to work we keep CAP_SYS_CHROOT capability, which is > dropped at the end of the isolate_prefork() function. > > Link: https://bugs.passt.top/show_bug.cgi?id=104 > > Signed-off-by: Mateusz Andrzejewski > --- > Changes since v1: > - added a description of the new option to the man page > - renamed option to --chroot-fallback > - removed auxiliary variable in conf() > - passed execution context to isolate_user() > - fixed indentation and other coding style issues > - fixed cppcheck issue with variableScope in isolate_prefork() > > Direct write to &c->chroot_fallback is not valid, because it's a bool > variable and compilation results with an incompatible-pointer-types > warning (int* expected). By the way, right, my previous suggestion implied that you would also turn that to an int, but I forgot to say that. On the other hand: > To fix this, the assignment was moved to the '32' label in the switch > statement and the temporary helper variable could be removed. This > follows the same pattern of other boolean options. ...this is also true, and I think it's more important to keep it consistent rather than saving three lines there, so let's leave it like you did now. > conf.c | 8 +++++- > isolation.c | 71 ++++++++++++++++++++++++++++++++++++++++++----------- > isolation.h | 4 +-- > passt.1 | 9 +++++++ > passt.h | 2 ++ > 5 files changed, 77 insertions(+), 17 deletions(-) > > diff --git a/conf.c b/conf.c > index 4755a9f..6755bf6 100644 > --- a/conf.c > +++ b/conf.c > @@ -654,6 +654,8 @@ static void usage(const char *name, FILE *f, int status) > " --no-ra Disable router advertisements\n" > " --freebind Bind to any address for forwarding\n" > " --no-map-gw Don't map gateway address to host\n" > + " --chroot-fallback Use chroot() if pivot_root() fails\n" > + " can be useful for rootfs on tmpfs or initramfs\n" I was suggesting that you would explain this in the documentation, but I think having it in the man page is sufficient, while this usage message isn't supposed to have all the details, and it's already long enough. So I'd rather drop the "can be useful ..." line from here. Not a strong preference from my side, though (it's just one extra line after all). > " -4, --ipv4-only Enable IPv4 operation only\n" > " -6, --ipv6-only Enable IPv6 operation only\n" > " -t, --tcp-ports SPEC TCP port forwarding to %s\n" > @@ -1233,6 +1235,7 @@ void conf(struct ctx *c, int argc, char **argv) > {"migrate-no-linger", no_argument, NULL, 30 }, > {"stats", required_argument, NULL, 31 }, > {"conf-path", required_argument, NULL, 'c' }, > + {"chroot-fallback", no_argument, NULL, 32 }, > { 0 }, > }; > const char *optstring = "+dqfel:hs:c:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:"; > @@ -1467,6 +1470,9 @@ void conf(struct ctx *c, int argc, char **argv) > die("Can't display statistics if not running in foreground"); > c->stats = strtol(optarg, NULL, 0); > break; > + case 32: > + c->chroot_fallback = true; > + break; > case 'd': > c->debug = 1; > c->quiet = 0; > @@ -1879,7 +1885,7 @@ void conf(struct ctx *c, int argc, char **argv) > > conf_open_files(c); /* Before any possible setuid() / setgid() */ > > - isolate_user(uid, gid, !netns_only, userns, c->mode); > + isolate_user(c, uid, gid, !netns_only, userns); > > if (c->no_icmp) > c->no_ndp = 1; > diff --git a/isolation.c b/isolation.c > index 7e6225d..08e4008 100644 > --- a/isolation.c > +++ b/isolation.c > @@ -166,6 +166,31 @@ static void clamp_caps(void) > die_perror("Couldn't drop inheritable capabilities"); > } > > +/** > + * move_root() - Use chroot() instead of pivot_root() for sandboxing > + * > + * Return: negative error code on failure, zero on success > + */ > +static int move_root(void) > +{ > + if (mount(TMPDIR, "/", "", MS_MOVE, "")) { > + err_perror("Failed to move root into empty tmpfs"); > + return -errno; > + } > + > + if (chroot(".")) { > + err_perror("Failed to chroot() into empty tmpfs"); > + return -errno; > + } > + > + if (chdir("/")) { > + err_perror("Failed to change directory into new root"); > + return -errno; > + } > + > + return 0; > +} > + > /** > * isolate_initial() - Early, mostly config independent self isolation > * @argc: Argument count > @@ -195,14 +220,18 @@ void isolate_initial(int argc, char **argv) > * - CAP_SYS_ADMIN, so that we can setns() to the netns. > * - Keep CAP_NET_ADMIN, so that we can configure interfaces > * > + * We have to keep CAP_SYS_CHROOT in case of --chroot-fallback option > + * being enabled, so we can fallback from pivot_root() to chroot() in Very minor but... the noun is "fallback" and the verb is "fall back". > + * isolate_prefork(). > + * > * It's debatable whether it's useful to drop caps when we > * retain SETUID and SYS_ADMIN, but we might as well. We drop > * further capabilities in isolate_user() and > * isolate_prefork(). > */ > keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) | > - BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE); > - > + BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE) | > + BIT(CAP_SYS_CHROOT); > /* Since Linux 5.12, if we want to update /proc/self/uid_map to create > * a mapping from UID 0, which only happens with pasta spawning a child > * from a non-init user namespace (pasta can't run as root), we need to > @@ -220,11 +249,11 @@ void isolate_initial(int argc, char **argv) > > /** > * isolate_user() - Switch to final UID/GID and move into userns > + * @c: Execution context > * @uid: User ID to run as (in original userns) > * @gid: Group ID to run as (in original userns) > * @use_userns: Whether to join or create a userns > * @userns: userns path to enter, may be empty > - * @mode: Mode (passt or pasta) > * > * Should: > * - set our final UID and GID > @@ -232,8 +261,8 @@ void isolate_initial(int argc, char **argv) > * Mustn't: > * - remove filesystem access (we need that for further setup) > */ > -void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > - enum passt_modes mode) > +void isolate_user(const struct ctx *c, uid_t uid, gid_t gid, bool use_userns, > + const char *userns) > { > uint64_t ns_caps = 0; > > @@ -277,7 +306,14 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > * netns > */ > ns_caps |= BIT(CAP_SYS_ADMIN); > - if (mode == MODE_PASTA) { > + > + /* Only keep CAP_SYS_CHROOT for the --chroot-fallback case. Otherwise > + * it can be dropped > + */ > + if (c->chroot_fallback) > + ns_caps |= BIT(CAP_SYS_CHROOT); > + > + if (c->mode == MODE_PASTA) { > /* Keep CAP_NET_ADMIN, so we can configure the if */ > ns_caps |= BIT(CAP_NET_ADMIN); > /* Keep CAP_NET_BIND_SERVICE, so we can splice > @@ -331,7 +367,7 @@ int isolate_prefork(const struct ctx *c) > if (mount("", TMPDIR, "tmpfs", > MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RDONLY, > "nr_inodes=2,nr_blocks=0")) { > - err_perror("Failed to mount empty tmpfs for pivot_root()"); > + err_perror("Failed to mount empty tmpfs for sandboxing"); > return -errno; > } > > @@ -341,13 +377,20 @@ int isolate_prefork(const struct ctx *c) > } > > if (syscall(SYS_pivot_root, ".", ".")) { > - err_perror("Failed to pivot_root() into empty tmpfs"); > - return -errno; > - } > - > - if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) { > - err_perror("Failed to unmount original root filesystem"); > - return -errno; > + if (c->chroot_fallback) { > + int rc; > + info("Failed to pivot_root(), fallback to chroot()..."); > + if ((rc = move_root())) > + return rc; > + } else { > + err_perror("Failed to pivot_root() into empty tmpfs"); > + return -errno; > + } > + } else { > + if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) { > + err_perror("Failed to unmount original root filesystem"); > + return -errno; > + } > } > > /* Now that initialization is more-or-less complete, we can > diff --git a/isolation.h b/isolation.h > index 0576168..66b6968 100644 > --- a/isolation.h > +++ b/isolation.h > @@ -11,8 +11,8 @@ > #include > > void isolate_initial(int argc, char **argv); > -void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > - enum passt_modes mode); > +void isolate_user(const struct ctx *c, uid_t uid, gid_t gid, bool use_userns, > + const char *userns); > int isolate_prefork(const struct ctx *c); > void isolate_postfork(const struct ctx *c); > > diff --git a/passt.1 b/passt.1 > index 908fd4a..8a33c2e 100644 > --- a/passt.1 > +++ b/passt.1 > @@ -393,6 +393,15 @@ as destination, to the host. Implied if there is no gateway on the selected > default route, or if there is no default route, for any of the enabled address > families. > > +.TP > +.BR \-\-chroot-fallback > +Enable a fallback to chroot() in case pivot_root() returns an error. Useful for Maybe useful to specify: "in case pivot_root(), used to switch to an empty filesystem for stricter isolation, returns an error". Otherwise most users will have no idea what this is about. > +integrations, which use rootfs on tmpfs or initramfs, where it is not allowed to Also rather minor, but I think this sentence is a bit difficult to follow: - "integrations which use ..." shouldn't have a comma, as the following clause is essential for the meaning of the sentence (restrictive / essential vs. nonrestrictive / nonessential) - "allowed" can't be used in an impersonal way. It needs an actual subject, that is: "pivot_root() usage isn't allowed", but you can't say that "_it_ is not allowed to use pivot_root()" (I happily do this mistake all the time as a native Italian speaker). I would turn this to: Useful for integrations which use tmpfs or initramfs as root filesystem, where pivot_root() can't be used, as it results in an invalid argument error (EINVAL). > +use pivot_root(), because it results with an invalid argument error (EINVAL). > + > +By default, fallback is disabled. If pivot_root() fails, then the entire No need for brevity here: "the fallback is disabled". > +sandboxing process fails. > + > .TP > .BR \-\-map-guest-addr " " \fIaddr > Translate \fIaddr\fR in the guest to be equal to the guest's assigned > diff --git a/passt.h b/passt.h > index 16506dc..a20148e 100644 > --- a/passt.h > +++ b/passt.h > @@ -214,6 +214,7 @@ struct ip6_ctx { > * @splice_only: Only enable loopback forwarding > * @host_lo_to_ns_lo: Map host loopback addresses to ns loopback addresses > * @freebind: Allow binding of non-local addresses for forwarding > + * @chroot_fallback: Use chroot() in case pivot_root() fails > * @low_wmem: Low probed net.core.wmem_max > * @low_rmem: Low probed net.core.rmem_max > * @no_bindtodevice: Unprivileged SO_BINDTODEVICE not available > @@ -299,6 +300,7 @@ struct ctx { > int splice_only; > int host_lo_to_ns_lo; > int freebind; > + bool chroot_fallback; > > int low_wmem; > int low_rmem; The rest looks good to me, let me know if you agree with these changes, and if you prefer to re-post or that I just apply them. -- Stefano