From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=HM+vK9TE; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 1ADAC5A0265 for ; Wed, 01 Jul 2026 09:51:53 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1782892313; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vbAKq8w05yzFdHwYsYlz47M9QIQMpAKLZLCyT4TuWK0=; b=HM+vK9TEF2fOQJA8XLMFV4a1A3NIjYKJ7RbmGGJ9fWsg5slBxt7z2ECedB1MpBuKddhTU1 OL1XuXKF4YRF9e0E3Loptp3bSY5OcX6UBsyLVkb+9I6GDMp16wVrEngBA1eDLArdO3MDnz I5F1jF8mr1bj0hB2xQ7YG1ZIsfBZzW0= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-218-0PpOGGa7OwKWa2gG1plFUw-1; Wed, 01 Jul 2026 03:51:51 -0400 X-MC-Unique: 0PpOGGa7OwKWa2gG1plFUw-1 X-Mimecast-MFC-AGG-ID: 0PpOGGa7OwKWa2gG1plFUw_1782892310 Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-47485fde05aso216051f8f.0 for ; Wed, 01 Jul 2026 00:51:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782892310; x=1783497110; h=date:content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vbAKq8w05yzFdHwYsYlz47M9QIQMpAKLZLCyT4TuWK0=; b=TcKxvMiyvJPpaK0SweOtlHAKeq36OKdCnWGbxy1b7eG9YykiXzasDxIawZhIOw1voq mfas1pxgscYDMDqIjfxeOX47Vs9qnaLeYn/0nN6dpGJBqO+t1RqCcPJ7R8lrdtUqESwK mrQb5NczqoxFbokAQBZsvfYOrhfjQ4h1NkRT6XuXp3eFuOlE3Ku2Jj03iDUdXyvwtEei lLfaxcPgwE7dklfpnoPWBFOtIqQYTaObI7dXFbEKeYXd0Roz5NoN1kNfO4JizXrPJRr+ 11uhClH5TwXfhKpAUZN+wtO8ZvHKsp5Wqy9SwCBfLwABa1fjcmMY83DKvfbc87aQFoss NoOA== X-Gm-Message-State: AOJu0YyfxGD1oF+6p4N2Xht1ds6BI9/4eBp0oB+wWwtpxezw7cBHQpL3 MyLTmxVY0okrre0Ti0KXXFwyq7TDHXBAvKa3hAUIsglhJaXhEqxbo4+9v7y/mSQU8GNi4ZOuE6t J7jfJZeD1mH3vR76G1F+MAo1LPclCcexIm4+9rfda7zu160BvArHtNw== X-Gm-Gg: AfdE7ckplPUUUNZi1uspi3EfDbMMaKpoU7I29s+V7oB5rTXsIxKbkr9zoxgEOqLvRmt fOolb/DUalM8boItzs4piDvupB6jP/m27lZiBGnBhp4YJC0vRekVa2N8sqBOaxVLf0KegAbyt49 2JteKSwZJm4lSllirdXzTOLgchl8/uqY0u7zT/0Jr/ebnjF/P737vPO+GT8lur1AFRj3lCWfB0t opU0e8JK/NG9K75Wn6SXlWXAJEtZitnlKYZsJkeMXv6qHMW1As5snA6YeY1ymv3esGbP+TrWc8P 3R7HyH6gaw4SSVVHqmKF7D87n0go4MjzgpgmNtUw8BGwft95Y6KdteI75tn2VYED+otowjCf57Y kBDfEdaMViUaKkLnspf76dw== X-Received: by 2002:a05:6000:5c5:b0:474:530:9d with SMTP id ffacd0b85a97d-47757797bf8mr915995f8f.13.1782892309939; Wed, 01 Jul 2026 00:51:49 -0700 (PDT) X-Received: by 2002:a05:6000:5c5:b0:474:530:9d with SMTP id ffacd0b85a97d-47757797bf8mr915922f8f.13.1782892309308; Wed, 01 Jul 2026 00:51:49 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-475671d02f5sm15705263f8f.28.2026.07.01.00.51.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jul 2026 00:51:48 -0700 (PDT) From: Stefano Brivio To: Mateusz Andrzejewski Subject: Re: [PATCH] isolation: Add --use-chroot-fallback option Message-ID: <20260701095147.203f2f16@elisabeth> In-Reply-To: <20260624095953.758482-1-mateusz.andrzejewski@mikronika.com.pl> References: <20260624095953.758482-1-mateusz.andrzejewski@mikronika.com.pl> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 Date: Wed, 01 Jul 2026 09:51:48 +0200 (CEST) X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: O6ls_hdrDpjOgUDwCF6L6UixaN_G-GMuc6q4zp8zzuk_1782892310 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: EGNWVF6ED3QMY3TRUDHFUFR7BZF5O5WT X-Message-ID-Hash: EGNWVF6ED3QMY3TRUDHFUFR7BZF5O5WT X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, piotr.bzdrega@mikronika.com.pl, mateusz.andrzejewski@mikronika.com.pl X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Mateusz, thanks for the patch. A few comments below, mostly about coding style: On Wed, 24 Jun 2026 11:59:53 +0200 Mateusz Andrzejewski wrote: > For integrations, which use rootfs (tmpfs, initramfs), it is not > allowed to use pivot_root(). It results with invalid argument (EINVAL) > error. Introduce --use-chroot-fallback option as a workaround with > MS_MOVE + chroot(). > > Due to weaker isolation of chroot() method (we don't umount old root), > user has tu explicitly enable fallback with the new option. First, > always try to sandbox with pivot_root(). In both cases the new root is > placed into an empty tmpfs. > > For the solution to work we keep CAP_SYS_CHROOT capability, which is > dropped at the end of the isolate_prefork() function. > Here, you should add a tag: Link: https://bugs.passt.top/show_bug.cgi?id=104 (see CONTRIBUTING.md... which is slightly misleading, we should fix it at some point). > Signed-off-by: Mateusz Andrzejewski > --- > conf.c | 9 ++++++-- > isolation.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--------- > isolation.h | 2 +- > passt.h | 2 ++ > 4 files changed, 63 insertions(+), 13 deletions(-) > > diff --git a/conf.c b/conf.c > index 4755a9f..fe4d5e9 100644 > --- a/conf.c > +++ b/conf.c > @@ -654,6 +654,7 @@ static void usage(const char *name, FILE *f, int status) > " --no-ra Disable router advertisements\n" > " --freebind Bind to any address for forwarding\n" > " --no-map-gw Don't map gateway address to host\n" > + " --use-chroot-fallback Use chroot in case pivot_root fails\n" Many other options imply "--use...". If you skip "use", the option name is just as descriptive, and you can keep things properly aligned here in the usage message. > " -4, --ipv4-only Enable IPv4 operation only\n" > " -6, --ipv6-only Enable IPv6 operation only\n" > " -t, --tcp-ports SPEC TCP port forwarding to %s\n" > @@ -1158,7 +1159,7 @@ static void conf_sock_listen(const struct ctx *c) > */ > void conf(struct ctx *c, int argc, char **argv) > { > - int netns_only = 0, no_map_gw = 0; > + int netns_only = 0, no_map_gw = 0, use_chroot_fallback = 0; > const struct option options[] = { > {"debug", no_argument, NULL, 'd' }, > {"quiet", no_argument, NULL, 'q' }, > @@ -1192,6 +1193,7 @@ void conf(struct ctx *c, int argc, char **argv) > {"splice-only", no_argument, &c->splice_only, 1 }, > {"freebind", no_argument, &c->freebind, 1 }, > {"no-map-gw", no_argument, &no_map_gw, 1 }, > + {"use-chroot-fallback", no_argument, &use_chroot_fallback, 1 }, You could directly write to &c->chroot_fallback, I think, and: > {"ipv4-only", no_argument, NULL, '4' }, > {"ipv6-only", no_argument, NULL, '6' }, > {"one-off", no_argument, NULL, '1' }, > @@ -1777,6 +1779,8 @@ void conf(struct ctx *c, int argc, char **argv) > c->no_dns_search = 1; > } > > + c->use_chroot_fallback = use_chroot_fallback; ...skip this assignment, and... > + > if (!ifi4 && *c->ip4.ifname_out) > ifi4 = if_nametoindex(c->ip4.ifname_out); > > @@ -1879,7 +1883,8 @@ void conf(struct ctx *c, int argc, char **argv) > > conf_open_files(c); /* Before any possible setuid() / setgid() */ > > - isolate_user(uid, gid, !netns_only, userns, c->mode); > + isolate_user(uid, gid, !netns_only, userns, c->mode, > + c->use_chroot_fallback); pass 'c' to isolate_user(), instead of c->mode and c->chroot_fallback, so that it's a bit more consistent with isolate_prefork() and you save some lines of code as well. > > if (c->no_icmp) > c->no_ndp = 1; > diff --git a/isolation.c b/isolation.c > index 7e6225d..4e6fd32 100644 > --- a/isolation.c > +++ b/isolation.c > @@ -166,6 +166,31 @@ static void clamp_caps(void) > die_perror("Couldn't drop inheritable capabilities"); > } > > +/** > + * move_root() - Use chroot instead of pivot_root for sandboxing. Those are functions or system calls, so chroot() and pivot_root(). You can drop the '.' from the end, here and in the line below, for consistency (those are not full sentences anyway). > + * > + * Return: negative error code on failure, zero on success. > + */ > +static int move_root(void) > +{ > + if (mount(TMPDIR, "/", "", MS_MOVE, "")) { > + err_perror("Failed to move root into empty tmpfs"); > + return -errno; > + } > + > + if (chroot(".")) { > + err_perror("Failed to chroot into empty tmpfs"); > + return -errno; > + } > + > + if (chdir("/")) { > + err_perror("Failed to change directory into new root"); > + return -errno; > + } > + > + return 0; > +} > + > /** > * isolate_initial() - Early, mostly config independent self isolation > * @argc: Argument count > @@ -201,8 +226,8 @@ void isolate_initial(int argc, char **argv) > * isolate_prefork(). > */ > keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) | > - BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE); > - > + BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE) | > + BIT(CAP_SYS_CHROOT); We try to logically align things (just like it's done in the Linux kernel coding style), so this would be: keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) | BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE) | BIT(CAP_SYS_CHROOT); because BIT(CAP_SYS_CHROOT) has the same logical role as all the other bits, so it shouldn't be indented further. By the way, while at it, it would be nice to state in the comment above the reason why we need to keep it (it's described there for all the other capabilities). > /* Since Linux 5.12, if we want to update /proc/self/uid_map to create > * a mapping from UID 0, which only happens with pasta spawning a child > * from a non-init user namespace (pasta can't run as root), we need to > @@ -225,6 +250,7 @@ void isolate_initial(int argc, char **argv) > * @use_userns: Whether to join or create a userns > * @userns: userns path to enter, may be empty > * @mode: Mode (passt or pasta) > + * @use_chroot_fallback: Whether to use chroot() as fallback for sandboxing. Same here, this could be @chroot_fallback and the '.' at the end is not needed. > * > * Should: > * - set our final UID and GID > @@ -233,7 +259,7 @@ void isolate_initial(int argc, char **argv) > * - remove filesystem access (we need that for further setup) > */ > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > - enum passt_modes mode) > + enum passt_modes mode, bool use_chroot_fallback) > { > uint64_t ns_caps = 0; > > @@ -277,6 +303,13 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > * netns > */ > ns_caps |= BIT(CAP_SYS_ADMIN); > + > + /* Keep CAP_SYS_CHROOT, so we can fallback to chroot > + * in case of use-chroot-fallback option being enabled > + */ Same here, we would typically align comments like this: /* Keep ... * */ ...this is the "net" variant (networking development) of the Linux kernel coding style, by the way. I started like that out of habit, now we should keep it like this for consistency with the rest of the codebase. > + if (use_chroot_fallback) > + ns_caps |= BIT(CAP_SYS_CHROOT); > + > if (mode == MODE_PASTA) { > /* Keep CAP_NET_ADMIN, so we can configure the if */ > ns_caps |= BIT(CAP_NET_ADMIN); > @@ -310,6 +343,7 @@ int isolate_prefork(const struct ctx *c) > { > int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS; > uint64_t ns_caps = 0; > + int result = -1; You could restrict the scope. I'm somewhat surprised that Cppcheck doesn't report this here (it usually does, by the way it's 'make cppcheck'). Anyway: > > /* If we run in foreground, we have no chance to actually move to a new > * PID namespace. For passt, use CLONE_NEWPID anyway, in case somebody > @@ -331,7 +365,7 @@ int isolate_prefork(const struct ctx *c) > if (mount("", TMPDIR, "tmpfs", > MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RDONLY, > "nr_inodes=2,nr_blocks=0")) { > - err_perror("Failed to mount empty tmpfs for pivot_root()"); > + err_perror("Failed to mount empty tmpfs for sandboxing"); > return -errno; > } > > @@ -341,13 +375,22 @@ int isolate_prefork(const struct ctx *c) > } > > if (syscall(SYS_pivot_root, ".", ".")) { > - err_perror("Failed to pivot_root() into empty tmpfs"); > - return -errno; > + if (c->use_chroot_fallback) { You just use 'result' here, so you could do (we use 'rc' more commonly for 'return code'): int rc; [...] if ((rc = move_root()) return rc; > + info("Failed to pivot_root(), fallback to chroot()..."); > + result = move_root(); > + if (result) > + return result; > + } > + else { Coding style: } else { ... > + err_perror("Failed to pivot_root() into empty tmpfs"); > + return -errno; > + } > } > - > - if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) { > - err_perror("Failed to unmount original root filesystem"); > - return -errno; > + else { Same as above. > + if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) { > + err_perror("Failed to unmount original root filesystem"); > + return -errno; > + } > } > > /* Now that initialization is more-or-less complete, we can > diff --git a/isolation.h b/isolation.h > index 0576168..ea8dacb 100644 > --- a/isolation.h > +++ b/isolation.h > @@ -12,7 +12,7 @@ > > void isolate_initial(int argc, char **argv); > void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, > - enum passt_modes mode); > + enum passt_modes mode, bool use_chroot_fallback); > int isolate_prefork(const struct ctx *c); > void isolate_postfork(const struct ctx *c); > > diff --git a/passt.h b/passt.h > index 16506dc..2085cca 100644 > --- a/passt.h > +++ b/passt.h > @@ -214,6 +214,7 @@ struct ip6_ctx { > * @splice_only: Only enable loopback forwarding > * @host_lo_to_ns_lo: Map host loopback addresses to ns loopback addresses > * @freebind: Allow binding of non-local addresses for forwarding > + * @use_chroot_fallback: Use chroot in case pivot_root fails Same as above. > * @low_wmem: Low probed net.core.wmem_max > * @low_rmem: Low probed net.core.rmem_max > * @no_bindtodevice: Unprivileged SO_BINDTODEVICE not available > @@ -299,6 +300,7 @@ struct ctx { > int splice_only; > int host_lo_to_ns_lo; > int freebind; > + bool use_chroot_fallback; > > int low_wmem; > int low_rmem; ...I guess you forgot to update the man page, see passt.1: we document options and operation there. If you don't want to dig into the beautiful world of man page syntax (nroff/troff), you can just take no-dhcp-dns as an example or something like that. Copy and adapt. The message from the usage is good enough, I think. Perhaps add a note about the fact that this is useful for tmpfs and initramfs. -- Stefano