public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Mateusz Andrzejewski <mandrzejewski06@gmail.com>
Cc: passt-dev@passt.top, piotr.bzdrega@mikronika.com.pl,
	mateusz.andrzejewski@mikronika.com.pl
Subject: Re: [PATCH] isolation: Add --use-chroot-fallback option
Date: Wed, 01 Jul 2026 09:51:48 +0200 (CEST)	[thread overview]
Message-ID: <20260701095147.203f2f16@elisabeth> (raw)
In-Reply-To: <20260624095953.758482-1-mateusz.andrzejewski@mikronika.com.pl>

Mateusz, thanks for the patch. A few comments below, mostly about coding
style:

On Wed, 24 Jun 2026 11:59:53 +0200
Mateusz Andrzejewski <mandrzejewski06@gmail.com> wrote:

> For integrations, which use rootfs (tmpfs, initramfs), it is not
> allowed to use pivot_root(). It results with invalid argument (EINVAL)
> error. Introduce --use-chroot-fallback option as a workaround with
> MS_MOVE + chroot().
> 
> Due to weaker isolation of chroot() method (we don't umount old root),
> user has tu explicitly enable fallback with the new option. First,
> always try to sandbox with pivot_root(). In both cases the new root is
> placed into an empty tmpfs.
> 
> For the solution to work we keep CAP_SYS_CHROOT capability, which is
> dropped at the end of the isolate_prefork() function.
> 

Here, you should add a tag:

Link: https://bugs.passt.top/show_bug.cgi?id=104

(see CONTRIBUTING.md... which is slightly misleading, we should fix it
at some point).

> Signed-off-by: Mateusz Andrzejewski <mateusz.andrzejewski@mikronika.com.pl>
> ---
>  conf.c      |  9 ++++++--
>  isolation.c | 63 ++++++++++++++++++++++++++++++++++++++++++++---------
>  isolation.h |  2 +-
>  passt.h     |  2 ++
>  4 files changed, 63 insertions(+), 13 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 4755a9f..fe4d5e9 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -654,6 +654,7 @@ static void usage(const char *name, FILE *f, int status)
>  		"  --no-ra		Disable router advertisements\n"
>  		"  --freebind		Bind to any address for forwarding\n"
>  		"  --no-map-gw		Don't map gateway address to host\n"
> +		"  --use-chroot-fallback		Use chroot in case pivot_root fails\n"

Many other options imply "--use...". If you skip "use", the option name
is just as descriptive, and you can keep things properly aligned here in
the usage message.

>  		"  -4, --ipv4-only	Enable IPv4 operation only\n"
>  		"  -6, --ipv6-only	Enable IPv6 operation only\n"
>  		"  -t, --tcp-ports SPEC	TCP port forwarding to %s\n"
> @@ -1158,7 +1159,7 @@ static void conf_sock_listen(const struct ctx *c)
>   */
>  void conf(struct ctx *c, int argc, char **argv)
>  {
> -	int netns_only = 0, no_map_gw = 0;
> +	int netns_only = 0, no_map_gw = 0, use_chroot_fallback = 0;
>  	const struct option options[] = {
>  		{"debug",	no_argument,		NULL,		'd' },
>  		{"quiet",	no_argument,		NULL,		'q' },
> @@ -1192,6 +1193,7 @@ void conf(struct ctx *c, int argc, char **argv)
>  		{"splice-only",	no_argument,		&c->splice_only, 1 },
>  		{"freebind",	no_argument,		&c->freebind,	1 },
>  		{"no-map-gw",	no_argument,		&no_map_gw,	1 },
> +		{"use-chroot-fallback",	no_argument,	&use_chroot_fallback,	1 },

You could directly write to &c->chroot_fallback, I think, and:

>  		{"ipv4-only",	no_argument,		NULL,		'4' },
>  		{"ipv6-only",	no_argument,		NULL,		'6' },
>  		{"one-off",	no_argument,		NULL,		'1' },
> @@ -1777,6 +1779,8 @@ void conf(struct ctx *c, int argc, char **argv)
>  		c->no_dns_search = 1;
>  	}
>  
> +	c->use_chroot_fallback = use_chroot_fallback;

...skip this assignment, and...

> +
>  	if (!ifi4 && *c->ip4.ifname_out)
>  		ifi4 = if_nametoindex(c->ip4.ifname_out);
>  
> @@ -1879,7 +1883,8 @@ void conf(struct ctx *c, int argc, char **argv)
>  
>  	conf_open_files(c);	/* Before any possible setuid() / setgid() */
>  
> -	isolate_user(uid, gid, !netns_only, userns, c->mode);
> +	isolate_user(uid, gid, !netns_only, userns, c->mode,
> +		c->use_chroot_fallback);

pass 'c' to isolate_user(), instead of c->mode and c->chroot_fallback,
so that it's a bit more consistent with isolate_prefork() and you save
some lines of code as well.

>  
>  	if (c->no_icmp)
>  		c->no_ndp = 1;
> diff --git a/isolation.c b/isolation.c
> index 7e6225d..4e6fd32 100644
> --- a/isolation.c
> +++ b/isolation.c
> @@ -166,6 +166,31 @@ static void clamp_caps(void)
>  		die_perror("Couldn't drop inheritable capabilities");
>  }
>  
> +/**
> + * move_root() - Use chroot instead of pivot_root for sandboxing.

Those are functions or system calls, so chroot() and pivot_root(). You
can drop the '.' from the end, here and in the line below, for
consistency (those are not full sentences anyway).

> + *
> + * Return: negative error code on failure, zero on success.
> + */
> +static int move_root(void)
> +{
> +	if (mount(TMPDIR, "/", "", MS_MOVE, "")) {
> +		err_perror("Failed to move root into empty tmpfs");
> +		return -errno;
> +	}
> +
> +	if (chroot(".")) {
> +		err_perror("Failed to chroot into empty tmpfs");
> +		return -errno;
> +	}
> +
> +	if (chdir("/")) {
> +		err_perror("Failed to change directory into new root");
> +		return -errno;
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * isolate_initial() - Early, mostly config independent self isolation
>   * @argc:	Argument count
> @@ -201,8 +226,8 @@ void isolate_initial(int argc, char **argv)
>  	 * isolate_prefork().
>  	 */
>  	keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) |
> -	       BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE);
> -
> +	       BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE) |
> +		   BIT(CAP_SYS_CHROOT);

We try to logically align things (just like it's done in the Linux
kernel coding style), so this would be:

	keep = BIT(CAP_NET_BIND_SERVICE) | BIT(CAP_SETUID) | BIT(CAP_SETGID) |
	       BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN) | BIT(CAP_DAC_OVERRIDE) |
	       BIT(CAP_SYS_CHROOT);

because BIT(CAP_SYS_CHROOT) has the same logical role as all the other
bits, so it shouldn't be indented further.

By the way, while at it, it would be nice to state in the comment above
the reason why we need to keep it (it's described there for all the
other capabilities).

>  	/* Since Linux 5.12, if we want to update /proc/self/uid_map to create
>  	 * a mapping from UID 0, which only happens with pasta spawning a child
>  	 * from a non-init user namespace (pasta can't run as root), we need to
> @@ -225,6 +250,7 @@ void isolate_initial(int argc, char **argv)
>   * @use_userns:	Whether to join or create a userns
>   * @userns:	userns path to enter, may be empty
>   * @mode:	Mode (passt or pasta)
> + * @use_chroot_fallback: Whether to use chroot() as fallback for sandboxing.

Same here, this could be @chroot_fallback and the '.' at the end is not
needed.

>   *
>   * Should:
>   *  - set our final UID and GID
> @@ -233,7 +259,7 @@ void isolate_initial(int argc, char **argv)
>   *  - remove filesystem access (we need that for further setup)
>   */
>  void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns,
> -		  enum passt_modes mode)
> +		  enum passt_modes mode, bool use_chroot_fallback)
>  {
>  	uint64_t ns_caps = 0;
>  
> @@ -277,6 +303,13 @@ void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns,
>  	 * netns
>  	 */
>  	ns_caps |= BIT(CAP_SYS_ADMIN);
> +
> +	/* Keep CAP_SYS_CHROOT, so we can fallback to chroot
> +	* in case of use-chroot-fallback option being enabled
> +	*/

Same here, we would typically align comments like this:

	/* Keep ...
	 *
	 */

...this is the "net" variant (networking development) of the Linux
kernel coding style, by the way. I started like that out of habit, now
we should keep it like this for consistency with the rest of the
codebase.

> +	if (use_chroot_fallback)
> +		ns_caps |= BIT(CAP_SYS_CHROOT);
> +
>  	if (mode == MODE_PASTA) {
>  		/* Keep CAP_NET_ADMIN, so we can configure the if */
>  		ns_caps |= BIT(CAP_NET_ADMIN);
> @@ -310,6 +343,7 @@ int isolate_prefork(const struct ctx *c)
>  {
>  	int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS;
>  	uint64_t ns_caps = 0;
> +	int result = -1;

You could restrict the scope. I'm somewhat surprised that Cppcheck
doesn't report this here (it usually does, by the way it's 'make
cppcheck'). Anyway:

>  
>  	/* If we run in foreground, we have no chance to actually move to a new
>  	 * PID namespace. For passt, use CLONE_NEWPID anyway, in case somebody
> @@ -331,7 +365,7 @@ int isolate_prefork(const struct ctx *c)
>  	if (mount("", TMPDIR, "tmpfs",
>  		  MS_NODEV | MS_NOEXEC | MS_NOSUID | MS_RDONLY,
>  		  "nr_inodes=2,nr_blocks=0")) {
> -		err_perror("Failed to mount empty tmpfs for pivot_root()");
> +		err_perror("Failed to mount empty tmpfs for sandboxing");
>  		return -errno;
>  	}
>  
> @@ -341,13 +375,22 @@ int isolate_prefork(const struct ctx *c)
>  	}
>  
>  	if (syscall(SYS_pivot_root, ".", ".")) {
> -		err_perror("Failed to pivot_root() into empty tmpfs");
> -		return -errno;
> +		if (c->use_chroot_fallback) {

You just use 'result' here, so you could do (we use 'rc' more commonly
for 'return code'):

			int rc;

			[...]
			if ((rc = move_root())
				return rc;

> +			info("Failed to pivot_root(), fallback to chroot()...");
> +			result = move_root();
> +			if (result)
> +				return result;
> +		}
> +		else {

Coding style:

	} else {

...

> +			err_perror("Failed to pivot_root() into empty tmpfs");
> +			return -errno;
> +		}
>  	}
> -
> -	if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) {
> -		err_perror("Failed to unmount original root filesystem");
> -		return -errno;
> +	else {

Same as above.

> +		if (umount2(".", MNT_DETACH | UMOUNT_NOFOLLOW)) {
> +			err_perror("Failed to unmount original root filesystem");
> +			return -errno;
> +		}
>  	}
>  
>  	/* Now that initialization is more-or-less complete, we can
> diff --git a/isolation.h b/isolation.h
> index 0576168..ea8dacb 100644
> --- a/isolation.h
> +++ b/isolation.h
> @@ -12,7 +12,7 @@
>  
>  void isolate_initial(int argc, char **argv);
>  void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns,
> -		  enum passt_modes mode);
> +		  enum passt_modes mode, bool use_chroot_fallback);
>  int isolate_prefork(const struct ctx *c);
>  void isolate_postfork(const struct ctx *c);
>  
> diff --git a/passt.h b/passt.h
> index 16506dc..2085cca 100644
> --- a/passt.h
> +++ b/passt.h
> @@ -214,6 +214,7 @@ struct ip6_ctx {
>   * @splice_only:	Only enable loopback forwarding
>   * @host_lo_to_ns_lo:	Map host loopback addresses to ns loopback addresses
>   * @freebind:		Allow binding of non-local addresses for forwarding
> + * @use_chroot_fallback:		Use chroot in case pivot_root fails

Same as above.

>   * @low_wmem:		Low probed net.core.wmem_max
>   * @low_rmem:		Low probed net.core.rmem_max
>   * @no_bindtodevice:	Unprivileged SO_BINDTODEVICE not available
> @@ -299,6 +300,7 @@ struct ctx {
>  	int splice_only;
>  	int host_lo_to_ns_lo;
>  	int freebind;
> +	bool use_chroot_fallback;
>  
>  	int low_wmem;
>  	int low_rmem;

...I guess you forgot to update the man page, see passt.1: we document
options and operation there.

If you don't want to dig into the beautiful world of man page syntax
(nroff/troff), you can just take no-dhcp-dns as an example or something
like that. Copy and adapt. The message from the usage is good enough, I
think. Perhaps add a note about the fact that this is useful for tmpfs
and initramfs.

-- 
Stefano


      reply	other threads:[~2026-07-01  7:51 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24  9:59 Mateusz Andrzejewski
2026-07-01  7:51 ` Stefano Brivio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260701095147.203f2f16@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=mandrzejewski06@gmail.com \
    --cc=mateusz.andrzejewski@mikronika.com.pl \
    --cc=passt-dev@passt.top \
    --cc=piotr.bzdrega@mikronika.com.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).