From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Subject: Re: [PATCH] RFC: Remove unusable --netns-only option
Date: Wed, 20 Jul 2022 10:00:40 +0200
Message-ID: <20220720100040.120eec70@elisabeth>
In-Reply-To: <YtdsRs1dunkTNp4L@yekko>

On Wed, 20 Jul 2022 12:45:26 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:
> > On Tue, 19 Jul 2022 16:23:10 +1000
> > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> >   
> > > The intended semantics of --netns-only are pretty unclear to me.  It's
> > > > intended for pasta, but it's not clear whether it's saying the spawned shell
> > > should only enter the target netns, or that the passt/pasta packet
> > > forwarding process should only sandbox itself in a network namespace, not
> > > a user namespace.  
> > 
> > > The latter. I think this is marginally clearer in the man page, but it
> > > does indeed need a better explanation.
> 
> > Definitely.  At present it also appears to affect the spawned shell,
> > in a rather counter-intuitive way.

Right, in that case we should restrict the conditions under which we can
spawn a shell to having UID 0 in a non-init user namespace. See the
working example below.

> > > In any case, as far as I can tell there's not actually any case in which
> > > the --netns-only option will work.  If nothing else, we will always fail
> > > in sandbox(), because it attempts a number of operations which require
> > > CAP_SYS_ADMIN in our current user namespace.  We drop all capabilities in
> > > our initial user namespace when we start, so the only way we can have
> > > CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which
> > > we won't do with --netns-only.
> > > 
> > > > For pasta joining an existing namespace (the apparently intended use
> > > > case), we'll actually fail before we get to that point: in
> > > > conf_ns_check() we'll attempt
> > > to join the target network namespace.  This also requires CAP_SYS_ADMIN in
> > > both our current user namespace and the user namespace which owns the
> > > target network namespace.  Again, since we've dropped capabilities in our
> > > original namespace this will never be the case.  
> > 
> > ...however, we can also have UID 0 in a non-init user namespace, and
> > that will work.  
> 
> Hrm..  I thought being UID 0 just meant we started with all the
> capabilities, so once we've explicitly dropped them we still won't be
> able to do this.  That seemed to be what happened when I tried running
> it as root.

If you run it as root, it will drop to nobody (or to the user passed via
--runas), and it drops capabilities anyway, so it won't be able to do
that.

If you run it as UID 0 in a non-init user namespace, though, it won't
change the UID, and even after dropping capabilities, it will be able to
join a network namespace.
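
As a rough illustration of the distinction between the two cases (this is
not necessarily how passt decides, and the helper name userns_kind and
its optional file argument are made up for this sketch): in the initial
user namespace, /proc/self/uid_map consists of the single identity line
"0 0 4294967295", so a process can guess which case it is in by reading
it. This is only a heuristic, since a privileged process could give a
child user namespace the same full mapping.

```shell
# Hypothetical helper: classify the user namespace described by a uid_map
# file (defaults to our own). Prints "init" if the map is the single
# identity line seen in the initial user namespace, "non-init" otherwise.
userns_kind() {
	map="${1:-/proc/self/uid_map}"
	lines=0
	kind=non-init
	# Fields are padded with spaces; read splits on whitespace for us
	while read -r inside outside count; do
		lines=$((lines + 1))
		if [ "$inside $outside $count" = "0 0 4294967295" ]; then
			kind=init
		fi
	done < "$map"
	# More than one mapping line can't be the plain identity map
	if [ "$lines" -ne 1 ]; then
		kind=non-init
	fi
	echo "$kind"
}
```

Run without arguments, it inspects the calling shell: after "unshare -Ur"
it should print "non-init", as the map there has a single entry for UID 0.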

> > This is what happens in the Podman integration case. Unfortunately the
> > demo is broken at the moment (I had to rebase the patch with a bit of
> > care, I'll publish the updated one soon).  
> 
> Can you explain a bit more about what the podman use case is, and why
> it requires the netns only logic?

Podman creates a network namespace (with a filesystem handle) and starts
slirp4netns (or pasta, in the integration draft) as UID 0 in a new user
namespace, pointing it to that network namespace:

# ps aux|grep pasta
sbrivio  2283703  0.0  0.0 2070672 56468 pts/10  Sl+  Jul19   0:40 ./bin/podman run --net=pasta:-T,5213-5214,-U,5213-5214 -p 5203-5204:5203-5204/tcp -p 5203-5204:5203-5204/udp --rm -ti alpine sh
sbrivio  2283760  0.1  0.0  85300 51120 ?        Ss   Jul19   0:57 /usr/bin/pasta --config-net -u 5203:5203 -t 5203:5203 -T 5213-5214 -U 5213-5214 /run/user/1000/netns/netns-3b6147d8-34e1-a516-87c3-631938a1973e

# readlink /proc/2283703/ns/net
net:[4026531992]
# readlink /proc/2283760/ns/net 
net:[4026531992]
# readlink /proc/2283703/ns/user 
user:[4026533032]
# readlink /proc/2283760/ns/user 
user:[4026533032]
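
The readlink comparison above can also be scripted. A minimal sketch,
with a made-up helper name same_ns, which takes two namespace links (for
instance /proc/PID/ns/user for two different PIDs) and succeeds if both
name the same namespace:

```shell
# Hypothetical helper: succeed (status 0) if two namespace links, such as
# /proc/2283703/ns/user and /proc/2283760/ns/user, refer to the same
# namespace. Returns 2 if either link can't be read.
same_ns() {
	a=$(readlink "$1") || return 2
	b=$(readlink "$2") || return 2
	# Links look like "user:[4026533032]": equal text, same namespace
	[ -n "$a" ] && [ "$a" = "$b" ]
}
```

For the two processes listed above, this would be called as
"same_ns /proc/2283703/ns/user /proc/2283760/ns/user".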

The Podman setup above is equivalent to this example (for convenience,
with PIDs instead of filesystem handles):

---
[TTY #0]

$ unshare -Ur
# echo $$
4117948

[TTY #1]

$ nsenter --preserve-credentials -U -t 4117948
# unshare -n
# ip li sh
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# echo $$
4126920

[TTY #0]

# ./pasta -f --netns-only 4126920
Outbound interface: enp9s0, namespace interface: enp9s0
ARP:
    address: a8:a1:59:8e:d7:b6
DHCP:
    assign: 88.198.0.164
    mask: 255.255.255.224
    router: 88.198.0.161
DNS:
    185.12.64.1
    185.12.64.2
NDP/DHCPv6:
    assign: 2a01:4f8:222:904::2
    router: fe80::1
    our link-local: fe80::aaa1:59ff:fe8e:d7b6
DNS:
    2a01:4ff:ff00::add:2
    2a01:4ff:ff00::add:1

[TTY #1]

# ip li sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:c0:09:fe:89:c3 brd ff:ff:ff:ff:ff:ff
---

Unrelated to the Podman case: you can also do this and let pasta spawn
an interactive shell with its network namespace (also created by
pasta) detached:

---
$ unshare -Ur
# ./pasta --netns-only
Cannot set ping_group_range, ICMP requests might fail
$ ip li sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:8e:d7:b6 brd ff:ff:ff:ff:ff:ff
---

...if you then log out from this shell, it will hang:

openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

but that's a separate issue (which I just discovered).

-- 
Stefano

