On Wed, 20 Jul 2022 12:45:26 +1000
David Gibson wrote:

> On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:
> > On Tue, 19 Jul 2022 16:23:10 +1000
> > David Gibson wrote:
> >
> > > The intended semantics of --netns-only are pretty unclear to me.  It's
> > > intended for pasta, but it's not clear whether it's saying the spawned
> > > shell should only enter the target netns, or that the passt/pasta packet
> > > forwarding process should only sandbox itself in a network namespace, not
> > > a user namespace.
> >
> > The latter. I think this is marginally more clear in the man page, but needs
> > indeed a better explanation.
>
> Definitely.  At present it also appears to affect the spawned shell as
> well, in a rather counter-intuitive way.

Right, in that case we should restrict conditions where we can spawn a
shell to having UID 0 in a non-init namespace. See working example
below.

> > > In any case, as far as I can tell there's not actually any case in which
> > > the --netns-only option will work.  If nothing else, we will always fail
> > > in sandbox(), because it attempts a number of operations which require
> > > CAP_SYS_ADMIN in our current user namespace.  We drop all capabilities in
> > > our initial user namespace when we start, so the only way we can have
> > > CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which
> > > we won't do with --netns-only.
> > >
> > > For pasta joining an existing namespace (the apparently intended use
> > > case), we'll actually fail before we get to that point: in
> > > conf_ns_check() we'll attempt to join the target network namespace.
> > > This also requires CAP_SYS_ADMIN in both our current user namespace and
> > > the user namespace which owns the target network namespace.  Again,
> > > since we've dropped capabilities in our original namespace this will
> > > never be the case.
> >
> > ...however, we can also have UID 0 in a non-init user namespace, and
> > that will work.
>
> Hrm.. I thought being UID 0 just meant we started with all the
> capabilities, so once we've explicitly dropped them we still won't be
> able to do this.  That seemed to be what happened when I tried running
> it as root.

If you run it as root, it will drop to nobody (or to the user passed via
--runas), and it drops capabilities anyway, so it won't be able to do
that.

If you run it as UID 0 in a non-init namespace, it won't change the
UID, though, and even after dropping capabilities, it will be able to
join a network namespace.

> > This is what happens in the Podman integration case. Unfortunately the
> > demo is broken at the moment (I had to rebase the patch with a bit of
> > care, I'll publish the updated one soon).
>
> Can you explain a bit more about what the podman use case is, and why
> it requires the netns only logic?

Podman creates a network namespace (with a filesystem handle), starts
slirp4netns (or pasta, in the integration draft) as UID 0 in a new user
namespace, pointing it to the network namespace:

# ps aux|grep pasta
sbrivio 2283703 0.0 0.0 2070672 56468 pts/10 Sl+ Jul19 0:40 ./bin/podman run --net=pasta:-T,5213-5214,-U,5213-5214 -p 5203-5204:5203-5204/tcp -p 5203-5204:5203-5204/udp --rm -ti alpine sh
sbrivio 2283760 0.1 0.0 85300 51120 ? Ss Jul19 0:57 /usr/bin/pasta --config-net -u 5203:5203 -t 5203:5203 -T 5213-5214 -U 5213-5214 /run/user/1000/netns/netns-3b6147d8-34e1-a516-87c3-631938a1973e
# readlink /proc/2283703/ns/net
net:[4026531992]
# readlink /proc/2283760/ns/net
net:[4026531992]
# readlink /proc/2283703/ns/user
user:[4026533032]
# readlink /proc/2283760/ns/user
user:[4026533032]

It's equivalent to this example (for convenience, with PIDs instead of
filesystem handles):

---
[TTY #0]
$ unshare -Ur
# echo $$
4117948

[TTY #1]
$ nsenter --preserve-credentials -U -t 4117948
# unshare -n
# ip li sh
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# echo $$
4126920

[TTY #0]
# ./pasta -f --netns-only 4126920
Outbound interface: enp9s0, namespace interface: enp9s0
ARP:
    address: a8:a1:59:8e:d7:b6
DHCP:
    assign: 88.198.0.164
    mask: 255.255.255.224
    router: 88.198.0.161
DNS:
    185.12.64.1
    185.12.64.2
NDP/DHCPv6:
    assign: 2a01:4f8:222:904::2
    router: fe80::1
    our link-local: fe80::aaa1:59ff:fe8e:d7b6
DNS:
    2a01:4ff:ff00::add:2
    2a01:4ff:ff00::add:1

[TTY #1]
# ip li sh
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:c0:09:fe:89:c3 brd ff:ff:ff:ff:ff:ff
---

Unrelated to the Podman case: you can also do this and let pasta spawn
an interactive shell with its network namespace (also created by pasta)
detached:

---
$ unshare -Ur
# ./pasta --netns-only
Cannot set ping_group_range, ICMP requests might fail
$ ip li sh
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:8e:d7:b6 brd ff:ff:ff:ff:ff:ff
---

...if you then log out from this shell, it will hang:

openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

but that's a separate issue (which I just discovered).

-- 
Stefano
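
The namespace comparison in the Podman example (the readlink calls on
/proc/PID/ns/net and /proc/PID/ns/user) could be scripted roughly as
follows. This is only a sketch, not part of pasta or Podman: the helper
names ns_inode and same_ns are made up for illustration, and it assumes
a Linux /proc where the caller is allowed to read the namespace links
of both processes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of pasta or Podman.

# ns_inode LINK: print the inode number from a namespace link target
# such as "net:[4026531992]", as printed by readlink on /proc/PID/ns/*.
ns_inode() {
    inode=${1#*:\[}     # strip everything up to and including ":["
    inode=${inode%]}    # strip the trailing "]"
    printf '%s\n' "$inode"
}

# same_ns PID_A PID_B TYPE: return 0 if both processes are in the same
# namespace of the given type (net, user, ...), 1 if they are not, and
# 2 if either link cannot be read.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$(ns_inode "$a")" = "$(ns_inode "$b")" ]
}
```

With the PIDs from the ps output above, this would be called as
"same_ns 2283703 2283760 user" and "same_ns 2283703 2283760 net",
checking the exit status of each call.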