public inbox for passt-dev@passt.top
* Alas for CAP_NET_BIND_SERVICE
@ 2022-10-12  2:55 David Gibson
  2022-10-12  5:54 ` Stefano Brivio
From: David Gibson @ 2022-10-12  2:55 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev


Hi Stefano,

I've looked deeper into why giving passt/pasta CAP_NET_BIND_SERVICE
isn't working, and I'm afraid I have bad news.

We lose CAP_NET_BIND_SERVICE in the initial namespace as soon as we
unshare() or setns() into the isolated namespace, and this appears to
be intended behaviour.  From user_namespaces(7), in the Capabilities section:

    The child process created by clone(2) with the CLONE_NEWUSER flag
    starts out with a complete set of capabilities in the new user
    namespace.  Likewise, a process that creates a new user namespace
    using unshare(2) or joins an existing user namespace using
    setns(2) gains a full set of capabilities in that namespace.  ***On
    the other hand, that process has no capabilities in the parent (in
    the case of clone(2)) or previous (in the case of unshare(2) and
    setns(2)) user namespace, even if the new namespace is created or
    joined by the root user (i.e., a process with user ID 0 in the
    root namespace).***

Emphasis (***) mine.  Basically, despite the way it's phrased in many
places, processes don't have an independent set of capabilities in
each userns; they only have a set of capabilities in their current
userns.  Any capabilities in other namespaces are implied in a pretty
much all-or-nothing way: if the process's UID (the real, init ns one)
owns the userns (or one of its ancestors), it gets all caps, otherwise
none.  The specific logic is in cap_capable() in the kernel.
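
The rule can be illustrated with a small userspace model.  This is a
simplified sketch loosely following the shape of the kernel's
cap_capable(), not the kernel code itself: struct userns, struct cred,
model_capable() and MODEL_CAP_NET_BIND_SERVICE are stand-ins invented
for this example, and UIDs are plain integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a user namespace (not the kernel structure). */
struct userns {
	const struct userns *parent;	/* NULL for the initial namespace */
	int level;			/* nesting depth, 0 for the initial ns */
	unsigned int owner;		/* UID that created the namespace */
};

/* Toy stand-in for process credentials. */
struct cred {
	const struct userns *user_ns;	/* the single userns the process is in */
	unsigned int euid;
	uint64_t cap_effective;		/* only meaningful relative to user_ns */
};

/* Same bit number as CAP_NET_BIND_SERVICE in <linux/capability.h>. */
#define MODEL_CAP_NET_BIND_SERVICE 10

/* Does a process with 'cred' hold 'cap' with respect to 'target'? */
static bool model_capable(const struct cred *cred,
			  const struct userns *target, int cap)
{
	const struct userns *ns = target;

	for (;;) {
		/* Own namespace (or an ancestor of target): use the bitmask. */
		if (ns == cred->user_ns)
			return (cred->cap_effective >> cap) & 1;

		/* Target is not below our namespace: no capabilities at all. */
		if (ns->level <= cred->user_ns->level)
			return false;

		/* Owner of a direct child namespace: all capabilities. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return true;

		ns = ns->parent;
	}
}
```

In this model, a process that has moved into a child namespace and
asks about the initial namespace hits the "not below our namespace"
branch straight away, whatever its capability bitmask says.  That is
the situation after unshare() or setns().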

So, using CAP_NET_BIND_SERVICE isn't compatible with isolating
ourselves in our own userns.  At the very least "auto" inbound
forwarding of low ports is pretty much off the cards.

For forwarding of specific low ports, we could delay our entry into
the new userns until we've set up the listening sockets, although it
does mean rolling back some of the simplification we gained from the
new-style userns handling.
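
A minimal sketch of that ordering, assuming a single TCP/IPv4
listener: the socket is bound while the process still holds its
capabilities in the original userns, and only then is the new userns
entered.  listen_tcp4() and listen_then_isolate() are hypothetical
helpers written for this example, not functions from the passt tree.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket, or -1 on error. */
static int listen_tcp4(const char *addr, uint16_t port)
{
	struct sockaddr_in sa;
	int fd, one = 1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

	/* bind() is where the low port privilege check happens. */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(fd, 128) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/*
 * Hypothetical helper: bind first, isolate second.  The returned socket
 * stays open across unshare(); *isolated reports whether entering the
 * new userns succeeded.
 */
static int listen_then_isolate(const char *addr, uint16_t port, int *isolated)
{
	int fd = listen_tcp4(addr, port);

	if (fd < 0)
		return -1;

	*isolated = (unshare(CLONE_NEWUSER) == 0);
	return fd;
}
```

Reversing the two steps in listen_then_isolate() is the case described
above: bind() on a low port would then be checked against a process
that no longer has any capabilities in the original userns.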

Or, we could abandon CAP_NET_BIND_SERVICE, and recommend the
net.ipv4.ip_unprivileged_port_start sysctl as the only way to handle
low ports in passt.  I do see a fair bit of logic in that approach:
passt has no meaningful way to limit what users do with the low ports
it allows them (indirectly) to bind to, so giving passt
CAP_NET_BIND_SERVICE is pretty much equivalent to giving it to any
process which can invoke passt.
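
For the sysctl route, a program can check up front whether a given
port needs privilege by reading the current threshold from procfs.
The sketch below assumes the usual procfs path for this sysctl;
read_unpriv_port_start() and port_needs_privilege() are hypothetical
helpers written for this example, not part of passt.

```c
#include <stdbool.h>
#include <stdio.h>

/* procfs view of net.ipv4.ip_unprivileged_port_start */
#define UNPRIV_PORT_START_PATH \
	"/proc/sys/net/ipv4/ip_unprivileged_port_start"

/*
 * Hypothetical helper: returns the first unprivileged port read from
 * 'path', or -1 if the file is missing or its content is out of range.
 */
static int read_unpriv_port_start(const char *path)
{
	FILE *f = fopen(path, "r");
	int start;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &start) != 1 || start < 0 || start > 65535)
		start = -1;

	fclose(f);
	return start;
}

/* Ports below the threshold need CAP_NET_BIND_SERVICE to bind. */
static bool port_needs_privilege(int port, int unpriv_start)
{
	return port < unpriv_start;
}
```

With the default threshold of 1024, port_needs_privilege(80, 1024) is
true; after something like "sysctl net.ipv4.ip_unprivileged_port_start=80"
the same port can be bound without any capability.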

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

