Re: [PATCH] conf, pasta: Add --no-tap option

public inbox for passt-dev@passt.top

From: Yumei Huang <yuhuang@redhat.com>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Paul Holzinger <pholzing@redhat.com>,
	passt-dev@passt.top, david@gibson.dropbear.id.au,
	Jon Maloy <jmaloy@redhat.com>
Subject: Re: [PATCH] conf, pasta: Add --no-tap option
Date: Mon, 12 Jan 2026 16:20:20 +0800
Message-ID: <CANsz47=jt+pK_qoYbprf6Krr9uO3GpjD_bBtCTVnm1ez7z8T5g@mail.gmail.com>
In-Reply-To: <20260110191202.027b7f95@elisabeth>

On Sun, Jan 11, 2026 at 2:12 AM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> [Cc'ing Jon for awareness around the part about netlink monitor and
> capabilities, four paragraphs down]
>
> On Wed, 7 Jan 2026 16:20:18 +0100
> Paul Holzinger <pholzing@redhat.com> wrote:
>
> > On 05/01/2026 22:10, Stefano Brivio wrote:
> > > On Mon, 5 Jan 2026 14:48:15 +0100
> > > Paul Holzinger <pholzing@redhat.com> wrote:
> > >
> > >> Sorry, I was out for a while so I didn't have time to clarify on the bug
> > >> before.
> > >>
> > >> On 29/12/2025 10:55, Yumei Huang wrote:
> > >>> This patch introduces a mode where we only forward loopback connections
> > >>> and traffic between two namespaces (via the loopback interface, 'lo'),
> > >>> without a tap device.
> > >>>
> > >>> With this, podman can support forwarding ::1 in custom networks when using
> > >>> rootlesskit for forwarding ports.
> > >> I guess I didn't really communicate my requirements well.
> > > I guess it's more likely that you actually did, but I mixed up the
> > > association between requirements and use cases, sorry for that.
> > >
> > > In any case, good that we need this anyway, just for another use case.
> > > :)
> > >
> > >> When we use
> > >> rootlessport (rootlesskit) today for custom networks we only do so as
> > >> rootless user and it forwards ::1 (by possibly mapping this to v4 inside
> > >> the container) fine.
> > > So, wait a moment, is my comment at:
> > >
> > >    https://github.com/containers/podman/issues/14491#issuecomment-2898191772
> > >
> > > actually wrong? I don't have time right now to test that but from user
> > > reports and some vague memory I thought ::1 forwarding wouldn't work
> > > with custom networks regardless of root or rootless, because
> > > rootlesskit didn't handle that anyway.
> >
> > Yes, rootlesskit handles IPv6 just fine, it is just that our
> > rootlessport code remaps that to v4 inside the container.
>
> Actually, at a glance, I don't think that this could be fixed entirely
> in the rootlessport implementation, as rootlesskit doesn't seem to look
> at the destination address of the original connection at all.
>
> > >> My main point for this feature was using as root (requires further
> > >> changes to allow pasta running as root).
> > > ...which should be entirely on Podman side and it's still on my plate,
> > > by the way:
> > >
> > >    https://github.com/containers/podman/issues/17840
> > >    https://pad.passt.top/p/Features_2025#L40
> >
> > I don't see how this can be fixed on the podman side, the network
> > namespace of a rootful container (not userns=auto) is owned by the root
> > user. If you configure something in there you must have real
> > CAP_NET_ADMIN from the host init userns. So pasta must not drop this
> > privilege before configuring the netns.
>
> Oops, right. My starting point was this change, which is actually
> trivial (at least as a test) and something I already tried out, but
> then I hit a number of issues in Podman I never really figured out.
>
> So yes, it takes one change in pasta, but the substantial part left for
> me to figure out is why Podman didn't just work with it. It's not
> necessarily complicated, I spent just a couple of hours on it, so maybe
> there's something simple I missed.
>
> > And even then with the future
> > netlink monitor work we would need to keep that privilege level to
> > modify the netns even during runtime?
>
> This just reminded me that, somewhat surprisingly, for netlink
> operations, the check on capabilities is not just performed on the
> process creating the socket when the socket is created, but also later
> *on the sender of the message*.
>
> This is inconsistent with other operations on other types of sockets
> where the whole context is checked and assigned at the time of the
> creation, and was introduced because of a specific behaviour of Zebra
> (the routing daemon) in 2014, see discussion around:
>
>   https://lore.kernel.org/all/87d2g7d9ag.fsf_-_@x220.int.ebiederm.org/#r
>
> and I stumbled upon it a while ago while preparing a seitan demo
> replaying nft messages for an unprivileged container:
>
>   https://seitan.rocks/seitan/tree/demo/nft.hjson#n38
>
> So, my blanket answer "we create that socket at the beginning" doesn't
> apply here.
>
> However, assuming that this RFC patch from Jon actually works (I haven't
> tested it):
>
>   https://archives.passt.top/passt-dev/20251215015441.887736-11-jmaloy@redhat.com/
>
> I would say we're fine with it. Well, there's still the possibility
> that it doesn't work if Podman originally detached the network
> namespace, I'm not sure.
>
> If it doesn't work, we'll need to retain more capabilities, or even
> keep a cloned process around for this kind of stuff. We could also fix
> that in the kernel, Zebra doesn't need that quirk anymore.
>
> > >> Because as root podman does
> > >> port forwarding via DNAT firewall rules (i.e. custom nftables rules we
> > >> add). The kernel however never added support for DNAT on ::1 meaning
> > >> clients trying to access that are not getting forwarded. The only way to
> > >> support this is using a user space helper. Right now this doesn't work
> > >> and we do not use rootlessport for this either so I was just thinking
> > >> ahead because we do have user requests for ::1 to work as root.
> > >>
> > >> For the current rootlessport use case we also must bind all ports as
> > >> given (i.e. including the 0.0.0.0 bind address), just forwarding
> > >> loopback to loopback is not what we want or do for security reasons, see
> > >> CVE-2021-20199. And logically it would not really work to have another
> > >> process bind 0.0.0.0 and this pasta helper bind lo on the same port at
> > >> the same time.
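
The port conflict described above can be reproduced with a minimal sketch. It assumes Linux, plain TCP listeners without SO_REUSEADDR, and simulates the two processes inside a single one; try_bind() is a hypothetical helper and says nothing about how pasta or rootlessport actually bind their sockets.

```python
import errno
import socket


def try_bind(addr, port):
    """Bind a TCP listener; return (socket, None) or (None, errno) on failure."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        s.listen(1)
        return s, None
    except OSError as e:
        s.close()
        return None, e.errno


# Stand-in for the process that binds the published port on all addresses.
# Port 0 asks the kernel for a free ephemeral port.
wildcard, wildcard_err = try_bind("0.0.0.0", 0)
assert wildcard_err is None
port = wildcard.getsockname()[1]

# Stand-in for a second helper trying to bind loopback only, same port.
loopback, loopback_err = try_bind("127.0.0.1", port)
conflict = loopback_err == errno.EADDRINUSE

wildcard.close()
if loopback is not None:
    loopback.close()
```

The wildcard address overlaps with 127.0.0.1, so the second bind is expected to be refused rather than coexist with the first.
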
> > >>
> > >> The way I am thinking is bind ports as normal, add the no-tap option and
> > >> add two options to give the v4 and v6 namespace (container) side connect
> > >> addresses so we never actually connect to lo. We should then also have a
> > >> dynamic way to update the connect addresses at runtime. This is required
> > >> for podman network connect/disconnect to work, as those change the
> > >> addresses inside the namespace, see
> > >> https://github.com/containers/podman/commit/e88d8dbeae2aebd2d816f16a21891764163afcd4.
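
To make the idea of a connect address that can be changed at runtime more concrete, here is a rough sketch of a userspace TCP forwarder whose target is looked up for each new connection. Everything in it is an assumption for illustration: the Forwarder class and set_target() are hypothetical names, it handles a single address family, and it is not how pasta implements or plans to implement forwarding.

```python
import socket
import threading


class Forwarder:
    """Accept on a bound port and relay each client to the current target."""

    def __init__(self, bind_addr, target):
        self._lock = threading.Lock()
        self._target = target  # (host, port) to connect to, hypothetical
        self.listener = socket.create_server(bind_addr)
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def set_target(self, target):
        """Change the connect address; only new connections are affected."""
        with self._lock:
            self._target = target

    def _accept_loop(self):
        while True:
            try:
                client, _ = self.listener.accept()
            except OSError:
                return  # listener was closed
            with self._lock:
                target = self._target
            try:
                upstream = socket.create_connection(target)
            except OSError:
                client.close()  # target unreachable, drop this client
                continue
            # One relay thread per direction.
            for src, dst in ((client, upstream), (upstream, client)):
                threading.Thread(
                    target=self._pump, args=(src, dst), daemon=True
                ).start()

    @staticmethod
    def _pump(src, dst):
        """Copy bytes from src to dst until EOF, then half-close dst."""
        try:
            while data := src.recv(4096):
                dst.sendall(data)
        except OSError:
            pass
        finally:
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass
```

A caller would create Forwarder(("0.0.0.0", 8080), ("10.88.0.5", 80)) and call set_target() with the new container address after a network connect or disconnect; both addresses here are made-up examples. Established connections keep their original target, and sockets are not cleaned up carefully, which a real implementation would need to handle.
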
> > >>
> > >> Overall none of this is a blocker for removing rootlessport. I think our
> > >> plan was and still is to use the dynamic port forwarding logic David is
> > >> working on to replace the rootless custom network port forwarding case
> > >> with that.
> > > Regardless of other requirements that are needed as well to support
> > > forwarding ::1 for root containers (or rootless with --userns=auto),
> > > this feature by itself makes sense as it is and we'll need it as it is,
> > > right?
> > >
> > > By the way we routinely get requests for this feature by pasta (and
> > > Podman) users, regardless of any specific Podman integration, so I
> > > think the feature is generic enough as to make sense regardless of your
> > > plan for root containers.
> >
> > I am not sure how I would use or integrate a loopback to loopback
> > forwarder in podman so I don't think we would need or can use that as is.
>
> Well, I'm not sure, I just remember that you had in mind some use cases
> that could be fixed with this (and even noted them down in the
> references from the ticket).
>
> Sorry Yumei, I should have checked more recently, as it looks like this
> doesn't currently have as much priority as I thought, at least from
> Podman's perspective. In any case it's definitely useful.

No worries at all :)
>
> By the way, if it's for the root case, we'll still need it the day we
> support operation when started as root. If it's to fix up IPv4 / IPv6
> loopback mapping in the rootless case, it would be usable right away.
>
> > I think the use case itself is still interesting and if there are end
> > users asking for it, sure, no objections from me. I guess it could be
> > interesting to expose a service without giving it access to the full
> > internet and without having to deal with complicated firewall rules,
> > i.e. with this we get a container that could only communicate by
> > replying to the forwarded ports.
>
> Right, yes, it might also be one way to implement "isolated" containers
> as described in https://bugs.passt.top/show_bug.cgi?id=139 (I still have
> to follow up on comments there, and that might take a while, but let me
> quickly mention that it has little/nothing to do with local mode).
>
> --
> Stefano
>


-- 
Thanks,

Yumei Huang
