public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>
Subject: Re: [PATCH v3] pasta: Don't try to watch namespaces in procfs with inotify, use timer instead
Date: Mon, 19 Feb 2024 13:34:34 +0100
Message-ID: <20240219133405.6a295633@elisabeth>
In-Reply-To: <ZdMQp9Y7pGgkpMvn@zatzit>

On Mon, 19 Feb 2024 19:26:15 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Feb 19, 2024 at 09:05:33AM +0100, Stefano Brivio wrote:
> > We watch network namespace entries to detect when we should quit
> > (unless --no-netns-quit is passed), and these might be stored in a tmpfs
> > typically mounted at /run/user/UID or /var/run/user/UID, or found in
> > procfs at /proc/PID/ns/.
> > 
> > Currently, we try to use inotify for any possible location of those
> > entries, but inotify, of course, doesn't work on pseudo-filesystems
> > (see inotify(7)).
> > 
> > The man page reflects this: the description of --no-netns-quit
> > implies that we won't quit anyway if the namespace is not "bound to
> > the filesystem".
> > 
> > Well, we won't quit, but, since commit 9e0dbc894813 ("More
> > deterministic detection of whether argument is a PID, PATH or NAME"),
> > we try. And, indeed, this is harmless, as the caveat from that
> > commit message states.
> > 
> > Now, it turns out that Buildah, a tool to create container images,
> > sharing its codebase with Podman, passes a procfs entry to pasta, and
> > expects pasta to exit once the network namespace is not needed
> > anymore, that is, once the original Buildah process terminates.
> > 
> > Get this to work by using the timer fallback mechanism if the
> > namespace name is passed as a path belonging to a pseudo-filesystem.
> > This is expected to be procfs, but I covered sysfs and devpts
> > pseudo-filesystems as well, because nothing actually prevents
> > creating this kind of directory structure and links there.
> > 
> > Note that fstatfs(), according to some versions of man pages, was
> > apparently "deprecated" by the LSB. My reasoning for using it is
> > essentially this:
> >   https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
> > 
> > ...that is, there was no such thing as an LSB deprecation, and
> > anyway there's no other way to get the filesystem type.
> > 
> > Also note that, while it might sound more obvious to detect the
> > filesystem type using fstatfs() on the file descriptor itself
> > (c->pasta_netns_fd), the reported filesystem type for it is nsfs, no
> > matter what path was given to pasta. If we use the parent directory,
> > we'll typically have either tmpfs or procfs reported.
> > 
> > If the target namespace is given as a PID, or as a PID-based procfs
> > entry, we don't risk races if this PID is recycled: our handle on
> > /proc/PID/ns will always refer to the original namespace associated
> > with that PID, and we don't re-open this entry from procfs to check
> > it.
> > 
> > Instead of directly monitoring the target namespace, we could have
> > tried to monitor a process with a given PID, using pidfd_open() to
> > get a handle on it, to decide when to terminate.
> > 
> > But it's not guaranteed that the parent process is actually the one
> > associated to the network namespace we operate on, and if we get a
> > PID file descriptor for a PID (parent or not) or path that was given
> > on our command line, this inherently causes a race condition as that
> > PID might have been recycled by the time we call pidfd_open().
> > 
> > Even assuming the process we want to watch is the parent process, and
> > a race-free usage of pidfd_open() would have been possible, I'm not
> > very enthusiastic about enabling yet another system call in the
> > seccomp profile just for this, while openat() is needed anyway.
> > 
> > Update the man page to reflect that, even if the target network
> > namespace is passed as a procfs path or a PID, we'll now quit when
> > the procfs entry is gone.
> > 
> > Reported-by: Paul Holzinger <pholzing@redhat.com>
> > Link: https://github.com/containers/podman/pull/21563#issuecomment-1948200214
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> > v3: Given that we now open c->netns_dir before checking the
> >     filesystem type, we could as well pass this file descriptor to
> >     fstatfs() to do the check, instead of statfs() on the path.
> > 
> >     Fix a couple of paragraphs in the commit message.
> > 
> > v2: Coverity Scan isn't happy if we "check" (kind of) c->netns_dir
> >     with statfs() before opening it in a non-atomic way. Just to make
> >     things clear, false positive or not: open it, check it, close it
> >     if it wasn't needed: we don't rely on the check.
> > 
> >  passt.1 |  8 ++++++--
> >  pasta.c | 24 +++++++++++++++++++-----
> >  2 files changed, 25 insertions(+), 7 deletions(-)
> > 
> > diff --git a/passt.1 b/passt.1
> > index dc2d719..de6e3bf 100644
> > --- a/passt.1
> > +++ b/passt.1
> > @@ -550,8 +550,12 @@ without \-\-userns.
> >  
> >  .TP
> >  .BR \-\-no-netns-quit
> > -If the target network namespace is bound to the filesystem (that is, if PATH or
> > -NAME are given as target), do not exit once the network namespace is deleted.
> > +If the target network namespace is bound to the filesystem, do not exit once
> > +that path is deleted.
> > +
> > +If the target network namespace is represented by a procfs entry, do not exit
> > +once that entry is removed from procfs (representing the fact that a process
> > +with the given PID terminated).  
> 
> I realised part of the reason this seems so awkward to me is that
> we're describing our normal behaviour w.r.t. netns lifetime, in the
> context of an option that disables that.  So, maybe rephrase something like:
> 
>   --no-netns-quit
>   Don't exit based on the state of the network namespace.
> 
>   Usually we exit if... <details of the logic>.
> 
> 
> >  
> >  .TP
> >  .BR \-\-config-net
> > diff --git a/pasta.c b/pasta.c
> > index 01d1511..61feaa9 100644
> > --- a/pasta.c
> > +++ b/pasta.c
> > @@ -33,6 +33,7 @@
> >  #include <sys/timerfd.h>
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> > +#include <sys/statfs.h>
> >  #include <fcntl.h>
> >  #include <sys/wait.h>
> >  #include <signal.h>
> > @@ -41,6 +42,7 @@
> >  #include <netinet/in.h>
> >  #include <net/ethernet.h>
> >  #include <sys/syscall.h>
> > +#include <linux/magic.h>
> >  
> >  #include "util.h"
> >  #include "passt.h"
> > @@ -390,12 +392,21 @@ void pasta_netns_quit_init(const struct ctx *c)
> >  	union epoll_ref ref = { .type = EPOLL_TYPE_NSQUIT_INOTIFY };
> >  	struct epoll_event ev = { .events = EPOLLIN };
> >  	int flags = O_NONBLOCK | O_CLOEXEC;
> > -	int fd;
> > +	struct statfs s = { 0 };  
> 
> I still don't like this initialisation, but I can live with it.  Also,
> it's slightly shorter than the next line.

Shorter than "bool try_inotify = true;"? They're both 24 characters...?

-- 
Stefano
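
For reference, the filesystem-type check described in the quoted commit
message (fstatfs() on the directory containing the namespace entry, rather
than on the namespace file descriptor, which always reports nsfs) could be
sketched roughly as below. This is a standalone illustration, not the code
from the patch: the helper names dir_is_pseudo_fs() and
netns_dir_is_pseudo_fs() are hypothetical, and the set of magic numbers
simply follows the filesystems the commit message names (procfs, sysfs,
devpts).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/statfs.h>   /* fstatfs(), struct statfs */
#include <linux/magic.h>  /* PROC_SUPER_MAGIC, SYSFS_MAGIC, ... */
#include <unistd.h>

/* Hypothetical helper: 1 if the open directory is on procfs, sysfs or
 * devpts (where inotify won't report changes), 0 otherwise, -1 on error.
 */
static int dir_is_pseudo_fs(int dirfd)
{
	struct statfs s;

	if (fstatfs(dirfd, &s) < 0)
		return -1;

	switch (s.f_type) {
	case PROC_SUPER_MAGIC:
	case SYSFS_MAGIC:
	case DEVPTS_SUPER_MAGIC:
		return 1;
	default:
		return 0;
	}
}

/* Hypothetical wrapper: open the parent directory of the namespace entry
 * first, then check the type on the descriptor (open it, check it, close
 * it), instead of calling statfs() on the path.
 */
static int netns_dir_is_pseudo_fs(const char *dir)
{
	int fd, rc;

	fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	rc = dir_is_pseudo_fs(fd);
	close(fd);

	return rc;
}
```

A caller would presumably pick the timer fallback when this returns 1, and
try inotify otherwise. On a system with procfs mounted at /proc, passing
"/proc/self/ns" should take the pseudo-filesystem branch.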

