public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Paul Holzinger <pholzing@redhat.com>
Subject: Re: [PATCH v2] pasta: Don't try to watch namespaces in procfs with inotify, use timer instead
Date: Mon, 19 Feb 2024 13:38:01 +1100
Message-ID: <ZdK_CYJY8AyKfzgw@zatzit>
In-Reply-To: <20240217141859.2358370-1-sbrivio@redhat.com>

On Sat, Feb 17, 2024 at 03:18:59PM +0100, Stefano Brivio wrote:
> We watch network namespace entries to detect when we should quit
> (unless --no-netns-quit is passed), and these might be stored in a tmpfs
> typically mounted at /run/user/UID or /var/run/user/UID, or found in
> procfs at /proc/PID/ns/.
> 
> Currently, we try to use inotify for any possible location of those
> entries, but inotify, of course, doesn't work on pseudo-filesystems
> (see inotify(7)).
> 
> The man page reflects this: the description of --no-netns-quit
> implies that we won't quit anyway if the namespace is not "bound to
> the filesystem".
> 
> Well, we won't quit, but, since commit 9e0dbc894813 ("More
> deterministic detection of whether argument is a PID, PATH or NAME"),
> we try. And, indeed, this is harmless, as the caveat from that
> commit message states.
> 
> Now, it turns out that Buildah, a tool to create container images,
> sharing its codebase with Podman, passes a procfs entry to pasta, and
> expects pasta to exit once the network namespace is not needed
> anymore, that is, once the original Buildah process terminates.
> 
> Get this to work by using the timer fallback mechanism if the
> namespace name is passed as a path belonging to a pseudo-filesystem.
> This is expected to be procfs, but I covered sysfs and devpts
> pseudo-filesystems as well, because nothing actually prevents
> creating this kind of directory structure and links there.
> 
> Note that statfs(), according to some versions of man pages, was
> apparently "deprecated" by the LSB. My reasoning for using it is
> essentially this:
>   https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
> 
> ...that is, there was no such thing as an LSB deprecation, and
> anyway there's no other way to get the filesystem type.
> 
> Also note that, while it might sound more robust to detect the
> filesystem type using fstatfs() on the file descriptor
> (c->pasta_netns_fd), the reported filesystem type for it is nsfs, no
> matter what path was given to pasta. If we use the parent directory,
> we'll typically have either tmpfs or procfs reported.
> 
> The timer, however, still uses the file descriptor of the parent
> directory later, as it has no access to the filesystem, and that
> avoids possible races if the PID is recycled: if the original process
> terminates, the handle we have on /proc/PID/ns still refers to it,
> not to any other process with the same PID.
> 
> We could have used pidfd_open() to get a handle on the parent
> process. But it's not guaranteed that the parent process is actually
> the one associated with the network namespace we operate on, and if we
> get a PID file descriptor for a PID (parent or not) or path that was
> given on our command line, this inherently causes a race condition as
> that PID might have been recycled by the time we call pidfd_open().
> 
> Even assuming the process we want to watch is the parent process, and
> a race-free usage of pidfd_open() would have been possible, I'm not
> very enthusiastic about enabling yet another system call in the
> seccomp profile just for this, while openat() is needed anyway.
> 
> Update the man page to reflect that, even if the target network
> namespace is passed as a procfs path or a PID, we'll now quit when
> the procfs entry is gone.

Oops, didn't spot this before replying to v1.  Looks like the comments
there still apply, though.

> Reported-by: Paul Holzinger <pholzing@redhat.com>
> Link: https://github.com/containers/podman/pull/21563#issuecomment-1948200214
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> v2: Coverity Scan isn't happy if we "check" (kind of) c->netns_dir
>     with statfs() before opening it in a non-atomic way. Just to make
>     things clear, false positive or not: open it, check it, close it
>     if it wasn't needed: we don't rely on the check.
> 
>  passt.1 |  8 ++++++--
>  pasta.c | 24 +++++++++++++++++++-----
>  2 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/passt.1 b/passt.1
> index dc2d719..de6e3bf 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -550,8 +550,12 @@ without \-\-userns.
>  
>  .TP
>  .BR \-\-no-netns-quit
> -If the target network namespace is bound to the filesystem (that is, if PATH or
> -NAME are given as target), do not exit once the network namespace is deleted.
> +If the target network namespace is bound to the filesystem, do not exit once
> +that path is deleted.
> +
> +If the target network namespace is represented by a procfs entry, do not exit
> +once that entry is removed from procfs (representing the fact that a process
> +with the given PID terminated).
>  
>  .TP
>  .BR \-\-config-net
> diff --git a/pasta.c b/pasta.c
> index 01d1511..465fe1a 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -33,6 +33,7 @@
>  #include <sys/timerfd.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/statfs.h>
>  #include <fcntl.h>
>  #include <sys/wait.h>
>  #include <signal.h>
> @@ -41,6 +42,7 @@
>  #include <netinet/in.h>
>  #include <net/ethernet.h>
>  #include <sys/syscall.h>
> +#include <linux/magic.h>
>  
>  #include "util.h"
>  #include "passt.h"
> @@ -390,12 +392,21 @@ void pasta_netns_quit_init(const struct ctx *c)
>  	union epoll_ref ref = { .type = EPOLL_TYPE_NSQUIT_INOTIFY };
>  	struct epoll_event ev = { .events = EPOLLIN };
>  	int flags = O_NONBLOCK | O_CLOEXEC;
> -	int fd;
> +	struct statfs s = { 0 };
> +	bool try_inotify = true;
> +	int fd = -1, dir_fd;
>  
>  	if (c->mode != MODE_PASTA || c->no_netns_quit || !*c->netns_base)
>  		return;
>  
> -	if ((fd = inotify_init1(flags)) < 0)
> +	if ((dir_fd = open(c->netns_dir, O_CLOEXEC | O_RDONLY)) < 0)
> +		die("netns dir open: %s, exiting", strerror(errno));
> +
> +	if (statfs(c->netns_dir, &s)     || s.f_type == DEVPTS_SUPER_MAGIC ||
> +	    s.f_type == PROC_SUPER_MAGIC || s.f_type == SYSFS_MAGIC)
> +		try_inotify = false;

Since you're already opening netns_dir, it seems prudent to use
fstatfs() instead of statfs() here.

> +	if (try_inotify && (fd = inotify_init1(flags)) < 0)
>  		warn("inotify_init1(): %s, use a timer", strerror(errno));
>  
>  	if (fd >= 0 && inotify_add_watch(fd, c->netns_dir, IN_DELETE) < 0) {
> @@ -409,11 +420,11 @@ void pasta_netns_quit_init(const struct ctx *c)
>  		if ((fd = pasta_netns_quit_timer()) < 0)
>  			die("Failed to set up fallback netns timer, exiting");
>  
> -		ref.nsdir_fd = open(c->netns_dir, O_CLOEXEC | O_RDONLY);
> -		if (ref.nsdir_fd < 0)
> -			die("netns dir open: %s, exiting", strerror(errno));
> +		ref.nsdir_fd = dir_fd;
>  
>  		ref.type = EPOLL_TYPE_NSQUIT_TIMER;
> +	} else {
> +		close(dir_fd);
>  	}
>  
>  	if (fd > FD_REF_MAX)
> @@ -463,6 +474,9 @@ void pasta_netns_quit_timer_handler(struct ctx *c, union epoll_ref ref)
>  
>  	fd = openat(ref.nsdir_fd, c->netns_base, O_PATH | O_CLOEXEC);
>  	if (fd < 0) {
> +		if (errno == EACCES)	/* Expected for existing procfs entry */
> +			return;
> +
>  		info("Namespace %s is gone, exiting", c->netns_base);
>  		exit(EXIT_SUCCESS);
>  	}
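
For reference, the check the timer handler performs in the hunk above
boils down to something like the following. This is a minimal sketch:
netns_entry_gone() is a hypothetical helper, not a function in pasta.c,
and it only mirrors the openat() and EACCES logic quoted above.

```c
#define _GNU_SOURCE		/* O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: report whether the entry named base has
 * disappeared from the directory referred to by dir_fd.
 *
 * Resolving base relative to a descriptor opened earlier means that a
 * recycled PID can't make a stale /proc/PID/ns path look alive again.
 */
bool netns_entry_gone(int dir_fd, const char *base)
{
	int fd = openat(dir_fd, base, O_PATH | O_CLOEXEC);

	if (fd < 0) {
		/* Per the patch, EACCES is expected for a procfs entry
		 * that still exists, so it doesn't count as "gone".
		 */
		return errno != EACCES;
	}

	close(fd);
	return false;
}
```

With the descriptor stored in ref.nsdir_fd and c->netns_base as
arguments, a true return corresponds to the "Namespace %s is gone,
exiting" path.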

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
