From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 1948F5A027F for ; Fri, 16 Feb 2024 04:00:48 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202312; t=1708052444; bh=7DiBfUiFM1l119EG0Y2LlLfVqt5Zv6kCGG9/Jj26n2E=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=b39U+j/mdQtC9tTKfxquL5f/Us2JIeOvIzNdUUjJsoER1n5pTkCHk+5RPRSAy1FIS An1NhrWBrTpTWRR5FmdO/EI1cic3oTm5p62WoOf7MzJA4tszQx/vaWYfMcdOkFGKOd mZrveUN9P7akPGh8suWpLeJTAg+DAUgsuFaoA0+jgwXBIYHbT4cBmCPMqwjkfX1xoy hRWiFFMLut23ppfEnchJ8DEDUN7Td+ayGxQPKTl9V87PC/wW4JTiiHISrR6qgbsF8T 6/QTakSGof5fC/bbv0Mpc6hbYFvAQS/uibBCWNw/JwAtymwBpy3xqObIzrbVJjgslH dz5XVuQ/hq5fw== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4TbcBN32N7z4wys; Fri, 16 Feb 2024 14:00:44 +1100 (AEDT) Date: Fri, 16 Feb 2024 14:00:41 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH] pasta: Add fallback timer mechanism to check if namespace is gone Message-ID: References: <20240215223911.486945-1-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="8HmR5W6MfKxiue6P" Content-Disposition: inline In-Reply-To: <20240215223911.486945-1-sbrivio@redhat.com> Message-ID-Hash: LI5UCE2BEPT32EKH4W7MBKEPCPAJJFRM X-Message-ID-Hash: LI5UCE2BEPT32EKH4W7MBKEPCPAJJFRM X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --8HmR5W6MfKxiue6P Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 15, 2024 at 11:39:11PM +0100, Stefano Brivio wrote: > We don't know how frequently this happens, but hitting > fs.inotify.max_user_watches or similar sysctl limits is definitely > not out of question, and Paul mentioned that, for example, Podman's > CI environments hit similar issues in the past. >=20 > Introduce a fallback mechanism based on a timer file descriptor: we > grab the directory handle at startup, and we can then use openat(), > triggered periodically, to check if the (network) namespace directory > still exists. If openat() fails at some point, exit. >=20 > Link: https://github.com/containers/podman/pull/21563#issuecomment-194350= 5707 > Reported-by: Paul Holzinger > Signed-off-by: Stefano Brivio > --- > passt.c | 6 ++-- > passt.h | 2 ++ > pasta.c | 85 +++++++++++++++++++++++++++++++++++++++++++++------------ > pasta.h | 2 +- > 4 files changed, 73 insertions(+), 22 deletions(-) >=20 > diff --git a/passt.c b/passt.c > index aaa8e58..13670b9 100644 > --- a/passt.c > +++ b/passt.c > @@ -201,7 +201,7 @@ void exit_handler(int signal) > */ > int main(int argc, char **argv) > { > - int nfds, i, devnull_fd =3D -1, pidfile_fd =3D -1, quit_fd; > + int nfds, i, devnull_fd =3D -1, pidfile_fd =3D -1; > struct epoll_event events[EPOLL_EVENTS]; > char *log_name, argv0[PATH_MAX], *name; > struct ctx c =3D { 0 }; > @@ -274,7 +274,7 @@ int main(int argc, char **argv) > if (c.force_stderr || isatty(fileno(stdout))) > __openlog(log_name, LOG_PERROR, LOG_DAEMON); > =20 > - quit_fd =3D pasta_netns_quit_init(&c); > + pasta_netns_quit_init(&c); > =20 > tap_sock_init(&c); > =20 > @@ -371,7 +371,7 @@ loop: > tap_listen_handler(&c, eventmask); > break; > case EPOLL_TYPE_NSQUIT: > - pasta_netns_quit_handler(&c, quit_fd); > + pasta_netns_quit_handler(&c, ref); Hm. As a rule, I've been trying to use a separate EPOLL_TYPE for each different handler, rather than having secondary dispatch based on other details, even if those different handlers are accomplishing similar purposes (e.g. TAP_PASTA vs. TAP_PASST). > break; > case EPOLL_TYPE_TCP: > tcp_sock_handler(&c, ref, eventmask); > diff --git a/passt.h b/passt.h > index a9e8f15..1c6bb13 100644 > --- a/passt.h > +++ b/passt.h > @@ -84,6 +84,7 @@ enum epoll_type { > * @udp: UDP-specific reference part > * @icmp: ICMP-specific reference part > * @data: Data handled by protocol handlers > + * @nsdir_fd: netns dirfd for fallback timer checking if namespace is go= ne > * @u64: Opaque reference for epoll_ctl() and epoll_wait() > */ > union epoll_ref { > @@ -99,6 +100,7 @@ union epoll_ref { > union udp_epoll_ref udp; > union icmp_epoll_ref icmp; > uint32_t data; > + int nsdir_fd; > }; > }; > uint64_t u64; > diff --git a/pasta.c b/pasta.c > index 94807a3..60b6223 100644 > --- a/pasta.c > +++ b/pasta.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -356,57 +357,105 @@ void pasta_ns_conf(struct ctx *c) > proto_update_l2_buf(c->mac_guest, NULL); > } > =20 > +/** > + * pasta_netns_quit_timer() - Set up fallback timer to monitor namespace > + * > + * Return: timerfd file descriptor, negative error code on failure > + */ > +static int pasta_netns_quit_timer(void) > +{ > + int fd =3D timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC); > + struct itimerspec it =3D { { 1, 0 }, { 1, 0 } }; /* one-second interval= */ > + > + if (fd =3D=3D -1) { > + err("timerfd_create(): %s", strerror(errno)); > + return -errno; > + } > + > + if (timerfd_settime(fd, 0, &it, NULL) < 0) { > + err("timerfd_settime(): %s", strerror(errno)); > + close(fd); > + return -errno; > + } > + > + return fd; > +} > + > /** > * pasta_netns_quit_init() - Watch network namespace to quit once it's g= one > * @c: Execution context > * > - * Return: inotify file descriptor, -1 on failure or if not needed/appli= cable > + * Return: file descriptor (inotify or timerfd), -1 if not needed > */ > int pasta_netns_quit_init(const struct ctx *c) > { > + union epoll_ref ref =3D { .type =3D EPOLL_TYPE_NSQUIT, .nsdir_fd =3D -1= }; > + struct epoll_event ev =3D { .events =3D EPOLLIN }; > int flags =3D O_NONBLOCK | O_CLOEXEC; > - union epoll_ref ref =3D { .type =3D EPOLL_TYPE_NSQUIT }; > - struct epoll_event ev =3D { > - .events =3D EPOLLIN > - }; > - int inotify_fd; > + int fd =3D -1; > =20 > if (c->mode !=3D MODE_PASTA || c->no_netns_quit || !*c->netns_base) > return -1; > =20 > - if ((inotify_fd =3D inotify_init1(flags)) < 0) { > - perror("inotify_init(): won't quit once netns is gone"); > - return -1; > + if ((fd =3D inotify_init1(flags)) < 0) > + warn("inotify_init1(): %s, use a timer", strerror(errno)); > + > + if (fd >=3D 0 && inotify_add_watch(fd, c->netns_dir, IN_DELETE) < 0) { > + warn("inotify_add_watch(): %s, use a timer", > + strerror(errno)); > + close(fd); > + fd =3D -1; > } > =20 > - if (inotify_add_watch(inotify_fd, c->netns_dir, IN_DELETE) < 0) { > - perror("inotify_add_watch(): won't quit once netns is gone"); > - return -1; > + if (fd < 0) { > + if ((fd =3D pasta_netns_quit_timer()) < 0) > + die("Failed to set up fallback netns timer, exiting"); > + > + ref.nsdir_fd =3D open(c->netns_dir, O_CLOEXEC | O_RDONLY); > + if (ref.nsdir_fd < 0) > + die("netns dir open: %s, exiting", strerror(errno)); > } > =20 > - ref.fd =3D inotify_fd; > + if (fd > FD_REF_MAX) > + die("netns monitor file number %i too big, exiting", fd); > + > + ref.fd =3D fd; > ev.data.u64 =3D ref.u64; > - epoll_ctl(c->epollfd, EPOLL_CTL_ADD, inotify_fd, &ev); > + epoll_ctl(c->epollfd, EPOLL_CTL_ADD, fd, &ev); > =20 > - return inotify_fd; > + return fd; > } > =20 > /** > * pasta_netns_quit_handler() - Handle ns directory events, exit if ns i= s gone > * @c: Execution context > - * @inotify_fd: inotify file descriptor with watch on namespace directory > + * @ref: epoll reference for inotify or timer descriptor > */ > -void pasta_netns_quit_handler(struct ctx *c, int inotify_fd) > +void pasta_netns_quit_handler(struct ctx *c, union epoll_ref ref) > { > char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; > const struct inotify_event *in_ev =3D (struct inotify_event *)buf; > =20 > - if (read(inotify_fd, buf, sizeof(buf)) < (ssize_t)sizeof(*in_ev)) > + if (ref.nsdir_fd !=3D -1) { > + uint64_t expirations; > + int fd; > + > + read(ref.fd, &expirations, sizeof(expirations)); > + > + if ((fd =3D openat(ref.nsdir_fd, c->netns_base, O_PATH)) < 0) > + goto gone; > + > + close(fd); > + return; > + } > + > + if (read(ref.fd, buf, sizeof(buf)) < (ssize_t)sizeof(*in_ev)) > return; > =20 > if (strncmp(in_ev->name, c->netns_base, sizeof(c->netns_base))) > return; > =20 > +gone: > info("Namespace %s is gone, exiting", c->netns_base); > exit(EXIT_SUCCESS); > } > diff --git a/pasta.h b/pasta.h > index 5d20063..c120e94 100644 > --- a/pasta.h > +++ b/pasta.h > @@ -14,6 +14,6 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid, > void pasta_ns_conf(struct ctx *c); > void pasta_child_handler(int signal); > int pasta_netns_quit_init(const struct ctx *c); > -void pasta_netns_quit_handler(struct ctx *c, int inotify_fd); > +void pasta_netns_quit_handler(struct ctx *c, union epoll_ref ref); > =20 > #endif /* PASTA_H */ --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --8HmR5W6MfKxiue6P Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXOz9gACgkQzQJF27ox 2GdY1w//d1wo2Uz4CpHWIiAuYofYbB0BYU3zX9JTHsBkj/2KpcoV1+XpHNidZoWI yM+ZXWw87P7JpgHb+puLyeT0FllGMq5H5BaSoJHgmFv2BsaeqC7X+/yJ0ja1kZOA JcpbaqoP6Nwrp3JoXyrWfa3Wz+A6cqqve4XVfpJjkbH3/GcsMrujV05FVx+o501s 70x6z/C7TGQzj7luB13ochx8OmnPHfRRWOsdKGCmurpWOlgXAyChPDZzJGjEoWz4 dJN09dEeulaIr4+QQQrHtctLudQQZK9yO+HGuT60rzlTiNnF+//rwjoQjYqoLWs6 vu8fHTzBZiW2L/RC64rj8LdJ88lNEm15wxmO2EbRtyx21fot0Y4Xlpf/3/uDHqFu QNW0G/Rb4PT3BPtt4lCRL6dUDV5seXXp1PQ7ALFO+8QAPFIoKjzqzRg/ifMCUaJN s0PRUzzS39KKbuVRSqj9Sn2xNSWAfhW/FJPeVLPvutgVbz9819kesekxVrPnkQfq Lt7U6dK6gXUJfDYP8lx9vZJnNRppKlXMzVCK47nPDteoNnpga8ckvAbb+BepI8Ru eLzManr0LehyuQSNmjvPZczO2A8axca0bPjsHbpUeoW5EBoN7EQRr0D1Xuasl9qc c4Vm+pI02tnZErZG07tmW0/EuA0No1t2qTU8pgLP6VGLP6EEepw= =5FAH -----END PGP SIGNATURE----- --8HmR5W6MfKxiue6P--