public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/3] More caution with NONBLOCK flag on Unix sockets
@ 2026-05-13  4:14 David Gibson
  2026-05-13  4:14 ` [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it David Gibson
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: David Gibson @ 2026-05-13  4:14 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

This is a revision of the one patch from my pesto series that wasn't
merged: a rework to the handling of the blocking flag for both
listening and accepted Unix sockets.  This new version is split into
several pieces to make the rationales clearer at each step.  It's also
a bit more cautious in what it does, so should avoid the problems with
the original version.

David Gibson (3):
  treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets
  conf, repair, tap: More caution about blocking flag on Unix sockets

 conf.c   | 10 +++++++---
 repair.c | 13 +++++++++----
 tap.c    | 10 ++++++++--
 util.c   |  2 +-
 4 files changed, 25 insertions(+), 10 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-13  4:14 [PATCH 0/3] More caution with NONBLOCK flag on Unix sockets David Gibson
@ 2026-05-13  4:14 ` David Gibson
  2026-05-16 15:46   ` Stefano Brivio
  2026-05-13  4:14 ` [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets David Gibson
  2026-05-13  4:14 ` [PATCH 3/3] conf, repair, tap: More caution about blocking flag " David Gibson
  2 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-13  4:14 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Generally we try to set the O_CLOEXEC flag on every fd we create.  This
seems to be generally accepted security best practice these days, and we
never fork(), so certainly have no need to pass fds to children.

A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC
flag to set this though.  Add the missing flag.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 repair.c | 5 +++--
 tap.c    | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/repair.c b/repair.c
index 69c53077..3e0e3e0a 100644
--- a/repair.c
+++ b/repair.c
@@ -87,7 +87,7 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
 	/* Another client is already connected: accept and close right away. */
 	if (c->fd_repair != -1) {
 		int discard = accept4(c->fd_repair_listen, NULL, NULL,
-				      SOCK_NONBLOCK);
+				      SOCK_NONBLOCK | SOCK_CLOEXEC);
 
 		if (discard == -1)
 			return errno;
@@ -99,7 +99,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
 		return EEXIST;
 	}
 
-	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0)) < 0) {
+	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
+				    SOCK_CLOEXEC)) < 0) {
 		rc = errno;
 		debug_perror("accept4() on TCP_REPAIR helper listening socket");
 		return rc;
diff --git a/tap.c b/tap.c
index 0920a325..e7cac9df 100644
--- a/tap.c
+++ b/tap.c
@@ -1477,7 +1477,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
 	/* Another client is already connected: accept and close right away. */
 	if (c->fd_tap != -1) {
 		int discard = accept4(c->fd_tap_listen, NULL, NULL,
-				      SOCK_NONBLOCK);
+				      SOCK_NONBLOCK | SOCK_CLOEXEC);
 
 		if (discard == -1)
 			return;
@@ -1490,7 +1490,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
 		return;
 	}
 
-	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0);
+	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
 
 	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
 		info("accepted connection from PID %i", ucred.pid);
-- 
2.54.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets
  2026-05-13  4:14 [PATCH 0/3] More caution with NONBLOCK flag on Unix sockets David Gibson
  2026-05-13  4:14 ` [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it David Gibson
@ 2026-05-13  4:14 ` David Gibson
  2026-05-13  5:51   ` David Gibson
  2026-05-13  4:14 ` [PATCH 3/3] conf, repair, tap: More caution about blocking flag " David Gibson
  2 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-13  4:14 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

sock_unix(), which creates a listening Unix socket, doesn't set the
SOCK_NONBLOCK flag, meaning that accept() will block if called with no
pending connections.  Generally, this doesn't matter because we only
accept() once we've received an epoll event indicating there's a pending
connection request.

Control connections (pesto) are an exception, because the way we queue
connections requires that we call accept() when we close one connection to
see if there's another one waiting.  We rely on an EAGAIN here to know that
there's nothing waiting.  To handle these we have an explicit fcntl() to
enable NONBLOCK on the control listening socket.

However, always using non-blocking accept() for Unix sockets would make
things a bit more uniform, and should be a bit less fragile in the case
that we ever somehow got a spurious connection event.  So, alter
sock_unix() to always use the SOCK_NONBLOCK flag.  Remove the control
socket's special case fcntl(), and adjust the error handling on each
Unix socket accept() for the new behaviour.  As a bonus the last adds
reporting for accept() errors on tap socket connections.

we will need non-blocking accept() for the upcoming control/configuration
socket.  Always add SOCK_NONBLOCK, which is more robust and in keeping with
the normal non-blocking style of passt.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c   | 4 +---
 repair.c | 4 ++--
 tap.c    | 5 +++++
 util.c   | 2 +-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/conf.c b/conf.c
index 029b9c7c..dec43fca 100644
--- a/conf.c
+++ b/conf.c
@@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c)
 			die_perror("Couldn't open control socket %s",
 				   c->control_path);
 		}
-		if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK))
-			die_perror("Couldn't set O_NONBLOCK on control socket");
 	} else {
 		c->fd_control_listen = -1;
 	}
@@ -2087,7 +2085,7 @@ retry:
 	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
 	if (fd < 0) {
 		if (errno != EAGAIN)
-			warn_perror("accept4() on configuration listening socket");
+			warn_perror("Error accept()ing configuration socket");
 		return;
 	}
 
diff --git a/repair.c b/repair.c
index 3e0e3e0a..42c4ae97 100644
--- a/repair.c
+++ b/repair.c
@@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
 
 	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
 				    SOCK_CLOEXEC)) < 0) {
-		rc = errno;
-		debug_perror("accept4() on TCP_REPAIR helper listening socket");
+		if ((rc = errno) != EAGAIN)
+			warn_perror("Error accept()ing repair helper");
 		return rc;
 	}
 
diff --git a/tap.c b/tap.c
index e7cac9df..fda2da9b 100644
--- a/tap.c
+++ b/tap.c
@@ -1491,6 +1491,11 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
 	}
 
 	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
+	if (c->fd_tap < 0) {
+		if (errno != EAGAIN)
+			warn_perror("Error accepting tap client");
+		return;
+	}
 
 	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
 		info("accepted connection from PID %i", ucred.pid);
diff --git a/util.c b/util.c
index 73c9d51d..204391c7 100644
--- a/util.c
+++ b/util.c
@@ -238,7 +238,7 @@ int sock_l4_dualstack_any(const struct ctx *c, enum epoll_type type,
  */
 int sock_unix(char *sock_path)
 {
-	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
+	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
 	struct sockaddr_un addr = {
 		.sun_family = AF_UNIX,
 	};
-- 
2.54.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/3] conf, repair, tap: More caution about blocking flag on Unix sockets
  2026-05-13  4:14 [PATCH 0/3] More caution with NONBLOCK flag on Unix sockets David Gibson
  2026-05-13  4:14 ` [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it David Gibson
  2026-05-13  4:14 ` [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets David Gibson
@ 2026-05-13  4:14 ` David Gibson
  2026-05-16 15:46   ` Stefano Brivio
  2 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-13  4:14 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev; +Cc: David Gibson

Most of our operation is asynchronous, based on non-blocking fds handled
in our epoll loop.  However, our several Unix sockets (tap client, repair
helper, control client) are all blocking fds after accept().

That's correct for the repair helper, and (for now) correct for the control
client.  However, the reasons for that might not be obvious, so add some
extra comments giving the rationale.

I don't believe it's correct for the tap client; having this socket be
blocking means we could potentially block the main loop if we ever got a
a spurious EPOLL{IN,OUT} event on the tap socket.  Switch the tap socket
to non-blocking for better robustness, and consistency with nearly every
other fd we track.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 conf.c   | 6 ++++++
 repair.c | 4 ++++
 tap.c    | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/conf.c b/conf.c
index dec43fca..dc85f0f8 100644
--- a/conf.c
+++ b/conf.c
@@ -2082,6 +2082,12 @@ static void conf_accept(struct ctx *c)
 	int fd, rc;
 
 retry:
+	/* Currently we perform the configuration transaction more-or-less
+	 * synchronously, so we want the accepted socket to be blocking.
+	 *
+	 * FIXME: We should make the configuration update asynchronous, like
+	 * most of our operation, so a misbehaving configuration client can't
+	 * block the main forwarding loop */
 	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
 	if (fd < 0) {
 		if (errno != EAGAIN)
diff --git a/repair.c b/repair.c
index 42c4ae97..8a2d119d 100644
--- a/repair.c
+++ b/repair.c
@@ -99,6 +99,10 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
 		return EEXIST;
 	}
 
+	/* We want accepted socket to be blocking; we use it during migration
+	 * which is a synchronous interruption to our normal non-blocking
+	 * behaviour.
+	 */
 	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
 				    SOCK_CLOEXEC)) < 0) {
 		if ((rc = errno) != EAGAIN)
diff --git a/tap.c b/tap.c
index fda2da9b..3b8a3f3d 100644
--- a/tap.c
+++ b/tap.c
@@ -1490,7 +1490,8 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
 		return;
 	}
 
-	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
+	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL,
+			    SOCK_NONBLOCK | SOCK_CLOEXEC);
 	if (c->fd_tap < 0) {
 		if (errno != EAGAIN)
 			warn_perror("Error accepting tap client");
-- 
2.54.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets
  2026-05-13  4:14 ` [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets David Gibson
@ 2026-05-13  5:51   ` David Gibson
  2026-05-16 15:46     ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-13  5:51 UTC (permalink / raw)
  To: Stefano Brivio, passt-dev

[-- Attachment #1: Type: text/plain, Size: 4352 bytes --]

On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
> sock_unix(), which creates a listening Unix socket, doesn't set the
> SOCK_NONBLOCK flag, meaning that accept() will block if called with no
> pending connections.  Generally, this doesn't matter because we only
> accept() once we've received an epoll event indicating there's a pending
> connection request.
> 
> Control connections (pesto) are an exception, because the way we queue
> connections requires that we call accept() when we close one connection to
> see if there's another one waiting.  We rely on an EAGAIN here to know that
> there's nothing waiting.  To handle these we have an explicit fcntl() to
> enable NONBLOCK on the control listening socket.
> 
> However, always using non-blocking accept() for Unix sockets would make
> things a bit more uniform, and should be a bit less fragile in the case
> that we ever somehow got a spurious connection event.  So, alter
> sock_unix() to always use the SOCK_NONBLOCK flag.  Remove the control
> socket's special case fcntl(), and adjust the error handling on each
> Unix socket accept() for the new behaviour.  As a bonus the last adds
> reporting for accept() errors on tap socket connections.

I didn't realise it, but adding that reporting also removes a valid,
if fairly minor coverity warning (at least with coverity 2026.3.0).

> we will need non-blocking accept() for the upcoming control/configuration
> socket.  Always add SOCK_NONBLOCK, which is more robust and in keeping with
> the normal non-blocking style of passt.

Oops.  This paragraph is left over from a previous version.  Can you
remove on merge, if there's no other reason to respin?

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c   | 4 +---
>  repair.c | 4 ++--
>  tap.c    | 5 +++++
>  util.c   | 2 +-
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/conf.c b/conf.c
> index 029b9c7c..dec43fca 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c)
>  			die_perror("Couldn't open control socket %s",
>  				   c->control_path);
>  		}
> -		if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK))
> -			die_perror("Couldn't set O_NONBLOCK on control socket");
>  	} else {
>  		c->fd_control_listen = -1;
>  	}
> @@ -2087,7 +2085,7 @@ retry:
>  	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
>  	if (fd < 0) {
>  		if (errno != EAGAIN)
> -			warn_perror("accept4() on configuration listening socket");
> +			warn_perror("Error accept()ing configuration socket");
>  		return;
>  	}
>  
> diff --git a/repair.c b/repair.c
> index 3e0e3e0a..42c4ae97 100644
> --- a/repair.c
> +++ b/repair.c
> @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
>  
>  	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
>  				    SOCK_CLOEXEC)) < 0) {
> -		rc = errno;
> -		debug_perror("accept4() on TCP_REPAIR helper listening socket");
> +		if ((rc = errno) != EAGAIN)
> +			warn_perror("Error accept()ing repair helper");
>  		return rc;
>  	}
>  
> diff --git a/tap.c b/tap.c
> index e7cac9df..fda2da9b 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1491,6 +1491,11 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
>  	}
>  
>  	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
> +	if (c->fd_tap < 0) {
> +		if (errno != EAGAIN)
> +			warn_perror("Error accepting tap client");
> +		return;
> +	}
>  
>  	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
>  		info("accepted connection from PID %i", ucred.pid);
> diff --git a/util.c b/util.c
> index 73c9d51d..204391c7 100644
> --- a/util.c
> +++ b/util.c
> @@ -238,7 +238,7 @@ int sock_l4_dualstack_any(const struct ctx *c, enum epoll_type type,
>   */
>  int sock_unix(char *sock_path)
>  {
> -	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
> +	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
>  	struct sockaddr_un addr = {
>  		.sun_family = AF_UNIX,
>  	};
> -- 
> 2.54.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-13  4:14 ` [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it David Gibson
@ 2026-05-16 15:46   ` Stefano Brivio
  2026-05-18  2:28     ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2026-05-16 15:46 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 13 May 2026 14:14:21 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> seems to be generally accepted security best practice these days, and we
> never fork(), so certainly have no need to pass fds to children.

But we do clone() with CLONE_FILES (even though when we clone() to call
execvp() later, we don't set CLONE_FILES), so, even though I don't see
a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
automatic from the fact we don't fork().

I spent some time on it and I really couldn't find a reason why we
don't have O_CLOEXEC there, so probably there isn't any, and I think
this patch is fine.

I would just change this paragraph to "[...] these days, and we don't
need to pass file descriptors to children."

> A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC
> flag to set this though.  Add the missing flag.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  repair.c | 5 +++--
>  tap.c    | 4 ++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/repair.c b/repair.c
> index 69c53077..3e0e3e0a 100644
> --- a/repair.c
> +++ b/repair.c
> @@ -87,7 +87,7 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
>  	/* Another client is already connected: accept and close right away. */
>  	if (c->fd_repair != -1) {
>  		int discard = accept4(c->fd_repair_listen, NULL, NULL,
> -				      SOCK_NONBLOCK);
> +				      SOCK_NONBLOCK | SOCK_CLOEXEC);
>  
>  		if (discard == -1)
>  			return errno;
> @@ -99,7 +99,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
>  		return EEXIST;
>  	}
>  
> -	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0)) < 0) {
> +	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
> +				    SOCK_CLOEXEC)) < 0) {
>  		rc = errno;
>  		debug_perror("accept4() on TCP_REPAIR helper listening socket");
>  		return rc;
> diff --git a/tap.c b/tap.c
> index 0920a325..e7cac9df 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1477,7 +1477,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
>  	/* Another client is already connected: accept and close right away. */
>  	if (c->fd_tap != -1) {
>  		int discard = accept4(c->fd_tap_listen, NULL, NULL,
> -				      SOCK_NONBLOCK);
> +				      SOCK_NONBLOCK | SOCK_CLOEXEC);
>  
>  		if (discard == -1)
>  			return;
> @@ -1490,7 +1490,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
>  		return;
>  	}
>  
> -	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0);
> +	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
>  
>  	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
>  		info("accepted connection from PID %i", ucred.pid);

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] conf, repair, tap: More caution about blocking flag on Unix sockets
  2026-05-13  4:14 ` [PATCH 3/3] conf, repair, tap: More caution about blocking flag " David Gibson
@ 2026-05-16 15:46   ` Stefano Brivio
  2026-05-18  2:50     ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2026-05-16 15:46 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 13 May 2026 14:14:23 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Most of our operation is asynchronous, based on non-blocking fds handled
> in our epoll loop.  However, our several Unix sockets (tap client, repair
> helper, control client) are all blocking fds after accept().
> 
> That's correct for the repair helper, and (for now) correct for the control
> client.  However, the reasons for that might not be obvious, so add some
> extra comments giving the rationale.
> 
> I don't believe it's correct for the tap client; having this socket be
> blocking means we could potentially block the main loop if we ever got a
> a spurious EPOLL{IN,OUT} event on the tap socket.  Switch the tap socket
> to non-blocking for better robustness, and consistency with nearly every
> other fd we track.

That socket needs to be blocking for the second usage we make of it in
tap_send_frames_passt(), that is, the one via write_remainder() without
MSG_DONTWAIT.

While a part of https://bugs.passt.top/show_bug.cgi?id=38 is solved
(there are no blocking reads left in tap_passt_input()), this isn't the
case for the writing side of it.

Some nits below:

> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  conf.c   | 6 ++++++
>  repair.c | 4 ++++
>  tap.c    | 3 ++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/conf.c b/conf.c
> index dec43fca..dc85f0f8 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -2082,6 +2082,12 @@ static void conf_accept(struct ctx *c)
>  	int fd, rc;
>  
>  retry:
> +	/* Currently we perform the configuration transaction more-or-less
> +	 * synchronously, so we want the accepted socket to be blocking.
> +	 *
> +	 * FIXME: We should make the configuration update asynchronous, like
> +	 * most of our operation, so a misbehaving configuration client can't
> +	 * block the main forwarding loop */

	 * ... loop.
	 */

>  	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
>  	if (fd < 0) {
>  		if (errno != EAGAIN)
> diff --git a/repair.c b/repair.c
> index 42c4ae97..8a2d119d 100644
> --- a/repair.c
> +++ b/repair.c
> @@ -99,6 +99,10 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
>  		return EEXIST;
>  	}
>  
> +	/* We want accepted socket to be blocking; we use it during migration

"the accepted socket"

> +	 * which is a synchronous interruption to our normal non-blocking
> +	 * behaviour.
> +	 */
>  	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
>  				    SOCK_CLOEXEC)) < 0) {
>  		if ((rc = errno) != EAGAIN)
> diff --git a/tap.c b/tap.c
> index fda2da9b..3b8a3f3d 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -1490,7 +1490,8 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
>  		return;
>  	}
>  
> -	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
> +	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL,
> +			    SOCK_NONBLOCK | SOCK_CLOEXEC);

...so this part would need to be dropped. 

>  	if (c->fd_tap < 0) {
>  		if (errno != EAGAIN)
>  			warn_perror("Error accepting tap client");

The rest looks good to me.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets
  2026-05-13  5:51   ` David Gibson
@ 2026-05-16 15:46     ` Stefano Brivio
  2026-05-18  2:40       ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2026-05-16 15:46 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 13 May 2026 15:51:27 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
> > sock_unix(), which creates a listening Unix socket, doesn't set the
> > SOCK_NONBLOCK flag, meaning that accept() will block if called with no
> > pending connections.  Generally, this doesn't matter because we only
> > accept() once we've received an epoll event indicating there's a pending
> > connection request.
> > 
> > Control connections (pesto) are an exception, because the way we queue
> > connections requires that we call accept() when we close one connection to
> > see if there's another one waiting.  We rely on an EAGAIN here to know that
> > there's nothing waiting.  To handle these we have an explicit fcntl() to
> > enable NONBLOCK on the control listening socket.
> > 
> > However, always using non-blocking accept() for Unix sockets would make
> > things a bit more uniform, and should be a bit less fragile in the case
> > that we ever somehow got a spurious connection event.  So, alter
> > sock_unix() to always use the SOCK_NONBLOCK flag.  Remove the control
> > socket's special case fcntl(), and adjust the error handling on each
> > Unix socket accept() for the new behaviour.  As a bonus the last adds
> > reporting for accept() errors on tap socket connections.  
> 
> I didn't realise it, but adding that reporting also removes a valid,
> if fairly minor coverity warning (at least with coverity 2026.3.0).
> 
> > we will need non-blocking accept() for the upcoming control/configuration
> > socket.  Always add SOCK_NONBLOCK, which is more robust and in keeping with
> > the normal non-blocking style of passt.  
> 
> Oops.  This paragraph is left over from a previous version.  Can you
> remove on merge, if there's no other reason to respin?

I think the comments I'm raising (the one below and the one to 3/3)
actually warrant a respin at this point.

> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c   | 4 +---
> >  repair.c | 4 ++--
> >  tap.c    | 5 +++++
> >  util.c   | 2 +-
> >  4 files changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index 029b9c7c..dec43fca 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c)
> >  			die_perror("Couldn't open control socket %s",
> >  				   c->control_path);
> >  		}
> > -		if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK))
> > -			die_perror("Couldn't set O_NONBLOCK on control socket");
> >  	} else {
> >  		c->fd_control_listen = -1;
> >  	}
> > @@ -2087,7 +2085,7 @@ retry:
> >  	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
> >  	if (fd < 0) {
> >  		if (errno != EAGAIN)
> > -			warn_perror("accept4() on configuration listening socket");
> > +			warn_perror("Error accept()ing configuration socket");
> >  		return;
> >  	}
> >  
> > diff --git a/repair.c b/repair.c
> > index 3e0e3e0a..42c4ae97 100644
> > --- a/repair.c
> > +++ b/repair.c
> > @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
> >  
> >  	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
> >  				    SOCK_CLOEXEC)) < 0) {
> > -		rc = errno;
> > -		debug_perror("accept4() on TCP_REPAIR helper listening socket");
> > +		if ((rc = errno) != EAGAIN)
> > +			warn_perror("Error accept()ing repair helper");

See repair_wait() for the reason why this particular listening socket
needs to be blocking (with a timeout).

> >  		return rc;
> >  	}
> >  
> > diff --git a/tap.c b/tap.c
> > index e7cac9df..fda2da9b 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -1491,6 +1491,11 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
> >  	}
> >  
> >  	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
> > +	if (c->fd_tap < 0) {
> > +		if (errno != EAGAIN)
> > +			warn_perror("Error accepting tap client");
> > +		return;
> > +	}
> >  
> >  	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
> >  		info("accepted connection from PID %i", ucred.pid);
> > diff --git a/util.c b/util.c
> > index 73c9d51d..204391c7 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -238,7 +238,7 @@ int sock_l4_dualstack_any(const struct ctx *c, enum epoll_type type,
> >   */
> >  int sock_unix(char *sock_path)
> >  {
> > -	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
> > +	int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
> >  	struct sockaddr_un addr = {
> >  		.sun_family = AF_UNIX,
> >  	};
> > -- 
> > 2.54.0

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-16 15:46   ` Stefano Brivio
@ 2026-05-18  2:28     ` David Gibson
  2026-05-20  0:37       ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-18  2:28 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3638 bytes --]

On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
> On Wed, 13 May 2026 14:14:21 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > seems to be generally accepted security best practice these days, and we
> > never fork(), so certainly have no need to pass fds to children.
> 
> But we do clone() with CLONE_FILES (even though when we clone() to call
> execvp() later, we don't set CLONE_FILES), so, even though I don't see
> a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> automatic from the fact we don't fork().

So, I did think about that when  wrote it, but went for the short
version rather than saying clone() with CLONE_FILES doesn't count.

Now, I realised that we've both fallen for the trap again, forgetting
that this has nothing to do with fork() or clone() and is, as it says
right there in the name, about exec().  I'll update the message.

> I spent some time on it and I really couldn't find a reason why we
> don't have O_CLOEXEC there, so probably there isn't any, and I think
> this patch is fine.
> 
> I would just change this paragraph to "[...] these days, and we don't
> need to pass file descriptors to children."
> 
> > A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC
> > flag to set this though.  Add the missing flag.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  repair.c | 5 +++--
> >  tap.c    | 4 ++--
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/repair.c b/repair.c
> > index 69c53077..3e0e3e0a 100644
> > --- a/repair.c
> > +++ b/repair.c
> > @@ -87,7 +87,7 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
> >  	/* Another client is already connected: accept and close right away. */
> >  	if (c->fd_repair != -1) {
> >  		int discard = accept4(c->fd_repair_listen, NULL, NULL,
> > -				      SOCK_NONBLOCK);
> > +				      SOCK_NONBLOCK | SOCK_CLOEXEC);
> >  
> >  		if (discard == -1)
> >  			return errno;
> > @@ -99,7 +99,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
> >  		return EEXIST;
> >  	}
> >  
> > -	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0)) < 0) {
> > +	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
> > +				    SOCK_CLOEXEC)) < 0) {
> >  		rc = errno;
> >  		debug_perror("accept4() on TCP_REPAIR helper listening socket");
> >  		return rc;
> > diff --git a/tap.c b/tap.c
> > index 0920a325..e7cac9df 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -1477,7 +1477,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
> >  	/* Another client is already connected: accept and close right away. */
> >  	if (c->fd_tap != -1) {
> >  		int discard = accept4(c->fd_tap_listen, NULL, NULL,
> > -				      SOCK_NONBLOCK);
> > +				      SOCK_NONBLOCK | SOCK_CLOEXEC);
> >  
> >  		if (discard == -1)
> >  			return;
> > @@ -1490,7 +1490,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
> >  		return;
> >  	}
> >  
> > -	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0);
> > +	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
> >  
> >  	if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len))
> >  		info("accepted connection from PID %i", ucred.pid);
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets
  2026-05-16 15:46     ` Stefano Brivio
@ 2026-05-18  2:40       ` David Gibson
  0 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2026-05-18  2:40 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 4161 bytes --]

On Sat, May 16, 2026 at 05:46:27PM +0200, Stefano Brivio wrote:
> On Wed, 13 May 2026 15:51:27 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
> > > sock_unix(), which creates a listening Unix socket, doesn't set the
> > > SOCK_NONBLOCK flag, meaning that accept() will block if called with no
> > > pending connections.  Generally, this doesn't matter because we only
> > > accept() once we've received an epoll event indicating there's a pending
> > > connection request.
> > > 
> > > Control connections (pesto) are an exception, because the way we queue
> > > connections requires that we call accept() when we close one connection to
> > > see if there's another one waiting.  We rely on an EAGAIN here to know that
> > > there's nothing waiting.  To handle these we have an explicit fcntl() to
> > > enable NONBLOCK on the control listening socket.
> > > 
> > > However, always using non-blocking accept() for Unix sockets would make
> > > things a bit more uniform, and should be a bit less fragile in the case
> > > that we ever somehow got a spurious connection event.  So, alter
> > > sock_unix() to always use the SOCK_NONBLOCK flag.  Remove the control
> > > socket's special case fcntl(), and adjust the error handling on each
> > > Unix socket accept() for the new behaviour.  As a bonus the last adds
> > > reporting for accept() errors on tap socket connections.  
> > 
> > I didn't realise it, but adding that reporting also removes a valid,
> > if fairly minor coverity warning (at least with coverity 2026.3.0).
> > 
> > > we will need non-blocking accept() for the upcoming control/configuration
> > > socket.  Always add SOCK_NONBLOCK, which is more robust and in keeping with
> > > the normal non-blocking style of passt.  
> > 
> > Oops.  This paragraph is left over from a previous version.  Can you
> > remove on merge, if there's no other reason to respin?
> 
> I think the comments I'm raising (the one below and the one to 3/3)
> actually warrant a respin at this point.

Agreed.

> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  conf.c   | 4 +---
> > >  repair.c | 4 ++--
> > >  tap.c    | 5 +++++
> > >  util.c   | 2 +-
> > >  4 files changed, 9 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/conf.c b/conf.c
> > > index 029b9c7c..dec43fca 100644
> > > --- a/conf.c
> > > +++ b/conf.c
> > > @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c)
> > >  			die_perror("Couldn't open control socket %s",
> > >  				   c->control_path);
> > >  		}
> > > -		if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK))
> > > -			die_perror("Couldn't set O_NONBLOCK on control socket");
> > >  	} else {
> > >  		c->fd_control_listen = -1;
> > >  	}
> > > @@ -2087,7 +2085,7 @@ retry:
> > >  	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
> > >  	if (fd < 0) {
> > >  		if (errno != EAGAIN)
> > > -			warn_perror("accept4() on configuration listening socket");
> > > +			warn_perror("Error accept()ing configuration socket");
> > >  		return;
> > >  	}
> > >  
> > > diff --git a/repair.c b/repair.c
> > > index 3e0e3e0a..42c4ae97 100644
> > > --- a/repair.c
> > > +++ b/repair.c
> > > @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
> > >  
> > >  	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
> > >  				    SOCK_CLOEXEC)) < 0) {
> > > -		rc = errno;
> > > -		debug_perror("accept4() on TCP_REPAIR helper listening socket");
> > > +		if ((rc = errno) != EAGAIN)
> > > +			warn_perror("Error accept()ing repair helper");
> 
> See repair_wait() for the reason why this particular listening socket
> needs to be blocking (with a timeout).

Ah, good point.  That makes this whole patch pretty ill-conceived.
I'll ditch everything except the error reporting addition.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] conf, repair, tap: More caution about blocking flag on Unix sockets
  2026-05-16 15:46   ` Stefano Brivio
@ 2026-05-18  2:50     ` David Gibson
  0 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2026-05-18  2:50 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3716 bytes --]

On Sat, May 16, 2026 at 05:46:16PM +0200, Stefano Brivio wrote:
> On Wed, 13 May 2026 14:14:23 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Most of our operation is asynchronous, based on non-blocking fds handled
> > in our epoll loop.  However, our several Unix sockets (tap client, repair
> > helper, control client) are all blocking fds after accept().
> > 
> > That's correct for the repair helper, and (for now) correct for the control
> > client.  However, the reasons for that might not be obvious, so add some
> > extra comments giving the rationale.
> > 
> > I don't believe it's correct for the tap client; having this socket be
> > blocking means we could potentially block the main loop if we ever got a
> > a spurious EPOLL{IN,OUT} event on the tap socket.  Switch the tap socket
> > to non-blocking for better robustness, and consistency with nearly every
> > other fd we track.
> 
> That socket needs to be blocking for the second usage we make of it in
> tap_send_frames_passt(), that is, the one via write_remainder() without
> MSG_DONTWAIT.

Good point.  I've dropped that change and adjusted the text to match.

> 
> While a part of https://bugs.passt.top/show_bug.cgi?id=38 is solved
> (there are no blocking reads left in tap_passt_input()), this isn't the
> case for the writing side of it.
> 
> Some nits below:
> 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  conf.c   | 6 ++++++
> >  repair.c | 4 ++++
> >  tap.c    | 3 ++-
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/conf.c b/conf.c
> > index dec43fca..dc85f0f8 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -2082,6 +2082,12 @@ static void conf_accept(struct ctx *c)
> >  	int fd, rc;
> >  
> >  retry:
> > +	/* Currently we perform the configuration transaction more-or-less
> > +	 * synchronously, so we want the accepted socket to be blocking.
> > +	 *
> > +	 * FIXME: We should make the configuration update asynchronous, like
> > +	 * most of our operation, so a misbehaving configuration client can't
> > +	 * block the main forwarding loop */
> 
> 	 * ... loop.
> 	 */
> 
> >  	fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC);
> >  	if (fd < 0) {
> >  		if (errno != EAGAIN)
> > diff --git a/repair.c b/repair.c
> > index 42c4ae97..8a2d119d 100644
> > --- a/repair.c
> > +++ b/repair.c
> > @@ -99,6 +99,10 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
> >  		return EEXIST;
> >  	}
> >  
> > +	/* We want accepted socket to be blocking; we use it during migration
> 
> "the accepted socket"
> 
> > +	 * which is a synchronous interruption to our normal non-blocking
> > +	 * behaviour.
> > +	 */
> >  	if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL,
> >  				    SOCK_CLOEXEC)) < 0) {
> >  		if ((rc = errno) != EAGAIN)
> > diff --git a/tap.c b/tap.c
> > index fda2da9b..3b8a3f3d 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -1490,7 +1490,8 @@ void tap_listen_handler(struct ctx *c, uint32_t events)
> >  		return;
> >  	}
> >  
> > -	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
> > +	c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL,
> > +			    SOCK_NONBLOCK | SOCK_CLOEXEC);
> 
> ...so this part would need to be dropped. 
> 
> >  	if (c->fd_tap < 0) {
> >  		if (errno != EAGAIN)
> >  			warn_perror("Error accepting tap client");
> 
> The rest looks good to me.
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-18  2:28     ` David Gibson
@ 2026-05-20  0:37       ` Stefano Brivio
  2026-05-20  1:04         ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2026-05-20  0:37 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Mon, 18 May 2026 12:28:57 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
> > On Wed, 13 May 2026 14:14:21 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > > seems to be generally accepted security best practice these days, and we
> > > never fork(), so certainly have no need to pass fds to children.  
> > 
> > But we do clone() with CLONE_FILES (even though when we clone() to call
> > execvp() later, we don't set CLONE_FILES), so, even though I don't see
> > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> > automatic from the fact we don't fork().  
> 
> So, I did think about that when  wrote it, but went for the short
> version rather than saying clone() with CLONE_FILES doesn't count.
> 
> Now, I realised that we've both fallen for the trap again, forgetting
> that this has nothing to do with fork() or clone() and is, as it says
> right there in the name, about exec().

No, wait, I didn't fall for it, not this time. :) That's why I was
mentioning that when we call clone() and execvp() later (which would be
the only path that matters), we don't set CLONE_FILES anyway.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-20  0:37       ` Stefano Brivio
@ 2026-05-20  1:04         ` David Gibson
  2026-05-20 11:36           ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-20  1:04 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 1863 bytes --]

On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
> On Mon, 18 May 2026 12:28:57 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
> > > On Wed, 13 May 2026 14:14:21 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > > > seems to be generally accepted security best practice these days, and we
> > > > never fork(), so certainly have no need to pass fds to children.  
> > > 
> > > But we do clone() with CLONE_FILES (even though when we clone() to call
> > > execvp() later, we don't set CLONE_FILES), so, even though I don't see
> > > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> > > automatic from the fact we don't fork().  
> > 
> > So, I did think about that when  wrote it, but went for the short
> > version rather than saying clone() with CLONE_FILES doesn't count.
> > 
> > Now, I realised that we've both fallen for the trap again, forgetting
> > that this has nothing to do with fork() or clone() and is, as it says
> > right there in the name, about exec().
> 
> No, wait, I didn't fall for it, not this time. :) That's why I was
> mentioning that when we call clone() and execvp() later (which would be

Uh...?  I'm pretty sure the only execve(2) in the entire program is
where we spawn passt.avx2.  That's essentially the very first thing we
do, long before this point.

> the only path that matters), we don't set CLONE_FILES anyway.

CLONE_FILES is irrelevant, it's lost during execve(2).

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-20  1:04         ` David Gibson
@ 2026-05-20 11:36           ` Stefano Brivio
  2026-05-20 12:52             ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2026-05-20 11:36 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 20 May 2026 11:04:58 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
> > On Mon, 18 May 2026 12:28:57 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:  
> > > > On Wed, 13 May 2026 14:14:21 +1000
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > > > > seems to be generally accepted security best practice these days, and we
> > > > > never fork(), so certainly have no need to pass fds to children.    
> > > > 
> > > > But we do clone() with CLONE_FILES (even though when we clone() to call
> > > > execvp() later, we don't set CLONE_FILES), so, even though I don't see
> > > > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> > > > automatic from the fact we don't fork().    
> > > 
> > > So, I did think about that when  wrote it, but went for the short
> > > version rather than saying clone() with CLONE_FILES doesn't count.
> > > 
> > > Now, I realised that we've both fallen for the trap again, forgetting
> > > that this has nothing to do with fork() or clone() and is, as it says
> > > right there in the name, about exec().  
> > 
> > No, wait, I didn't fall for it, not this time. :) That's why I was
> > mentioning that when we call clone() and execvp() later (which would be  
> 
> Uh...?  I'm pretty sure the only execve(2) in the entire program is
> where we spawn passt.avx2.  That's essentially the very first thing we
> do, long before this point.

Well, grep would beg to differ, as we don't call execve() at all, but:

$ grep execv *.c | grep -v qrap
arch.c:		execv(new_path, argv);
pasta.c:	execvp(a->exe, a->argv);

O_CLOEXEC (or lack thereof) also matters on execvp().

> > the only path that matters), we don't set CLONE_FILES anyway.  
> 
> CLONE_FILES is irrelevant, it's lost during execve(2).

Yes, but if you first clone(), which we actually do before calling
pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would*
matter, because the cloned process would inherit the open files, and
the process started by execvp() would then get those files as well.

But as I was mentioning in that path we don't use CLONE_FILES anyway,
so that's not relevant.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-20 11:36           ` Stefano Brivio
@ 2026-05-20 12:52             ` David Gibson
  2026-05-20 14:22               ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2026-05-20 12:52 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3652 bytes --]

On Wed, May 20, 2026 at 01:36:48PM +0200, Stefano Brivio wrote:
> On Wed, 20 May 2026 11:04:58 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
> > > On Mon, 18 May 2026 12:28:57 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:  
> > > > > On Wed, 13 May 2026 14:14:21 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > > > > > seems to be generally accepted security best practice these days, and we
> > > > > > never fork(), so certainly have no need to pass fds to children.    
> > > > > 
> > > > > But we do clone() with CLONE_FILES (even though when we clone() to call
> > > > > execvp() later, we don't set CLONE_FILES), so, even though I don't see
> > > > > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> > > > > automatic from the fact we don't fork().    
> > > > 
> > > > So, I did think about that when  wrote it, but went for the short
> > > > version rather than saying clone() with CLONE_FILES doesn't count.
> > > > 
> > > > Now, I realised that we've both fallen for the trap again, forgetting
> > > > that this has nothing to do with fork() or clone() and is, as it says
> > > > right there in the name, about exec().  
> > > 
> > > No, wait, I didn't fall for it, not this time. :) That's why I was
> > > mentioning that when we call clone() and execvp() later (which would be  
> > 
> > Uh...?  I'm pretty sure the only execve(2) in the entire program is
> > where we spawn passt.avx2.  That's essentially the very first thing we
> > do, long before this point.
> 
> Well, grep would beg to differ, as we don't call execve() at all, but:

I meant the system call execve(2), which execv() and execvp() are
library wrappers around.

> $ grep execv *.c | grep -v qrap
> arch.c:		execv(new_path, argv);
> pasta.c:	execvp(a->exe, a->argv);

Ah, I did miss the one in pasta_spawn_cmd().  Of course, we definitely
don't want to leak our internal fds into the spawned command, so
CLOEXEC is what we want.

> O_CLOEXEC (or lack thereof) also matters on execvp().
> 
> > > the only path that matters), we don't set CLONE_FILES anyway.  
> > 
> > CLONE_FILES is irrelevant, it's lost during execve(2).
> 
> Yes, but if you first clone(), which we actually do before calling
> pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would*
> matter, because the cloned process would inherit the open files, and
> the process started by execvp() would then get those files as well.

No, it doesn't matter.  If you clone() without CLONE_FILES, the new
thread/process gets a copy of the handles, which do or don't survive
exec() depending on O_CLOEXEC.  If you clone with CLONE_FILES, the new
process shares the fd table.  The fd table is unshared again as part
of the exec().

From execve(2):
>        •  The file descriptor table is unshared, undoing the effect of the CLONE_FILES flag of clone(2).

.. then the now copied files do or don't survive depending on
O_CLOEXEC.  Either way, O_CLOEXEC has the same final effect.

> But as I was mentioning in that path we don't use CLONE_FILES anyway,
> so that's not relevant.
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it
  2026-05-20 12:52             ` David Gibson
@ 2026-05-20 14:22               ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2026-05-20 14:22 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 20 May 2026 22:52:47 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 20, 2026 at 01:36:48PM +0200, Stefano Brivio wrote:
> > On Wed, 20 May 2026 11:04:58 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:  
> > > > On Mon, 18 May 2026 12:28:57 +1000
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:    
> > > > > > On Wed, 13 May 2026 14:14:21 +1000
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > Generally we try to set the O_CLOEXEC flag on every fd we create.  This
> > > > > > > seems to be generally accepted security best practice these days, and we
> > > > > > > never fork(), so certainly have no need to pass fds to children.      
> > > > > > 
> > > > > > But we do clone() with CLONE_FILES (even though when we clone() to call
> > > > > > execvp() later, we don't set CLONE_FILES), so, even though I don't see
> > > > > > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be
> > > > > > automatic from the fact we don't fork().      
> > > > > 
> > > > > So, I did think about that when  wrote it, but went for the short
> > > > > version rather than saying clone() with CLONE_FILES doesn't count.
> > > > > 
> > > > > Now, I realised that we've both fallen for the trap again, forgetting
> > > > > that this has nothing to do with fork() or clone() and is, as it says
> > > > > right there in the name, about exec().    
> > > > 
> > > > No, wait, I didn't fall for it, not this time. :) That's why I was
> > > > mentioning that when we call clone() and execvp() later (which would be    
> > > 
> > > Uh...?  I'm pretty sure the only execve(2) in the entire program is
> > > where we spawn passt.avx2.  That's essentially the very first thing we
> > > do, long before this point.  
> > 
> > Well, grep would beg to differ, as we don't call execve() at all, but:  
> 
> I meant the system call execve(2), which execv() and execvp() are
> library wrappers around.

Oops, I missed the (2).

> > $ grep execv *.c | grep -v qrap
> > arch.c:		execv(new_path, argv);
> > pasta.c:	execvp(a->exe, a->argv);  
> 
> Ah, I did miss the one in pasta_spawn_cmd().  Of course, we definitely
> don't want to leak our internal fds into the spawned command, so
> CLOEXEC is what we want.
> 
> > O_CLOEXEC (or lack thereof) also matters on execvp().
> >   
> > > > the only path that matters), we don't set CLONE_FILES anyway.    
> > > 
> > > CLONE_FILES is irrelevant, it's lost during execve(2).  
> > 
> > Yes, but if you first clone(), which we actually do before calling
> > pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would*
> > matter, because the cloned process would inherit the open files, and
> > the process started by execvp() would then get those files as well.  
> 
> No, it doesn't matter.  If you clone() without CLONE_FILES, the new
> thread/process gets a copy of the handles, which do or don't survive
> exec() depending on O_CLOEXEC.  If you clone with CLONE_FILES, the new
> process shares the fd table.  The fd table is unshared again as part
> of the exec().
> 
> From execve(2):
> >        •  The file descriptor table is unshared, undoing the effect of the CLONE_FILES flag of clone(2).  
> 
> .. then the now copied files do or don't survive depending on
> O_CLOEXEC.  Either way, O_CLOEXEC has the same final effect.

Ah, right, I forgot about this part. But anyway, O_CLOEXEC is always
relevant, as long as we call execvp() not right after start, regardless
of having called clone() before.

And in this case (pasta_spawn_cmd()) we call it rather "late", so the
fact we don't call fork() is not really relevant for this purpose.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-05-20 14:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-13  4:14 [PATCH 0/3] More caution with NONBLOCK flag on Unix sockets David Gibson
2026-05-13  4:14 ` [PATCH 1/3] treewide: Add SOCK_CLOEXEC to accept() calls that are missing it David Gibson
2026-05-16 15:46   ` Stefano Brivio
2026-05-18  2:28     ` David Gibson
2026-05-20  0:37       ` Stefano Brivio
2026-05-20  1:04         ` David Gibson
2026-05-20 11:36           ` Stefano Brivio
2026-05-20 12:52             ` David Gibson
2026-05-20 14:22               ` Stefano Brivio
2026-05-13  4:14 ` [PATCH 2/3] conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets David Gibson
2026-05-13  5:51   ` David Gibson
2026-05-16 15:46     ` Stefano Brivio
2026-05-18  2:40       ` David Gibson
2026-05-13  4:14 ` [PATCH 3/3] conf, repair, tap: More caution about blocking flag " David Gibson
2026-05-16 15:46   ` Stefano Brivio
2026-05-18  2:50     ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).