On Sat, May 16, 2026 at 05:46:16PM +0200, Stefano Brivio wrote: > On Wed, 13 May 2026 14:14:23 +1000 > David Gibson wrote: > > > Most of our operation is asynchronous, based on non-blocking fds handled > > in our epoll loop. However, our several Unix sockets (tap client, repair > > helper, control client) are all blocking fds after accept(). > > > > That's correct for the repair helper, and (for now) correct for the control > > client. However, the reasons for that might not be obvious, so add some > > extra comments giving the rationale. > > > > I don't believe it's correct for the tap client; having this socket be > > blocking means we could potentially block the main loop if we ever got a > > a spurious EPOLL{IN,OUT} event on the tap socket. Switch the tap socket > > to non-blocking for better robustness, and consistency with nearly every > > other fd we track. > > That socket needs to be blocking for the second usage we make of it in > tap_send_frames_passt(), that is, the one via write_remainder() without > MSG_DONTWAIT. Good point. I've dropped that change and adjusted the text to match. > > While a part of https://bugs.passt.top/show_bug.cgi?id=38 is solved > (there are no blocking reads left in tap_passt_input()), this isn't the > case for the writing side of it. > > Some nits below: > > > Signed-off-by: David Gibson > > --- > > conf.c | 6 ++++++ > > repair.c | 4 ++++ > > tap.c | 3 ++- > > 3 files changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/conf.c b/conf.c > > index dec43fca..dc85f0f8 100644 > > --- a/conf.c > > +++ b/conf.c > > @@ -2082,6 +2082,12 @@ static void conf_accept(struct ctx *c) > > int fd, rc; > > > > retry: > > + /* Currently we perform the configuration transaction more-or-less > > + * synchronously, so we want the accepted socket to be blocking. > > + * > > + * FIXME: We should make the configuration update asynchronous, like > > + * most of our operation, so a misbehaving configuration client can't > > + * block the main forwarding loop */ > > * ... loop. > */ > > > fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC); > > if (fd < 0) { > > if (errno != EAGAIN) > > diff --git a/repair.c b/repair.c > > index 42c4ae97..8a2d119d 100644 > > --- a/repair.c > > +++ b/repair.c > > @@ -99,6 +99,10 @@ int repair_listen_handler(struct ctx *c, uint32_t events) > > return EEXIST; > > } > > > > + /* We want accepted socket to be blocking; we use it during migration > > "the accepted socket" > > > + * which is a synchronous interruption to our normal non-blocking > > + * behaviour. > > + */ > > if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, > > SOCK_CLOEXEC)) < 0) { > > if ((rc = errno) != EAGAIN) > > diff --git a/tap.c b/tap.c > > index fda2da9b..3b8a3f3d 100644 > > --- a/tap.c > > +++ b/tap.c > > @@ -1490,7 +1490,8 @@ void tap_listen_handler(struct ctx *c, uint32_t events) > > return; > > } > > > > - c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC); > > + c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, > > + SOCK_NONBLOCK | SOCK_CLOEXEC); > > ...so this part would need to be dropped. > > > if (c->fd_tap < 0) { > > if (errno != EAGAIN) > > warn_perror("Error accepting tap client"); > > The rest looks good to me. > > -- > Stefano > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson