public inbox for passt-dev@passt.top
* [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: Stefano Brivio @ 2025-12-08 21:28 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

We're somehow hitting:

  ASSERTION FAILED in flow_epollfd (flow.c:362): f->epollid < ((1 << 8) - 1)

on an inbound spliced connection, with a single forwarded port, an
HTTP server in a Podman container, and a GET request. Reproducer at
https://bodhi.fedoraproject.org/updates/FEDORA-2025-93b4eb64c3#comment-4473411

  printf 'FROM registry.fedoraproject.org/fedora:latest\nRUN /usr/bin/dnf install -y httpd\nEXPOSE 80\nCMD ["-D", "FOREGROUND"]\nENTRYPOINT ["/usr/sbin/httpd"]\n' > Containerfile
  podman build -t fedora-httpd $(pwd)
  podman run -d -p 8080:80 localhost/fedora-httpd

  curl http://localhost:8080

I guess we don't set EPOLLFD_ID_DEFAULT early enough on inbound spliced
sockets for some reason and we get a socket event while we still have
EPOLLFD_ID_INVALID set.

As we're not really using epoll identifiers yet, set
EPOLLFD_ID_DEFAULT right away on newly allocated flows, while we
figure this out.

Link: https://bodhi.fedoraproject.org/updates/FEDORA-2025-93b4eb64c3#comment-4473411
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
I just merged this, posting for awareness / review.

 flow.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/flow.c b/flow.c
index 8d72965..f1bde9a 100644
--- a/flow.c
+++ b/flow.c
@@ -382,7 +382,8 @@ void flow_epollid_set(struct flow_common *f, int epollid)
  */
 void flow_epollid_clear(struct flow_common *f)
 {
-	f->epollid = EPOLLFD_ID_INVALID;
+	/* FIXME: Use EPOLLFD_ID_INVALID instead once it's safe to do so */
+	f->epollid = EPOLLFD_ID_DEFAULT;
 }
 
 /**
-- 
2.43.0


* Re: [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: Stefano Brivio @ 2025-12-08 21:54 UTC (permalink / raw)
  To: passt-dev; +Cc: David Gibson, Laurent Vivier

On Mon,  8 Dec 2025 22:28:22 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> [...]
> I just merged this, posting for awareness / review.

Ah, never mind, this makes it worse somehow:

5.6384: Flow 0 (TCP connection (spliced)): SPLICE_CONNECT
5.6384: Flow 0 (TCP connection (spliced)): ERROR on epoll_ctl(): No such file or directory

...still looking for a workaround / fix.

-- 
Stefano


* Re: [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: David Gibson @ 2025-12-08 23:36 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

On Mon, Dec 08, 2025 at 10:54:00PM +0100, Stefano Brivio wrote:
> On Mon,  8 Dec 2025 22:28:22 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > [...]
> 
> Ah, never mind, this makes it worse somehow:
> 
> 5.6384: Flow 0 (TCP connection (spliced)): SPLICE_CONNECT
> 5.6384: Flow 0 (TCP connection (spliced)): ERROR on epoll_ctl(): No such file or directory

Does this imply you managed to reproduce locally?  You hadn't as of
your comment a few after the one linked.  I also haven't managed to
reproduce this.

> ...still looking for a workaround / fix.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

* Re: [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: Stefano Brivio @ 2025-12-08 23:46 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Tue, 9 Dec 2025 10:36:01 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Dec 08, 2025 at 10:54:00PM +0100, Stefano Brivio wrote:
> > On Mon,  8 Dec 2025 22:28:22 +0100
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> > > [...]
> > 
> > Ah, never mind, this makes it worse somehow:
> > 
> > 5.6384: Flow 0 (TCP connection (spliced)): SPLICE_CONNECT
> > 5.6384: Flow 0 (TCP connection (spliced)): ERROR on epoll_ctl(): No such file or directory  
> 
> Does this imply you managed to reproduce locally?  You hadn't as of
> your comment a few after the one linked.  I also haven't managed to
> reproduce this.

Just simulate an error (that's not EINPROGRESS) on connect() in
tcp_splice_connect(). Patch coming.

-- 
Stefano


* Re: [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: David Gibson @ 2025-12-09  0:01 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev, Laurent Vivier

On Mon, Dec 08, 2025 at 10:54:00PM +0100, Stefano Brivio wrote:
> On Mon,  8 Dec 2025 22:28:22 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > [...]
> 
> Ah, never mind, this makes it worse somehow:
> 
> 5.6384: Flow 0 (TCP connection (spliced)): SPLICE_CONNECT
> 5.6384: Flow 0 (TCP connection (spliced)): ERROR on epoll_ctl(): No such file or directory

This makes sense: epollid != EPOLLFD_ID_INVALID indicates that the
flow's fds are already in the epoll set (flow_in_epoll() will return
true).  With epollid initialised to EPOLLFD_ID_DEFAULT, we'll attempt
EPOLL_CTL_MOD on the very first tcp_splice_epoll_ctl(), having never
added the fds to the epoll set, hence the ENOENT ("No such file or
directory") error.

> ...still looking for a workaround / fix.

Could the flow - for some other reason - be closing almost
immediately, before it even adds itself to the epoll?  If that's the
case, we could potentially trigger this in the (flag == CLOSING)
section of conn_flag_do().

I haven't managed to reproduce, so I can't test this myself.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

* Re: [PATCH] flow: Set EPOLLFD_ID_DEFAULT on newly allocated flows, not EPOLLFD_ID_INVALID
From: Stefano Brivio @ 2025-12-09  0:05 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, Laurent Vivier

On Tue, 9 Dec 2025 11:01:27 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Could the flow - for some other reason - be closing almost
> immediately, before it even adds itself to the epoll?  If that's the
> case, we could potentially trigger this in the (flag == CLOSING)
> section of conn_flag_do().

Yes, that's what happens, see my previous email.

-- 
Stefano


Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
