On Wed, Aug 14, 2024 at 08:40:22AM +0200, Stefano Brivio wrote:
> Hi Matt,
>
> On Tue, 13 Aug 2024 22:58:42 -0700
> Matt Hamilton wrote:
>
> > I am using Podman in Fedora 40, which uses pasta by default for rootless
> > container networking.
> >
> > Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`,
> > but recently two newer versions were released,
> > `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
> >
> > After upgrading, one pod kept going offline after a few minutes. The
> > containers remained running, but could not make outbound connections.
> > Journalctl revealed that the pasta process for the pod had crashed with:
> >
> > Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> > (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> > && side->eport != 0 && side->fport != 0
> > Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> > gid=1000 ses=1
> > subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> > pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> > arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> > Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> > gid=1000 ses=1
> > subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> > pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
> >
> > After much debugging, I isolated the trigger to a particular container
> > making a peer-to-peer TCP connection to a remote address with port 0.
>
> Thanks for the analysis and for the report!
>
> > Reverting passt to version 20240326 works as expected, and the container
> > stays online. It's been a long time since I wrote any C, but the code
> > seems clear and checks that the endpoint and forwarding ports do not
> > equal 0. I assume that a port 0 connection is not realistic or useful,
> > and that actual attempt to connect over this port indicate a bug in the
> > client code. Is this correct?
>
> Right, that's somehow unexpected because TCP port zero is reserved
> and not assigned, so it should never be used. However, I'm not sure how
> we can even reach flow_hash() with it.
>
> David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP
> specific hash function with general flow hash"), any clue?

Stefano reproduced, and I've found the issue. The assert was intended
to check that we never created flows with 0 port - and we don't.
Unfortunately it was also invoked when searching for an existing flow
matching a new packet. Patch coming shortly.

Note that this will fix the crash, but it still won't permit the
connection to port 0 to go through. I don't know if that will allow
your application to run, or whether it relies on that port 0
connection.

Actually allowing the connection to go through would be much harder.
It's easy to remove the explicit checks, obviously, but making sure we
never pass that 0 to an API where it doesn't mean what we want it to
would require some time.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
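
For illustration only, here is a minimal sketch of the failure mode
described above. It is not the passt code and not the patch: the
struct, the table, and every function name (flow_key, key_hash,
flow_insert, flow_lookup) are hypothetical. It assumes a toy
open-addressed table keyed by address and ports. The point is that the
"no zero port" invariant is enforced only when a flow is created, while
the hash used for lookup accepts whatever a packet carries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* Hypothetical, simplified flow key (not the real passt structure). */
struct flow_key {
	uint32_t eaddr;	/* endpoint address, 0 means unspecified */
	uint16_t eport;	/* endpoint port */
	uint16_t fport;	/* forwarding port */
};

struct flow_table {
	struct flow_key slots[TABLE_SIZE];
	bool used[TABLE_SIZE];
};

/* Invariant for flows we create; packets off the wire may violate it. */
static bool key_is_valid(const struct flow_key *k)
{
	return k->eaddr != 0 && k->eport != 0 && k->fport != 0;
}

/* Plain hash with no assertion, so it is safe on untrusted keys. */
static unsigned key_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;	/* FNV-style multiplicative mix */

	h = (h ^ k->eaddr) * 16777619u;
	h = (h ^ k->eport) * 16777619u;
	h = (h ^ k->fport) * 16777619u;
	return h % TABLE_SIZE;
}

static bool key_eq(const struct flow_key *a, const struct flow_key *b)
{
	return a->eaddr == b->eaddr && a->eport == b->eport &&
	       a->fport == b->fport;
}

/* Creation path: refuse keys that break the invariant. */
static bool flow_insert(struct flow_table *t, const struct flow_key *k)
{
	if (!key_is_valid(k))
		return false;

	unsigned b = key_hash(k);
	for (unsigned i = 0; i < TABLE_SIZE; i++) {
		unsigned s = (b + i) % TABLE_SIZE;	/* linear probing */
		if (!t->used[s]) {
			t->slots[s] = *k;
			t->used[s] = true;
			return true;
		}
	}
	return false;	/* table full */
}

/* Lookup path: a key with port 0 simply finds nothing, no abort. */
static const struct flow_key *flow_lookup(const struct flow_table *t,
					  const struct flow_key *k)
{
	unsigned b = key_hash(k);

	for (unsigned i = 0; i < TABLE_SIZE; i++) {
		unsigned s = (b + i) % TABLE_SIZE;
		if (!t->used[s])
			return NULL;	/* empty slot ends the probe */
		if (key_eq(&t->slots[s], k))
			return &t->slots[s];
	}
	return NULL;
}
```

In this sketch the creation path returns false rather than asserting,
which is a choice made for the example. Looking up a key with eport 0
returns NULL, matching the behaviour described above: no crash, but no
flow for port 0 either.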