* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: David Gibson @ 2024-08-14 6:39 UTC (permalink / raw)
To: Matt Hamilton; +Cc: passt-user
On Tue, Aug 13, 2024 at 10:58:42PM -0700, Matt Hamilton wrote:
> I am using Podman in Fedora 40, which uses pasta by default for rootless
> container networking.
>
> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but
> recently two newer versions were released,
> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
>
> After upgrading, one pod kept going offline after a few minutes. The
> containers remained running, but could not make outbound connections.
> Journalctl revealed that the pasta process for the pod had crashed with:
>
> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> && side->eport != 0 && side->fport != 0
Ouch.
> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> gid=1000 ses=1
> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> gid=1000 ses=1
> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
>
> After much debugging, I isolated the trigger to a particular container
> making a peer-to-peer TCP connection to a remote address with port 0.
Huh.
> Reverting passt to version 20240326 works as expected, and the container
> stays online. It's been a long time since I wrote any C, but the code seems
> clear and checks that the endpoint and forwarding ports do not equal 0. I
> assume that a port 0 connection is not realistic or useful, and that actual
> attempts to connect over this port indicate a bug in the client code. Is this
> correct?
So, AFAICT the RFCs don't preclude using port 0 for connections on the
wire. However, it's usually not really sensible to do so: at least on
systems with a BSD-like socket interface, a port of 0 usually means
"unspecified" or "kernel, please pick for me". Obviously this client
is making it happen - my guess would be that a 0 port in connect() is
interpreted as a literal port 0, but I'm not sure how the server is
receiving it in this case, since a bind() with port 0 will cause the
kernel to pick a port.
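To illustrate the bind() half of that, here is a minimal Python sketch. It is
not pasta code; the loopback address and the helper name `bound_port` are
assumptions made up for the example. It asks for port 0 and reads back what
the kernel actually assigned:

```python
import socket


def bound_port(requested: int = 0) -> int:
    """Bind a TCP socket on loopback and return the port the kernel assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 here means "kernel, please pick for me", not a literal port 0.
        s.bind(("127.0.0.1", requested))
        # getsockname() reports the address and port the socket really got.
        return s.getsockname()[1]


if __name__ == "__main__":
    print("asked for port 0, got port", bound_port())
```

A listener set up this way never ends up on port 0, which is why it is
unclear how a server could be receiving such a connection.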
So, it does look like the client is doing something weird, although
whether it's technically invalid is debatable.
Even if it is valid for the client to do this, pasta can't really
handle that case, because it's using the sockets interface to do the
forwarding. BUT, it absolutely should not be crashing - it should log
a debug message, drop the connection and carry on.
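As a rough sketch of that intended behaviour (hypothetical names, written in
Python for brevity, and not taken from the pasta sources), a forwarder can
treat a zero port as a reason to refuse the connection rather than as a
fatal condition:

```python
import logging

log = logging.getLogger("forwarder")  # hypothetical logger name


def should_forward(src_port: int, dst_port: int) -> bool:
    """Return False, after a debug message, for connections we cannot forward."""
    if src_port == 0 or dst_port == 0:
        # The sockets API cannot express a literal port 0, so drop the
        # connection and keep running instead of aborting the process.
        log.debug("dropping connection with port 0 (src=%d, dst=%d)",
                  src_port, dst_port)
        return False
    return True
```

The caller would simply skip the connection when this returns False and go
on to handle the next packet.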
We have code which is supposed to handle this case gracefully before
reaching that assertion. I'm not immediately sure why that's not working.
One possibility is that the client _isn't_ doing something weird, but
an unusual port forwarding configuration on pasta is remapping a
sensible port to port 0, thus causing the crash.
Getting the full podman command line for the failing container would
be the next step here. If you could file a bug at
https://bugs.passt.top that would be most helpful.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: Stefano Brivio @ 2024-08-14 6:40 UTC (permalink / raw)
To: Matt Hamilton, David Gibson; +Cc: passt-user
Hi Matt,
On Tue, 13 Aug 2024 22:58:42 -0700
Matt Hamilton <matt@thmail.io> wrote:
> I am using Podman in Fedora 40, which uses pasta by default for rootless
> container networking.
>
> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`,
> but recently two newer versions were released,
> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
>
> After upgrading, one pod kept going offline after a few minutes. The
> containers remained running, but could not make outbound connections.
> Journalctl revealed that the pasta process for the pod had crashed with:
>
> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> && side->eport != 0 && side->fport != 0
> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> gid=1000 ses=1
> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> gid=1000 ses=1
> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
>
> After much debugging, I isolated the trigger to a particular container
> making a peer-to-peer TCP connection to a remote address with port 0.
Thanks for the analysis and for the report!
> Reverting passt to version 20240326 works as expected, and the container
> stays online. It's been a long time since I wrote any C, but the code
> seems clear and checks that the endpoint and forwarding ports do not
> equal 0. I assume that a port 0 connection is not realistic or useful,
> and that actual attempts to connect over this port indicate a bug in the
> client code. Is this correct?
Right, that's somewhat unexpected because TCP port zero is reserved
and not assigned, so it should never be used. However, I'm not sure how
we can even reach flow_hash() with it.
David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP
specific hash function with general flow hash"), any clue?
--
Stefano
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: Matt Hamilton @ 2024-08-14 6:56 UTC (permalink / raw)
To: David Gibson; +Cc: passt-user
On 8/13/24 11:39 PM, David Gibson wrote:
> On Tue, Aug 13, 2024 at 10:58:42PM -0700, Matt Hamilton wrote:
>> I am using Podman in Fedora 40, which uses pasta by default for rootless
>> container networking.
>>
>> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but
>> recently two newer versions were released,
>> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
>>
>> After upgrading, one pod kept going offline after a few minutes. The
>> containers remained running, but could not make outbound connections.
>> Journalctl revealed that the pasta process for the pod had crashed with:
>>
>> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
>> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
>> && side->eport != 0 && side->fport != 0
> Ouch.
>
>> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
>> gid=1000 ses=1
>> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
>> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
>> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
>> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
>> gid=1000 ses=1
>> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
>> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
>>
>> After much debugging, I isolated the trigger to a particular container
>> making a peer-to-peer TCP connection to a remote address with port 0.
> Huh.
>
>> Reverting passt to version 20240326 works as expected, and the container
>> stays online. It's been a long time since I wrote any C, but the code seems
>> clear and checks that the endpoint and forwarding ports do not equal 0. I
>> assume that a port 0 connection is not realistic or useful, and that actual
>> attempts to connect over this port indicate a bug in the client code. Is this
>> correct?
> So, AFAICT the RFCs don't preclude using port 0 for connections on the
> wire. However, it's usually not really sensible to do so: at least on
> systems with a BSD-like socket interface, a port of 0 usually means
> "unspecified" or "kernel, please pick for me". Obviously this client
> is making it happen - my guess would be that a 0 port in connect() is
> interpreted as a literal port 0, but I'm not sure how the server is
> receiving it in this case, since a bind() with port 0 will cause the
> kernel to pick a port.
>
> So, it does look like the client is doing something weird, although
> whether it's technically invalid is debatable.
>
> Even if it is valid for the client to do this, pasta can't really
> handle that case, because it's using the sockets interface to do the
> forwarding. BUT, it absolutely should not be crashing - it should log
> a debug message, drop the connection and carry on.
>
> We have code which is supposed to handle this case gracefully before
> reaching that assertion. I'm not immediately sure why that's not working.
>
> One possibility is that the client _isn't_ doing something weird, but
> an unusual port forwarding configuration on pasta is remapping a
> sensible port to port 0, thus causing the crash.
>
> Getting the full podman command line for the failing container would
> be the next step here. If you could file a bug at
> https://bugs.passt.top that would be most helpful.
I tried to make an account on bugzilla a day or two ago, but haven't
received the email confirmation link - I tried signing up using my
personal domain (used here) and a free service (gmail). I came here as a
second attempt to reach the devs!
If you can get me hooked up over there, I can file a bug with more
detailed logs and the podman command to reproduce.
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: Stefano Brivio @ 2024-08-14 7:01 UTC (permalink / raw)
To: Matt Hamilton; +Cc: David Gibson, passt-user
On Tue, 13 Aug 2024 23:56:56 -0700
Matt Hamilton <matt@thmail.io> wrote:
> On 8/13/24 11:39 PM, David Gibson wrote:
> > On Tue, Aug 13, 2024 at 10:58:42PM -0700, Matt Hamilton wrote:
> >> I am using Podman in Fedora 40, which uses pasta by default for rootless
> >> container networking.
> >>
> >> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but
> >> recently two newer versions were released,
> >> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
> >>
> >> After upgrading, one pod kept going offline after a few minutes. The
> >> containers remained running, but could not make outbound connections.
> >> Journalctl revealed that the pasta process for the pod had crashed with:
> >>
> >> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> >> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> >> && side->eport != 0 && side->fport != 0
> > Ouch.
> >
> >> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> >> gid=1000 ses=1
> >> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> >> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> >> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> >> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> >> gid=1000 ses=1
> >> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> >> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
> >>
> >> After much debugging, I isolated the trigger to a particular container
> >> making a peer-to-peer TCP connection to a remote address with port 0.
> > Huh.
> >
> >> Reverting passt to version 20240326 works as expected, and the container
> >> stays online. It's been a long time since I wrote any C, but the code seems
> >> clear and checks that the endpoint and forwarding ports do not equal 0. I
> >> assume that a port 0 connection is not realistic or useful, and that actual
> >> attempts to connect over this port indicate a bug in the client code. Is this
> >> correct?
> > So, AFAICT the RFCs don't preclude using port 0 for connections on the
> > wire. However, it's usually not really sensible to do so: at least on
> > systems with a BSD-like socket interface, a port of 0 usually means
> > "unspecified" or "kernel, please pick for me". Obviously this client
> > is making it happen - my guess would be that a 0 port in connect() is
> > interpreted as a literal port 0, but I'm not sure how the server is
> > receiving it in this case, since a bind() with port 0 will cause the
> > kernel to pick a port.
> >
> > So, it does look like the client is doing something weird, although
> > whether it's technically invalid is debatable.
> >
> > Even if it is valid for the client to do this, pasta can't really
> > handle that case, because it's using the sockets interface to do the
> > forwarding. BUT, it absolutely should not be crashing - it should log
> > a debug message, drop the connection and carry on.
> >
> > We have code which is supposed to handle this case gracefully before
> > reaching that assertion. I'm not immediately sure why that's not working.
> >
> > One possibility is that the client _isn't_ doing something weird, but
> > an unusual port forwarding configuration on pasta is remapping a
> > sensible port to port 0, thus causing the crash.
> >
> > Getting the full podman command line for the failing container would
> > be the next step here. If you could file a bug at
> > https://bugs.passt.top that would be most helpful.
>
> I tried to make an account on bugzilla a day or two ago, but haven't
> received the email confirmation link - I tried signing up using my
> personal domain (used here) and a free service (gmail). I came here as a
> second attempt to reach the devs!
Sorry, my bad, I'm temporarily reviewing email confirmation requests
because of an influx of spam and missed yours. You should have it now.
--
Stefano
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: Stefano Brivio @ 2024-08-14 9:57 UTC (permalink / raw)
To: Matt Hamilton; +Cc: David Gibson, passt-user
On Tue, 13 Aug 2024 23:56:56 -0700
Matt Hamilton <matt@thmail.io> wrote:
> On 8/13/24 11:39 PM, David Gibson wrote:
> > On Tue, Aug 13, 2024 at 10:58:42PM -0700, Matt Hamilton wrote:
> >> I am using Podman in Fedora 40, which uses pasta by default for rootless
> >> container networking.
> >>
> >> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but
> >> recently two newer versions were released,
> >> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
> >>
> >> After upgrading, one pod kept going offline after a few minutes. The
> >> containers remained running, but could not make outbound connections.
> >> Journalctl revealed that the pasta process for the pod had crashed with:
> >>
> >> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> >> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> >> && side->eport != 0 && side->fport != 0
> > Ouch.
> >
> >> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> >> gid=1000 ses=1
> >> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> >> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> >> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> >> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> >> gid=1000 ses=1
> >> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> >> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
> >>
> >> After much debugging, I isolated the trigger to a particular container
> >> making a peer-to-peer TCP connection to a remote address with port 0.
> > Huh.
> >
> >> Reverting passt to version 20240326 works as expected, and the container
> >> stays online. It's been a long time since I wrote any C, but the code seems
> >> clear and checks that the endpoint and forwarding ports do not equal 0. I
> >> assume that a port 0 connection is not realistic or useful, and that actual
> >> attempts to connect over this port indicate a bug in the client code. Is this
> >> correct?
> > So, AFAICT the RFCs don't preclude using port 0 for connections on the
> > wire. However, it's usually not really sensible to do so: at least on
> > systems with a BSD-like socket interface, a port of 0 usually means
> > "unspecified" or "kernel, please pick for me". Obviously this client
> > is making it happen - my guess would be that a 0 port in connect() is
> > interpreted as a literal port 0, but I'm not sure how the server is
> > receiving it in this case, since a bind() with port 0 will cause the
> > kernel to pick a port.
> >
> > So, it does look like the client is doing something weird, although
> > whether it's technically invalid is debatable.
> >
> > Even if it is valid for the client to do this, pasta can't really
> > handle that case, because it's using the sockets interface to do the
> > forwarding. BUT, it absolutely should not be crashing - it should log
> > a debug message, drop the connection and carry on.
> >
> > We have code which is supposed to handle this case gracefully before
> > reaching that assertion. I'm not immediately sure why that's not working.
> >
> > One possibility is that the client _isn't_ doing something weird, but
> > an unusual port forwarding configuration on pasta is remapping a
> > sensible port to port 0, thus causing the crash.
> >
> > Getting the full podman command line for the failing container would
> > be the next step here. If you could file a bug at
> > https://bugs.passt.top that would be most helpful.
>
> I tried to make an account on bugzilla a day or two ago, but haven't
> received the email confirmation link - I tried signing up using my
> personal domain (used here) and a free service (gmail). I came here as a
> second attempt to reach the devs!
>
> If you can get me hooked up over there, I can file a bug with more
> detailed logs and the podman command to reproduce.
I hope you got the Bugzilla confirmation request email by now, but
anyway, we just managed to reproduce this, and a fix is on its way, so
there's no need for you to collect more information. Thanks again!
--
Stefano
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: David Gibson @ 2024-08-14 10:01 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Matt Hamilton, passt-user
On Wed, Aug 14, 2024 at 08:40:22AM +0200, Stefano Brivio wrote:
> Hi Matt,
>
> On Tue, 13 Aug 2024 22:58:42 -0700
> Matt Hamilton <matt@thmail.io> wrote:
>
> > I am using Podman in Fedora 40, which uses pasta by default for rootless
> > container networking.
> >
> > Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`,
> > but recently two newer versions were released,
> > `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
> >
> > After upgrading, one pod kept going offline after a few minutes. The
> > containers remained running, but could not make outbound connections.
> > Journalctl revealed that the pasta process for the pod had crashed with:
> >
> > Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
> > (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
> > && side->eport != 0 && side->fport != 0
> > Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
> > gid=1000 ses=1
> > subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> > pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
> > arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
> > Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
> > gid=1000 ses=1
> > subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
> > pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
> >
> > After much debugging, I isolated the trigger to a particular container
> > making a peer-to-peer TCP connection to a remote address with port 0.
>
> Thanks for the analysis and for the report!
>
> > Reverting passt to version 20240326 works as expected, and the container
> > stays online. It's been a long time since I wrote any C, but the code
> > seems clear and checks that the endpoint and forwarding ports do not
> > equal 0. I assume that a port 0 connection is not realistic or useful,
> > and that actual attempts to connect over this port indicate a bug in the
> > client code. Is this correct?
>
> Right, that's somewhat unexpected because TCP port zero is reserved
> and not assigned, so it should never be used. However, I'm not sure how
> we can even reach flow_hash() with it.
>
> David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP
> specific hash function with general flow hash"), any clue?
Stefano reproduced, and I've found the issue. The assert was intended
to check that we never created flows with 0 port - and we don't.
Unfortunately it was also invoked when searching for an existing flow
matching a new packet.
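To make the distinction concrete, here is a small Python model of the
problem. It is only a sketch under assumptions: `FlowTable`,
`flow_key_checked` and the tuple key are invented for illustration and are
not pasta's data structures, and the real patch may take a different shape.

```python
from typing import Dict, Optional, Tuple

# (endpoint address, endpoint port, forwarding port); simplified on purpose
FlowKey = Tuple[str, int, int]


def flow_key_checked(eaddr: str, eport: int, fport: int) -> FlowKey:
    """Key for *creating* a flow: a zero port here is a programming error."""
    assert eport != 0 and fport != 0, "flows are never created with port 0"
    return (eaddr, eport, fport)


class FlowTable:
    def __init__(self) -> None:
        self._flows: Dict[FlowKey, str] = {}

    def insert(self, eaddr: str, eport: int, fport: int, flow: str) -> None:
        # Creation path: our own code controls the values, so asserting is fine.
        self._flows[flow_key_checked(eaddr, eport, fport)] = flow

    def lookup_buggy(self, eaddr: str, eport: int, fport: int) -> Optional[str]:
        # Lookup path sharing the checked key: a packet from the wire that
        # carries port 0 trips the assertion instead of just not matching.
        return self._flows.get(flow_key_checked(eaddr, eport, fport))

    def lookup(self, eaddr: str, eport: int, fport: int) -> Optional[str]:
        # Lookup on untrusted input: port 0 simply matches no existing flow.
        return self._flows.get((eaddr, eport, fport))
```

With one flow inserted, `lookup("192.0.2.1", 0, 443)` returns None, while
`lookup_buggy()` with the same arguments raises AssertionError, which is the
analogue of the crash reported here.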
Patch coming shortly. Note that this will fix the crash, but it still
won't permit the connection to port 0 to go through. I don't know if
that will allow your application to run, or whether it relies on that
port 0 connection.
Actually allowing the connection to go through would be much harder.
It's easy to remove the explicit checks, obviously, but making sure we
never pass that 0 to an API where it doesn't mean what we want it to
would require some time.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: Pasta 20240726 and newer crash with ASSERTION FAILED in flow_hash
From: Matt Hamilton @ 2024-08-14 17:22 UTC (permalink / raw)
To: David Gibson, Stefano Brivio; +Cc: passt-user
On 8/14/24 3:01 AM, David Gibson wrote:
> On Wed, Aug 14, 2024 at 08:40:22AM +0200, Stefano Brivio wrote:
>> Hi Matt,
>>
>> On Tue, 13 Aug 2024 22:58:42 -0700
>> Matt Hamilton <matt@thmail.io> wrote:
>>
>>> I am using Podman in Fedora 40, which uses pasta by default for rootless
>>> container networking.
>>>
>>> Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`,
>>> but recently two newer versions were released,
>>> `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
>>>
>>> After upgrading, one pod kept going offline after a few minutes. The
>>> containers remained running, but could not make outbound connections.
>>> Journalctl revealed that the pasta process for the pod had crashed with:
>>>
>>> Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
>>> (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
>>> && side->eport != 0 && side->fport != 0
>>> Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
>>> gid=1000 ses=1
>>> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
>>> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
>>> arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
>>> Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
>>> gid=1000 ses=1
>>> subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
>>> pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
>>>
>>> After much debugging, I isolated the trigger to a particular container
>>> making a peer-to-peer TCP connection to a remote address with port 0.
>> Thanks for the analysis and for the report!
>>
>>> Reverting passt to version 20240326 works as expected, and the container
>>> stays online. It's been a long time since I wrote any C, but the code
>>> seems clear and checks that the endpoint and forwarding ports do not
>>> equal 0. I assume that a port 0 connection is not realistic or useful,
>>> and that actual attempts to connect over this port indicate a bug in the
>>> client code. Is this correct?
>> Right, that's somewhat unexpected because TCP port zero is reserved
>> and not assigned, so it should never be used. However, I'm not sure how
>> we can even reach flow_hash() with it.
>>
>> David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP
>> specific hash function with general flow hash"), any clue?
> Stefano reproduced, and I've found the issue. The assert was intended
> to check that we never created flows with 0 port - and we don't.
> Unfortunately it was also invoked when searching for an existing flow
> matching a new packet.
>
> Patch coming shortly. Note that this will fix the crash, but it still
> won't permit the connection to port 0 to go through. I don't know if
> that will allow your application to run, or whether it relies on that
> port 0 connection.
>
> Actually allowing the connection to go through would be much harder.
> It's easy to remove the explicit checks, obviously, but making sure we
> never pass that 0 to an API where it doesn't mean what we want it to
> would require some time.
>
Thank you gents! I will pull the git repo and test a local build with
the patch applied.
I agree with your approach: pasta shouldn't be responsible for rewiring
a badly formatted connection request. The zero port destination address
is nonsensical so I'll keep running down the issue with the client devs
- I think they should be discarding the address instead of letting it
propagate to ultimately fail at the networking layer.