public inbox for passt-dev@passt.top
* passt crashes on CentOS Stream 9
@ 2022-10-21 17:05 Andrea Bolognani
  2022-10-21 18:48 ` Stefano Brivio
  2022-10-22  8:57 ` Stefano Brivio
  0 siblings, 2 replies; 5+ messages in thread
From: Andrea Bolognani @ 2022-10-21 17:05 UTC (permalink / raw)
  To: passt-dev

This should probably be filed on bugzilla but I can't be bothered
signing up for yet another service, sorry! O:-)


Short version: in a CentOS Stream 9 container, install the latest
build (0^20221015.gb3f3591-1) from the official COPR, then run

  $ passt --runas 65534 -e -t 1234
  Segmentation fault (core dumped)


Doing the same thing in a CentOS Stream 8 container doesn't result in
a crash, and the previous build (0^20220929.g06aa26f-1) is fine even
on CentOS Stream 9.


The backtrace produced by gdb doesn't look very illuminating, but
maybe it will make more sense to a developer:

  Starting program: /usr/bin/passt --runas 65534 -e -t 1234
  warning: Error disabling address space randomization: Operation not permitted
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  process 2856 is executing new program: /usr/bin/passt.avx2
  warning: Could not load shared library symbols for linux-vdso.so.1.
  Do you need "set solib-search-path" or "set sysroot"?
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  Program received signal SIGSEGV, Segmentation fault.
  0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false) at
/usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
  78	{
  (gdb) t a a bt

  Thread 1 (Thread 0x7fe8763da740 (LWP 2856) "passt.avx2"):
  #0  0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false)
at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
  #1  0x000055663d531296 in conf (c=<optimized out>, argc=<optimized
out>, argv=<optimized out>) at
/usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/conf.c:1547
  #2  0x000055663d5262e6 in main (argc=6, argv=0x7fffd82b2a98) at
/usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/passt.c:243


A very interesting thing that I've noticed is that the crash doesn't
occur when building from upstream sources (tag 2022_10_15.b3f3591, so
it should match what's in the RPM). So I've tried looking into the
compiler options used during the RPM build, and the gcc command line
for passt.avx2 looks like

  gcc -Wall -Wextra -pedantic -std=c99 -D_XOPEN_SOURCE=700 \
  -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -pie -fPIE -DPAGE_SIZE=4096 \
  -DNETNS_RUN_DIR=\"/run/netns\" -DPASST_AUDIT_ARCH=AUDIT_ARCH_X86_64 \
  -DRLIMIT_STACK_VAL=8192 -DARCH=\"x86_64\" \
  -DVERSION=\"0^20221015.gb3f3591-1.el9.x86_64\" -DTCP_HASH_NOINLINE \
  -DSIPHASH_20B_NOINLINE -DCSUM_UNALIGNED_NO_IPA -DHAS_SND_WND \
  -DHAS_BYTES_ACKED -DHAS_MIN_RTT -DHAS_GETRANDOM \
  -fstack-protector-strong -Ofast -mavx2 -ftree-vectorize \
  -funroll-loops -flto=auto -ffat-lto-objects -fexceptions -g \
  -grecord-gcc-switches -pipe -Wall -Werror=format-security \
  -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS \
  -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 \
  -fstack-protector-strong \
  -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 \
  -mtune=generic -fasynchronous-unwind-tables \
  -fstack-clash-protection -fcf-protection arch.c arp.c checksum.c \
  conf.c dhcp.c dhcpv6.c icmp.c igmp.c isolation.c lineread.c log.c \
  mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \
  tap.c tcp.c tcp_splice.c udp.c util.c -o passt.avx2 -Wl,-z,relro \
  -Wl,--as-needed -Wl,-z,now \
  -specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
  -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1

I tried making educated guesses at which ones among those could cause
trouble, and pretty quickly landed on the LTO stuff. Indeed, dropping

  -flto=auto -ffat-lto-objects

from the command results in a working binary, and adding

  %global _lto_cflags %nil

to the top of the spec file produces a working RPM.


Of course disabling LTO is a workaround, not a solution, especially
considering that the previous version didn't have any problem with
it, but hopefully there's enough information in here to allow the
developers to track down and resolve the underlying issue :)

-- 
Andrea Bolognani / Red Hat / Virtualization



* Re: passt crashes on CentOS Stream 9
  2022-10-21 17:05 passt crashes on CentOS Stream 9 Andrea Bolognani
@ 2022-10-21 18:48 ` Stefano Brivio
  2022-10-21 19:15   ` Stefano Brivio
  2022-10-22  8:57 ` Stefano Brivio
  1 sibling, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2022-10-21 18:48 UTC (permalink / raw)
  To: Andrea Bolognani; +Cc: passt-dev

Hi Andrea,

Thanks for reporting!

On Fri, 21 Oct 2022 10:05:03 -0700
Andrea Bolognani <abologna@redhat.com> wrote:

> This should probably be filed on bugzilla but I can't be bothered
> signing up for yet another service, sorry! O:-)

Ah, no worries at all, this ought to be fixed quickly enough.

> Short version: in a CentOS Stream 9 container, install the latest
> build (0^20221015.gb3f3591-1) from the official COPR, then run
> 
>   $ passt --runas 65534 -e -t 1234
>   Segmentation fault (core dumped)
> 
> 
> Doing the same thing in a CentOS Stream 8 container doesn't result in
> a crash, and the previous build (0^20220929.g06aa26f-1) is fine even
> on CentOS Stream 9.
> 
> 
> The backtrace produced by gdb doesn't look very illuminating, but
> maybe it will make more sense to a developer:
> 
>   Starting program: /usr/bin/passt --runas 65534 -e -t 1234
>   warning: Error disabling address space randomization: Operation not permitted
>   [Thread debugging using libthread_db enabled]
>   Using host libthread_db library "/lib64/libthread_db.so.1".
>   process 2856 is executing new program: /usr/bin/passt.avx2
>   warning: Could not load shared library symbols for linux-vdso.so.1.
>   Do you need "set solib-search-path" or "set sysroot"?
>   [Thread debugging using libthread_db enabled]
>   Using host libthread_db library "/lib64/libthread_db.so.1".
>   Program received signal SIGSEGV, Segmentation fault.
>   0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false) at
> /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
>   78	{
>   (gdb) t a a bt
> 
>   Thread 1 (Thread 0x7fe8763da740 (LWP 2856) "passt.avx2"):
>   #0  0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false)
> at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
>   #1  0x000055663d531296 in conf (c=<optimized out>, argc=<optimized
> out>, argv=<optimized out>) at  
> /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/conf.c:1547
>   #2  0x000055663d5262e6 in main (argc=6, argv=0x7fffd82b2a98) at
> /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/passt.c:243
> 
> 
> A very interesting thing that I've noticed is that the crash doesn't
> occur when building from upstream sources (tag 2022_10_15.b3f3591, so
> it should match what's in the RPM). So I've tried looking into the
> compiler options used during the RPM build, and the gcc command line
> for passt.avx2 looks like
> 
>   gcc -Wall -Wextra -pedantic -std=c99 -D_XOPEN_SOURCE=700 \
>   -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -pie -fPIE -DPAGE_SIZE=4096 \
>   -DNETNS_RUN_DIR=\"/run/netns\" -DPASST_AUDIT_ARCH=AUDIT_ARCH_X86_64 \
>   -DRLIMIT_STACK_VAL=8192 -DARCH=\"x86_64\" \
>   -DVERSION=\"0^20221015.gb3f3591-1.el9.x86_64\" -DTCP_HASH_NOINLINE \
>   -DSIPHASH_20B_NOINLINE -DCSUM_UNALIGNED_NO_IPA -DHAS_SND_WND \
>   -DHAS_BYTES_ACKED -DHAS_MIN_RTT -DHAS_GETRANDOM \
>   -fstack-protector-strong -Ofast -mavx2 -ftree-vectorize \
>   -funroll-loops -flto=auto -ffat-lto-objects -fexceptions -g \
>   -grecord-gcc-switches -pipe -Wall -Werror=format-security \
>   -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS \
>   -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 \
>   -fstack-protector-strong \
>   -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 \
>   -mtune=generic -fasynchronous-unwind-tables \
>   -fstack-clash-protection -fcf-protection arch.c arp.c checksum.c \
>   conf.c dhcp.c dhcpv6.c icmp.c igmp.c isolation.c lineread.c log.c \
>   mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \
>   tap.c tcp.c tcp_splice.c udp.c util.c -o passt.avx2 -Wl,-z,relro \
>   -Wl,--as-needed -Wl,-z,now \
>   -specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
>   -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
> 
> I tried making educated guesses at which ones among those could cause
> trouble, and pretty quickly landed on the LTO stuff. Indeed, dropping

Nice guess ;)

>   -flto=auto -ffat-lto-objects
> 
> from the command results in a working binary, and adding
> 
>   %global _lto_cflags %nil
> 
> to the top of the spec file produces a working RPM.

Uh-oh. We recently worked around a couple of issues we hit with LTO and
gcc 12 (which doesn't automatically imply gcc has an issue, of course):

  06aa26fcf398 Makefile: Hack for optimised-away store in ndp() before checksum calculation
https://passt.top/passt/commit/?id=06aa26fcf398f5d19ab46e42996190d7f95e837a

  505a33e9f9d9 Makefile: Extend noinline workarounds for LTO and -O2 to gcc 12
https://passt.top/passt/commit/?id=505a33e9f9d9d766e39fd9c54c6cb2136ae99ecb

...I wonder if this is somehow related.

Could you also quickly try to start it with strace and report a couple
of lines before the mischief?

> Of course disabling LTO is a workaround, not a solution, especially
> considering that the previous version didn't have any problem with
> it, but hopefully there's enough information in here to allow the
> developers to track down and resolve the underlying issue :)

Probably yes. :) I'm looking into this quickly now, but I'll be
travelling tomorrow. If I fail, I hope David is faster than me ;)

-- 
Stefano



* Re: passt crashes on CentOS Stream 9
  2022-10-21 18:48 ` Stefano Brivio
@ 2022-10-21 19:15   ` Stefano Brivio
  2022-10-21 19:33     ` Stefano Brivio
  0 siblings, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2022-10-21 19:15 UTC (permalink / raw)
  To: Andrea Bolognani; +Cc: passt-dev

On Fri, 21 Oct 2022 20:48:20 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> Could you also quickly try to start it with strace and report a couple
> of lines before the mischief?

Never mind, just reproduced...

-- 
Stefano



* Re: passt crashes on CentOS Stream 9
  2022-10-21 19:15   ` Stefano Brivio
@ 2022-10-21 19:33     ` Stefano Brivio
  0 siblings, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2022-10-21 19:33 UTC (permalink / raw)
  To: Andrea Bolognani; +Cc: passt-dev

On Fri, 21 Oct 2022 21:15:07 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri, 21 Oct 2022 20:48:20 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > Could you also quickly try to start it with strace and report a couple
> > of lines before the mischief?  
> 
> Never mind, just reproduced...

Another workaround:

diff --git a/util.h b/util.h
index 27829b1..64b9a26 100644
--- a/util.h
+++ b/util.h
@@ -72,7 +72,7 @@
 #define IPV4_IS_LOOPBACK(addr)						\
 	((addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
 
-#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 4)
+#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 10)
 #define NS_CALL(fn, arg)						\
 	do {								\
 		char ns_fn_stack[NS_FN_STACK_SIZE];			\

...we need to harden this "against" -fstack-protector-strong when
inlining gets quite extreme due to LTO, with some build-time
assertions, or a more reasonable (and involved) calculation of what
ns_fn_stack really needs.
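
As a rough sketch of the build-time assertion idea (not the actual
patch): this assumes RLIMIT_STACK_VAL is in KiB, as the "* 1024"
suggests, and that the problem is several ns_fn_stack buffers ending up
in a single frame once callers are inlined. BUILD_ASSERT,
STACK_LIMIT_BYTES, NS_FN_STACK_COPIES_MIN and ns_fn_stack_copies() are
hypothetical names, and the budget of 8 copies is an arbitrary value
picked for illustration.

```c
#include <stddef.h>

/* Assumption: RLIMIT_STACK_VAL is in KiB (8192 in the build command
 * line from the original report); "* 1024" converts it to bytes. */
#ifndef RLIMIT_STACK_VAL
#define RLIMIT_STACK_VAL	8192
#endif

#define STACK_LIMIT_BYTES	((size_t)RLIMIT_STACK_VAL * 1024)
#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 10)

/* Hypothetical budget, not taken from the passt sources: how many
 * ns_fn_stack buffers we want to tolerate in one stack at once. */
#define NS_FN_STACK_COPIES_MIN	8

/* C99-friendly build-time assertion (hypothetical helper): the array
 * size becomes -1, hence a compile error, if the condition is false. */
#define BUILD_ASSERT(name, cond)					\
	typedef char build_assert_##name[(cond) ? 1 : -1]

BUILD_ASSERT(ns_fn_stack_budget,
	     (size_t)NS_FN_STACK_SIZE * NS_FN_STACK_COPIES_MIN <=
	     STACK_LIMIT_BYTES);

/* How many buffers of 'size' bytes fit within the stack limit */
static inline size_t ns_fn_stack_copies(size_t size)
{
	return STACK_LIMIT_BYTES / size;
}
```

With the old divisor of 4, four buffers already add up to the whole
8 MiB limit, so the assertion above would fail the build. With a
divisor of 10 it passes, but it says nothing about how much stack the
called functions themselves need.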

I'll try to send a patch soon (again, if nobody beats me to it).

-- 
Stefano



* Re: passt crashes on CentOS Stream 9
  2022-10-21 17:05 passt crashes on CentOS Stream 9 Andrea Bolognani
  2022-10-21 18:48 ` Stefano Brivio
@ 2022-10-22  8:57 ` Stefano Brivio
  1 sibling, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2022-10-22  8:57 UTC (permalink / raw)
  To: Andrea Bolognani; +Cc: passt-dev

On Fri, 21 Oct 2022 10:05:03 -0700
Andrea Bolognani <abologna@redhat.com> wrote:

> This should probably be filed on bugzilla but I can't be bothered
> signing up for yet another service, sorry! O:-)
> 
> Short version: in a CentOS Stream 9 container, install the latest
> build (0^20221015.gb3f3591-1) from the official COPR, then run

It should now be fixed in 0^20221022.gb68da10-1:
  https://download.copr.fedorainfracloud.org/results/sbrivio/passt/centos-stream-9-x86_64/04973930-passt/

Persistent mirror at:
  https://passt.top/builds/copr/centos-stream-9-x86_64/04973930-passt/

-- 
Stefano



Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
