From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top, Andrea Bolognani <abologna@redhat.com>
Subject: Re: [PATCH] util: Set NS_FN_STACK_SIZE to one eighth of ulimit-reported maximum stack size
Date: Mon, 24 Oct 2022 02:36:33 +0200
Message-ID: <20221024023633.526148e5@elisabeth>
In-Reply-To: <Y1XT10cYgAX5pecw@yekko>

On Mon, 24 Oct 2022 10:52:55 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Mon, Oct 24, 2022 at 10:36:19AM +1100, David Gibson wrote:
> > On Sat, Oct 22, 2022 at 10:15:35AM +0200, Stefano Brivio wrote:
> > > On Sat, 22 Oct 2022 08:45:03 +0200
> > > Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > > ...instead of one fourth. On the main() -> conf() -> nl_sock_init()
> > > > call path, LTO from gcc 12 on (at least) x86_64 decides to inline...
> > > > everything: nl_sock_init() is effectively part of main(), after
> > > > commit 3e2eb4337bc0 ("conf: Bind inbound ports with
> > > > CAP_NET_BIND_SERVICE before isolate_user()").
> > > >
> > > > This means we exceed the maximum stack size, and we get SIGSEGV,
> > > > under any condition, at start time, as reported by Andrea on a recent
> > > > build for CentOS Stream 9.
> > > >
> > > > The calculation of NS_FN_STACK_SIZE, which is the stack size we
> > > > reserve for clones, was previously obtained by dividing the maximum
> > > > stack size by two, to avoid an explicit check on architecture (on
> > > > PA-RISC, also known as hppa, the stack grows up, so we point the
> > > > clone to the middle of this area), and then further divided by two
> > > > to allow for any additional usage in the caller.
> > > >
> > > > Well, if there are essentially no function calls anymore, this is
> > > > not enough. Divide it by eight, which is anyway much more than
> > > > possibly needed by any clone()d callee.
> > > >
> > > > I think this is robust, so it's a fix in some sense. Strictly
> > > > speaking, though, we have no formal guarantees that this isn't
> > > > either too little or too much.
> > > >
> > > > What we should do, eventually: check clone()d callees, there are just
> > > > thirteen of them at the moment. Note down any stack usage (they are
> > > > mostly small helpers), bonus points for an automated way at build
> > > > time, quadruple that or so, to allow for extreme clumsiness, and use
> > > > as NS_FN_STACK_SIZE. Perhaps introduce a specific condition for hppa.
> > > >
> > > > Reported-by: Andrea Bolognani <abologna@redhat.com>
> > > > Fixes: 3e2eb4337bc0 ("conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()")
> > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > ---
> > >
> > > I posted this in any case for (later) review, but I'm actually applying
> > > it right away, given that some builds are completely unusable otherwise.
> >
> > That's sensible, however, this patch confuses me. I don't really
> > understand how reducing the stack size is avoiding a SEGV, regardless
> > of what LTO does.
>
> Shortly after I wrote this, I realized what the issue was. IIUC, the
> problem is basically because we're allocating the stack of the
> sub-thread as a buffer on the stack of the main thread, so the main
> thread stack has to have room for both.

Right, I could have phrased this better.

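
To make that concrete, here is a minimal sketch of the pattern, not the actual passt code: all the demo_* names and the 64 KiB size are made up for illustration. The clone's stack is a local array in the caller's frame, and the clone is pointed to the middle of it, so the caller's stack has to hold the whole buffer on top of its own usage:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical size, for illustration only: not the value passt uses */
#define DEMO_STACK_SIZE		(64 * 1024)

static_assert(DEMO_STACK_SIZE % 32 == 0,
	      "middle of the buffer must stay 16-byte aligned");

/* Runs in the clone, on the buffer provided by the caller */
static int demo_child(void *arg)
{
	*(int *)arg = 42;	/* seen by the caller because of CLONE_VM */
	return 0;
}

/* Hypothetical helper: run demo_child() on a stack carved from our own */
int demo_call(void)
{
	/* This array lives on the caller's stack: both must fit there */
	char stack[DEMO_STACK_SIZE] __attribute__((aligned(16)));
	int result = 0;
	pid_t pid;

	/* Middle of the area: fine whether the stack grows down or up */
	pid = clone(demo_child, stack + sizeof(stack) / 2,
		    CLONE_VM | SIGCHLD, &result);
	if (pid < 0)
		return -1;

	if (waitpid(pid, NULL, 0) < 0)
		return -1;

	return result;
}
```

If everything on the main() call path is inlined into one frame, that frame plus this array is what has to stay below the limit, which is why shrinking the array helps.
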
> > The fact that we're basing the runtime stack size
> > on a limit that's from build time also doesn't really make sense to
> > me.
>
> This aspect still seems pretty bogus to me.

It is.

It's still a useful approximation, because those limits are rarely set
to non-default per-arch values (all the distributions and versions we
test happen to have, on a given architecture, the same value), and at
the same time vary wildly depending on the architecture. And with that,
we avoid bug-prone VLAs or, worse, alloca().

And if users set limits to substantially lower values, typically other
programs won't be able to run either.

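
For comparison, this is roughly what reading the limit at runtime would look like. It's only a sketch: demo_clone_stack_size() and the 8 MiB fallback are assumptions of mine, not something we have. The catch is that a size known only at runtime can't be used for a plain fixed-size array, which brings back the VLA or alloca() problem:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Assumption for illustration: fall back to 8 MiB if the limit is unlimited */
#define DEMO_FALLBACK_LIMIT	(8UL * 1024 * 1024)

/* Hypothetical helper: a fraction of the *runtime* RLIMIT_STACK soft limit */
size_t demo_clone_stack_size(unsigned int divisor)
{
	struct rlimit lim;

	assert(divisor > 0);

	/* On failure, or with no limit at all, use the assumed default */
	if (getrlimit(RLIMIT_STACK, &lim) || lim.rlim_cur == RLIM_INFINITY)
		return DEMO_FALLBACK_LIMIT / divisor;

	return (size_t)(lim.rlim_cur / divisor);
}
```

Calling demo_clone_stack_size(8) would give the runtime equivalent of the "one eighth" from the patch.
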
But... we don't really need that. With 16 KiB, you usually won't be able
to ls. With 128 KiB, gimp crashes for me. We probably need something in
between, but that implies we shouldn't just give NS_CALL()ed helpers "a
lot", rather (some multiple of) what they need.

They're not many and they're quite short, so we could note down some
manual calculations, or we could automate that with nm, or pahole, or
gcc -save-temps... or even some objdump script. I'm not sure that's
worth adding a further compilation phase, though.

We have well-known architecture-specific type sizes and alignments on
Linux and we already do something similar for AVX2 alignments, so I
would try to do this manually, take some reasonable margin on top, and
maybe add an explicit stack guard (I think we shouldn't rely on
-fstack-protector alone, or at all).

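
One possible reading of "explicit stack guard", as a rough sketch with made-up demo_* names and an arbitrary pattern: fill both ends of the buffer with a known byte before the clone (both, because the clone starts in the middle and the growth direction depends on the architecture), then check them once the clone is done. This only detects an overflow after the fact, it doesn't prevent one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values, for illustration only */
#define DEMO_GUARD_LEN		64
#define DEMO_GUARD_BYTE		0xa5

/* Write the guard pattern at both ends of the stack buffer */
void demo_guard_arm(uint8_t *stack, size_t size)
{
	assert(size >= 2 * DEMO_GUARD_LEN);

	memset(stack, DEMO_GUARD_BYTE, DEMO_GUARD_LEN);
	memset(stack + size - DEMO_GUARD_LEN, DEMO_GUARD_BYTE, DEMO_GUARD_LEN);
}

/* Return 1 if both guard areas are untouched, 0 if either was overwritten */
int demo_guard_intact(const uint8_t *stack, size_t size)
{
	size_t i;

	assert(size >= 2 * DEMO_GUARD_LEN);

	for (i = 0; i < DEMO_GUARD_LEN; i++) {
		if (stack[i] != DEMO_GUARD_BYTE ||
		    stack[size - DEMO_GUARD_LEN + i] != DEMO_GUARD_BYTE)
			return 0;
	}

	return 1;
}
```

The caller would arm the guard right before clone() and check it right after the clone has exited, failing loudly if demo_guard_intact() returns 0.
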
--
Stefano