On Wed, Nov 15, 2023 at 07:11:19PM +1100, David Gibson wrote:
> On Wed, Nov 15, 2023 at 06:32:59AM +0100, Stefano Brivio wrote:
> > On Wed, 15 Nov 2023 15:41:24 +1100
> > David Gibson wrote:
> >
> > > Usually, of course, it's invalid to pass a NULL buffer to recv(). However,
> > > it's acceptable when using MSG_TRUNC, because that suppresses actually
> > > writing to the buffer. So, we pass NULL in tcp_sock_consume().
> > >
> > > Unfortunately, checker tools aren't always aware of that special case: we
> > > already have a suppression for cppcheck to cover this. valgrind-3.22.0
> > > (present in Fedora 39) has a similar problem, generating a spurious warning
> > > here.
> >
> > I haven't tried valgrind 3.22 yet, but... do you happen to know why
> > test/valgrind.supp doesn't cover this anymore?
>
> Huh.. I hadn't spotted there was an existing suppression. I don't
> know why that's not working any more, I can have a closer look.
>
> > > We could generate another suppression for valgrind, however, it so happens
> > > that we already have tcp_buf_discard ready to hand. If we pass this
> > > instead of NULL it makes both cppcheck and valgrind happy. We're still
> > > using the MSG_TRUNC flag, the kernel doesn't actually have to copy data,
> > > so we should still have the performance benefits of it.
> >
> > I'm not enthusiastic about this, because using tcp_buf_discard there
> > might tell an optimising compiler that it's useful to prefetch it.
> >
> > We would also pass the actual address of tcp_buf_discard to the kernel,
> > and I'm not sure if this has further subtle implications on possible
> > optimisations in the kernel implementation (even though as you said no
> > data is actually copied).
>
> Ok, fair points. I'll revisit this.

Huh.. so.. this actually intersects with the stuff we discussed on the
last call about whether it's a good idea to build without optimization
for the valgrind tests (we currently do).

So, in terms of -g, my understanding is that valgrind doesn't need
debug symbols for its actual test mechanisms. But I realised later
that it obviously does need them in order to identify meaningfully
where the problems occurred - which also includes matching them to
suppressions.

So it seems the change that's caused the error for me is not in
valgrind, but in the compiler. Even with -O0, the compiler in Fedora
39 is inlining tcp_sock_consume() (confirmed with objdump). Since
there's no stack frame for it, valgrind doesn't match it against the
suppression.

I'm now testing a new spin that uses an explicit
__attribute__((noinline)) to fix the suppression.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson