On Thu, Jan 02, 2025 at 11:00:04PM +0100, Stefano Brivio wrote:
> On Thu, 2 Jan 2025 13:15:40 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jan 01, 2025 at 10:54:37PM +0100, Stefano Brivio wrote:
> > > On Fri, 20 Dec 2024 19:35:30 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > Two places in the code use the packet_get_try() variant on packet_get().
> > > > The difference is that packet_get_try() passes a NULL 'func' to
> > > > packet_get_do(), which suppresses log messages.  The places we use this
> > > > are ones where we expect to sometimes have a failure retreiving the packet
> > > > range, even in normal cases.  So, suppressing the log messages seems like
> > > > it makes sense, except:
> > > > 
> > > > 1) It suppresses log messages on all errors.  We (somewhat) expect to hit
> > > >    the case where the requested range is not within the received packet.
> > > >    However, it also suppresses message in cases where the requested packet
> > > >    index doesn't exist, the requested range has a nonsensical length or
> > > >    doesn't lie in even the right vague part of memory.  None of those
> > > >    latter cases are expected, and the message would be helpful if we ever
> > > >    actually hit them.
> > > > 
> > > > 2) The suppressed messages aren't actually that disruptive.  For the case
> > > >    in ip.c, we'd log only if we run out of IPv6 packet before reaching a
> > > >    (non-option) L4 header.  That shouldn't be the case in normal operation
> > > >    so getting a message (at trave level) is not unreasonable.
> > > >    For the case in dhcpv6.c we do suppress a message every time we look for
> > > >    but don't find a specific DHCPv6 option.  That can happen in perfectly
> > > >    ok cases, but, again these are trace() level and DHCPv6 transactions
> > > >    aren't that common.  Suppressing the message for this case doesn't
> > > >    seem worth the drawbacks of (1).  
> > > 
> > > The reason why I implemented packet_get_try() is that the messages from
> > > packet_get_do() indicate serious issues, and if I'm debugging something
> > > by looking at traces it's not great to have messages indicating that we
> > > hit a serious issue while we're simply validating identity associations.  
> > 
> > I'm not following your argument here.  It's exactly because (most of)
> > the message indicate serious issues that I don't want to suppress
> > them.  I don't know what you mean by "validating identity
> > associations".
> 
> But dhcpv6_opt() trying to get data that doesn't exist is *not* an
> issue, including not a serious one, so if I'm debugging something with
> --trace and I see one of these messages I'll shout at "memory" or
> "packet" badness and waste time thinking it's an actual issue.

Oh.. I think I see the confusion.  dhcpv6_opt() trying to get data
that's not in the packet is not an issue.  dhcpv6_opt() trying to get
data that is (theoretically) within the packet, but *not* in the
buffer indicates something very bad has happened.  The former is
exactly one check, every other one is the second class - trying to
separate those cases is the purpose of the later "different
severities" patch.

The difficulty is that passing func==NULL to indicate the "try" case
doesn't work if we want to still give useful errors for the serious
cases: we need the function name for those too.

I had been considering printing occasional trace level messages for
the ok case an acceptable tradeoff for not suppressing the messages
which are serious.  But I see your case or that being too confusing
when debugging.  I did have a draft where I used an explicit boolean
flag to enable/disable the non-serious errors, but gave up on it for
simplicity.

I'll look into a way to continue suppressing the non-serious error
here.  Maybe moving the (single) non-serious error case message into
the caller with a wrapper.

> Validating identity associations (IA_NA, IA_TA, RFC 3315) is what
> dhcpv6_ia_notonlink() does. That's the most common case where we'll
> routinely call dhcpv6_opt() to fetch data which isn't there.

Ok.

> > > It's not about the amount of logged messages, it's about the type of
> > > message being logged and the distracting noise possibly resulting in a
> > > substantial time waste.
> > > 
> > > About 1): dhcpv6_opt() always picks pool index 0, and the base offset
> > > was already checked by the caller.  
> > 
> > Right, but dhcpv6_opt() is called in a loop, that only stops when it
> > returns NULL.  So, by construction the last call to dhcpv6_opt(),
> > which terminates the loop, _will_ have a failing call to packet_get().
> > At this point - at least assuming a correctly constructed packet - the
> > offset will point to just past the last option, which should be
> > exactly at the end of the packet.
> 
> Yes, I get that, and:
> 
> - I would be happy if that one were *not* reported as failure

Right, that's also my preference, but as above I compromised on this
to simplify preserving the error cases that do matter.

> - the calls before that one should always be enough to check if we have
>   an actual issue with the packet

Yes, in this case I think that's correct.

> > > In ipv6_l4hdr(), the index was
> > > already validated by a previous call to packet_get(), and the starting
> > > offset as well.  
> > 
> > Not AFAICT, the initial packet_get just validates the basic IPv6
> > header.  The calls to packet_get_try() in the loop validate additional
> > IP options.  I don't think it will ever fail on a well-constructed
> > packet, but it could on a bogus (or truncated) packet, where the
> > nexthdr field indicates an option that's actually missing.
> > 
> > This is kind of my point: it will only trip on a badly constructed
> > packet, in which case I don't think we want to suppress messages.
> 
> There, I used packet_get_try() because a missing option or payload
> doesn't indicate a bad packet at the _data_ level.

Not really sure what you mean by the data level, here.

> On the other hand, it's bad at the network level anyway, because option
> 59 *must* be there otherwise (I just realised), so while I'd still
> prefer another wording of the warning (not mentioning packet/buffer
> ranges... something more network-y), I would be fine with it.

That sounds like another argument for moving the message for the
"requested range is outside packet" case into the caller.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson