On Thu, Jan 02, 2025 at 11:00:04PM +0100, Stefano Brivio wrote: > On Thu, 2 Jan 2025 13:15:40 +1100 > David Gibson wrote: > > > On Wed, Jan 01, 2025 at 10:54:37PM +0100, Stefano Brivio wrote: > > > On Fri, 20 Dec 2024 19:35:30 +1100 > > > David Gibson wrote: > > > > > > > Two places in the code use the packet_get_try() variant on packet_get(). > > > > The difference is that packet_get_try() passes a NULL 'func' to > > > > packet_get_do(), which suppresses log messages. The places we use this > > > > are ones where we expect to sometimes have a failure retreiving the packet > > > > range, even in normal cases. So, suppressing the log messages seems like > > > > it makes sense, except: > > > > > > > > 1) It suppresses log messages on all errors. We (somewhat) expect to hit > > > > the case where the requested range is not within the received packet. > > > > However, it also suppresses message in cases where the requested packet > > > > index doesn't exist, the requested range has a nonsensical length or > > > > doesn't lie in even the right vague part of memory. None of those > > > > latter cases are expected, and the message would be helpful if we ever > > > > actually hit them. > > > > > > > > 2) The suppressed messages aren't actually that disruptive. For the case > > > > in ip.c, we'd log only if we run out of IPv6 packet before reaching a > > > > (non-option) L4 header. That shouldn't be the case in normal operation > > > > so getting a message (at trave level) is not unreasonable. > > > > For the case in dhcpv6.c we do suppress a message every time we look for > > > > but don't find a specific DHCPv6 option. That can happen in perfectly > > > > ok cases, but, again these are trace() level and DHCPv6 transactions > > > > aren't that common. Suppressing the message for this case doesn't > > > > seem worth the drawbacks of (1). > > > > > > The reason why I implemented packet_get_try() is that the messages from > > > packet_get_do() indicate serious issues, and if I'm debugging something > > > by looking at traces it's not great to have messages indicating that we > > > hit a serious issue while we're simply validating identity associations. > > > > I'm not following your argument here. It's exactly because (most of) > > the message indicate serious issues that I don't want to suppress > > them. I don't know what you mean by "validating identity > > associations". > > But dhcpv6_opt() trying to get data that doesn't exist is *not* an > issue, including not a serious one, so if I'm debugging something with > --trace and I see one of these messages I'll shout at "memory" or > "packet" badness and waste time thinking it's an actual issue. Oh.. I think I see the confusion. dhcpv6_opt() trying to get data that's not in the packet is not an issue. dhcpv6_opt() trying to get data that is (theoretically) within the packet, but *not* in the buffer indicates something very bad has happened. The former is exactly one check, every other one is the second class - trying to separate those cases is the purpose of the later "different severities" patch. The difficulty is that passing func==NULL to indicate the "try" case doesn't work if we want to still give useful errors for the serious cases: we need the function name for those too. I had been considering printing occasional trace level messages for the ok case an acceptable tradeoff for not suppressing the messages which are serious. But I see your case or that being too confusing when debugging. I did have a draft where I used an explicit boolean flag to enable/disable the non-serious errors, but gave up on it for simplicity. I'll look into a way to continue suppressing the non-serious error here. Maybe moving the (single) non-serious error case message into the caller with a wrapper. > Validating identity associations (IA_NA, IA_TA, RFC 3315) is what > dhcpv6_ia_notonlink() does. That's the most common case where we'll > routinely call dhcpv6_opt() to fetch data which isn't there. Ok. > > > It's not about the amount of logged messages, it's about the type of > > > message being logged and the distracting noise possibly resulting in a > > > substantial time waste. > > > > > > About 1): dhcpv6_opt() always picks pool index 0, and the base offset > > > was already checked by the caller. > > > > Right, but dhcpv6_opt() is called in a loop, that only stops when it > > returns NULL. So, by construction the last call to dhcpv6_opt(), > > which terminates the loop, _will_ have a failing call to packet_get(). > > At this point - at least assuming a correctly constructed packet - the > > offset will point to just past the last option, which should be > > exactly at the end of the packet. > > Yes, I get that, and: > > - I would be happy if that one were *not* reported as failure Right, that's also my preference, but as above I compromised on this to simplify preserving the error cases that do matter. > - the calls before that one should always be enough to check if we have > an actual issue with the packet Yes, in this case I think that's correct. > > > In ipv6_l4hdr(), the index was > > > already validated by a previous call to packet_get(), and the starting > > > offset as well. > > > > Not AFAICT, the initial packet_get just validates the basic IPv6 > > header. The calls to packet_get_try() in the loop validate additional > > IP options. I don't think it will ever fail on a well-constructed > > packet, but it could on a bogus (or truncated) packet, where the > > nexthdr field indicates an option that's actually missing. > > > > This is kind of my point: it will only trip on a badly constructed > > packet, in which case I don't think we want to suppress messages. > > There, I used packet_get_try() because a missing option or payload > doesn't indicate a bad packet at the _data_ level. Not really sure what you mean by the data level, here. > On the other hand, it's bad at the network level anyway, because option > 59 *must* be there otherwise (I just realised), so while I'd still > prefer another wording of the warning (not mentioning packet/buffer > ranges... something more network-y), I would be fine with it. That sounds like another argument for moving the message for the "requested range is outside packet" case into the caller. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson