public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors
Date: Mon, 6 Jan 2025 11:55:25 +0100	[thread overview]
Message-ID: <20250106115525.0d3c393f@elisabeth> (raw)
In-Reply-To: <Z3dwcduhnXLzZgHe@zatzit>

On Fri, 3 Jan 2025 16:06:57 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jan 02, 2025 at 11:00:08PM +0100, Stefano Brivio wrote:
> > On Thu, 2 Jan 2025 13:58:19 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote:  
> > > > On Fri, 20 Dec 2024 19:35:32 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > packet_add() and packet_get() can fail for a number of reasons, and we
> > > > > currently treat them all basically the same: we log a trace() level message
> > > > > and for packet_get() we return NULL.  However the different causes of
> > > > > error are quite different and suggest different handling:
> > > > > 
> > > > > 1) If we run out of space in the pool on add, that's (rightly) non-fatal,
> > > > >    but is an unusual situation which we might want to know about.  Promote
> > > > >    it's message to debug() level.
> > > > > 
> > > > > 2) All packet_check_range() errors and the checks on packet length indicate
> > > > >    a serious problem.  Due to either a bug in calling code, or some sort
> > > > >    of memory-clobbering, we've been given addresses or sizes of packets
> > > > >    that are nonsensical.  If things are this bad, we shouldn't try to carry
> > > > >    on replace these with asserts.
> > > > > 
> > > > > 3) Requesting a packet index that doesn't exist in the pool in packet_get()
> > > > >    indicates a bug in the caller - it should always check the index first.
> > > > >    Replace this with an assert.    
> > > > 
> > > > The reasons for 2) and 3) being trace() messages (they should probably
> > > > be debug() following the same reasoning as 1)) are graceful degradation
> > > > and security (availability).    
> > > 
> > > Hmm..
> > >   
> > > > If we have a packet triggering some sort of "memory clobbering", or an
> > > > issue in the caller, that might very well be just one packet or a few
> > > > of them.    
> > > 
> > > I see the case for packet_add_do().  In that case it could be that
> > > this is just a bad packet triggering bugs in the initial processing
> > > code, and just dropping it will be sufficient to save us from damaging
> > > state.
> > > 
> > > For packet_get_do(), however, this indicates that we have a packet
> > > that's out of bounds, despite the fact that we checked its bounds when
> > > it was added to the pool.  That means we *already* have corrupted
> > > memory, and we have no way to assess the extent of the damage.  It's
> > > entirely possible we're already badly compromised.  Under those
> > > circumstances I think it is much better to terminate immediately,
> > > rather than risk doing any more hard to trace damage.  
> > 
> > It depends on which part of packet_get_do(). If we're exceeding the
> > packet count or the size of a pool, then I guess yes, otherwise not so
> > much.  
> 
> There's one, maybe two of the tests in packet_get_do() that don't
> indicate a serious error:
> 	if (len + offset > p->pkt[idx].iov_len) {
> is definitely ok, and you've convinced me I should try harder to make
> sure this doesn't print any error (even trace()) in at least the "try"
> cases.
> 	if (len > UINT16_MAX) {
> is questionable.  Technically it could be ok, but since that parameter
> is generally a constant, chances are it does indicate a bad error in
> the caller.

...generally, yes, but I'm sure we don't want to crash on:

- a malformed DHCP request plus subtle bug (not saying that there's
  one, just that I could see myself adding one), dhcp():

  packet_get(p, 0, offset + opt_off + 2,   *olen,  NULL);

- a malformed TCP header plus some kind of bug, tcp_data_from_tap():

  data = packet_get(p, i, off, len, NULL);

Yes, those would need "bad errors". But the bad error here would crash
us with an assert, and cause us to throw away one packet without it.

> All the other checks, from packet_check_range() must indicate
> something very bad.
> 
> > > > I think it's a disservice to users to crash in that case, and it has
> > > > the potential to worsen a security flaw if this packet can be built on
> > > > purpose. It's also a disservice to package maintainers because it has
> > > > the potential to turn a harmless issue into an annoying situation with
> > > > associated time pressure and everything.    
> > > 
> > > Ok, that argument makes sense to me for packet_add_do(), but again,
> > > not for packet_get_do().
> > >   
> > > > Those are not warn() calls, by the way, because we don't want to make
> > > > it too easy, in that (perhaps unlikely) case, to flood logs, and
> > > > possibly hide information due to rotation.    
> > > 
> > > Right, I see the argument, but it also worries me that the message -
> > > which even if we are successfully containing the damage here
> > > definitely indicates a bug in the caller - could easily never be seen
> > > at trace or even debug level.
> > >   
> > > > I suppose that the argument for asserts here is so that we discover
> > > > functional issues as soon as possible, which I understand of course,    
> > > 
> > > That's the main reason.  The secondary reason is that it clarifies the
> > > meaning of a NULL return from packet_get_do().  With this change it
> > > means: you requested a range that's not in the packet, whereas
> > > previously it could mean either that or "something is horribly wrong
> > > in the pool state, and we don't know what".
> > >   
> > > > but it relies on two assumptions that might not hold:
> > > > 
> > > > 1. the crash will be reported, while a debug() message will go
> > > >    unnoticed.
> > > > 
> > > >    Well, the crash is more noticeable, but that doesn't mean that it
> > > >    will be reported. Users might/will try to hack something around it
> > > >    or try to survive with slirp4netns and all its flaws.    
> > > 
> > > I suppose so.
> > >   
> > > >    On the other hand, a malfunction (say, failed connections) might be
> > > >    equally likely to be reported, along with debug() messages.    
> > > 
> > > That only applies if this causes malfunctions in intended behaviour.
> > > If it's triggered due to an attack, rather than a flaw handling
> > > expected behaviour then there's no reason it would be reported.
> > >   
> > > > 2. the check and the assertion themselves are correct. This was not the
> > > >    case for the two most recent severe issues (commit 61c0b0d0f199, bug
> > > >    #105) we discovered through assertions.    
> > > 
> > > That's fair, I can see why you'd be gun shy about asserts given that.
> > >   
> > > >    Issues such as https://github.com/containers/podman/issues/22925, on
> > > >    the other hand, are good examples of how asserts saved us a lot
> > > >    of time and effort later on (compared to, say, "connections stop
> > > >    working after three days"). But I would argue that those are
> > > >    substantially different cases where assertions are checking
> > > >    something that's much more inherent to the implementation.    
> > > 
> > > Eh... I can't really see a way these assertions are different other
> > > than that one turned out to be wrong and the other didn't.  
> > 
> > Ah, sorry, I meant that there is a substantial difference between the
> > ones related to Podman issue 22925 *and what might happen in
> > packet_get_range()*.
> > 
> > Not that there's a difference between those related to Podman issue
> > 22925 and those from commit 61c0b0d0f199 and bug #105.  
> 
> Ok, but I'm still not really seeing how to distinguish the cases
> without the benefit of hindsight.
> 
> > > > In this case, I would argue that there's no need to crash, so we
> > > > shouldn't. We can presumably carry on after that, and we're not leaking
> > > > any resource or sneakily leave behind some condition that will cause
> > > > apparently unrelated issues at a later time.    
> > > 
> > > Again, true (plausibly) for packet_add_do(), but not for
> > > packet_get_do().
> > >   
> > > > > 4) On packet_get() requesting a section of the packet beyond its bounds is
> > > > >    correct already.  This happens routinely from some callers when they
> > > > >    probe to see if certain parts of the packet are present.  At worst it
> > > > >    indicates that the guest has generate a malformed packet, which we
> > > > >    should quietly drop as we already do.    
> > > 
> > > This patch doesn't change that.  We check against the stated packet
> > > bounds first and return NULL *before* we check against the buffer
> > > bounds and possibly trigger an assert.  
> > 
> > Yes... I wasn't actually disputing that (quote is from yourself).
> >   
> > > Ok... here's a tentative proposal, let me know what you think:  
> > 
> > So, first off, I have to say that it's a bit hard to reason about
> > something that is not giving users any problem at the moment (well, at
> > least not without a part of this series), and which didn't really give
> > us problems in the past.
> > 
> > It's difficult to quantify with experience and examples (even letting
> > alone the priority of the whole matter for a moment...). Anyway, let me
> > try:  
> 
> I mean, I see your point.  This is basically following on from the MTU
> stuff - I was tracing what enforces / relies on packet/datagram/frame
> size limits and making sure they're consistent through the entire
> chain.  At the moment we allow the user to set --mtu 65535 (i.e. 65535
> byte IP datagrams, not including L2), but the pool layer (amongst
> other things) doesn't allow us to actually deliver it.  This does
> cause user visible (though minor) problems, such as bug 65 & bug 66.

I see the relationship to that up to patch 6/12. After that, not so
much.

I understand that there might be inconsistencies (at least in your
interpretation) but you wouldn't be making them any worse.

> I hope merging the two series will make what's going on clearer.
> 
> > >   * I promote the packet_check_range() messages to warn(), or even
> > >     err().  Yes, this might allow for log flooding, but I would argue
> > >     that if we have a bug allowing this path to be triggered and
> > >     something is frequently triggering it, we *want* that to be noisy.
> > >     We *think* we're containing the damage here, but we don't really
> > >     know.  
> > 
> > I still think it's our responsibility to not flood the system log in
> > *any* circumstance.
> > 
> > Remember that this is not necessarily *our* log where the worst that
> > can happen is that we hide information. Even with sane system log
> > configurations, one message per packet can get close to a substantial
> > denial of service (especially with systemd-syslogd, see also
> > https://github.com/systemd/systemd/issues/13425).
> > 
> > I guess you don't realise until you run passt without -f, for example
> > with libvirt, for a large number of virtual machines. Or in KubeVirt.
> > Or for a bunch of containers. It's a common and very much intended
> > usage.  
> 
> Hm, fair point.  I feel like at some point we're probably going to
> have to bite the bullet and implement rate-limited messages and/or
> some sort of "warn once" thing at some point.

I thought you implemented something like that, by the way, but I can't
find it right now.

Yes, a "warn once" thing would fix all this. I think we could happily
warn() or err() on anything like that if we had it.

By the way, same here, I guess that wouldn't really need to be in the
same series as patches up to 6/12 here.

> > It's not true in general that we don't know. We can quantify the
> > difference in damage very clearly if the attacker manages to make
> > packet_check_range() print one line for each packet (and nothing
> > else).
> >   
> > >   * I make packet_check_range() failures in packet_add_do() drop the
> > >     packet and continue, as they used to
> > >   * packet_check_range() failures in packet_get_do() will still
> > >     assert()  
> > 
> > I guess the (start < p->buf) one makes sense, but not so much the one
> > where we check that offset plus length is within the pool, because that
> > one could be very well triggered by a specific packet itself.  
> 
> I really don't see the difference.  Since we're talking
> packet_get_do() here, this means that we've *already* inserted a
> packet in the pool which extends outside the buffer(s) it's supposed
> to be in.

That's not necessarily the case: it could be that we just use an offset
that extends beyond the end of the pool, due to bad/missing handling of
some malformed packet/request.

Maybe the packet is fine, but we're asking for something that's not.

I'm talking about this one:

	if (start + len + offset > p->buf + p->buf_size) {

> But we can't have inserted this packet with
> packet_add_do(), because it checked the same things non-fatally.
> Which means either a packet has been inserted bypassing
> packet_add_do() or the pool metadata has been corrupted since then.
> Either of those situations indicates something very, very bad.
> 
> > >   * I leave the assert()s for packet _index_ out of range  
> > 
> > Okay, yes, those make sense to me as well.
> >   
> > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > ---
> > > > >  packet.c    | 71 ++++++++++++++++++++---------------------------------
> > > > >  packet.h    |  3 ++-
> > > > >  vu_common.c | 13 +++++++---
> > > > >  3 files changed, 38 insertions(+), 49 deletions(-)
> > > > > 
> > > > > diff --git a/packet.c b/packet.c
> > > > > index c921aa15..24f12448 100644
> > > > > --- a/packet.c
> > > > > +++ b/packet.c
> > > > > @@ -30,37 +30,27 @@
> > > > >   * @func:	For tracing: name of calling function
> > > > >   * @line:	For tracing: caller line of function call
> > > > >   *
> > > > > - * Return: 0 if the range is valid, -1 otherwise
> > > > > + * ASSERT()s if the given range isn't valid (within the expected buffer(s) for
> > > > > + * the pool).
> > > > >   */
> > > > > -static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
> > > > > -			      const char *func, int line)
> > > > > +static void packet_check_range(const struct pool *p,
> > > > > +			       const char *ptr, size_t len,
> > > > > +			       const char *func, int line)
> > > > >  {
> > > > >  	if (p->buf_size == 0) {
> > > > > -		int ret;
> > > > > -
> > > > > -		ret = vu_packet_check_range((void *)p->buf, ptr, len);
> > > > > -
> > > > > -		if (ret == -1)
> > > > > -			trace("cannot find region, %s:%i", func, line);
> > > > > -
> > > > > -		return ret;
> > > > > -	}
> > > > > -
> > > > > -	if (ptr < p->buf) {
> > > > > -		trace("packet range start %p before buffer start %p, %s:%i",
> > > > > -		      (void *)ptr, (void *)p->buf, func, line);
> > > > > -		return -1;
> > > > > -	}
> > > > > -
> > > > > -	if (ptr + len > p->buf + p->buf_size) {
> > > > > -		trace("packet range end %p after buffer end %p, %s:%i",
> > > > > -		      (void *)(ptr + len), (void *)(p->buf + p->buf_size),
> > > > > -		      func, line);
> > > > > -		return -1;
> > > > > +		vu_packet_check_range((void *)p->buf, ptr, len, func, line);
> > > > > +		return;
> > > > >  	}
> > > > >  
> > > > > -	return 0;
> > > > > +	ASSERT_WITH_MSG(ptr >= p->buf,
> > > > > +			"packet range start %p before buffer start %p, %s:%i",
> > > > > +			(void *)ptr, (void *)p->buf, func, line);
> > > > > +	ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size,
> > > > > +			"packet range end %p after buffer end %p, %s:%i",
> > > > > +			(void *)(ptr + len), (void *)(p->buf + p->buf_size),
> > > > > +			func, line);
> > > > >  }
> > > > > +
> > > > >  /**
> > > > >   * packet_add_do() - Add data as packet descriptor to given pool
> > > > >   * @p:		Existing pool
> > > > > @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
> > > > >  	size_t idx = p->count;
> > > > >  
> > > > >  	if (idx >= p->size) {
> > > > > -		trace("add packet index %zu to pool with size %zu, %s:%i",
> > > > > +		debug("add packet index %zu to pool with size %zu, %s:%i",
> > > > >  		      idx, p->size, func, line);
> > > > >  		return;
> > > > >  	}
> > > > >  
> > > > > -	if (packet_check_range(p, start, len, func, line))
> > > > > -		return;
> > > > > +	packet_check_range(p, start, len, func, line);
> > > > >  
> > > > > -	if (len > PACKET_MAX_LEN) {
> > > > > -		trace("add packet length %zu, %s:%i", len, func, line);
> > > > > -		return;
> > > > > -	}
> > > > > +	ASSERT_WITH_MSG(len <= PACKET_MAX_LEN,
> > > > > +			"add packet length %zu (max %zu), %s:%i",
> > > > > +			len, PACKET_MAX_LEN, func, line);
> > > > >  
> > > > >  	p->pkt[idx].iov_base = (void *)start;
> > > > >  	p->pkt[idx].iov_len = len;
> > > > > @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset,
> > > > >  {
> > > > >  	char *ptr;
> > > > >  
> > > > > -	if (idx >= p->size || idx >= p->count) {
> > > > > -		trace("packet %zu from pool size: %zu, count: %zu, %s:%i",
> > > > > -		      idx, p->size, p->count, func, line);
> > > > > -		return NULL;
> > > > > -	}
> > > > > -
> > > > > -	if (len > PACKET_MAX_LEN) {
> > > > > -		trace("packet data length %zu, %s:%i", len, func, line);
> > > > > -		return NULL;
> > > > > -	}
> > > > > +	ASSERT_WITH_MSG(idx < p->size && idx < p->count,
> > > > > +			"packet %zu from pool size: %zu, count: %zu, %s:%i",
> > > > > +			idx, p->size, p->count, func, line);
> > > > > +	ASSERT_WITH_MSG(len <= PACKET_MAX_LEN,
> > > > > +			"packet range length %zu (max %zu), %s:%i",
> > > > > +			len, PACKET_MAX_LEN, func, line);
> > > > >  
> > > > >  	if (len + offset > p->pkt[idx].iov_len) {
> > > > >  		trace("data length %zu, offset %zu from length %zu, %s:%i",
> > > > > @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset,
> > > > >  
> > > > >  	ptr = (char *)p->pkt[idx].iov_base + offset;
> > > > >  
> > > > > -	if (packet_check_range(p, ptr, len, func, line))
> > > > > -		return NULL;
> > > > > +	packet_check_range(p, ptr, len, func, line);
> > > > >  
> > > > >  	if (left)
> > > > >  		*left = p->pkt[idx].iov_len - offset - len;
> > > > > diff --git a/packet.h b/packet.h
> > > > > index f95cda08..b164f77e 100644
> > > > > --- a/packet.h
> > > > > +++ b/packet.h
> > > > > @@ -31,7 +31,8 @@ struct pool {
> > > > >  	struct iovec pkt[];
> > > > >  };
> > > > >  
> > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len);
> > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len,
> > > > > +			   const char *func, int line);
> > > > >  void packet_add_do(struct pool *p, size_t len, const char *start,
> > > > >  		   const char *func, int line);
> > > > >  void *packet_get_do(const struct pool *p, const size_t idx,
> > > > > diff --git a/vu_common.c b/vu_common.c
> > > > > index 531f8786..528b9b08 100644
> > > > > --- a/vu_common.c
> > > > > +++ b/vu_common.c
> > > > > @@ -24,10 +24,13 @@
> > > > >   * @buf:	Array of the available memory regions
> > > > >   * @ptr:	Start of desired data range
> > > > >   * @size:	Length of desired data range
> > > > > + * @func:	For tracing: name of calling function
> > > > > + * @line:	For tracing: caller line of function call    
> > > > 
> > > > If those arguments are mandatory, I would drop the "For tracing"
> > > > specification.
> > > >     
> > > > >   *
> > > > > - * Return: 0 if the zone is in a mapped memory region, -1 otherwise
> > > > > + * Aborts if the zone isn't in any of the device regions
> > > > >   */
> > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len)
> > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len,
> > > > > +			   const char *func, int line)
> > > > >  {
> > > > >  	struct vu_dev_region *dev_region;
> > > > >  
> > > > > @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *ptr, size_t len)
> > > > >  
> > > > >  		if (m <= ptr &&
> > > > >  		    ptr + len <= m + dev_region->mmap_offset + dev_region->size)
> > > > > -			return 0;
> > > > > +			return;
> > > > >  	}
> > > > >  
> > > > > -	return -1;
> > > > > +	abort_with_msg(
> > > > > +		"package range at %p, length %zd not within dev region %s:%i",    
> > > > 
> > > > s/package/packet/
> > > >     
> > > > > +		(void *)ptr, len, func, line);
> > > > >  }
> > > > >  
> > > > >  /**    

-- 
Stefano


  reply	other threads:[~2025-01-06 10:55 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-20  8:35 [PATCH v2 00/12] Cleanups to packet pool handling and sizing David Gibson
2024-12-20  8:35 ` [PATCH v2 01/12] test focus David Gibson
2024-12-20  8:35 ` [PATCH v2 02/12] hack: stop on fail, but not perf fail David Gibson
2024-12-20  8:35 ` [PATCH v2 03/12] make passt dumpable David Gibson
2024-12-20  8:35 ` [PATCH v2 04/12] packet: Use flexible array member in struct pool David Gibson
2024-12-20  8:35 ` [PATCH v2 05/12] packet: Don't pass start and offset separately too packet_check_range() David Gibson
2024-12-20  8:35 ` [PATCH v2 06/12] packet: Don't hard code maximum packet size to UINT16_MAX David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  1:00     ` David Gibson
2025-01-02 21:59       ` Stefano Brivio
2025-01-03  1:16         ` David Gibson
2025-01-05 23:43           ` Stefano Brivio
2025-01-08  1:48             ` David Gibson
2024-12-20  8:35 ` [PATCH v2 07/12] packet: Remove unhelpful packet_get_try() macro David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  2:15     ` David Gibson
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  4:48         ` David Gibson
2025-01-06 10:55           ` Stefano Brivio
2025-01-08  4:44             ` David Gibson
2024-12-20  8:35 ` [PATCH v2 08/12] util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers David Gibson
2024-12-20  8:35 ` [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  2:58     ` David Gibson
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  5:06         ` David Gibson
2025-01-06 10:55           ` Stefano Brivio [this message]
2025-01-08  4:53             ` David Gibson
2024-12-20  8:35 ` [PATCH v2 10/12] packet: Move packet length checks into packet_check_range() David Gibson
2024-12-20  8:35 ` [PATCH v2 11/12] tap: Don't size pool_tap[46] for the maximum number of packets David Gibson
2025-01-01 21:54   ` Stefano Brivio
2025-01-02  3:46     ` David Gibson
2025-01-02 22:00       ` Stefano Brivio
2025-01-03  6:06         ` David Gibson
2025-01-07 22:56           ` Stefano Brivio
2024-12-20  8:35 ` [PATCH v2 12/12] packet: More cautious checks to avoid pointer arithmetic UB David Gibson
2024-12-20  9:00 ` [PATCH v2 00/12] Cleanups to packet pool handling and sizing David Gibson
2024-12-20 10:06   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250106115525.0d3c393f@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).