From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=lsJSQ7uQ; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 4AA8E5A0274 for ; Thu, 02 Jan 2025 04:47:04 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1735789606; bh=X2hg5rIIn727gWn73lN/hJ+9VG6tYkSxBrZdNpDZHrs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=lsJSQ7uQVPk56ypKdX5tqNSPpymUc8s0sh3IFHzU2DbbbVSCHKT6mUQ1UVJLHZIvU FWyuZtSNvGgGMBdm5ttveoKT5rTt3VuBnG2CY7yXuw/61jMk9JNBg6I5eal5dgVRkw V97LdJfrYPp8I55hFxXsmishhWg0X1OxpVM1FHUcPfMY3cHDTZdQPrBJMP7k5t7Y4k 9k7araEpA0bhWKTZKCJWmSK+/yLkKHKsC0RinTSYsXfNmbkodEA5Xr/IsvrwtlvZU6 pFHVfTkHbdC404pfSvegU7gXazfvGdX71gSy2SNkdebppDiLW1PejF4BmfACxykOd7 YjGn/tGEXHZGQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4YNt1L71fyz4xcZ; Thu, 2 Jan 2025 14:46:46 +1100 (AEDT) Date: Thu, 2 Jan 2025 13:58:19 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors Message-ID: References: <20241220083535.1372523-1-david@gibson.dropbear.id.au> <20241220083535.1372523-10-david@gibson.dropbear.id.au> <20250101225441.00bdb632@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="a8+acWZEnnMfb37m" Content-Disposition: inline In-Reply-To: <20250101225441.00bdb632@elisabeth> Message-ID-Hash: ECGII4GR3AVWWM7XX6UOKB34JY5T7H5I X-Message-ID-Hash: ECGII4GR3AVWWM7XX6UOKB34JY5T7H5I X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --a8+acWZEnnMfb37m Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote: > On Fri, 20 Dec 2024 19:35:32 +1100 > David Gibson wrote: >=20 > > packet_add() and packet_get() can fail for a number of reasons, and we > > currently treat them all basically the same: we log a trace() level mes= sage > > and for packet_get() we return NULL. However the different causes of > > error are quite different and suggest different handling: > >=20 > > 1) If we run out of space in the pool on add, that's (rightly) non-fata= l, > > but is an unusual situation which we might want to know about. Prom= ote > > it's message to debug() level. > >=20 > > 2) All packet_check_range() errors and the checks on packet length indi= cate > > a serious problem. Due to either a bug in calling code, or some sort > > of memory-clobbering, we've been given addresses or sizes of packets > > that are nonsensical. If things are this bad, we shouldn't try to c= arry > > on replace these with asserts. > >=20 > > 3) Requesting a packet index that doesn't exist in the pool in packet_g= et() > > indicates a bug in the caller - it should always check the index fir= st. > > Replace this with an assert. >=20 > The reasons for 2) and 3) being trace() messages (they should probably > be debug() following the same reasoning as 1)) are graceful degradation > and security (availability). Hmm.. > If we have a packet triggering some sort of "memory clobbering", or an > issue in the caller, that might very well be just one packet or a few > of them. I see the case for packet_add_do(). In that case it could be that this is just a bad packet triggering bugs in the initial processing code, and just dropping it will be sufficient to save us from damaging state. For packet_get_do(), however, this indicates that we have a packet that's out of bounds, despite the fact that we checked its bounds when it was added to the pool. That means we *already* have corrupted memory, and we have no way to assess the extent of the damage. It's entirely possible we're already badly compromised. Under those circumstances I think it is much better to terminate immediately, rather than risk doing any more hard to trace damage. > I think it's a disservice to users to crash in that case, and it has > the potential to worsen a security flaw if this packet can be built on > purpose. It's also a disservice to package maintainers because it has > the potential to turn a harmless issue into an annoying situation with > associated time pressure and everything. Ok, that argument makes sense to me for packet_add_do(), but again, not for packet_get_do(). > Those are not warn() calls, by the way, because we don't want to make > it too easy, in that (perhaps unlikely) case, to flood logs, and > possibly hide information due to rotation. Right, I see the argument, but it also worries me that the message - which even if we are successfully containing the damage here definitely indicates a bug in the caller - could easily never be seen at trace or even debug level. > I suppose that the argument for asserts here is so that we discover > functional issues as soon as possible, which I understand of course, That's the main reason. The secondary reason is that it clarifies the meaning of a NULL return from packet_get_do(). With this change it means: you requested a range that's not in the packet, whereas previously it could mean either that or "something is horribly wrong in the pool state, and we don't know what". > but it relies on two assumptions that might not hold: >=20 > 1. the crash will be reported, while a debug() message will go > unnoticed. >=20 > Well, the crash is more noticeable, but that doesn't mean that it > will be reported. Users might/will try to hack something around it > or try to survive with slirp4netns and all its flaws. I suppose so. > On the other hand, a malfunction (say, failed connections) might be > equally likely to be reported, along with debug() messages. That only applies if this causes malfunctions in intended behaviour. If it's triggered due to an attack, rather than a flaw handling expected behaviour then there's no reason it would be reported. > 2. the check and the assertion themselves are correct. This was not the > case for the two most recent severe issues (commit 61c0b0d0f199, bug > #105) we discovered through assertions. That's fair, I can see why you'd be gun shy about asserts given that. > Issues such as https://github.com/containers/podman/issues/22925, on > the other hand, are good examples of how asserts saved us a lot > of time and effort later on (compared to, say, "connections stop > working after three days"). But I would argue that those are > substantially different cases where assertions are checking > something that's much more inherent to the implementation. Eh... I can't really see a way these assertions are different other than that one turned out to be wrong and the other didn't. > In this case, I would argue that there's no need to crash, so we > shouldn't. We can presumably carry on after that, and we're not leaking > any resource or sneakily leave behind some condition that will cause > apparently unrelated issues at a later time. Again, true (plausibly) for packet_add_do(), but not for packet_get_do(). >=20 > > 4) On packet_get() requesting a section of the packet beyond its bounds= is > > correct already. This happens routinely from some callers when they > > probe to see if certain parts of the packet are present. At worst it > > indicates that the guest has generate a malformed packet, which we > > should quietly drop as we already do. This patch doesn't change that. We check against the stated packet bounds first and return NULL *before* we check against the buffer bounds and possibly trigger an assert. Ok... here's a tentative proposal, let me know what you think: * I promote the packet_check_range() messages to warn(), or even err(). Yes, this might allow for log flooding, but I would argue that if we have a bug allowing this path to be triggered and something is frequently triggering it, we *want* that to be noisy. We *think* we're containing the damage here, but we don't really know. * I make packet_check_range() failures in packet_add_do() drop the packet and continue, as they used to * packet_check_range() failures in packet_get_do() will still assert() * I leave the assert()s for packet _index_ out of range > > Signed-off-by: David Gibson > > --- > > packet.c | 71 ++++++++++++++++++++--------------------------------- > > packet.h | 3 ++- > > vu_common.c | 13 +++++++--- > > 3 files changed, 38 insertions(+), 49 deletions(-) > >=20 > > diff --git a/packet.c b/packet.c > > index c921aa15..24f12448 100644 > > --- a/packet.c > > +++ b/packet.c > > @@ -30,37 +30,27 @@ > > * @func: For tracing: name of calling function > > * @line: For tracing: caller line of function call > > * > > - * Return: 0 if the range is valid, -1 otherwise > > + * ASSERT()s if the given range isn't valid (within the expected buffe= r(s) for > > + * the pool). > > */ > > -static int packet_check_range(const struct pool *p, const char *ptr, s= ize_t len, > > - const char *func, int line) > > +static void packet_check_range(const struct pool *p, > > + const char *ptr, size_t len, > > + const char *func, int line) > > { > > if (p->buf_size =3D=3D 0) { > > - int ret; > > - > > - ret =3D vu_packet_check_range((void *)p->buf, ptr, len); > > - > > - if (ret =3D=3D -1) > > - trace("cannot find region, %s:%i", func, line); > > - > > - return ret; > > - } > > - > > - if (ptr < p->buf) { > > - trace("packet range start %p before buffer start %p, %s:%i", > > - (void *)ptr, (void *)p->buf, func, line); > > - return -1; > > - } > > - > > - if (ptr + len > p->buf + p->buf_size) { > > - trace("packet range end %p after buffer end %p, %s:%i", > > - (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > - func, line); > > - return -1; > > + vu_packet_check_range((void *)p->buf, ptr, len, func, line); > > + return; > > } > > =20 > > - return 0; > > + ASSERT_WITH_MSG(ptr >=3D p->buf, > > + "packet range start %p before buffer start %p, %s:%i", > > + (void *)ptr, (void *)p->buf, func, line); > > + ASSERT_WITH_MSG(ptr + len <=3D p->buf + p->buf_size, > > + "packet range end %p after buffer end %p, %s:%i", > > + (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > + func, line); > > } > > + > > /** > > * packet_add_do() - Add data as packet descriptor to given pool > > * @p: Existing pool > > @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, cons= t char *start, > > size_t idx =3D p->count; > > =20 > > if (idx >=3D p->size) { > > - trace("add packet index %zu to pool with size %zu, %s:%i", > > + debug("add packet index %zu to pool with size %zu, %s:%i", > > idx, p->size, func, line); > > return; > > } > > =20 > > - if (packet_check_range(p, start, len, func, line)) > > - return; > > + packet_check_range(p, start, len, func, line); > > =20 > > - if (len > PACKET_MAX_LEN) { > > - trace("add packet length %zu, %s:%i", len, func, line); > > - return; > > - } > > + ASSERT_WITH_MSG(len <=3D PACKET_MAX_LEN, > > + "add packet length %zu (max %zu), %s:%i", > > + len, PACKET_MAX_LEN, func, line); > > =20 > > p->pkt[idx].iov_base =3D (void *)start; > > p->pkt[idx].iov_len =3D len; > > @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t i= dx, size_t offset, > > { > > char *ptr; > > =20 > > - if (idx >=3D p->size || idx >=3D p->count) { > > - trace("packet %zu from pool size: %zu, count: %zu, %s:%i", > > - idx, p->size, p->count, func, line); > > - return NULL; > > - } > > - > > - if (len > PACKET_MAX_LEN) { > > - trace("packet data length %zu, %s:%i", len, func, line); > > - return NULL; > > - } > > + ASSERT_WITH_MSG(idx < p->size && idx < p->count, > > + "packet %zu from pool size: %zu, count: %zu, %s:%i", > > + idx, p->size, p->count, func, line); > > + ASSERT_WITH_MSG(len <=3D PACKET_MAX_LEN, > > + "packet range length %zu (max %zu), %s:%i", > > + len, PACKET_MAX_LEN, func, line); > > =20 > > if (len + offset > p->pkt[idx].iov_len) { > > trace("data length %zu, offset %zu from length %zu, %s:%i", > > @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t id= x, size_t offset, > > =20 > > ptr =3D (char *)p->pkt[idx].iov_base + offset; > > =20 > > - if (packet_check_range(p, ptr, len, func, line)) > > - return NULL; > > + packet_check_range(p, ptr, len, func, line); > > =20 > > if (left) > > *left =3D p->pkt[idx].iov_len - offset - len; > > diff --git a/packet.h b/packet.h > > index f95cda08..b164f77e 100644 > > --- a/packet.h > > +++ b/packet.h > > @@ -31,7 +31,8 @@ struct pool { > > struct iovec pkt[]; > > }; > > =20 > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len); > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > + const char *func, int line); > > void packet_add_do(struct pool *p, size_t len, const char *start, > > const char *func, int line); > > void *packet_get_do(const struct pool *p, const size_t idx, > > diff --git a/vu_common.c b/vu_common.c > > index 531f8786..528b9b08 100644 > > --- a/vu_common.c > > +++ b/vu_common.c > > @@ -24,10 +24,13 @@ > > * @buf: Array of the available memory regions > > * @ptr: Start of desired data range > > * @size: Length of desired data range > > + * @func: For tracing: name of calling function > > + * @line: For tracing: caller line of function call >=20 > If those arguments are mandatory, I would drop the "For tracing" > specification. >=20 > > * > > - * Return: 0 if the zone is in a mapped memory region, -1 otherwise > > + * Aborts if the zone isn't in any of the device regions > > */ > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > + const char *func, int line) > > { > > struct vu_dev_region *dev_region; > > =20 > > @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *pt= r, size_t len) > > =20 > > if (m <=3D ptr && > > ptr + len <=3D m + dev_region->mmap_offset + dev_region->size) > > - return 0; > > + return; > > } > > =20 > > - return -1; > > + abort_with_msg( > > + "package range at %p, length %zd not within dev region %s:%i", >=20 > s/package/packet/ >=20 > > + (void *)ptr, len, func, line); > > } > > =20 > > /** >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --a8+acWZEnnMfb37m Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmd2AMoACgkQzQJF27ox 2GcG3BAAgDsASM8B2y5VPIc/IQ/Y6yn8t6jP325B/6WwyTOuQJOqvzUZb5/tJr0O lUHT4piQvV25UZ3oZGFQkbR4tZSLAYlbodzNkmtR/RU95VOv5FDmVd6qkMosetVI Vo11sy44g0u+qnnVFN6TTUEI+2hSSlaDQm9CdtytBiHJOZ98KG7dd3u09trKq55i IgL+JDrYALSKQ68vMjAIbQWNvO9KLpT/TDy4eMwWOncaHEdaASWQwaA5aKvNNkZ9 plBVq370agp7EHmEmNvuJBYmy687mulE/9odS+hMr71XpSp+ZfulMl4Mb1JKJF22 d/oSfsr0X7NYkVHsJGvABcJn1mZGMiZxmQ+qvv/f7LLl+GL3OKEJwHY2NzsFt5DH z1IoQm5GwUwW0js3Bx6S6uzYh1FSfBS7UlCu9RU3EBZwuImLY8C4AuLQ8q5KAhNv VcSA7BrvPbof7d0qMLiY7ybSwkv2y0V6W10GbfNg5TzYKlVo53pW/i/TX23D6Jyi HNVC96VDjyoFBra0uJDz4MxSU7CKl5j6Wv9NHWysnVFWDl7NsAtY8+QE/gJWDy/J VtHMyE9+4/SToHC0LKp7qF8oEISDMYs2fkfShN7dWlojMXG8vWOL8qdMT0vD9c57 yPm0T1oP4tyoRmuldBRmqOfyZLp9FUMbqpaKnqPXQV+3VBIRg/A= =iIfZ -----END PGP SIGNATURE----- --a8+acWZEnnMfb37m--