From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=SfYDU6OH; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 0AF505A0271 for ; Fri, 03 Jan 2025 06:07:14 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1735880814; bh=C1ZipMpMdXwrAuyxswQOYp57gJB+UVJuWyisV0IRFSA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SfYDU6OHy0agD3T0NL5Hp3Ut/1/E9Hf+EJlxTke7MAS6pBmI2SFz69/pSbSUDiQWw 8qDs3Rj3hlXfchDGP8YAYIcJ3VqbunMk6MARRHAM68QLoaw57kJ4OYlYMpbHTHGChl D3MsXAilXe3+ZIOfC9C3nmqZweR2AOtMcczrYAawNLSoszDcBXycQzTGTChW17xTjW wYSQFNn6ikBmfFI2bYd72rq3+BH2mhqaZmcdv0qFyOlUC2BKup6cIKylH0nhqfsv/v 52Fd1FsO3wJqWqvdJLXycRKjvPxS2cihjNlcFELLv4cJPnH4xSJcHU31m/7OJF5aFh THMuQhNZd3MUA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4YPWlL5ZBMz4xfl; Fri, 3 Jan 2025 16:06:54 +1100 (AEDT) Date: Fri, 3 Jan 2025 16:06:57 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors Message-ID: References: <20241220083535.1372523-1-david@gibson.dropbear.id.au> <20241220083535.1372523-10-david@gibson.dropbear.id.au> <20250101225441.00bdb632@elisabeth> <20250102230008.391068cb@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="UXMcZBw+XavDWrdo" Content-Disposition: inline In-Reply-To: <20250102230008.391068cb@elisabeth> Message-ID-Hash: DUCXON76P53R7ORM74MBEPUL4BBYOENY X-Message-ID-Hash: DUCXON76P53R7ORM74MBEPUL4BBYOENY X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --UXMcZBw+XavDWrdo Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jan 02, 2025 at 11:00:08PM +0100, Stefano Brivio wrote: > On Thu, 2 Jan 2025 13:58:19 +1100 > David Gibson wrote: >=20 > > On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote: > > > On Fri, 20 Dec 2024 19:35:32 +1100 > > > David Gibson wrote: > > > =20 > > > > packet_add() and packet_get() can fail for a number of reasons, and= we > > > > currently treat them all basically the same: we log a trace() level= message > > > > and for packet_get() we return NULL. However the different causes = of > > > > error are quite different and suggest different handling: > > > >=20 > > > > 1) If we run out of space in the pool on add, that's (rightly) non-= fatal, > > > > but is an unusual situation which we might want to know about. = Promote > > > > it's message to debug() level. > > > >=20 > > > > 2) All packet_check_range() errors and the checks on packet length = indicate > > > > a serious problem. Due to either a bug in calling code, or some= sort > > > > of memory-clobbering, we've been given addresses or sizes of pac= kets > > > > that are nonsensical. If things are this bad, we shouldn't try = to carry > > > > on replace these with asserts. > > > >=20 > > > > 3) Requesting a packet index that doesn't exist in the pool in pack= et_get() > > > > indicates a bug in the caller - it should always check the index= first. > > > > Replace this with an assert. =20 > > >=20 > > > The reasons for 2) and 3) being trace() messages (they should probably > > > be debug() following the same reasoning as 1)) are graceful degradati= on > > > and security (availability). =20 > >=20 > > Hmm.. > >=20 > > > If we have a packet triggering some sort of "memory clobbering", or an > > > issue in the caller, that might very well be just one packet or a few > > > of them. =20 > >=20 > > I see the case for packet_add_do(). In that case it could be that > > this is just a bad packet triggering bugs in the initial processing > > code, and just dropping it will be sufficient to save us from damaging > > state. > >=20 > > For packet_get_do(), however, this indicates that we have a packet > > that's out of bounds, despite the fact that we checked its bounds when > > it was added to the pool. That means we *already* have corrupted > > memory, and we have no way to assess the extent of the damage. It's > > entirely possible we're already badly compromised. Under those > > circumstances I think it is much better to terminate immediately, > > rather than risk doing any more hard to trace damage. >=20 > It depends on which part of packet_get_do(). If we're exceeding the > packet count or the size of a pool, then I guess yes, otherwise not so > much. There's one, maybe two of the tests in packet_get_do() that don't indicate a serious error: if (len + offset > p->pkt[idx].iov_len) { is definitely ok, and you've convinced me I should try harder to make sure this doesn't print any error (even trace()) in at least the "try" cases. if (len > UINT16_MAX) { is questionable. Technically it could be ok, but since that parameter is generally a constant, chances are it does indicate a bad error in the caller. All the other checks, from packet_check_range() must indicate something very bad. > > > I think it's a disservice to users to crash in that case, and it has > > > the potential to worsen a security flaw if this packet can be built on > > > purpose. It's also a disservice to package maintainers because it has > > > the potential to turn a harmless issue into an annoying situation with > > > associated time pressure and everything. =20 > >=20 > > Ok, that argument makes sense to me for packet_add_do(), but again, > > not for packet_get_do(). > >=20 > > > Those are not warn() calls, by the way, because we don't want to make > > > it too easy, in that (perhaps unlikely) case, to flood logs, and > > > possibly hide information due to rotation. =20 > >=20 > > Right, I see the argument, but it also worries me that the message - > > which even if we are successfully containing the damage here > > definitely indicates a bug in the caller - could easily never be seen > > at trace or even debug level. > >=20 > > > I suppose that the argument for asserts here is so that we discover > > > functional issues as soon as possible, which I understand of course, = =20 > >=20 > > That's the main reason. The secondary reason is that it clarifies the > > meaning of a NULL return from packet_get_do(). With this change it > > means: you requested a range that's not in the packet, whereas > > previously it could mean either that or "something is horribly wrong > > in the pool state, and we don't know what". > >=20 > > > but it relies on two assumptions that might not hold: > > >=20 > > > 1. the crash will be reported, while a debug() message will go > > > unnoticed. > > >=20 > > > Well, the crash is more noticeable, but that doesn't mean that it > > > will be reported. Users might/will try to hack something around it > > > or try to survive with slirp4netns and all its flaws. =20 > >=20 > > I suppose so. > >=20 > > > On the other hand, a malfunction (say, failed connections) might be > > > equally likely to be reported, along with debug() messages. =20 > >=20 > > That only applies if this causes malfunctions in intended behaviour. > > If it's triggered due to an attack, rather than a flaw handling > > expected behaviour then there's no reason it would be reported. > >=20 > > > 2. the check and the assertion themselves are correct. This was not t= he > > > case for the two most recent severe issues (commit 61c0b0d0f199, b= ug > > > #105) we discovered through assertions. =20 > >=20 > > That's fair, I can see why you'd be gun shy about asserts given that. > >=20 > > > Issues such as https://github.com/containers/podman/issues/22925, = on > > > the other hand, are good examples of how asserts saved us a lot > > > of time and effort later on (compared to, say, "connections stop > > > working after three days"). But I would argue that those are > > > substantially different cases where assertions are checking > > > something that's much more inherent to the implementation. =20 > >=20 > > Eh... I can't really see a way these assertions are different other > > than that one turned out to be wrong and the other didn't. >=20 > Ah, sorry, I meant that there is a substantial difference between the > ones related to Podman issue 22925 *and what might happen in > packet_get_range()*. >=20 > Not that there's a difference between those related to Podman issue > 22925 and those from commit 61c0b0d0f199 and bug #105. Ok, but I'm still not really seeing how to distinguish the cases without the benefit of hindsight. > > > In this case, I would argue that there's no need to crash, so we > > > shouldn't. We can presumably carry on after that, and we're not leaki= ng > > > any resource or sneakily leave behind some condition that will cause > > > apparently unrelated issues at a later time. =20 > >=20 > > Again, true (plausibly) for packet_add_do(), but not for > > packet_get_do(). > >=20 > > > > 4) On packet_get() requesting a section of the packet beyond its bo= unds is > > > > correct already. This happens routinely from some callers when = they > > > > probe to see if certain parts of the packet are present. At wor= st it > > > > indicates that the guest has generate a malformed packet, which = we > > > > should quietly drop as we already do. =20 > >=20 > > This patch doesn't change that. We check against the stated packet > > bounds first and return NULL *before* we check against the buffer > > bounds and possibly trigger an assert. >=20 > Yes... I wasn't actually disputing that (quote is from yourself). >=20 > > Ok... here's a tentative proposal, let me know what you think: >=20 > So, first off, I have to say that it's a bit hard to reason about > something that is not giving users any problem at the moment (well, at > least not without a part of this series), and which didn't really give > us problems in the past. >=20 > It's difficult to quantify with experience and examples (even letting > alone the priority of the whole matter for a moment...). Anyway, let me > try: I mean, I see your point. This is basically following on from the MTU stuff - I was tracing what enforces / relies on packet/datagram/frame size limits and making sure they're consistent through the entire chain. At the moment we allow the user to set --mtu 65535 (i.e. 65535 byte IP datagrams, not including L2), but the pool layer (amongst other things) doesn't allow us to actually deliver it. This does cause user visible (though minor) problems, such as bug 65 & bug 66. I hope merging the two series will make what's going on clearer. > > * I promote the packet_check_range() messages to warn(), or even > > err(). Yes, this might allow for log flooding, but I would argue > > that if we have a bug allowing this path to be triggered and > > something is frequently triggering it, we *want* that to be noisy. > > We *think* we're containing the damage here, but we don't really > > know. >=20 > I still think it's our responsibility to not flood the system log in > *any* circumstance. >=20 > Remember that this is not necessarily *our* log where the worst that > can happen is that we hide information. Even with sane system log > configurations, one message per packet can get close to a substantial > denial of service (especially with systemd-syslogd, see also > https://github.com/systemd/systemd/issues/13425). >=20 > I guess you don't realise until you run passt without -f, for example > with libvirt, for a large number of virtual machines. Or in KubeVirt. > Or for a bunch of containers. It's a common and very much intended > usage. Hm, fair point. I feel like at some point we're probably going to have to bite the bullet and implement rate-limited messages and/or some sort of "warn once" thing at some point. > It's not true in general that we don't know. We can quantify the > difference in damage very clearly if the attacker manages to make > packet_check_range() print one line for each packet (and nothing > else). >=20 > > * I make packet_check_range() failures in packet_add_do() drop the > > packet and continue, as they used to > > * packet_check_range() failures in packet_get_do() will still > > assert() >=20 > I guess the (start < p->buf) one makes sense, but not so much the one > where we check that offset plus length is within the pool, because that > one could be very well triggered by a specific packet itself. I really don't see the difference. Since we're talking packet_get_do() here, this means that we've *already* inserted a packet in the pool which extends outside the buffer(s) it's supposed to be in. But we can't have inserted this packet with packet_add_do(), because it checked the same things non-fatally. Which means either a packet has been inserted bypassing packet_add_do() or the pool metadata has been corrupted since then. Either of those situations indicates something very, very bad. > > * I leave the assert()s for packet _index_ out of range >=20 > Okay, yes, those make sense to me as well. >=20 > > > > Signed-off-by: David Gibson > > > > --- > > > > packet.c | 71 ++++++++++++++++++++-----------------------------= ---- > > > > packet.h | 3 ++- > > > > vu_common.c | 13 +++++++--- > > > > 3 files changed, 38 insertions(+), 49 deletions(-) > > > >=20 > > > > diff --git a/packet.c b/packet.c > > > > index c921aa15..24f12448 100644 > > > > --- a/packet.c > > > > +++ b/packet.c > > > > @@ -30,37 +30,27 @@ > > > > * @func: For tracing: name of calling function > > > > * @line: For tracing: caller line of function call > > > > * > > > > - * Return: 0 if the range is valid, -1 otherwise > > > > + * ASSERT()s if the given range isn't valid (within the expected b= uffer(s) for > > > > + * the pool). > > > > */ > > > > -static int packet_check_range(const struct pool *p, const char *pt= r, size_t len, > > > > - const char *func, int line) > > > > +static void packet_check_range(const struct pool *p, > > > > + const char *ptr, size_t len, > > > > + const char *func, int line) > > > > { > > > > if (p->buf_size =3D=3D 0) { > > > > - int ret; > > > > - > > > > - ret =3D vu_packet_check_range((void *)p->buf, ptr, len); > > > > - > > > > - if (ret =3D=3D -1) > > > > - trace("cannot find region, %s:%i", func, line); > > > > - > > > > - return ret; > > > > - } > > > > - > > > > - if (ptr < p->buf) { > > > > - trace("packet range start %p before buffer start %p, %s:%i", > > > > - (void *)ptr, (void *)p->buf, func, line); > > > > - return -1; > > > > - } > > > > - > > > > - if (ptr + len > p->buf + p->buf_size) { > > > > - trace("packet range end %p after buffer end %p, %s:%i", > > > > - (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > - func, line); > > > > - return -1; > > > > + vu_packet_check_range((void *)p->buf, ptr, len, func, line); > > > > + return; > > > > } > > > > =20 > > > > - return 0; > > > > + ASSERT_WITH_MSG(ptr >=3D p->buf, > > > > + "packet range start %p before buffer start %p, %s:%i", > > > > + (void *)ptr, (void *)p->buf, func, line); > > > > + ASSERT_WITH_MSG(ptr + len <=3D p->buf + p->buf_size, > > > > + "packet range end %p after buffer end %p, %s:%i", > > > > + (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > + func, line); > > > > } > > > > + > > > > /** > > > > * packet_add_do() - Add data as packet descriptor to given pool > > > > * @p: Existing pool > > > > @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, = const char *start, > > > > size_t idx =3D p->count; > > > > =20 > > > > if (idx >=3D p->size) { > > > > - trace("add packet index %zu to pool with size %zu, %s:%i", > > > > + debug("add packet index %zu to pool with size %zu, %s:%i", > > > > idx, p->size, func, line); > > > > return; > > > > } > > > > =20 > > > > - if (packet_check_range(p, start, len, func, line)) > > > > - return; > > > > + packet_check_range(p, start, len, func, line); > > > > =20 > > > > - if (len > PACKET_MAX_LEN) { > > > > - trace("add packet length %zu, %s:%i", len, func, line); > > > > - return; > > > > - } > > > > + ASSERT_WITH_MSG(len <=3D PACKET_MAX_LEN, > > > > + "add packet length %zu (max %zu), %s:%i", > > > > + len, PACKET_MAX_LEN, func, line); > > > > =20 > > > > p->pkt[idx].iov_base =3D (void *)start; > > > > p->pkt[idx].iov_len =3D len; > > > > @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size= _t idx, size_t offset, > > > > { > > > > char *ptr; > > > > =20 > > > > - if (idx >=3D p->size || idx >=3D p->count) { > > > > - trace("packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > - idx, p->size, p->count, func, line); > > > > - return NULL; > > > > - } > > > > - > > > > - if (len > PACKET_MAX_LEN) { > > > > - trace("packet data length %zu, %s:%i", len, func, line); > > > > - return NULL; > > > > - } > > > > + ASSERT_WITH_MSG(idx < p->size && idx < p->count, > > > > + "packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > + idx, p->size, p->count, func, line); > > > > + ASSERT_WITH_MSG(len <=3D PACKET_MAX_LEN, > > > > + "packet range length %zu (max %zu), %s:%i", > > > > + len, PACKET_MAX_LEN, func, line); > > > > =20 > > > > if (len + offset > p->pkt[idx].iov_len) { > > > > trace("data length %zu, offset %zu from length %zu, %s:%i", > > > > @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_= t idx, size_t offset, > > > > =20 > > > > ptr =3D (char *)p->pkt[idx].iov_base + offset; > > > > =20 > > > > - if (packet_check_range(p, ptr, len, func, line)) > > > > - return NULL; > > > > + packet_check_range(p, ptr, len, func, line); > > > > =20 > > > > if (left) > > > > *left =3D p->pkt[idx].iov_len - offset - len; > > > > diff --git a/packet.h b/packet.h > > > > index f95cda08..b164f77e 100644 > > > > --- a/packet.h > > > > +++ b/packet.h > > > > @@ -31,7 +31,8 @@ struct pool { > > > > struct iovec pkt[]; > > > > }; > > > > =20 > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len); > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > + const char *func, int line); > > > > void packet_add_do(struct pool *p, size_t len, const char *start, > > > > const char *func, int line); > > > > void *packet_get_do(const struct pool *p, const size_t idx, > > > > diff --git a/vu_common.c b/vu_common.c > > > > index 531f8786..528b9b08 100644 > > > > --- a/vu_common.c > > > > +++ b/vu_common.c > > > > @@ -24,10 +24,13 @@ > > > > * @buf: Array of the available memory regions > > > > * @ptr: Start of desired data range > > > > * @size: Length of desired data range > > > > + * @func: For tracing: name of calling function > > > > + * @line: For tracing: caller line of function call =20 > > >=20 > > > If those arguments are mandatory, I would drop the "For tracing" > > > specification. > > > =20 > > > > * > > > > - * Return: 0 if the zone is in a mapped memory region, -1 otherwise > > > > + * Aborts if the zone isn't in any of the device regions > > > > */ > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > + const char *func, int line) > > > > { > > > > struct vu_dev_region *dev_region; > > > > =20 > > > > @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char= *ptr, size_t len) > > > > =20 > > > > if (m <=3D ptr && > > > > ptr + len <=3D m + dev_region->mmap_offset + dev_region->siz= e) > > > > - return 0; > > > > + return; > > > > } > > > > =20 > > > > - return -1; > > > > + abort_with_msg( > > > > + "package range at %p, length %zd not within dev region %s:%i", = =20 > > >=20 > > > s/package/packet/ > > > =20 > > > > + (void *)ptr, len, func, line); > > > > } > > > > =20 > > > > /** =20 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --UXMcZBw+XavDWrdo Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmd3cHAACgkQzQJF27ox 2Gfbog/9HqgtjefE1u5zLa7LwNXm+Z4hJ4zmsZSVtkSfNVw+6gURyFxH6cSPWf0B e/KgAuDFzw2qDnfO6oPDeVzYVHXXTXDg7tfEs4IKEYWL3mOUoa+XByTF28PeglgE km6sp5Sg45w8/4g55Ss/q8k4cMnBG0pWrnLVd9+MJrdw6VRe0L0X4EsBLSrzRxeU wbseFwxg2g+zcLj8r910EJPK7/X3kr/nuadcO6FUm4sYrKX2lxLYlwS/MRRUQB1e cT5mZ4tWsUk1bYikotHJDnaZdprKMviA0fYhoxN0z8UdlQvRd9mVfnQaAHA6rEuC ionyDrbhjEfgpt62RnILjDJnOPjeKbs0nguorrDO10VoIHMNEreWXbRtwebQhZeg 7nhAbR5/rFYkyVbUINWqDbTkeb2bn512pq2LlJ9SiMKMWjS3TE0tr8nOC9xzOanu NPJe1Qd1c09aN+7YLWWc/aP/KkUjgtYQEAuXiaOA7/wgC0CdTxpKt1sGYK+ANQue oe5y38xt3ypO3e0OE2ayLy4FqKP95/2GjRjG5sBYf4zw3x4paKYsO7n+WMxU4w90 01wNRs4LQXJbG93MdorcqtIj5SR6LCf4t6Qxn+NNbcwIFZFwQUkQaG/uPiLgyelj VZXBOTQw8JR9iEnz1GYPLPBxxd12q/U3dyXfu79g2E/+hnTx798= =6KRD -----END PGP SIGNATURE----- --UXMcZBw+XavDWrdo--