On Mon, Jan 06, 2025 at 11:55:25AM +0100, Stefano Brivio wrote: > On Fri, 3 Jan 2025 16:06:57 +1100 > David Gibson wrote: > > > On Thu, Jan 02, 2025 at 11:00:08PM +0100, Stefano Brivio wrote: > > > On Thu, 2 Jan 2025 13:58:19 +1100 > > > David Gibson wrote: > > > > > > > On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote: > > > > > On Fri, 20 Dec 2024 19:35:32 +1100 > > > > > David Gibson wrote: > > > > > > > > > > > packet_add() and packet_get() can fail for a number of reasons, and we > > > > > > currently treat them all basically the same: we log a trace() level message > > > > > > and for packet_get() we return NULL. However the different causes of > > > > > > error are quite different and suggest different handling: > > > > > > > > > > > > 1) If we run out of space in the pool on add, that's (rightly) non-fatal, > > > > > > but is an unusual situation which we might want to know about. Promote > > > > > > it's message to debug() level. > > > > > > > > > > > > 2) All packet_check_range() errors and the checks on packet length indicate > > > > > > a serious problem. Due to either a bug in calling code, or some sort > > > > > > of memory-clobbering, we've been given addresses or sizes of packets > > > > > > that are nonsensical. If things are this bad, we shouldn't try to carry > > > > > > on replace these with asserts. > > > > > > > > > > > > 3) Requesting a packet index that doesn't exist in the pool in packet_get() > > > > > > indicates a bug in the caller - it should always check the index first. > > > > > > Replace this with an assert. > > > > > > > > > > The reasons for 2) and 3) being trace() messages (they should probably > > > > > be debug() following the same reasoning as 1)) are graceful degradation > > > > > and security (availability). > > > > > > > > Hmm.. > > > > > > > > > If we have a packet triggering some sort of "memory clobbering", or an > > > > > issue in the caller, that might very well be just one packet or a few > > > > > of them. > > > > > > > > I see the case for packet_add_do(). In that case it could be that > > > > this is just a bad packet triggering bugs in the initial processing > > > > code, and just dropping it will be sufficient to save us from damaging > > > > state. > > > > > > > > For packet_get_do(), however, this indicates that we have a packet > > > > that's out of bounds, despite the fact that we checked its bounds when > > > > it was added to the pool. That means we *already* have corrupted > > > > memory, and we have no way to assess the extent of the damage. It's > > > > entirely possible we're already badly compromised. Under those > > > > circumstances I think it is much better to terminate immediately, > > > > rather than risk doing any more hard to trace damage. > > > > > > It depends on which part of packet_get_do(). If we're exceeding the > > > packet count or the size of a pool, then I guess yes, otherwise not so > > > much. > > > > There's one, maybe two of the tests in packet_get_do() that don't > > indicate a serious error: > > if (len + offset > p->pkt[idx].iov_len) { > > is definitely ok, and you've convinced me I should try harder to make > > sure this doesn't print any error (even trace()) in at least the "try" > > cases. > > if (len > UINT16_MAX) { > > is questionable. Technically it could be ok, but since that parameter > > is generally a constant, chances are it does indicate a bad error in > > the caller. > > ...generally, yes, but I'm sure we don't want to crash on: > > - a malformed DHCP request plus subtle bug (not saying that there's > one, just that I could see myself adding one), dhcp(): > > packet_get(p, 0, offset + opt_off + 2, *olen, NULL); > > - a malformed TCP header plus some kind of bug, tcp_data_from_tap(): > > data = packet_get(p, i, off, len, NULL); > > Yes, those would need "bad errors". But the bad error here would crash > us with an assert, and cause us to throw away one packet without it. Actually, I realised this borderline case is addressed by the end of the series. The "hard" length check is moved into packet_check_range() and is always fatal. However, *before* we do that we indirectly check the range length against the packet length with if (len + offset > p->pkt[idx].iov_len) { If that fails, we hit a "soft" error (returning NULL), before we call packet_check_range() and possibly trigger a "hard" error (assert). I'll attempt to get the order of operations right when I rework this, so we don't have interim patches where this breaks. > > All the other checks, from packet_check_range() must indicate > > something very bad. > > > > > > > I think it's a disservice to users to crash in that case, and it has > > > > > the potential to worsen a security flaw if this packet can be built on > > > > > purpose. It's also a disservice to package maintainers because it has > > > > > the potential to turn a harmless issue into an annoying situation with > > > > > associated time pressure and everything. > > > > > > > > Ok, that argument makes sense to me for packet_add_do(), but again, > > > > not for packet_get_do(). > > > > > > > > > Those are not warn() calls, by the way, because we don't want to make > > > > > it too easy, in that (perhaps unlikely) case, to flood logs, and > > > > > possibly hide information due to rotation. > > > > > > > > Right, I see the argument, but it also worries me that the message - > > > > which even if we are successfully containing the damage here > > > > definitely indicates a bug in the caller - could easily never be seen > > > > at trace or even debug level. > > > > > > > > > I suppose that the argument for asserts here is so that we discover > > > > > functional issues as soon as possible, which I understand of course, > > > > > > > > That's the main reason. The secondary reason is that it clarifies the > > > > meaning of a NULL return from packet_get_do(). With this change it > > > > means: you requested a range that's not in the packet, whereas > > > > previously it could mean either that or "something is horribly wrong > > > > in the pool state, and we don't know what". > > > > > > > > > but it relies on two assumptions that might not hold: > > > > > > > > > > 1. the crash will be reported, while a debug() message will go > > > > > unnoticed. > > > > > > > > > > Well, the crash is more noticeable, but that doesn't mean that it > > > > > will be reported. Users might/will try to hack something around it > > > > > or try to survive with slirp4netns and all its flaws. > > > > > > > > I suppose so. > > > > > > > > > On the other hand, a malfunction (say, failed connections) might be > > > > > equally likely to be reported, along with debug() messages. > > > > > > > > That only applies if this causes malfunctions in intended behaviour. > > > > If it's triggered due to an attack, rather than a flaw handling > > > > expected behaviour then there's no reason it would be reported. > > > > > > > > > 2. the check and the assertion themselves are correct. This was not the > > > > > case for the two most recent severe issues (commit 61c0b0d0f199, bug > > > > > #105) we discovered through assertions. > > > > > > > > That's fair, I can see why you'd be gun shy about asserts given that. > > > > > > > > > Issues such as https://github.com/containers/podman/issues/22925, on > > > > > the other hand, are good examples of how asserts saved us a lot > > > > > of time and effort later on (compared to, say, "connections stop > > > > > working after three days"). But I would argue that those are > > > > > substantially different cases where assertions are checking > > > > > something that's much more inherent to the implementation. > > > > > > > > Eh... I can't really see a way these assertions are different other > > > > than that one turned out to be wrong and the other didn't. > > > > > > Ah, sorry, I meant that there is a substantial difference between the > > > ones related to Podman issue 22925 *and what might happen in > > > packet_get_range()*. > > > > > > Not that there's a difference between those related to Podman issue > > > 22925 and those from commit 61c0b0d0f199 and bug #105. > > > > Ok, but I'm still not really seeing how to distinguish the cases > > without the benefit of hindsight. > > > > > > > In this case, I would argue that there's no need to crash, so we > > > > > shouldn't. We can presumably carry on after that, and we're not leaking > > > > > any resource or sneakily leave behind some condition that will cause > > > > > apparently unrelated issues at a later time. > > > > > > > > Again, true (plausibly) for packet_add_do(), but not for > > > > packet_get_do(). > > > > > > > > > > 4) On packet_get() requesting a section of the packet beyond its bounds is > > > > > > correct already. This happens routinely from some callers when they > > > > > > probe to see if certain parts of the packet are present. At worst it > > > > > > indicates that the guest has generate a malformed packet, which we > > > > > > should quietly drop as we already do. > > > > > > > > This patch doesn't change that. We check against the stated packet > > > > bounds first and return NULL *before* we check against the buffer > > > > bounds and possibly trigger an assert. > > > > > > Yes... I wasn't actually disputing that (quote is from yourself). > > > > > > > Ok... here's a tentative proposal, let me know what you think: > > > > > > So, first off, I have to say that it's a bit hard to reason about > > > something that is not giving users any problem at the moment (well, at > > > least not without a part of this series), and which didn't really give > > > us problems in the past. > > > > > > It's difficult to quantify with experience and examples (even letting > > > alone the priority of the whole matter for a moment...). Anyway, let me > > > try: > > > > I mean, I see your point. This is basically following on from the MTU > > stuff - I was tracing what enforces / relies on packet/datagram/frame > > size limits and making sure they're consistent through the entire > > chain. At the moment we allow the user to set --mtu 65535 (i.e. 65535 > > byte IP datagrams, not including L2), but the pool layer (amongst > > other things) doesn't allow us to actually deliver it. This does > > cause user visible (though minor) problems, such as bug 65 & bug 66. > > I see the relationship to that up to patch 6/12. After that, not so > much. True, the later patches are for different reasons. I'll reexamine what lives in the series. > I understand that there might be inconsistencies (at least in your > interpretation) but you wouldn't be making them any worse. > > > I hope merging the two series will make what's going on clearer. > > > > > > * I promote the packet_check_range() messages to warn(), or even > > > > err(). Yes, this might allow for log flooding, but I would argue > > > > that if we have a bug allowing this path to be triggered and > > > > something is frequently triggering it, we *want* that to be noisy. > > > > We *think* we're containing the damage here, but we don't really > > > > know. > > > > > > I still think it's our responsibility to not flood the system log in > > > *any* circumstance. > > > > > > Remember that this is not necessarily *our* log where the worst that > > > can happen is that we hide information. Even with sane system log > > > configurations, one message per packet can get close to a substantial > > > denial of service (especially with systemd-syslogd, see also > > > https://github.com/systemd/systemd/issues/13425). > > > > > > I guess you don't realise until you run passt without -f, for example > > > with libvirt, for a large number of virtual machines. Or in KubeVirt. > > > Or for a bunch of containers. It's a common and very much intended > > > usage. > > > > Hm, fair point. I feel like at some point we're probably going to > > have to bite the bullet and implement rate-limited messages and/or > > some sort of "warn once" thing at some point. > > I thought you implemented something like that, by the way, but I can't > find it right now. > > Yes, a "warn once" thing would fix all this. I think we could happily > warn() or err() on anything like that if we had it. > > By the way, same here, I guess that wouldn't really need to be in the > same series as patches up to 6/12 here. No. > > > It's not true in general that we don't know. We can quantify the > > > difference in damage very clearly if the attacker manages to make > > > packet_check_range() print one line for each packet (and nothing > > > else). > > > > > > > * I make packet_check_range() failures in packet_add_do() drop the > > > > packet and continue, as they used to > > > > * packet_check_range() failures in packet_get_do() will still > > > > assert() > > > > > > I guess the (start < p->buf) one makes sense, but not so much the one > > > where we check that offset plus length is within the pool, because that > > > one could be very well triggered by a specific packet itself. > > > > I really don't see the difference. Since we're talking > > packet_get_do() here, this means that we've *already* inserted a > > packet in the pool which extends outside the buffer(s) it's supposed > > to be in. > > That's not necessarily the case: it could be that we just use an offset > that extends beyond the end of the pool, due to bad/missing handling of > some malformed packet/request. > > Maybe the packet is fine, but we're asking for something that's not. No. We've already checked the requested range against the bounds of the packet (and returned a soft error if it's out of bounds). If the range is bad once we reach packet_check_range(), then the packet is bad too. > I'm talking about this one: > > if (start + len + offset > p->buf + p->buf_size) { > > > But we can't have inserted this packet with > > packet_add_do(), because it checked the same things non-fatally. > > Which means either a packet has been inserted bypassing > > packet_add_do() or the pool metadata has been corrupted since then. > > Either of those situations indicates something very, very bad. > > > > > > * I leave the assert()s for packet _index_ out of range > > > > > > Okay, yes, those make sense to me as well. > > > > > > > > > Signed-off-by: David Gibson > > > > > > --- > > > > > > packet.c | 71 ++++++++++++++++++++--------------------------------- > > > > > > packet.h | 3 ++- > > > > > > vu_common.c | 13 +++++++--- > > > > > > 3 files changed, 38 insertions(+), 49 deletions(-) > > > > > > > > > > > > diff --git a/packet.c b/packet.c > > > > > > index c921aa15..24f12448 100644 > > > > > > --- a/packet.c > > > > > > +++ b/packet.c > > > > > > @@ -30,37 +30,27 @@ > > > > > > * @func: For tracing: name of calling function > > > > > > * @line: For tracing: caller line of function call > > > > > > * > > > > > > - * Return: 0 if the range is valid, -1 otherwise > > > > > > + * ASSERT()s if the given range isn't valid (within the expected buffer(s) for > > > > > > + * the pool). > > > > > > */ > > > > > > -static int packet_check_range(const struct pool *p, const char *ptr, size_t len, > > > > > > - const char *func, int line) > > > > > > +static void packet_check_range(const struct pool *p, > > > > > > + const char *ptr, size_t len, > > > > > > + const char *func, int line) > > > > > > { > > > > > > if (p->buf_size == 0) { > > > > > > - int ret; > > > > > > - > > > > > > - ret = vu_packet_check_range((void *)p->buf, ptr, len); > > > > > > - > > > > > > - if (ret == -1) > > > > > > - trace("cannot find region, %s:%i", func, line); > > > > > > - > > > > > > - return ret; > > > > > > - } > > > > > > - > > > > > > - if (ptr < p->buf) { > > > > > > - trace("packet range start %p before buffer start %p, %s:%i", > > > > > > - (void *)ptr, (void *)p->buf, func, line); > > > > > > - return -1; > > > > > > - } > > > > > > - > > > > > > - if (ptr + len > p->buf + p->buf_size) { > > > > > > - trace("packet range end %p after buffer end %p, %s:%i", > > > > > > - (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > > > - func, line); > > > > > > - return -1; > > > > > > + vu_packet_check_range((void *)p->buf, ptr, len, func, line); > > > > > > + return; > > > > > > } > > > > > > > > > > > > - return 0; > > > > > > + ASSERT_WITH_MSG(ptr >= p->buf, > > > > > > + "packet range start %p before buffer start %p, %s:%i", > > > > > > + (void *)ptr, (void *)p->buf, func, line); > > > > > > + ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size, > > > > > > + "packet range end %p after buffer end %p, %s:%i", > > > > > > + (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > > > + func, line); > > > > > > } > > > > > > + > > > > > > /** > > > > > > * packet_add_do() - Add data as packet descriptor to given pool > > > > > > * @p: Existing pool > > > > > > @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > > > > > size_t idx = p->count; > > > > > > > > > > > > if (idx >= p->size) { > > > > > > - trace("add packet index %zu to pool with size %zu, %s:%i", > > > > > > + debug("add packet index %zu to pool with size %zu, %s:%i", > > > > > > idx, p->size, func, line); > > > > > > return; > > > > > > } > > > > > > > > > > > > - if (packet_check_range(p, start, len, func, line)) > > > > > > - return; > > > > > > + packet_check_range(p, start, len, func, line); > > > > > > > > > > > > - if (len > PACKET_MAX_LEN) { > > > > > > - trace("add packet length %zu, %s:%i", len, func, line); > > > > > > - return; > > > > > > - } > > > > > > + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, > > > > > > + "add packet length %zu (max %zu), %s:%i", > > > > > > + len, PACKET_MAX_LEN, func, line); > > > > > > > > > > > > p->pkt[idx].iov_base = (void *)start; > > > > > > p->pkt[idx].iov_len = len; > > > > > > @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, > > > > > > { > > > > > > char *ptr; > > > > > > > > > > > > - if (idx >= p->size || idx >= p->count) { > > > > > > - trace("packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > > > - idx, p->size, p->count, func, line); > > > > > > - return NULL; > > > > > > - } > > > > > > - > > > > > > - if (len > PACKET_MAX_LEN) { > > > > > > - trace("packet data length %zu, %s:%i", len, func, line); > > > > > > - return NULL; > > > > > > - } > > > > > > + ASSERT_WITH_MSG(idx < p->size && idx < p->count, > > > > > > + "packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > > > + idx, p->size, p->count, func, line); > > > > > > + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, > > > > > > + "packet range length %zu (max %zu), %s:%i", > > > > > > + len, PACKET_MAX_LEN, func, line); > > > > > > > > > > > > if (len + offset > p->pkt[idx].iov_len) { > > > > > > trace("data length %zu, offset %zu from length %zu, %s:%i", > > > > > > @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, > > > > > > > > > > > > ptr = (char *)p->pkt[idx].iov_base + offset; > > > > > > > > > > > > - if (packet_check_range(p, ptr, len, func, line)) > > > > > > - return NULL; > > > > > > + packet_check_range(p, ptr, len, func, line); > > > > > > > > > > > > if (left) > > > > > > *left = p->pkt[idx].iov_len - offset - len; > > > > > > diff --git a/packet.h b/packet.h > > > > > > index f95cda08..b164f77e 100644 > > > > > > --- a/packet.h > > > > > > +++ b/packet.h > > > > > > @@ -31,7 +31,8 @@ struct pool { > > > > > > struct iovec pkt[]; > > > > > > }; > > > > > > > > > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len); > > > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > > > + const char *func, int line); > > > > > > void packet_add_do(struct pool *p, size_t len, const char *start, > > > > > > const char *func, int line); > > > > > > void *packet_get_do(const struct pool *p, const size_t idx, > > > > > > diff --git a/vu_common.c b/vu_common.c > > > > > > index 531f8786..528b9b08 100644 > > > > > > --- a/vu_common.c > > > > > > +++ b/vu_common.c > > > > > > @@ -24,10 +24,13 @@ > > > > > > * @buf: Array of the available memory regions > > > > > > * @ptr: Start of desired data range > > > > > > * @size: Length of desired data range > > > > > > + * @func: For tracing: name of calling function > > > > > > + * @line: For tracing: caller line of function call > > > > > > > > > > If those arguments are mandatory, I would drop the "For tracing" > > > > > specification. > > > > > > > > > > > * > > > > > > - * Return: 0 if the zone is in a mapped memory region, -1 otherwise > > > > > > + * Aborts if the zone isn't in any of the device regions > > > > > > */ > > > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > > > + const char *func, int line) > > > > > > { > > > > > > struct vu_dev_region *dev_region; > > > > > > > > > > > > @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > > > > > > > > > > > if (m <= ptr && > > > > > > ptr + len <= m + dev_region->mmap_offset + dev_region->size) > > > > > > - return 0; > > > > > > + return; > > > > > > } > > > > > > > > > > > > - return -1; > > > > > > + abort_with_msg( > > > > > > + "package range at %p, length %zd not within dev region %s:%i", > > > > > > > > > > s/package/packet/ > > > > > > > > > > > + (void *)ptr, len, func, line); > > > > > > } > > > > > > > > > > > > /** > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson