From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id 7A1345A004E for ; Thu, 25 Jul 2024 13:10:23 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721905822; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dTZd0L6C04KvZ/YPMo8wObH//j6YW4Pr4BJUmFH5O+U=; b=f3hB7xKTM1ZPIEXQ4rvNLa5chXLtZ4SXi/dR8yonFEhempIYlLqdgplIIB4IeLKdxl8Fkx RzcKDZsrcXqHZqA8ZO2ZG3e6gPo2Bw3jYub+nSXn+bsC6PjBaacf5R6Qe0hwLQ05L8NY1k NlXpss5/X+r6Mj3RVkfX/MdNvQvXByg= Received: from mail-yb1-f200.google.com (mail-yb1-f200.google.com [209.85.219.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-689-IzFwp3FJO42Hed3VIFc3bw-1; Thu, 25 Jul 2024 07:10:20 -0400 X-MC-Unique: IzFwp3FJO42Hed3VIFc3bw-1 Received: by mail-yb1-f200.google.com with SMTP id 3f1490d57ef6-e0335450936so1584813276.1 for ; Thu, 25 Jul 2024 04:10:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721905820; x=1722510620; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=dTZd0L6C04KvZ/YPMo8wObH//j6YW4Pr4BJUmFH5O+U=; b=etSKDfUv7MoWXUemWYEOBep1k5EGtpWp2GlsyerMYonfMA2/lflsTjMNCICuTUQZZB hew5drL9+EIKqZevpaFlOQm742PYu/vZfDhJNUCSbOaQbKhCXf/S8hrvO7bknOt3X/jZ JdUCF7auaMCdaYfsDU2VRmyXbZwUVIRtSm3yiuPENSXmKSIlYg2BJcaCeeyN5Ttwm91e 1O05pQmHGMFbsZyMirfaQMrU/hsSQc2l+43/PFpH2NEFqcLfThlhz26Yi+ZeAgRyj22R lNbJiYvq/u6QERqqS3ovXkQuPkbCz4SkbCrAdcy8U3tQHqHEty/8hmaI1XyIcQBHpdEh b6xA== X-Gm-Message-State: AOJu0YzJQpj/MAom5Dd8ICUNDxO/LF8RUcx6jdB98nzHDvVTi1qbImow i8PU0jeocoaGmTHcoigwtILMN0BrrUoJocRBysTBoGI0eP3J17Hs8f2QY+FV7Kl0EdCPbxaKccS MfN2PvdsbVfnEU8qUxMRUwZAp6aCTSfdFB4KLtGbaIu9J9qnCsi+SHy58XAV6 X-Received: by 2002:a5b:bd2:0:b0:e06:d61b:2c17 with SMTP id 3f1490d57ef6-e0b231a3590mr2611166276.40.1721905819842; Thu, 25 Jul 2024 04:10:19 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFjFxcEhuYJYVwoqTp3jXtthr2k1L9DTVLMDF/VfBVsZTEERw0mHRoUX7xD7DPEQpT/CsTBWQ== X-Received: by 2002:a5b:bd2:0:b0:e06:d61b:2c17 with SMTP id 3f1490d57ef6-e0b231a3590mr2611142276.40.1721905819365; Thu, 25 Jul 2024 04:10:19 -0700 (PDT) Received: from maya.cloud.tilaa.com (maya.cloud.tilaa.com. [164.138.29.33]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-44fe840ce1esm5153911cf.80.2024.07.25.04.10.18 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 25 Jul 2024 04:10:18 -0700 (PDT) Date: Thu, 25 Jul 2024 13:09:19 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Message-ID: <20240725130908.2ca21e7c@elisabeth> In-Reply-To: References: <20240724215021.3366863-1-sbrivio@redhat.com> <20240724215021.3366863-11-sbrivio@redhat.com> <20240725111456.47c37d6f@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: DPUVAKB5X3IAXZHGAWHV7KIAVM25MJ2T X-Message-ID-Hash: DPUVAKB5X3IAXZHGAWHV7KIAVM25MJ2T X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 25 Jul 2024 20:23:55 +1000 David Gibson wrote: > On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote: > > On Thu, 25 Jul 2024 14:37:43 +1000 > > David Gibson wrote: > > > > > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote: > > > > This was reported by Matej a while ago, but we forgot to fix it. Even > > > > if the hypervisor is necessarily trusted by passt, as it can in any > > > > case terminate the guest or disrupt guest connectivity, it's a good > > > > idea to be robust against possible issues. > > > > > > > > Instead of resetting the connection to the hypervisor, just discard > > > > the data we read with a single recv(), as we had a few cases where > > > > QEMU would get the length descriptor wrong, in the past. > > > > > > > > While at it, change l2len in tap_handler_passt() to uint32_t, as the > > > > length descriptor is logically unsigned and 32-bit wide. > > > > > > > > Reported-by: Matej Hrica > > > > Suggested-by: Matej Hrica > > > > Signed-off-by: Stefano Brivio > > > > --- > > > > tap.c | 10 ++++++---- > > > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/tap.c b/tap.c > > > > index 44bd444..62ba6a4 100644 > > > > --- a/tap.c > > > > +++ b/tap.c > > > > @@ -1011,15 +1011,18 @@ redo: > > > > } > > > > > > > > while (n > (ssize_t)sizeof(uint32_t)) { > > > > - ssize_t l2len = ntohl(*(uint32_t *)p); > > > > + uint32_t l2len = ntohl(*(uint32_t *)p); > > > > > > > > p += sizeof(uint32_t); > > > > n -= sizeof(uint32_t); > > > > > > > > + if (l2len > (ssize_t)TAP_BUF_BYTES - n) > > > > + return; > > > > > > Neither the condition nor the action makes much sense to me here. > > > We're testing if the frame can fit in the the remaining buffer space. > > > > Not really, we're just checking that the length descriptor fits the > > remaining buffer space. We're using this in the second recv() below, > > that's why it matters here. > > But AFAICT, what we need to know is if the remainder of the frame fits > in the buffer. It always does because of how TAP_BUF_BYTES and TAP_BUF_FILL are defined. > That could be less than the length descriptor if we've > already recv()ed part of a frame. > > > > But we may have already read part (or all) of the frame - i.e. it's > > > included in 'n'. So I don't see how that condition is useful. > > > > ...that is, it has nothing to do with exceeding or not exceeding the > > buffer on recv(), that's already taken care of by the recv() call, > > implicitly. > > > > > Then, simply returning doesn't seem right under pretty much any > > > circumstances - that discards some amount of data, and leaves us in an > > > unsynchronized state w.r.t. the frame boundaries. > > > > That might happen, of course, but it might also happen that the > > hypervisor sent us *one* corrupted buffer, and the next recv() will > > read consistent data. > > Well, sure, it's possible, but it doesn't seem particularly likely to > me. AFAICT this is a stream which we need every length field to > interpret properly. If we lose one, or it's corrupted, I think we're > done for. In most cases, the content of one recv() corresponds to a given number of whole frames, so what we're doing by returning here is, practically speaking, similar to what you're suggesting below with MSG_TRUNC. > > > If this is just supposed to be a sanity check on the frame length, > > > then I think we'd be better off with a fixed limit - 64kiB is the > > > obvious choice. > > > > That's already checked below (l2len > ETH_MAX_MTU), and... > > Right. I wonder if it would make sense to do that earlier. We can. But right now what I want to fix here is just that missing check, because it actually causes passt to crash (even though we assume a trusted hypervisor, so it's not really security relevant, but not nice either). > > > If we hit that, we can warn() and discard data up to > > > the end of the too-large frame. That at least has a chance of letting > > > us recover and move on to future acceptable frames. > > > > that's exactly what we do in that case (goto next). > > Only for the case that the length is too long, but not *too* long. In > particular it needs to fit in the buffer to even get there. If we > sanity checked the frame length earlier we could use MSG_TRUNC to > discard even a ludicrously large frame and still continue on to the > next one. We don't know if the length descriptor actually matches the length of the frame, though. If you have a way you think is more robust, would you mind sending a patch? > > > > /* At most one packet might not fit in a single read, and this > > > > * needs to be blocking. > > > > */ > > > > - if (l2len > n) { > > > > + if (l2len > (size_t)n) { > > > > rem = recv(c->fd_tap, p + n, l2len - n, 0); > > ^^^^^^^^^^^^^^^^ > > > > This the reason why the check above is relevant. > > Relevant, sure, but I still don't think it's right. Actually > (TAP_BUF_BYTES - n) is an even stranger quantity than I initially > thought. It's the total space of the buffer minus the current partial > frame - counting *both* the stuff before our partial frame and after > it. That should be enough to make sure we don't have bad side effects on this second recv(), but yes: > I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES). ...that is actually more accurate. I can fix this up in another version, unless you can think of a more comprehensive change that also gives us better possibilities to recover from a corrupted stream. > > > > if ((n += rem) != l2len) > > > > return; > > > > > > Pre-existing, but a 'return' here basically lands us in a situation we > > > have no meaningful chance of recovering from. A die() would be > > > preferable. Better yet would be continuing to re-recv() until we have > > > the whole frame, similar to what we do for write_remainder(). > > > > Same as above, it depends on what failure you're assuming. If it's just > > one botched recv(), instead, we recv() again the next time and we > > recover. > > Even if it's just one bad recv(), we still have no idea where we are > w.r.t. frame boundaries, so I can't see any way we could recover. The idea is that the recv() is _usually_ big enough as to flush the buffer anyway. > > But yes, the first attempt should probably be to recv() the rest of the > > frame. I didn't implement this because it adds complexity and I think > > that, eventually, we should turn this into a proper ringbuffer anyway. > > > > > > @@ -1028,8 +1031,7 @@ redo: > > > > /* Complete the partial read above before discarding a malformed > > > > * frame, otherwise the stream will be inconsistent. > > > > */ > > > > - if (l2len < (ssize_t)sizeof(struct ethhdr) || > > > > - l2len > (ssize_t)ETH_MAX_MTU) > > > > + if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU) > > > > goto next; > > > > tap_add_packet(c, l2len, p); > > > -- Stefano