From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top
Date: Thu, 25 Jul 2024 20:23:55 +1000
Subject: Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
References: <20240724215021.3366863-1-sbrivio@redhat.com> <20240724215021.3366863-11-sbrivio@redhat.com> <20240725111456.47c37d6f@elisabeth>
In-Reply-To: <20240725111456.47c37d6f@elisabeth>
List-Id: Development discussion and patches for passt

On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote:
> On Thu, 25 Jul 2024 14:37:43 +1000
> David Gibson wrote:
>
> > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:
> > > This was reported by Matej a while ago, but we forgot to fix it. Even
> > > if the hypervisor is necessarily trusted by passt, as it can in any
> > > case terminate the guest or disrupt guest connectivity, it's a good
> > > idea to be robust against possible issues.
> > >
> > > Instead of resetting the connection to the hypervisor, just discard
> > > the data we read with a single recv(), as we had a few cases where
> > > QEMU would get the length descriptor wrong, in the past.
> > >
> > > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > > length descriptor is logically unsigned and 32-bit wide.
> > >
> > > Reported-by: Matej Hrica
> > > Suggested-by: Matej Hrica
> > > Signed-off-by: Stefano Brivio
> > > ---
> > >  tap.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tap.c b/tap.c
> > > index 44bd444..62ba6a4 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -1011,15 +1011,18 @@ redo:
> > >  	}
> > >
> > >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> > >
> > >  		p += sizeof(uint32_t);
> > >  		n -= sizeof(uint32_t);
> > >
> > > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > > +			return;
> >
> > Neither the condition nor the action makes much sense to me here.
> > We're testing if the frame can fit in the remaining buffer space.
>
> Not really, we're just checking that the length descriptor fits the
> remaining buffer space.
> We're using this in the second recv() below,
> that's why it matters here.

But AFAICT, what we need to know is if the remainder of the frame fits
in the buffer. That could be less than the length descriptor if we've
already recv()ed part of a frame.

> > But we may have already read part (or all) of the frame - i.e. it's
> > included in 'n'. So I don't see how that condition is useful.
>
> ...that is, it has nothing to do with exceeding or not exceeding the
> buffer on recv(), that's already taken care of by the recv() call,
> implicitly.
>
> > Then, simply returning doesn't seem right under pretty much any
> > circumstances - that discards some amount of data, and leaves us in an
> > unsynchronized state w.r.t. the frame boundaries.
>
> That might happen, of course, but it might also happen that the
> hypervisor sent us *one* corrupted buffer, and the next recv() will
> read consistent data.

Well, sure, it's possible, but it doesn't seem particularly likely to
me. AFAICT this is a stream which we need every length field to
interpret properly. If we lose one, or it's corrupted, I think we're
done for.

> > If this is just supposed to be a sanity check on the frame length,
> > then I think we'd be better off with a fixed limit - 64kiB is the
> > obvious choice.
>
> That's already checked below (l2len > ETH_MAX_MTU), and...

Right. I wonder if it would make sense to do that earlier.

> > If we hit that, we can warn() and discard data up to
> > the end of the too-large frame. That at least has a chance of letting
> > us recover and move on to future acceptable frames.
>
> that's exactly what we do in that case (goto next).

Only for the case that the length is too long, but not *too* long. In
particular it needs to fit in the buffer to even get there. If we
sanity checked the frame length earlier we could use MSG_TRUNC to
discard even a ludicrously large frame and still continue on to the
next one.
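
To make the ordering above concrete, here is a rough sketch, not code
from this series. It checks the length descriptor against fixed limits
before any buffer arithmetic, and on failure drains exactly that many
bytes so the stream stays aligned on the next length field. The names
FRAME_MIN, FRAME_MAX, frame_len_ok() and discard_bytes() are made up for
illustration; FRAME_MAX is only an assumed stand-in for ETH_MAX_MTU. It
drains through a scratch buffer rather than MSG_TRUNC, since whether
MSG_TRUNC discards data on the tap socket type isn't verified here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical limits: 14 is the Ethernet header size, 65535 is an
 * assumed stand-in for ETH_MAX_MTU.
 */
#define FRAME_MIN	14
#define FRAME_MAX	65535

/* frame_len_ok() - Sanity check a length descriptor on its own, before
 * it is compared against any buffer position
 * Return: 1 if the length is plausible for a frame, 0 otherwise
 */
int frame_len_ok(uint32_t l2len)
{
	return l2len >= FRAME_MIN && l2len <= FRAME_MAX;
}

/* discard_bytes() - Read and drop exactly len bytes from a stream socket
 * Return: 0 once len bytes were consumed, -1 on error or end of stream
 */
int discard_bytes(int fd, size_t len)
{
	char scratch[4096];

	assert(fd >= 0);

	while (len) {
		size_t chunk = len < sizeof(scratch) ? len : sizeof(scratch);
		ssize_t rc = recv(fd, scratch, chunk, 0);

		if (rc < 0 && errno == EINTR)
			continue;	/* interrupted, just retry */
		if (rc <= 0)
			return -1;	/* error or peer closed */

		len -= (size_t)rc;
	}

	return 0;
}
```

A caller would skip the part of the bad frame already sitting in the
buffer, then call discard_bytes() for whatever is still on the socket,
and carry on with the next length descriptor.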
> > >  		/* At most one packet might not fit in a single read, and this
> > >  		 * needs to be blocking.
> > >  		 */
> > > -		if (l2len > n) {
> > > +		if (l2len > (size_t)n) {
> > >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);
>                                             ^^^^^^^^^^^^^^^^
>
> This is the reason why the check above is relevant.

Relevant, sure, but I still don't think it's right. Actually
(TAP_BUF_BYTES - n) is an even stranger quantity than I initially
thought. It's the total space of the buffer minus the current partial
frame - counting *both* the stuff before our partial frame and after
it.

I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES).

> > >  			if ((n += rem) != l2len)
> > >  				return;
> >
> > Pre-existing, but a 'return' here basically lands us in a situation we
> > have no meaningful chance of recovering from. A die() would be
> > preferable. Better yet would be continuing to re-recv() until we have
> > the whole frame, similar to what we do for write_remainder().
>
> Same as above, it depends on what failure you're assuming. If it's just
> one botched recv(), instead, we recv() again the next time and we
> recover.

Even if it's just one bad recv(), we still have no idea where we are
w.r.t. frame boundaries, so I can't see any way we could recover.

> But yes, the first attempt should probably be to recv() the rest of the
> frame. I didn't implement this because it adds complexity and I think
> that, eventually, we should turn this into a proper ringbuffer anyway.
>
> > > @@ -1028,8 +1031,7 @@ redo:
> > >  		/* Complete the partial read above before discarding a malformed
> > >  		 * frame, otherwise the stream will be inconsistent.
> > >  		 */
> > > -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> > > -		    l2len > (ssize_t)ETH_MAX_MTU)
> > > +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
> > >  			goto next;
> > >  		tap_add_packet(c, l2len, p);
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson