Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 10/11] tap: Discard guest data on length descriptor mismatch
Date: Thu, 25 Jul 2024 13:09:19 +0200	[thread overview]
Message-ID: <20240725130908.2ca21e7c@elisabeth> (raw)
In-Reply-To: <ZqInu0VUjlLW9T3S@zatzit>

On Thu, 25 Jul 2024 20:23:55 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jul 25, 2024 at 11:15:03AM +0200, Stefano Brivio wrote:
> > On Thu, 25 Jul 2024 14:37:43 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Jul 24, 2024 at 11:50:16PM +0200, Stefano Brivio wrote:  
> > > > This was reported by Matej a while ago, but we forgot to fix it. Even
> > > > if the hypervisor is necessarily trusted by passt, as it can in any
> > > > case terminate the guest or disrupt guest connectivity, it's a good
> > > > idea to be robust against possible issues.
> > > > 
> > > > Instead of resetting the connection to the hypervisor, just discard
> > > > the data we read with a single recv(), as we had a few cases where
> > > > QEMU would get the length descriptor wrong, in the past.
> > > > 
> > > > While at it, change l2len in tap_handler_passt() to uint32_t, as the
> > > > length descriptor is logically unsigned and 32-bit wide.
> > > > 
> > > > Reported-by: Matej Hrica <mhrica@redhat.com>
> > > > Suggested-by: Matej Hrica <mhrica@redhat.com>
> > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > ---
> > > >  tap.c | 10 ++++++----
> > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/tap.c b/tap.c
> > > > index 44bd444..62ba6a4 100644
> > > > --- a/tap.c
> > > > +++ b/tap.c
> > > > @@ -1011,15 +1011,18 @@ redo:
> > > >  	}
> > > >  
> > > >  	while (n > (ssize_t)sizeof(uint32_t)) {
> > > > -		ssize_t l2len = ntohl(*(uint32_t *)p);
> > > > +		uint32_t l2len = ntohl(*(uint32_t *)p);
> > > >  
> > > >  		p += sizeof(uint32_t);
> > > >  		n -= sizeof(uint32_t);
> > > >  
> > > > +		if (l2len > (ssize_t)TAP_BUF_BYTES - n)
> > > > +			return;    
> > > 
> > > Neither the condition nor the action makes much sense to me here.
> > > We're testing if the frame can fit in the the remaining buffer space.  
> > 
> > Not really, we're just checking that the length descriptor fits the
> > remaining buffer space. We're using this in the second recv() below,
> > that's why it matters here.  
> 
> But AFAICT, what we need to know is if the remainder of the frame fits
> in the buffer.

It always does because of how TAP_BUF_BYTES and TAP_BUF_FILL are
defined.

> That could be less than the length descriptor if we've
> already recv()ed part of a frame.
> 
> > > But we may have already read part (or all) of the frame - i.e. it's
> > > included in 'n'.  So I don't see how that condition is useful.  
> > 
> > ...that is, it has nothing to do with exceeding or not exceeding the
> > buffer on recv(), that's already taken care of by the recv() call,
> > implicitly.
> >   
> > > Then, simply returning doesn't seem right under pretty much any
> > > circumstances - that discards some amount of data, and leaves us in an
> > > unsynchronized state w.r.t. the frame boundaries.  
> > 
> > That might happen, of course, but it might also happen that the
> > hypervisor sent us *one* corrupted buffer, and the next recv() will
> > read consistent data.  
> 
> Well, sure, it's possible, but it doesn't seem particularly likely to
> me.  AFAICT this is a stream which we need every length field to
> interpret properly.  If we lose one, or it's corrupted, I think we're
> done for.

In most cases, the content of one recv() corresponds to a given number
of whole frames, so what we're doing by returning here is, practically
speaking, similar to what you're suggesting below with MSG_TRUNC.

> > > If this is just supposed to be a sanity check on the frame length,
> > > then I think we'd be better off with a fixed limit - 64kiB is the
> > > obvious choice.  
> > 
> > That's already checked below (l2len > ETH_MAX_MTU), and...  
> 
> Right.  I wonder if it would make sense to do that earlier.

We can. But right now what I want to fix here is just that missing
check, because it actually causes passt to crash (even though we assume
a trusted hypervisor, so it's not really security relevant, but not nice
either).

> > > If we hit that, we can warn() and discard data up to
> > > the end of the too-large frame.  That at least has a chance of letting
> > > us recover and move on to future acceptable frames.  
> > 
> > that's exactly what we do in that case (goto next).  
> 
> Only for the case that the length is too long, but not *too* long.  In
> particular it needs to fit in the buffer to even get there.  If we
> sanity checked the frame length earlier we could use MSG_TRUNC to
> discard even a ludicrously large frame and still continue on to the
> next one.

We don't know if the length descriptor actually matches the length of
the frame, though. If you have a way you think is more robust, would
you mind sending a patch?

> > > >  		/* At most one packet might not fit in a single read, and this
> > > >  		 * needs to be blocking.
> > > >  		 */
> > > > -		if (l2len > n) {
> > > > +		if (l2len > (size_t)n) {
> > > >  			rem = recv(c->fd_tap, p + n, l2len - n, 0);  
> >                                               ^^^^^^^^^^^^^^^^
> > 
> > This the reason why the check above is relevant.  
> 
> Relevant, sure, but I still don't think it's right.  Actually
> (TAP_BUF_BYTES - n) is an even stranger quantity than I initially
> thought.  It's the total space of the buffer minus the current partial
> frame - counting *both* the stuff before our partial frame and after
> it.

That should be enough to make sure we don't have bad side effects on
this second recv(), but yes:

> I think instead we need to check for (p + l2len > pkt_buf + TAP_BUF_BYTES).

...that is actually more accurate. I can fix this up in another
version, unless you can think of a more comprehensive change that also
gives us better possibilities to recover from a corrupted stream.

> > > >  			if ((n += rem) != l2len)
> > > >  				return;    
> > > 
> > > Pre-existing, but a 'return' here basically lands us in a situation we
> > > have no meaningful chance of recovering from.  A die() would be
> > > preferable.  Better yet would be continuing to re-recv() until we have
> > > the whole frame, similar to what we do for write_remainder().  
> > 
> > Same as above, it depends on what failure you're assuming. If it's just
> > one botched recv(), instead, we recv() again the next time and we
> > recover.  
> 
> Even if it's just one bad recv(), we still have no idea where we are
> w.r.t. frame boundaries, so I can't see any way we could recover.

The idea is that the recv() is _usually_ big enough as to flush the
buffer anyway.

> > But yes, the first attempt should probably be to recv() the rest of the
> > frame. I didn't implement this because it adds complexity and I think
> > that, eventually, we should turn this into a proper ringbuffer anyway.
> >   
> > > > @@ -1028,8 +1031,7 @@ redo:
> > > >  		/* Complete the partial read above before discarding a malformed
> > > >  		 * frame, otherwise the stream will be inconsistent.
> > > >  		 */
> > > > -		if (l2len < (ssize_t)sizeof(struct ethhdr) ||
> > > > -		    l2len > (ssize_t)ETH_MAX_MTU)
> > > > +		if (l2len < sizeof(struct ethhdr) || l2len > ETH_MAX_MTU)
> > > >  			goto next;
> > > >  		tap_add_packet(c, l2len, p);    
> >   
> 

-- 
Stefano

next prev parent reply	other threads:[~2024-07-25 11:10 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-24 21:50 [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio
2024-07-24 21:50 ` [PATCH 01/11] tap: Don't quit if pasta gets EIO on writev() to tap, interface might be down Stefano Brivio
2024-07-25  3:21   ` David Gibson
2024-07-24 21:50 ` [PATCH 02/11] tcp: Change SO_PEEK_OFF support message to debug() Stefano Brivio
2024-07-25  3:22   ` David Gibson
2024-07-24 21:50 ` [PATCH 03/11] log: Drop newlines in the middle of the perror()-like messages Stefano Brivio
2024-07-25  3:26   ` David Gibson
2024-07-25 11:27     ` Stefano Brivio
2024-07-26  0:33       ` David Gibson
2024-07-24 21:50 ` [PATCH 04/11] log: Fix sub-second part in relative log time calculation Stefano Brivio
2024-07-25  3:32   ` David Gibson
2024-07-25  7:51     ` Stefano Brivio
2024-07-24 21:50 ` [PATCH 05/11] log: Initialise timestamp for relative log time also if we use a log file Stefano Brivio
2024-07-25  3:35   ` David Gibson
2024-07-25  7:51     ` Stefano Brivio
2024-07-24 21:50 ` [PATCH 06/11] test: Fix memory/passt tests, --netns-only is not a valid option for passt Stefano Brivio
2024-07-25  3:50   ` David Gibson
2024-07-24 21:50 ` [PATCH 07/11] test: Update names of symbols and slabinfo entries Stefano Brivio
2024-07-25  3:54   ` David Gibson
2024-07-24 21:50 ` [PATCH 08/11] test: iperf3 3.16 introduces multiple threads, drop our own implementation of that Stefano Brivio
2024-07-25  4:23   ` David Gibson
2024-07-24 21:50 ` [PATCH 09/11] tap: Exit if we fail to bind a UNIX domain socket with explicit path Stefano Brivio
2024-07-25  4:00   ` David Gibson
2024-07-24 21:50 ` [PATCH 10/11] tap: Discard guest data on length descriptor mismatch Stefano Brivio
2024-07-25  4:37   ` David Gibson
2024-07-25  9:15     ` Stefano Brivio
2024-07-25 10:23       ` David Gibson
2024-07-25 11:09         ` Stefano Brivio [this message]
2024-07-26  1:22           ` David Gibson
2024-07-24 21:50 ` [PATCH 11/11] conf: Accept addresses enclosed by square brackets in port forwarding specifiers Stefano Brivio
2024-07-25  4:39   ` David Gibson
2024-07-25 11:26     ` Stefano Brivio
2024-07-25 11:28 ` [PATCH 00/11] Minor assorted fixes, mostly logging and tests Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240725130908.2ca21e7c@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).