public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Try harder to avoid inconsistent qemu packet stream
@ 2022-11-08  8:54 Stefano Brivio
  2022-11-08  8:54 ` [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls Stefano Brivio
  2022-11-08  8:54 ` [PATCH 2/2] tap: Return -EIO from tap_handler_passt() on inconsistent packet stream Stefano Brivio
  0 siblings, 2 replies; 6+ messages in thread
From: Stefano Brivio @ 2022-11-08  8:54 UTC (permalink / raw)
  To: passt-dev

...and handle the situation more reasonably if that happens.

Stefano Brivio (2):
  tap: Keep stream consistent if qemu length descriptor spans two recv()
    calls
  tap: Return -EIO from tap_handler_passt() on inconsistent packet
    stream

 tap.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls
  2022-11-08  8:54 [PATCH 0/2] Try harder to avoid inconsistent qemu packet stream Stefano Brivio
@ 2022-11-08  8:54 ` Stefano Brivio
  2022-11-09 10:15   ` David Gibson
  2022-11-08  8:54 ` [PATCH 2/2] tap: Return -EIO from tap_handler_passt() on inconsistent packet stream Stefano Brivio
  1 sibling, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2022-11-08  8:54 UTC (permalink / raw)
  To: passt-dev

I got all paranoid after triggering a divide-by-zero general
protection fault in passt with a qemu version without the virtio_net
TX hang fix, while flooding UDP. I start thinking this was actually
coming from some random changes I was playing with, but before
reaching this conclusion I reviewed once more the relatively short
path in tap_handler_passt() before we start using packet_*()
functions, and found this.

Never observed in practice, but artificially reproduced with changes
in qemu's socket interface: if we don't receive from qemu a complete
length descriptor in one recv() call, or if we receive a partial one
at the end of one call, we currently disregard the rest, which would
make the stream inconsistent.

Nothing really bad happens, except that from that point on we would
disregard all the packets we get until, if ever, we get the stream
back in sync by chance.

Force reading a complete packet length descriptor with a blocking
recv(), if needed -- not just a complete packet later.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tap.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tap.c b/tap.c
index f8314ef..11ac732 100644
--- a/tap.c
+++ b/tap.c
@@ -747,14 +747,26 @@ redo:
 		return -ECONNRESET;
 	}
 
-	while (n > (ssize_t)sizeof(uint32_t)) {
-		ssize_t len = ntohl(*(uint32_t *)p);
+	while (n > 0) {
+		ssize_t len;
+
+		/* Force receiving at least a complete length descriptor to
+		 * avoid an inconsistent stream.
+		 */
+		if (n < (ssize_t)sizeof(uint32_t)) {
+			rem = recv(c->fd_tap, p + n,
+				   (ssize_t)sizeof(uint32_t) - n, 0);
+			if ((n += rem) != (ssize_t)sizeof(uint32_t))
+				return 0;
+		}
+
+		len = ntohl(*(uint32_t *)p);
 
 		p += sizeof(uint32_t);
 		n -= sizeof(uint32_t);
 
 		/* At most one packet might not fit in a single read, and this
-		 * needs to be blocking.
+		 * also needs to be blocking.
 		 */
 		if (len > n) {
 			rem = recv(c->fd_tap, p + n, len - n, 0);
-- 
@@ -747,14 +747,26 @@ redo:
 		return -ECONNRESET;
 	}
 
-	while (n > (ssize_t)sizeof(uint32_t)) {
-		ssize_t len = ntohl(*(uint32_t *)p);
+	while (n > 0) {
+		ssize_t len;
+
+		/* Force receiving at least a complete length descriptor to
+		 * avoid an inconsistent stream.
+		 */
+		if (n < (ssize_t)sizeof(uint32_t)) {
+			rem = recv(c->fd_tap, p + n,
+				   (ssize_t)sizeof(uint32_t) - n, 0);
+			if ((n += rem) != (ssize_t)sizeof(uint32_t))
+				return 0;
+		}
+
+		len = ntohl(*(uint32_t *)p);
 
 		p += sizeof(uint32_t);
 		n -= sizeof(uint32_t);
 
 		/* At most one packet might not fit in a single read, and this
-		 * needs to be blocking.
+		 * also needs to be blocking.
 		 */
 		if (len > n) {
 			rem = recv(c->fd_tap, p + n, len - n, 0);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] tap: Return -EIO from tap_handler_passt() on inconsistent packet stream
  2022-11-08  8:54 [PATCH 0/2] Try harder to avoid inconsistent qemu packet stream Stefano Brivio
  2022-11-08  8:54 ` [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls Stefano Brivio
@ 2022-11-08  8:54 ` Stefano Brivio
  1 sibling, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2022-11-08  8:54 UTC (permalink / raw)
  To: passt-dev

While it's important to fail in that case, it makes little sense to
fail quietly: it's better to tell qemu explicitly that something went
wrong and that we won't recover, by closing the socket.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 tap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tap.c b/tap.c
index 11ac732..abeff25 100644
--- a/tap.c
+++ b/tap.c
@@ -757,7 +757,7 @@ redo:
 			rem = recv(c->fd_tap, p + n,
 				   (ssize_t)sizeof(uint32_t) - n, 0);
 			if ((n += rem) != (ssize_t)sizeof(uint32_t))
-				return 0;
+				return -EIO;
 		}
 
 		len = ntohl(*(uint32_t *)p);
@@ -771,7 +771,7 @@ redo:
 		if (len > n) {
 			rem = recv(c->fd_tap, p + n, len - n, 0);
 			if ((n += rem) != len)
-				return 0;
+				return -EIO;
 		}
 
 		/* Complete the partial read above before discarding a malformed
-- 
@@ -757,7 +757,7 @@ redo:
 			rem = recv(c->fd_tap, p + n,
 				   (ssize_t)sizeof(uint32_t) - n, 0);
 			if ((n += rem) != (ssize_t)sizeof(uint32_t))
-				return 0;
+				return -EIO;
 		}
 
 		len = ntohl(*(uint32_t *)p);
@@ -771,7 +771,7 @@ redo:
 		if (len > n) {
 			rem = recv(c->fd_tap, p + n, len - n, 0);
 			if ((n += rem) != len)
-				return 0;
+				return -EIO;
 		}
 
 		/* Complete the partial read above before discarding a malformed
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls
  2022-11-08  8:54 ` [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls Stefano Brivio
@ 2022-11-09 10:15   ` David Gibson
  2022-11-09 10:29     ` Stefano Brivio
  0 siblings, 1 reply; 6+ messages in thread
From: David Gibson @ 2022-11-09 10:15 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3176 bytes --]

On Tue, Nov 08, 2022 at 09:54:18AM +0100, Stefano Brivio wrote:
> I got all paranoid after triggering a divide-by-zero general
> protection fault in passt with a qemu version without the virtio_net
> TX hang fix, while flooding UDP. I start thinking this was actually
> coming from some random changes I was playing with, but before
> reaching this conclusion I reviewed once more the relatively short
> path in tap_handler_passt() before we start using packet_*()
> functions, and found this.
> 
> Never observed in practice, but artificially reproduced with changes
> in qemu's socket interface: if we don't receive from qemu a complete
> length descriptor in one recv() call, or if we receive a partial one
> at the end of one call, we currently disregard the rest, which would
> make the stream inconsistent.
> 
> Nothing really bad happens, except that from that point on we would
> disregard all the packets we get until, if ever, we get the stream
> back in sync by chance.
> 
> Force reading a complete packet length descriptor with a blocking
> recv(), if needed -- not just a complete packet later.
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

This seems an ok short term fix, but I think we want another approach
in the slightly longer term.  Read on..

> ---
>  tap.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/tap.c b/tap.c
> index f8314ef..11ac732 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -747,14 +747,26 @@ redo:
>  		return -ECONNRESET;
>  	}
>  
> -	while (n > (ssize_t)sizeof(uint32_t)) {
> -		ssize_t len = ntohl(*(uint32_t *)p);
> +	while (n > 0) {
> +		ssize_t len;
> +
> +		/* Force receiving at least a complete length descriptor to
> +		 * avoid an inconsistent stream.
> +		 */

Is it actually enough for this to be blocking?  AFAICT, recv() on a
stream socket, like read(), can return less data than you requested.

> +		if (n < (ssize_t)sizeof(uint32_t)) {
> +			rem = recv(c->fd_tap, p + n,
> +				   (ssize_t)sizeof(uint32_t) - n, 0);
> +			if ((n += rem) != (ssize_t)sizeof(uint32_t))
> +				return 0;
> +		}
> +
> +		len = ntohl(*(uint32_t *)p);
>  
>  		p += sizeof(uint32_t);
>  		n -= sizeof(uint32_t);
>  
>  		/* At most one packet might not fit in a single read, and this
> -		 * needs to be blocking.
> +		 * also needs to be blocking.

Same issue here (obviously not introduced by this patch, though).

>  		 */
>  		if (len > n) {
>  			rem = recv(c->fd_tap, p + n, len - n, 0);

Can we handle both these cases more neatly (and without blocking
recv()) calls, if we maintain two pointers into pkt_buf.  The first
one tracks how much we've read from the qemu socket, the second tracks
how much has been parsed into packets.  When we get an epoll
notification on the qemu socket, we recv() and advance the first
pointer.  Then we discern as many full packets as we can, advancing
the second pointer.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls
  2022-11-09 10:15   ` David Gibson
@ 2022-11-09 10:29     ` Stefano Brivio
  2022-11-10 12:59       ` Stefano Brivio
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Brivio @ 2022-11-09 10:29 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 9 Nov 2022 21:15:48 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Nov 08, 2022 at 09:54:18AM +0100, Stefano Brivio wrote:
> > I got all paranoid after triggering a divide-by-zero general
> > protection fault in passt with a qemu version without the virtio_net
> > TX hang fix, while flooding UDP. I start thinking this was actually
> > coming from some random changes I was playing with, but before
> > reaching this conclusion I reviewed once more the relatively short
> > path in tap_handler_passt() before we start using packet_*()
> > functions, and found this.
> > 
> > Never observed in practice, but artificially reproduced with changes
> > in qemu's socket interface: if we don't receive from qemu a complete
> > length descriptor in one recv() call, or if we receive a partial one
> > at the end of one call, we currently disregard the rest, which would
> > make the stream inconsistent.
> > 
> > Nothing really bad happens, except that from that point on we would
> > disregard all the packets we get until, if ever, we get the stream
> > back in sync by chance.
> > 
> > Force reading a complete packet length descriptor with a blocking
> > recv(), if needed -- not just a complete packet later.
> > 
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>  
> 
> This seems an ok short term fix, but I think we want another approach
> in the slightly longer term.  Read on..
> 
> > ---
> >  tap.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tap.c b/tap.c
> > index f8314ef..11ac732 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -747,14 +747,26 @@ redo:
> >  		return -ECONNRESET;
> >  	}
> >  
> > -	while (n > (ssize_t)sizeof(uint32_t)) {
> > -		ssize_t len = ntohl(*(uint32_t *)p);
> > +	while (n > 0) {
> > +		ssize_t len;
> > +
> > +		/* Force receiving at least a complete length descriptor to
> > +		 * avoid an inconsistent stream.
> > +		 */  
> 
> Is it actually enough for this to be blocking?  AFAICT, recv() on a
> stream socket, like read(), can return less data than you requested.

It's not enough, hence the check on 'rem' afterwards, and this doesn't
cover anyway the case were qemu would decide to send one byte at a
time (because as you pointed out blocking doesn't mean we'll get the
full amount requested), which never happens in practice, though.

> > +		if (n < (ssize_t)sizeof(uint32_t)) {
> > +			rem = recv(c->fd_tap, p + n,
> > +				   (ssize_t)sizeof(uint32_t) - n, 0);
> > +			if ((n += rem) != (ssize_t)sizeof(uint32_t))
> > +				return 0;
> > +		}
> > +
> > +		len = ntohl(*(uint32_t *)p);
> >  
> >  		p += sizeof(uint32_t);
> >  		n -= sizeof(uint32_t);
> >  
> >  		/* At most one packet might not fit in a single read, and this
> > -		 * needs to be blocking.
> > +		 * also needs to be blocking.  
> 
> Same issue here (obviously not introduced by this patch, though).

Same here.

> >  		 */
> >  		if (len > n) {
> >  			rem = recv(c->fd_tap, p + n, len - n, 0);  
> 
> Can we handle both these cases more neatly (and without blocking
> recv()) calls, if we maintain two pointers into pkt_buf.  The first
> one tracks how much we've read from the qemu socket, the second tracks
> how much has been parsed into packets.  When we get an epoll
> notification on the qemu socket, we recv() and advance the first
> pointer.  Then we discern as many full packets as we can, advancing
> the second pointer.

Yes, and I actually drafted something like that, but it takes a lot of
attention and time to get it right, so I preferred to keep it simple
until now. I can file a ticket as enhancement.

Also, given that a subsequent recv() would operate on the "next"
pointer, it will have less space available than the first one. Ideally
this should be a ringbuffer (using scatter-gather IO).

-- 
Stefano


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls
  2022-11-09 10:29     ` Stefano Brivio
@ 2022-11-10 12:59       ` Stefano Brivio
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Brivio @ 2022-11-10 12:59 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Wed, 9 Nov 2022 11:29:29 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 9 Nov 2022 21:15:48 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Nov 08, 2022 at 09:54:18AM +0100, Stefano Brivio wrote:  
> > > I got all paranoid after triggering a divide-by-zero general
> > > protection fault in passt with a qemu version without the virtio_net
> > > TX hang fix, while flooding UDP. I start thinking this was actually
> > > coming from some random changes I was playing with, but before
> > > reaching this conclusion I reviewed once more the relatively short
> > > path in tap_handler_passt() before we start using packet_*()
> > > functions, and found this.
> > > 
> > > Never observed in practice, but artificially reproduced with changes
> > > in qemu's socket interface: if we don't receive from qemu a complete
> > > length descriptor in one recv() call, or if we receive a partial one
> > > at the end of one call, we currently disregard the rest, which would
> > > make the stream inconsistent.
> > > 
> > > Nothing really bad happens, except that from that point on we would
> > > disregard all the packets we get until, if ever, we get the stream
> > > back in sync by chance.
> > > 
> > > Force reading a complete packet length descriptor with a blocking
> > > recv(), if needed -- not just a complete packet later.
> > > 
> > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>    
> > 
> > This seems an ok short term fix, but I think we want another approach
> > in the slightly longer term.  Read on..
> >   
> > > ---
> > >  tap.c | 18 +++++++++++++++---
> > >  1 file changed, 15 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/tap.c b/tap.c
> > > index f8314ef..11ac732 100644
> > > --- a/tap.c
> > > +++ b/tap.c
> > > @@ -747,14 +747,26 @@ redo:
> > >  		return -ECONNRESET;
> > >  	}
> > >  
> > > -	while (n > (ssize_t)sizeof(uint32_t)) {
> > > -		ssize_t len = ntohl(*(uint32_t *)p);
> > > +	while (n > 0) {
> > > +		ssize_t len;
> > > +
> > > +		/* Force receiving at least a complete length descriptor to
> > > +		 * avoid an inconsistent stream.
> > > +		 */    
> > 
> > Is it actually enough for this to be blocking?  AFAICT, recv() on a
> > stream socket, like read(), can return less data than you requested.  
> 
> It's not enough, hence the check on 'rem' afterwards, and this doesn't
> cover anyway the case were qemu would decide to send one byte at a
> time (because as you pointed out blocking doesn't mean we'll get the
> full amount requested), which never happens in practice, though.
> 
> > > +		if (n < (ssize_t)sizeof(uint32_t)) {
> > > +			rem = recv(c->fd_tap, p + n,
> > > +				   (ssize_t)sizeof(uint32_t) - n, 0);
> > > +			if ((n += rem) != (ssize_t)sizeof(uint32_t))
> > > +				return 0;
> > > +		}
> > > +
> > > +		len = ntohl(*(uint32_t *)p);
> > >  
> > >  		p += sizeof(uint32_t);
> > >  		n -= sizeof(uint32_t);
> > >  
> > >  		/* At most one packet might not fit in a single read, and this
> > > -		 * needs to be blocking.
> > > +		 * also needs to be blocking.    
> > 
> > Same issue here (obviously not introduced by this patch, though).  
> 
> Same here.
> 
> > >  		 */
> > >  		if (len > n) {
> > >  			rem = recv(c->fd_tap, p + n, len - n, 0);    
> > 
> > Can we handle both these cases more neatly (and without blocking
> > recv()) calls, if we maintain two pointers into pkt_buf.  The first
> > one tracks how much we've read from the qemu socket, the second tracks
> > how much has been parsed into packets.  When we get an epoll
> > notification on the qemu socket, we recv() and advance the first
> > pointer.  Then we discern as many full packets as we can, advancing
> > the second pointer.  
> 
> Yes, and I actually drafted something like that, but it takes a lot of
> attention and time to get it right, so I preferred to keep it simple
> until now. I can file a ticket as enhancement.

https://bugs.passt.top/show_bug.cgi?id=38

-- 
Stefano


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-10 12:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08  8:54 [PATCH 0/2] Try harder to avoid inconsistent qemu packet stream Stefano Brivio
2022-11-08  8:54 ` [PATCH 1/2] tap: Keep stream consistent if qemu length descriptor spans two recv() calls Stefano Brivio
2022-11-09 10:15   ` David Gibson
2022-11-09 10:29     ` Stefano Brivio
2022-11-10 12:59       ` Stefano Brivio
2022-11-08  8:54 ` [PATCH 2/2] tap: Return -EIO from tap_handler_passt() on inconsistent packet stream Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).