public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* migrate/bidirectional debugging
@ 2025-02-07  6:26 David Gibson
  2025-02-07  6:51 ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: David Gibson @ 2025-02-07  6:26 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev


[-- Attachment #1.1: Type: text/plain, Size: 4817 bytes --]

I've spent today trying to debug this failure.  I've gathered a bunch
of information, but no breakthroughs, alas.  At this point I suspect a
kernel bug, though I hope I'm wrong.

# Background.

I think these are as you described it on your system:

 * Most (but not every) time I run migrate/bidirectional it fails,
   with the "outbound" stream only getting the before migration piece
 * I can't reproduce if I put strace on the guest 2 passt.

Possibly unlike you:
 * I'm able to use TRACE=1, and the problem still reproduces
 * I can put strace on the outer pasta and the problem still
   reproduces

The specific anomolies I was focused on were:
 * The passt_2 pcap shows "and from guest 2" coming _inbound_ a bit
   after it (correctly) went outbound
 * The pasta_1 pcap doesn't seem to show "and from guest 2" in either
   direction

# Observations

 * I added a hack (see other series) that let me log comments to the
   pcap file as ethertype 0xffff, this was so I could have debugging
   messages in order with the the captured packets.
 * I used that to bin down exactly where the bogus output "and from
   guest 2" was being recorded, and it's in tcp_vu_data_from_sock()
 * I traced back from there, and passt_2 really does seem to be
   getting "and from guest 2" from a recvmsg() on the socket.  I see
   from my pcap comments that we're getting 17 bytes from recvmsg()
   right before capturing the inbound packet, at any rate.
 * As noted, I couldn't reproduce with an strace on passt_2, so I
   couldn't confirm that piece that way

It kind of seemed like we were sendmsg()ing "and from guest 2" and it
was bouncing straight back to our socket, instead of being delivered
to the outer pasta.

 * I tried putting a dumpcap on 'lo' in the pasta namespace, thinking
   I might see this weird passt->passt packet.  But, nothing.  There
   are thousands of packets of the qemu migration stream, and
   absolutely nothing else.
 * I also tried dumpcap on the external interface in the pasta
   namspace, and I didn't see anything different from what pasta
   captured (although I didn't check super carefully).  In particular
   I didn't seem to see "and from guest 2" in either direction there
   either

 * Since I couldn't strace() passt_2, I instead tried logging
   TCP sendmsg() and recvmsg() calls of length 17 using systemtap
   (script attached).  At this point it gets even weirder:

On a working run (achieved by adding the strace), I get this:

BEGIN
tcp sendmsg(-129530279294592) len=17  - ./passt -s /tmp/passt-tests-niICXS/migrate/passt_2.socket -P /tmp/passt-tests-niICXS/migrate/passt_2.pid -f --vhost-user -p /home/dwg/src/passt/test/test_logs/passt_2.pcap --trace -t 10004 -u 10004
tcp sendmsg(-129489810388608) len=17  - ./pasta -p /home/dwg/src/passt/test/test_logs/pasta_1.pcap --trace --trace -l /tmp/pasta1.log -P /tmp/passt-tests-niICXS/migrate/pasta_1.pid -t 10001,10002,10004 -T 10003 -u 10001,10002,10004 -U 10003 --map-guest-addr 169.254.1.1 --config-net /home/dwg/src/passt/test/nstool hold /tmp/passt-tests-niICXS/migrate/ns1.hold
END

This mostly makes sense.  passt_2 sends the expected outbound packet
to the namespace, then pasta_1 forwards it on to the host.  I don't
know why I'm not seeing the recvmsg() from the socat server, though.

In the failing case, though, I get this:

BEGIN
tcp sendmsg(-129471392995840) len=17  - ./passt -s /tmp/passt-tests-CV71zo/migrate/passt_2.socket -P /tmp/passt-tests-CV71zo/migrate/passt_2.pid -f --vhost-user -p /home/dwg/src/passt/test/test_logs/passt_2.pcap --trace -t 10004 -u 10004
tcp recvmsg(-129476447043584) len=17  - ./pasta -p /home/dwg/src/passt/test/test_logs/pasta_1.pcap --trace --trace -l /tmp/pasta1.log -P /tmp/passt-tests-CV71zo/migrate/pasta_1.pid -t 10001,10002,10004 -T 10003 -u 10001,10002,10004 -U 10003 --map-guest-addr 169.254.1.1 --config-net /home/dwg/src/passt/test/nstool hold /tmp/passt-tests-CV71zo/migrate/ns1.hold
END

First event seems the same: passt_2 sending the outbound packet, as
expected.  The second, though, is weird: the outer pasta seems to
receive the data from a socket, not from tap as we'd expect.  That
might explain the other symptoms, if pasta received it on its socket,
it would send inwards.

But... I don't see pasta sending that "and from guest 2" inbound in
its packet capture.  And, weirder still, although I see that recvmsg()
with systemtap, I don't see it in an strace of pasta.

...and.. that's where I'm at.  Attaching my systemtap script and a
ball of logs.  Hoping they're helpful :/.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #1.2: sockloopback.stp --]
[-- Type: application/p21, Size: 482 bytes --]

[-- Attachment #1.3: weird.tar.xz --]
[-- Type: application/x-xz, Size: 26804 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: migrate/bidirectional debugging
  2025-02-07  6:26 migrate/bidirectional debugging David Gibson
@ 2025-02-07  6:51 ` Stefano Brivio
  2025-02-07 23:44   ` David Gibson
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Brivio @ 2025-02-07  6:51 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri, 7 Feb 2025 17:26:35 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> It kind of seemed like we were sendmsg()ing "and from guest 2" and it
> was bouncing straight back to our socket, instead of being delivered
> to the outer pasta.

Oops.

diff --git a/tcp.c b/tcp.c
index 0f05011..b7f5169 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2796,6 +2796,12 @@ static int tcp_flow_repair_queues(int s,
 		debug("Read socket %i receive queue: %zi bytes", s, rc);
 	}
 
+	v = TCP_NO_QUEUE;
+	if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v))) {
+		err_perror("Setting TCP_NO_QUEUE on socket %i", s);
+		return -errno;
+	}
+
 	return 0;
 }
 

otherwise, I guess, there's a time window in which we might be writing
that message to our queue instead of writing it on the socket, even
with repair mode off. The write might be pending or something.

With this, the stray packet is gone. I spotted this case now:

$ tshark -r test/test_logs/passt_2.pcap 
    1   0.000000 88.198.0.164 → 169.254.1.1  TCP 71 58150 → 10006 [PSH, ACK] Seq=1 Ack=1 Win=1024 Len=17
    2   0.000031 88.198.0.164 → 169.254.1.1  TCP 54 58150 → 10006 [FIN, ACK] Seq=18 Ack=1 Win=1024 Len=0
    3   0.000059  169.254.1.1 → 88.198.0.164 TCP 54 10006 → 58150 [RST, ACK] Seq=1 Ack=1 Win=256 Len=0
    4   0.026443  169.254.1.1 → 88.198.0.164 TCP 74 48150 → 10001 [PSH, ACK] Seq=1 Ack=1 Win=256 Len=20
    5   0.026538 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [ACK] Seq=1 Ack=21 Win=1023 Len=0
    6   0.026557  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [FIN, ACK] Seq=21 Ack=1 Win=256 Len=0
    7   0.026656 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [FIN, ACK] Seq=1 Ack=22 Win=1024 Len=0
    8   0.026675  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [ACK] Seq=22 Ack=2 Win=256 Len=0
    9 318.959707      fe80::1 → ff02::1      ICMPv6 158 Router Advertisement from 9a:55:9a:55:9a:55

That RST we send as frame #3 looks unwarranted. I wonder if we get
packets from the target guest before we have a chance to set up flows.

-- 
@@ -2796,6 +2796,12 @@ static int tcp_flow_repair_queues(int s,
 		debug("Read socket %i receive queue: %zi bytes", s, rc);
 	}
 
+	v = TCP_NO_QUEUE;
+	if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v))) {
+		err_perror("Setting TCP_NO_QUEUE on socket %i", s);
+		return -errno;
+	}
+
 	return 0;
 }
 

otherwise, I guess, there's a time window in which we might be writing
that message to our queue instead of writing it on the socket, even
with repair mode off. The write might be pending or something.

With this, the stray packet is gone. I spotted this case now:

$ tshark -r test/test_logs/passt_2.pcap 
    1   0.000000 88.198.0.164 → 169.254.1.1  TCP 71 58150 → 10006 [PSH, ACK] Seq=1 Ack=1 Win=1024 Len=17
    2   0.000031 88.198.0.164 → 169.254.1.1  TCP 54 58150 → 10006 [FIN, ACK] Seq=18 Ack=1 Win=1024 Len=0
    3   0.000059  169.254.1.1 → 88.198.0.164 TCP 54 10006 → 58150 [RST, ACK] Seq=1 Ack=1 Win=256 Len=0
    4   0.026443  169.254.1.1 → 88.198.0.164 TCP 74 48150 → 10001 [PSH, ACK] Seq=1 Ack=1 Win=256 Len=20
    5   0.026538 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [ACK] Seq=1 Ack=21 Win=1023 Len=0
    6   0.026557  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [FIN, ACK] Seq=21 Ack=1 Win=256 Len=0
    7   0.026656 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [FIN, ACK] Seq=1 Ack=22 Win=1024 Len=0
    8   0.026675  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [ACK] Seq=22 Ack=2 Win=256 Len=0
    9 318.959707      fe80::1 → ff02::1      ICMPv6 158 Router Advertisement from 9a:55:9a:55:9a:55

That RST we send as frame #3 looks unwarranted. I wonder if we get
packets from the target guest before we have a chance to set up flows.

-- 
Stefano


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: migrate/bidirectional debugging
  2025-02-07  6:51 ` Stefano Brivio
@ 2025-02-07 23:44   ` David Gibson
  2025-02-08  0:06     ` Stefano Brivio
  0 siblings, 1 reply; 4+ messages in thread
From: David Gibson @ 2025-02-07 23:44 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev

[-- Attachment #1: Type: text/plain, Size: 3126 bytes --]

On Fri, Feb 07, 2025 at 07:51:15AM +0100, Stefano Brivio wrote:
> On Fri, 7 Feb 2025 17:26:35 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > It kind of seemed like we were sendmsg()ing "and from guest 2" and it
> > was bouncing straight back to our socket, instead of being delivered
> > to the outer pasta.
> 
> Oops.
> 
> diff --git a/tcp.c b/tcp.c
> index 0f05011..b7f5169 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2796,6 +2796,12 @@ static int tcp_flow_repair_queues(int s,
>  		debug("Read socket %i receive queue: %zi bytes", s, rc);
>  	}
>  
> +	v = TCP_NO_QUEUE;
> +	if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v))) {
> +		err_perror("Setting TCP_NO_QUEUE on socket %i", s);
> +		return -errno;
> +	}
> +
>  	return 0;
>  }
>  
> 
> otherwise, I guess, there's a time window in which we might be writing
> that message to our queue instead of writing it on the socket, even
> with repair mode off. The write might be pending or something.

Huh.  It somehow never occurred to me to think that repair stuff might
be happening after we turned repair mode off.  Is that actually the
case, or could it be that the passt-repair protocol bug meant repair
mode sometimes wasn't turned off?

If repair mode stuff is happening after turning it off, I'd say that
is a kernel bug, although a much less nasty and more easily worked
around one than I'd feared (I was contemplating whether repair mode
connect() might be looking something us wrong and wiring things up to
the wrong place).

> With this, the stray packet is gone. I spotted this case now:
> 
> $ tshark -r test/test_logs/passt_2.pcap 
>     1   0.000000 88.198.0.164 → 169.254.1.1  TCP 71 58150 → 10006 [PSH, ACK] Seq=1 Ack=1 Win=1024 Len=17
>     2   0.000031 88.198.0.164 → 169.254.1.1  TCP 54 58150 → 10006 [FIN, ACK] Seq=18 Ack=1 Win=1024 Len=0
>     3   0.000059  169.254.1.1 → 88.198.0.164 TCP 54 10006 → 58150 [RST, ACK] Seq=1 Ack=1 Win=256 Len=0
>     4   0.026443  169.254.1.1 → 88.198.0.164 TCP 74 48150 → 10001 [PSH, ACK] Seq=1 Ack=1 Win=256 Len=20
>     5   0.026538 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [ACK] Seq=1 Ack=21 Win=1023 Len=0
>     6   0.026557  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [FIN, ACK] Seq=21 Ack=1 Win=256 Len=0
>     7   0.026656 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [FIN, ACK] Seq=1 Ack=22 Win=1024 Len=0
>     8   0.026675  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [ACK] Seq=22 Ack=2 Win=256 Len=0
>     9 318.959707      fe80::1 → ff02::1      ICMPv6 158 Router Advertisement from 9a:55:9a:55:9a:55
> 
> That RST we send as frame #3 looks unwarranted. I wonder if we get
> packets from the target guest before we have a chance to set up flows.

I don't think that should be possible: the target guest shouldn't be
resumed until after we ack the check_device_state.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: migrate/bidirectional debugging
  2025-02-07 23:44   ` David Gibson
@ 2025-02-08  0:06     ` Stefano Brivio
  0 siblings, 0 replies; 4+ messages in thread
From: Stefano Brivio @ 2025-02-08  0:06 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Sat, 8 Feb 2025 10:44:03 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Feb 07, 2025 at 07:51:15AM +0100, Stefano Brivio wrote:
> > On Fri, 7 Feb 2025 17:26:35 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > It kind of seemed like we were sendmsg()ing "and from guest 2" and it
> > > was bouncing straight back to our socket, instead of being delivered
> > > to the outer pasta.  
> > 
> > Oops.
> > 
> > diff --git a/tcp.c b/tcp.c
> > index 0f05011..b7f5169 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -2796,6 +2796,12 @@ static int tcp_flow_repair_queues(int s,
> >  		debug("Read socket %i receive queue: %zi bytes", s, rc);
> >  	}
> >  
> > +	v = TCP_NO_QUEUE;
> > +	if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v))) {
> > +		err_perror("Setting TCP_NO_QUEUE on socket %i", s);
> > +		return -errno;
> > +	}
> > +
> >  	return 0;
> >  }
> >  
> > 
> > otherwise, I guess, there's a time window in which we might be writing
> > that message to our queue instead of writing it on the socket, even
> > with repair mode off. The write might be pending or something.  
> 
> Huh.  It somehow never occurred to me to think that repair stuff might
> be happening after we turned repair mode off.  Is that actually the
> case, or could it be that the passt-repair protocol bug meant repair
> mode sometimes wasn't turned off?

Nah, sorry, never mind, it's all fixed by the patch for passt-repair I
sent. Setting TCP_NO_QUEUE would hide the issue.

> If repair mode stuff is happening after turning it off, I'd say that
> is a kernel bug, although a much less nasty and more easily worked
> around one than I'd feared (I was contemplating whether repair mode
> connect() might be looking something us wrong and wiring things up to
> the wrong place).
> 
> > With this, the stray packet is gone. I spotted this case now:
> > 
> > $ tshark -r test/test_logs/passt_2.pcap 
> >     1   0.000000 88.198.0.164 → 169.254.1.1  TCP 71 58150 → 10006 [PSH, ACK] Seq=1 Ack=1 Win=1024 Len=17
> >     2   0.000031 88.198.0.164 → 169.254.1.1  TCP 54 58150 → 10006 [FIN, ACK] Seq=18 Ack=1 Win=1024 Len=0
> >     3   0.000059  169.254.1.1 → 88.198.0.164 TCP 54 10006 → 58150 [RST, ACK] Seq=1 Ack=1 Win=256 Len=0
> >     4   0.026443  169.254.1.1 → 88.198.0.164 TCP 74 48150 → 10001 [PSH, ACK] Seq=1 Ack=1 Win=256 Len=20
> >     5   0.026538 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [ACK] Seq=1 Ack=21 Win=1023 Len=0
> >     6   0.026557  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [FIN, ACK] Seq=21 Ack=1 Win=256 Len=0
> >     7   0.026656 88.198.0.164 → 169.254.1.1  TCP 54 10001 → 48150 [FIN, ACK] Seq=1 Ack=22 Win=1024 Len=0
> >     8   0.026675  169.254.1.1 → 88.198.0.164 TCP 54 48150 → 10001 [ACK] Seq=22 Ack=2 Win=256 Len=0
> >     9 318.959707      fe80::1 → ff02::1      ICMPv6 158 Router Advertisement from 9a:55:9a:55:9a:55
> > 
> > That RST we send as frame #3 looks unwarranted. I wonder if we get
> > packets from the target guest before we have a chance to set up flows.  
> 
> I don't think that should be possible: the target guest shouldn't be
> resumed until after we ack the check_device_state.

Correct, I also figured that out later.

-- 
Stefano


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-02-08  0:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-07  6:26 migrate/bidirectional debugging David Gibson
2025-02-07  6:51 ` Stefano Brivio
2025-02-07 23:44   ` David Gibson
2025-02-08  0:06     ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).