From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: migrate/bidirectional debugging
Date: Fri, 7 Feb 2025 17:26:35 +1100
Message-ID: <Z6WnmyqIgPkRgV-T@zatzit>
I've spent today trying to debug this failure. I've gathered a bunch
of information, but no breakthroughs, alas. At this point I suspect a
kernel bug, though I hope I'm wrong.
# Background.
I think these match what you described on your system:
* Most (but not every) time I run migrate/bidirectional it fails,
with the "outbound" stream only getting the before migration piece
* I can't reproduce if I put strace on the guest 2 passt.
Possibly unlike you:
* I'm able to use TRACE=1, and the problem still reproduces
* I can put strace on the outer pasta and the problem still
reproduces
The specific anomalies I was focused on were:
* The passt_2 pcap shows "and from guest 2" coming _inbound_ a bit
after it (correctly) went outbound
* The pasta_1 pcap doesn't seem to show "and from guest 2" in either
direction
# Observations
* I added a hack (see other series) that let me log comments to the
pcap file as ethertype 0xffff, so that I could have debugging
messages in order with the captured packets.
* I used that to pin down exactly where the bogus output "and from
guest 2" was being recorded, and it's in tcp_vu_data_from_sock()
* I traced back from there, and passt_2 really does seem to be
getting "and from guest 2" from a recvmsg() on the socket. I see
from my pcap comments that we're getting 17 bytes from recvmsg()
right before capturing the inbound packet, at any rate.
* As noted, I couldn't reproduce with an strace on passt_2, so I
couldn't confirm that piece that way
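
For illustration only, here is a minimal sketch of the pcap-comment idea from the first point above. It is not the hack from the other series: the function names, the zeroed placeholder MAC addresses and the use of the classic pcap record layout are all assumptions made for this example. The only detail taken from the description is the ethertype 0xffff.

```python
import struct
import time

PCAP_MAGIC = 0xA1B2C3D4        # classic pcap format, microsecond timestamps
LINKTYPE_ETHERNET = 1
ETH_P_COMMENT = 0xFFFF         # ethertype marking a frame as a debug comment


def pcap_global_header(snaplen=65535):
    """Return the 24-byte classic pcap file header (little-endian)."""
    return struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0,
                       snaplen, LINKTYPE_ETHERNET)


def comment_frame(text):
    """Wrap a comment string in an Ethernet frame with ethertype 0xffff."""
    dst = src = b"\x00" * 6    # placeholder MAC addresses (assumption)
    return dst + src + struct.pack("!H", ETH_P_COMMENT) + text.encode()


def pcap_record(frame, ts=None):
    """Prefix a frame with a pcap per-packet header."""
    ts = time.time() if ts is None else ts
    sec = int(ts)
    usec = int((ts - sec) * 1_000_000)
    return struct.pack("<IIII", sec, usec, len(frame), len(frame)) + frame
```

Writing `pcap_global_header()` once and then appending `pcap_record(comment_frame("..."))` between the records of real packets keeps the comments interleaved with the traffic in capture order, which is the point of the hack.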
It kind of seemed like we were sendmsg()ing "and from guest 2" and it
was bouncing straight back to our socket, instead of being delivered
to the outer pasta.
* I tried putting a dumpcap on 'lo' in the pasta namespace, thinking
I might see this weird passt->passt packet. But, nothing. There
are thousands of packets of the qemu migration stream, and
absolutely nothing else.
* I also tried dumpcap on the external interface in the pasta
namespace, and I didn't see anything different from what pasta
captured (although I didn't check super carefully). In particular
I didn't seem to see "and from guest 2" in either direction there
either
* Since I couldn't strace() passt_2, I instead tried logging
TCP sendmsg() and recvmsg() calls of length 17 using systemtap
(script attached). At this point it gets even weirder:
On a working run (achieved by adding the strace), I get this:
BEGIN
tcp sendmsg(-129530279294592) len=17 - ./passt -s /tmp/passt-tests-niICXS/migrate/passt_2.socket -P /tmp/passt-tests-niICXS/migrate/passt_2.pid -f --vhost-user -p /home/dwg/src/passt/test/test_logs/passt_2.pcap --trace -t 10004 -u 10004
tcp sendmsg(-129489810388608) len=17 - ./pasta -p /home/dwg/src/passt/test/test_logs/pasta_1.pcap --trace --trace -l /tmp/pasta1.log -P /tmp/passt-tests-niICXS/migrate/pasta_1.pid -t 10001,10002,10004 -T 10003 -u 10001,10002,10004 -U 10003 --map-guest-addr 169.254.1.1 --config-net /home/dwg/src/passt/test/nstool hold /tmp/passt-tests-niICXS/migrate/ns1.hold
END
This mostly makes sense. passt_2 sends the expected outbound packet
to the namespace, then pasta_1 forwards it on to the host. I don't
know why I'm not seeing the recvmsg() from the socat server, though.
In the failing case, though, I get this:
BEGIN
tcp sendmsg(-129471392995840) len=17 - ./passt -s /tmp/passt-tests-CV71zo/migrate/passt_2.socket -P /tmp/passt-tests-CV71zo/migrate/passt_2.pid -f --vhost-user -p /home/dwg/src/passt/test/test_logs/passt_2.pcap --trace -t 10004 -u 10004
tcp recvmsg(-129476447043584) len=17 - ./pasta -p /home/dwg/src/passt/test/test_logs/pasta_1.pcap --trace --trace -l /tmp/pasta1.log -P /tmp/passt-tests-CV71zo/migrate/pasta_1.pid -t 10001,10002,10004 -T 10003 -u 10001,10002,10004 -U 10003 --map-guest-addr 169.254.1.1 --config-net /home/dwg/src/passt/test/nstool hold /tmp/passt-tests-CV71zo/migrate/ns1.hold
END
First event seems the same: passt_2 sending the outbound packet, as
expected. The second, though, is weird: the outer pasta seems to
receive the data from a socket, not from tap as we'd expect. That
might explain the other symptoms: if pasta received it on its socket,
it would send it inwards.
But... I don't see pasta sending that "and from guest 2" inbound in
its packet capture. And, weirder still, although I see that recvmsg()
with systemtap, I don't see it in an strace of pasta.
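
To make the difference between the two systemtap traces above easier to compare across runs, here is a small, hedged sketch that parses lines of that shape. The regular expression is inferred from the output quoted in this message, not from the attached script, and treating the number in parentheses as a kernel socket pointer printed as a signed decimal is an assumption.

```python
import re

# Matches e.g. "tcp sendmsg(-129530279294592) len=17 - ./passt -s ..."
LINE_RE = re.compile(r"^tcp (sendmsg|recvmsg)\((-?\d+)\) len=(\d+) - (\S+)")


def parse_event(line):
    """Parse one trace line; return None for BEGIN/END or unrelated lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    op, sock, length, prog = m.groups()
    # Assumption: the value is a 64-bit pointer printed as signed decimal,
    # so mask it back to unsigned to get a readable address.
    sock_u64 = int(sock) & 0xFFFFFFFFFFFFFFFF
    return {"op": op, "sock": hex(sock_u64), "len": int(length), "prog": prog}


def summarize(lines):
    """Reduce a trace to a list of (operation, program) pairs, in order."""
    events = (parse_event(line) for line in lines)
    return [(e["op"], e["prog"]) for e in events if e is not None]
```

Running `summarize()` over the working and failing traces reduces each to an ordered list of (operation, program) pairs, so the only difference that remains is the second event: a `sendmsg` from `./pasta` in the working run versus a `recvmsg` in the failing one.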
...and.. that's where I'm at. Attaching my systemtap script and a
ball of logs. Hoping they're helpful :/.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #1.2: sockloopback.stp --]
[-- Type: application/p21, Size: 482 bytes --]
[-- Attachment #1.3: weird.tar.xz --]
[-- Type: application/x-xz, Size: 26804 bytes --]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]