public inbox for passt-dev@passt.top
* [PATCH v18 0/9] State migration (kinda draft again)
@ 2025-02-12  7:07 David Gibson
From: David Gibson @ 2025-02-12  7:07 UTC
  To: passt-dev, Stefano Brivio; +Cc: David Gibson

More debugging of bugs found with the rampstream test today.

* Send queue transfer bug

By inspection, I spotted a nasty bug which meant we never properly
transferred the send queue: in repair mode we read it into the wrong
buffer, then transferred the right (but never filled) one.  I think
the only reason we didn't hit this is that in each of the cases I've
seen, the send queue has been empty.  I think that makes sense: with
high speed local transfers, there's probably more than enough time
between when the guest is stopped and when we dump the queue for the
sndbuf to drain completely to the peer.

* Queue transfer error checking

I've made the error checking when extracting and reloading the queue a
bit more robust.

* rampstream corruption bug

I found the cause of the stream corruption bug.  I don't think repair
mode fully supports SO_PEEK_OFF semantics, but apparently it shares
enough code with the normal recv() path that the peek offset was still
applied, which meant (I think) we skipped the "already sent" portion
of the rcv queue when dumping it.  I've fixed this by disabling
SO_PEEK_OFF (setting it to -1) before migration on the source.  This
still needs some fixing to deal correctly with the case of a failed
migration which resumes on the source.

* rampstream unexpected EOF bug

Unfortunately that wasn't the only bug.  With the peek offset fixed, I
no longer get stream corruption, but I still get unexpected EOFs on
the rampstream_in test (at least with 64M [rw]mem_max).

The receiving rampstream is getting an EOF because passt is sending an
RST to the guest:

14.1750: Flow 0 (TCP connection): TCP reset at tcp_sock_handler:2270

That happens because we get an EPOLLERR on the socket at some point
after migration.  From some earlier debugging hacks, I think that's an
ECONNRESET specifically, but I haven't debugged further because I was
focused on the corruption bug.
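
One way to confirm which error sits behind an EPOLLERR is to query
SO_ERROR on the socket.  The sketch below is an illustration only, not
code from the series, and sock_pending_error() is a made-up name.
Note that reading SO_ERROR also clears the pending error.

```c
#include <sys/socket.h>

/* Hypothetical helper (not from the series): fetch and clear the
 * pending error on a socket, for example after epoll reports EPOLLERR.
 * Returns the pending errno value (such as ECONNRESET), 0 if no error
 * is pending, or -1 if getsockopt() itself fails.
 */
static int sock_pending_error(int s)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;

	return err;
}
```

Logging this value at the point of the reset would show whether the
error really is ECONNRESET.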

David Gibson (3):
  migrate: Migrate guest observed addresses
  rampstream: Add utility to test for corruption of data streams
  debug

Stefano Brivio (6):
  migrate: Skeleton of live migration logic
  Add interfaces and configuration bits for passt-repair
  vhost_user: Make source quit after reporting migration state
  tcp: Get bound address for connected inbound sockets too
  migrate: Migrate TCP flows
  test: Add migration tests

 Makefile                    |  14 +-
 conf.c                      |  43 +-
 contrib/selinux/passt.te    |   2 +-
 epoll_type.h                |   6 +-
 flow.c                      | 259 +++++++++-
 flow.h                      |   8 +
 flow_table.h                |   6 +-
 migrate.c                   | 309 ++++++++++++
 migrate.h                   |  51 ++
 passt.1                     |  11 +
 passt.c                     |  21 +-
 passt.h                     |  15 +
 repair.c                    | 218 +++++++++
 repair.h                    |  16 +
 tap.c                       |  65 +--
 tcp.c                       | 921 +++++++++++++++++++++++++++++++++++-
 tcp_conn.h                  |  99 ++++
 test/.gitignore             |   1 +
 test/Makefile               |   5 +-
 test/lib/layout             |  55 ++-
 test/lib/setup              | 140 +++++-
 test/lib/test               |  48 ++
 test/migrate/basic          |  59 +++
 test/migrate/bidirectional  |  64 +++
 test/migrate/iperf3_bidir6  |  58 +++
 test/migrate/iperf3_in4     |  50 ++
 test/migrate/iperf3_in6     |  58 +++
 test/migrate/iperf3_out4    |  50 ++
 test/migrate/iperf3_out6    |  58 +++
 test/migrate/rampstream_in  |  60 +++
 test/migrate/rampstream_out |  55 +++
 test/passt.mbuto            |   5 +-
 test/rampstream-check.sh    |   3 +
 test/rampstream.c           | 143 ++++++
 test/run                    |  29 ++
 util.c                      |  62 +++
 util.h                      |  30 ++
 vhost_user.c                |  67 +--
 virtio.h                    |   4 -
 vu_common.c                 |  49 +-
 vu_common.h                 |   2 +-
 41 files changed, 3014 insertions(+), 205 deletions(-)
 create mode 100644 migrate.c
 create mode 100644 migrate.h
 create mode 100644 repair.c
 create mode 100644 repair.h
 create mode 100644 test/migrate/basic
 create mode 100644 test/migrate/bidirectional
 create mode 100644 test/migrate/iperf3_bidir6
 create mode 100644 test/migrate/iperf3_in4
 create mode 100644 test/migrate/iperf3_in6
 create mode 100644 test/migrate/iperf3_out4
 create mode 100644 test/migrate/iperf3_out6
 create mode 100644 test/migrate/rampstream_in
 create mode 100644 test/migrate/rampstream_out
 create mode 100755 test/rampstream-check.sh
 create mode 100644 test/rampstream.c

-- 
2.48.1


Thread overview: 11+ messages
2025-02-12  7:07 [PATCH v18 0/9] State migration (kinda draft again) David Gibson
2025-02-12  7:07 ` [PATCH v18 1/9] migrate: Skeleton of live migration logic David Gibson
2025-02-12  7:07 ` [PATCH v18 2/9] migrate: Migrate guest observed addresses David Gibson
2025-02-12  7:07 ` [PATCH v18 3/9] Add interfaces and configuration bits for passt-repair David Gibson
2025-02-12  7:07 ` [PATCH v18 4/9] vhost_user: Make source quit after reporting migration state David Gibson
2025-02-12  7:07 ` [PATCH v18 5/9] tcp: Get bound address for connected inbound sockets too David Gibson
2025-02-12  7:07 ` [PATCH v18 6/9] migrate: Migrate TCP flows David Gibson
2025-02-12  7:07 ` [PATCH v18 7/9] rampstream: Add utility to test for corruption of data streams David Gibson
2025-02-12 19:44   ` Stefano Brivio
2025-02-12  7:07 ` [PATCH v18 8/9] test: Add migration tests David Gibson
2025-02-12  7:07 ` [PATCH v18 9/9] debug David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt
