From: Stefano Brivio <sbrivio@redhat.com>
To: passt-dev@passt.top
Cc: Laurent Vivier <lvivier@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v3 00/20] Draft, incomplete series introducing state migration
Date: Fri, 31 Jan 2025 20:39:33 +0100
Message-ID: <20250131193953.3034031-1-sbrivio@redhat.com>
...and finally connections survive migration from source to target,
at least the ones originating from the (source) guest. I didn't try
the other way around; small tweaks might be needed. Tested as follows,
roughly as instructed by Laurent:
Source:

  $ ./passt --vhost-user
  $ qemu-system-x86_64 -machine accel=kvm -cpu host -kernel ... \
      -initrd mbuto.img -nographic -serial mon:stdio -nodefaults \
      -append "console=ttyS0" \
      -chardev socket,id=chr0,path=/tmp/passt_1.socket \
      -netdev vhost-user,id=netdev0,chardev=chr0 \
      -device virtio-net,netdev=netdev0 \
      -object memory-backend-memfd,id=memfd0,share=on,size=$((2 * 1024 * 1024 * 1024)) \
      -numa node,memdev=memfd0 -m 2G
  # ./passt-repair /tmp/passt_1.socket.repair

Target (same host):

  $ ./passt --vhost-user
  $ qemu-system-x86_64 -machine accel=kvm -cpu host -kernel ... \
      -initrd mbuto.img -nographic -serial mon:stdio -nodefaults \
      -append "console=ttyS0" \
      -chardev socket,id=chr0,path=/tmp/passt_2.socket \
      -netdev vhost-user,id=netdev0,chardev=chr0 \
      -device virtio-net,netdev=netdev0 \
      -object memory-backend-memfd,id=memfd0,share=on,size=$((2 * 1024 * 1024 * 1024)) \
      -numa node,memdev=memfd0 -m 2G \
      -incoming tcp:0:4444
  # ./passt-repair /tmp/passt_2.socket.repair

Test server:

  $ nc -l 9091

Once the guest boots:

  # ip link set dev eth0 up
  # dhclient eth0
  # socat STDIN TCP:$DEFAULT_GW:9091
  abcd
  ^a-c
  migrate tcp:0:4444

Then continue typing in the target guest:

  efgh
The purpose of this is mostly to show the complete flow, but it needs
a number of reworks.
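For reference, the manual monitor step in the procedure above (typing
'migrate tcp:0:4444' after ^a-c) could be scripted if QEMU were started
with a monitor on a UNIX domain socket, for instance with
-monitor unix:mon.sock,server,nowait. What follows is only a minimal
sketch under that assumption: monitor_send() is a hypothetical helper,
not part of this series, and it ignores the banner, the prompt and any
reply printed by the monitor. A real test would probably need to keep
the connection open and wait for the command to complete.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect to a monitor listening on the UNIX domain
 * socket at 'path' and send 'cmd' followed by a newline.
 * Returns 0 on success, -1 on failure. Monitor output is not read. */
int monitor_send(const char *path, const char *cmd)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	size_t len = strlen(cmd), done = 0;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	/* write() may be partial: loop until the whole command is sent */
	while (done < len) {
		ssize_t n = write(fd, cmd + done, len - done);

		if (n < 0)
			goto fail;
		done += (size_t)n;
	}

	if (write(fd, "\n", 1) != 1)
		goto fail;

	close(fd);
	return 0;

fail:
	close(fd);
	return -1;
}
```

With the socket path assumed above, the source side of the test would
then boil down to monitor_send("mon.sock", "migrate tcp:0:4444").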
What's missing (leaving aside pending packet queues for a moment,
those are not strictly needed):
1. tests based on the two_guests layout/setup. Even with reverse-search
in the shell, this is getting quite hard on wrists. I guess we can
start QEMU with -monitor unix:mon.sock,server,nowait and
send the 'migrate' command via socat STDIN UNIX-CONNECT:mon.sock
2. dump and transfer of *socket-side* MSS and window scale (I used
hardcoded values): this needs more storage, so it needs to be
transferred outside the flow table
3. dump, transfer and restore of TCP_REPAIR_WINDOW parameters (not
strictly needed, but easy to add once we have appropriate storage)
4. perhaps some small bits of implementation for socket-originated
connections (I tested only guest-originated ones so far)
5. UDP and ICMP flows (ping already happens to "survive" nicely, by
the way)
6. man page for passt-repair, and man page changes for everything
7. packaging and Linux Security Module changes for passt-repair
8. error handling here and there, and repair rollback/migration abort
9. setting original receive/send buffer sizes and socket options
(TCP_NODELAY)
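As an illustration of item 9 only, here is a minimal sketch of how the
buffer sizes and TCP_NODELAY could be read on the source and applied on
the target with plain getsockopt() / setsockopt(). struct sock_opts and
both function names are hypothetical and not part of this series. The
sketch assumes Linux behaviour, where the kernel doubles the value
passed to SO_SNDBUF / SO_RCVBUF and getsockopt() reports the doubled
value (see socket(7)), so the restore side halves what was dumped.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical container for the options mentioned in item 9 */
struct sock_opts {
	int sndbuf;	/* as reported by getsockopt(SO_SNDBUF) */
	int rcvbuf;	/* as reported by getsockopt(SO_RCVBUF) */
	int nodelay;	/* non-zero if TCP_NODELAY is set */
};

/* Read buffer sizes and TCP_NODELAY from socket 's'. 0 on success. */
int sock_opts_dump(int s, struct sock_opts *o)
{
	socklen_t sl;

	sl = sizeof(o->sndbuf);
	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &o->sndbuf, &sl))
		return -1;

	sl = sizeof(o->rcvbuf);
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &o->rcvbuf, &sl))
		return -1;

	sl = sizeof(o->nodelay);
	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &o->nodelay, &sl))
		return -1;

	return 0;
}

/* Apply previously dumped options to socket 's'. 0 on success. */
int sock_opts_restore(int s, const struct sock_opts *o)
{
	/* Linux doubles the requested size on set: pass half of what
	 * getsockopt() reported on the source side. */
	int snd = o->sndbuf / 2, rcv = o->rcvbuf / 2;
	int nodelay = !!o->nodelay;

	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd)))
		return -1;
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv)))
		return -1;
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &nodelay,
		       sizeof(nodelay)))
		return -1;

	return 0;
}
```

One caveat with this approach: setting the buffer sizes explicitly is
generally understood to turn off the kernel's automatic buffer tuning
for that socket, so it might only make sense to restore them if they
had been set explicitly on the source as well.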
What clearly needs changes:
a. we can't dump more stuff to the flow table, because we would exceed
128 bytes. We need to copy everything from tcp_tap_conn except for:
- state in flow_common
- in_epoll
- sock
- timer
and on top of this we need:
- values for TCPOPT_WINDOW and TCPOPT_MAXSEG
- struct tcp_repair_window
somewhat unexpectedly, this is actually bigger than a flow table
entry. In any case, we need to implement a stream/per-entry
migration right away.
b. at this point, I guess we can throw the header away, and just keep
a magic (0xB1BB1D1B0BB1D1B0 has a missing 0 at the end but, well,
https://en.wikipedia.org/wiki/Bibbidi-Bobbidi-Boo is the Magic
Song: can we keep it?) and a version number. The rest, let's go
with big/network endianness I'd say, and 64-bit time_t
c. the declarative data thing is very convenient but we need to fetch
stuff from struct ctx, as shown by the hash_secret example. What's
very convenient about this approach is the iovec / writev() / readv()
idea. I'm not sure if we can maintain that convenience, though
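To make points b. and c. a bit more concrete, here is a rough sketch
of what a stream format with just a magic and a version number, in
network byte order, followed by one per-flow record, could look like
while still using iovec / writev(). Everything except the magic value
is an assumption made for illustration: the version number, struct
flow_rec, its fields and sizes, and the function name are hypothetical
and don't reflect the layout used in this series.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MIGRATE_MAGIC	0xB1BB1D1B0BB1D1B0ULL
#define MIGRATE_VERSION	1	/* hypothetical */

#define HDR_LEN		12	/* 8 (magic) + 4 (version) */
#define REC_LEN		20	/* 4 + 4 + 2 + 1 + 1 (pad) + 8 */

/* Hypothetical subset of per-flow state to transfer */
struct flow_rec {
	uint32_t seq_snd;	/* sequence sent to peer */
	uint32_t seq_rcv;	/* sequence received from peer */
	uint16_t mss;		/* socket-side MSS */
	uint8_t wscale;		/* socket-side window scale */
	int64_t ts;		/* timestamp, always 64-bit on the wire */
};

/* Store 'len' least significant bytes of 'v' at 'p', big-endian */
static void put_be(uint8_t *p, uint64_t v, unsigned len)
{
	unsigned i;

	for (i = 0; i < len; i++)
		p[i] = (uint8_t)(v >> (8 * (len - 1 - i)));
}

/* Write header and one record to 'fd' with a single writev().
 * Returns 0 if everything was written, -1 otherwise. */
int migrate_send_one(int fd, const struct flow_rec *r)
{
	uint8_t hdr[HDR_LEN], rec[REC_LEN];
	struct iovec iov[2] = {
		{ .iov_base = hdr, .iov_len = sizeof(hdr) },
		{ .iov_base = rec, .iov_len = sizeof(rec) },
	};
	ssize_t n;

	put_be(hdr,      MIGRATE_MAGIC,   8);
	put_be(hdr + 8,  MIGRATE_VERSION, 4);

	put_be(rec,      r->seq_snd, 4);
	put_be(rec + 4,  r->seq_rcv, 4);
	put_be(rec + 8,  r->mss,     2);
	rec[10] = r->wscale;
	rec[11] = 0;				/* padding */
	put_be(rec + 12, (uint64_t)r->ts, 8);

	n = writev(fd, iov, 2);

	return n == (ssize_t)(sizeof(hdr) + sizeof(rec)) ? 0 : -1;
}
```

Serialising field by field like this gives up part of the declarative
convenience, but the wire format no longer depends on host endianness,
structure padding, or the size of time_t. The target side would do the
reverse with readv() after checking magic and version.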
Patches that could be applied regardless of this series to make it
more manageable:
1/20 tcp: Always pass NULL event with EPOLL_CTL_DEL
2/20 util: Rename and make global vu_remove_watch()
6/20 util: Add read_remainder() and read_all_buf()
8/20 Introduce passt-repair
16/20 vhost_user: Turn vhost-user message reports to trace()
17/20 vhost_user: Make source quit after reporting migration state
18/20 tcp: Get our socket port using getsockname() when connecting from guest
19/20 tcp: Add HOSTSIDE(x), HOSTFLOW(x) macros
Patches that we can throw away with the changes outlined above:
3/20 icmp, udp: Pad time_t timestamp to 64-bit to ease state migration
4/20 flow, flow_table: Pad flow table entries to 128 bytes, hash entries to 32 bits
15/20 flow, flow_table: Export declaration of hash table
David Gibson (6):
tcp: Always pass NULL event with EPOLL_CTL_DEL
util: Rename and make global vu_remove_watch()
migrate: vu_migrate_{source,target}() aren't actually vu speciic
migrate: Move repair_sock_init() to vu_init()
migrate: Make more handling common rather than vhost-user specific
migrate: Don't handle the migration channel through epoll
Stefano Brivio (14):
icmp, udp: Pad time_t timestamp to 64-bit to ease state migration
flow, flow_table: Pad flow table entries to 128 bytes, hash entries to
32 bits
flow_table: Use size in extern declaration for flowtab
util: Add read_remainder() and read_all_buf()
Introduce facilities for guest migration on top of vhost-user
infrastructure
Introduce passt-repair
Add interfaces and configuration bits for passt-repair
flow, tcp: Basic pre-migration source handler to dump sequence numbers
flow, flow_table: Export declaration of hash table
vhost_user: Turn vhost-user message reports to trace()
vhost_user: Make source quit after reporting migration state
tcp: Get our socket port using getsockname() when connecting from
guest
tcp: Add HOSTSIDE(x), HOSTFLOW(x) macros
Implement target side of migration
.gitignore | 1 +
Makefile | 24 +--
conf.c | 44 +++++-
epoll_type.h | 6 +-
flow.c | 97 +++++++++++-
flow.h | 20 ++-
flow_table.h | 22 ++-
icmp.c | 2 +-
icmp_flow.h | 6 +-
migrate.c | 408 +++++++++++++++++++++++++++++++++++++++++++++++++
migrate.h | 84 ++++++++++
passt-repair.c | 117 ++++++++++++++
passt.1 | 11 ++
passt.c | 17 ++-
passt.h | 17 +++
repair.c | 193 +++++++++++++++++++++++
repair.h | 16 ++
tap.c | 64 +-------
tcp.c | 198 +++++++++++++++++++++++-
tcp_conn.h | 7 +
tcp_internal.h | 10 +-
tcp_splice.c | 4 +-
udp_flow.c | 2 +-
udp_flow.h | 6 +-
util.c | 155 +++++++++++++++++++
util.h | 4 +
vhost_user.c | 94 +++---------
virtio.h | 4 -
vu_common.c | 62 +++-----
vu_common.h | 2 +-
30 files changed, 1469 insertions(+), 228 deletions(-)
create mode 100644 migrate.c
create mode 100644 migrate.h
create mode 100644 passt-repair.c
create mode 100644 repair.c
create mode 100644 repair.h
--
2.43.0