From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>, passt-dev@passt.top
Cc: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH v10 08/10] migrate: Migrate TCP flows
Date: Thu, 6 Feb 2025 16:49:49 +1100
Message-ID: <20250206054951.1041610-9-david@gibson.dropbear.id.au>
In-Reply-To: <20250206054951.1041610-1-david@gibson.dropbear.id.au>
From: Stefano Brivio <sbrivio@redhat.com>
This implements flow preparation on the source, transfer of data with
a format roughly inspired by struct tcp_tap_conn, and flow insertion
on the target, with all the appropriate window options, window
scaling, MSS, etc.
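
As a rough illustration of the transfer format described above (not the
actual struct tcp_tap_transfer), the sketch below converts a cut-down
record to network order on the source and back to host order on the
target. The names demo_host, demo_wire, demo_to_wire(), demo_from_wire()
and demo_check() are hypothetical and exist only for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Host-side state: hypothetical subset of per-connection fields */
struct demo_host {
	uint32_t seq;
	uint32_t mss;
	uint16_t wnd;
	uint8_t ws;
};

/* Wire record: fixed layout, multi-byte fields in network order */
struct demo_wire {
	uint32_t seq;
	uint32_t mss;
	uint16_t wnd;
	uint8_t ws;
	uint8_t pad;	/* explicit padding keeps the size a multiple of 4 */
} __attribute__((packed, aligned(__alignof__(uint32_t))));

/* Source side: host order to network order */
static struct demo_wire demo_to_wire(const struct demo_host *h)
{
	struct demo_wire w = { 0 };

	w.seq = htonl(h->seq);
	w.mss = htonl(h->mss);
	w.wnd = htons(h->wnd);
	w.ws = h->ws;	/* single bytes need no conversion */

	return w;
}

/* Target side: network order back to host order */
static struct demo_host demo_from_wire(const struct demo_wire *w)
{
	struct demo_host h = { 0 };

	h.seq = ntohl(w->seq);
	h.mss = ntohl(w->mss);
	h.wnd = ntohs(w->wnd);
	h.ws = w->ws;

	return h;
}

/* Round trip: what the target decodes matches what the source encoded */
static void demo_check(void)
{
	struct demo_host in = { 1000, 1460, 512, 7 };
	struct demo_wire w = demo_to_wire(&in);
	struct demo_host out = demo_from_wire(&w);

	assert(out.seq == in.seq && out.mss == in.mss);
	assert(out.wnd == in.wnd && out.ws == in.ws);
}
```

Keeping the wire record separate from the host structure means the
stream does not depend on the endianness of either machine.
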
The target side is rather convoluted because we first need to create
sockets and switch them to repair mode, before we can apply options
that are *not* stored in the flow table. However, we don't want to
request repair mode for sockets one by one. So we need to do this in
several steps.
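
The reason for splitting the target side into steps can be sketched as
follows. This is a toy model, not the code in repair.c: demo_batch,
demo_batch_set(), demo_batch_flush() and demo_restore_all() are
hypothetical names, the batch size of 4 is arbitrary, and "sending a
request" is reduced to incrementing a counter. Sockets are queued as
they are created, and the request to switch them to repair mode goes
out once per batch rather than once per socket.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH_MAX	4	/* arbitrary batch size for this sketch */
#define DEMO_REPAIR_ON	1	/* stand-in command values */
#define DEMO_REPAIR_OFF	0

struct demo_batch {
	int fds[DEMO_BATCH_MAX];
	size_t n;		/* sockets queued, not yet sent */
	int cmd;		/* command shared by all queued sockets */
	unsigned requests;	/* batched requests issued so far */
};

/* Send all queued sockets as one request (here: just count it) */
static void demo_batch_flush(struct demo_batch *b)
{
	if (!b->n)
		return;

	b->requests++;
	b->n = 0;
}

/* Queue one socket, flushing first if the command changes or batch is full */
static void demo_batch_set(struct demo_batch *b, int fd, int cmd)
{
	if (b->n && (cmd != b->cmd || b->n == DEMO_BATCH_MAX))
		demo_batch_flush(b);

	assert(b->n < DEMO_BATCH_MAX);
	b->cmd = cmd;
	b->fds[b->n++] = fd;
}

/* Step 1: queue every new socket. Step 2: one flush for the remainder */
static unsigned demo_restore_all(struct demo_batch *b, int nsock)
{
	int i;

	for (i = 0; i < nsock; i++)
		demo_batch_set(b, i, DEMO_REPAIR_ON);

	demo_batch_flush(b);

	return b->requests;
}
```

With six sockets and a batch size of four, this model issues two
requests instead of six. Options that need repair mode can only be
applied per socket after that final flush.
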
A hack to connect() on the "RARP" message should be easy to enable: I
left a couple of comments to that effect.
This is very much draft quality, but I tested the whole flow, and it
works for me. Window parameters and MSS match, too.
[dwg: Assorted cleanups]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20250205230919.205302-6-sbrivio@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
flow.c | 201 +++++++++++++++++++++++
flow.h | 6 +
migrate.c | 11 ++
repair.c | 1 -
tcp.c | 461 +++++++++++++++++++++++++++++++++++++++++++++++++++++
tcp_conn.h | 60 +++++++
6 files changed, 739 insertions(+), 1 deletion(-)
diff --git a/flow.c b/flow.c
index a6fe6d1f..aa9bf419 100644
--- a/flow.c
+++ b/flow.c
@@ -19,6 +19,7 @@
#include "inany.h"
#include "flow.h"
#include "flow_table.h"
+#include "repair.h"
const char *flow_state_str[] = {
[FLOW_STATE_FREE] = "FREE",
@@ -874,6 +875,206 @@ void flow_defer_handler(const struct ctx *c, const struct timespec *now)
*last_next = FLOW_MAX;
}
+/**
+ * flow_migrate_source_pre_do() - Prepare/"unprepare" source flows for migration
+ * @c: Execution context
+ * @stage: Migration stage information (unused)
+ * @fd: Migration fd (unused)
+ * @rollback: If true, undo preparation
+ *
+ * Return: 0 on success, error code on failure
+ */
+static int flow_migrate_source_pre_do(struct ctx *c,
+ const struct migrate_stage *stage, int fd,
+ bool rollback)
+{
+ unsigned i, max_i;
+ int rc = 0;
+
+ (void)stage;
+ (void)fd;
+
+ if (rollback) {
+ rc = 0;
+ i = FLOW_MAX;
+ goto rollback;
+ }
+
+ for (i = 0; i < FLOW_MAX; i++) { /* TODO: iterator with skip */
+ union flow *flow = &flowtab[i];
+
+ if (flow->f.state == FLOW_STATE_FREE)
+ i += flow->free.n - 1;
+ else if (flow->f.state == FLOW_STATE_ACTIVE &&
+ flow->f.type == FLOW_TCP)
+ rc = tcp_flow_repair_on(c, &flow->tcp);
+
+ if (rc) {
+ debug("Can't set repair mode for TCP flows, roll back");
+ goto rollback;
+ }
+ }
+
+ if ((rc = repair_flush(c))) { /* TODO: move to TCP logic */
+ debug("Can't set repair mode for TCP flows, roll back");
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ max_i = i;
+
+ for (i = 0; i < max_i; i++) { /* TODO: iterator with skip */
+ union flow *flow = &flowtab[i];
+
+ if (flow->f.state == FLOW_STATE_FREE)
+ i += flow->free.n - 1;
+ else if (flow->f.state == FLOW_STATE_ACTIVE &&
+ flow->f.type == FLOW_TCP)
+ tcp_flow_repair_off(c, &flow->tcp);
+ }
+
+ repair_flush(c);
+
+ return rc;
+}
+
+/**
+ * flow_migrate_source_pre() - Prepare source flows for migration
+ * @c: Execution context
+ * @stage: Migration stage information (unused)
+ * @fd: Migration fd (unused)
+ *
+ * Return: 0 on success, error code on failure
+ */
+int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
+ int fd)
+{
+ return flow_migrate_source_pre_do(c, stage, fd, false);
+}
+
+/**
+ * flow_migrate_source() - Dump additional information and send data
+ * @c: Execution context
+ * @stage: Migration stage information (unused)
+ * @fd: Migration fd
+ *
+ * Return: 0 on success
+ */
+int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
+ int fd)
+{
+ uint32_t count = 0;
+ unsigned i;
+ int rc;
+
+ for (i = 0; i < FLOW_MAX; i++) { /* TODO: iterator with skip */
+ union flow *flow = &flowtab[i];
+
+ if (flow->f.state == FLOW_STATE_FREE)
+ i += flow->free.n - 1;
+ else if (flow->f.state == FLOW_STATE_ACTIVE &&
+ flow->f.type == FLOW_TCP)
+ count++;
+ }
+
+ count = htonl(count);
+ rc = write_all_buf(fd, &count, sizeof(count));
+ if (rc) {
+ rc = errno;
+ err("Can't send flow count (%u): %s, abort",
+ ntohl(count), strerror_(errno));
+ return rc;
+ }
+ debug("Sending %u flows", ntohl(count));
+
+ /* Send information that can be stored in the flow table, first */
+ for (i = 0; i < FLOW_MAX; i++) { /* TODO: iterator with skip */
+ union flow *flow = &flowtab[i];
+
+ if (flow->f.state == FLOW_STATE_FREE) {
+ i += flow->free.n - 1;
+ } else if (flow->f.state == FLOW_STATE_ACTIVE &&
+ flow->f.type == FLOW_TCP) {
+ rc = tcp_flow_migrate_source(fd, &flow->tcp);
+ if (rc)
+ goto rollback;
+ }
+ /* TODO: other protocols */
+ }
+
+ /* And then "extended" data: the target needs to set repair mode on
+ * sockets before it can set this stuff, but it needs sockets (and
+ * flows) for that.
+ */
+ for (i = 0; i < FLOW_MAX; i++) { /* TODO: iterator with skip */
+ union flow *flow = &flowtab[i];
+
+ if (flow->f.state == FLOW_STATE_FREE) {
+ i += flow->free.n - 1;
+ } else if (flow->f.state == FLOW_STATE_ACTIVE &&
+ flow->f.type == FLOW_TCP) {
+ rc = tcp_flow_migrate_source_ext(fd, &flow->tcp);
+ if (rc)
+ goto rollback;
+ }
+ /* TODO: other protocols */
+ }
+
+ return 0;
+
+rollback:
+ flow_migrate_source_pre_do(c, stage, fd, true);
+ return rc;
+}
+
+/**
+ * flow_migrate_target() - Receive flows and insert in flow table
+ * @c: Execution context
+ * @stage: Migration stage information (unused)
+ * @fd: Migration fd
+ *
+ * Return: 0 on success
+ */
+int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
+ int fd)
+{
+ uint32_t count;
+ unsigned i;
+ int rc;
+
+ (void)stage;
+
+ /* TODO: error handling */
+
+ if (read_all_buf(fd, &count, sizeof(count)))
+ return errno;
+
+ count = ntohl(count);
+ debug("Receiving %u flows", count);
+
+ /* TODO: flow header with type, instead? */
+ for (i = 0; i < count; i++) {
+ rc = tcp_flow_migrate_target(c, fd);
+ if (rc)
+ return rc;
+ }
+
+ repair_flush(c);
+
+ for (i = 0; i < count; i++) {
+ rc = tcp_flow_migrate_target_ext(c, flowtab + i, fd);
+ if (rc)
+ return rc;
+ }
+
+ repair_flush(c);
+
+ return 0;
+}
+
/**
* flow_init() - Initialise flow related data structures
*/
diff --git a/flow.h b/flow.h
index 24ba3ef0..a485c359 100644
--- a/flow.h
+++ b/flow.h
@@ -249,6 +249,12 @@ union flow;
void flow_init(void);
void flow_defer_handler(const struct ctx *c, const struct timespec *now);
+int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage,
+ int fd);
+int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
+ int fd);
+int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage,
+ int fd);
void flow_log_(const struct flow_common *f, int pri, const char *fmt, ...)
__attribute__((format(printf, 3, 4)));
diff --git a/migrate.c b/migrate.c
index 9deee7ad..d091e90d 100644
--- a/migrate.c
+++ b/migrate.c
@@ -80,6 +80,16 @@ static const struct migrate_stage stages_v1[] = {
.source = seen_addrs_source_v1,
.target = seen_addrs_target_v1,
},
+ {
+ .name = "flow pre",
+ .source = flow_migrate_source_pre,
+ .target = NULL,
+ },
+ {
+ .name = "flow",
+ .source = flow_migrate_source,
+ .target = flow_migrate_target,
+ },
{ 0 },
};
@@ -208,6 +218,7 @@ static int migrate_target(struct ctx *c, int fd)
void migrate_init(struct ctx *c)
{
c->device_state_result = -1;
+ repair_sock_init(c);
}
/**
diff --git a/repair.c b/repair.c
index 61519279..b170f400 100644
--- a/repair.c
+++ b/repair.c
@@ -171,7 +171,6 @@ int repair_flush(struct ctx *c)
*
* Return: 0 on success, negative error code on failure
*/
-/* cppcheck-suppress unusedFunction */
int repair_set(struct ctx *c, int s, int cmd)
{
int rc;
diff --git a/tcp.c b/tcp.c
index af6bd95a..885ba3af 100644
--- a/tcp.c
+++ b/tcp.c
@@ -299,6 +299,7 @@
#include "log.h"
#include "inany.h"
#include "flow.h"
+#include "repair.h"
#include "linux_dep.h"
#include "flow_table.h"
@@ -326,6 +327,11 @@
((conn)->events & (SOCK_FIN_RCVD | TAP_FIN_RCVD)))
#define CONN_HAS(conn, set) (((conn)->events & (set)) == (set))
+#define TCP_MIGRATE_SND_QUEUE_MAX (16 << 20)
+#define TCP_MIGRATE_RCV_QUEUE_MAX (16 << 20)
+uint8_t tcp_migrate_snd_queue[TCP_MIGRATE_SND_QUEUE_MAX];
+uint8_t tcp_migrate_rcv_queue[TCP_MIGRATE_RCV_QUEUE_MAX];
+
static const char *tcp_event_str[] __attribute((__unused__)) = {
"SOCK_ACCEPTED", "TAP_SYN_RCVD", "ESTABLISHED", "TAP_SYN_ACK_SENT",
@@ -2645,3 +2651,458 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
if (c->mode == MODE_PASTA)
tcp_splice_refill(c);
}
+
+/**
+ * tcp_flow_repair_on() - Enable repair mode for a single TCP flow
+ * @c: Execution context
+ * @conn: Pointer to the TCP connection structure
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int tcp_flow_repair_on(struct ctx *c, const struct tcp_tap_conn *conn)
+{
+ int rc = 0;
+
+ if ((rc = repair_set(c, conn->sock, TCP_REPAIR_ON)))
+ err("Failed to set TCP_REPAIR");
+
+ return rc;
+}
+
+/**
+ * tcp_flow_repair_off() - Clear repair mode for a single TCP flow
+ * @c: Execution context
+ * @conn: Pointer to the TCP connection structure
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn)
+{
+ int rc = 0;
+
+ if ((rc = repair_set(c, conn->sock, TCP_REPAIR_OFF)))
+ err("Failed to clear TCP_REPAIR");
+
+ return rc;
+}
+
+/**
+ * tcp_flow_repair_queues() - Dump or set sequences, read or write socket queues
+ * @s: Socket
+ * @snd_seq: Send sequence, set on return if @set == false, network order
+ * @snd_buf: Send queue buffer, read or written depending on @set
+ * @snd_len: Length of send queue buffer, network order
+ * @rcv_seq: Receive sequence, set on return if @set == false, network order
+ * @rcv_buf: Receive queue buffer, read or written depending on @set
+ * @rcv_len: Length of receive queue buffer, network order
+ * @set: Set if true, dump if false
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+static int tcp_flow_repair_queues(int s,
+ uint32_t *snd_seq, uint8_t *snd_buf,
+ uint32_t *snd_len,
+ uint32_t *rcv_seq, uint8_t *rcv_buf,
+ uint32_t *rcv_len, bool set)
+{
+ socklen_t vlen = sizeof(uint32_t);
+ int v;
+
+ /* TODO: proper error management and prints */
+
+ v = TCP_SEND_QUEUE;
+ if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v)))
+ return -errno;
+
+ if (set) {
+ uint8_t *p;
+
+ *snd_seq = ntohl(*snd_seq);
+ if (setsockopt(s, SOL_TCP, TCP_QUEUE_SEQ, snd_seq, vlen))
+ return -errno;
+ debug("Set sending sequence for socket %i to %u", s, *snd_seq);
+
+ *snd_len = ntohl(*snd_len);
+ debug("Writing socket %i send queue: %u bytes", s, *snd_len);
+ p = snd_buf;
+ while (*snd_len > 0) {
+ ssize_t rc = send(s, p, *snd_len, 0);
+ if (rc < 0)
+ return -errno;
+
+ *snd_len -= rc;
+ p += rc;
+ }
+ } else {
+ ssize_t rc;
+
+ if (getsockopt(s, SOL_TCP, TCP_QUEUE_SEQ, snd_seq, &vlen))
+ return -errno;
+ debug("Dumped sending sequence for socket %i: %u", s, *snd_seq);
+ *snd_seq = htonl(*snd_seq);
+
+ rc = recv(s, snd_buf, TCP_MIGRATE_SND_QUEUE_MAX, MSG_PEEK);
+ if (rc < 0 && errno != EAGAIN) { /* FIXME: EAGAIN expected? */
+ err_perror("Can't read send queue for socket %i", s);
+ return rc;
+ }
+
+ *snd_len = htonl((rc < 0) ? 0 : rc);
+ debug("Read socket %i send queue: %zi bytes", s, rc);
+ }
+
+ v = TCP_RECV_QUEUE;
+ if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v)))
+ return -errno;
+
+ if (set) {
+ uint8_t *p;
+
+ *rcv_seq = ntohl(*rcv_seq);
+ if (setsockopt(s, SOL_TCP, TCP_QUEUE_SEQ, rcv_seq, vlen))
+ return -errno;
+ debug("Set receive sequence for socket %i to %u", s, *rcv_seq);
+
+ *rcv_len = ntohl(*rcv_len);
+ debug("Writing socket %i receive queue: %u bytes", s, *rcv_len);
+ p = rcv_buf;
+ while (*rcv_len > 0) {
+ ssize_t rc = send(s, p, *rcv_len, 0);
+ if (rc < 0)
+ return -errno;
+
+ *rcv_len -= rc;
+ p += rc;
+ }
+ } else {
+ ssize_t rc;
+
+ if (getsockopt(s, SOL_TCP, TCP_QUEUE_SEQ, rcv_seq, &vlen))
+ return -errno;
+ debug("Dumped receive sequence for socket %i: %u", s, *rcv_seq);
+ *rcv_seq = htonl(*rcv_seq);
+
+ rc = recv(s, rcv_buf, TCP_MIGRATE_RCV_QUEUE_MAX, MSG_PEEK);
+ if (rc < 0 && errno != EAGAIN) { /* FIXME: EAGAIN expected? */
+ err_perror("Can't read receive queue for socket %i", s);
+ return rc;
+ }
+
+ *rcv_len = htonl((rc < 0) ? 0 : rc);
+ debug("Read socket %i receive queue: %zi bytes", s, rc);
+ }
+
+ return 0;
+}
+
+/**
+ * tcp_flow_repair_opt() - Dump or set repair "options" (MSS and window scale)
+ * @s: Socket
+ * @ws_to_sock: Window scaling factor from us, network order
+ * @ws_from_sock: Window scaling factor from peer, network order
+ * @mss: Maximum Segment Size, socket side, network order
+ * @set: Set if true, dump if false
+ *
+ * Return: 0 on success, TODO: negative error code on failure
+ */
+int tcp_flow_repair_opt(int s, uint8_t *ws_to_sock, uint8_t *ws_from_sock,
+ uint32_t *mss, bool set)
+{
+ struct tcp_info_linux tinfo;
+ struct tcp_repair_opt opts[2];
+ socklen_t sl;
+
+ opts[0].opt_code = TCPOPT_WINDOW;
+ opts[1].opt_code = TCPOPT_MAXSEG;
+
+ if (set) {
+ opts[0].opt_val = *ws_to_sock + (*ws_from_sock << 16);
+ opts[1].opt_val = ntohl(*mss);
+
+ sl = sizeof(opts);
+ setsockopt(s, SOL_TCP, TCP_REPAIR_OPTIONS, opts, sl);
+ } else {
+ sl = sizeof(tinfo);
+ getsockopt(s, SOL_TCP, TCP_INFO, &tinfo, &sl);
+
+ *ws_to_sock = tinfo.tcpi_snd_wscale;
+ *ws_from_sock = tinfo.tcpi_rcv_wscale;
+ *mss = htonl(tinfo.tcpi_snd_mss);
+ }
+
+ return 0;
+}
+
+/**
+ * tcp_flow_repair_wnd() - Dump or set window parameters
+ * @s: Socket
+ * @snd_wl1: See struct tcp_repair_window
+ * @snd_wnd: Socket-side sending window, network order
+ * @max_window: Window clamp, network order
+ * @rcv_wnd: Socket-side receive window, network order
+ * @rcv_wup: See struct tcp_repair_window
+ * @set: Set if true, dump if false
+ *
+ * Return: 0 on success, TODO: negative error code on failure
+ */
+int tcp_flow_repair_wnd(int s, uint32_t *snd_wl1, uint32_t *snd_wnd,
+ uint32_t *max_window, uint32_t *rcv_wnd,
+ uint32_t *rcv_wup, bool set)
+{
+ struct tcp_repair_window wnd;
+ socklen_t sl = sizeof(wnd);
+
+ if (set) {
+ wnd.snd_wl1 = ntohl(*snd_wl1);
+ wnd.snd_wnd = ntohl(*snd_wnd);
+ wnd.max_window = ntohl(*max_window);
+ wnd.rcv_wnd = ntohl(*rcv_wnd);
+ wnd.rcv_wup = ntohl(*rcv_wup);
+
+ setsockopt(s, IPPROTO_TCP, TCP_REPAIR_WINDOW, &wnd, sl);
+ } else {
+ getsockopt(s, IPPROTO_TCP, TCP_REPAIR_WINDOW, &wnd, &sl);
+
+ *snd_wl1 = htonl(wnd.snd_wl1);
+ *snd_wnd = htonl(wnd.snd_wnd);
+ *max_window = htonl(wnd.max_window);
+ *rcv_wnd = htonl(wnd.rcv_wnd);
+ *rcv_wup = htonl(wnd.rcv_wup);
+ }
+
+ return 0;
+}
+
+/**
+ * tcp_flow_migrate_source() - Send data (flow table part) for a single flow
+ * @fd: Descriptor for state migration
+ * @conn: Pointer to the TCP connection structure
+ */
+int tcp_flow_migrate_source(int fd, const struct tcp_tap_conn *conn)
+{
+ struct tcp_tap_transfer t = {
+ .retrans = conn->retrans,
+ .ws_from_tap = conn->ws_from_tap,
+ .ws_to_tap = conn->ws_to_tap,
+ .events = conn->events,
+
+ .tap_mss = htonl(MSS_GET(conn)),
+
+ .sndbuf = htonl(conn->sndbuf),
+
+ .flags = conn->flags,
+ .seq_dup_ack_approx = conn->seq_dup_ack_approx,
+
+ .wnd_from_tap = htons(conn->wnd_from_tap),
+ .wnd_to_tap = htons(conn->wnd_to_tap),
+
+ .seq_to_tap = htonl(conn->seq_to_tap),
+ .seq_ack_from_tap = htonl(conn->seq_ack_from_tap),
+ .seq_from_tap = htonl(conn->seq_from_tap),
+ .seq_ack_to_tap = htonl(conn->seq_ack_to_tap),
+ .seq_init_from_tap = htonl(conn->seq_init_from_tap),
+ };
+
+ memcpy(&t.pif, conn->f.pif, sizeof(t.pif));
+ memcpy(&t.side, conn->f.side, sizeof(t.side));
+
+ if (write_all_buf(fd, &t, sizeof(t)))
+ return errno;
+
+ return 0;
+}
+
+/**
+ * tcp_flow_migrate_source_ext() - Send extended data for a single flow
+ * @fd: Descriptor for state migration
+ * @conn: Pointer to the TCP connection structure
+ */
+int tcp_flow_migrate_source_ext(int fd, const struct tcp_tap_conn *conn)
+{
+ struct tcp_tap_transfer_ext t;
+ int s = conn->sock;
+ int rc;
+
+ rc = tcp_flow_repair_queues(s,
+ &t.sock_seq_snd, tcp_migrate_snd_queue,
+ &t.sndlen,
+ &t.sock_seq_rcv, tcp_migrate_rcv_queue,
+ &t.rcvlen, false);
+ if (rc)
+ return rc;
+
+ tcp_flow_repair_opt(s, &t.ws_to_sock, &t.ws_from_sock, &t.sock_mss,
+ false);
+
+ tcp_flow_repair_wnd(s, &t.sock_snd_wl1, &t.sock_snd_wnd,
+ &t.sock_max_window, &t.sock_rcv_wnd,
+ &t.sock_rcv_wup, false);
+
+ if (write_all_buf(fd, &t, sizeof(t)))
+ return errno;
+
+ if (write_all_buf(fd, tcp_migrate_snd_queue, ntohl(t.sndlen)))
+ return errno;
+
+ if (write_all_buf(fd, tcp_migrate_rcv_queue, ntohl(t.rcvlen)))
+ return errno;
+
+ return 0;
+}
+
+/**
+ * tcp_flow_repair_socket() - Open and bind socket, request repair mode
+ * @c: Execution context
+ * @conn: Pointer to the TCP connection structure
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int tcp_flow_repair_socket(struct ctx *c, struct tcp_tap_conn *conn)
+{
+ sa_family_t af = CONN_V4(conn) ? AF_INET : AF_INET6;
+ const struct flowside *sockside = HOSTFLOW(conn);
+ struct sockaddr_in a;
+ int rc;
+
+ a = (struct sockaddr_in){ af, htons(sockside->oport), { 0 }, { 0 } };
+
+ if ((conn->sock = socket(af, SOCK_STREAM, IPPROTO_TCP)) < 0)
+ return -errno;
+
+ /* On the same host, source socket can be in TIME_WAIT */
+ setsockopt(conn->sock, SOL_SOCKET, SO_REUSEADDR,
+ &((int){ 1 }), sizeof(int));
+
+ /* TODO: switch to tcp_bind_outbound(c, conn, conn->sock); ...? */
+ if (bind(conn->sock, (struct sockaddr *)&a, sizeof(a)) < 0) {
+ rc = -errno; /* save before close() can overwrite errno */
+ close(conn->sock);
+ conn->sock = -1;
+ return rc;
+ }
+
+ rc = tcp_flow_repair_on(c, conn);
+ if (rc) {
+ close(conn->sock);
+ conn->sock = -1;
+ return rc;
+ }
+
+ return 0;
+}
+
+/**
+ * tcp_flow_repair_connect() - Connect socket in repair mode, then turn it off
+ * @c: Execution context
+ * @conn: Pointer to the TCP connection structure
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+static int tcp_flow_repair_connect(const struct ctx *c, struct tcp_tap_conn *conn)
+{
+ const struct flowside *tgt = &conn->f.side[TGTSIDE];
+ int rc;
+
+ rc = flowside_connect(c, conn->sock, PIF_HOST, tgt);
+ if (rc) {
+ err("Failed to connect repaired socket: %s", strerror_(errno));
+ return rc;
+ }
+
+ conn->in_epoll = 0;
+ conn->timer = -1;
+ tcp_epoll_ctl(c, conn);
+
+ return 0;
+
+ /* HACK RARP: return tcp_flow_repair_off(c, conn); */
+}
+
+/**
+ * tcp_flow_migrate_target() - Receive data (flow table part) for flow, insert
+ * @c: Execution context
+ * @fd: Descriptor for state migration
+ */
+int tcp_flow_migrate_target(struct ctx *c, int fd)
+{
+ struct tcp_tap_transfer t;
+ struct tcp_tap_conn *conn;
+ union flow *flow;
+
+ if (!(flow = flow_alloc()))
+ return -ENOMEM;
+
+ if (read_all_buf(fd, &t, sizeof(t)))
+ return errno;
+
+ flow->f.state = FLOW_STATE_TGT;
+ memcpy(&flow->f.pif, &t.pif, sizeof(flow->f.pif));
+ memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
+ conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
+
+ conn->retrans = t.retrans;
+ conn->ws_from_tap = t.ws_from_tap;
+ conn->ws_to_tap = t.ws_to_tap;
+ conn->events = t.events;
+
+ conn->sndbuf = ntohl(t.sndbuf);
+
+ conn->flags = t.flags;
+ conn->seq_dup_ack_approx = t.seq_dup_ack_approx;
+
+ MSS_SET(conn, ntohl(t.tap_mss));
+
+ conn->wnd_from_tap = ntohs(t.wnd_from_tap);
+ conn->wnd_to_tap = ntohs(t.wnd_to_tap);
+
+ conn->seq_to_tap = ntohl(t.seq_to_tap);
+ conn->seq_ack_from_tap = ntohl(t.seq_ack_from_tap);
+ conn->seq_from_tap = ntohl(t.seq_from_tap);
+ conn->seq_ack_to_tap = ntohl(t.seq_ack_to_tap);
+ conn->seq_init_from_tap = ntohl(t.seq_init_from_tap);
+
+ tcp_flow_repair_socket(c, conn);
+
+ flow_hash_insert(c, TAP_SIDX(conn));
+ FLOW_ACTIVATE(conn);
+
+ return 0;
+}
+
+/**
+ * tcp_flow_migrate_target_ext() - Receive extended data for flow, set, connect
+ * @c: Execution context
+ * @flow: Existing flow for this connection data
+ * @fd: Descriptor for state migration
+ */
+int tcp_flow_migrate_target_ext(struct ctx *c, union flow *flow, int fd)
+{
+ struct tcp_tap_conn *conn = &flow->tcp;
+ struct tcp_tap_transfer_ext t;
+ int s = conn->sock;
+
+ if (read_all_buf(fd, &t, sizeof(t)))
+ return errno;
+
+ if (read_all_buf(fd, tcp_migrate_snd_queue, ntohl(t.sndlen)))
+ return errno;
+
+ if (read_all_buf(fd, tcp_migrate_rcv_queue, ntohl(t.rcvlen)))
+ return errno;
+
+ tcp_flow_repair_queues(s,
+ &t.sock_seq_snd, tcp_migrate_snd_queue,
+ &t.sndlen,
+ &t.sock_seq_rcv, tcp_migrate_rcv_queue,
+ &t.rcvlen, true);
+
+ tcp_flow_repair_connect(c, conn);
+
+ tcp_flow_repair_opt(s, &t.ws_to_sock, &t.ws_from_sock, &t.sock_mss,
+ true);
+
+ tcp_flow_repair_wnd(s, &t.sock_snd_wl1, &t.sock_snd_wnd,
+ &t.sock_max_window, &t.sock_rcv_wnd,
+ &t.sock_rcv_wup, true);
+
+ tcp_flow_repair_off(c, conn);
+
+ return 0;
+}
diff --git a/tcp_conn.h b/tcp_conn.h
index d3426808..aba8e914 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -96,6 +96,60 @@ struct tcp_tap_conn {
uint32_t seq_init_from_tap;
};
+/**
+ * struct tcp_tap_transfer - TCP data to migrate (flow table part only)
+ * TODO
+ */
+struct tcp_tap_transfer {
+ uint8_t pif[SIDES];
+ struct flowside side[SIDES];
+
+ uint8_t retrans;
+ uint8_t ws_from_tap;
+ uint8_t ws_to_tap;
+ uint8_t events;
+
+ uint32_t tap_mss;
+
+ uint32_t sndbuf;
+
+ uint8_t flags;
+ uint8_t seq_dup_ack_approx;
+
+ uint16_t wnd_from_tap;
+ uint16_t wnd_to_tap;
+
+ uint32_t seq_to_tap;
+ uint32_t seq_ack_from_tap;
+ uint32_t seq_from_tap;
+ uint32_t seq_ack_to_tap;
+ uint32_t seq_init_from_tap;
+} __attribute__((packed, aligned(__alignof__(uint32_t))));
+
+/**
+ * struct tcp_tap_transfer_ext - TCP data to migrate (not stored in flow table)
+ * TODO
+ */
+struct tcp_tap_transfer_ext {
+ uint32_t sock_seq_snd;
+ uint32_t sock_seq_rcv;
+
+ uint32_t sndlen;
+ uint32_t rcvlen;
+
+ uint32_t sock_mss;
+
+ /* We can't just use struct tcp_repair_window: we need network order */
+ uint32_t sock_snd_wl1;
+ uint32_t sock_snd_wnd;
+ uint32_t sock_max_window;
+ uint32_t sock_rcv_wnd;
+ uint32_t sock_rcv_wup;
+
+ uint8_t ws_to_sock;
+ uint8_t ws_from_sock;
+} __attribute__((packed, aligned(__alignof__(uint32_t))));
+
/**
* struct tcp_splice_conn - Descriptor for a spliced TCP connection
* @f: Generic flow information
@@ -140,6 +194,12 @@ extern int init_sock_pool4 [TCP_SOCK_POOL_SIZE];
extern int init_sock_pool6 [TCP_SOCK_POOL_SIZE];
bool tcp_flow_defer(const struct tcp_tap_conn *conn);
+int tcp_flow_repair_on(struct ctx *c, const struct tcp_tap_conn *conn);
+int tcp_flow_repair_off(struct ctx *c, const struct tcp_tap_conn *conn);
+int tcp_flow_migrate_source(int fd, const struct tcp_tap_conn *conn);
+int tcp_flow_migrate_source_ext(int fd, const struct tcp_tap_conn *conn);
+int tcp_flow_migrate_target(struct ctx *c, int fd);
+int tcp_flow_migrate_target_ext(struct ctx *c, union flow *flow, int fd);
bool tcp_splice_flow_defer(struct tcp_splice_conn *conn);
void tcp_splice_timer(const struct ctx *c, struct tcp_splice_conn *conn);
int tcp_conn_pool_sock(int pool[]);
--
2.48.1