From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au
Authentication-Results: passt.top;
	dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=OS1zfFfs;
	dkim-atps=neutral
Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])
	by passt.top (Postfix) with ESMTPS id 4DBA95A061B
	for ; Wed, 12 Feb 2025 08:07:29 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202502; t=1739344044;
	bh=bdW/qR/eWRedT3mfkRArTWCeNbU1egNcswdgNK8d6eE=;
	h=From:To:Cc:Subject:Date:From;
	b=OS1zfFfshPc22YGUEuYodZmfpTvNsOfsyy8oJ5yc4/6CRqvbW3aHoqh7GByf6DSES
	 Gl6jFTyf4O65tz3GqejGv/cZK7AW1S9fKRFB8H/0I1dKpHFRuLN4eIfGA0T29G59Bs
	 mhxkcBTtzu4ED5lS0PEcExWfSBxXHk6NuIlK+kM6OtGUUfnBG9ehkMZZ7Z3Sd2azFe
	 y7T+bdbmOZW4kWTfhJ9HYuTE6NUmXR6kYXoBZJtqiaChP80syAW7o2J/V9jV7TOFmk
	 1HlY9JcMhtW54mnHSVT7N+1Thfk9hvt8z8n56jEIXHaG4snwNetWqzsMHYu3KNruei
	 kIX1Eg2WmFQKg==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4Yt8Ww3Hs9z4wcb; Wed, 12 Feb 2025 18:07:24 +1100 (AEDT)
From: David Gibson
To: passt-dev@passt.top, Stefano Brivio
Subject: [PATCH v18 0/9] State migration (kinda draft again)
Date: Wed, 12 Feb 2025 18:07:12 +1100
Message-ID: <20250212070721.1746128-1-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.48.1
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-ID-Hash: EAGZBGWZUZWZTW73632LZ35GHC7GVM7I
X-Message-ID-Hash: EAGZBGWZUZWZTW73632LZ35GHC7GVM7I
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency;
	loop; banned-address; member-moderation; nonmember-moderation;
	administrivia; implicit-dest; max-recipients; max-size;
	news-moderation; no-subject; digests; suspicious-header
CC: David Gibson
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion
 and patches for passt

More debugging of bugs found with the rampstream test today.

* Send queue transfer bug

I spotted, by inspection, a nasty bug which would mean we never
properly transfer the send queue (in repair mode we read it into the
wrong buffer, then transferred the right one).  I think the only
reason we didn't hit this is that, in each of the cases I've seen, the
send queue has been empty.  I think that makes sense: with high speed
local transfers, there's probably more than enough time between when
the guest is stopped and when we dump the queue for the sndbuf to
drain completely to the peer.

* Queue transfer error checking

I've made the error checking when extracting and reloading the queue a
bit more robust.

* rampstream corruption bug

I found the cause of the stream corruption bug.  I don't think repair
mode fully supports SO_PEEK_OFF semantics, but apparently it shares
enough code with the normal recv() path that the peek offset was
applied, which meant (I think) we skipped the "already sent" portion
of the rcv queue when dumping it.

I've fixed this by disabling SO_PEEK_OFF (setting it to -1) before
migration on the source.  This still needs some work to deal correctly
with the case of a failed migration which resumes on the source.

* rampstream unexpected EOF bug

Unfortunately that wasn't the only bug.  With the peek offset fixed, I
no longer get stream corruption, but I still get unexpected EOFs on
the rampstream_in test (at least with 64M [rw]mem_max).

The receiving rampstream is getting an EOF because passt is sending an
RST to the guest:

    14.1750: Flow 0 (TCP connection): TCP reset at tcp_sock_handler:2270

That happens because we get an EPOLLERR on the socket at some point
after migration.  From some earlier debugging hacks, I think that's an
ECONNRESET specifically, but I haven't debugged further because I was
focused on the corruption bug.

David Gibson (3):
  migrate: Migrate guest observed addresses
  rampstream: Add utility to test for corruption of data streams
  debug

Stefano Brivio (6):
  migrate: Skeleton of live migration logic
  Add interfaces and configuration bits for passt-repair
  vhost_user: Make source quit after reporting migration state
  tcp: Get bound address for connected inbound sockets too
  migrate: Migrate TCP flows
  test: Add migration tests

 Makefile                    |  14 +-
 conf.c                      |  43 +-
 contrib/selinux/passt.te    |   2 +-
 epoll_type.h                |   6 +-
 flow.c                      | 259 +++++++++-
 flow.h                      |   8 +
 flow_table.h                |   6 +-
 migrate.c                   | 309 ++++++++++++
 migrate.h                   |  51 ++
 passt.1                     |  11 +
 passt.c                     |  21 +-
 passt.h                     |  15 +
 repair.c                    | 218 +++++++++
 repair.h                    |  16 +
 tap.c                       |  65 +--
 tcp.c                       | 921 +++++++++++++++++++++++++++++++++++-
 tcp_conn.h                  |  99 ++++
 test/.gitignore             |   1 +
 test/Makefile               |   5 +-
 test/lib/layout             |  55 ++-
 test/lib/setup              | 140 +++++-
 test/lib/test               |  48 ++
 test/migrate/basic          |  59 +++
 test/migrate/bidirectional  |  64 +++
 test/migrate/iperf3_bidir6  |  58 +++
 test/migrate/iperf3_in4     |  50 ++
 test/migrate/iperf3_in6     |  58 +++
 test/migrate/iperf3_out4    |  50 ++
 test/migrate/iperf3_out6    |  58 +++
 test/migrate/rampstream_in  |  60 +++
 test/migrate/rampstream_out |  55 +++
 test/passt.mbuto            |   5 +-
 test/rampstream-check.sh    |   3 +
 test/rampstream.c           | 143 ++++++
 test/run                    |  29 ++
 util.c                      |  62 +++
 util.h                      |  30 ++
 vhost_user.c                |  67 +--
 virtio.h                    |   4 -
 vu_common.c                 |  49 +-
 vu_common.h                 |   2 +-
 41 files changed, 3014 insertions(+), 205 deletions(-)
 create mode 100644 migrate.c
 create mode 100644 migrate.h
 create mode 100644 repair.c
 create mode 100644 repair.h
 create mode 100644 test/migrate/basic
 create mode 100644 test/migrate/bidirectional
 create mode 100644 test/migrate/iperf3_bidir6
 create mode 100644 test/migrate/iperf3_in4
 create mode 100644 test/migrate/iperf3_in6
 create mode 100644 test/migrate/iperf3_out4
 create mode 100644 test/migrate/iperf3_out6
 create mode 100644 test/migrate/rampstream_in
 create mode 100644 test/migrate/rampstream_out
 create mode 100755 test/rampstream-check.sh
 create mode 100644 test/rampstream.c

-- 
2.48.1