From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Gibson
To: passt-dev@passt.top, Stefano Brivio
Subject: [PATCH v3 3/3] migrate, tcp: Don't attempt to carry on migration after flow_alloc_cancel()
Date: Wed, 26 Feb 2025 17:04:22 +1100
Message-ID: <20250226060422.48295-4-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250226060422.48295-1-david@gibson.dropbear.id.au>
References: <20250226060422.48295-1-david@gibson.dropbear.id.au>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
CC: David Gibson
List-Id: Development discussion and patches for passt

In tcp_flow_migrate_target(), if we're unable to create and bind the new
socket, we print an error, cancel the flow and carry on.  This seems to
make sense based on our policy of generally letting the migration
complete even if some or all flows are lost in the process.

However, it can't quite work: the flow_alloc_cancel() means that the
flows in the target's flow table are no longer a one-to-one match for
the flows which the source is sending data for.  This means that data
for later flows will be mismatched to a different flow.  Most likely
that will cause some nasty error later but, even worse, it might appear
to succeed yet lead to data corruption due to incorrectly restoring one
of the flows.

Instead, we should leave the flow in the table until we've read all the
data for it, *then* discard it.  Technically, removing the
flow_alloc_cancel() would be enough for this: if
tcp_flow_repair_socket() fails it leaves conn->sock == -1, which will
cause the restore functions in tcp_flow_migrate_target_ext() to fail,
discarding the flow.  To make what's going on clearer, though, put an
explicit test for a bad socket fd in tcp_flow_migrate_target_ext() and
discard at that point.

Signed-off-by: David Gibson
---
 tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tcp.c b/tcp.c
index e3c0a53b..f713fa99 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3376,7 +3376,6 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
 
 	if ((rc = tcp_flow_repair_socket(c, conn))) {
 		flow_err(flow, "Can't set up socket: %s, drop", strerror_(rc));
-		flow_alloc_cancel(flow);
 		return 0;
 	}
 
@@ -3452,6 +3451,10 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
 		return rc;
 	}
 
+	if (conn->sock < 0)
+		/* We weren't able to create the socket, discard flow */
+		goto fail;
+
 	if (tcp_flow_select_queue(s, TCP_SEND_QUEUE))
 		goto fail;
 
-- 
2.48.1
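
To illustrate the mismatch described in the commit message, here is a
minimal standalone sketch. It is not passt code: struct toy_flow and
toy_migrate() are hypothetical names, and pairing the i-th data record
with the i-th table entry is an assumption made only to model the
ordering problem.

```c
#include <assert.h>

#define NFLOWS 3

/* Hypothetical stand-in for a flow table entry, not passt's structure */
struct toy_flow {
	int id;		/* source flow this entry was created for */
	int sock;	/* -1 if socket setup failed */
	int ext_id;	/* source flow whose extended data was applied, or -1 */
};

/* Toy two-pass migration in which socket setup fails for flow 'bad'.
 * With cancel_early set, the failed flow is dropped from the table in
 * the first pass.  Returns the number of surviving flows that received
 * extended data belonging to a different source flow.
 */
int toy_migrate(int bad, int cancel_early)
{
	struct toy_flow table[NFLOWS];
	int n = 0, mismatched = 0, i;

	assert(bad >= 0 && bad < NFLOWS);

	/* Pass 1: one table entry per flow sent by the source, in order */
	for (i = 0; i < NFLOWS; i++) {
		int sock = (i == bad) ? -1 : 100 + i;

		if (sock < 0 && cancel_early)
			continue;	/* entry removed, later entries shift down */

		table[n].id = i;
		table[n].sock = sock;
		table[n].ext_id = -1;
		n++;
	}

	/* Pass 2: the source still sends NFLOWS records in the same order,
	 * and record i is paired with table entry i (assumption)
	 */
	for (i = 0; i < NFLOWS && i < n; i++) {
		if (table[i].sock < 0)
			continue;	/* record consumed, flow discarded here */

		table[i].ext_id = i;
		if (table[i].ext_id != table[i].id)
			mismatched++;
	}

	return mismatched;
}
```

In this model, toy_migrate(0, 1) returns 2, because both surviving flows
receive data meant for their predecessor, while toy_migrate(0, 0)
returns 0, because the failed flow keeps its slot until its data has
been consumed.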