From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Gibson
To: passt-dev@passt.top, Stefano Brivio
CC: David Gibson
Subject: [PATCH v20 5/5] fixup: TCP_REPAIR_WINDOW before send unsent
Date: Thu, 13 Feb 2025 15:58:13 +1100
Message-ID: <20250213045813.3767488-6-david@gibson.dropbear.id.au>
In-Reply-To: <20250213045813.3767488-1-david@gibson.dropbear.id.au>
References: <20250213045813.3767488-1-david@gibson.dropbear.id.au>
List-Id: Development discussion and patches for passt

We, like libsoccr, handle restoring the send queue in two pieces.  The
"already sent" piece is written in repair mode, to repopulate the kernel
sndbuf without actually sending new packets to the peer.  The "not sent"
piece is written out of repair mode, so that we both put it into sndbuf
*and* actually send it out.

To do that we temporarily drop out of repair mode.  However, we do so
before we've called TCP_REPAIR_WINDOW, meaning we're doing real send()s
on a real, non-repair socket that has bad window information.  That
seems bad.

Despite it differing from libsoccr, move the sending of the not-yet-sent
queued data until after the *final* repair off.  I strongly suspect that
both we and libsoccr were only (kind of) getting away with this because
notsent is usually 0.

This seems to fix an intermittent hang I was seeing on
migrate/iperf3_bidir6.  I was seeing that perhaps 1 time in 3, or 1 time
in 5 with DEBUG=1.  I did observe a nonzero notsent the one time I
reproduced after I knew what I was looking for.

Signed-off-by: David Gibson
---
 tcp.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/tcp.c b/tcp.c
index 39035e14..3f41cf0b 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3532,25 +3532,22 @@ int tcp_flow_migrate_target_ext(struct ctx *c, union flow *flow, int fd)
 		shutdown(s, SHUT_WR);
 	}
 
+	if ((rc = tcp_flow_repair_wnd(s, &t)))
+		return rc;
+
+	tcp_flow_repair_off(c, conn);
+	repair_flush(c);
+
 	if (t.notsent) {
-		tcp_flow_repair_off(c, conn);
-		repair_flush(c);
+		err("socket %i, t.sndq=%u t.notsent=%u",
+		    s, t.sndq, t.notsent);
 
 		if ((rc = tcp_flow_repair_queue(s, t.notsent,
 						tcp_migrate_snd_queue +
 						(t.sndq - t.notsent))))
 			return rc;
-
-		tcp_flow_repair_on(c, conn);
-		repair_flush(c);
 	}
 
-	if ((rc = tcp_flow_repair_wnd(s, &t)))
-		return rc;
-
-	tcp_flow_repair_off(c, conn);
-	repair_flush(c);
-
 	/* If we sent a FIN but it wasn't acknowledged yet (TCP_FIN_WAIT1), send
 	 * it out, because we don't know if we already sent it.
 	 *
-- 
2.48.1
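
For illustration only, the ordering problem described in the commit message
can be modelled without real sockets.  This is a rough sketch, not passt or
libsoccr code: struct fake_sock, fake_write(), notsent_offset() and the two
restore_*() helpers are all hypothetical names, and the "window restored"
flag merely stands in for the TCP_REPAIR_WINDOW step.  It counts real sends
issued before the window has been restored, for the old and new orderings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket being restored; not a passt type */
struct fake_sock {
	int repair;		/* 1 while in (simulated) repair mode */
	int wnd_restored;	/* 1 once the window step has been done */
	size_t queued;		/* bytes placed in the send buffer */
	size_t on_wire;		/* bytes actually sent to the peer */
	size_t bad_wnd_sends;	/* real sends made before window restore */
};

/* Simulated write: in repair mode data only fills the buffer, out of
 * repair mode it is also transmitted.
 */
static void fake_write(struct fake_sock *s, size_t len)
{
	s->queued += len;
	if (s->repair)
		return;

	s->on_wire += len;
	if (!s->wnd_restored)
		s->bad_wnd_sends++;
}

/* Offset of the first not-yet-sent byte within the saved send queue */
static size_t notsent_offset(uint32_t sndq, uint32_t notsent)
{
	assert(notsent <= sndq);
	return sndq - notsent;
}

/* Old order: leave repair mode for the unsent part, re-enter it, and only
 * then restore the window.
 */
static void restore_old_order(struct fake_sock *s,
			      uint32_t sndq, uint32_t notsent)
{
	s->repair = 1;
	fake_write(s, notsent_offset(sndq, notsent));	/* already sent */
	if (notsent) {
		s->repair = 0;
		fake_write(s, notsent);			/* real send */
		s->repair = 1;
	}
	s->wnd_restored = 1;
	s->repair = 0;
}

/* New order: window first, final repair off, then the unsent part */
static void restore_new_order(struct fake_sock *s,
			      uint32_t sndq, uint32_t notsent)
{
	s->repair = 1;
	fake_write(s, notsent_offset(sndq, notsent));	/* already sent */
	s->wnd_restored = 1;
	s->repair = 0;
	if (notsent)
		fake_write(s, notsent);			/* real send */
}
```

Calling both helpers on zero-initialised struct fake_sock instances with the
same sndq and a nonzero notsent leaves identical queued and on_wire counts;
only the old ordering increments bad_wnd_sends.  With notsent == 0 the two
orderings are indistinguishable in this model, which is consistent with the
suspicion above that the problem was usually masked.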