From mboxrd@z Thu Jan 1 00:00:00 1970
Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au
Authentication-Results: passt.top;
	dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202502 header.b=DF9+vl61;
	dkim-atps=neutral
Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])
	by passt.top (Postfix) with ESMTPS id 035F55A0637
	for ; Thu, 13 Feb 2025 13:14:37 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gibson.dropbear.id.au; s=202502; t=1739448860;
	bh=u+KvcaMarXN2HpBnzl7jigC/RDXYTRF5+z3ZETbZ1DE=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=DF9+vl61RQueLcWKE/pAfqa2hYlNee06CvQYfhGJhZE8qBU3uqFOsnpgRC99+qFwz
	 dnt4RIlizxCWFn9jcUht+WfLxyaXu8ueqG1HIeRLB4RFvv+pAWR76S+u/x04f6J1Z3
	 Um/3MOzHV/bnznYBmA4i1ZsJ9mpv7cEoiYFhgs5Tn4ijcILnxhGNGOWKW1ELY0fHof
	 DlXcN7q4Xk3cUGIAA1l5ASeyhPfSKCfMoSb0K5QMn7S1I/oh5G1i9GvYs8gGxadMrA
	 q3T6RhRI85AIwk4myMyo1GMsUSO9Bz3I8bBSuKhB6sMHt3farrERX4RQvytR4p2fac
	 23uqk9xp/MVJw==
Received: by gandalf.ozlabs.org (Postfix, from userid 1007)
	id 4YtvHc2dTgz4x2c; Thu, 13 Feb 2025 23:14:20 +1100 (AEDT)
From: David Gibson
To: Stefano Brivio, passt-dev@passt.top
Subject: [PATCH v21 3/5] fixup: TCP_REPAIR_WINDOW before send unsent
Date: Thu, 13 Feb 2025 23:14:15 +1100
Message-ID: <20250213121417.617970-4-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250213121417.617970-1-david@gibson.dropbear.id.au>
References: <20250213121417.617970-1-david@gibson.dropbear.id.au>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-ID-Hash: VTVJ2EGJWYRVFI5OKO43KQ26TLENA2Q6
X-Message-ID-Hash: VTVJ2EGJWYRVFI5OKO43KQ26TLENA2Q6
X-MailFrom: dgibson@gandalf.ozlabs.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: David Gibson
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt

We, like libsoccr, handle restoring the send queue in two pieces.  The
"already sent" piece is written in repair mode, to repopulate the kernel
sndbuf without actually sending new packets to the peer.  The "not sent"
piece is written out of repair mode, so that we both put it into sndbuf
*and* actually send it out.

To do that, we temporarily drop out of repair mode.  However, we do so
before we've called TCP_REPAIR_WINDOW, meaning we're doing real send()s
on a real, non-repair socket that has bad window information.  That
seems bad.

Despite it differing from libsoccr, move the sending of the unsent
queued data until after the *final* repair off.  I strongly suspect that
both we and libsoccr were only (kind of) getting away with this because
notsent is usually 0.

This seems to fix an intermittent hang I was seeing on
migrate/iperf3_bidir6.  I was seeing that perhaps 1 time in 3, or 1 time
in 5 with DEBUG=1.  I did observe a nonzero notsent the one time I
reproduced after I knew what I was looking for.

Signed-off-by: David Gibson
---
 tcp.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/tcp.c b/tcp.c
index f18b2913..2a64b7c5 100644
--- a/tcp.c
+++ b/tcp.c
@@ -3530,25 +3530,22 @@ int tcp_flow_migrate_target_ext(struct ctx *c, union flow *flow, int fd)
 		shutdown(s, SHUT_WR);
 	}
 
+	if ((rc = tcp_flow_repair_wnd(s, &t)))
+		return rc;
+
+	tcp_flow_repair_off(c, conn);
+	repair_flush(c);
+
 	if (t.notsent) {
-		tcp_flow_repair_off(c, conn);
-		repair_flush(c);
+		err("socket %i, t.sndq=%u t.notsent=%u",
+		    s, t.sndq, t.notsent);
 
 		if ((rc = tcp_flow_repair_queue(s, t.notsent,
 						tcp_migrate_snd_queue +
 						(t.sndq - t.notsent))))
 			return rc;
-
-		tcp_flow_repair_on(c, conn);
-		repair_flush(c);
 	}
 
-	if ((rc = tcp_flow_repair_wnd(s, &t)))
-		return rc;
-
-		tcp_flow_repair_off(c, conn);
-	repair_flush(c);
-
 	/* If we sent a FIN but it wasn't acknowledged yet (TCP_FIN_WAIT1), send
 	 * it out, because we don't know if we already sent it.
 	 *
-- 
2.48.1
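
The reordering described in the commit message can be illustrated with a small stand-alone model. This is only a sketch: struct mock_sock, mock_send(), mock_repair_wnd(), restore_old() and restore_new() are hypothetical names invented for illustration, not passt or kernel functions, and no real socket or TCP_REPAIR call is involved. The model assumes that data written in repair mode only fills the send buffer, that data written out of repair mode is also transmitted, and that the window can only be restored while in repair mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket being restored; not a passt structure */
struct mock_sock {
	bool repair;		/* models repair mode being on */
	bool wnd_restored;	/* models TCP_REPAIR_WINDOW having been applied */
	size_t sndbuf;		/* bytes placed in the send buffer */
	size_t on_wire;		/* bytes actually transmitted to the peer */
	size_t bad_wnd_tx;	/* bytes transmitted before the window was restored */
};

/* In repair mode, data only fills sndbuf; otherwise it is also transmitted */
static void mock_send(struct mock_sock *s, size_t len)
{
	s->sndbuf += len;
	if (s->repair)
		return;

	s->on_wire += len;
	if (!s->wnd_restored)
		s->bad_wnd_tx += len;
}

/* In this model, restoring the window is only allowed in repair mode */
static int mock_repair_wnd(struct mock_sock *s)
{
	if (!s->repair)
		return -1;

	s->wnd_restored = true;
	return 0;
}

/* Ordering before the patch: unsent data goes out before the window is set */
static int restore_old(struct mock_sock *s, size_t sndq, size_t notsent)
{
	assert(notsent <= sndq);

	s->repair = true;
	mock_send(s, sndq - notsent);	/* "already sent" piece, silent */

	if (notsent) {
		s->repair = false;	/* temporary repair off */
		mock_send(s, notsent);	/* real send, window not yet restored */
		s->repair = true;
	}

	if (mock_repair_wnd(s))
		return -1;

	s->repair = false;		/* final repair off */
	return 0;
}

/* Ordering after the patch: window first, final repair off, then real send */
static int restore_new(struct mock_sock *s, size_t sndq, size_t notsent)
{
	assert(notsent <= sndq);

	s->repair = true;
	mock_send(s, sndq - notsent);	/* "already sent" piece, silent */

	if (mock_repair_wnd(s))
		return -1;

	s->repair = false;		/* final repair off */

	if (notsent)
		mock_send(s, notsent);	/* real send with restored window */

	return 0;
}
```

With sndq = 100 and notsent = 20, restore_old() leaves bad_wnd_tx at 20 while restore_new() leaves it at 0. Both end with the same sndbuf and on_wire totals, so the patch changes only when the real send happens relative to the window restore.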