From: David Gibson
To: Stefano Brivio, passt-dev@passt.top
Cc: David Gibson
Subject: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
Date: Wed, 4 Feb 2026 21:41:37 +1000
Message-ID: <20260204114137.2784090-5-david@gibson.dropbear.id.au>
In-Reply-To: <20260204114137.2784090-1-david@gibson.dropbear.id.au>
References: <20260204114137.2784090-1-david@gibson.dropbear.id.au>

There are several circumstances in which a live but idle TCP connection
can be forgotten by a guest, with no "on the wire" indication that this
has happened.  The most obvious is if the guest abruptly reboots.  A more
subtle case can happen with a half-closed connection, specifically one in
FIN_WAIT_2 state on the guest.  A connection can, legitimately, remain in
this state indefinitely.  If, however, a socket in this state is closed by
userspace, Linux at least will remove the kernel socket after 60s (or as
configured in the net.ipv4.tcp_fin_timeout sysctl).

Because there's no on the wire indication in these cases, passt will
pointlessly retain the connection in its flow table, at least until it is
removed by the inactivity timeout after several hours.

To avoid keeping connections around for so long in this state, add
functionality to periodically send TCP keepalive segments to the guest if
we've seen no activity on the tap interface.  If the guest is no longer
aware of the connection, it should respond with an RST, which will let
passt remove the stale entry.

To do this we use a method similar to the inactivity timeout: a 1-bit
page replacement / clock algorithm, but with a shorter interval, and only
checking for tap side activity.  Currently we use a 300s interval, meaning
we'll send a keepalive after 5-10 minutes of (tap side) inactivity.

Link: https://bugs.passt.top/show_bug.cgi?id=179
Signed-off-by: David Gibson
---
 tcp.c      | 39 +++++++++++++++++++++++++++++++++++++++
 tcp.h      |  2 ++
 tcp_conn.h |  2 ++
 3 files changed, 43 insertions(+)

diff --git a/tcp.c b/tcp.c
index acdac7df..bf57be23 100644
--- a/tcp.c
+++ b/tcp.c
@@ -206,6 +206,12 @@
  *   keepalives) will be removed between INACTIVITY_INTERVAL s and
  *   2*INACTIVITY_INTERVAL s after the last activity.
  *
+ * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
+ *   entire interval, send a tap-side keepalive.  If the endpoint is no longer
+ *   aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
+ *   state) that should trigger an RST, so we won't keep track of connections
+ *   that the guest endpoint no longer cares about.
+ *
  * Summary of data flows (with ESTABLISHED event)
  * ----------------------------------------------
  *
@@ -342,6 +348,7 @@ enum {
 #define RTO_INIT_AFTER_SYN_RETRIES	3		/* s, RFC 6298 */
 
 #define INACTIVITY_INTERVAL		7200		/* s */
+#define KEEPALIVE_INTERVAL		300		/* s */
 
 #define LOW_RTT_TABLE_SIZE		8
 #define LOW_RTT_THRESHOLD		10 /* us */
@@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 	}
 
 	conn->inactive = false;
+	conn->tap_inactive = false;
 
 	if (th->ack && !(conn->events & ESTABLISHED))
 		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
@@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
 	return 0;
 }
 
+/**
+ * tcp_keepalive() - Send keepalives for connections which need it
+ * @c:		Execution context
+ */
+static void tcp_keepalive(struct ctx *c, const struct timespec *now)
+{
+	union flow *flow;
+
+	if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
+		return;
+
+	c->tcp.keepalive_run = now->tv_sec;
+
+	flow_foreach(flow) {
+		struct tcp_tap_conn *conn = &flow->tcp;
+
+		if (flow->f.type != FLOW_TCP)
+			continue;
+
+		if (conn->tap_inactive) {
+			flow_dbg(conn, "No tap activity for at least %us, send keepalive",
+				 KEEPALIVE_INTERVAL);
+			tcp_send_flag(c, conn, KEEPALIVE);
+		}
+
+		/* Ready to check for next interval */
+		conn->tap_inactive = true;
+	}
+}
+
 /**
  * tcp_inactivity() - Scan for and close long-inactive connections
  * @c:		Execution context
@@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
 
 	if (c->mode == MODE_PASTA)
 		tcp_splice_refill(c);
+	tcp_keepalive(c, now);
 	tcp_inactivity(c, now);
 }
 
diff --git a/tcp.h b/tcp.h
index e104d453..2739f309 100644
--- a/tcp.h
+++ b/tcp.h
@@ -38,6 +38,7 @@ extern bool peek_offset_cap;
  * @rto_max:		Maximum retry timeout (in s)
  * @syn_retries:	SYN retries using exponential backoff timeout
  * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
+ * @keepalive_run:	Time we last issued tap-side keepalives
  * @inactivity_run:	Time we last scanned for inactive connections
  */
 struct tcp_ctx {
@@ -48,6 +49,7 @@ struct tcp_ctx {
 	int rto_max;
 	uint8_t syn_retries;
 	uint8_t syn_linear_timeouts;
+	time_t keepalive_run;
 	time_t inactivity_run;
 };
 
diff --git a/tcp_conn.h b/tcp_conn.h
index 7197ff63..c82e1441 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -16,6 +16,7 @@
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @tap_mss:		MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
+ * @tap_inactive:	No tap activity within the current KEEPALIVE_INTERVAL
  * @inactive:		No activity within the current INACTIVITY_INTERVAL
  * @sock:		Socket descriptor number
  * @events:		Connection events, implying connection states
@@ -58,6 +59,7 @@ struct tcp_tap_conn {
 	(conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
 #define RTT_GET(conn)		(RTT_STORE_MIN << conn->rtt_exp)
 
+	bool		tap_inactive	:1;
 	bool		inactive	:1;
 
 	int		sock		:FD_REF_BITS;
-- 
2.52.0
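
For readers who want to look at the 1-bit clock scan from the commit
message in isolation, here is a minimal standalone sketch.  It is not
passt code: struct sketch_conn, struct sketch_ctx, sketch_tap_activity()
and sketch_keepalive_scan() are hypothetical names made up for this
illustration, sending a keepalive is reduced to bumping a counter, and the
300 s interval is simply the value quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Assumption: 300 s, the interval quoted in the commit message */
#define SKETCH_KEEPALIVE_INTERVAL	300

/* Hypothetical stand-in for a flow table entry, not passt's struct */
struct sketch_conn {
	bool tap_inactive;	/* No tap-side activity since the last scan */
	unsigned keepalives;	/* Keepalives "sent" so far (just a counter) */
};

/* Hypothetical stand-in for the per-protocol context */
struct sketch_ctx {
	time_t keepalive_run;	/* Time of the last scan that actually ran */
};

/* To be called for every segment received from the tap side */
void sketch_tap_activity(struct sketch_conn *conn)
{
	conn->tap_inactive = false;
}

/*
 * Periodic timer hook: does nothing until a full interval has passed since
 * the last scan.  A connection whose flag is still set has seen no tap-side
 * traffic for a whole interval, so it gets a keepalive.  Every flag is then
 * set again, ready for the next interval.
 *
 * Returns the number of keepalives sent by this call.
 */
unsigned sketch_keepalive_scan(struct sketch_ctx *ctx,
			       struct sketch_conn *conns, size_t n,
			       time_t now)
{
	unsigned sent = 0;
	size_t i;

	if (now - ctx->keepalive_run < SKETCH_KEEPALIVE_INTERVAL)
		return 0;

	ctx->keepalive_run = now;

	for (i = 0; i < n; i++) {
		if (conns[i].tap_inactive) {
			conns[i].keepalives++;
			sent++;
		}

		conns[i].tap_inactive = true;
	}

	return sent;
}
```

Since the flag is cleared only by tap-side traffic and tested only once
per interval, a connection gets its keepalive somewhere between one and
two intervals after its last tap-side activity.  With a 300 s interval
that is where the 5-10 minute figure comes from.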