From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
Date: Thu, 05 Feb 2026 01:17:45 +0100 (CET) [thread overview]
Message-ID: <20260205011744.4243b8ce@elisabeth> (raw)
In-Reply-To: <20260204114137.2784090-5-david@gibson.dropbear.id.au>
On Wed, 4 Feb 2026 21:41:37 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> There are several circumstances in which a live, but idle TCP connection
> can be forgotten by a guest, with no "on the wire" indication that this has
> happened. The most obvious is if the guest abruptly reboots. A more
> subtle case can happen with a half-closed connection, specifically one
> in FIN_WAIT_2 state on the guest. A connection can, legitimately, remain
> in this state indefinitely. If however, a socket in this state is closed
> by userspace, Linux at least will remove the kernel socket after 60s
> (or as configured in the net.ipv4.tcp_fin_timeout sysctl).
>
> Because there's no on the wire indication in these cases, passt will
> pointlessly retain the connection in its flow table, at least until it is
> removed by the inactivity timeout after several hours.
>
> To avoid keeping connections around for so long in this state, add
> functionality to periodically send TCP keepalive segments to the guest if
> we've seen no activity on the tap interface. If the guest is no longer
> aware of the connection, it should respond with an RST which will let
> passt remove the stale entry.
>
> To do this we use a method similar to the inactivity timeout - a 1-bit
> page replacement / clock algorithm, but with a shorter interval, and only
> checking for tap side activity. Currently we use a 300s interval, meaning
> we'll send a keepalive after 5-10 minutes of (tap side) inactivity.
>
> Link: https://bugs.passt.top/show_bug.cgi?id=179
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 39 +++++++++++++++++++++++++++++++++++++++
> tcp.h | 2 ++
> tcp_conn.h | 2 ++
> 3 files changed, 43 insertions(+)
>
> diff --git a/tcp.c b/tcp.c
> index acdac7df..bf57be23 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -206,6 +206,12 @@
> * keepalives) will be removed between INACTIVITY_INTERVAL s and
> * 2*INACTIVITY_INTERVAL s after the last activity.
> *
> + * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
> + * entire interval, send a tap-side keepalive. If the endpoint is no longer
> + * aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
> + * state) that should trigger an RST, so we won't keep track of connections
> + * that the guest endpoint no longer cares about.
> + *
> * Summary of data flows (with ESTABLISHED event)
> * ----------------------------------------------
> *
> @@ -342,6 +348,7 @@ enum {
> #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
>
> #define INACTIVITY_INTERVAL 7200 /* s */
> +#define KEEPALIVE_INTERVAL 30 /* s */
By the way, once we're done testing this, I'm not sure which value to
use. MUST-28 in RFC 9293 (3.8.4. TCP Keep-Alives) says we should
"default to no less than two hours". It looks a bit too long to be
useful, to me.
>
> #define LOW_RTT_TABLE_SIZE 8
> #define LOW_RTT_THRESHOLD 10 /* us */
> @@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> }
>
> conn->inactive = false;
> + conn->tap_inactive = false;
>
> if (th->ack && !(conn->events & ESTABLISHED))
> tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> @@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
> return 0;
> }
>
> +/**
> + * tcp_keepalive() - Send keepalives for connections which need it
> + * @: Execution context
* @c: Execution context
* @now: Current timestamp
> + */
> +static void tcp_keepalive(struct ctx *c, const struct timespec *now)
> +{
> + union flow *flow;
> +
> + if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
> + return;
> +
> + c->tcp.keepalive_run = now->tv_sec;
> +
> + flow_foreach(flow) {
> + struct tcp_tap_conn *conn = &flow->tcp;
> +
> + if (flow->f.type != FLOW_TCP)
> + continue;
> +
> + if (conn->tap_inactive) {
> + flow_dbg(conn, "No tap activity for least %us, send keepalive",
s/least// (or "for at least")
> + KEEPALIVE_INTERVAL);
> + tcp_send_flag(c, conn, KEEPALIVE);
Do we have to make sure we don't interpret replies to keepalives as
duplicate ACKs (by setting some internal flow flag)? I haven't really
checked that path.
> + }
> +
> + /* Ready to check fot next interval */
for
> + conn->tap_inactive = true;
> + }
> +}
> +
> /**
> * tcp_inactivity() - Scan for and close long-inactive connections
> * @: Execution context
> @@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
> if (c->mode == MODE_PASTA)
> tcp_splice_refill(c);
>
> + tcp_keepalive(c, now);
> tcp_inactivity(c, now);
> }
>
> diff --git a/tcp.h b/tcp.h
> index e104d453..2739f309 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
> * @rto_max: Maximum retry timeout (in s)
> * @syn_retries: SYN retries using exponential backoff timeout
> * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> + * @keepalive_run: Time we last issued tap-side keepalives
> * @inactivity_run: Time we last scanned for inactive connections
> */
> struct tcp_ctx {
> @@ -48,6 +49,7 @@ struct tcp_ctx {
> int rto_max;
> uint8_t syn_retries;
> uint8_t syn_linear_timeouts;
> + time_t keepalive_run;
> time_t inactivity_run;
> };
>
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 7197ff63..c82e1441 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -16,6 +16,7 @@
> * @ws_from_tap: Window scaling factor advertised from tap/guest
> * @ws_to_tap: Window scaling factor advertised to tap/guest
> * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> + * @tapinactive: No tao activity within the current KEEPALIVE_INTERVAL
tap_inactive, tap activity
> * @inactive: No activity within the current INACTIVITY_INTERVAL
> * @sock: Socket descriptor number
> * @events: Connection events, implying connection states
> @@ -58,6 +59,7 @@ struct tcp_tap_conn {
> (conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
> #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
>
> + bool tap_inactive :1;
> bool inactive :1;
>
> int sock :FD_REF_BITS;
The rest looks all good to me, but I didn't test it at all.
--
Stefano
next prev parent reply other threads:[~2026-02-05 0:17 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
2026-02-04 11:41 ` [PATCH 1/4] tcp: Remove non-working activity timeout mechanism David Gibson
2026-02-04 11:41 ` [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm David Gibson
2026-02-05 0:17 ` Stefano Brivio
2026-02-06 0:43 ` David Gibson
2026-02-04 11:41 ` [PATCH 3/4] tcp: Extend tcp_send_flag() to send TCP keepalive segments David Gibson
2026-02-04 11:41 ` [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity David Gibson
2026-02-05 0:17 ` Stefano Brivio [this message]
2026-02-06 1:21 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260205011744.4243b8ce@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).