public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
Date: Thu, 05 Feb 2026 01:17:45 +0100 (CET)	[thread overview]
Message-ID: <20260205011744.4243b8ce@elisabeth> (raw)
In-Reply-To: <20260204114137.2784090-5-david@gibson.dropbear.id.au>

On Wed,  4 Feb 2026 21:41:37 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> There are several circumstances in which a live, but idle TCP connection
> can be forgotten by a guest, with no "on the wire" indication that this has
> happened.  The most obvious is if the guest abruptly reboots.  A more
> subtle case can happen with a half-closed connection, specifically one
> in FIN_WAIT_2 state on the guest.  A connection can, legitimately, remain
> in this state indefinitely.  If however, a socket in this state is closed
> by userspace, Linux at least will remove the kernel socket after 60s
> (or as configured in the net.ipv4.tcp_fin_timeout sysctl).
> 
> Because there's no on the wire indication in these cases, passt will
> pointlessly retain the connection in its flow table, at least until it is
> removed by the inactivity timeout after several hours.
> 
> To avoid keeping connections around for so long in this state, add
> functionality to periodically send TCP keepalive segments to the guest if
> we've seen no activity on the tap interface.  If the guest is no longer
> aware of the connection, it should respond with an RST which will let
> passt remove the stale entry.
> 
> To do this we use a method similar to the inactivity timeout - a 1-bit
> page replacement / clock algorithm, but with a shorter interval, and only
> checking for tap side activity.  Currently we use a 300s interval, meaning
> we'll send a keepalive after 5-10 minutes of (tap side) inactivity.
> 
> Link: https://bugs.passt.top/show_bug.cgi?id=179
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  tcp.c      | 39 +++++++++++++++++++++++++++++++++++++++
>  tcp.h      |  2 ++
>  tcp_conn.h |  2 ++
>  3 files changed, 43 insertions(+)
> 
> diff --git a/tcp.c b/tcp.c
> index acdac7df..bf57be23 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -206,6 +206,12 @@
>   *   keepalives) will be removed between INACTIVITY_INTERVAL s and
>   *   2*INACTIVITY_INTERVAL s after the last activity.
>   *
> + * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
> + *   entire interval, send a tap-side keepalive.  If the endpoint is no longer
> + *   aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
> + *   state) that should trigger an RST, so we won't keep track of connections
> + *   that the guest endpoint no longer cares about.
> + *
>   * Summary of data flows (with ESTABLISHED event)
>   * ----------------------------------------------
>   *
> @@ -342,6 +348,7 @@ enum {
>  #define RTO_INIT_AFTER_SYN_RETRIES	3		/* s, RFC 6298 */
>  
>  #define INACTIVITY_INTERVAL		7200		/* s */
> +#define	KEEPALIVE_INTERVAL		30		/* s */

By the way, once we're done testing this, I'm not sure which value to
use. MUST-28 in RFC 9293 (3.8.4. TCP Keep-Alives) says we should
"default to no less than two hours". It looks a bit too long to be
useful, to me.

>  
>  #define LOW_RTT_TABLE_SIZE		8
>  #define LOW_RTT_THRESHOLD		10 /* us */
> @@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
>  	}
>  
>  	conn->inactive = false;
> +	conn->tap_inactive = false;
>  
>  	if (th->ack && !(conn->events & ESTABLISHED))
>  		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> @@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
>  	return 0;
>  }
>  
> +/**
> + * tcp_keepalive() - Send keepalives for connections which need it
> + * @:	Execution context

 * @c:		Execution context
 * @now:	Current timestamp

> + */
> +static void tcp_keepalive(struct ctx *c, const struct timespec *now)
> +{
> +	union flow *flow;
> +
> +	if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
> +		return;
> +
> +	c->tcp.keepalive_run = now->tv_sec;
> +
> +	flow_foreach(flow) {
> +		struct tcp_tap_conn *conn = &flow->tcp;
> +
> +		if (flow->f.type != FLOW_TCP)
> +			continue;
> +
> +		if (conn->tap_inactive) {
> +			flow_dbg(conn, "No tap activity for least %us, send keepalive",

s/least// (or "for at least")

> +				 KEEPALIVE_INTERVAL);
> +			tcp_send_flag(c, conn, KEEPALIVE);

Do we have to make sure we don't interpret replies to keepalives as
duplicate ACKs (by setting some internal flow flag)? I haven't really
checked that path.

> +		}
> +
> +		/* Ready to check fot next interval */

for

> +		conn->tap_inactive = true;
> +	}
> +}
> +
>  /**
>   * tcp_inactivity() - Scan for and close long-inactive connections
>   * @:	Execution context
> @@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
>  	if (c->mode == MODE_PASTA)
>  		tcp_splice_refill(c);
>  
> +	tcp_keepalive(c, now);
>  	tcp_inactivity(c, now);
>  }
>  
> diff --git a/tcp.h b/tcp.h
> index e104d453..2739f309 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
>   * @rto_max:		Maximum retry timeout (in s)
>   * @syn_retries:	SYN retries using exponential backoff timeout
>   * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> + * @keepalive_run:	Time we last issued tap-side keepalives
>   * @inactivity_run:	Time we last scanned for inactive connections
>   */
>  struct tcp_ctx {
> @@ -48,6 +49,7 @@ struct tcp_ctx {
>  	int rto_max;
>  	uint8_t syn_retries;
>  	uint8_t syn_linear_timeouts;
> +	time_t keepalive_run;
>  	time_t inactivity_run;
>  };
>  
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 7197ff63..c82e1441 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -16,6 +16,7 @@
>   * @ws_from_tap:	Window scaling factor advertised from tap/guest
>   * @ws_to_tap:		Window scaling factor advertised to tap/guest
>   * @tap_mss:		MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> + * @tapinactive:	No tao activity within the current KEEPALIVE_INTERVAL

tap_inactive, tap activity

>   * @inactive:		No activity within the current INACTIVITY_INTERVAL
>   * @sock:		Socket descriptor number
>   * @events:		Connection events, implying connection states
> @@ -58,6 +59,7 @@ struct tcp_tap_conn {
>  	(conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
>  #define RTT_GET(conn)			(RTT_STORE_MIN << conn->rtt_exp)
>  
> +	bool		tap_inactive	:1;
>  	bool		inactive	:1;
>  
>  	int		sock		:FD_REF_BITS;

The rest looks all good to me, but I didn't test it at all.

-- 
Stefano


  reply	other threads:[~2026-02-05  0:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
2026-02-04 11:41 ` [PATCH 1/4] tcp: Remove non-working activity timeout mechanism David Gibson
2026-02-04 11:41 ` [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm David Gibson
2026-02-05  0:17   ` Stefano Brivio
2026-02-06  0:43     ` David Gibson
2026-02-04 11:41 ` [PATCH 3/4] tcp: Extend tcp_send_flag() to send TCP keepalive segments David Gibson
2026-02-04 11:41 ` [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity David Gibson
2026-02-05  0:17   ` Stefano Brivio [this message]
2026-02-06  1:21     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260205011744.4243b8ce@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).