* [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers
@ 2026-02-04 11:41 David Gibson
2026-02-04 11:41 ` [PATCH 1/4] tcp: Remove non-working activity timeout mechanism David Gibson
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: David Gibson @ 2026-02-04 11:41 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
Here's a bunch of patches aimed at fixing bug 179, and reworking the
currently broken inactivity timer along the way.
I believe patches 1..2/4 are ready to go - I've tested them, and I'm
happy with how they're behaving. Patches 3..4/4 I think are correct,
but I've been getting bogged down in details trying to test them in
the specific FIN_WAIT_2 situation that occurs in bug 179.
I'm sending this out for comment, while I look at some other things to
clear my head.
Caveats:
* Currently the inactivity timer uses an interval of 2h to match the
intended behaviour of the existing non-working activity timeout.
Arguably it should be much longer (several days), like the kernel
NAT timeout for idle connection tracking.
* The keepalive interval is currently 30s. That's too short - I have
it there just for testing. I was intending to change it to 300s
(5m) for "real", but that's certainly open to discussion.
* This introduces two new fields in the connection structure, as the
clock values for the two new timers. These are new 1-bit bool
fields slotted into a 3-bit gap. Arguably these would be cleaner
as new bits in the 'flags' field. However, since we only have one
spare bit there at the moment, that would require some more complex
reorganization which I didn't want to tackle right now; a sketch of
the assumed bitfield layout follows below.
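Here's that sketch (the field widths are illustrative assumptions, not
the exact passt layout):

#include <stdbool.h>

/* Hypothetical excerpt: the word holding the FD_REF_BITS-wide 'sock'
 * bitfield is assumed to have a few spare bits; the two new 1-bit
 * clock flags fill part of that gap, so the structure doesn't grow. */
struct conn_excerpt {
	bool tap_inactive :1;	/* cleared on any tap-side activity */
	bool inactive :1;	/* cleared on any activity at all */
	int sock :24;		/* stand-in width for FD_REF_BITS */
};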
David Gibson (4):
tcp: Remove non-working activity timeout mechanism
tcp: Re-introduce inactivity timeouts based on a clock algorithm
tcp: Extend tcp_send_flag() to send TCP keepalive segments
tcp: Send TCP keepalive segments after a period of tap-side inactivity
tcp.c | 115 ++++++++++++++++++++++++++++++++++++++-----------
tcp.h | 6 ++-
tcp_buf.c | 4 ++
tcp_conn.h | 5 +++
tcp_internal.h | 2 +
tcp_vu.c | 3 ++
6 files changed, 109 insertions(+), 26 deletions(-)
--
2.52.0
* [PATCH 1/4] tcp: Remove non-working activity timeout mechanism
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
@ 2026-02-04 11:41 ` David Gibson
2026-02-04 11:41 ` [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm David Gibson
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2026-02-04 11:41 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
This mechanism was intended to remove connections which have had no
activity for two hours, even if they haven't closed or been reset
internally. It operated by setting the two-hour timeout when there were
no sooner TCP timeouts to schedule.
However, when the timer fires, the way we detect the case of the activity
timeout doesn't work: it resets the timer for another two hours, then
checks if the old timeout was two hours. But the old timeout returned
by timerfd_settime() is not the original value of the timer, but the
remaining time. Since the timer has just fired, it will essentially always
be 0.
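To make the failure concrete, here's a minimal, self-contained sketch
(not passt code) of the timerfd_settime() semantics involved: the
old_value argument reports the time remaining on the previous timer, so
immediately after expiry it is effectively zero:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/timerfd.h>

int main(void)
{
	struct itimerspec it = { { 0, 0 }, { 1, 0 } }; /* one-shot, 1 s */
	struct itimerspec old;
	uint64_t expirations;
	int fd = timerfd_create(CLOCK_MONOTONIC, 0);

	timerfd_settime(fd, 0, &it, NULL);		/* arm the timer */
	read(fd, &expirations, sizeof(expirations));	/* wait for expiry */

	/* Re-arm, as the old handler did: 'old' holds the time left on
	 * the just-expired timer (~0 s), not the 1 s it was armed with
	 */
	timerfd_settime(fd, 0, &it, &old);
	printf("remaining: %lld s\n", (long long)old.it_value.tv_sec);
	return 0;
}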
For now, just remove the mechanism, disarming the timer entirely if there
isn't another upcoming event. We'll re-introduce some sort of activity
timeout by a different means later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 24 +++---------------------
1 file changed, 3 insertions(+), 21 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0a64892a..f8663369 100644
--- a/tcp.c
+++ b/tcp.c
@@ -190,9 +190,6 @@
* - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and
* RTO is less than this, re-initialise RTO to this for data retransmissions
*
- * - ACT_TIMEOUT, in the presence of any event: if no activity is detected on
- * either side, the connection is reset
- *
* - RTT / 2 elapsed after data segment received from tap without having
* sent an ACK segment, or zero-sized window advertised to tap/guest (flag
* ACK_TO_TAP_DUE): forcibly check if an ACK segment can be sent.
@@ -589,7 +586,9 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
timeout <<= MAX(exp, 0);
it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max);
} else {
- it.it_value.tv_sec = ACT_TIMEOUT;
+ /* Disarm */
+ it.it_value.tv_sec = 0;
+ it.it_value.tv_nsec = 0;
}
if (conn->flags & ACK_TO_TAP_DUE) {
@@ -2598,23 +2597,6 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
tcp_data_from_sock(c, conn);
tcp_timer_ctl(c, conn);
}
- } else {
- struct itimerspec new = { { 0 }, { ACT_TIMEOUT, 0 } };
- struct itimerspec old = { { 0 }, { 0 } };
-
- /* Activity timeout: if it was already set, reset the
- * connection, otherwise, it was a left-over from ACK_TO_TAP_DUE
- * or ACK_FROM_TAP_DUE, so just set the long timeout in that
- * case. This avoids having to preemptively reset the timer on
- * ~ACK_TO_TAP_DUE or ~ACK_FROM_TAP_DUE.
- */
- if (timerfd_settime(conn->timer, 0, &new, &old))
- flow_perror(conn, "failed to set timer");
-
- if (old.it_value.tv_sec == ACT_TIMEOUT) {
- flow_dbg(conn, "activity timeout");
- tcp_rst(c, conn);
- }
}
}
--
2.52.0
* [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
2026-02-04 11:41 ` [PATCH 1/4] tcp: Remove non-working activity timeout mechanism David Gibson
@ 2026-02-04 11:41 ` David Gibson
2026-02-05 0:17 ` Stefano Brivio
2026-02-04 11:41 ` [PATCH 3/4] tcp: Extend tcp_send_flag() to send TCP keepalive segments David Gibson
2026-02-04 11:41 ` [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity David Gibson
3 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2026-02-04 11:41 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
We previously had a mechanism to remove TCP connections which were
inactive for 2 hours. That was broken for a long time, due to poor
interactions with the timerfd handling, so we removed it.
Adding this long-timescale timer onto the timerfd handling, which mostly
handles much shorter timeouts, is tricky to reason about. However, for the
inactivity timeouts, we don't require precision. Instead, we can use
a 1-bit page replacement / "clock" algorithm. Every INACTIVITY_INTERVAL
(2 hours), a global timer marks every TCP connection as tentatively
inactive. That flag is cleared if we get any events, either tap side or
socket side.
If the inactive flag is still set when the next INACTIVITY_INTERVAL expires
then the connection has been inactive for an extended period and we reset
and close it. In practice this means that connections will be removed
after 2-4 hours of inactivity.
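For reference, the generic 1-bit clock sweep looks something like the
following self-contained sketch (simplified: the patch itself walks the
flow table rather than an array):

#include <stdbool.h>
#include <stddef.h>

struct conn {
	bool open;
	bool inactive;	/* cleared by event handlers on any activity */
};

/* One sweep per INACTIVITY_INTERVAL: anything still marked has seen no
 * events for a whole interval and is closed; everything else is
 * re-marked for the next interval. */
static void inactivity_sweep(struct conn *conns, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!conns[i].open)
			continue;
		if (conns[i].inactive)
			conns[i].open = false;	  /* idle 1..2 intervals */
		else
			conns[i].inactive = true; /* tentatively inactive */
	}
}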
This is not a true fix for bug 179, but it does mitigate the damage, by
limiting the time that inactive connections will remain around.
Link: https://bugs.passt.top/show_bug.cgi?id=179
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
tcp.h | 4 +++-
tcp_conn.h | 3 +++
3 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/tcp.c b/tcp.c
index f8663369..acdac7df 100644
--- a/tcp.c
+++ b/tcp.c
@@ -198,6 +198,13 @@
* TCP_INFO, with a representable range from RTT_STORE_MIN (100 us) to
* RTT_STORE_MAX (3276.8 ms). The timeout value is clamped accordingly.
*
+ * We also use a global interval timer for an activity timeout which doesn't
+ * require precision:
+ *
+ * - INACTIVITY_INTERVAL: if a connection has had no activity for an entire
+ * interval, close and reset it. This means that idle connections (without
+ * keepalives) will be removed between INACTIVITY_INTERVAL s and
+ * 2*INACTIVITY_INTERVAL s after the last activity.
*
* Summary of data flows (with ESTABLISHED event)
* ----------------------------------------------
@@ -333,7 +340,8 @@ enum {
#define RTO_INIT 1 /* s, RFC 6298 */
#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
-#define ACT_TIMEOUT 7200
+
+#define INACTIVITY_INTERVAL 7200 /* s */
#define LOW_RTT_TABLE_SIZE 8
#define LOW_RTT_THRESHOLD 10 /* us */
@@ -2254,6 +2262,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
return 1;
}
+ conn->inactive = false;
+
if (th->ack && !(conn->events & ESTABLISHED))
tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
@@ -2622,6 +2632,8 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
return;
}
+ conn->inactive = false;
+
if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) {
conn_event(c, conn, CLOSED);
return;
@@ -2872,18 +2884,50 @@ int tcp_init(struct ctx *c)
return 0;
}
+/**
+ * tcp_inactivity() - Scan for and close long-inactive connections
+ * @: Execution context
+ */
+static void tcp_inactivity(struct ctx *c, const struct timespec *now)
+{
+ union flow *flow;
+
+ if (now->tv_sec - c->tcp.inactivity_run < INACTIVITY_INTERVAL)
+ return;
+
+ debug("TCP inactivity scan");
+ c->tcp.inactivity_run = now->tv_sec;
+
+ flow_foreach(flow) {
+ struct tcp_tap_conn *conn = &flow->tcp;
+
+ if (flow->f.type != FLOW_TCP)
+ continue;
+
+ if (conn->inactive) {
+ /* No activity in this interval, reset */
+ flow_dbg(conn, "Inactive for at least %us, resetting",
+ INACTIVITY_INTERVAL);
+ tcp_rst(c, conn);
+ }
+
+ /* Ready to check fot next interval */
+ conn->inactive = true;
+ }
+}
+
/**
* tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
* @c: Execution context
* @now: Current timestamp
*/
-void tcp_timer(const struct ctx *c, const struct timespec *now)
+void tcp_timer(struct ctx *c, const struct timespec *now)
{
- (void)now;
-
tcp_sock_refill_init(c);
if (c->mode == MODE_PASTA)
tcp_splice_refill(c);
+
+ tcp_inactivity(c, now);
}
/**
diff --git a/tcp.h b/tcp.h
index 24b90870..e104d453 100644
--- a/tcp.h
+++ b/tcp.h
@@ -21,7 +21,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
const union inany_addr *addr, const char *ifname, in_port_t port);
int tcp_init(struct ctx *c);
-void tcp_timer(const struct ctx *c, const struct timespec *now);
+void tcp_timer(struct ctx *c, const struct timespec *now);
void tcp_defer_handler(struct ctx *c);
void tcp_update_l2_buf(const unsigned char *eth_d);
@@ -38,6 +38,7 @@ extern bool peek_offset_cap;
* @rto_max: Maximum retry timeout (in s)
* @syn_retries: SYN retries using exponential backoff timeout
* @syn_linear_timeouts: SYN retries before using exponential backoff timeout
+ * @inactivity_run: Time we last scanned for inactive connections
*/
struct tcp_ctx {
struct fwd_ports fwd_in;
@@ -47,6 +48,7 @@ struct tcp_ctx {
int rto_max;
uint8_t syn_retries;
uint8_t syn_linear_timeouts;
+ time_t inactivity_run;
};
#endif /* TCP_H */
diff --git a/tcp_conn.h b/tcp_conn.h
index 21cea109..7197ff63 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -16,6 +16,7 @@
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
+ * @inactive: No activity within the current INACTIVITY_INTERVAL
* @sock: Socket descriptor number
* @events: Connection events, implying connection states
* @timer: timerfd descriptor for timeout events
@@ -57,6 +58,8 @@ struct tcp_tap_conn {
(conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
#define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
+ bool inactive :1;
+
int sock :FD_REF_BITS;
uint8_t events;
--
2.52.0
* [PATCH 3/4] tcp: Extend tcp_send_flag() to send TCP keepalive segments
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
2026-02-04 11:41 ` [PATCH 1/4] tcp: Remove non-working activity timeout mechanism David Gibson
2026-02-04 11:41 ` [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm David Gibson
@ 2026-02-04 11:41 ` David Gibson
2026-02-04 11:41 ` [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity David Gibson
3 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2026-02-04 11:41 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
TCP keepalives aren't technically a flag, but they are a zero-data segment,
so they can be generated with only a small modification to
tcp_{buf,vu}_send_flag(). Implement this, using a new "pseudo-flag"
value (similar to DUP_ACK), KEEPALIVE.
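For reference, a keepalive probe is an ordinary ACK segment carrying no
data, with a sequence number one below SND.NXT (covering a byte the peer
has already acknowledged), so the peer must reply with either an ACK
(connection alive) or an RST (connection forgotten). A sketch with a
hypothetical helper, not the code in this patch:

#include <stdint.h>
#include <string.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from the patch): fill a bare keepalive
 * header. seq = snd_nxt - 1 points at an already-ACKed byte, and the
 * segment carries no data, so it forces a reply without transferring
 * anything. */
static void tcp_fill_keepalive(struct tcphdr *th,
			       uint32_t snd_nxt, uint32_t rcv_nxt)
{
	memset(th, 0, sizeof(*th));
	th->seq = htonl(snd_nxt - 1);	/* one below next send sequence */
	th->ack_seq = htonl(rcv_nxt);
	th->ack = 1;			/* plain ACK, no payload */
	th->doff = sizeof(*th) / 4;	/* header only, 5 words */
}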
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp_buf.c | 4 ++++
tcp_internal.h | 2 ++
tcp_vu.c | 3 +++
3 files changed, 9 insertions(+)
diff --git a/tcp_buf.c b/tcp_buf.c
index d2925410..18bdce81 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -227,6 +227,10 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
tcp_frame_conns[tcp_payload_used++] = conn;
l4len = optlen + sizeof(struct tcphdr);
iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+
+ if (flags & KEEPALIVE)
+ seq--;
+
tcp_l2_buf_fill_headers(c, conn, iov, NULL, seq, false);
tcp_l2_buf_pad(iov);
diff --git a/tcp_internal.h b/tcp_internal.h
index 5f8fb358..36f443b4 100644
--- a/tcp_internal.h
+++ b/tcp_internal.h
@@ -38,6 +38,8 @@
/* Flags for internal usage */
#define DUP_ACK (1 << 5)
+#define KEEPALIVE (1 << 6)
+
#define OPT_EOL 0
#define OPT_NOP 1
#define OPT_MSS 2
diff --git a/tcp_vu.c b/tcp_vu.c
index b9e9b55e..f2d38aa5 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -135,6 +135,9 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
flags_elem[0].in_sg[0].iov_len = hdrlen + optlen;
payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen);
+ if (flags & KEEPALIVE)
+ seq--;
+
tcp_fill_headers(c, conn, NULL, eh, ip4h, ip6h, th, &payload,
NULL, seq, !*c->pcap);
--
2.52.0
* [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
2026-02-04 11:41 [PATCH 0/4] RFC: Reworks and improvements to TCP activity timers David Gibson
` (2 preceding siblings ...)
2026-02-04 11:41 ` [PATCH 3/4] tcp: Extend tcp_send_flag() to send TCP keepalive segments David Gibson
@ 2026-02-04 11:41 ` David Gibson
2026-02-05 0:17 ` Stefano Brivio
3 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2026-02-04 11:41 UTC (permalink / raw)
To: Stefano Brivio, passt-dev; +Cc: David Gibson
There are several circumstances in which a live, but idle TCP connection
can be forgotten by a guest, with no "on the wire" indication that this has
happened. The most obvious is if the guest abruptly reboots. A more
subtle case can happen with a half-closed connection, specifically one
in FIN_WAIT_2 state on the guest. A connection can, legitimately, remain
in this state indefinitely. If, however, a socket in this state is closed
by userspace, Linux at least will remove the kernel socket after 60s
(or as configured in the net.ipv4.tcp_fin_timeout sysctl).
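As an illustration of that half-closed case, a minimal sketch (the
address is a placeholder from the documentation range, and error
handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in sa = { .sin_family = AF_INET,
				  .sin_port = htons(80) };
	int s = socket(AF_INET, SOCK_STREAM, 0);

	inet_pton(AF_INET, "192.0.2.1", &sa.sin_addr);	/* placeholder */
	connect(s, (struct sockaddr *)&sa, sizeof(sa));

	shutdown(s, SHUT_WR);	/* send FIN: FIN_WAIT_1, then FIN_WAIT_2 */
	close(s);		/* orphan the socket: the kernel drops it
				 * after net.ipv4.tcp_fin_timeout (60 s by
				 * default), with nothing on the wire */
	return 0;
}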
Because there's no on the wire indication in these cases, passt will
pointlessly retain the connection in its flow table, at least until it is
removed by the inactivity timeout after several hours.
To avoid keeping connections around for so long in this state, add
functionality to periodically send TCP keepalive segments to the guest if
we've seen no activity on the tap interface. If the guest is no longer
aware of the connection, it should respond with an RST which will let
passt remove the stale entry.
To do this we use a method similar to the inactivity timeout - a 1-bit
page replacement / clock algorithm, but with a shorter interval, and only
checking for tap side activity. Currently we use a 300s interval, meaning
we'll send a keepalive after 5-10 minutes of (tap side) inactivity.
Link: https://bugs.passt.top/show_bug.cgi?id=179
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 39 +++++++++++++++++++++++++++++++++++++++
tcp.h | 2 ++
tcp_conn.h | 2 ++
3 files changed, 43 insertions(+)
diff --git a/tcp.c b/tcp.c
index acdac7df..bf57be23 100644
--- a/tcp.c
+++ b/tcp.c
@@ -206,6 +206,12 @@
* keepalives) will be removed between INACTIVITY_INTERVAL s and
* 2*INACTIVITY_INTERVAL s after the last activity.
*
+ * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
+ * entire interval, send a tap-side keepalive. If the endpoint is no longer
+ * aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
+ * state) that should trigger an RST, so we won't keep track of connections
+ * that the guest endpoint no longer cares about.
+ *
* Summary of data flows (with ESTABLISHED event)
* ----------------------------------------------
*
@@ -342,6 +348,7 @@ enum {
#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
#define INACTIVITY_INTERVAL 7200 /* s */
+#define KEEPALIVE_INTERVAL 30 /* s */
#define LOW_RTT_TABLE_SIZE 8
#define LOW_RTT_THRESHOLD 10 /* us */
@@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
}
conn->inactive = false;
+ conn->tap_inactive = false;
if (th->ack && !(conn->events & ESTABLISHED))
tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
@@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
return 0;
}
+/**
+ * tcp_keepalive() - Send keepalives for connections which need it
+ * @: Execution context
+ */
+static void tcp_keepalive(struct ctx *c, const struct timespec *now)
+{
+ union flow *flow;
+
+ if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
+ return;
+
+ c->tcp.keepalive_run = now->tv_sec;
+
+ flow_foreach(flow) {
+ struct tcp_tap_conn *conn = &flow->tcp;
+
+ if (flow->f.type != FLOW_TCP)
+ continue;
+
+ if (conn->tap_inactive) {
+ flow_dbg(conn, "No tap activity for least %us, send keepalive",
+ KEEPALIVE_INTERVAL);
+ tcp_send_flag(c, conn, KEEPALIVE);
+ }
+
+ /* Ready to check fot next interval */
+ conn->tap_inactive = true;
+ }
+}
+
/**
* tcp_inactivity() - Scan for and close long-inactive connections
* @: Execution context
@@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
if (c->mode == MODE_PASTA)
tcp_splice_refill(c);
+ tcp_keepalive(c, now);
tcp_inactivity(c, now);
}
diff --git a/tcp.h b/tcp.h
index e104d453..2739f309 100644
--- a/tcp.h
+++ b/tcp.h
@@ -38,6 +38,7 @@ extern bool peek_offset_cap;
* @rto_max: Maximum retry timeout (in s)
* @syn_retries: SYN retries using exponential backoff timeout
* @syn_linear_timeouts: SYN retries before using exponential backoff timeout
+ * @keepalive_run: Time we last issued tap-side keepalives
* @inactivity_run: Time we last scanned for inactive connections
*/
struct tcp_ctx {
@@ -48,6 +49,7 @@ struct tcp_ctx {
int rto_max;
uint8_t syn_retries;
uint8_t syn_linear_timeouts;
+ time_t keepalive_run;
time_t inactivity_run;
};
diff --git a/tcp_conn.h b/tcp_conn.h
index 7197ff63..c82e1441 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -16,6 +16,7 @@
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
+ * @tapinactive: No tao activity within the current KEEPALIVE_INTERVAL
* @inactive: No activity within the current INACTIVITY_INTERVAL
* @sock: Socket descriptor number
* @events: Connection events, implying connection states
@@ -58,6 +59,7 @@ struct tcp_tap_conn {
(conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
#define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
+ bool tap_inactive :1;
bool inactive :1;
int sock :FD_REF_BITS;
--
2.52.0
* Re: [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm
2026-02-04 11:41 ` [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm David Gibson
@ 2026-02-05 0:17 ` Stefano Brivio
2026-02-06 0:43 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2026-02-05 0:17 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
Nits only:
On Wed, 4 Feb 2026 21:41:35 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> We previously had a mechanism to remove TCP connections which were
> inactive for 2 hours. That was broken for a long time, due to poor
> interactions with the timerfd handling, so we removed it.
>
> Adding this long-timescale timer onto the timerfd handling, which mostly
> handles much shorter timeouts, is tricky to reason about. However, for the
> inactivity timeouts, we don't require precision. Instead, we can use
> a 1-bit page replacement / "clock" algorithm. Every INACTIVITY_INTERVAL
> (2 hours), a global timer marks every TCP connection as tentatively
> inactive. That flag is cleared if we get any events, either tap side or
> socket side.
>
> If the inactive flag is still set when the next INACTIVITY_INTERVAL expires
> then the connection has been inactive for an extended period and we reset
> and close it. In practice this means that connections will be removed
> after 2-4 hours of inactivity.
>
> This is not a true fix for bug 179, but it does mitigate the damage, by
> limiting the time that inactive connections will remain around.
>
> Link: https://bugs.passt.top/show_bug.cgi?id=179
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
> tcp.h | 4 +++-
> tcp_conn.h | 3 +++
> 3 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index f8663369..acdac7df 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -198,6 +198,13 @@
> * TCP_INFO, with a representable range from RTT_STORE_MIN (100 us) to
> * RTT_STORE_MAX (3276.8 ms). The timeout value is clamped accordingly.
> *
> + * We also use a global interval timer for an activity timeout which doesn't
> + * require precision:
> + *
> + * - INACTIVITY_INTERVAL: if a connection has had no activity for an entire
> + * interval, close and reset it. This means that idle connections (without
> + * keepalives) will be removed between INACTIVITY_INTERVAL s and
Probably easier to read: ... seconds
> + * 2*INACTIVITY_INTERVAL s after the last activity.
same here.
> *
> * Summary of data flows (with ESTABLISHED event)
> * ----------------------------------------------
> @@ -333,7 +340,8 @@ enum {
>
> #define RTO_INIT 1 /* s, RFC 6298 */
> #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
> -#define ACT_TIMEOUT 7200
> +
> +#define INACTIVITY_INTERVAL 7200 /* s */
>
> #define LOW_RTT_TABLE_SIZE 8
> #define LOW_RTT_THRESHOLD 10 /* us */
> @@ -2254,6 +2262,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> return 1;
> }
>
> + conn->inactive = false;
> +
> if (th->ack && !(conn->events & ESTABLISHED))
> tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
>
> @@ -2622,6 +2632,8 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
> return;
> }
>
> + conn->inactive = false;
> +
> if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) {
> conn_event(c, conn, CLOSED);
> return;
> @@ -2872,18 +2884,50 @@ int tcp_init(struct ctx *c)
> return 0;
> }
>
> +/**
> + * tcp_inactivity() - Scan for and close long-inactive connections
> + * @: Execution context
* @c: Execution context
* @now: Current timestamp
> + */
> +static void tcp_inactivity(struct ctx *c, const struct timespec *now)
> +{
> + union flow *flow;
> +
> + if (now->tv_sec - c->tcp.inactivity_run < INACTIVITY_INTERVAL)
> + return;
> +
> + debug("TCP inactivity scan");
> + c->tcp.inactivity_run = now->tv_sec;
> +
> + flow_foreach(flow) {
> + struct tcp_tap_conn *conn = &flow->tcp;
> +
> + if (flow->f.type != FLOW_TCP)
> + continue;
> +
> + if (conn->inactive) {
> + /* No activity in this interval, reset */
> + flow_dbg(conn, "Inactive for at least %us, resetting",
> + INACTIVITY_INTERVAL);
> + tcp_rst(c, conn);
> + }
> +
> + /* Ready to check fot next interval */
for
> + conn->inactive = true;
> + }
> +}
> +
> /**
> * tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
> * @c: Execution context
> * @now: Current timestamp
> */
> -void tcp_timer(const struct ctx *c, const struct timespec *now)
> +void tcp_timer(struct ctx *c, const struct timespec *now)
> {
> - (void)now;
> -
> tcp_sock_refill_init(c);
> if (c->mode == MODE_PASTA)
> tcp_splice_refill(c);
> +
> + tcp_inactivity(c, now);
> }
>
> /**
> diff --git a/tcp.h b/tcp.h
> index 24b90870..e104d453 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -21,7 +21,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
> const union inany_addr *addr, const char *ifname, in_port_t port);
> int tcp_init(struct ctx *c);
> -void tcp_timer(const struct ctx *c, const struct timespec *now);
> +void tcp_timer(struct ctx *c, const struct timespec *now);
> void tcp_defer_handler(struct ctx *c);
>
> void tcp_update_l2_buf(const unsigned char *eth_d);
> @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
> * @rto_max: Maximum retry timeout (in s)
> * @syn_retries: SYN retries using exponential backoff timeout
> * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> + * @inactivity_run: Time we last scanned for inactive connections
> */
> struct tcp_ctx {
> struct fwd_ports fwd_in;
> @@ -47,6 +48,7 @@ struct tcp_ctx {
> int rto_max;
> uint8_t syn_retries;
> uint8_t syn_linear_timeouts;
> + time_t inactivity_run;
> };
>
> #endif /* TCP_H */
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 21cea109..7197ff63 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -16,6 +16,7 @@
> * @ws_from_tap: Window scaling factor advertised from tap/guest
> * @ws_to_tap: Window scaling factor advertised to tap/guest
> * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> + * @inactive: No activity within the current INACTIVITY_INTERVAL
> * @sock: Socket descriptor number
> * @events: Connection events, implying connection states
> * @timer: timerfd descriptor for timeout events
> @@ -57,6 +58,8 @@ struct tcp_tap_conn {
> (conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
> #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
>
> + bool inactive :1;
> +
> int sock :FD_REF_BITS;
>
> uint8_t events;
--
Stefano
* Re: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
2026-02-04 11:41 ` [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity David Gibson
@ 2026-02-05 0:17 ` Stefano Brivio
2026-02-06 1:21 ` David Gibson
0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2026-02-05 0:17 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev
On Wed, 4 Feb 2026 21:41:37 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
> There are several circumstances in which a live, but idle TCP connection
> can be forgotten by a guest, with no "on the wire" indication that this has
> happened. The most obvious is if the guest abruptly reboots. A more
> subtle case can happen with a half-closed connection, specifically one
> in FIN_WAIT_2 state on the guest. A connection can, legitimately, remain
> in this state indefinitely. If, however, a socket in this state is closed
> by userspace, Linux at least will remove the kernel socket after 60s
> (or as configured in the net.ipv4.tcp_fin_timeout sysctl).
>
> Because there's no on the wire indication in these cases, passt will
> pointlessly retain the connection in its flow table, at least until it is
> removed by the inactivity timeout after several hours.
>
> To avoid keeping connections around for so long in this state, add
> functionality to periodically send TCP keepalive segments to the guest if
> we've seen no activity on the tap interface. If the guest is no longer
> aware of the connection, it should respond with an RST which will let
> passt remove the stale entry.
>
> To do this we use a method similar to the inactivity timeout - a 1-bit
> page replacement / clock algorithm, but with a shorter interval, and only
> checking for tap side activity. Currently we use a 300s interval, meaning
> we'll send a keepalive after 5-10 minutes of (tap side) inactivity.
>
> Link: https://bugs.passt.top/show_bug.cgi?id=179
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 39 +++++++++++++++++++++++++++++++++++++++
> tcp.h | 2 ++
> tcp_conn.h | 2 ++
> 3 files changed, 43 insertions(+)
>
> diff --git a/tcp.c b/tcp.c
> index acdac7df..bf57be23 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -206,6 +206,12 @@
> * keepalives) will be removed between INACTIVITY_INTERVAL s and
> * 2*INACTIVITY_INTERVAL s after the last activity.
> *
> + * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
> + * entire interval, send a tap-side keepalive. If the endpoint is no longer
> + * aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
> + * state) that should trigger an RST, so we won't keep track of connections
> + * that the guest endpoint no longer cares about.
> + *
> * Summary of data flows (with ESTABLISHED event)
> * ----------------------------------------------
> *
> @@ -342,6 +348,7 @@ enum {
> #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
>
> #define INACTIVITY_INTERVAL 7200 /* s */
> +#define KEEPALIVE_INTERVAL 30 /* s */
By the way, once we're done testing this, I'm not sure which value to
use. MUST-28 in RFC 9293 (3.8.4. TCP Keep-Alives) says we should
"default to no less than two hours". It looks a bit too long to be
useful, to me.
>
> #define LOW_RTT_TABLE_SIZE 8
> #define LOW_RTT_THRESHOLD 10 /* us */
> @@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> }
>
> conn->inactive = false;
> + conn->tap_inactive = false;
>
> if (th->ack && !(conn->events & ESTABLISHED))
> tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> @@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
> return 0;
> }
>
> +/**
> + * tcp_keepalive() - Send keepalives for connections which need it
> + * @: Execution context
* @c: Execution context
* @now: Current timestamp
> + */
> +static void tcp_keepalive(struct ctx *c, const struct timespec *now)
> +{
> + union flow *flow;
> +
> + if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
> + return;
> +
> + c->tcp.keepalive_run = now->tv_sec;
> +
> + flow_foreach(flow) {
> + struct tcp_tap_conn *conn = &flow->tcp;
> +
> + if (flow->f.type != FLOW_TCP)
> + continue;
> +
> + if (conn->tap_inactive) {
> + flow_dbg(conn, "No tap activity for least %us, send keepalive",
s/least// (or "for at least")
> + KEEPALIVE_INTERVAL);
> + tcp_send_flag(c, conn, KEEPALIVE);
Do we have to make sure we don't interpret replies to keepalives as
duplicate ACKs (by setting some internal flow flag)? I haven't really
checked that path.
> + }
> +
> + /* Ready to check fot next interval */
for
> + conn->tap_inactive = true;
> + }
> +}
> +
> /**
> * tcp_inactivity() - Scan for and close long-inactive connections
> * @: Execution context
> @@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
> if (c->mode == MODE_PASTA)
> tcp_splice_refill(c);
>
> + tcp_keepalive(c, now);
> tcp_inactivity(c, now);
> }
>
> diff --git a/tcp.h b/tcp.h
> index e104d453..2739f309 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
> * @rto_max: Maximum retry timeout (in s)
> * @syn_retries: SYN retries using exponential backoff timeout
> * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> + * @keepalive_run: Time we last issued tap-side keepalives
> * @inactivity_run: Time we last scanned for inactive connections
> */
> struct tcp_ctx {
> @@ -48,6 +49,7 @@ struct tcp_ctx {
> int rto_max;
> uint8_t syn_retries;
> uint8_t syn_linear_timeouts;
> + time_t keepalive_run;
> time_t inactivity_run;
> };
>
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 7197ff63..c82e1441 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -16,6 +16,7 @@
> * @ws_from_tap: Window scaling factor advertised from tap/guest
> * @ws_to_tap: Window scaling factor advertised to tap/guest
> * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> + * @tapinactive: No tao activity within the current KEEPALIVE_INTERVAL
tap_inactive, tap activity
> * @inactive: No activity within the current INACTIVITY_INTERVAL
> * @sock: Socket descriptor number
> * @events: Connection events, implying connection states
> @@ -58,6 +59,7 @@ struct tcp_tap_conn {
> (conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
> #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
>
> + bool tap_inactive :1;
> bool inactive :1;
>
> int sock :FD_REF_BITS;
The rest looks all good to me, but I didn't test it at all.
--
Stefano
* Re: [PATCH 2/4] tcp: Re-introduce inactivity timeouts based on a clock algorithm
2026-02-05 0:17 ` Stefano Brivio
@ 2026-02-06 0:43 ` David Gibson
0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2026-02-06 0:43 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Thu, Feb 05, 2026 at 01:17:28AM +0100, Stefano Brivio wrote:
> Nits only:
>
> On Wed, 4 Feb 2026 21:41:35 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > We previously had a mechanism to remove TCP connections which were
> > inactive for 2 hours. That was broken for a long time, due to poor
> > interactions with the timerfd handling, so we removed it.
> >
> > Adding this long-timescale timer onto the timerfd handling, which mostly
> > handles much shorter timeouts, is tricky to reason about. However, for the
> > inactivity timeouts, we don't require precision. Instead, we can use
> > a 1-bit page replacement / "clock" algorithm. Every INACTIVITY_INTERVAL
> > (2 hours), a global timer marks every TCP connection as tentatively
> > inactive. That flag is cleared if we get any events, either tap side or
> > socket side.
> >
> > If the inactive flag is still set when the next INACTIVITY_INTERVAL expires
> > then the connection has been inactive for an extended period and we reset
> > and close it. In practice this means that connections will be removed
> > after 2-4 hours of inactivity.
> >
> > This is not a true fix for bug 179, but it does mitigate the damage, by
> > limiting the time that inactive connections will remain around.
> >
> > Link: https://bugs.passt.top/show_bug.cgi?id=179
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
> > tcp.h | 4 +++-
> > tcp_conn.h | 3 +++
> > 3 files changed, 54 insertions(+), 5 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index f8663369..acdac7df 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -198,6 +198,13 @@
> > * TCP_INFO, with a representable range from RTT_STORE_MIN (100 us) to
> > * RTT_STORE_MAX (3276.8 ms). The timeout value is clamped accordingly.
> > *
> > + * We also use a global interval timer for an activity timeout which doesn't
> > + * require precision:
> > + *
> > + * - INACTIVITY_INTERVAL: if a connection has had no activity for an entire
> > + * interval, close and reset it. This means that idle connections (without
> > + * keepalives) will be removed between INACTIVITY_INTERVAL s and
>
> Probably easier to read: ... seconds
>
> > + * 2*INACTIVITY_INTERVAL s after the last activity.
>
> same here.
Good idea, done.
>
> > *
> > * Summary of data flows (with ESTABLISHED event)
> > * ----------------------------------------------
> > @@ -333,7 +340,8 @@ enum {
> >
> > #define RTO_INIT 1 /* s, RFC 6298 */
> > #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
> > -#define ACT_TIMEOUT 7200
> > +
> > +#define INACTIVITY_INTERVAL 7200 /* s */
> >
> > #define LOW_RTT_TABLE_SIZE 8
> > #define LOW_RTT_THRESHOLD 10 /* us */
> > @@ -2254,6 +2262,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> > return 1;
> > }
> >
> > + conn->inactive = false;
> > +
> > if (th->ack && !(conn->events & ESTABLISHED))
> > tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> >
> > @@ -2622,6 +2632,8 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
> > return;
> > }
> >
> > + conn->inactive = false;
> > +
> > if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) {
> > conn_event(c, conn, CLOSED);
> > return;
> > @@ -2872,18 +2884,50 @@ int tcp_init(struct ctx *c)
> > return 0;
> > }
> >
> > +/**
> > + * tcp_inactivity() - Scan for and close long-inactive connections
> > + * @: Execution context
>
> * @c: Execution context
> * @now: Current timestamp
Oops, fixed.
> > + */
> > +static void tcp_inactivity(struct ctx *c, const struct timespec *now)
> > +{
> > + union flow *flow;
> > +
> > + if (now->tv_sec - c->tcp.inactivity_run < INACTIVITY_INTERVAL)
> > + return;
> > +
> > + debug("TCP inactivity scan");
> > + c->tcp.inactivity_run = now->tv_sec;
> > +
> > + flow_foreach(flow) {
> > + struct tcp_tap_conn *conn = &flow->tcp;
> > +
> > + if (flow->f.type != FLOW_TCP)
> > + continue;
> > +
> > + if (conn->inactive) {
> > + /* No activity in this interval, reset */
> > + flow_dbg(conn, "Inactive for at least %us, resetting",
> > + INACTIVITY_INTERVAL);
> > + tcp_rst(c, conn);
> > + }
> > +
> > + /* Ready to check fot next interval */
>
> for
Wow, I really wasn't typing good that day. Fixed.
>
> > + conn->inactive = true;
> > + }
> > +}
> > +
> > /**
> > * tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
> > * @c: Execution context
> > * @now: Current timestamp
> > */
> > -void tcp_timer(const struct ctx *c, const struct timespec *now)
> > +void tcp_timer(struct ctx *c, const struct timespec *now)
> > {
> > - (void)now;
> > -
> > tcp_sock_refill_init(c);
> > if (c->mode == MODE_PASTA)
> > tcp_splice_refill(c);
> > +
> > + tcp_inactivity(c, now);
> > }
> >
> > /**
> > diff --git a/tcp.h b/tcp.h
> > index 24b90870..e104d453 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -21,7 +21,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> > int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
> > const union inany_addr *addr, const char *ifname, in_port_t port);
> > int tcp_init(struct ctx *c);
> > -void tcp_timer(const struct ctx *c, const struct timespec *now);
> > +void tcp_timer(struct ctx *c, const struct timespec *now);
> > void tcp_defer_handler(struct ctx *c);
> >
> > void tcp_update_l2_buf(const unsigned char *eth_d);
> > @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
> > * @rto_max: Maximum retry timeout (in s)
> > * @syn_retries: SYN retries using exponential backoff timeout
> > * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> > + * @inactivity_run: Time we last scanned for inactive connections
> > */
> > struct tcp_ctx {
> > struct fwd_ports fwd_in;
> > @@ -47,6 +48,7 @@ struct tcp_ctx {
> > int rto_max;
> > uint8_t syn_retries;
> > uint8_t syn_linear_timeouts;
> > + time_t inactivity_run;
> > };
> >
> > #endif /* TCP_H */
> > diff --git a/tcp_conn.h b/tcp_conn.h
> > index 21cea109..7197ff63 100644
> > --- a/tcp_conn.h
> > +++ b/tcp_conn.h
> > @@ -16,6 +16,7 @@
> > * @ws_from_tap: Window scaling factor advertised from tap/guest
> > * @ws_to_tap: Window scaling factor advertised to tap/guest
> > * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> > + * @inactive: No activity within the current INACTIVITY_INTERVAL
> > * @sock: Socket descriptor number
> > * @events: Connection events, implying connection states
> > * @timer: timerfd descriptor for timeout events
> > @@ -57,6 +58,8 @@ struct tcp_tap_conn {
> > (conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
> > #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
> >
> > + bool inactive :1;
> > +
> > int sock :FD_REF_BITS;
> >
> > uint8_t events;
>
> --
> Stefano
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
* Re: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity
2026-02-05 0:17 ` Stefano Brivio
@ 2026-02-06 1:21 ` David Gibson
0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2026-02-06 1:21 UTC (permalink / raw)
To: Stefano Brivio; +Cc: passt-dev
On Thu, Feb 05, 2026 at 01:17:45AM +0100, Stefano Brivio wrote:
> On Wed, 4 Feb 2026 21:41:37 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > There are several circumstances in which a live, but idle TCP connection
> > can be forgotten by a guest, with no "on the wire" indication that this has
> > happened. The most obvious is if the guest abruptly reboots. A more
> > subtle case can happen with a half-closed connection, specifically one
> > in FIN_WAIT_2 state on the guest. A connection can, legitimately, remain
> > in this state indefinitely. If, however, a socket in this state is closed
> > by userspace, Linux at least will remove the kernel socket after 60s
> > (or as configured in the net.ipv4.tcp_fin_timeout sysctl).
> >
> > Because there's no on the wire indication in these cases, passt will
> > pointlessly retain the connection in its flow table, at least until it is
> > removed by the inactivity timeout after several hours.
> >
> > To avoid keeping connections around for so long in this state, add
> > functionality to periodically send TCP keepalive segments to the guest if
> > we've seen no activity on the tap interface. If the guest is no longer
> > aware of the connection, it should respond with an RST which will let
> > passt remove the stale entry.
> >
> > To do this we use a method similar to the inactivity timeout - a 1-bit
> > page replacement / clock algorithm, but with a shorter interval, and only
> > checking for tap side activity. Currently we use a 300s interval, meaning
> > we'll send a keepalive after 5-10 minutes of (tap side) inactivity.
> >
> > Link: https://bugs.passt.top/show_bug.cgi?id=179
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > tcp.c | 39 +++++++++++++++++++++++++++++++++++++++
> > tcp.h | 2 ++
> > tcp_conn.h | 2 ++
> > 3 files changed, 43 insertions(+)
> >
> > diff --git a/tcp.c b/tcp.c
> > index acdac7df..bf57be23 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -206,6 +206,12 @@
> > * keepalives) will be removed between INACTIVITY_INTERVAL s and
> > * 2*INACTIVITY_INTERVAL s after the last activity.
> > *
> > + * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity for an
> > + * entire interval, send a tap-side keepalive. If the endpoint is no longer
> > + * aware of the connection (due to a reboot, or a kernel timeout in FIN_WAIT_2
> > + * state) that should trigger an RST, so we won't keep track of connections
> > + * that the guest endpoint no longer cares about.
> > + *
> > * Summary of data flows (with ESTABLISHED event)
> > * ----------------------------------------------
> > *
> > @@ -342,6 +348,7 @@ enum {
> > #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
> >
> > #define INACTIVITY_INTERVAL 7200 /* s */
> > +#define KEEPALIVE_INTERVAL 30 /* s */
>
> By the way, once we're done testing this, I'm not sure which value to
> use. MUST-28 in RFC 9293 (3.8.4. TCP Keep-Alives) says we should
> "default to no less than two hours". It looks a bit too long to be
> useful, to me.
I agree, and as we've discussed, the purpose of these keepalives
differs a bit from the more common reasons for sending TCP keepalives,
meaning we have different considerations for the interval.
Also note that while RFC 9293 is pretty recent, the "no less than two
hours" is unchanged since RFC 1122, from 1989. In 1989, keepalives
every few minutes might have been a significant bandwidth concern;
now, not so much. Especially since these are tap side only, which
will usually be a local high-speed connection (it only won't be if the
guest routes to a VPN or something).
I was thinking of going with a 300s interval, so the timeout would happen
after 5-10 minutes. That has the worst-case timeout being just one
(base 10) order of magnitude away from the default 60s value of
net.ipv4.tcp_fin_timeout. I could be persuaded to increase the
interval to 900s (15-30m timeout). Much longer than that and I don't
think it would really accomplish the purpose we're looking for in
bug 179 - we'd still have stale connections hanging around for an hour
or more.
> >
> > #define LOW_RTT_TABLE_SIZE 8
> > #define LOW_RTT_THRESHOLD 10 /* us */
> > @@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
> > }
> >
> > conn->inactive = false;
> > + conn->tap_inactive = false;
> >
> > if (th->ack && !(conn->events & ESTABLISHED))
> > tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
> > @@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c)
> > return 0;
> > }
> >
> > +/**
> > + * tcp_keepalive() - Send keepalives for connections which need it
> > + * @: Execution context
>
> * @c: Execution context
> * @now: Current timestamp
Fixed.
> > + */
> > +static void tcp_keepalive(struct ctx *c, const struct timespec *now)
> > +{
> > + union flow *flow;
> > +
> > + if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL)
> > + return;
> > +
> > + c->tcp.keepalive_run = now->tv_sec;
> > +
> > + flow_foreach(flow) {
> > + struct tcp_tap_conn *conn = &flow->tcp;
> > +
> > + if (flow->f.type != FLOW_TCP)
> > + continue;
> > +
> > + if (conn->tap_inactive) {
> > + flow_dbg(conn, "No tap activity for least %us, send keepalive",
>
> s/least// (or "for at least")
Oops. "for at least" was what I intended, since that's more accurate.
> > + KEEPALIVE_INTERVAL);
> > + tcp_send_flag(c, conn, KEEPALIVE);
>
> Do we have to make sure we don't interpret replies to keepalives as
> duplicate ACKs (by setting some internal flow flag)? I haven't really
> checked that path.
That's... an interesting question. I think the answer is "not in a
way that matters". If we get here, we haven't heard anything
from the guest for at least 5 minutes (or whatever). That implies either:
a) it's already acked everything we sent. So, even if we see the
keepalive reply as a dup ack, there's nothing to retransmit, so
it's harmless.
or
b) it's not listening at all, in which case it's unlikely to respond
to the keepalive either
or
c) we're already retransmitting, and have backed off to pretty long
timeouts (greater than the keepalive interval), in which case a
dup-ack is just a bit more data for us
> > + }
> > +
> > + /* Ready to check fot next interval */
>
> for
Copypasta. Fixed.
> > + conn->tap_inactive = true;
> > + }
> > +}
> > +
> > /**
> > * tcp_inactivity() - Scan for and close long-inactive connections
> > * @: Execution context
> > @@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timespec *now)
> > if (c->mode == MODE_PASTA)
> > tcp_splice_refill(c);
> >
> > + tcp_keepalive(c, now);
> > tcp_inactivity(c, now);
> > }
> >
> > diff --git a/tcp.h b/tcp.h
> > index e104d453..2739f309 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -38,6 +38,7 @@ extern bool peek_offset_cap;
> > * @rto_max: Maximum retry timeout (in s)
> > * @syn_retries: SYN retries using exponential backoff timeout
> > * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> > + * @keepalive_run: Time we last issued tap-side keepalives
> > * @inactivity_run: Time we last scanned for inactive connections
> > */
> > struct tcp_ctx {
> > @@ -48,6 +49,7 @@ struct tcp_ctx {
> > int rto_max;
> > uint8_t syn_retries;
> > uint8_t syn_linear_timeouts;
> > + time_t keepalive_run;
> > time_t inactivity_run;
> > };
> >
> > diff --git a/tcp_conn.h b/tcp_conn.h
> > index 7197ff63..c82e1441 100644
> > --- a/tcp_conn.h
> > +++ b/tcp_conn.h
> > @@ -16,6 +16,7 @@
> > * @ws_from_tap: Window scaling factor advertised from tap/guest
> > * @ws_to_tap: Window scaling factor advertised to tap/guest
> > * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> > + * @tapinactive: No tao activity within the current KEEPALIVE_INTERVAL
>
> tap_inactive, tap activity
The typos keep coming. I'm going to blame the laptop keyboard and an
uncomfortable public library chair :).
>
> > * @inactive: No activity within the current INACTIVITY_INTERVAL
> > * @sock: Socket descriptor number
> > * @events: Connection events, implying connection states
> > @@ -58,6 +59,7 @@ struct tcp_tap_conn {
> > (conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
> > #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp)
> >
> > + bool tap_inactive :1;
> > bool inactive :1;
> >
> > int sock :FD_REF_BITS;
>
> The rest looks all good to me, but I didn't test it at all.
>
> --
> Stefano
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson