From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202512 header.b=SXtmz8c6; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id D64105A004E for ; Fri, 06 Feb 2026 02:22:07 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202512; t=1770340925; bh=mXumzpHA1QGdyw9QOrBKRe40pyhdmh12KAd3OF27AMU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=SXtmz8c6cioV0se343iWEkIaPb6uWi+NjMs8vVfmH6GruawUjeHM1IQK5ZX3o0l2B JL85EbIsC+haSPkl1lOZLbFeaF4uslXEVQNNWRxcF/7n0V8aKJ0vBk7Y5bqiTE4eCm SAiUPkD0nKzbt21cxa1q3wbdm2jgfsJNGsvEaNzLG2Xl7bnGFcw/1z4ToY5oE1hch2 hmppp0rQsIUzzwXerYDJUGlLaMLmrtUME0FUii46Fr6enxh4AxewN9vTu5vGsJ9yOx THc9S5qRLSuZbDHdSPdQRJsmhEn/+6gEZjLXanqW1IUwJUCRLIjN44clljlwHjECXx 9xgD5NcFdyFJg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4f6bsn29GBz4wHW; Fri, 06 Feb 2026 12:22:05 +1100 (AEDT) Date: Fri, 6 Feb 2026 11:21:55 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 4/4] tcp: Send TCP keepalive segments after a period of tap-side inactivity Message-ID: References: <20260204114137.2784090-1-david@gibson.dropbear.id.au> <20260204114137.2784090-5-david@gibson.dropbear.id.au> <20260205011744.4243b8ce@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="urA9qkg5pDFKyk3P" Content-Disposition: inline In-Reply-To: <20260205011744.4243b8ce@elisabeth> Message-ID-Hash: UZJLLUAQZHPZAK2LKWCW36V6JYOEMNCE X-Message-ID-Hash: UZJLLUAQZHPZAK2LKWCW36V6JYOEMNCE X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --urA9qkg5pDFKyk3P Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 05, 2026 at 01:17:45AM +0100, Stefano Brivio wrote: > On Wed, 4 Feb 2026 21:41:37 +1000 > David Gibson wrote: >=20 > > There are several circumstances in which a live, but idle TCP connection > > can be forgotten by a guest, with no "on the wire" indication that this= has > > happened. The most obvious is if the guest abruptly reboots. A more > > subtle case can happen with a half-closed connection, specifically one > > in FIN_WAIT_2 state on the guest. A connection can, legitimately, rema= in > > in this state indefinitely. If however, a socket in this state is clos= ed > > by userspace, Linux at least will remove the kernel socket after 60s > > (or as configured in the net.ipv4.tcp_fin_timeout sysctl). > >=20 > > Because there's no on the wire indication in these cases, passt will > > pointlessly retain the connection in its flow table, at least until it = is > > removed by the inactivity timeout after several hours. > >=20 > > To avoid keeping connections around for so long in this state, add > > functionality to periodically send TCP keepalive segments to the guest = if > > we've seen no activity on the tap interface. If the guest is no longer > > aware of the connection, it should respond with an RST which will let > > passt remove the stale entry. > >=20 > > To do this we use a method similar to the inactivity timeout - a 1-bit > > page replacement / clock algorithm, but with a shorter interval, and on= ly > > checking for tap side activity. Currently we use a 300s interval, mean= ing > > we'll send a keepalive after 5-10 minutes of (tap side) inactivity. > >=20 > > Link: https://bugs.passt.top/show_bug.cgi?id=3D179 > >=20 > > Signed-off-by: David Gibson > > --- > > tcp.c | 39 +++++++++++++++++++++++++++++++++++++++ > > tcp.h | 2 ++ > > tcp_conn.h | 2 ++ > > 3 files changed, 43 insertions(+) > >=20 > > diff --git a/tcp.c b/tcp.c > > index acdac7df..bf57be23 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -206,6 +206,12 @@ > > * keepalives) will be removed between INACTIVITY_INTERVAL s and > > * 2*INACTIVITY_INTERVAL s after the last activity. > > * > > + * - KEEPALIVE_INTERVAL: if a connection has had no tap-side activity = for an > > + * entire interval, send a tap-side keepalive. If the endpoint is n= o longer > > + * aware of the connection (due to a reboot, or a kernel timeout in = FIN_WAIT_2 > > + * state) that should trigger an RST, so we won't keep track of conn= ections > > + * that the guest endpoint no longer cares about. > > + * > > * Summary of data flows (with ESTABLISHED event) > > * ---------------------------------------------- > > * > > @@ -342,6 +348,7 @@ enum { > > #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */ > > =20 > > #define INACTIVITY_INTERVAL 7200 /* s */ > > +#define KEEPALIVE_INTERVAL 30 /* s */ >=20 > By the way, once we're done testing this, I'm not sure which value to > use. MUST-28 in RFC 9293 (3.8.4. TCP Keep-Alives) says we should > "default to no less than two hours". It looks a bit too long to be > useful, to me. I agree, and as we've discussed a bit, the purpose of these keepalives differs a bit from more common reasons for sending TCP keepalives, meaning we have different considerations for the interval. Also note that while RFC 9293 is pretty recent, the "no less than two hours" is unchanged since RFC 1122, from 1989. In 1989, keepalives every few minutes might have been a significant bandwidth concern, now, not so much. Especially since these are tap side only, which usually be a local high-speed connections (it only won't be if the guest routes to a VPN or something). I was thinking of going with a 300s interval, so timeout would happen after 5-10 minutes. That has the worse case timeout being just one (base 10) order of magnitude away from the default 60s value of net.ipv4.tcp_fin_timeout. I could be persuaded to increase the interval to 900s (15-30m timeout). Much longer than that and I don't think it would really accomplish the purpose we're looking for in bug179 - we'd still have stale connections hanging around for an hour or more. > > =20 > > #define LOW_RTT_TABLE_SIZE 8 > > #define LOW_RTT_THRESHOLD 10 /* us */ > > @@ -2263,6 +2270,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t = pif, sa_family_t af, > > } > > =20 > > conn->inactive =3D false; > > + conn->tap_inactive =3D false; > > =20 > > if (th->ack && !(conn->events & ESTABLISHED)) > > tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq)); > > @@ -2884,6 +2892,36 @@ int tcp_init(struct ctx *c) > > return 0; > > } > > =20 > > +/** > > + * tcp_keepalive() - Send keepalives for connections which need it > > + * @: Execution context >=20 > * @c: Execution context > * @now: Current timestamp Fixed. > > + */ > > +static void tcp_keepalive(struct ctx *c, const struct timespec *now) > > +{ > > + union flow *flow; > > + > > + if (now->tv_sec - c->tcp.keepalive_run < KEEPALIVE_INTERVAL) > > + return; > > + > > + c->tcp.keepalive_run =3D now->tv_sec; > > + > > + flow_foreach(flow) { > > + struct tcp_tap_conn *conn =3D &flow->tcp; > > + > > + if (flow->f.type !=3D FLOW_TCP) > > + continue; > > + > > + if (conn->tap_inactive) { > > + flow_dbg(conn, "No tap activity for least %us, send keepalive", >=20 > s/least// (or "for at least") Oops. "for at least" was what I intended, since that's more accurate. > > + KEEPALIVE_INTERVAL); > > + tcp_send_flag(c, conn, KEEPALIVE); >=20 > Do we have to make sure we don't interpret replies to keepalives as > duplicate ACKs (by setting some internal flow flag)? I haven't really > checked that path. That's... an interesting question. I think the answer is "not in a way that matters". If we get here, we haven't heard anything =66rom the guest for at least 5 minutes (or whatever). That implies either: a) it's already acked everything we sent. So, even if we see the keepalive reply as a dup ack, there's nothing to retransmit, so it's harmless. or b) it's not listening at all, in which case it's unlikely to respond to the keepalive either or c) we're already retransmitting, and have backed off to pretty long timeouts (greater than the keepalive interval), in which case a dup-ack is just a bit more data for us > > + } > > + > > + /* Ready to check fot next interval */ >=20 > for Copypasta. Fixed. > > + conn->tap_inactive =3D true; > > + } > > +} > > + > > /** > > * tcp_inactivity() - Scan for and close long-inactive connections > > * @: Execution context > > @@ -2927,6 +2965,7 @@ void tcp_timer(struct ctx *c, const struct timesp= ec *now) > > if (c->mode =3D=3D MODE_PASTA) > > tcp_splice_refill(c); > > =20 > > + tcp_keepalive(c, now); > > tcp_inactivity(c, now); > > } > > =20 > > diff --git a/tcp.h b/tcp.h > > index e104d453..2739f309 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -38,6 +38,7 @@ extern bool peek_offset_cap; > > * @rto_max: Maximum retry timeout (in s) > > * @syn_retries: SYN retries using exponential backoff timeout > > * @syn_linear_timeouts: SYN retries before using exponential backoff = timeout > > + * @keepalive_run: Time we last issued tap-side keepalives > > * @inactivity_run: Time we last scanned for inactive connections > > */ > > struct tcp_ctx { > > @@ -48,6 +49,7 @@ struct tcp_ctx { > > int rto_max; > > uint8_t syn_retries; > > uint8_t syn_linear_timeouts; > > + time_t keepalive_run; > > time_t inactivity_run; > > }; > > =20 > > diff --git a/tcp_conn.h b/tcp_conn.h > > index 7197ff63..c82e1441 100644 > > --- a/tcp_conn.h > > +++ b/tcp_conn.h > > @@ -16,6 +16,7 @@ > > * @ws_from_tap: Window scaling factor advertised from tap/guest > > * @ws_to_tap: Window scaling factor advertised to tap/guest > > * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS > > + * @tapinactive: No tao activity within the current KEEPALIVE_INTERVAL >=20 > tap_inactive, tap activity The typos keep coming. I'm going to blame the laptop keyboard and an uncomfortable public library chair :). >=20 > > * @inactive: No activity within the current INACTIVITY_INTERVAL > > * @sock: Socket descriptor number > > * @events: Connection events, implying connection states > > @@ -58,6 +59,7 @@ struct tcp_tap_conn { > > (conn->rtt_exp =3D MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN)= ))) > > #define RTT_GET(conn) (RTT_STORE_MIN << conn->rtt_exp) > > =20 > > + bool tap_inactive :1; > > bool inactive :1; > > =20 > > int sock :FD_REF_BITS; >=20 > The rest looks all good to me, but I didn't test it at all. >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --urA9qkg5pDFKyk3P Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmmFQicACgkQzQJF27ox 2Gdkew//WfFsc5BbuyurhuvbJ9p9CrYL/FiuHiPY4NTsumK6n8oDFIgCaeovTzGb N86cFXcpEb5Whlljf9IC2W9o8L0OfvwHPF31W69JwKIH5sPF7rQU/Cb7YrRO20EB spzvjeTrteBtEmqvxVaSSbm4UgmJDFto2D1BaGZHBTAtdpq8IwnTqNCj2VyR474l GhoJxZapfLAi64IahjJpVgiv2QLZ8HVYRAyam7zs0xvQEH7QOjNNbtpx1cymaN/+ RuyBpv1ZazIjyni1oFWeMHcF1TiI8HKSQXMn/2WHmEfGIjQp2uDiovPhQT9aSprn r5LKVZCgTmUTUe5AH1eYvl7k41S7AUgSGxWLu8g7+mkS+de2vxsThQr6DFF2jq8p GUUChUK4R60xnRLUmpjHPhO2n+aBFuYol9uX+IITPNJqL0ffiK5HEtaryHsojxWd wPN4TD0Rbk1MUWJ4AUEG/boge/Jt+xYAwMhCIBydDgN8KKT5GDbLRVaex1fK7Df0 MIeOwZMyOMEBgT8YhnCc0payofCo8zTbh29WqSSPSVFEMfiZEAc7ha+B68oz3oiJ HatcTnJv57OIZZRz3fYLejzngFtfW+oFvt7xDSZoeS2O3Ud7yV3R04dAyLjZyIjP 1NQifc4d5m16Vs5ZjqKsXQoFfsQix+co/WC0vkNpyc8Jie6w/wM= =6ZXZ -----END PGP SIGNATURE----- --urA9qkg5pDFKyk3P--