From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202510 header.b=LzJ7Qn5f; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id EC24D5A0BB9 for ; Fri, 14 Nov 2025 01:47:03 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202510; t=1763081221; bh=9/179b+yRb4oqrNI1zc1IEcwJLIMFJ+bacohSsemPTg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=LzJ7Qn5fL9gkW6+cy5g0iY0MheUKLdayxq1aAlFTqaqQG93pY7VW5HDmp/Dq7Wudp Wwq338f/dbCjTHFtLUslmCm1kUIHpFGOZnwH2X8lohajiZ4RHFTWVlYU6J4bK2HKAz NQbE4DtmZREaVIql/AxO0Q9S0UlZb+Ma/9gz7/NE2CNU2GPx4O1cv1dPW2baoU6CO1 J75IkZzJE4eCdonsU0TRazBXqDZhBpYi4kHyegKyXCrXPClHqwhxCgqzMrUuEB4l19 rnLJ5MC2BO3fkG6+aZmA46RK914NbHwicaLWEhKOVr6/quHat0mgF2OUm2bRlcAHs1 SN/pJLkmVErZQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4d6z45091lz4wD3; Fri, 14 Nov 2025 11:47:01 +1100 (AEDT) Date: Fri, 14 Nov 2025 11:35:03 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v8 6/6] tcp: Clamp the retry timeout Message-ID: References: <20251110093137.87705-7-yuhuang@redhat.com> <20251114010121.10dfb18a@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="Tj4pfXSwE/O/Y3Qt" Content-Disposition: inline In-Reply-To: <20251114010121.10dfb18a@elisabeth> Message-ID-Hash: OTICRJPUPLCT2BG2G2Z3CPK6I223TJHQ X-Message-ID-Hash: OTICRJPUPLCT2BG2G2Z3CPK6I223TJHQ X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Yumei Huang , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --Tj4pfXSwE/O/Y3Qt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Nov 14, 2025 at 01:01:21AM +0100, Stefano Brivio wrote: > On Mon, 10 Nov 2025 21:56:39 +1100 > David Gibson wrote: >=20 > > On Mon, Nov 10, 2025 at 05:31:37PM +0800, Yumei Huang wrote: > > > Clamp the TCP retry timeout as Linux kernel does. If a retry occurs > > > during the handshake and the RTO is below 3 seconds, re-initialise > > > it to 3 seconds for data retransmissions according to RFC 6298. > > >=20 > > > Suggested-by: Stefano Brivio > > > Signed-off-by: Yumei Huang =20 > >=20 > > Looks correct, so > >=20 > > Reviewed-by: David Gibson > >=20 > > A few possible improvements (mostly comments/names) below. > >=20 > > > --- > > > tcp.c | 25 ++++++++++++++++++++----- > > > tcp.h | 2 ++ > > > tcp_conn.h | 1 + > > > 3 files changed, 23 insertions(+), 5 deletions(-) > > >=20 > > > diff --git a/tcp.c b/tcp.c > > > index ee111e0..b015658 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -187,6 +187,9 @@ > > > * for established connections, or (syn_retries + syn_linear_timeo= uts) times > > > * during the handshake, reset the connection > > > * > > > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialise RTO t= o this for > > > + * data retransmissions > > > + * =20 > >=20 > > Now that this is conditional on SYN being retried, the description > > doesn't look entirely accurate any more. >=20 > If I read the whole thing again I would actually say it becomes > misleading and worth fixing before applying this. Also because the > relationship to the text in the RFC is not obvious anymore. >=20 > This should be "if SYN retries happened during handshake, ...". >=20 > > > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_F= ROM_TAP_DUE > > > * with TAP_FIN_SENT event), and no ACK is received within this ti= me, reset > > > * the connection > > > @@ -340,6 +343,7 @@ enum { > > > =20 > > > #define ACK_INTERVAL 10 /* ms */ > > > #define RTO_INIT 1 /* s, RFC 6298 */ > > > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */ =20 > >=20 > > The relevance of "_ACK" in the name is not obvious to me. >=20 > What about RTO_INIT_AFTER_SYN_RETRIES? We use it only once, and it > looks fine there. Works for me. > > > #define FIN_TIMEOUT 60 > > > #define ACT_TIMEOUT 7200 > > > =20 > > -> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV= _QUEUE_MAX]; > > > =20 > > > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > > > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeo= uts" > > > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms" > > > =20 > > > #define SYN_RETRIES_DEFAULT 6 > > > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4 > > > +#define RTO_MAX_MS_DEFAULT 120000 > > > #define MAX_SYNCNT 127 /* derived from kernel's limit */ > > > =20 > > > /* "Extended" data (not stored in the flow table) for TCP flow migra= tion */ > > > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((_= _unused__)) =3D { > > > =20 > > > static const char *tcp_flag_str[] __attribute((__unused__)) =3D { > > > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE", > > > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", > > > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED", > > > }; > > > =20 > > > /* Listening sockets, used for automatic port forwarding in pasta mo= de only */ > > > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, = struct tcp_tap_conn *conn) > > > if (conn->flags & ACK_TO_TAP_DUE) { > > > it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 1000; > > > } else if (conn->flags & ACK_FROM_TAP_DUE) { > > > - int exp =3D conn->retries; > > > + int exp =3D conn->retries, timeout =3D RTO_INIT; > > > if (!(conn->events & ESTABLISHED)) > > > exp -=3D c->tcp.syn_linear_timeouts; > > > - it.it_value.tv_sec =3D RTO_INIT << MAX(exp, 0); > > > + else if (conn->flags & SYN_RETRIED) > > > + timeout =3D MAX(timeout, RTO_INIT_ACK); > > > + timeout <<=3D MAX(exp, 0); > > > + it.it_value.tv_sec =3D MIN(timeout, c->tcp.rto_max); > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > > it.it_value.tv_sec =3D FIN_TIMEOUT; > > > } else { > > > @@ -2440,6 +2449,7 @@ void tcp_timer_handler(const struct ctx *c, uni= on epoll_ref ref) > > > flow_trace(conn, "SYN timeout, retry"); > > > tcp_send_flag(c, conn, SYN); > > > conn->retries++; > > > + conn_flag(c, conn, SYN_RETRIED); > > > tcp_timer_ctl(c, conn); > > > } > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > > @@ -2811,10 +2821,15 @@ void tcp_get_rto_params(struct ctx *c) > > > v =3D read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DE= FAULT); > > > c->tcp.syn_linear_timeouts =3D MIN(v, MAX_SYNCNT); > > > =20 > > > + v =3D read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT); > > > + c->tcp.rto_max =3D MIN(DIV_ROUND_CLOSEST(v, 1000), INT_MAX); =20 > >=20 > > Possibly we should verify this is =3D> RTO_INIT. >=20 > As a sanity check, maybe, but I don't see any harmful effect if it's > < RTO_INIT, right? So I'm not sure if we should. No preference from my > side really. Sorry, describing this as >=3D RTO_INIT was misleading. What I'm concerned about here is if the kernel value is set to 400ms, we'll round it to... 0s. So, really what I'm concerned about is that we ensure this is > 0. > > > + > > > debug("Read sysctl values syn_retries: %"PRIu8 > > > - ", syn_linear_timeouts: %"PRIu8, > > > + ", syn_linear_timeouts: %"PRIu8 > > > + ", rto_max: %d", > > > c->tcp.syn_retries, > > > - c->tcp.syn_linear_timeouts); > > > + c->tcp.syn_linear_timeouts, > > > + c->tcp.rto_max); > > > } > > > =20 > > > /** > > > diff --git a/tcp.h b/tcp.h > > > index 37d7758..c4945c3 100644 > > > --- a/tcp.h > > > +++ b/tcp.h > > > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref { > > > * @fwd_out: Port forwarding configuration for outbound packets > > > * @timer_run: Timestamp of most recent timer run > > > * @pipe_size: Size of pipes for spliced connections > > > + * @rto_max: Maximal retry timeout (in s) >=20 > As pointed out earlier, I think "maximum" is slightly more appropriate > here. Agreed, with nuance discussed earlier. >=20 > > > * @syn_retries: SYN retries using exponential backoff timeout > > > * @syn_linear_timeouts: SYN retries before using exponential backof= f timeout > > > */ > > > @@ -68,6 +69,7 @@ struct tcp_ctx { > > > struct fwd_ports fwd_out; > > > struct timespec timer_run; > > > size_t pipe_size; > > > + int rto_max; > > > uint8_t syn_retries; > > > uint8_t syn_linear_timeouts; > > > }; > > > diff --git a/tcp_conn.h b/tcp_conn.h > > > index 923af36..e36910c 100644 > > > --- a/tcp_conn.h > > > +++ b/tcp_conn.h > > > @@ -77,6 +77,7 @@ struct tcp_tap_conn { > > > #define ACK_TO_TAP_DUE BIT(3) > > > #define ACK_FROM_TAP_DUE BIT(4) > > > #define ACK_FROM_TAP_BLOCKS BIT(5) > > > +#define SYN_RETRIED BIT(6) > > > =20 > > > #define SNDBUF_BITS 24 > > > unsigned int sndbuf :SNDBUF_BITS; > > > --=20 > > > 2.51.0 >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --Tj4pfXSwE/O/Y3Qt Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmkWeTYACgkQzQJF27ox 2GdO/hAApi9XLwxOQNjr7h7YP3uPFEUhwuZ7H2iMJOQnW+UeFLALNOL24Zz9AR6T 508JBeFRcMe4Op4PPfHM2QQFH4x5wHNbBW7q4d/RXXNkgpc4EN5g4Adk+pU2TtYb 8eXqzmmsdomCL6b4L+FuT6btA1ppwvD//vBxs8d5CpbOAvDiNPke06Sps11IosHP 9Wc6M1D6K5+aOmSSgnJ+veFoRPXupln7fGjzs0TnR1ZX5Y2XbBoLTpELzBXt4EtF Uvssx34Mo9kc/oA6gmZeIhJVCXNa/pGDTKyLseR8+OE6fmBTvB5uF29WhdbL/pbD z0LEEEEylvj/LLFIq37n2tREb5+QEm0tnr1z86+/lo9qirJ7r/QD97T2J+6t7p5M QiEnCOHmUI0Mw/IDIlzZoyOTlAUm8+V4AskazOAwTksoE/gw7sPs69aIJ2RkTTLd B02c2kpscsUOvxS6F0j3ZT79Fc906jYXXG5fKMZv70kkrywS1aht4P2VvlCjuv3y RWPq5DWu4QZu3CZiTK510DZouzm9C/JUsuCXz+780t2nbcvcD8xPrcnCaEDLtL8X J8L0PrkTIrIaMBoefAxhADsVvKsjCz24Pt0U7kSVCu+ZNqnLKLm/whBdwy/kGAzg KP8mRqgTW6FBrYoM+BYZUEKsspKiSyGPgB/eCMKovhrJ7Dd1ozM= =s8o1 -----END PGP SIGNATURE----- --Tj4pfXSwE/O/Y3Qt--