From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202510 header.b=OOMCuwYt; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 631B35A061A for ; Mon, 03 Nov 2025 02:38:00 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202510; t=1762133877; bh=lT0mzO8qenZXmT5yiz1oWKy2KEz5uPeXMMbtlDV+Kd8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=OOMCuwYtqK9mNl5nqguJuPNbR2LdHk8//gc98ZPrAqONB6hP1lrbjUPPUDvlImE8P xLucGErbfNV+FIYGTdh7HiDC2JJMufiB0r4ye2E/EDIi8GM1pNCRikkRs/Vx+9NAWs 9RvI/JN643/q2L1j/dsAv/Q0M6jFk5pfaSrBf6rFVcEClCkjs7zDsRFXGeCPQ5Z/yz sCGlXBR46Z9TnVQedP/yCVApsKezlL7QzOp7KHPisfHlYVoAXfei4Re/OWpcI8HFWO VZow5pH0jm7wYeo9CTMYX4QhcFm9cQbkQGrpTRi2d0OEUpWRGVl41zvUzSWR9jCxgL Ny2wpSN0EScvQ== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4d0Djx3Qvkz4wCp; Mon, 03 Nov 2025 12:37:57 +1100 (AEDT) Date: Mon, 3 Nov 2025 12:37:52 +1100 From: David Gibson To: Yumei Huang Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout Message-ID: References: <20251031054242.7334-1-yuhuang@redhat.com> <20251031054242.7334-6-yuhuang@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="u1aJQSo3bhXyuEhG" Content-Disposition: inline In-Reply-To: <20251031054242.7334-6-yuhuang@redhat.com> Message-ID-Hash: VGEVZY4YSOMSJ2F3M4YRVYSXFUJR4WWW X-Message-ID-Hash: VGEVZY4YSOMSJ2F3M4YRVYSXFUJR4WWW X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, sbrivio@redhat.com X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --u1aJQSo3bhXyuEhG Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 31, 2025 at 01:42:42PM +0800, Yumei Huang wrote: > Clamp the TCP retry timeout as Linux kernel does. If RTO is less > than 3 seconds, re-initialize it to 3 seconds for data retransmissions > according to RFC 6298. >=20 > Suggested-by: Stefano Brivio > Signed-off-by: Yumei Huang > --- > tcp.c | 25 ++++++++++++++++++++----- > tcp.h | 2 ++ > 2 files changed, 22 insertions(+), 5 deletions(-) >=20 > diff --git a/tcp.c b/tcp.c > index 96ee56a..84a6700 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -187,6 +187,9 @@ > * for established connections, or (tcp_syn_retries + > * tcp_syn_linear_timeouts) times during the handshake, reset the conn= ection > * > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO to th= is for > + * data retransmissions. > + * > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_= TAP_DUE > * with TAP_FIN_SENT event), and no ACK is received within this time, = reset > * the connection > @@ -340,6 +343,7 @@ enum { > =20 > #define ACK_INTERVAL 10 /* ms */ > #define RTO_INIT 1 /* s, RFC 6298 */ > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */ > #define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > =20 > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEU= E_MAX]; > =20 > #define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > #define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeo= uts" > +#define TCP_RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms" > =20 > #define TCP_SYN_RETRIES_DEFAULT 6 > #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4 > +#define TCP_RTO_MAX_MS_DEFAULT 120000 > =20 > /* "Extended" data (not stored in the flow table) for TCP flow migration= */ > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX]; > @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, stru= ct tcp_tap_conn *conn) > if (conn->flags & ACK_TO_TAP_DUE) { > it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 1000; > } else if (conn->flags & ACK_FROM_TAP_DUE) { > - int exp =3D conn->retries; > + int exp =3D conn->retries, timeout =3D RTO_INIT; > if (!(conn->events & ESTABLISHED)) > exp -=3D c->tcp.syn_linear_timeouts; > - it.it_value.tv_sec =3D RTO_INIT << MAX(exp, 0); > + else > + timeout =3D MAX(timeout, RTO_INIT_ACK); Possibly I missed something, since I only skimmed your discussion of this behaviour with Stefano. But I'm not convinced this is a correct interpretation of the RFC. (5.7) says "If the timer expires awaiting the ACK of a SYN segment ...". That is, I think it's only suggesting increasing the RTO to 3 at the data stage *if* we had at least one retry during the handshake. That is, unfortunately, much fiddlier to implement, since we need to remember what happened during the handshake to apply it here. Additionally, if I'm reading the RFC correctly, it's treating this as a one-time adjustment of the RTO, which won't necessarily remain the case for the entire data phase. Here this minimum will apply for the entire data phase. Even though it's a "MUST" in the RFC, I kind of think we could just skip this for two reasons: 1) We already don't bother with RTT measurements, which the RFC assumes the implementation is doing to adjust the RTO. 2) We expect to be talking to a guest. The chance of a high RTT is vanishingly small compared to a path to potentially any host on the 2011 internet. > + timeout <<=3D MAX(exp, 0); > + it.it_value.tv_sec =3D MIN(timeout, c->tcp.tcp_rto_max); > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > it.it_value.tv_sec =3D FIN_TIMEOUT; > } else { > @@ -2785,18 +2794,24 @@ static socklen_t tcp_probe_tcp_info(void) > */ > void tcp_get_rto_params(struct ctx *c) > { > - intmax_t tcp_syn_retries, syn_linear_timeouts; > + intmax_t tcp_syn_retries, syn_linear_timeouts, tcp_rto_max_ms; > =20 > tcp_syn_retries =3D read_file_integer( > TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT); > syn_linear_timeouts =3D read_file_integer( > TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT); > + tcp_rto_max_ms =3D read_file_integer( > + TCP_RTO_MAX_MS, TCP_RTO_MAX_MS_DEFAULT); > =20 > c->tcp.tcp_syn_retries =3D MIN(tcp_syn_retries, UINT8_MAX); > c->tcp.syn_linear_timeouts =3D MIN(syn_linear_timeouts, UINT8_MAX); > + c->tcp.tcp_rto_max =3D MIN( > + DIV_ROUND_CLOSEST(tcp_rto_max_ms, 1000), SIZE_MAX); > =20 > - debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %= "PRIu8, > - c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); > + debug("Read sysctl values tcp_syn_retries: %"PRIu8 > + ", linear_timeouts: %"PRIu8", tcp_rto_max: %zu", > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts, > + c->tcp.tcp_rto_max); > } > =20 > /** > diff --git a/tcp.h b/tcp.h > index befedde..a238bb7 100644 > --- a/tcp.h > +++ b/tcp.h > @@ -59,6 +59,7 @@ union tcp_listen_epoll_ref { > * @fwd_out: Port forwarding configuration for outbound packets > * @timer_run: Timestamp of most recent timer run > * @pipe_size: Size of pipes for spliced connections > + * @tcp_rto_max: Maximal retry timeout (in s) > * @tcp_syn_retries: SYN retries using exponential backoff timeout > * @syn_linear_timeouts: SYN retries before using exponential backoff ti= meout > */ > @@ -67,6 +68,7 @@ struct tcp_ctx { > struct fwd_ports fwd_out; > struct timespec timer_run; > size_t pipe_size; > + size_t tcp_rto_max; > uint8_t tcp_syn_retries; > uint8_t syn_linear_timeouts; > }; > --=20 > 2.49.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --u1aJQSo3bhXyuEhG Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmkIB28ACgkQzQJF27ox 2GfXJxAAhRTTExbI9PLVADXawYNGzE44PkriAwjoCG3YVl4yS3zphxE9+789Wbw+ nCm31IWf7NBCURSc2xq/dA1/7Z4TKnsLPWTLHRvmFX+2uRxv8JLO+AF0k9PE1fNE +37nkRzSC3tvo59EoChQUWHXVQxf9QjG5mtsRWZkGiO6QakWUTHITNl3HRfxkk/A OfSgOV2slE0ejJDruE5N3eYtutEGWSqAvfY7BsRXsAzbWoOvFxJLBkUPgSlR24Ww VSTEVPA0bffIRqQZndkTz8mIz+xBpYaP7vyb7GKjyQKIu3SBug2XZoWnPZ2i+c6R QvFYWQ07DMn/XgyIxftDSGcCqGOyHyGHqrS3EBI4A/PMbI/tdzkFIejcddbdgJ8F /BPLe56CgXYIv/pStaWGUZpp0/k+Zu29cKdVPUx6ifAAwI0Jv5Ds8EnE3Y0KO3dl APv8lgbCy31Pxdkrg4q3gA3/MXWwRSrviuaIgJCGPE/Td7DURsEnxNADa0J1SIMq 1koLGS+3+KiFac5Hz/XJAifZ8GVtnnyThk+G4+6FRWZr4tv7fIdb6hItCilhTeBQ XsNr2GxfvEzMYxDVTjAoSwq7p7xup75sCFAigZimuvAykLSpQjnmZWYTwobrxXDY 73ByQpuA0C4zvNhuUZ0uIqVCGmJkBSZ0A/+17D6k0mDNz2H7XMI= =7Zam -----END PGP SIGNATURE----- --u1aJQSo3bhXyuEhG--