public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Yumei Huang <yuhuang@redhat.com>
Cc: passt-dev@passt.top, sbrivio@redhat.com
Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout
Date: Mon, 3 Nov 2025 12:37:52 +1100	[thread overview]
Message-ID: <aQgHcB4FDDJhcCZJ@zatzit> (raw)
In-Reply-To: <20251031054242.7334-6-yuhuang@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5864 bytes --]

On Fri, Oct 31, 2025 at 01:42:42PM +0800, Yumei Huang wrote:
> Clamp the TCP retry timeout as Linux kernel does. If RTO is less
> than 3 seconds, re-initialize it to 3 seconds for data retransmissions
> according to RFC 6298.
> 
> Suggested-by: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
>  tcp.c | 25 ++++++++++++++++++++-----
>  tcp.h |  2 ++
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 96ee56a..84a6700 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -187,6 +187,9 @@
>   *   for established connections, or (tcp_syn_retries +
>   *   tcp_syn_linear_timeouts) times during the handshake, reset the connection
>   *
> + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO to this for
> + *   data retransmissions.
> + *
>   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
>   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
>   *   the connection
> @@ -340,6 +343,7 @@ enum {
>  
>  #define ACK_INTERVAL			10		/* ms */
>  #define RTO_INIT			1		/* s, RFC 6298 */
> +#define RTO_INIT_ACK			3		/* s, RFC 6298 */
>  #define FIN_TIMEOUT			60
>  #define ACT_TIMEOUT			7200
>  
> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
>  
>  #define TCP_SYN_RETRIES		"/proc/sys/net/ipv4/tcp_syn_retries"
>  #define TCP_SYN_LINEAR_TIMEOUTS	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> +#define TCP_RTO_MAX_MS			"/proc/sys/net/ipv4/tcp_rto_max_ms"
>  
>  #define TCP_SYN_RETRIES_DEFAULT		6
>  #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT	4
> +#define TCP_RTO_MAX_MS_DEFAULT			120000
>  
>  /* "Extended" data (not stored in the flow table) for TCP flow migration */
>  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>  	if (conn->flags & ACK_TO_TAP_DUE) {
>  		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> -		int exp = conn->retries;
> +		int exp = conn->retries, timeout = RTO_INIT;
>  		if (!(conn->events & ESTABLISHED))
>  			exp -= c->tcp.syn_linear_timeouts;
> -		it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> +		else
> +			timeout = MAX(timeout, RTO_INIT_ACK);

Possibly I missed something, since I only skimmed your discussion of
this behaviour with Stefano.  But I'm not convinced this is a correct
interpretation of the RFC.  (5.7) says "If the timer expires awaiting
the ACK of a SYN segment ...".  That is, I think it's only suggesting
increasing the RTO to 3 at the data stage *if* we had at least one
retry during the handshake.  That is, unfortunately, much fiddlier to
implement, since we need to remember what happened during the
handshake to apply it here.

Additionally, if I'm reading the RFC correctly, it's treating this as
a one-time adjustment of the RTO, which won't necessarily remain the
case for the entire data phase.  Here this minimum will apply for the
entire data phase.

Even though it's a "MUST" in the RFC, I kind of think we could just
skip this for two reasons:

  1) We already don't bother with RTT measurements, which the RFC
     assumes the implementation is doing to adjust the RTO.

  2) We expect to be talking to a guest.  The chance of a high RTT is
     vanishingly small compared to a path to potentially any host on
     the 2011 internet.

> +		timeout <<= MAX(exp, 0);
> +		it.it_value.tv_sec = MIN(timeout, c->tcp.tcp_rto_max);
>  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  		it.it_value.tv_sec = FIN_TIMEOUT;
>  	} else {
> @@ -2785,18 +2794,24 @@ static socklen_t tcp_probe_tcp_info(void)
>  */
>  void tcp_get_rto_params(struct ctx *c)
>  {
> -	intmax_t tcp_syn_retries, syn_linear_timeouts;
> +	intmax_t tcp_syn_retries, syn_linear_timeouts, tcp_rto_max_ms;
>  
>  	tcp_syn_retries = read_file_integer(
>  		TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT);
>  	syn_linear_timeouts = read_file_integer(
>  		TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT);
> +	tcp_rto_max_ms = read_file_integer(
> +		TCP_RTO_MAX_MS, TCP_RTO_MAX_MS_DEFAULT);
>  
>  	c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
>  	c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> +	c->tcp.tcp_rto_max = MIN(
> +		DIV_ROUND_CLOSEST(tcp_rto_max_ms, 1000), SIZE_MAX);
>  
> -	debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %"PRIu8,
> -	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +	debug("Read sysctl values tcp_syn_retries: %"PRIu8
> +	      ", linear_timeouts: %"PRIu8", tcp_rto_max: %zu",
> +	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts,
> +	      c->tcp.tcp_rto_max);
>  }
>  
>  /**
> diff --git a/tcp.h b/tcp.h
> index befedde..a238bb7 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -59,6 +59,7 @@ union tcp_listen_epoll_ref {
>   * @fwd_out:		Port forwarding configuration for outbound packets
>   * @timer_run:		Timestamp of most recent timer run
>   * @pipe_size:		Size of pipes for spliced connections
> + * @tcp_rto_max:	Maximal retry timeout (in s)
>   * @tcp_syn_retries:	SYN retries using exponential backoff timeout
>   * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
>   */
> @@ -67,6 +68,7 @@ struct tcp_ctx {
>  	struct fwd_ports fwd_out;
>  	struct timespec timer_run;
>  	size_t pipe_size;
> +	size_t tcp_rto_max;
>  	uint8_t tcp_syn_retries;
>  	uint8_t syn_linear_timeouts;
>  };
> -- 
> 2.49.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2025-11-03  1:38 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-31  5:42 [PATCH v7 0/5] Retry SYNs for inbound connections Yumei Huang
2025-10-31  5:42 ` [PATCH v7 1/5] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-31  5:42 ` [PATCH v7 2/5] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-11-03  0:53   ` David Gibson
2025-10-31  5:42 ` [PATCH v7 3/5] tcp: Resend SYN for inbound connections Yumei Huang
2025-11-03  1:09   ` David Gibson
2025-11-03  2:31     ` Yumei Huang
2025-11-03  9:01       ` David Gibson
2025-11-04  4:42         ` Stefano Brivio
2025-11-04  4:42   ` Stefano Brivio
2025-11-05  4:24     ` David Gibson
2025-11-05  7:00       ` Stefano Brivio
2025-11-07  9:56         ` Yumei Huang
2025-11-07 10:05           ` Stefano Brivio
2025-11-10  2:52             ` Yumei Huang
2025-11-10  4:25               ` David Gibson
2025-10-31  5:42 ` [PATCH v7 4/5] tcp: Update data retransmission timeout Yumei Huang
2025-10-31  5:56   ` Yumei Huang
2025-10-31  8:04     ` Stefano Brivio
2025-11-03  1:18   ` David Gibson
2025-11-03  2:57     ` Yumei Huang
2025-11-03  9:32       ` David Gibson
2025-11-04  4:42         ` Stefano Brivio
2025-11-05  4:19           ` David Gibson
2025-10-31  5:42 ` [PATCH v7 5/5] tcp: Clamp the retry timeout Yumei Huang
2025-10-31  8:38   ` Stefano Brivio
2025-11-03  3:11     ` Yumei Huang
2025-11-03  9:37       ` David Gibson
2025-11-03 10:55       ` Stefano Brivio
2025-11-03  1:37   ` David Gibson [this message]
2025-11-03  4:06     ` Yumei Huang
2025-11-03 10:38     ` Stefano Brivio
2025-11-05  4:22       ` David Gibson
2025-11-04  4:42   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQgHcB4FDDJhcCZJ@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    --cc=yuhuang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).