From: Stefano Brivio <sbrivio@redhat.com>
To: Yumei Huang <yuhuang@redhat.com>
Cc: passt-dev@passt.top, david@gibson.dropbear.id.au
Subject: Re: [PATCH v9 5/5] tcp: Clamp the retry timeout
Date: Thu, 27 Nov 2025 23:33:42 +0100 [thread overview]
Message-ID: <20251127233342.36986b16@elisabeth> (raw)
In-Reply-To: <20251125072638.88896-6-yuhuang@redhat.com>
On Tue, 25 Nov 2025 15:26:38 +0800
Yumei Huang <yuhuang@redhat.com> wrote:
> Clamp the TCP retry timeout as Linux kernel does. If a retry occurs
> during the handshake and the RTO is below 3 seconds, re-initialise
> it to 3 seconds for data retransmissions according to RFC 6298.
>
> Suggested-by: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
> tcp.c | 25 ++++++++++++++++++++-----
> tcp.h | 2 ++
> tcp_conn.h | 1 +
> 3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index b3aa064..887b32f 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -187,6 +187,9 @@
> * for established connections, or (syn_retries + syn_linear_timeouts) times
> * during the handshake, reset the connection
> *
> + * - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and
> + * RTO is less than this, re-initialise RTO to this for data retransmissions
> + *
> * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> * with TAP_FIN_SENT event), and no ACK is received within this time, reset
> * the connection
> @@ -340,6 +343,7 @@ enum {
>
> #define ACK_INTERVAL 10 /* ms */
> #define RTO_INIT 1 /* s, RFC 6298 */
> +#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
> #define FIN_TIMEOUT 60
> #define ACT_TIMEOUT 7200
>
> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
>
> #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms"
>
> #define SYN_RETRIES_DEFAULT 6
> #define SYN_LINEAR_TIMEOUTS_DEFAULT 4
> +#define RTO_MAX_MS_DEFAULT 120000
Nit: I think it would make more sense to have this in seconds, because
you are rounding it anyway. If somebody changes this to 120001, you'll
use 120000. At that point you can just say 120.
More importantly, perhaps, if somebody changes this to 1200, you would
be using 2000.
I think this could be RTO_MAX_DEFAULT (with a comment saying it's in
seconds) and later you can just multiply it by 1000 when you use it.
The rest of the series looks good to me now.
> #define MAX_SYNCNT 127 /* derived from kernel's limit */
>
> /* "Extended" data (not stored in the flow table) for TCP flow migration */
> @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((__unused__)) = {
>
> static const char *tcp_flag_str[] __attribute((__unused__)) = {
> "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE",
> - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS",
> + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED",
> };
>
> /* Listening sockets, used for automatic port forwarding in pasta mode only */
> @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> if (conn->flags & ACK_TO_TAP_DUE) {
> it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> } else if (conn->flags & ACK_FROM_TAP_DUE) {
> - int exp = conn->retries;
> + int exp = conn->retries, timeout = RTO_INIT;
> if (!(conn->events & ESTABLISHED))
> exp -= c->tcp.syn_linear_timeouts;
> - it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> + else if (conn->flags & SYN_RETRIED)
> + timeout = MAX(timeout, RTO_INIT_AFTER_SYN_RETRIES);
> + timeout <<= MAX(exp, 0);
> + it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max);
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> it.it_value.tv_sec = FIN_TIMEOUT;
> } else {
> @@ -2441,6 +2450,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> flow_trace(conn, "SYN timeout, retry");
> tcp_send_flag(c, conn, SYN);
> conn->retries++;
> + conn_flag(c, conn, SYN_RETRIED);
> tcp_timer_ctl(c, conn);
> }
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> @@ -2812,10 +2822,15 @@ static void tcp_get_rto_params(struct ctx *c)
> v = read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT);
> c->tcp.syn_linear_timeouts = MIN(v, MAX_SYNCNT);
>
> + v = read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT);
> + c->tcp.rto_max = MIN(DIV_ROUND_UP(v, 1000), INT_MAX);
> +
> debug("Using TCP RTO parameters, syn_retries: %"PRIu8
> - ", syn_linear_timeouts: %"PRIu8,
> + ", syn_linear_timeouts: %"PRIu8
> + ", rto_max: %d",
> c->tcp.syn_retries,
> - c->tcp.syn_linear_timeouts);
> + c->tcp.syn_linear_timeouts,
> + c->tcp.rto_max);
> }
>
> /**
> diff --git a/tcp.h b/tcp.h
> index 37d7758..6fb6f92 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref {
> * @fwd_out: Port forwarding configuration for outbound packets
> * @timer_run: Timestamp of most recent timer run
> * @pipe_size: Size of pipes for spliced connections
> + * @rto_max: Maximum retry timeout (in s)
> * @syn_retries: SYN retries using exponential backoff timeout
> * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> */
> @@ -68,6 +69,7 @@ struct tcp_ctx {
> struct fwd_ports fwd_out;
> struct timespec timer_run;
> size_t pipe_size;
> + int rto_max;
> uint8_t syn_retries;
> uint8_t syn_linear_timeouts;
> };
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 923af36..e36910c 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -77,6 +77,7 @@ struct tcp_tap_conn {
> #define ACK_TO_TAP_DUE BIT(3)
> #define ACK_FROM_TAP_DUE BIT(4)
> #define ACK_FROM_TAP_BLOCKS BIT(5)
> +#define SYN_RETRIED BIT(6)
>
> #define SNDBUF_BITS 24
> unsigned int sndbuf :SNDBUF_BITS;
--
Stefano
next prev parent reply other threads:[~2025-11-27 22:33 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-25 7:26 [PATCH v9 0/5] Retry SYNs for inbound connections Yumei Huang
2025-11-25 7:26 ` [PATCH v9 1/5] tcp: Rename "retrans" to "retries" Yumei Huang
2025-11-25 7:26 ` [PATCH v9 2/5] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-11-26 4:07 ` David Gibson
2025-11-27 22:33 ` Stefano Brivio
2025-11-25 7:26 ` [PATCH v9 3/5] tcp: Resend SYN for inbound connections Yumei Huang
2025-11-26 4:12 ` David Gibson
2025-11-25 7:26 ` [PATCH v9 4/5] tcp: Update data retransmission timeout Yumei Huang
2025-11-27 22:33 ` Stefano Brivio
2025-11-25 7:26 ` [PATCH v9 5/5] tcp: Clamp the retry timeout Yumei Huang
2025-11-26 4:15 ` David Gibson
2025-11-27 22:33 ` Stefano Brivio [this message]
2025-12-01 4:09 ` Yumei Huang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251127233342.36986b16@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=yuhuang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).