From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Yumei Huang <yuhuang@redhat.com>, passt-dev@passt.top
Subject: Re: [PATCH v8 6/6] tcp: Clamp the retry timeout
Date: Fri, 14 Nov 2025 01:01:21 +0100 [thread overview]
Message-ID: <20251114010121.10dfb18a@elisabeth> (raw)
In-Reply-To: <aRHE58-wiLcUUWC0@zatzit>
On Mon, 10 Nov 2025 21:56:39 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Mon, Nov 10, 2025 at 05:31:37PM +0800, Yumei Huang wrote:
> > Clamp the TCP retry timeout as Linux kernel does. If a retry occurs
> > during the handshake and the RTO is below 3 seconds, re-initialise
> > it to 3 seconds for data retransmissions according to RFC 6298.
> >
> > Suggested-by: Stefano Brivio <sbrivio@redhat.com>
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
>
> Looks correct, so
>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>
> A few possible improvements (mostly comments/names) below.
>
> > ---
> > tcp.c | 25 ++++++++++++++++++++-----
> > tcp.h | 2 ++
> > tcp_conn.h | 1 +
> > 3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index ee111e0..b015658 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -187,6 +187,9 @@
> > * for established connections, or (syn_retries + syn_linear_timeouts) times
> > * during the handshake, reset the connection
> > *
> > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialise RTO to this for
> > + * data retransmissions
> > + *
>
> Now that this is conditional on SYN being retried, the description
> doesn't look entirely accurate any more.
If I read the whole thing again I would actually say it becomes
misleading and worth fixing before applying this. Also because the
relationship to the text in the RFC is not obvious anymore.
This should be "if SYN retries happened during handshake, ...".
> > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> > * with TAP_FIN_SENT event), and no ACK is received within this time, reset
> > * the connection
> > @@ -340,6 +343,7 @@ enum {
> >
> > #define ACK_INTERVAL 10 /* ms */
> > #define RTO_INIT 1 /* s, RFC 6298 */
> > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */
>
> The relevance of "_ACK" in the name is not obvious to me.
What about RTO_INIT_AFTER_SYN_RETRIES? We use it only once, and it
looks fine there.
>
> > #define FIN_TIMEOUT 60
> > #define ACT_TIMEOUT 7200
> >
> -> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> >
> > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms"
> >
> > #define SYN_RETRIES_DEFAULT 6
> > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4
> > +#define RTO_MAX_MS_DEFAULT 120000
> > #define MAX_SYNCNT 127 /* derived from kernel's limit */
> >
> > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((__unused__)) = {
> >
> > static const char *tcp_flag_str[] __attribute((__unused__)) = {
> > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE",
> > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS",
> > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED",
> > };
> >
> > /* Listening sockets, used for automatic port forwarding in pasta mode only */
> > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > if (conn->flags & ACK_TO_TAP_DUE) {
> > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > - int exp = conn->retries;
> > + int exp = conn->retries, timeout = RTO_INIT;
> > if (!(conn->events & ESTABLISHED))
> > exp -= c->tcp.syn_linear_timeouts;
> > - it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> > + else if (conn->flags & SYN_RETRIED)
> > + timeout = MAX(timeout, RTO_INIT_ACK);
> > + timeout <<= MAX(exp, 0);
> > + it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max);
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > it.it_value.tv_sec = FIN_TIMEOUT;
> > } else {
> > @@ -2440,6 +2449,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > flow_trace(conn, "SYN timeout, retry");
> > tcp_send_flag(c, conn, SYN);
> > conn->retries++;
> > + conn_flag(c, conn, SYN_RETRIED);
> > tcp_timer_ctl(c, conn);
> > }
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > @@ -2811,10 +2821,15 @@ void tcp_get_rto_params(struct ctx *c)
> > v = read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT);
> > c->tcp.syn_linear_timeouts = MIN(v, MAX_SYNCNT);
> >
> > + v = read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT);
> > + c->tcp.rto_max = MIN(DIV_ROUND_CLOSEST(v, 1000), INT_MAX);
>
> Possibly we should verify this is => RTO_INIT.
As a sanity check, maybe, but I don't see any harmful effect if it's
< RTO_INIT, right? So I'm not sure if we should. No preference from my
side really.
> > +
> > debug("Read sysctl values syn_retries: %"PRIu8
> > - ", syn_linear_timeouts: %"PRIu8,
> > + ", syn_linear_timeouts: %"PRIu8
> > + ", rto_max: %d",
> > c->tcp.syn_retries,
> > - c->tcp.syn_linear_timeouts);
> > + c->tcp.syn_linear_timeouts,
> > + c->tcp.rto_max);
> > }
> >
> > /**
> > diff --git a/tcp.h b/tcp.h
> > index 37d7758..c4945c3 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref {
> > * @fwd_out: Port forwarding configuration for outbound packets
> > * @timer_run: Timestamp of most recent timer run
> > * @pipe_size: Size of pipes for spliced connections
> > + * @rto_max: Maximal retry timeout (in s)
As pointed out earlier, I think "maximum" is slightly more appropriate
here.
> > * @syn_retries: SYN retries using exponential backoff timeout
> > * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> > */
> > @@ -68,6 +69,7 @@ struct tcp_ctx {
> > struct fwd_ports fwd_out;
> > struct timespec timer_run;
> > size_t pipe_size;
> > + int rto_max;
> > uint8_t syn_retries;
> > uint8_t syn_linear_timeouts;
> > };
> > diff --git a/tcp_conn.h b/tcp_conn.h
> > index 923af36..e36910c 100644
> > --- a/tcp_conn.h
> > +++ b/tcp_conn.h
> > @@ -77,6 +77,7 @@ struct tcp_tap_conn {
> > #define ACK_TO_TAP_DUE BIT(3)
> > #define ACK_FROM_TAP_DUE BIT(4)
> > #define ACK_FROM_TAP_BLOCKS BIT(5)
> > +#define SYN_RETRIED BIT(6)
> >
> > #define SNDBUF_BITS 24
> > unsigned int sndbuf :SNDBUF_BITS;
> > --
> > 2.51.0
--
Stefano
next prev parent reply other threads:[~2025-11-14 0:01 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 9:31 [PATCH v8 0/6] Retry SYNs for inbound connections Yumei Huang
2025-11-10 9:31 ` [PATCH v8 1/6] tcp: Rename "retrans" to "retries" Yumei Huang
2025-11-10 9:31 ` [PATCH v8 2/6] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-11-14 0:01 ` Stefano Brivio
2025-11-14 1:58 ` Yumei Huang
2025-11-14 4:32 ` David Gibson
2025-11-14 10:12 ` Stefano Brivio
2025-11-17 1:10 ` Yumei Huang
2025-11-18 0:19 ` Stefano Brivio
2025-11-18 0:19 ` Stefano Brivio
2025-11-18 3:22 ` David Gibson
2025-11-19 9:04 ` Yumei Huang
2025-11-19 9:38 ` David Gibson
2025-11-10 9:31 ` [PATCH v8 3/6] tcp: Add parameter struct ctx *c to tcp_timer_ctl() Yumei Huang
2025-11-10 10:35 ` David Gibson
2025-11-14 0:01 ` Stefano Brivio
2025-11-14 0:36 ` David Gibson
2025-11-10 9:31 ` [PATCH v8 4/6] tcp: Resend SYN for inbound connections Yumei Huang
2025-11-10 10:46 ` David Gibson
2025-11-14 0:00 ` Stefano Brivio
2025-11-14 0:31 ` David Gibson
2025-11-18 0:19 ` Stefano Brivio
2025-11-10 9:31 ` [PATCH v8 5/6] tcp: Update data retransmission timeout Yumei Huang
2025-11-10 9:31 ` [PATCH v8 6/6] tcp: Clamp the retry timeout Yumei Huang
2025-11-10 10:56 ` David Gibson
2025-11-14 0:01 ` Stefano Brivio [this message]
2025-11-14 0:35 ` David Gibson
2025-11-14 3:05 ` Yumei Huang
2025-11-14 3:35 ` David Gibson
2025-11-17 2:38 ` Yumei Huang
2025-11-17 4:50 ` David Gibson
2025-11-18 0:19 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251114010121.10dfb18a@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=yuhuang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).