From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Yumei Huang <yuhuang@redhat.com>, passt-dev@passt.top
Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout
Date: Mon, 3 Nov 2025 11:38:57 +0100 [thread overview]
Message-ID: <20251103113857.47f7d008@elisabeth> (raw)
In-Reply-To: <aQgHcB4FDDJhcCZJ@zatzit>
On Mon, 3 Nov 2025 12:37:52 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Oct 31, 2025 at 01:42:42PM +0800, Yumei Huang wrote:
> > Clamp the TCP retry timeout as Linux kernel does. If RTO is less
> > than 3 seconds, re-initialize it to 3 seconds for data retransmissions
> > according to RFC 6298.
> >
> > Suggested-by: Stefano Brivio <sbrivio@redhat.com>
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > ---
> > tcp.c | 25 ++++++++++++++++++++-----
> > tcp.h | 2 ++
> > 2 files changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 96ee56a..84a6700 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -187,6 +187,9 @@
> > * for established connections, or (tcp_syn_retries +
> > * tcp_syn_linear_timeouts) times during the handshake, reset the connection
> > *
> > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO to this for
> > + * data retransmissions.
> > + *
> > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> > * with TAP_FIN_SENT event), and no ACK is received within this time, reset
> > * the connection
> > @@ -340,6 +343,7 @@ enum {
> >
> > #define ACK_INTERVAL 10 /* ms */
> > #define RTO_INIT 1 /* s, RFC 6298 */
> > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */
> > #define FIN_TIMEOUT 60
> > #define ACT_TIMEOUT 7200
> >
> > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> >
> > #define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> > #define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> > +#define TCP_RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms"
> >
> > #define TCP_SYN_RETRIES_DEFAULT 6
> > #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4
> > +#define TCP_RTO_MAX_MS_DEFAULT 120000
> >
> > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> > @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > if (conn->flags & ACK_TO_TAP_DUE) {
> > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > - int exp = conn->retries;
> > + int exp = conn->retries, timeout = RTO_INIT;
> > if (!(conn->events & ESTABLISHED))
> > exp -= c->tcp.syn_linear_timeouts;
> > - it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> > + else
> > + timeout = MAX(timeout, RTO_INIT_ACK);
>
> Possibly I missed something, since I only skimmed your discussion of
> this behaviour with Stefano. But I'm not convinced this is a correct
> interpretation of the RFC. (5.7) says "If the timer expires awaiting
> the ACK of a SYN segment ...". That is, I think it's only suggesting
> increasing the RTO to 3 at the data stage *if* we had at least one
> retry during the handshake.
Oops, true, my bad.
I guess I suggested the interpretation as of v7 because I just skimmed
Appendix A of RFC 6298 whose main function is to justify the reasons
behind lowering the initial timeout to one second, and I thought these
reason simply don't apply to the established phase, so we use three
seconds there.
But that's clearly not the case, hence the "When this happens" clause
in the middle of Appendix A.
> That is, unfortunately, much fiddlier to
> implement, since we need to remember what happened during the
> handshake to apply it here.
Hmm,
---
diff --git a/tcp.c b/tcp.c
index c1eb5de..90c3ca1 100644
--- a/tcp.c
+++ b/tcp.c
@@ -397,7 +397,7 @@ static const char *tcp_state_str[] __attribute((__unused__)) = {
static const char *tcp_flag_str[] __attribute((__unused__)) = {
"STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE",
- "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS",
+ "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED",
};
/* Listening sockets, used for automatic port forwarding in pasta mode only */
@@ -597,7 +597,7 @@ static void tcp_timer_ctl(struct tcp_tap_conn *conn)
int exp = conn->retries, timeout = RTO_INIT;
if (!(conn->events & ESTABLISHED))
exp -= c->tcp.syn_linear_timeouts;
- else
+ else if (conn->flags & SYN_RETRIED)
timeout = MAX(timeout, RTO_INIT_ACK);
timeout <<= MAX(exp, 0);
it.it_value.tv_sec = MIN(timeout, c->tcp.tcp_rto_max);
@@ -2446,6 +2446,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
flow_trace(conn, "SYN timeout, retry");
tcp_send_flag(c, conn, SYN);
conn->retries++;
+ conn_flag(c, conn, SYN_RETRIED);
tcp_timer_ctl(c, conn);
}
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
diff --git a/tcp_conn.h b/tcp_conn.h
index c006d56..87f4a2d 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -77,6 +77,7 @@ struct tcp_tap_conn {
#define ACK_TO_TAP_DUE BIT(3)
#define ACK_FROM_TAP_DUE BIT(4)
#define ACK_FROM_TAP_BLOCKS BIT(5)
+#define SYN_RETRIED BIT(6)
#define SNDBUF_BITS 24
unsigned int sndbuf :SNDBUF_BITS;
---
?
> Additionally, if I'm reading the RFC correctly, it's treating this as
> a one-time adjustment of the RTO, which won't necessarily remain the
> case for the entire data phase. Here this minimum will apply for the
> entire data phase.
But it's the initial RTO (see Appendix A, which states it clearly), and
any exponentiation is based on the initial value, so this should fit
the requirement.
> Even though it's a "MUST" in the RFC, I kind of think we could just
> skip this for two reasons:
>
> 1) We already don't bother with RTT measurements, which the RFC
> assumes the implementation is doing to adjust the RTO.
Kind of:
(2.1) Until a round-trip time (RTT) measurement has been made for a
segment sent between the sender and receiver, the sender SHOULD
set RTO <- 1 second
and given that this condition (no round-trip time measurement done yet)
is explicitly considered, I guess we can reasonably expect TCP stacks we
might be talking to to be fully compatible with what we're doing, as
long as we stick to the RFC.
> 2) We expect to be talking to a guest. The chance of a high RTT is
> vanishingly small compared to a path to potentially any host on
> the 2011 internet.
...until we move the guest / container and we implement a TCP transport
(somewhat overdue) in place of the existing tap / UNIX domain /
vhost-user connection.
The chance is still small and the consequences of ignoring this part of
the RFC are, I'm fairly sure, negligible, but if it's that easy to
implement, should we really depart from it?
--
Stefano
next prev parent reply other threads:[~2025-11-03 10:39 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-31 5:42 [PATCH v7 0/5] Retry SYNs for inbound connections Yumei Huang
2025-10-31 5:42 ` [PATCH v7 1/5] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-31 5:42 ` [PATCH v7 2/5] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-11-03 0:53 ` David Gibson
2025-10-31 5:42 ` [PATCH v7 3/5] tcp: Resend SYN for inbound connections Yumei Huang
2025-11-03 1:09 ` David Gibson
2025-11-03 2:31 ` Yumei Huang
2025-11-03 9:01 ` David Gibson
2025-11-04 4:42 ` Stefano Brivio
2025-11-04 4:42 ` Stefano Brivio
2025-11-05 4:24 ` David Gibson
2025-11-05 7:00 ` Stefano Brivio
2025-11-07 9:56 ` Yumei Huang
2025-11-07 10:05 ` Stefano Brivio
2025-11-10 2:52 ` Yumei Huang
2025-11-10 4:25 ` David Gibson
2025-10-31 5:42 ` [PATCH v7 4/5] tcp: Update data retransmission timeout Yumei Huang
2025-10-31 5:56 ` Yumei Huang
2025-10-31 8:04 ` Stefano Brivio
2025-11-03 1:18 ` David Gibson
2025-11-03 2:57 ` Yumei Huang
2025-11-03 9:32 ` David Gibson
2025-11-04 4:42 ` Stefano Brivio
2025-11-05 4:19 ` David Gibson
2025-10-31 5:42 ` [PATCH v7 5/5] tcp: Clamp the retry timeout Yumei Huang
2025-10-31 8:38 ` Stefano Brivio
2025-11-03 3:11 ` Yumei Huang
2025-11-03 9:37 ` David Gibson
2025-11-03 10:55 ` Stefano Brivio
2025-11-03 1:37 ` David Gibson
2025-11-03 4:06 ` Yumei Huang
2025-11-03 10:38 ` Stefano Brivio [this message]
2025-11-05 4:22 ` David Gibson
2025-11-04 4:42 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251103113857.47f7d008@elisabeth \
--to=sbrivio@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=yuhuang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).