* [PATCH v2 0/3] Retry SYNs for inbound connections
@ 2025-10-10 7:46 Yumei Huang
2025-10-10 7:46 ` [PATCH v2 1/3] tcp: Rename "retrans" to "retries" Yumei Huang
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Yumei Huang @ 2025-10-10 7:46 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
When a client connects, SYN would be sent to guest only once. If the
guest is not connected or ready at that time, the connection will be
reset in 10s. These patches introduce the SYN retry mechanism using
the similar backoff timeout as linux kernel. Also update the data
retransmission timeout.
Yumei Huang (3):
tcp: Rename "retrans" to "retries"
tcp: Resend SYN for inbound connections
tcp: Update data retransmission timeout
tcp.c | 132 +++++++++++++++++++++++++++++++++++++++++++++--------
tcp.h | 2 +
tcp_conn.h | 12 ++---
3 files changed, 120 insertions(+), 26 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/3] tcp: Rename "retrans" to "retries"
2025-10-10 7:46 [PATCH v2 0/3] Retry SYNs for inbound connections Yumei Huang
@ 2025-10-10 7:46 ` Yumei Huang
2025-10-12 23:52 ` David Gibson
2025-10-10 7:46 ` [PATCH v2 2/3] tcp: Resend SYN for inbound connections Yumei Huang
2025-10-10 7:47 ` [PATCH v2 3/3] tcp: Update data retransmission timeout Yumei Huang
2 siblings, 1 reply; 11+ messages in thread
From: Yumei Huang @ 2025-10-10 7:46 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
Rename "retrans" to "retries" so it can be used for SYN retries.
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
tcp.c | 12 ++++++------
tcp_conn.h | 12 ++++++------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0f9e9b3..2ec4b0c 100644
--- a/tcp.c
+++ b/tcp.c
@@ -186,7 +186,7 @@
* - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
* data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
* socket and reset sequence to what was acknowledged. If this persists for
- * more than TCP_MAX_RETRANS times in a row, reset the connection
+ * more than TCP_MAX_RETRIES times in a row, reset the connection
*
* - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
* with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -1127,7 +1127,7 @@ static void tcp_update_seqack_from_tap(const struct ctx *c,
if (SEQ_LT(seq, conn->seq_to_tap))
conn_flag(c, conn, ACK_FROM_TAP_DUE);
- conn->retrans = 0;
+ conn->retries = 0;
conn->seq_ack_from_tap = seq;
}
}
@@ -2414,7 +2414,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
flow_dbg(conn, "FIN timeout");
tcp_rst(c, conn);
- } else if (conn->retrans == TCP_MAX_RETRANS) {
+ } else if (conn->retries == TCP_MAX_RETRIES) {
flow_dbg(conn, "retransmissions count exceeded");
tcp_rst(c, conn);
} else {
@@ -2423,7 +2423,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
if (!conn->wnd_from_tap)
conn->wnd_from_tap = 1; /* Zero-window probe */
- conn->retrans++;
+ conn->retries++;
if (tcp_rewind_seq(c, conn))
return;
@@ -3382,7 +3382,7 @@ static int tcp_flow_repair_opt(const struct tcp_tap_conn *conn,
int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn)
{
struct tcp_tap_transfer t = {
- .retrans = conn->retrans,
+ .retries = conn->retries,
.ws_from_tap = conn->ws_from_tap,
.ws_to_tap = conn->ws_to_tap,
.events = conn->events,
@@ -3662,7 +3662,7 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
- conn->retrans = t.retrans;
+ conn->retries = t.retries;
conn->ws_from_tap = t.ws_from_tap;
conn->ws_to_tap = t.ws_to_tap;
conn->events = t.events;
diff --git a/tcp_conn.h b/tcp_conn.h
index 38b5c54..e5c8146 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -13,7 +13,7 @@
* struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
* @f: Generic flow information
* @in_epoll: Is the connection in the epoll set?
- * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries: Number of retries occurred due to timeouts
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
@@ -38,9 +38,9 @@ struct tcp_tap_conn {
bool in_epoll :1;
-#define TCP_RETRANS_BITS 3
- unsigned int retrans :TCP_RETRANS_BITS;
-#define TCP_MAX_RETRANS MAX_FROM_BITS(TCP_RETRANS_BITS)
+#define TCP_RETRIES_BITS 3
+ unsigned int retries :TCP_RETRIES_BITS;
+#define TCP_MAX_RETRIES MAX_FROM_BITS(TCP_RETRIES_BITS)
#define TCP_WS_BITS 4 /* RFC 7323 */
#define TCP_WS_MAX 14
@@ -102,7 +102,7 @@ struct tcp_tap_conn {
* struct tcp_tap_transfer - Migrated TCP data, flow table part, network order
* @pif: Interfaces for each side of the flow
* @side: Addresses and ports for each side of the flow
- * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries: Number of retries occurred due to timeouts
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @events: Connection events, implying connection states
@@ -122,7 +122,7 @@ struct tcp_tap_transfer {
uint8_t pif[SIDES];
struct flowside side[SIDES];
- uint8_t retrans;
+ uint8_t retries;
uint8_t ws_from_tap;
uint8_t ws_to_tap;
uint8_t events;
--
2.47.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-10 7:46 [PATCH v2 0/3] Retry SYNs for inbound connections Yumei Huang
2025-10-10 7:46 ` [PATCH v2 1/3] tcp: Rename "retrans" to "retries" Yumei Huang
@ 2025-10-10 7:46 ` Yumei Huang
2025-10-13 0:18 ` David Gibson
2025-10-10 7:47 ` [PATCH v2 3/3] tcp: Update data retransmission timeout Yumei Huang
2 siblings, 1 reply; 11+ messages in thread
From: Yumei Huang @ 2025-10-10 7:46 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
If a client connects while guest is not connected or ready yet,
resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
tcp.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
tcp.h | 2 ++
2 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/tcp.c b/tcp.c
index 2ec4b0c..85bbdac 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,9 +179,11 @@
*
* Timeouts are implemented by means of timerfd timers, set based on flags:
*
- * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
- * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
- * connection
+ * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
+ * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
+ * SYN. It's the starting timeout for the first SYN retry. If this persists
+ * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
+ * tcp_syn_linear_timeouts) times in a row, reset the connection
*
* - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
* data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
@@ -309,6 +311,7 @@
#include "tcp_internal.h"
#include "tcp_buf.h"
#include "tcp_vu.h"
+#include "lineread.h"
/*
* The size of TCP header (including options) is given by doff (Data Offset)
@@ -340,7 +343,7 @@ enum {
#define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */
-#define SYN_TIMEOUT 10 /* s */
+#define SYN_TIMEOUT_INIT 1 /* s */
#define ACK_TIMEOUT 2
#define FIN_TIMEOUT 60
#define ACT_TIMEOUT 7200
@@ -365,6 +368,10 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES_SYSCTL "/proc/sys/net/ipv4/tcp_syn_retries"
+#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL \
+ "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
+
/* "Extended" data (not stored in the flow table) for TCP flow migration */
static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +588,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
if (conn->flags & ACK_TO_TAP_DUE) {
it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
} else if (conn->flags & ACK_FROM_TAP_DUE) {
- if (!(conn->events & ESTABLISHED))
- it.it_value.tv_sec = SYN_TIMEOUT;
+ if (!(conn->events & ESTABLISHED)) {
+ if (conn->retries < c->tcp.syn_linear_timeouts)
+ it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+ else
+ it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+ (conn->retries - c->tcp.syn_linear_timeouts);
+ }
else
it.it_value.tv_sec = ACK_TIMEOUT;
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
@@ -1961,6 +1973,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
}
tcp_send_flag(c, conn, ACK);
+ conn->retries = 0;
/* The client might have sent data already, which we didn't
* dequeue waiting for SYN,ACK from tap -- check now.
@@ -2409,8 +2422,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
tcp_timer_ctl(c, conn);
} else if (conn->flags & ACK_FROM_TAP_DUE) {
if (!(conn->events & ESTABLISHED)) {
- flow_dbg(conn, "handshake timeout");
- tcp_rst(c, conn);
+ if (conn->retries == MIN(TCP_MAX_RETRIES,
+ (c->tcp.tcp_syn_retries + c->tcp.tcp_syn_retries))) {
+ flow_dbg(conn, "handshake timeout");
+ tcp_rst(c, conn);
+ } else {
+ flow_dbg(conn, "SYN timeout, retry");
+ tcp_send_flag(c, conn, SYN);
+ conn->retries++;
+ tcp_timer_ctl(c, conn);
+ }
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
flow_dbg(conn, "FIN timeout");
tcp_rst(c, conn);
@@ -2766,6 +2787,62 @@ static socklen_t tcp_probe_tcp_info(void)
return sl;
}
+/**
+ * get_tcp_syn_param() - Read SYN parameters from /proc/sys
+ * @path: Path to the sysctl file
+ * @fallback: Default value if file can't be read
+ *
+ * Return: Parameter value, fallback on failure
+*/
+int get_tcp_syn_param(const char *path, int fallback)
+{
+ char *line, *end;
+ struct lineread lr;
+ long value;
+ ssize_t len;
+ int fd;
+
+ fd = open(path, O_RDONLY | O_CLOEXEC);
+ if (fd < 0) {
+ debug("Unable to open %s", path);
+ return fallback;
+ }
+
+ lineread_init(&lr, fd);
+ len = lineread_get(&lr, &line);
+ close(fd);
+
+ if (len < 0) {
+ debug("Unable to read %s", path);
+ return fallback;
+ }
+
+ errno = 0;
+ value = strtol(line, &end, 10);
+ if (*end && *end != '\n') {
+ debug("Invalid format in %s", path);
+ return fallback;
+ }
+ if (errno || value < 0 || value > INT_MAX) {
+ debug("Invalid value in %s: %ld", path, value);
+ return fallback;
+ }
+ return (int)value;
+}
+
+/**
+ * tcp_syn_params_init() - Get initial syn params for inbound connection
+ * @c: Execution context
+*/
+void tcp_syn_params_init(struct ctx *c)
+{
+ c->tcp.tcp_syn_retries = get_tcp_syn_param(TCP_SYN_RETRIES_SYSCTL, 8);
+ c->tcp.syn_linear_timeouts =
+ get_tcp_syn_param(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
+ debug("TCP SYN parameters: retries=%d, linear_timeouts=%d",
+ c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
+}
+
/**
* tcp_init() - Get initial sequence, hash secret, initialise per-socket data
* @c: Execution context
@@ -2776,6 +2853,8 @@ int tcp_init(struct ctx *c)
{
ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c);
+
tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
diff --git a/tcp.h b/tcp.h
index 234a803..df699a4 100644
--- a/tcp.h
+++ b/tcp.h
@@ -65,6 +65,8 @@ struct tcp_ctx {
struct fwd_ports fwd_out;
struct timespec timer_run;
size_t pipe_size;
+ uint8_t tcp_syn_retries;
+ uint8_t syn_linear_timeouts;
};
#endif /* TCP_H */
--
2.47.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 3/3] tcp: Update data retransmission timeout
2025-10-10 7:46 [PATCH v2 0/3] Retry SYNs for inbound connections Yumei Huang
2025-10-10 7:46 ` [PATCH v2 1/3] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-10 7:46 ` [PATCH v2 2/3] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-10 7:47 ` Yumei Huang
2025-10-13 0:29 ` David Gibson
2 siblings, 1 reply; 11+ messages in thread
From: Yumei Huang @ 2025-10-10 7:47 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
Use an exponential backoff timeout with the initial timeout 1s
and total timeout 60s.
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
tcp.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/tcp.c b/tcp.c
index 85bbdac..0db7f30 100644
--- a/tcp.c
+++ b/tcp.c
@@ -185,10 +185,15 @@
* for more than TCP_MAX_RETRIES or (tcp_syn_retries +
* tcp_syn_linear_timeouts) times in a row, reset the connection
*
- * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
- * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
- * socket and reset sequence to what was acknowledged. If this persists for
- * more than TCP_MAX_RETRIES times in a row, reset the connection
+ * - ACK_TIMEOUT_INIT: if no ACK segment was received from tap/guest, after
+ * sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
+ * from the socket and reset sequence to what was acknowledged. It's the
+ * starting timeout for the first retransmission. If this persists for more
+ * than TCP_MAX_RETRIES times in a row, reset the connection
+ *
+ * - ACK_TIMEOUT_TOTAL: if no ACK segment was received from tap/guest after
+ * retransmitting data repeatedly from the socket within this time, reset
+ * the connection
*
* - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
* with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -344,7 +349,8 @@ enum {
#define ACK_INTERVAL 10 /* ms */
#define SYN_TIMEOUT_INIT 1 /* s */
-#define ACK_TIMEOUT 2
+#define ACK_TIMEOUT_INIT 1
+#define ACK_TIMEOUT_TOTAL 60
#define FIN_TIMEOUT 60
#define ACT_TIMEOUT 7200
@@ -358,6 +364,9 @@ enum {
((conn)->events & (SOCK_FIN_RCVD | TAP_FIN_RCVD)))
#define CONN_HAS(conn, set) (((conn)->events & (set)) == (set))
+#define RETRY_ELAPSED(timeout_init, retries) \
+ ((timeout_init) * ((1 << ((retries) + 1)) - 2))
+
/* Buffers to migrate pending data from send and receive queues. No, they don't
* use memory if we don't use them. And we're going away after this, so splurge.
*/
@@ -596,7 +605,7 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
(conn->retries - c->tcp.syn_linear_timeouts);
}
else
- it.it_value.tv_sec = ACK_TIMEOUT;
+ it.it_value.tv_sec = ACK_TIMEOUT_INIT << conn->retries;
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
it.it_value.tv_sec = FIN_TIMEOUT;
} else {
@@ -2438,6 +2447,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
} else if (conn->retries == TCP_MAX_RETRIES) {
flow_dbg(conn, "retransmissions count exceeded");
tcp_rst(c, conn);
+ } else if(RETRY_ELAPSED(ACK_TIMEOUT_INIT, conn->retries) >=
+ ACK_TIMEOUT_TOTAL) {
+ flow_dbg(conn, "retransmissions timeout exceeded");
+ tcp_rst(c, conn);
} else {
flow_dbg(conn, "ACK timeout, retry");
--
2.47.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/3] tcp: Rename "retrans" to "retries"
2025-10-10 7:46 ` [PATCH v2 1/3] tcp: Rename "retrans" to "retries" Yumei Huang
@ 2025-10-12 23:52 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2025-10-12 23:52 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 5029 bytes --]
On Fri, Oct 10, 2025 at 03:46:58PM +0800, Yumei Huang wrote:
> Rename "retrans" to "retries" so it can be used for SYN retries.
>
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
I think the new name is clearer, although AFAICT the RFCs mostly seem
to refer to both data retransmits and SYN retries as "retransmits".
> ---
> tcp.c | 12 ++++++------
> tcp_conn.h | 12 ++++++------
> 2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 0f9e9b3..2ec4b0c 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -186,7 +186,7 @@
> * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> * socket and reset sequence to what was acknowledged. If this persists for
> - * more than TCP_MAX_RETRANS times in a row, reset the connection
> + * more than TCP_MAX_RETRIES times in a row, reset the connection
> *
> * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> * with TAP_FIN_SENT event), and no ACK is received within this time, reset
> @@ -1127,7 +1127,7 @@ static void tcp_update_seqack_from_tap(const struct ctx *c,
> if (SEQ_LT(seq, conn->seq_to_tap))
> conn_flag(c, conn, ACK_FROM_TAP_DUE);
>
> - conn->retrans = 0;
> + conn->retries = 0;
> conn->seq_ack_from_tap = seq;
> }
> }
> @@ -2414,7 +2414,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> flow_dbg(conn, "FIN timeout");
> tcp_rst(c, conn);
> - } else if (conn->retrans == TCP_MAX_RETRANS) {
> + } else if (conn->retries == TCP_MAX_RETRIES) {
> flow_dbg(conn, "retransmissions count exceeded");
> tcp_rst(c, conn);
> } else {
> @@ -2423,7 +2423,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> if (!conn->wnd_from_tap)
> conn->wnd_from_tap = 1; /* Zero-window probe */
>
> - conn->retrans++;
> + conn->retries++;
> if (tcp_rewind_seq(c, conn))
> return;
>
> @@ -3382,7 +3382,7 @@ static int tcp_flow_repair_opt(const struct tcp_tap_conn *conn,
> int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn)
> {
> struct tcp_tap_transfer t = {
> - .retrans = conn->retrans,
> + .retries = conn->retries,
> .ws_from_tap = conn->ws_from_tap,
> .ws_to_tap = conn->ws_to_tap,
> .events = conn->events,
> @@ -3662,7 +3662,7 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
> memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
> conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
>
> - conn->retrans = t.retrans;
> + conn->retries = t.retries;
> conn->ws_from_tap = t.ws_from_tap;
> conn->ws_to_tap = t.ws_to_tap;
> conn->events = t.events;
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 38b5c54..e5c8146 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -13,7 +13,7 @@
> * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
> * @f: Generic flow information
> * @in_epoll: Is the connection in the epoll set?
> - * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
> + * @retries: Number of retries occurred due to timeouts
> * @ws_from_tap: Window scaling factor advertised from tap/guest
> * @ws_to_tap: Window scaling factor advertised to tap/guest
> * @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
> @@ -38,9 +38,9 @@ struct tcp_tap_conn {
>
> bool in_epoll :1;
>
> -#define TCP_RETRANS_BITS 3
> - unsigned int retrans :TCP_RETRANS_BITS;
> -#define TCP_MAX_RETRANS MAX_FROM_BITS(TCP_RETRANS_BITS)
> +#define TCP_RETRIES_BITS 3
> + unsigned int retries :TCP_RETRIES_BITS;
> +#define TCP_MAX_RETRIES MAX_FROM_BITS(TCP_RETRIES_BITS)
>
> #define TCP_WS_BITS 4 /* RFC 7323 */
> #define TCP_WS_MAX 14
> @@ -102,7 +102,7 @@ struct tcp_tap_conn {
> * struct tcp_tap_transfer - Migrated TCP data, flow table part, network order
> * @pif: Interfaces for each side of the flow
> * @side: Addresses and ports for each side of the flow
> - * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
> + * @retries: Number of retries occurred due to timeouts
> * @ws_from_tap: Window scaling factor advertised from tap/guest
> * @ws_to_tap: Window scaling factor advertised to tap/guest
> * @events: Connection events, implying connection states
> @@ -122,7 +122,7 @@ struct tcp_tap_transfer {
> uint8_t pif[SIDES];
> struct flowside side[SIDES];
>
> - uint8_t retrans;
> + uint8_t retries;
> uint8_t ws_from_tap;
> uint8_t ws_to_tap;
> uint8_t events;
> --
> 2.47.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-10 7:46 ` [PATCH v2 2/3] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-13 0:18 ` David Gibson
2025-10-13 10:22 ` Yumei Huang
0 siblings, 1 reply; 11+ messages in thread
From: David Gibson @ 2025-10-13 0:18 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 8034 bytes --]
On Fri, Oct 10, 2025 at 03:46:59PM +0800, Yumei Huang wrote:
> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after 10 seconds.
>
> Use the same backoff calculation for the timeout as linux kernel.
>
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
> tcp.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> tcp.h | 2 ++
> 2 files changed, 89 insertions(+), 8 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 2ec4b0c..85bbdac 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -179,9 +179,11 @@
> *
> * Timeouts are implemented by means of timerfd timers, set based on flags:
> *
> - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> - * connection
> + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> + * SYN. It's the starting timeout for the first SYN retry. If this persists
> + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> + * tcp_syn_linear_timeouts) times in a row, reset the connection
> *
> * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> @@ -309,6 +311,7 @@
> #include "tcp_internal.h"
> #include "tcp_buf.h"
> #include "tcp_vu.h"
> +#include "lineread.h"
>
> /*
> * The size of TCP header (including options) is given by doff (Data Offset)
> @@ -340,7 +343,7 @@ enum {
> #define WINDOW_DEFAULT 14600 /* RFC 6928 */
>
> #define ACK_INTERVAL 10 /* ms */
> -#define SYN_TIMEOUT 10 /* s */
> +#define SYN_TIMEOUT_INIT 1 /* s */
> #define ACK_TIMEOUT 2
As we've discussed in some places, the RFCs largely treat SYN retries
the same as data retransmits. ACK_TIMEOUT controls the latter. 2s is
a little odd, IIRC the RFC suggests 3, I'm not sure what Linux does by
default. In any case, I'm wondering if we should use a single define
for the initial value of both timeouts.
> #define FIN_TIMEOUT 60
> #define ACT_TIMEOUT 7200
> @@ -365,6 +368,10 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
>
> #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
>
> +#define TCP_SYN_RETRIES_SYSCTL "/proc/sys/net/ipv4/tcp_syn_retries"
> +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL \
> + "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> +
> /* "Extended" data (not stored in the flow table) for TCP flow migration */
> static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
>
> @@ -581,8 +588,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> if (conn->flags & ACK_TO_TAP_DUE) {
> it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> } else if (conn->flags & ACK_FROM_TAP_DUE) {
> - if (!(conn->events & ESTABLISHED))
> - it.it_value.tv_sec = SYN_TIMEOUT;
> + if (!(conn->events & ESTABLISHED)) {
> + if (conn->retries < c->tcp.syn_linear_timeouts)
> + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> + else
> + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> + (conn->retries - c->tcp.syn_linear_timeouts);
> + }
> else
> it.it_value.tv_sec = ACK_TIMEOUT;
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> @@ -1961,6 +1973,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
> }
>
> tcp_send_flag(c, conn, ACK);
> + conn->retries = 0;
I know Stefano said you needed to add this, but on a closer
examination I don't think you do. tcp_tap_handler() calls
tcp_update_seqack_from_tap() if !ESTABLISHED. That will clear
retrans/retries if the ack number from the guest has advanced. That
will occur when the guest SYN-ACKs our SYN, which is exactly the point
at which we should clear the retries count.
>
> /* The client might have sent data already, which we didn't
> * dequeue waiting for SYN,ACK from tap -- check now.
> @@ -2409,8 +2422,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> tcp_timer_ctl(c, conn);
> } else if (conn->flags & ACK_FROM_TAP_DUE) {
> if (!(conn->events & ESTABLISHED)) {
> - flow_dbg(conn, "handshake timeout");
> - tcp_rst(c, conn);
> + if (conn->retries == MIN(TCP_MAX_RETRIES,
> + (c->tcp.tcp_syn_retries + c->tcp.tcp_syn_retries))) {
Should this be tcp_syn_retries + linear_timeouts?
Also, probably safer to use a >= rather than == - if we somehow send
an extra retry, we don't want to then retry forever.
> + flow_dbg(conn, "handshake timeout");
> + tcp_rst(c, conn);
> + } else {
> + flow_dbg(conn, "SYN timeout, retry");
> + tcp_send_flag(c, conn, SYN);
> + conn->retries++;
> + tcp_timer_ctl(c, conn);
> + }
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> flow_dbg(conn, "FIN timeout");
> tcp_rst(c, conn);
> @@ -2766,6 +2787,62 @@ static socklen_t tcp_probe_tcp_info(void)
> return sl;
> }
>
> +/**
> + * get_tcp_syn_param() - Read SYN parameters from /proc/sys
> + * @path: Path to the sysctl file
> + * @fallback: Default value if file can't be read
> + *
> + * Return: Parameter value, fallback on failure
> +*/
> +int get_tcp_syn_param(const char *path, int fallback)
I wonder if it might be worth making a new function in util.c to read
a file containing a single number.
> +{
> + char *line, *end;
> + struct lineread lr;
> + long value;
> + ssize_t len;
> + int fd;
> +
> + fd = open(path, O_RDONLY | O_CLOEXEC);
> + if (fd < 0) {
> + debug("Unable to open %s", path);
> + return fallback;
> + }
> +
> + lineread_init(&lr, fd);
> + len = lineread_get(&lr, &line);
> + close(fd);
> +
> + if (len < 0) {
> + debug("Unable to read %s", path);
> + return fallback;
> + }
> +
> + errno = 0;
> + value = strtol(line, &end, 10);
> + if (*end && *end != '\n') {
> + debug("Invalid format in %s", path);
> + return fallback;
> + }
> + if (errno || value < 0 || value > INT_MAX) {
> + debug("Invalid value in %s: %ld", path, value);
> + return fallback;
> + }
> + return (int)value;
You return an (int) here, but store the value into a uint8_t in the
caller. That will cause unexpected results if the value you read is
greater than 255. Somewhere you need to check for this and clamp the
value. I'd suggest the function actually reading /proc return a long
long (for reuseability), then clamp it in the caller.
> +}
> +
> +/**
> + * tcp_syn_params_init() - Get initial syn params for inbound connection
> + * @c: Execution context
> +*/
> +void tcp_syn_params_init(struct ctx *c)
> +{
> + c->tcp.tcp_syn_retries = get_tcp_syn_param(TCP_SYN_RETRIES_SYSCTL, 8);
> + c->tcp.syn_linear_timeouts =
> + get_tcp_syn_param(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
> + debug("TCP SYN parameters: retries=%d, linear_timeouts=%d",
> + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +}
> +
> /**
> * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> * @c: Execution context
> @@ -2776,6 +2853,8 @@ int tcp_init(struct ctx *c)
> {
> ASSERT(!c->no_tcp);
>
> + tcp_syn_params_init(c);
> +
> tcp_sock_iov_init(c);
>
> memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> diff --git a/tcp.h b/tcp.h
> index 234a803..df699a4 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -65,6 +65,8 @@ struct tcp_ctx {
> struct fwd_ports fwd_out;
> struct timespec timer_run;
> size_t pipe_size;
> + uint8_t tcp_syn_retries;
> + uint8_t syn_linear_timeouts;
> };
>
> #endif /* TCP_H */
> --
> 2.47.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 3/3] tcp: Update data retransmission timeout
2025-10-10 7:47 ` [PATCH v2 3/3] tcp: Update data retransmission timeout Yumei Huang
@ 2025-10-13 0:29 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2025-10-13 0:29 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 3850 bytes --]
On Fri, Oct 10, 2025 at 03:47:00PM +0800, Yumei Huang wrote:
> Use an exponential backoff timeout with the initial timeout 1s
> and total timeout 60s.
The commit message needs more information on why making this change to
behaviour is desirable (e.g. referencing RFCs).
>
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
> tcp.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 85bbdac..0db7f30 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -185,10 +185,15 @@
> * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> * tcp_syn_linear_timeouts) times in a row, reset the connection
> *
> - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> - * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> - * socket and reset sequence to what was acknowledged. If this persists for
> - * more than TCP_MAX_RETRIES times in a row, reset the connection
> + * - ACK_TIMEOUT_INIT: if no ACK segment was received from tap/guest, after
> + * sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
> + * from the socket and reset sequence to what was acknowledged. It's the
> + * starting timeout for the first retransmission. If this persists for more
> + * than TCP_MAX_RETRIES times in a row, reset the connection
> + *
> + * - ACK_TIMEOUT_TOTAL: if no ACK segment was received from tap/guest after
> + * retransmitting data repeatedly from the socket within this time, reset
> + * the connection
> *
> * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> * with TAP_FIN_SENT event), and no ACK is received within this time, reset
> @@ -344,7 +349,8 @@ enum {
>
> #define ACK_INTERVAL 10 /* ms */
> #define SYN_TIMEOUT_INIT 1 /* s */
> -#define ACK_TIMEOUT 2
> +#define ACK_TIMEOUT_INIT 1
> +#define ACK_TIMEOUT_TOTAL 60
> #define FIN_TIMEOUT 60
> #define ACT_TIMEOUT 7200
>
> @@ -358,6 +364,9 @@ enum {
> ((conn)->events & (SOCK_FIN_RCVD | TAP_FIN_RCVD)))
> #define CONN_HAS(conn, set) (((conn)->events & (set)) == (set))
>
> +#define RETRY_ELAPSED(timeout_init, retries) \
> + ((timeout_init) * ((1 << ((retries) + 1)) - 2))
> +
> /* Buffers to migrate pending data from send and receive queues. No, they don't
> * use memory if we don't use them. And we're going away after this, so splurge.
> */
> @@ -596,7 +605,7 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> (conn->retries - c->tcp.syn_linear_timeouts);
> }
> else
> - it.it_value.tv_sec = ACK_TIMEOUT;
> + it.it_value.tv_sec = ACK_TIMEOUT_INIT << conn->retries;
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> it.it_value.tv_sec = FIN_TIMEOUT;
> } else {
> @@ -2438,6 +2447,10 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> } else if (conn->retries == TCP_MAX_RETRIES) {
> flow_dbg(conn, "retransmissions count exceeded");
> tcp_rst(c, conn);
> + } else if(RETRY_ELAPSED(ACK_TIMEOUT_INIT, conn->retries) >=
> + ACK_TIMEOUT_TOTAL) {
> + flow_dbg(conn, "retransmissions timeout exceeded");
> + tcp_rst(c, conn);
Having a test both for number of retries and time elapsed seems
redundant. RETRY_ELAPSED is pure function of the number of retries,
so it should be possible to just have a threshold on the number of
retries, which can be calculated from the target total timeout.
> } else {
> flow_dbg(conn, "ACK timeout, retry");
>
> --
> 2.47.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-13 0:18 ` David Gibson
@ 2025-10-13 10:22 ` Yumei Huang
2025-10-13 10:31 ` Stefano Brivio
2025-10-14 0:42 ` David Gibson
0 siblings, 2 replies; 11+ messages in thread
From: Yumei Huang @ 2025-10-13 10:22 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev, sbrivio
On Mon, Oct 13, 2025 at 8:21 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Fri, Oct 10, 2025 at 03:46:59PM +0800, Yumei Huang wrote:
> > If a client connects while guest is not connected or ready yet,
> > resend SYN instead of just resetting connection after 10 seconds.
> >
> > Use the same backoff calculation for the timeout as linux kernel.
> >
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > ---
> > tcp.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > tcp.h | 2 ++
> > 2 files changed, 89 insertions(+), 8 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 2ec4b0c..85bbdac 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -179,9 +179,11 @@
> > *
> > * Timeouts are implemented by means of timerfd timers, set based on flags:
> > *
> > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > - * connection
> > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > + * SYN. It's the starting timeout for the first SYN retry. If this persists
> > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > + * tcp_syn_linear_timeouts) times in a row, reset the connection
> > *
> > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > @@ -309,6 +311,7 @@
> > #include "tcp_internal.h"
> > #include "tcp_buf.h"
> > #include "tcp_vu.h"
> > +#include "lineread.h"
> >
> > /*
> > * The size of TCP header (including options) is given by doff (Data Offset)
> > @@ -340,7 +343,7 @@ enum {
> > #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> >
> > #define ACK_INTERVAL 10 /* ms */
> > -#define SYN_TIMEOUT 10 /* s */
> > +#define SYN_TIMEOUT_INIT 1 /* s */
> > #define ACK_TIMEOUT 2
>
> As we've discussed in some places, the RFCs largely treat SYN retries
> the same as data retransmits. ACK_TIMEOUT controls the latter. 2s is
> a little odd, IIRC the RFC suggests 3, I'm not sure what Linux does by
> default. In any case, I'm wondering if we should use a single define
> for the initial value of both timeouts.
Well, as Stefano pointed out, according to Appendix A in RFC 6298, the
initial value is suggested to be one second.
BTW, as this is for data retransmits which is addressed by the third
patch, I kept it not changed in this second patch. I agree we could
use one single define for both timeouts. Just not sure how we are
supposed to split the patches. Maybe we could remove the
SYN_TIMEOUT_INIT in the third patch? What do you think?
>
> > #define FIN_TIMEOUT 60
> > #define ACT_TIMEOUT 7200
> > @@ -365,6 +368,10 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> >
> > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
> >
> > +#define TCP_SYN_RETRIES_SYSCTL "/proc/sys/net/ipv4/tcp_syn_retries"
> > +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL \
> > + "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> > +
> > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> >
> > @@ -581,8 +588,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > if (conn->flags & ACK_TO_TAP_DUE) {
> > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > - if (!(conn->events & ESTABLISHED))
> > - it.it_value.tv_sec = SYN_TIMEOUT;
> > + if (!(conn->events & ESTABLISHED)) {
> > + if (conn->retries < c->tcp.syn_linear_timeouts)
> > + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > + else
> > + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > + (conn->retries - c->tcp.syn_linear_timeouts);
> > + }
> > else
> > it.it_value.tv_sec = ACK_TIMEOUT;
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > @@ -1961,6 +1973,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
> > }
> >
> > tcp_send_flag(c, conn, ACK);
> > + conn->retries = 0;
>
> I know Stefano said you needed to add this, but on a closer
> examination I don't think you do. tcp_tap_handler() calls
> tcp_update_seqack_from_tap() if !ESTABLISHED. That will clear
> retrans/retries if the ack number from the guest has advanced. That
> will occur when the guest SYN-ACKs our SYN, which is exactly the point
> at which we should clear the retries count.
I see, thanks. Will remove it in v3.
>
> >
> > /* The client might have sent data already, which we didn't
> > * dequeue waiting for SYN,ACK from tap -- check now.
> > @@ -2409,8 +2422,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > tcp_timer_ctl(c, conn);
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > if (!(conn->events & ESTABLISHED)) {
> > - flow_dbg(conn, "handshake timeout");
> > - tcp_rst(c, conn);
> > + if (conn->retries == MIN(TCP_MAX_RETRIES,
> > + (c->tcp.tcp_syn_retries + c->tcp.tcp_syn_retries))) {
>
> Should this be tcp_syn_retries + linear_timeouts?
Yes, sorry, there must be a copy error. I should have double checked
before sending out.
>
> Also, probably safer to use a >= rather than == - if we somehow send
> an extra retry, we don't want to then retry forever.
>
> > + flow_dbg(conn, "handshake timeout");
> > + tcp_rst(c, conn);
> > + } else {
> > + flow_dbg(conn, "SYN timeout, retry");
> > + tcp_send_flag(c, conn, SYN);
> > + conn->retries++;
> > + tcp_timer_ctl(c, conn);
> > + }
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > flow_dbg(conn, "FIN timeout");
> > tcp_rst(c, conn);
> > @@ -2766,6 +2787,62 @@ static socklen_t tcp_probe_tcp_info(void)
> > return sl;
> > }
> >
> > +/**
> > + * get_tcp_syn_param() - Read SYN parameters from /proc/sys
> > + * @path: Path to the sysctl file
> > + * @fallback: Default value if file can't be read
> > + *
> > + * Return: Parameter value, fallback on failure
> > +*/
> > +int get_tcp_syn_param(const char *path, int fallback)
>
> I wonder if it might be worth making a new function in util.c to read
> a file containing a single number.
I can do that. So it could be reused in the future.
>
> > +{
> > + char *line, *end;
> > + struct lineread lr;
> > + long value;
> > + ssize_t len;
> > + int fd;
> > +
> > + fd = open(path, O_RDONLY | O_CLOEXEC);
> > + if (fd < 0) {
> > + debug("Unable to open %s", path);
> > + return fallback;
> > + }
> > +
> > + lineread_init(&lr, fd);
> > + len = lineread_get(&lr, &line);
> > + close(fd);
> > +
> > + if (len < 0) {
> > + debug("Unable to read %s", path);
> > + return fallback;
> > + }
> > +
> > + errno = 0;
> > + value = strtol(line, &end, 10);
> > + if (*end && *end != '\n') {
> > + debug("Invalid format in %s", path);
> > + return fallback;
> > + }
> > + if (errno || value < 0 || value > INT_MAX) {
> > + debug("Invalid value in %s: %ld", path, value);
> > + return fallback;
> > + }
> > + return (int)value;
>
> You return an (int) here, but store the value into a uint8_t in the
> caller. That will cause unexpected results if the value you read is
> greater than 255. Somewhere you need to check for this and clamp the
> value. I'd suggest the function actually reading /proc return a long
> long (for reuseability), then clamp it in the caller.
Yeah, I noticed that. Seems tcp_syn_retries should not be higher than
127 according to Documentation/networking/ip-sysctl.rst, but
tcp_syn_linear_timeouts is not limited. Anyway, I should check them.
Will update in v3.
>
> > +}
> > +
> > +/**
> > + * tcp_syn_params_init() - Get initial syn params for inbound connection
> > + * @c: Execution context
> > +*/
> > +void tcp_syn_params_init(struct ctx *c)
> > +{
> > + c->tcp.tcp_syn_retries = get_tcp_syn_param(TCP_SYN_RETRIES_SYSCTL, 8);
> > + c->tcp.syn_linear_timeouts =
> > + get_tcp_syn_param(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
> > + debug("TCP SYN parameters: retries=%d, linear_timeouts=%d",
> > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > +}
> > +
> > /**
> > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > * @c: Execution context
> > @@ -2776,6 +2853,8 @@ int tcp_init(struct ctx *c)
> > {
> > ASSERT(!c->no_tcp);
> >
> > + tcp_syn_params_init(c);
> > +
> > tcp_sock_iov_init(c);
> >
> > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > diff --git a/tcp.h b/tcp.h
> > index 234a803..df699a4 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -65,6 +65,8 @@ struct tcp_ctx {
> > struct fwd_ports fwd_out;
> > struct timespec timer_run;
> > size_t pipe_size;
> > + uint8_t tcp_syn_retries;
> > + uint8_t syn_linear_timeouts;
> > };
> >
> > #endif /* TCP_H */
> > --
> > 2.47.0
> >
>
> --
> David Gibson (he or they) | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> | around.
> http://www.ozlabs.org/~dgibson
--
Thanks,
Yumei Huang
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-13 10:22 ` Yumei Huang
@ 2025-10-13 10:31 ` Stefano Brivio
2025-10-14 0:44 ` David Gibson
2025-10-14 0:42 ` David Gibson
1 sibling, 1 reply; 11+ messages in thread
From: Stefano Brivio @ 2025-10-13 10:31 UTC (permalink / raw)
To: Yumei Huang; +Cc: David Gibson, passt-dev
On Mon, 13 Oct 2025 18:22:00 +0800
Yumei Huang <yuhuang@redhat.com> wrote:
> On Mon, Oct 13, 2025 at 8:21 AM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Oct 10, 2025 at 03:46:59PM +0800, Yumei Huang wrote:
> > >
> > > +/**
> > > + * get_tcp_syn_param() - Read SYN parameters from /proc/sys
> > > + * @path: Path to the sysctl file
> > > + * @fallback: Default value if file can't be read
> > > + *
> > > + * Return: Parameter value, fallback on failure
> > > +*/
> > > +int get_tcp_syn_param(const char *path, int fallback)
> >
> > I wonder if it might be worth making a new function in util.c to read
> > a file containing a single number.
>
> I can do that. So it could be reused in the future.
If it helps: we already have write_file(), so it might be worth adding
a read_file() symmetric to it. Maybe it's overkill though.
--
Stefano
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-13 10:22 ` Yumei Huang
2025-10-13 10:31 ` Stefano Brivio
@ 2025-10-14 0:42 ` David Gibson
1 sibling, 0 replies; 11+ messages in thread
From: David Gibson @ 2025-10-14 0:42 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 11600 bytes --]
On Mon, Oct 13, 2025 at 06:22:00PM +0800, Yumei Huang wrote:
> On Mon, Oct 13, 2025 at 8:21 AM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Oct 10, 2025 at 03:46:59PM +0800, Yumei Huang wrote:
> > > If a client connects while guest is not connected or ready yet,
> > > resend SYN instead of just resetting connection after 10 seconds.
> > >
> > > Use the same backoff calculation for the timeout as linux kernel.
> > >
> > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > ---
> > > tcp.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > > tcp.h | 2 ++
> > > 2 files changed, 89 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 2ec4b0c..85bbdac 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -179,9 +179,11 @@
> > > *
> > > * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > *
> > > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > > - * connection
> > > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > + * SYN. It's the starting timeout for the first SYN retry. If this persists
> > > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > + * tcp_syn_linear_timeouts) times in a row, reset the connection
> > > *
> > > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > @@ -309,6 +311,7 @@
> > > #include "tcp_internal.h"
> > > #include "tcp_buf.h"
> > > #include "tcp_vu.h"
> > > +#include "lineread.h"
> > >
> > > /*
> > > * The size of TCP header (including options) is given by doff (Data Offset)
> > > @@ -340,7 +343,7 @@ enum {
> > > #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> > >
> > > #define ACK_INTERVAL 10 /* ms */
> > > -#define SYN_TIMEOUT 10 /* s */
> > > +#define SYN_TIMEOUT_INIT 1 /* s */
> > > #define ACK_TIMEOUT 2
> >
> > As we've discussed in some places, the RFCs largely treat SYN retries
> > the same as data retransmits. ACK_TIMEOUT controls the latter. 2s is
> > a little odd, IIRC the RFC suggests 3, I'm not sure what Linux does by
> > default. In any case, I'm wondering if we should use a single define
> > for the initial value of both timeouts.
>
> Well, as Stefano pointed out, according to Appendix A in RFC 6298, the
> initial value is suggested to be one second.
Ah, good point. I was thinking of the old RFC which that
recommendation updates.
> BTW, as this is for data retransmits which is addressed by the third
> patch, I kept it not changed in this second patch. I agree we could
> use one single define for both timeouts. Just not sure how we are
> supposed to split the patches. Maybe we could remove the
> SYN_TIMEOUT_INIT in the third patch? What do you think?
Yes, that makes sense. I wrote the comments above before I'd looked
at patch 3.
> > > #define FIN_TIMEOUT 60
> > > #define ACT_TIMEOUT 7200
> > > @@ -365,6 +368,10 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> > >
> > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
> > >
> > > +#define TCP_SYN_RETRIES_SYSCTL "/proc/sys/net/ipv4/tcp_syn_retries"
> > > +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL \
> > > + "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> > > +
> > > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> > >
> > > @@ -581,8 +588,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > > if (conn->flags & ACK_TO_TAP_DUE) {
> > > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > - if (!(conn->events & ESTABLISHED))
> > > - it.it_value.tv_sec = SYN_TIMEOUT;
> > > + if (!(conn->events & ESTABLISHED)) {
> > > + if (conn->retries < c->tcp.syn_linear_timeouts)
> > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > > + else
> > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > > + (conn->retries - c->tcp.syn_linear_timeouts);
> > > + }
> > > else
> > > it.it_value.tv_sec = ACK_TIMEOUT;
> > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > @@ -1961,6 +1973,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c,
> > > }
> > >
> > > tcp_send_flag(c, conn, ACK);
> > > + conn->retries = 0;
> >
> > I know Stefano said you needed to add this, but on a closer
> > examination I don't think you do. tcp_tap_handler() calls
> > tcp_update_seqack_from_tap() if !ESTABLISHED. That will clear
> > retrans/retries if the ack number from the guest has advanced. That
> > will occur when the guest SYN-ACKs our SYN, which is exactly the point
> > at which we should clear the retries count.
>
> I see, thanks. Will remove it in v3.
> >
> > >
> > > /* The client might have sent data already, which we didn't
> > > * dequeue waiting for SYN,ACK from tap -- check now.
> > > @@ -2409,8 +2422,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > > tcp_timer_ctl(c, conn);
> > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > if (!(conn->events & ESTABLISHED)) {
> > > - flow_dbg(conn, "handshake timeout");
> > > - tcp_rst(c, conn);
> > > + if (conn->retries == MIN(TCP_MAX_RETRIES,
> > > + (c->tcp.tcp_syn_retries + c->tcp.tcp_syn_retries))) {
> >
> > Should this be tcp_syn_retries + linear_timeouts?
>
> Yes, sorry, there must be a copy error. I should have double checked
> before sending out.
That's what review is for :). I've certainly made sillier errors than
that in patches before.
> > Also, probably safer to use a >= rather than == - if we somehow send
> > an extra retry, we don't want to then retry forever.
> >
> > > + flow_dbg(conn, "handshake timeout");
> > > + tcp_rst(c, conn);
> > > + } else {
> > > + flow_dbg(conn, "SYN timeout, retry");
> > > + tcp_send_flag(c, conn, SYN);
> > > + conn->retries++;
> > > + tcp_timer_ctl(c, conn);
> > > + }
> > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > flow_dbg(conn, "FIN timeout");
> > > tcp_rst(c, conn);
> > > @@ -2766,6 +2787,62 @@ static socklen_t tcp_probe_tcp_info(void)
> > > return sl;
> > > }
> > >
> > > +/**
> > > + * get_tcp_syn_param() - Read SYN parameters from /proc/sys
> > > + * @path: Path to the sysctl file
> > > + * @fallback: Default value if file can't be read
> > > + *
> > > + * Return: Parameter value, fallback on failure
> > > +*/
> > > +int get_tcp_syn_param(const char *path, int fallback)
> >
> > I wonder if it might be worth making a new function in util.c to read
> > a file containing a single number.
>
> I can do that. So it could be reused in the future.
> >
> > > +{
> > > + char *line, *end;
> > > + struct lineread lr;
> > > + long value;
> > > + ssize_t len;
> > > + int fd;
> > > +
> > > + fd = open(path, O_RDONLY | O_CLOEXEC);
> > > + if (fd < 0) {
> > > + debug("Unable to open %s", path);
> > > + return fallback;
> > > + }
> > > +
> > > + lineread_init(&lr, fd);
> > > + len = lineread_get(&lr, &line);
> > > + close(fd);
> > > +
> > > + if (len < 0) {
> > > + debug("Unable to read %s", path);
> > > + return fallback;
> > > + }
> > > +
> > > + errno = 0;
> > > + value = strtol(line, &end, 10);
> > > + if (*end && *end != '\n') {
> > > + debug("Invalid format in %s", path);
> > > + return fallback;
> > > + }
> > > + if (errno || value < 0 || value > INT_MAX) {
> > > + debug("Invalid value in %s: %ld", path, value);
> > > + return fallback;
> > > + }
> > > + return (int)value;
> >
> > You return an (int) here, but store the value into a uint8_t in the
> > caller. That will cause unexpected results if the value you read is
> > greater than 255. Somewhere you need to check for this and clamp the
> > value. I'd suggest the function actually reading /proc return a long
> > long (for reuseability), then clamp it in the caller.
>
> Yeah, I noticed that. Seems tcp_syn_retries should not be higher than
> 127 according to Documentation/networking/ip-sysctl.rst, but
> tcp_syn_linear_timeouts is not limited. Anyway, I should check them.
> Will update in v3.
>
>
> >
> > > +}
> > > +
> > > +/**
> > > + * tcp_syn_params_init() - Get initial syn params for inbound connection
> > > + * @c: Execution context
> > > +*/
> > > +void tcp_syn_params_init(struct ctx *c)
> > > +{
> > > + c->tcp.tcp_syn_retries = get_tcp_syn_param(TCP_SYN_RETRIES_SYSCTL, 8);
> > > + c->tcp.syn_linear_timeouts =
> > > + get_tcp_syn_param(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
> > > + debug("TCP SYN parameters: retries=%d, linear_timeouts=%d",
> > > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > > +}
> > > +
> > > /**
> > > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > > * @c: Execution context
> > > @@ -2776,6 +2853,8 @@ int tcp_init(struct ctx *c)
> > > {
> > > ASSERT(!c->no_tcp);
> > >
> > > + tcp_syn_params_init(c);
> > > +
> > > tcp_sock_iov_init(c);
> > >
> > > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > > diff --git a/tcp.h b/tcp.h
> > > index 234a803..df699a4 100644
> > > --- a/tcp.h
> > > +++ b/tcp.h
> > > @@ -65,6 +65,8 @@ struct tcp_ctx {
> > > struct fwd_ports fwd_out;
> > > struct timespec timer_run;
> > > size_t pipe_size;
> > > + uint8_t tcp_syn_retries;
> > > + uint8_t syn_linear_timeouts;
> > > };
> > >
> > > #endif /* TCP_H */
> > > --
> > > 2.47.0
> > >
> >
> > --
> > David Gibson (he or they) | I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> > | around.
> > http://www.ozlabs.org/~dgibson
>
>
>
> --
> Thanks,
>
> Yumei Huang
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] tcp: Resend SYN for inbound connections
2025-10-13 10:31 ` Stefano Brivio
@ 2025-10-14 0:44 ` David Gibson
0 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2025-10-14 0:44 UTC (permalink / raw)
To: Stefano Brivio; +Cc: Yumei Huang, passt-dev
[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]
On Mon, Oct 13, 2025 at 12:31:25PM +0200, Stefano Brivio wrote:
> On Mon, 13 Oct 2025 18:22:00 +0800
> Yumei Huang <yuhuang@redhat.com> wrote:
>
> > On Mon, Oct 13, 2025 at 8:21 AM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Fri, Oct 10, 2025 at 03:46:59PM +0800, Yumei Huang wrote:
> > > >
> > > > +/**
> > > > + * get_tcp_syn_param() - Read SYN parameters from /proc/sys
> > > > + * @path: Path to the sysctl file
> > > > + * @fallback: Default value if file can't be read
> > > > + *
> > > > + * Return: Parameter value, fallback on failure
> > > > +*/
> > > > +int get_tcp_syn_param(const char *path, int fallback)
> > >
> > > I wonder if it might be worth making a new function in util.c to read
> > > a file containing a single number.
> >
> > I can do that. So it could be reused in the future.
>
> If it helps: we already have write_file(), so it might be worth adding
> a read_file() symmetric to it. Maybe it's overkill though.
That might be nice. Since we can't do allocations the right interface
for read_file() is slightly trickier though: we need to have a
preallocated buffer and report possible truncation.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-10-14 0:44 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-10 7:46 [PATCH v2 0/3] Retry SYNs for inbound connections Yumei Huang
2025-10-10 7:46 ` [PATCH v2 1/3] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-12 23:52 ` David Gibson
2025-10-10 7:46 ` [PATCH v2 2/3] tcp: Resend SYN for inbound connections Yumei Huang
2025-10-13 0:18 ` David Gibson
2025-10-13 10:22 ` Yumei Huang
2025-10-13 10:31 ` Stefano Brivio
2025-10-14 0:44 ` David Gibson
2025-10-14 0:42 ` David Gibson
2025-10-10 7:47 ` [PATCH v2 3/3] tcp: Update data retransmission timeout Yumei Huang
2025-10-13 0:29 ` David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).