* [PATCH v5 0/4] Retry SYNs for inbound connections
@ 2025-10-17 4:27 Yumei Huang
2025-10-17 4:27 ` [PATCH v5 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 4:27 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
When a client connects, SYN would be sent to guest only once. If the
guest is not connected or ready at that time, the connection will be
reset in 10s. These patches introduce the SYN retry mechanism using
the similar backoff timeout as linux kernel. Also update the data
retransmission timeout using the backoff timeout.
Yumei Huang (4):
tcp: Rename "retrans" to "retries"
util: Introduce read_file() and read_file_integer() function
tcp: Resend SYN for inbound connections
tcp: Update data retransmission timeout
tcp.c | 76 +++++++++++++++++++++++++++++++++++-------------
tcp.h | 5 ++++
tcp_conn.h | 12 ++++----
util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
util.h | 10 +++++++
5 files changed, 161 insertions(+), 26 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 1/4] tcp: Rename "retrans" to "retries"
2025-10-17 4:27 [PATCH v5 0/4] Retry SYNs for inbound connections Yumei Huang
@ 2025-10-17 4:27 ` Yumei Huang
2025-10-17 4:27 ` [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 4:27 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
Rename "retrans" to "retries" so it can be used for SYN retries.
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 12 ++++++------
tcp_conn.h | 12 ++++++------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0f9e9b3..2ec4b0c 100644
--- a/tcp.c
+++ b/tcp.c
@@ -186,7 +186,7 @@
* - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
* data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
* socket and reset sequence to what was acknowledged. If this persists for
- * more than TCP_MAX_RETRANS times in a row, reset the connection
+ * more than TCP_MAX_RETRIES times in a row, reset the connection
*
* - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
* with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -1127,7 +1127,7 @@ static void tcp_update_seqack_from_tap(const struct ctx *c,
if (SEQ_LT(seq, conn->seq_to_tap))
conn_flag(c, conn, ACK_FROM_TAP_DUE);
- conn->retrans = 0;
+ conn->retries = 0;
conn->seq_ack_from_tap = seq;
}
}
@@ -2414,7 +2414,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
flow_dbg(conn, "FIN timeout");
tcp_rst(c, conn);
- } else if (conn->retrans == TCP_MAX_RETRANS) {
+ } else if (conn->retries == TCP_MAX_RETRIES) {
flow_dbg(conn, "retransmissions count exceeded");
tcp_rst(c, conn);
} else {
@@ -2423,7 +2423,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
if (!conn->wnd_from_tap)
conn->wnd_from_tap = 1; /* Zero-window probe */
- conn->retrans++;
+ conn->retries++;
if (tcp_rewind_seq(c, conn))
return;
@@ -3382,7 +3382,7 @@ static int tcp_flow_repair_opt(const struct tcp_tap_conn *conn,
int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn)
{
struct tcp_tap_transfer t = {
- .retrans = conn->retrans,
+ .retries = conn->retries,
.ws_from_tap = conn->ws_from_tap,
.ws_to_tap = conn->ws_to_tap,
.events = conn->events,
@@ -3662,7 +3662,7 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
- conn->retrans = t.retrans;
+ conn->retries = t.retries;
conn->ws_from_tap = t.ws_from_tap;
conn->ws_to_tap = t.ws_to_tap;
conn->events = t.events;
diff --git a/tcp_conn.h b/tcp_conn.h
index 38b5c54..e5c8146 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -13,7 +13,7 @@
* struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
* @f: Generic flow information
* @in_epoll: Is the connection in the epoll set?
- * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries: Number of retries occurred due to timeouts
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @tap_mss: MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
@@ -38,9 +38,9 @@ struct tcp_tap_conn {
bool in_epoll :1;
-#define TCP_RETRANS_BITS 3
- unsigned int retrans :TCP_RETRANS_BITS;
-#define TCP_MAX_RETRANS MAX_FROM_BITS(TCP_RETRANS_BITS)
+#define TCP_RETRIES_BITS 3
+ unsigned int retries :TCP_RETRIES_BITS;
+#define TCP_MAX_RETRIES MAX_FROM_BITS(TCP_RETRIES_BITS)
#define TCP_WS_BITS 4 /* RFC 7323 */
#define TCP_WS_MAX 14
@@ -102,7 +102,7 @@ struct tcp_tap_conn {
* struct tcp_tap_transfer - Migrated TCP data, flow table part, network order
* @pif: Interfaces for each side of the flow
* @side: Addresses and ports for each side of the flow
- * @retrans: Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries: Number of retries occurred due to timeouts
* @ws_from_tap: Window scaling factor advertised from tap/guest
* @ws_to_tap: Window scaling factor advertised to tap/guest
* @events: Connection events, implying connection states
@@ -122,7 +122,7 @@ struct tcp_tap_transfer {
uint8_t pif[SIDES];
struct flowside side[SIDES];
- uint8_t retrans;
+ uint8_t retries;
uint8_t ws_from_tap;
uint8_t ws_to_tap;
uint8_t events;
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function
2025-10-17 4:27 [PATCH v5 0/4] Retry SYNs for inbound connections Yumei Huang
2025-10-17 4:27 ` [PATCH v5 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
@ 2025-10-17 4:27 ` Yumei Huang
2025-10-17 4:38 ` David Gibson
2025-10-17 4:27 ` [PATCH v5 3/4] tcp: Resend SYN for inbound connections Yumei Huang
2025-10-17 4:27 ` [PATCH v5 4/4] tcp: Update data retransmission timeout Yumei Huang
3 siblings, 1 reply; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 4:27 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
util.h | 10 +++++++
2 files changed, 94 insertions(+)
diff --git a/util.c b/util.c
index c492f90..ff21bf6 100644
--- a/util.c
+++ b/util.c
@@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
return len == 0 ? 0 : -1;
}
+/**
+ * read_file() - Read contents of file into a buffer
+ * @path: File to read
+ * @buf: Buffer to store file contents
+ * @buf_size: Size of buffer
+ *
+ * Return: number of bytes read on success, -1 on any error, -2 on truncation
+*/
+ssize_t read_file(const char *path, char *buf, size_t buf_size)
+{
+ int fd = open(path, O_RDONLY | O_CLOEXEC);
+ size_t total_read = 0;
+ ssize_t rc;
+
+ if (fd < 0) {
+ warn_perror("Could not open %s", path);
+ return -1;
+ }
+
+ while (total_read < buf_size) {
+ rc = read(fd, buf + total_read, buf_size - total_read);
+
+ if (rc < 0) {
+ warn_perror("Couldn't read from %s", path);
+ close(fd);
+ return -1;
+ }
+
+ if (rc == 0)
+ break;
+
+ total_read += rc;
+ }
+
+ close(fd);
+
+ if (total_read == buf_size) {
+ warn_perror("File %s truncated, buffer too small", path);
+ return -2;
+ }
+
+ buf[total_read] = '\0';
+
+ return total_read;
+}
+
+/**
+ * read_file_integer() - Read an integer value from a file
+ * @path: File to read
+ * @fallback: Default value if file can't be read
+ *
+ * Return: Integer value, fallback on failure
+*/
+intmax_t read_file_integer(const char *path, intmax_t fallback)
+{
+ char buf[INTMAX_STRLEN];
+ ssize_t bytes_read;
+ intmax_t value;
+ char *end;
+
+ bytes_read = read_file(path, buf, sizeof(buf));
+
+ if (bytes_read < 0)
+ return fallback;
+
+ if (bytes_read == 0) {
+ debug("Empty file %s", path);
+ return fallback;
+ }
+
+ errno = 0;
+ value = strtoimax(buf, &end, 10);
+ if (*end && *end != '\n') {
+ debug("Invalid format in %s", path);
+ return fallback;
+ }
+ if (errno) {
+ debug("Invalid value in %s: %s", path, buf);
+ return fallback;
+ }
+
+ return value;
+}
+
#ifdef __ia64__
/* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
* use the description from clone(2).
diff --git a/util.h b/util.h
index 22eaac5..acfbd08 100644
--- a/util.h
+++ b/util.h
@@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
int __daemon(int pidfile_fd, int devnull_fd);
int fls(unsigned long x);
int write_file(const char *path, const char *buf);
+ssize_t read_file(const char *path, char *buf, size_t buf_size);
+intmax_t read_file_integer(const char *path, intmax_t fallback);
int write_all_buf(int fd, const void *buf, size_t len);
int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
int read_all_buf(int fd, void *buf, size_t len);
@@ -250,6 +252,14 @@ static inline const char *af_name(sa_family_t af)
#define UINT16_STRLEN (sizeof("65535"))
+/* Each byte holds roughly 2.4 decimal digits:
+ * 1 bit = log10(2) ~= 0.30103 decimal digits
+ * 1 byte = 8 bits = 8 x 0.30103 ~= 2.408 decimal digits
+ * Using sizeof(intmax_t) * 3 provides a safe upper bound,
+ * plus 2 extra bytes for the sign and null terminator.
+ */
+#define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
+
/* inet address (- '\0') + port (u16) (- '\0') + ':' + '\0' */
#define SOCKADDR_INET_STRLEN \
(INET_ADDRSTRLEN-1 + UINT16_STRLEN-1 + sizeof(":"))
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 3/4] tcp: Resend SYN for inbound connections
2025-10-17 4:27 [PATCH v5 0/4] Retry SYNs for inbound connections Yumei Huang
2025-10-17 4:27 ` [PATCH v5 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-17 4:27 ` [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
@ 2025-10-17 4:27 ` Yumei Huang
2025-10-17 4:43 ` David Gibson
2025-10-17 4:27 ` [PATCH v5 4/4] tcp: Update data retransmission timeout Yumei Huang
3 siblings, 1 reply; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 4:27 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
If a client connects while guest is not connected or ready yet,
resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
tcp.h | 5 +++++
2 files changed, 52 insertions(+), 8 deletions(-)
diff --git a/tcp.c b/tcp.c
index 2ec4b0c..71e2da0 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,9 +179,11 @@
*
* Timeouts are implemented by means of timerfd timers, set based on flags:
*
- * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
- * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
- * connection
+ * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
+ * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
+ * SYN. It's the starting timeout for the first SYN retry. If this persists
+ * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
+ * tcp_syn_linear_timeouts) times in a row, reset the connection
*
* - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
* data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
@@ -340,7 +342,7 @@ enum {
#define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */
-#define SYN_TIMEOUT 10 /* s */
+#define SYN_TIMEOUT_INIT 1 /* s */
#define ACK_TIMEOUT 2
#define FIN_TIMEOUT 60
#define ACT_TIMEOUT 7200
@@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
+#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
+
/* "Extended" data (not stored in the flow table) for TCP flow migration */
static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
if (conn->flags & ACK_TO_TAP_DUE) {
it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
} else if (conn->flags & ACK_FROM_TAP_DUE) {
- if (!(conn->events & ESTABLISHED))
- it.it_value.tv_sec = SYN_TIMEOUT;
+ if (!(conn->events & ESTABLISHED)) {
+ if (conn->retries < c->tcp.syn_linear_timeouts)
+ it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+ else
+ it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+ (conn->retries - c->tcp.syn_linear_timeouts);
+ }
else
it.it_value.tv_sec = ACK_TIMEOUT;
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
@@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
tcp_timer_ctl(c, conn);
} else if (conn->flags & ACK_FROM_TAP_DUE) {
if (!(conn->events & ESTABLISHED)) {
- flow_dbg(conn, "handshake timeout");
- tcp_rst(c, conn);
+ if (conn->retries >= TCP_MAX_RETRIES ||
+ conn->retries >= (c->tcp.tcp_syn_retries +
+ c->tcp.syn_linear_timeouts)) {
+ flow_dbg(conn, "handshake timeout");
+ tcp_rst(c, conn);
+ } else {
+ flow_trace(conn, "SYN timeout, retry");
+ tcp_send_flag(c, conn, SYN);
+ conn->retries++;
+ tcp_timer_ctl(c, conn);
+ }
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
flow_dbg(conn, "FIN timeout");
tcp_rst(c, conn);
@@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
return sl;
}
+/**
+ * tcp_syn_params_init() - Get initial syn params for inbound connection
+ * @c: Execution context
+*/
+void tcp_syn_params_init(struct ctx *c)
+{
+ intmax_t tcp_syn_retries, syn_linear_timeouts;
+
+ tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
+ syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
+
+ c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
+ c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
+
+ debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
+ c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
+}
+
/**
* tcp_init() - Get initial sequence, hash secret, initialise per-socket data
* @c: Execution context
@@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
{
ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c);
+
tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
diff --git a/tcp.h b/tcp.h
index 234a803..bb58324 100644
--- a/tcp.h
+++ b/tcp.h
@@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
* @fwd_out: Port forwarding configuration for outbound packets
* @timer_run: Timestamp of most recent timer run
* @pipe_size: Size of pipes for spliced connections
+ * @tcp_syn_retries: Number of times initial SYNs during handshake
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
+ * before switching to exponential backoff timeout
*/
struct tcp_ctx {
struct fwd_ports fwd_in;
struct fwd_ports fwd_out;
struct timespec timer_run;
size_t pipe_size;
+ uint8_t tcp_syn_retries;
+ uint8_t syn_linear_timeouts;
};
#endif /* TCP_H */
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v5 4/4] tcp: Update data retransmission timeout
2025-10-17 4:27 [PATCH v5 0/4] Retry SYNs for inbound connections Yumei Huang
` (2 preceding siblings ...)
2025-10-17 4:27 ` [PATCH v5 3/4] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-17 4:27 ` Yumei Huang
3 siblings, 0 replies; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 4:27 UTC (permalink / raw)
To: passt-dev, sbrivio; +Cc: david, yuhuang
Use an exponential backoff timeout for data retransmission according
to RFC 2988 and RFC 6298. Set the initial RTO to one second as discussed
in Appendix A of RFC 6298.
Also combine the macros defining the initial RTO for both SYN and ACK.
Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/tcp.c b/tcp.c
index 71e2da0..31bba6f 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,16 +179,14 @@
*
* Timeouts are implemented by means of timerfd timers, set based on flags:
*
- * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
- * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
- * SYN. It's the starting timeout for the first SYN retry. If this persists
- * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
- * tcp_syn_linear_timeouts) times in a row, reset the connection
- *
- * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
- * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
- * socket and reset sequence to what was acknowledged. If this persists for
- * more than TCP_MAX_RETRIES times in a row, reset the connection
+ * - RTO_INIT: if no ACK segment was received from tap/guest, either during
+ * handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
+ * sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
+ * from the socket and reset sequence to what was acknowledged. This is the
+ * timeout for the first retry, in seconds. If this persists too many times
+ * in a row, reset the connection: TCP_MAX_RETRIES for established
+ * connections, or (tcp_syn_retries + tcp_syn_linear_timeouts) during the
+ * handshake.
*
* - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
* with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -342,8 +340,7 @@ enum {
#define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */
-#define SYN_TIMEOUT_INIT 1 /* s */
-#define ACK_TIMEOUT 2
+#define RTO_INIT 1 /* s, RFC 6298 */
#define FIN_TIMEOUT 60
#define ACT_TIMEOUT 7200
@@ -588,13 +585,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
} else if (conn->flags & ACK_FROM_TAP_DUE) {
if (!(conn->events & ESTABLISHED)) {
if (conn->retries < c->tcp.syn_linear_timeouts)
- it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+ it.it_value.tv_sec = RTO_INIT;
else
- it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+ it.it_value.tv_sec = RTO_INIT <<
(conn->retries - c->tcp.syn_linear_timeouts);
}
else
- it.it_value.tv_sec = ACK_TIMEOUT;
+ it.it_value.tv_sec = RTO_INIT << conn->retries;
} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
it.it_value.tv_sec = FIN_TIMEOUT;
} else {
--
2.47.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function
2025-10-17 4:27 ` [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
@ 2025-10-17 4:38 ` David Gibson
0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2025-10-17 4:38 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 4844 bytes --]
On Fri, Oct 17, 2025 at 12:27:41PM +0800, Yumei Huang wrote:
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
> util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> util.h | 10 +++++++
> 2 files changed, 94 insertions(+)
>
> diff --git a/util.c b/util.c
> index c492f90..ff21bf6 100644
> --- a/util.c
> +++ b/util.c
> @@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
> return len == 0 ? 0 : -1;
> }
>
> +/**
> + * read_file() - Read contents of file into a buffer
> + * @path: File to read
> + * @buf: Buffer to store file contents
> + * @buf_size: Size of buffer
> + *
> + * Return: number of bytes read on success, -1 on any error, -2 on truncation
> +*/
> +ssize_t read_file(const char *path, char *buf, size_t buf_size)
> +{
> + int fd = open(path, O_RDONLY | O_CLOEXEC);
> + size_t total_read = 0;
> + ssize_t rc;
> +
> + if (fd < 0) {
> + warn_perror("Could not open %s", path);
> + return -1;
> + }
> +
> + while (total_read < buf_size) {
> + rc = read(fd, buf + total_read, buf_size - total_read);
> +
> + if (rc < 0) {
> + warn_perror("Couldn't read from %s", path);
> + close(fd);
> + return -1;
> + }
> +
> + if (rc == 0)
> + break;
> +
> + total_read += rc;
> + }
> +
> + close(fd);
> +
> + if (total_read == buf_size) {
> + warn_perror("File %s truncated, buffer too small", path);
Sorry, I should have caught this in the last version. You don't want
warn_perror() here, just warn(). warn_perror() includes an error
reason based on errno, but errno is not meaningful here (it will
almost certainly be 0 from the close()).
> + return -2;
> + }
> +
> + buf[total_read] = '\0';
> +
> + return total_read;
> +}
> +
> +/**
> + * read_file_integer() - Read an integer value from a file
> + * @path: File to read
> + * @fallback: Default value if file can't be read
> + *
> + * Return: Integer value, fallback on failure
> +*/
> +intmax_t read_file_integer(const char *path, intmax_t fallback)
> +{
> + char buf[INTMAX_STRLEN];
> + ssize_t bytes_read;
> + intmax_t value;
> + char *end;
> +
> + bytes_read = read_file(path, buf, sizeof(buf));
> +
> + if (bytes_read < 0)
> + return fallback;
> +
> + if (bytes_read == 0) {
> + debug("Empty file %s", path);
> + return fallback;
> + }
> +
> + errno = 0;
> + value = strtoimax(buf, &end, 10);
> + if (*end && *end != '\n') {
> + debug("Invalid format in %s", path);
> + return fallback;
> + }
> + if (errno) {
> + debug("Invalid value in %s: %s", path, buf);
> + return fallback;
> + }
> +
> + return value;
> +}
> +
> #ifdef __ia64__
> /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
> * use the description from clone(2).
> diff --git a/util.h b/util.h
> index 22eaac5..acfbd08 100644
> --- a/util.h
> +++ b/util.h
> @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> int __daemon(int pidfile_fd, int devnull_fd);
> int fls(unsigned long x);
> int write_file(const char *path, const char *buf);
> +ssize_t read_file(const char *path, char *buf, size_t buf_size);
> +intmax_t read_file_integer(const char *path, intmax_t fallback);
> int write_all_buf(int fd, const void *buf, size_t len);
> int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> int read_all_buf(int fd, void *buf, size_t len);
> @@ -250,6 +252,14 @@ static inline const char *af_name(sa_family_t af)
>
> #define UINT16_STRLEN (sizeof("65535"))
>
> +/* Each byte holds roughly 2.4 decimal digits:
> + * 1 bit = log10(2) ~= 0.30103 decimal digits
> + * 1 byte = 8 bits = 8 x 0.30103 ~= 2.408 decimal digits
I don't think these calculations are particularly useful to a reader.
The key point is that each byte expands to at most 3 decimal digits -
since 0xff == 255, I don't think we need to talk about logs to
establish that.
In any case, the main reason for a comment here is not to explain the
logic, but to give credit to where it came from. You got it from
Claude, but Claude's info was so similar to the stack overflow page
Stefano mentioned, it almost certainly took it from there. So, I'd
include a link to that page.
> + * Using sizeof(intmax_t) * 3 provides a safe upper bound,
> + * plus 2 extra bytes for the sign and null terminator.
> + */
> +#define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> +
> /* inet address (- '\0') + port (u16) (- '\0') + ':' + '\0' */
> #define SOCKADDR_INET_STRLEN \
> (INET_ADDRSTRLEN-1 + UINT16_STRLEN-1 + sizeof(":"))
> --
> 2.47.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/4] tcp: Resend SYN for inbound connections
2025-10-17 4:27 ` [PATCH v5 3/4] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-17 4:43 ` David Gibson
2025-10-17 5:46 ` Yumei Huang
0 siblings, 1 reply; 10+ messages in thread
From: David Gibson @ 2025-10-17 4:43 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 6069 bytes --]
On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after 10 seconds.
>
> Use the same backoff calculation for the timeout as linux kernel.
>
> Link: https://bugs.passt.top/show_bug.cgi?id=153
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
> tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> tcp.h | 5 +++++
> 2 files changed, 52 insertions(+), 8 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 2ec4b0c..71e2da0 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -179,9 +179,11 @@
> *
> * Timeouts are implemented by means of timerfd timers, set based on flags:
> *
> - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> - * connection
> + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> + * SYN. It's the starting timeout for the first SYN retry. If this persists
> + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> + * tcp_syn_linear_timeouts) times in a row, reset the connection
> *
> * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> @@ -340,7 +342,7 @@ enum {
> #define WINDOW_DEFAULT 14600 /* RFC 6928 */
>
> #define ACK_INTERVAL 10 /* ms */
> -#define SYN_TIMEOUT 10 /* s */
> +#define SYN_TIMEOUT_INIT 1 /* s */
> #define ACK_TIMEOUT 2
> #define FIN_TIMEOUT 60
> #define ACT_TIMEOUT 7200
> @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
>
> #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
>
> +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
> +
> /* "Extended" data (not stored in the flow table) for TCP flow migration */
> static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
>
> @@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> if (conn->flags & ACK_TO_TAP_DUE) {
> it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> } else if (conn->flags & ACK_FROM_TAP_DUE) {
> - if (!(conn->events & ESTABLISHED))
> - it.it_value.tv_sec = SYN_TIMEOUT;
> + if (!(conn->events & ESTABLISHED)) {
> + if (conn->retries < c->tcp.syn_linear_timeouts)
> + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> + else
> + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> + (conn->retries - c->tcp.syn_linear_timeouts);
> + }
> else
> it.it_value.tv_sec = ACK_TIMEOUT;
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> tcp_timer_ctl(c, conn);
> } else if (conn->flags & ACK_FROM_TAP_DUE) {
> if (!(conn->events & ESTABLISHED)) {
> - flow_dbg(conn, "handshake timeout");
> - tcp_rst(c, conn);
> + if (conn->retries >= TCP_MAX_RETRIES ||
> + conn->retries >= (c->tcp.tcp_syn_retries +
> + c->tcp.syn_linear_timeouts)) {
> + flow_dbg(conn, "handshake timeout");
> + tcp_rst(c, conn);
> + } else {
> + flow_trace(conn, "SYN timeout, retry");
> + tcp_send_flag(c, conn, SYN);
> + conn->retries++;
> + tcp_timer_ctl(c, conn);
> + }
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> flow_dbg(conn, "FIN timeout");
> tcp_rst(c, conn);
> @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
> return sl;
> }
>
> +/**
> + * tcp_syn_params_init() - Get initial syn params for inbound connection
Stefano's comment wasn't 100% clear, but I believe he was suggesting
s/syn params/SYN parameters/
in the comment here for clarity.
> + * @c: Execution context
> +*/
> +void tcp_syn_params_init(struct ctx *c)
> +{
> + intmax_t tcp_syn_retries, syn_linear_timeouts;
> +
> + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
> + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
> +
> + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> +
> + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +}
> +
> /**
> * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> * @c: Execution context
> @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
> {
> ASSERT(!c->no_tcp);
>
> + tcp_syn_params_init(c);
> +
> tcp_sock_iov_init(c);
>
> memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> diff --git a/tcp.h b/tcp.h
> index 234a803..bb58324 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
> * @fwd_out: Port forwarding configuration for outbound packets
> * @timer_run: Timestamp of most recent timer run
> * @pipe_size: Size of pipes for spliced connections
> + * @tcp_syn_retries: Number of times initial SYNs during handshake
Perhaps,
"Number of SYN retries using exponential backoff"
> + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
> + * before switching to exponential backoff timeout
> */
> struct tcp_ctx {
> struct fwd_ports fwd_in;
> struct fwd_ports fwd_out;
> struct timespec timer_run;
> size_t pipe_size;
> + uint8_t tcp_syn_retries;
> + uint8_t syn_linear_timeouts;
> };
>
> #endif /* TCP_H */
> --
> 2.47.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/4] tcp: Resend SYN for inbound connections
2025-10-17 4:43 ` David Gibson
@ 2025-10-17 5:46 ` Yumei Huang
2025-10-17 7:27 ` David Gibson
0 siblings, 1 reply; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 5:46 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev, sbrivio
On Fri, Oct 17, 2025 at 12:44 PM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
> > If a client connects while guest is not connected or ready yet,
> > resend SYN instead of just resetting connection after 10 seconds.
> >
> > Use the same backoff calculation for the timeout as linux kernel.
> >
> > Link: https://bugs.passt.top/show_bug.cgi?id=153
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > ---
> > tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> > tcp.h | 5 +++++
> > 2 files changed, 52 insertions(+), 8 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 2ec4b0c..71e2da0 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -179,9 +179,11 @@
> > *
> > * Timeouts are implemented by means of timerfd timers, set based on flags:
> > *
> > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > - * connection
> > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > + * SYN. It's the starting timeout for the first SYN retry. If this persists
> > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > + * tcp_syn_linear_timeouts) times in a row, reset the connection
> > *
> > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > @@ -340,7 +342,7 @@ enum {
> > #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> >
> > #define ACK_INTERVAL 10 /* ms */
> > -#define SYN_TIMEOUT 10 /* s */
> > +#define SYN_TIMEOUT_INIT 1 /* s */
> > #define ACK_TIMEOUT 2
> > #define FIN_TIMEOUT 60
> > #define ACT_TIMEOUT 7200
> > @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> >
> > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
> >
> > +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> > +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
> > +
> > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> >
> > @@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > if (conn->flags & ACK_TO_TAP_DUE) {
> > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > - if (!(conn->events & ESTABLISHED))
> > - it.it_value.tv_sec = SYN_TIMEOUT;
> > + if (!(conn->events & ESTABLISHED)) {
> > + if (conn->retries < c->tcp.syn_linear_timeouts)
> > + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > + else
> > + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > + (conn->retries - c->tcp.syn_linear_timeouts);
> > + }
> > else
> > it.it_value.tv_sec = ACK_TIMEOUT;
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > tcp_timer_ctl(c, conn);
> > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > if (!(conn->events & ESTABLISHED)) {
> > - flow_dbg(conn, "handshake timeout");
> > - tcp_rst(c, conn);
> > + if (conn->retries >= TCP_MAX_RETRIES ||
> > + conn->retries >= (c->tcp.tcp_syn_retries +
> > + c->tcp.syn_linear_timeouts)) {
> > + flow_dbg(conn, "handshake timeout");
> > + tcp_rst(c, conn);
> > + } else {
> > + flow_trace(conn, "SYN timeout, retry");
> > + tcp_send_flag(c, conn, SYN);
> > + conn->retries++;
> > + tcp_timer_ctl(c, conn);
> > + }
> > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > flow_dbg(conn, "FIN timeout");
> > tcp_rst(c, conn);
> > @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
> > return sl;
> > }
> >
> > +/**
> > + * tcp_syn_params_init() - Get initial syn params for inbound connection
>
> Stefano's comment wasn't 100% clear, but I believe he was suggesting
> s/syn params/SYN parameters/
> in the comment here for clarity.
Oops, I missed this one.
>
> > + * @c: Execution context
> > +*/
> > +void tcp_syn_params_init(struct ctx *c)
> > +{
> > + intmax_t tcp_syn_retries, syn_linear_timeouts;
> > +
> > + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
> > + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
> > +
> > + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> > + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> > +
> > + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > +}
> > +
> > /**
> > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > * @c: Execution context
> > @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
> > {
> > ASSERT(!c->no_tcp);
> >
> > + tcp_syn_params_init(c);
> > +
> > tcp_sock_iov_init(c);
> >
> > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > diff --git a/tcp.h b/tcp.h
> > index 234a803..bb58324 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
> > * @fwd_out: Port forwarding configuration for outbound packets
> > * @timer_run: Timestamp of most recent timer run
> > * @pipe_size: Size of pipes for spliced connections
> > + * @tcp_syn_retries: Number of times initial SYNs during handshake
>
> Perhaps,
>
> "Number of SYN retries using exponential backoff"
It's not actually retries using exponential backoff. It's the total
number of retries, including linear backoff and exponential backoff.
I got the "Number of times initial SYNs" from the kernel doc
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
Maybe "Number of SYN retries during handshake" ? What do you think?
>
> > + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
> > + * before switching to exponential backoff timeout
> > */
> > struct tcp_ctx {
> > struct fwd_ports fwd_in;
> > struct fwd_ports fwd_out;
> > struct timespec timer_run;
> > size_t pipe_size;
> > + uint8_t tcp_syn_retries;
> > + uint8_t syn_linear_timeouts;
> > };
> >
> > #endif /* TCP_H */
> > --
> > 2.47.0
> >
>
> --
> David Gibson (he or they) | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> | around.
> http://www.ozlabs.org/~dgibson
--
Thanks,
Yumei Huang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/4] tcp: Resend SYN for inbound connections
2025-10-17 5:46 ` Yumei Huang
@ 2025-10-17 7:27 ` David Gibson
2025-10-17 8:21 ` Yumei Huang
0 siblings, 1 reply; 10+ messages in thread
From: David Gibson @ 2025-10-17 7:27 UTC (permalink / raw)
To: Yumei Huang; +Cc: passt-dev, sbrivio
[-- Attachment #1: Type: text/plain, Size: 8901 bytes --]
On Fri, Oct 17, 2025 at 01:46:15PM +0800, Yumei Huang wrote:
> On Fri, Oct 17, 2025 at 12:44 PM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
> > > If a client connects while guest is not connected or ready yet,
> > > resend SYN instead of just resetting connection after 10 seconds.
> > >
> > > Use the same backoff calculation for the timeout as linux kernel.
> > >
> > > Link: https://bugs.passt.top/show_bug.cgi?id=153
> > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > ---
> > > tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> > > tcp.h | 5 +++++
> > > 2 files changed, 52 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 2ec4b0c..71e2da0 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -179,9 +179,11 @@
> > > *
> > > * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > *
> > > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > > - * connection
> > > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > + * SYN. It's the starting timeout for the first SYN retry. If this persists
> > > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > + * tcp_syn_linear_timeouts) times in a row, reset the connection
> > > *
> > > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > @@ -340,7 +342,7 @@ enum {
> > > #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> > >
> > > #define ACK_INTERVAL 10 /* ms */
> > > -#define SYN_TIMEOUT 10 /* s */
> > > +#define SYN_TIMEOUT_INIT 1 /* s */
> > > #define ACK_TIMEOUT 2
> > > #define FIN_TIMEOUT 60
> > > #define ACT_TIMEOUT 7200
> > > @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> > >
> > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
> > >
> > > +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> > > +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
> > > +
> > > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> > >
> > > @@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > > if (conn->flags & ACK_TO_TAP_DUE) {
> > > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > - if (!(conn->events & ESTABLISHED))
> > > - it.it_value.tv_sec = SYN_TIMEOUT;
> > > + if (!(conn->events & ESTABLISHED)) {
> > > + if (conn->retries < c->tcp.syn_linear_timeouts)
> > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > > + else
> > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > > + (conn->retries - c->tcp.syn_linear_timeouts);
> > > + }
> > > else
> > > it.it_value.tv_sec = ACK_TIMEOUT;
> > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > > tcp_timer_ctl(c, conn);
> > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > if (!(conn->events & ESTABLISHED)) {
> > > - flow_dbg(conn, "handshake timeout");
> > > - tcp_rst(c, conn);
> > > + if (conn->retries >= TCP_MAX_RETRIES ||
> > > + conn->retries >= (c->tcp.tcp_syn_retries +
> > > + c->tcp.syn_linear_timeouts)) {
> > > + flow_dbg(conn, "handshake timeout");
> > > + tcp_rst(c, conn);
> > > + } else {
> > > + flow_trace(conn, "SYN timeout, retry");
> > > + tcp_send_flag(c, conn, SYN);
> > > + conn->retries++;
> > > + tcp_timer_ctl(c, conn);
> > > + }
> > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > flow_dbg(conn, "FIN timeout");
> > > tcp_rst(c, conn);
> > > @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
> > > return sl;
> > > }
> > >
> > > +/**
> > > + * tcp_syn_params_init() - Get initial syn params for inbound connection
> >
> > Stefano's comment wasn't 100% clear, but I believe he was suggesting
> > s/syn params/SYN parameters/
> > in the comment here for clarity.
>
> Oops, I missed this one.
> >
> > > + * @c: Execution context
> > > +*/
> > > +void tcp_syn_params_init(struct ctx *c)
> > > +{
> > > + intmax_t tcp_syn_retries, syn_linear_timeouts;
> > > +
> > > + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
> > > + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
> > > +
> > > + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> > > + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> > > +
> > > + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> > > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > > +}
> > > +
> > > /**
> > > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > > * @c: Execution context
> > > @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
> > > {
> > > ASSERT(!c->no_tcp);
> > >
> > > + tcp_syn_params_init(c);
> > > +
> > > tcp_sock_iov_init(c);
> > >
> > > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > > diff --git a/tcp.h b/tcp.h
> > > index 234a803..bb58324 100644
> > > --- a/tcp.h
> > > +++ b/tcp.h
> > > @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
> > > * @fwd_out: Port forwarding configuration for outbound packets
> > > * @timer_run: Timestamp of most recent timer run
> > > * @pipe_size: Size of pipes for spliced connections
> > > + * @tcp_syn_retries: Number of times initial SYNs during handshake
> >
> > Perhaps,
> >
> > "Number of SYN retries using exponential backoff"
>
> It's not actually retries using exponential backoff. It's the total
> number of retries, including linear backoff and exponential backoff.
That's what ip-sysctl.rst says, but as we discussed the other day, the
kernel code and behaviour seem to treat it as just the retries after
exhausting linear_timeouts.
> I got the "Number of times initial SYNs" from the kernel doc
> https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
Right, but by cutting off the "will be retransmitted", it's no longer
grammatical (admittedly it's a pretty long confusing sentence).
> Maybe "Number of SYN retries during handshake" ? What do you think?
That would be good if the description in ip-sysctl.rst was accurate,
but it seems not to be.
> >
> > > + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
> > > + * before switching to exponential backoff timeout
> > > */
> > > struct tcp_ctx {
> > > struct fwd_ports fwd_in;
> > > struct fwd_ports fwd_out;
> > > struct timespec timer_run;
> > > size_t pipe_size;
> > > + uint8_t tcp_syn_retries;
> > > + uint8_t syn_linear_timeouts;
> > > };
> > >
> > > #endif /* TCP_H */
> > > --
> > > 2.47.0
> > >
> >
> > --
> > David Gibson (he or they) | I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> > | around.
> > http://www.ozlabs.org/~dgibson
>
>
>
> --
> Thanks,
>
> Yumei Huang
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v5 3/4] tcp: Resend SYN for inbound connections
2025-10-17 7:27 ` David Gibson
@ 2025-10-17 8:21 ` Yumei Huang
0 siblings, 0 replies; 10+ messages in thread
From: Yumei Huang @ 2025-10-17 8:21 UTC (permalink / raw)
To: David Gibson; +Cc: passt-dev, sbrivio
On Fri, Oct 17, 2025 at 3:48 PM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Fri, Oct 17, 2025 at 01:46:15PM +0800, Yumei Huang wrote:
> > On Fri, Oct 17, 2025 at 12:44 PM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
> > > > If a client connects while guest is not connected or ready yet,
> > > > resend SYN instead of just resetting connection after 10 seconds.
> > > >
> > > > Use the same backoff calculation for the timeout as linux kernel.
> > > >
> > > > Link: https://bugs.passt.top/show_bug.cgi?id=153
> > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > > ---
> > > > tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> > > > tcp.h | 5 +++++
> > > > 2 files changed, 52 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 2ec4b0c..71e2da0 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -179,9 +179,11 @@
> > > > *
> > > > * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > > *
> > > > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > > > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > > > - * connection
> > > > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > > + * SYN. It's the starting timeout for the first SYN retry. If this persists
> > > > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > > + * tcp_syn_linear_timeouts) times in a row, reset the connection
> > > > *
> > > > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > > @@ -340,7 +342,7 @@ enum {
> > > > #define WINDOW_DEFAULT 14600 /* RFC 6928 */
> > > >
> > > > #define ACK_INTERVAL 10 /* ms */
> > > > -#define SYN_TIMEOUT 10 /* s */
> > > > +#define SYN_TIMEOUT_INIT 1 /* s */
> > > > #define ACK_TIMEOUT 2
> > > > #define FIN_TIMEOUT 60
> > > > #define ACT_TIMEOUT 7200
> > > > @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> > > >
> > > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
> > > >
> > > > +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> > > > +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
> > > > +
> > > > /* "Extended" data (not stored in the flow table) for TCP flow migration */
> > > > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> > > >
> > > > @@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> > > > if (conn->flags & ACK_TO_TAP_DUE) {
> > > > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> > > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > > - if (!(conn->events & ESTABLISHED))
> > > > - it.it_value.tv_sec = SYN_TIMEOUT;
> > > > + if (!(conn->events & ESTABLISHED)) {
> > > > + if (conn->retries < c->tcp.syn_linear_timeouts)
> > > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > > > + else
> > > > + it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > > > + (conn->retries - c->tcp.syn_linear_timeouts);
> > > > + }
> > > > else
> > > > it.it_value.tv_sec = ACK_TIMEOUT;
> > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > > > tcp_timer_ctl(c, conn);
> > > > } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > > > if (!(conn->events & ESTABLISHED)) {
> > > > - flow_dbg(conn, "handshake timeout");
> > > > - tcp_rst(c, conn);
> > > > + if (conn->retries >= TCP_MAX_RETRIES ||
> > > > + conn->retries >= (c->tcp.tcp_syn_retries +
> > > > + c->tcp.syn_linear_timeouts)) {
> > > > + flow_dbg(conn, "handshake timeout");
> > > > + tcp_rst(c, conn);
> > > > + } else {
> > > > + flow_trace(conn, "SYN timeout, retry");
> > > > + tcp_send_flag(c, conn, SYN);
> > > > + conn->retries++;
> > > > + tcp_timer_ctl(c, conn);
> > > > + }
> > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > > > flow_dbg(conn, "FIN timeout");
> > > > tcp_rst(c, conn);
> > > > @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
> > > > return sl;
> > > > }
> > > >
> > > > +/**
> > > > + * tcp_syn_params_init() - Get initial syn params for inbound connection
> > >
> > > Stefano's comment wasn't 100% clear, but I believe he was suggesting
> > > s/syn params/SYN parameters/
> > > in the comment here for clarity.
> >
> > Oops, I missed this one.
> > >
> > > > + * @c: Execution context
> > > > +*/
> > > > +void tcp_syn_params_init(struct ctx *c)
> > > > +{
> > > > + intmax_t tcp_syn_retries, syn_linear_timeouts;
> > > > +
> > > > + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
> > > > + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
> > > > +
> > > > + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> > > > + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> > > > +
> > > > + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> > > > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > > > +}
> > > > +
> > > > /**
> > > > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > > > * @c: Execution context
> > > > @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
> > > > {
> > > > ASSERT(!c->no_tcp);
> > > >
> > > > + tcp_syn_params_init(c);
> > > > +
> > > > tcp_sock_iov_init(c);
> > > >
> > > > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > > > diff --git a/tcp.h b/tcp.h
> > > > index 234a803..bb58324 100644
> > > > --- a/tcp.h
> > > > +++ b/tcp.h
> > > > @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
> > > > * @fwd_out: Port forwarding configuration for outbound packets
> > > > * @timer_run: Timestamp of most recent timer run
> > > > * @pipe_size: Size of pipes for spliced connections
> > > > + * @tcp_syn_retries: Number of times initial SYNs during handshake
> > >
> > > Perhaps,
> > >
> > > "Number of SYN retries using exponential backoff"
> >
> > It's not actually retries using exponential backoff. It's the total
> > number of retries, including linear backoff and exponential backoff.
>
> That's what ip-sysctl.rst says, but as we discussed the other day, the
> kernel code and behaviour seem to treat it as just the retries after
> exhausting linear_timeouts.
You are right. I forgot about that.
>
> > I got the "Number of times initial SYNs" from the kernel doc
> > https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
>
> Right, but by cutting off the "will be retransmitted", it's no longer
> grammatical (admittedly it's a pretty long confusing sentence).
>
> > Maybe "Number of SYN retries during handshake" ? What do you think?
>
> That would be good if the description in ip-sysctl.rst was accurate,
> but it seems not to be.
I probably should wait a bit before sending the v6.
Anyway, I will wait and send the v7 next week to see if more comments
from Stefano.
Thanks for all the reviews and comments.
>
> > >
> > > > + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
> > > > + * before switching to exponential backoff timeout
> > > > */
> > > > struct tcp_ctx {
> > > > struct fwd_ports fwd_in;
> > > > struct fwd_ports fwd_out;
> > > > struct timespec timer_run;
> > > > size_t pipe_size;
> > > > + uint8_t tcp_syn_retries;
> > > > + uint8_t syn_linear_timeouts;
> > > > };
> > > >
> > > > #endif /* TCP_H */
> > > > --
> > > > 2.47.0
> > > >
> > >
> > > --
> > > David Gibson (he or they) | I'll have my music baroque, and my code
> > > david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> > > | around.
> > > http://www.ozlabs.org/~dgibson
> >
> >
> >
> > --
> > Thanks,
> >
> > Yumei Huang
> >
>
> --
> David Gibson (he or they) | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
> | around.
> http://www.ozlabs.org/~dgibson
--
Thanks,
Yumei Huang
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-17 8:22 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-17 4:27 [PATCH v5 0/4] Retry SYNs for inbound connections Yumei Huang
2025-10-17 4:27 ` [PATCH v5 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-17 4:27 ` [PATCH v5 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-10-17 4:38 ` David Gibson
2025-10-17 4:27 ` [PATCH v5 3/4] tcp: Resend SYN for inbound connections Yumei Huang
2025-10-17 4:43 ` David Gibson
2025-10-17 5:46 ` Yumei Huang
2025-10-17 7:27 ` David Gibson
2025-10-17 8:21 ` Yumei Huang
2025-10-17 4:27 ` [PATCH v5 4/4] tcp: Update data retransmission timeout Yumei Huang
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).