public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
* [PATCH v4 0/4] Retry SYNs for inbound connections
@ 2025-10-16  2:34 Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  2:34 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

When a client connects, SYN would be sent to guest only once. If the
guest is not connected or ready at that time, the connection will be
reset in 10s. These patches introduce the SYN retry mechanism using
the similar backoff timeout as linux kernel. Also update the data
retransmission timeout using the backoff timeout.

Yumei Huang (4):
  tcp: Rename "retrans" to "retries"
  util: Introduce read_file() and read_file_integer() function
  tcp: Resend SYN for inbound connections
  tcp: Update data retransmission timeout

 tcp.c      | 74 ++++++++++++++++++++++++++++++++++-------------
 tcp.h      |  2 ++
 tcp_conn.h | 12 ++++----
 util.c     | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util.h     |  3 ++
 5 files changed, 149 insertions(+), 26 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/4] tcp: Rename "retrans" to "retries"
  2025-10-16  2:34 [PATCH v4 0/4] Retry SYNs for inbound connections Yumei Huang
@ 2025-10-16  2:34 ` Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  2:34 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

Rename "retrans" to "retries" so it can be used for SYN retries.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 tcp.c      | 12 ++++++------
 tcp_conn.h | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tcp.c b/tcp.c
index 0f9e9b3..2ec4b0c 100644
--- a/tcp.c
+++ b/tcp.c
@@ -186,7 +186,7 @@
  * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
  *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
  *   socket and reset sequence to what was acknowledged. If this persists for
- *   more than TCP_MAX_RETRANS times in a row, reset the connection
+ *   more than TCP_MAX_RETRIES times in a row, reset the connection
  *
  * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
  *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -1127,7 +1127,7 @@ static void tcp_update_seqack_from_tap(const struct ctx *c,
 		if (SEQ_LT(seq, conn->seq_to_tap))
 			conn_flag(c, conn, ACK_FROM_TAP_DUE);
 
-		conn->retrans = 0;
+		conn->retries = 0;
 		conn->seq_ack_from_tap = seq;
 	}
 }
@@ -2414,7 +2414,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 			flow_dbg(conn, "FIN timeout");
 			tcp_rst(c, conn);
-		} else if (conn->retrans == TCP_MAX_RETRANS) {
+		} else if (conn->retries == TCP_MAX_RETRIES) {
 			flow_dbg(conn, "retransmissions count exceeded");
 			tcp_rst(c, conn);
 		} else {
@@ -2423,7 +2423,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 			if (!conn->wnd_from_tap)
 				conn->wnd_from_tap = 1; /* Zero-window probe */
 
-			conn->retrans++;
+			conn->retries++;
 			if (tcp_rewind_seq(c, conn))
 				return;
 
@@ -3382,7 +3382,7 @@ static int tcp_flow_repair_opt(const struct tcp_tap_conn *conn,
 int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn)
 {
 	struct tcp_tap_transfer t = {
-		.retrans		= conn->retrans,
+		.retries		= conn->retries,
 		.ws_from_tap		= conn->ws_from_tap,
 		.ws_to_tap		= conn->ws_to_tap,
 		.events			= conn->events,
@@ -3662,7 +3662,7 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
 	memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
 	conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
 
-	conn->retrans			= t.retrans;
+	conn->retries			= t.retries;
 	conn->ws_from_tap		= t.ws_from_tap;
 	conn->ws_to_tap			= t.ws_to_tap;
 	conn->events			= t.events;
diff --git a/tcp_conn.h b/tcp_conn.h
index 38b5c54..e5c8146 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -13,7 +13,7 @@
  * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
  * @f:			Generic flow information
  * @in_epoll:		Is the connection in the epoll set?
- * @retrans:		Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries:		Number of retries occurred due to timeouts
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @tap_mss:		MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
@@ -38,9 +38,9 @@ struct tcp_tap_conn {
 
 	bool		in_epoll	:1;
 
-#define TCP_RETRANS_BITS		3
-	unsigned int	retrans		:TCP_RETRANS_BITS;
-#define TCP_MAX_RETRANS			MAX_FROM_BITS(TCP_RETRANS_BITS)
+#define TCP_RETRIES_BITS		3
+	unsigned int	retries		:TCP_RETRIES_BITS;
+#define TCP_MAX_RETRIES			MAX_FROM_BITS(TCP_RETRIES_BITS)
 
 #define TCP_WS_BITS			4	/* RFC 7323 */
 #define TCP_WS_MAX			14
@@ -102,7 +102,7 @@ struct tcp_tap_conn {
  * struct tcp_tap_transfer - Migrated TCP data, flow table part, network order
  * @pif:		Interfaces for each side of the flow
  * @side:		Addresses and ports for each side of the flow
- * @retrans:		Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries:		Number of retries occurred due to timeouts
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @events:		Connection events, implying connection states
@@ -122,7 +122,7 @@ struct tcp_tap_transfer {
 	uint8_t		pif[SIDES];
 	struct flowside	side[SIDES];
 
-	uint8_t		retrans;
+	uint8_t		retries;
 	uint8_t		ws_from_tap;
 	uint8_t		ws_to_tap;
 	uint8_t		events;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16  2:34 [PATCH v4 0/4] Retry SYNs for inbound connections Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
@ 2025-10-16  2:34 ` Yumei Huang
  2025-10-16  6:30   ` David Gibson
  2025-10-16  2:34 ` [PATCH v4 3/4] tcp: Resend SYN for inbound connections Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 4/4] tcp: Update data retransmission timeout Yumei Huang
  3 siblings, 1 reply; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  2:34 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
 util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 util.h |  3 +++
 2 files changed, 87 insertions(+)

diff --git a/util.c b/util.c
index c492f90..197677e 100644
--- a/util.c
+++ b/util.c
@@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
 	return len == 0 ? 0 : -1;
 }
 
+/**
+ * read_file() - Read contents of file into a buffer
+ * @path:	File to read
+ * @buf:	Buffer to store file contents
+ * @buf_size:	Size of buffer
+ *
+ * Return: number of bytes read on success, -1 on any error, -2 on truncation
+*/
+int read_file(const char *path, char *buf, size_t buf_size)
+{
+	int fd = open(path, O_RDONLY | O_CLOEXEC);
+	size_t total_read = 0;
+	ssize_t rc;
+
+	if (fd < 0) {
+		warn_perror("Could not open %s", path);
+		return -1;
+	}
+
+	while (total_read < buf_size) {
+		rc = read(fd, buf + total_read, buf_size - total_read);
+
+		if (rc < 0) {
+			warn_perror("Couldn't read from %s", path);
+			close(fd);
+			return -1;
+		}
+
+		if (rc == 0)
+			break;
+
+		total_read += rc;
+	}
+
+	close(fd);
+
+	if (total_read == buf_size) {
+		warn_perror("File %s truncated, buffer too small", path);
+		return -2;
+	}
+
+	buf[total_read] = '\0';
+
+	return (int)total_read;
+}
+
+/**
+ * read_file_integer() - Read an integer value from a file
+ * @path: File to read
+ * @fallback: Default value if file can't be read
+ *
+ * Return: Integer value, fallback on failure
+*/
+intmax_t read_file_integer(const char *path, intmax_t fallback)
+{
+	char buf[INTMAX_STRLEN];
+	char *end;
+	intmax_t value;
+	int bytes_read;
+
+	bytes_read = read_file(path, buf, sizeof(buf));
+
+	if (bytes_read < 0)
+		return fallback;
+
+	if (bytes_read == 0) {
+		debug("Empty file %s", path);
+		return fallback;
+	}
+
+	errno = 0;
+	value = strtoimax(buf, &end, 10);
+	if (*end && *end != '\n') {
+		debug("Invalid format in %s", path);
+		return fallback;
+	}
+	if (errno) {
+		debug("Invalid value in %s: %s", path, buf);
+		return fallback;
+	}
+
+	return value;
+}
+
 #ifdef __ia64__
 /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
  * use the description from clone(2).
diff --git a/util.h b/util.h
index 22eaac5..887d795 100644
--- a/util.h
+++ b/util.h
@@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
 int __daemon(int pidfile_fd, int devnull_fd);
 int fls(unsigned long x);
 int write_file(const char *path, const char *buf);
+int read_file(const char *path, char *buf, size_t buf_size);
+intmax_t read_file_integer(const char *path, intmax_t fallback);
 int write_all_buf(int fd, const void *buf, size_t len);
 int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
 int read_all_buf(int fd, void *buf, size_t len);
@@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
 }
 
 #define UINT16_STRLEN		(sizeof("65535"))
+#define INTMAX_STRLEN		(sizeof("-9223372036854775808"))
 
 /* inet address (- '\0') + port (u16) (- '\0') + ':' + '\0' */
 #define SOCKADDR_INET_STRLEN					\
-- 
2.47.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 3/4] tcp: Resend SYN for inbound connections
  2025-10-16  2:34 [PATCH v4 0/4] Retry SYNs for inbound connections Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
  2025-10-16  2:34 ` [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
@ 2025-10-16  2:34 ` Yumei Huang
  2025-10-16 22:22   ` Stefano Brivio
  2025-10-16 23:49   ` David Gibson
  2025-10-16  2:34 ` [PATCH v4 4/4] tcp: Update data retransmission timeout Yumei Huang
  3 siblings, 2 replies; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  2:34 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

If a client connects while guest is not connected or ready yet,
resend SYN instead of just resetting connection after 10 seconds.

Use the same backoff calculation for the timeout as linux kernel.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
 tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
 tcp.h |  2 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/tcp.c b/tcp.c
index 2ec4b0c..3003333 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,9 +179,11 @@
  *
  * Timeouts are implemented by means of timerfd timers, set based on flags:
  *
- * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
- *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
- *   connection
+ * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
+ *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
+ *   SYN. It's the starting timeout for the first SYN retry. If this persists
+ *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
+ *   tcp_syn_linear_timeouts) times in a row, reset the connection
  *
  * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
  *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
@@ -340,7 +342,7 @@ enum {
 #define WINDOW_DEFAULT			14600		/* RFC 6928 */
 
 #define ACK_INTERVAL			10		/* ms */
-#define SYN_TIMEOUT			10		/* s */
+#define SYN_TIMEOUT_INIT		1		/* s */
 #define ACK_TIMEOUT			2
 #define FIN_TIMEOUT			60
 #define ACT_TIMEOUT			7200
@@ -365,6 +367,10 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
 
 #define TCP_MIGRATE_RESTORE_CHUNK_MIN	1024 /* Try smaller when above this */
 
+#define TCP_SYN_RETRIES_SYSCTL		"/proc/sys/net/ipv4/tcp_syn_retries"
+#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL						\
+	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
+
 /* "Extended" data (not stored in the flow table) for TCP flow migration */
 static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
 
@@ -581,8 +587,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
 	if (conn->flags & ACK_TO_TAP_DUE) {
 		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
-		if (!(conn->events & ESTABLISHED))
-			it.it_value.tv_sec = SYN_TIMEOUT;
+		if (!(conn->events & ESTABLISHED)) {
+			if (conn->retries < c->tcp.syn_linear_timeouts)
+				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+			else
+				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+					(conn->retries - c->tcp.syn_linear_timeouts);
+		}
 		else
 			it.it_value.tv_sec = ACK_TIMEOUT;
 	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
@@ -2409,8 +2420,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 		tcp_timer_ctl(c, conn);
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
 		if (!(conn->events & ESTABLISHED)) {
-			flow_dbg(conn, "handshake timeout");
-			tcp_rst(c, conn);
+			if (conn->retries >= MIN(TCP_MAX_RETRIES,
+				(c->tcp.tcp_syn_retries + c->tcp.syn_linear_timeouts))) {
+				flow_dbg(conn, "handshake timeout");
+				tcp_rst(c, conn);
+			} else {
+				flow_trace(conn, "SYN timeout, retry");
+				tcp_send_flag(c, conn, SYN);
+				conn->retries++;
+				tcp_timer_ctl(c, conn);
+			}
 		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 			flow_dbg(conn, "FIN timeout");
 			tcp_rst(c, conn);
@@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
 	return sl;
 }
 
+/**
+ * tcp_syn_params_init() - Get initial syn params for inbound connection
+ * @c:		Execution context
+*/
+void tcp_syn_params_init(struct ctx *c)
+{
+	long tcp_syn_retries, syn_linear_timeouts;
+
+	tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES_SYSCTL, 8);
+	syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
+
+	c->tcp.tcp_syn_retries = (uint8_t)MIN(tcp_syn_retries, UINT8_MAX);
+	c->tcp.syn_linear_timeouts = (uint8_t)MIN(syn_linear_timeouts, UINT8_MAX);
+
+	debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
+		  c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
 {
 	ASSERT(!c->no_tcp);
 
+	tcp_syn_params_init(c);
+
 	tcp_sock_iov_init(c);
 
 	memset(init_sock_pool4,		0xff,	sizeof(init_sock_pool4));
diff --git a/tcp.h b/tcp.h
index 234a803..df699a4 100644
--- a/tcp.h
+++ b/tcp.h
@@ -65,6 +65,8 @@ struct tcp_ctx {
 	struct fwd_ports fwd_out;
 	struct timespec timer_run;
 	size_t pipe_size;
+	uint8_t tcp_syn_retries;
+	uint8_t syn_linear_timeouts;
 };
 
 #endif /* TCP_H */
-- 
2.47.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 4/4] tcp: Update data retransmission timeout
  2025-10-16  2:34 [PATCH v4 0/4] Retry SYNs for inbound connections Yumei Huang
                   ` (2 preceding siblings ...)
  2025-10-16  2:34 ` [PATCH v4 3/4] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-16  2:34 ` Yumei Huang
  2025-10-16 23:59   ` David Gibson
  3 siblings, 1 reply; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  2:34 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

Use an exponential backoff timeout for data retransmission according
to RFC 2988 and RFC 6298. Set the initial RTO to one second as discussed
in Appendix A of RFC 6298.

Also combine the macros defining the initial RTO for both SYN and ACK.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
 tcp.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/tcp.c b/tcp.c
index 3003333..3254d67 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,16 +179,12 @@
  *
  * Timeouts are implemented by means of timerfd timers, set based on flags:
  *
- * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
- *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
- *   SYN. It's the starting timeout for the first SYN retry. If this persists
- *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
- *   tcp_syn_linear_timeouts) times in a row, reset the connection
- *
- * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
- *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
- *   socket and reset sequence to what was acknowledged. If this persists for
- *   more than TCP_MAX_RETRIES times in a row, reset the connection
+ * - RTO_INIT: if no ACK segment was received from tap/guest, either during
+ *   handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
+ *   sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
+ *   from the socket and reset sequence to what was acknowledged. It's the
+ *   starting timeout for the first retry. If this persists for more than
+ *   allowed times in a row, reset the connection
  *
  * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
  *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
@@ -342,8 +338,7 @@ enum {
 #define WINDOW_DEFAULT			14600		/* RFC 6928 */
 
 #define ACK_INTERVAL			10		/* ms */
-#define SYN_TIMEOUT_INIT		1		/* s */
-#define ACK_TIMEOUT			2
+#define RTO_INIT			1		/* s, RFC 6298 */
 #define FIN_TIMEOUT			60
 #define ACT_TIMEOUT			7200
 
@@ -589,13 +584,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
 		if (!(conn->events & ESTABLISHED)) {
 			if (conn->retries < c->tcp.syn_linear_timeouts)
-				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+				it.it_value.tv_sec = RTO_INIT;
 			else
-				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+				it.it_value.tv_sec = RTO_INIT <<
 					(conn->retries - c->tcp.syn_linear_timeouts);
 		}
 		else
-			it.it_value.tv_sec = ACK_TIMEOUT;
+			it.it_value.tv_sec = RTO_INIT << conn->retries;
 	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 		it.it_value.tv_sec = FIN_TIMEOUT;
 	} else {
-- 
2.47.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16  2:34 ` [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
@ 2025-10-16  6:30   ` David Gibson
  2025-10-16  7:49     ` Yumei Huang
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2025-10-16  6:30 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, sbrivio

[-- Attachment #1: Type: text/plain, Size: 4177 bytes --]

On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
>  util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util.h |  3 +++
>  2 files changed, 87 insertions(+)
> 
> diff --git a/util.c b/util.c
> index c492f90..197677e 100644
> --- a/util.c
> +++ b/util.c
> @@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
>  	return len == 0 ? 0 : -1;
>  }
>  
> +/**
> + * read_file() - Read contents of file into a buffer
> + * @path:	File to read
> + * @buf:	Buffer to store file contents
> + * @buf_size:	Size of buffer
> + *
> + * Return: number of bytes read on success, -1 on any error, -2 on truncation
> +*/
> +int read_file(const char *path, char *buf, size_t buf_size)
> +{
> +	int fd = open(path, O_RDONLY | O_CLOEXEC);
> +	size_t total_read = 0;
> +	ssize_t rc;
> +
> +	if (fd < 0) {
> +		warn_perror("Could not open %s", path);
> +		return -1;
> +	}
> +
> +	while (total_read < buf_size) {
> +		rc = read(fd, buf + total_read, buf_size - total_read);
> +
> +		if (rc < 0) {
> +			warn_perror("Couldn't read from %s", path);
> +			close(fd);
> +			return -1;
> +		}
> +
> +		if (rc == 0)
> +			break;
> +
> +		total_read += rc;
> +	}
> +
> +	close(fd);
> +
> +	if (total_read == buf_size) {
> +		warn_perror("File %s truncated, buffer too small", path);
> +		return -2;
> +	}
> +
> +	buf[total_read] = '\0';
> +
> +	return (int)total_read;

Probably makes more sense for total_read and the return type to be ssize_t.

> +}
> +
> +/**
> + * read_file_integer() - Read an integer value from a file
> + * @path: File to read
> + * @fallback: Default value if file can't be read
> + *
> + * Return: Integer value, fallback on failure
> +*/
> +intmax_t read_file_integer(const char *path, intmax_t fallback)
> +{
> +	char buf[INTMAX_STRLEN];
> +	char *end;

passt coding style is to list (where possible) local variables in
reverse order of line length, so this should go after bytes_read.

> +	intmax_t value;
> +	int bytes_read;
> +
> +	bytes_read = read_file(path, buf, sizeof(buf));
> +
> +	if (bytes_read < 0)
> +		return fallback;
> +
> +	if (bytes_read == 0) {
> +		debug("Empty file %s", path);
> +		return fallback;
> +	}
> +
> +	errno = 0;
> +	value = strtoimax(buf, &end, 10);
> +	if (*end && *end != '\n') {
> +		debug("Invalid format in %s", path);
> +		return fallback;
> +	}
> +	if (errno) {
> +		debug("Invalid value in %s: %s", path, buf);
> +		return fallback;
> +	}
> +
> +	return value;
> +}
> +
>  #ifdef __ia64__
>  /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
>   * use the description from clone(2).
> diff --git a/util.h b/util.h
> index 22eaac5..887d795 100644
> --- a/util.h
> +++ b/util.h
> @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
>  int __daemon(int pidfile_fd, int devnull_fd);
>  int fls(unsigned long x);
>  int write_file(const char *path, const char *buf);
> +int read_file(const char *path, char *buf, size_t buf_size);
> +intmax_t read_file_integer(const char *path, intmax_t fallback);
>  int write_all_buf(int fd, const void *buf, size_t len);
>  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
>  int read_all_buf(int fd, void *buf, size_t len);
> @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
>  }
>  
>  #define UINT16_STRLEN		(sizeof("65535"))
> +#define INTMAX_STRLEN		(sizeof("-9223372036854775808"))

It's correct for now, and probably for any systems we're likely to run
on, but I dislike hard-assuming the size of intmax_t here.  I feel
like there must be a better way to derive the correct string length,
but I haven't figured out what it is yet :(.

>  
>  /* inet address (- '\0') + port (u16) (- '\0') + ':' + '\0' */
>  #define SOCKADDR_INET_STRLEN					\
> -- 
> 2.47.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16  6:30   ` David Gibson
@ 2025-10-16  7:49     ` Yumei Huang
  2025-10-16 22:22       ` Stefano Brivio
  0 siblings, 1 reply; 17+ messages in thread
From: Yumei Huang @ 2025-10-16  7:49 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev, sbrivio

On Thu, Oct 16, 2025 at 2:30 PM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > ---
> >  util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  util.h |  3 +++
> >  2 files changed, 87 insertions(+)
> >
> > diff --git a/util.c b/util.c
> > index c492f90..197677e 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
> >       return len == 0 ? 0 : -1;
> >  }
> >
> > +/**
> > + * read_file() - Read contents of file into a buffer
> > + * @path:    File to read
> > + * @buf:     Buffer to store file contents
> > + * @buf_size:        Size of buffer
> > + *
> > + * Return: number of bytes read on success, -1 on any error, -2 on truncation
> > +*/
> > +int read_file(const char *path, char *buf, size_t buf_size)
> > +{
> > +     int fd = open(path, O_RDONLY | O_CLOEXEC);
> > +     size_t total_read = 0;
> > +     ssize_t rc;
> > +
> > +     if (fd < 0) {
> > +             warn_perror("Could not open %s", path);
> > +             return -1;
> > +     }
> > +
> > +     while (total_read < buf_size) {
> > +             rc = read(fd, buf + total_read, buf_size - total_read);
> > +
> > +             if (rc < 0) {
> > +                     warn_perror("Couldn't read from %s", path);
> > +                     close(fd);
> > +                     return -1;
> > +             }
> > +
> > +             if (rc == 0)
> > +                     break;
> > +
> > +             total_read += rc;
> > +     }
> > +
> > +     close(fd);
> > +
> > +     if (total_read == buf_size) {
> > +             warn_perror("File %s truncated, buffer too small", path);
> > +             return -2;
> > +     }
> > +
> > +     buf[total_read] = '\0';
> > +
> > +     return (int)total_read;
>
> Probably makes more sense for total_read and the return type to be ssize_t.

Just tried to be consistent with write_file().  I can change it to
ssize_t if needed.
>
> > +}
> > +
> > +/**
> > + * read_file_integer() - Read an integer value from a file
> > + * @path: File to read
> > + * @fallback: Default value if file can't be read
> > + *
> > + * Return: Integer value, fallback on failure
> > +*/
> > +intmax_t read_file_integer(const char *path, intmax_t fallback)
> > +{
> > +     char buf[INTMAX_STRLEN];
> > +     char *end;
>
> passt coding style is to list (where possible) local variables in
> reverse order of line length, so this should go after bytes_read.

Oh, I didn't notice that. Will update later.
>
> > +     intmax_t value;
> > +     int bytes_read;
> > +
> > +     bytes_read = read_file(path, buf, sizeof(buf));
> > +
> > +     if (bytes_read < 0)
> > +             return fallback;
> > +
> > +     if (bytes_read == 0) {
> > +             debug("Empty file %s", path);
> > +             return fallback;
> > +     }
> > +
> > +     errno = 0;
> > +     value = strtoimax(buf, &end, 10);
> > +     if (*end && *end != '\n') {
> > +             debug("Invalid format in %s", path);
> > +             return fallback;
> > +     }
> > +     if (errno) {
> > +             debug("Invalid value in %s: %s", path, buf);
> > +             return fallback;
> > +     }
> > +
> > +     return value;
> > +}
> > +
> >  #ifdef __ia64__
> >  /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
> >   * use the description from clone(2).
> > diff --git a/util.h b/util.h
> > index 22eaac5..887d795 100644
> > --- a/util.h
> > +++ b/util.h
> > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> >  int __daemon(int pidfile_fd, int devnull_fd);
> >  int fls(unsigned long x);
> >  int write_file(const char *path, const char *buf);
> > +int read_file(const char *path, char *buf, size_t buf_size);
> > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> >  int write_all_buf(int fd, const void *buf, size_t len);
> >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> >  int read_all_buf(int fd, void *buf, size_t len);
> > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> >  }
> >
> >  #define UINT16_STRLEN                (sizeof("65535"))
> > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))
>
> It's correct for now, and probably for any systems we're likely to run
> on, but I dislike hard-assuming the size of intmax_t here.  I feel
> like there must be a better way to derive the correct string length,
> but I haven't figured out what it is yet :(.

How about this:

     #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)

Each byte can represent about 2.4 decimal digits as below,
sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
terminator.

  1 bit = log₁₀(2) ≈ 0.30103 decimal digits
  1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits

>
> >
> >  /* inet address (- '\0') + port (u16) (- '\0') + ':' + '\0' */
> >  #define SOCKADDR_INET_STRLEN                                 \
> > --
> > 2.47.0
> >
>
> --
> David Gibson (he or they)       | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
>                                 | around.
> http://www.ozlabs.org/~dgibson



-- 
Thanks,

Yumei Huang


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16  7:49     ` Yumei Huang
@ 2025-10-16 22:22       ` Stefano Brivio
  2025-10-16 23:16         ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Brivio @ 2025-10-16 22:22 UTC (permalink / raw)
  To: Yumei Huang; +Cc: David Gibson, passt-dev

On Thu, 16 Oct 2025 15:49:39 +0800
Yumei Huang <yuhuang@redhat.com> wrote:

> On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:  
> > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > ---
> > >  util.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  util.h |  3 +++
> > >  2 files changed, 87 insertions(+)
> > >
> > > diff --git a/util.c b/util.c
> > > index c492f90..197677e 100644
> > > --- a/util.c
> > > +++ b/util.c
> > > @@ -579,6 +579,90 @@ int write_file(const char *path, const char *buf)
> > >       return len == 0 ? 0 : -1;
> > >  }
> > >
> > > +/**
> > > + * read_file() - Read contents of file into a buffer
> > > + * @path:    File to read
> > > + * @buf:     Buffer to store file contents
> > > + * @buf_size:        Size of buffer
> > > + *
> > > + * Return: number of bytes read on success, -1 on any error, -2 on truncation
> > > +*/
> > > +int read_file(const char *path, char *buf, size_t buf_size)
> > > +{
> > > +     int fd = open(path, O_RDONLY | O_CLOEXEC);
> > > +     size_t total_read = 0;
> > > +     ssize_t rc;
> > > +
> > > +     if (fd < 0) {
> > > +             warn_perror("Could not open %s", path);
> > > +             return -1;
> > > +     }
> > > +
> > > +     while (total_read < buf_size) {
> > > +             rc = read(fd, buf + total_read, buf_size - total_read);
> > > +
> > > +             if (rc < 0) {
> > > +                     warn_perror("Couldn't read from %s", path);
> > > +                     close(fd);
> > > +                     return -1;
> > > +             }
> > > +
> > > +             if (rc == 0)
> > > +                     break;
> > > +
> > > +             total_read += rc;
> > > +     }
> > > +
> > > +     close(fd);
> > > +
> > > +     if (total_read == buf_size) {
> > > +             warn_perror("File %s truncated, buffer too small", path);
> > > +             return -2;
> > > +     }
> > > +
> > > +     buf[total_read] = '\0';
> > > +
> > > +     return (int)total_read;  
> >
> > Probably makes more sense for total_read and the return type to be ssize_t.  
> 
> Just tried to be consistent with write_file().  I can change it to
> ssize_t if needed.

ssize_t is the type designed for this, if write_file() has it wrong (I
didn't check), we should fix that as well.

> > > +}
> > > +
> > > +/**
> > > + * read_file_integer() - Read an integer value from a file
> > > + * @path: File to read
> > > + * @fallback: Default value if file can't be read
> > > + *
> > > + * Return: Integer value, fallback on failure
> > > +*/
> > > +intmax_t read_file_integer(const char *path, intmax_t fallback)
> > > +{
> > > +     char buf[INTMAX_STRLEN];
> > > +     char *end;  
> >
> > passt coding style is to list (where possible) local variables in
> > reverse order of line length, so this should go after bytes_read.  
> 
> Oh, I didn't notice that. Will update later.

Rationale (added to my further list for CONTRIBUTING.md):

  https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/

and see also https://lwn.net/Articles/758552/.

> >  
> > > +     intmax_t value;
> > > +     int bytes_read;
> > > +
> > > +     bytes_read = read_file(path, buf, sizeof(buf));
> > > +
> > > +     if (bytes_read < 0)
> > > +             return fallback;
> > > +
> > > +     if (bytes_read == 0) {
> > > +             debug("Empty file %s", path);
> > > +             return fallback;
> > > +     }
> > > +
> > > +     errno = 0;
> > > +     value = strtoimax(buf, &end, 10);
> > > +     if (*end && *end != '\n') {
> > > +             debug("Invalid format in %s", path);
> > > +             return fallback;
> > > +     }
> > > +     if (errno) {
> > > +             debug("Invalid value in %s: %s", path, buf);
> > > +             return fallback;
> > > +     }
> > > +
> > > +     return value;
> > > +}
> > > +
> > >  #ifdef __ia64__
> > >  /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
> > >   * use the description from clone(2).
> > > diff --git a/util.h b/util.h
> > > index 22eaac5..887d795 100644
> > > --- a/util.h
> > > +++ b/util.h
> > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > >  int __daemon(int pidfile_fd, int devnull_fd);
> > >  int fls(unsigned long x);
> > >  int write_file(const char *path, const char *buf);
> > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > >  int write_all_buf(int fd, const void *buf, size_t len);
> > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > >  int read_all_buf(int fd, void *buf, size_t len);
> > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > >  }
> > >
> > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))  
> >
> > It's correct for now, and probably for any systems we're likely to run
> > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > like there must be a better way to derive the correct string length,
> > but I haven't figured out what it is yet :(.  
> 
> How about this:
> 
>      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> 
> Each byte can represent about 2.4 decimal digits as below,
> sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> terminator.
> 
>   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
>   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits

If it's sourced from https://stackoverflow.com/a/10536254 and comment,
don't forget to mention that in whatever implementation / commit
message.

But I was thinking... what if we keep it much simpler, use BUFSIZ, and
error out if the buffer is too small? It would be good to be robust
against any potential kernel issue anyway, so I think we need a
mechanism like that in any case.

It's not a security matter, because if the kernel was compromised,
we're compromised too, simply a matter of robustness.

-- 
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] tcp: Resend SYN for inbound connections
  2025-10-16  2:34 ` [PATCH v4 3/4] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-10-16 22:22   ` Stefano Brivio
  2025-10-16 23:34     ` David Gibson
  2025-10-16 23:49   ` David Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Stefano Brivio @ 2025-10-16 22:22 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, david

On Thu, 16 Oct 2025 10:34:22 +0800
Yumei Huang <yuhuang@redhat.com> wrote:

> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after 10 seconds.
> 
> Use the same backoff calculation for the timeout as linux kernel.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
>  tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  tcp.h |  2 ++
>  2 files changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 2ec4b0c..3003333 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -179,9 +179,11 @@
>   *
>   * Timeouts are implemented by means of timerfd timers, set based on flags:
>   *
> - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> - *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> - *   connection
> + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> + *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> + *   SYN. It's the starting timeout for the first SYN retry. If this persists
> + *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> + *   tcp_syn_linear_timeouts) times in a row, reset the connection
>   *
>   * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
>   *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> @@ -340,7 +342,7 @@ enum {
>  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
>  
>  #define ACK_INTERVAL			10		/* ms */
> -#define SYN_TIMEOUT			10		/* s */
> +#define SYN_TIMEOUT_INIT		1		/* s */
>  #define ACK_TIMEOUT			2
>  #define FIN_TIMEOUT			60
>  #define ACT_TIMEOUT			7200
> @@ -365,6 +367,10 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
>  
>  #define TCP_MIGRATE_RESTORE_CHUNK_MIN	1024 /* Try smaller when above this */
>  
> +#define TCP_SYN_RETRIES_SYSCTL		"/proc/sys/net/ipv4/tcp_syn_retries"
> +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL						\
> +	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"

It's quite obvious those are names of sysctl entries, I think we can
drop the _SYSCTL suffix and keep this shorter without losing any
information/indication.

> +
>  /* "Extended" data (not stored in the flow table) for TCP flow migration */
>  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
>  
> @@ -581,8 +587,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>  	if (conn->flags & ACK_TO_TAP_DUE) {
>  		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> -		if (!(conn->events & ESTABLISHED))
> -			it.it_value.tv_sec = SYN_TIMEOUT;
> +		if (!(conn->events & ESTABLISHED)) {
> +			if (conn->retries < c->tcp.syn_linear_timeouts)
> +				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> +			else
> +				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> +					(conn->retries - c->tcp.syn_linear_timeouts);
> +		}
>  		else
>  			it.it_value.tv_sec = ACK_TIMEOUT;
>  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> @@ -2409,8 +2420,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  		tcp_timer_ctl(c, conn);
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
> -			flow_dbg(conn, "handshake timeout");
> -			tcp_rst(c, conn);
> +			if (conn->retries >= MIN(TCP_MAX_RETRIES,
> +				(c->tcp.tcp_syn_retries + c->tcp.syn_linear_timeouts))) {

That doesn't seem to match the sysctl documentation for
tcp_syn_retries, which should be the *total* number of retries, not
excluding the ones with "linear timeouts".

This is pretty hard to read, by the way. It could be:

			if (conn->retries >= TCP_MAX_RETRIES ||
			    conn->retries >= ...) {

> +				flow_dbg(conn, "handshake timeout");
> +				tcp_rst(c, conn);
> +			} else {
> +				flow_trace(conn, "SYN timeout, retry");
> +				tcp_send_flag(c, conn, SYN);
> +				conn->retries++;
> +				tcp_timer_ctl(c, conn);
> +			}
>  		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  			flow_dbg(conn, "FIN timeout");
>  			tcp_rst(c, conn);
> @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
>  	return sl;
>  }
>  
> +/**
> + * tcp_syn_params_init() - Get initial syn params for inbound connection

SYN, parameters

> + * @c:		Execution context
> +*/
> +void tcp_syn_params_init(struct ctx *c)
> +{
> +	long tcp_syn_retries, syn_linear_timeouts;
> +
> +	tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES_SYSCTL, 8);
> +	syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
> +
> +	c->tcp.tcp_syn_retries = (uint8_t)MIN(tcp_syn_retries, UINT8_MAX);
> +	c->tcp.syn_linear_timeouts = (uint8_t)MIN(syn_linear_timeouts, UINT8_MAX);

I don't think you need those casts, MIN(..., UINT8_MAX) already
guarantees that the number is <= UINT8_MAX, and in any case the cast
won't fix anything here.

> +
> +	debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> +		  c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +}
> +
>  /**
>   * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
>   * @c:		Execution context
> @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
>  {
>  	ASSERT(!c->no_tcp);
>  
> +	tcp_syn_params_init(c);
> +
>  	tcp_sock_iov_init(c);
>  
>  	memset(init_sock_pool4,		0xff,	sizeof(init_sock_pool4));
> diff --git a/tcp.h b/tcp.h
> index 234a803..df699a4 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -65,6 +65,8 @@ struct tcp_ctx {
>  	struct fwd_ports fwd_out;
>  	struct timespec timer_run;
>  	size_t pipe_size;
> +	uint8_t tcp_syn_retries;
> +	uint8_t syn_linear_timeouts;

These should be added to the documentation for struct tcp_ctx, above.

>  };
>  
>  #endif /* TCP_H */

-- 
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16 22:22       ` Stefano Brivio
@ 2025-10-16 23:16         ` David Gibson
  2025-10-17  2:11           ` Yumei Huang
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2025-10-16 23:16 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Yumei Huang, passt-dev

[-- Attachment #1: Type: text/plain, Size: 5606 bytes --]

On Fri, Oct 17, 2025 at 12:22:14AM +0200, Stefano Brivio wrote:
> On Thu, 16 Oct 2025 15:49:39 +0800
> Yumei Huang <yuhuang@redhat.com> wrote:
> 
> > On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:  
> > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
[snip]
> > > > +     if (total_read == buf_size) {
> > > > +             warn_perror("File %s truncated, buffer too small", path);
> > > > +             return -2;
> > > > +     }
> > > > +
> > > > +     buf[total_read] = '\0';
> > > > +
> > > > +     return (int)total_read;  
> > >
> > > Probably makes more sense for total_read and the return type to be ssize_t.  
> > 
> > Just tried to be consistent with write_file().  I can change it to
> > ssize_t if needed.
> 
> ssize_t is the type designed for this, if write_file() has it wrong (I
> didn't check), we should fix that as well.

It does, and we should :).

> > > > +}
> > > > +
> > > > +/**
> > > > + * read_file_integer() - Read an integer value from a file
> > > > + * @path: File to read
> > > > + * @fallback: Default value if file can't be read
> > > > + *
> > > > + * Return: Integer value, fallback on failure
> > > > +*/
> > > > +intmax_t read_file_integer(const char *path, intmax_t fallback)
> > > > +{
> > > > +     char buf[INTMAX_STRLEN];
> > > > +     char *end;  
> > >
> > > passt coding style is to list (where possible) local variables in
> > > reverse order of line length, so this should go after bytes_read.  
> > 
> > Oh, I didn't notice that. Will update later.
> 
> Rationale (added to my further list for CONTRIBUTING.md):
> 
>   https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
> 
> and see also https://lwn.net/Articles/758552/.

If you want to update CONTRIBUTING.md to cover this, Yumei, that would
be much appreciated.

> > > > +     intmax_t value;
> > > > +     int bytes_read;
> > > > +
> > > > +     bytes_read = read_file(path, buf, sizeof(buf));
> > > > +
> > > > +     if (bytes_read < 0)
> > > > +             return fallback;
> > > > +
> > > > +     if (bytes_read == 0) {
> > > > +             debug("Empty file %s", path);
> > > > +             return fallback;
> > > > +     }
> > > > +
> > > > +     errno = 0;
> > > > +     value = strtoimax(buf, &end, 10);
> > > > +     if (*end && *end != '\n') {
> > > > +             debug("Invalid format in %s", path);
> > > > +             return fallback;
> > > > +     }
> > > > +     if (errno) {
> > > > +             debug("Invalid value in %s: %s", path, buf);
> > > > +             return fallback;
> > > > +     }
> > > > +
> > > > +     return value;
> > > > +}
> > > > +
> > > >  #ifdef __ia64__
> > > >  /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
> > > >   * use the description from clone(2).
> > > > diff --git a/util.h b/util.h
> > > > index 22eaac5..887d795 100644
> > > > --- a/util.h
> > > > +++ b/util.h
> > > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > > >  int __daemon(int pidfile_fd, int devnull_fd);
> > > >  int fls(unsigned long x);
> > > >  int write_file(const char *path, const char *buf);
> > > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > > >  int write_all_buf(int fd, const void *buf, size_t len);
> > > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > > >  int read_all_buf(int fd, void *buf, size_t len);
> > > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > > >  }
> > > >
> > > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))  
> > >
> > > It's correct for now, and probably for any systems we're likely to run
> > > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > > like there must be a better way to derive the correct string length,
> > > but I haven't figured out what it is yet :(.  
> > 
> > How about this:
> > 
> >      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> > 
> > Each byte can represent about 2.4 decimal digits as below,
> > sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> > terminator.
> > 
> >   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
> >   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits

Works for me.

> If it's sourced from https://stackoverflow.com/a/10536254 and comment,
> don't forget to mention that in whatever implementation / commit
> message.

Good point.

> But I was thinking... what if we keep it much simpler, use BUFSIZ, and
> error out if the buffer is too small? It would be good to be robust
> against any potential kernel issue anyway, so I think we need a
> mechanism like that in any case.

It already handles the case where the buffer isn't big enough (in
read_file()).  We could use BUFSIZ, but it's massive overkill for
reading a single integer: 8192 versus ~21 bytes (or ~42 bytes if
intmax_t were 128-bit).

> It's not a security matter, because if the kernel was compromised,
> we're compromised too, simply a matter of robustness.
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] tcp: Resend SYN for inbound connections
  2025-10-16 22:22   ` Stefano Brivio
@ 2025-10-16 23:34     ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2025-10-16 23:34 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Yumei Huang, passt-dev

[-- Attachment #1: Type: text/plain, Size: 4372 bytes --]

On Fri, Oct 17, 2025 at 12:22:46AM +0200, Stefano Brivio wrote:
> On Thu, 16 Oct 2025 10:34:22 +0800
> Yumei Huang <yuhuang@redhat.com> wrote:
> 
> > If a client connects while guest is not connected or ready yet,
> > resend SYN instead of just resetting connection after 10 seconds.
> > 
> > Use the same backoff calculation for the timeout as linux kernel.
> > 
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > ---
> >  tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> >  tcp.h |  2 ++
> >  2 files changed, 49 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tcp.c b/tcp.c
> > index 2ec4b0c..3003333 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -179,9 +179,11 @@
> >   *
> >   * Timeouts are implemented by means of timerfd timers, set based on flags:
> >   *
> > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > - *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > - *   connection
> > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > + *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > + *   SYN. It's the starting timeout for the first SYN retry. If this persists
> > + *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > + *   tcp_syn_linear_timeouts) times in a row, reset the connection
> >   *
> >   * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> >   *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > @@ -340,7 +342,7 @@ enum {
> >  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
> >  
> >  #define ACK_INTERVAL			10		/* ms */
> > -#define SYN_TIMEOUT			10		/* s */
> > +#define SYN_TIMEOUT_INIT		1		/* s */
> >  #define ACK_TIMEOUT			2
> >  #define FIN_TIMEOUT			60
> >  #define ACT_TIMEOUT			7200
> > @@ -365,6 +367,10 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
> >  
> >  #define TCP_MIGRATE_RESTORE_CHUNK_MIN	1024 /* Try smaller when above this */
> >  
> > +#define TCP_SYN_RETRIES_SYSCTL		"/proc/sys/net/ipv4/tcp_syn_retries"
> > +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL						\
> > +	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> 
> It's quite obvious those are names of sysctl entries, I think we can
> drop the _SYSCTL suffix and keep this shorter without losing any
> information/indication.
> 
> > +
> >  /* "Extended" data (not stored in the flow table) for TCP flow migration */
> >  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> >  
> > @@ -581,8 +587,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> >  	if (conn->flags & ACK_TO_TAP_DUE) {
> >  		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> > -		if (!(conn->events & ESTABLISHED))
> > -			it.it_value.tv_sec = SYN_TIMEOUT;
> > +		if (!(conn->events & ESTABLISHED)) {
> > +			if (conn->retries < c->tcp.syn_linear_timeouts)
> > +				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> > +			else
> > +				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> > +					(conn->retries - c->tcp.syn_linear_timeouts);
> > +		}
> >  		else
> >  			it.it_value.tv_sec = ACK_TIMEOUT;
> >  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > @@ -2409,8 +2420,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> >  		tcp_timer_ctl(c, conn);
> >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> >  		if (!(conn->events & ESTABLISHED)) {
> > -			flow_dbg(conn, "handshake timeout");
> > -			tcp_rst(c, conn);
> > +			if (conn->retries >= MIN(TCP_MAX_RETRIES,
> > +				(c->tcp.tcp_syn_retries + c->tcp.syn_linear_timeouts))) {
> 
> That doesn't seem to match the sysctl documentation for
> tcp_syn_retries, which should be the *total* number of retries, not
> excluding the ones with "linear timeouts".

No, but it does match the actual kernel behaviour (Yumei already
experimented and discovered this).  Confirmed it in the kernel code,
here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_timer.c#n258

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/4] tcp: Resend SYN for inbound connections
  2025-10-16  2:34 ` [PATCH v4 3/4] tcp: Resend SYN for inbound connections Yumei Huang
  2025-10-16 22:22   ` Stefano Brivio
@ 2025-10-16 23:49   ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: David Gibson @ 2025-10-16 23:49 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, sbrivio

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

On Thu, Oct 16, 2025 at 10:34:22AM +0800, Yumei Huang wrote:
> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after 10 seconds.
> 
> Use the same backoff calculation for the timeout as linux kernel.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>

LGTM.  Stefano's suggestions make sense, plus one more nit below:

[snip]
> +/**
> + * tcp_syn_params_init() - Get initial syn params for inbound connection
> + * @c:		Execution context
> +*/
> +void tcp_syn_params_init(struct ctx *c)
> +{
> +	long tcp_syn_retries, syn_linear_timeouts;

These should be intmax_t to match read_file_integer().

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/4] tcp: Update data retransmission timeout
  2025-10-16  2:34 ` [PATCH v4 4/4] tcp: Update data retransmission timeout Yumei Huang
@ 2025-10-16 23:59   ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2025-10-16 23:59 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, sbrivio

[-- Attachment #1: Type: text/plain, Size: 3668 bytes --]

On Thu, Oct 16, 2025 at 10:34:23AM +0800, Yumei Huang wrote:
> Use an exponential backoff timeout for data retransmission according
> to RFC 2988 and RFC 6298. Set the initial RTO to one second as discussed
> in Appendix A of RFC 6298.
> 
> Also combine the macros defining the initial RTO for both SYN and ACK.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Except for some minor English clarity notes below.

> ---
>  tcp.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 3003333..3254d67 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -179,16 +179,12 @@
>   *
>   * Timeouts are implemented by means of timerfd timers, set based on flags:
>   *
> - * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> - *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> - *   SYN. It's the starting timeout for the first SYN retry. If this persists
> - *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> - *   tcp_syn_linear_timeouts) times in a row, reset the connection
> - *
> - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> - *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> - *   socket and reset sequence to what was acknowledged. If this persists for
> - *   more than TCP_MAX_RETRIES times in a row, reset the connection
> + * - RTO_INIT: if no ACK segment was received from tap/guest, either during
> + *   handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
> + *   sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
> + *   from the socket and reset sequence to what was acknowledged. It's the
> + *   starting timeout for the first retry.

For clarity, I'd suggest:
	This is the timeout for the first retry, in seconds.

> If this persists for more than
> + *   allowed times in a row, reset the connection

I'd suggest:

If this persists too many times in a row, reset the connection:
TCP_MAX_RETRIES for established connections, or (tcp_syn_retries +
tcp_syn_linear_timeouts) during the handshake.

>   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
>   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
> @@ -342,8 +338,7 @@ enum {
>  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
>  
>  #define ACK_INTERVAL			10		/* ms */
> -#define SYN_TIMEOUT_INIT		1		/* s */
> -#define ACK_TIMEOUT			2
> +#define RTO_INIT			1		/* s, RFC 6298 */
>  #define FIN_TIMEOUT			60
>  #define ACT_TIMEOUT			7200
>  
> @@ -589,13 +584,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
>  			if (conn->retries < c->tcp.syn_linear_timeouts)
> -				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> +				it.it_value.tv_sec = RTO_INIT;
>  			else
> -				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> +				it.it_value.tv_sec = RTO_INIT <<
>  					(conn->retries - c->tcp.syn_linear_timeouts);
>  		}
>  		else
> -			it.it_value.tv_sec = ACK_TIMEOUT;
> +			it.it_value.tv_sec = RTO_INIT << conn->retries;
>  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  		it.it_value.tv_sec = FIN_TIMEOUT;
>  	} else {
> -- 
> 2.47.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-16 23:16         ` David Gibson
@ 2025-10-17  2:11           ` Yumei Huang
  2025-10-17  2:29             ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Yumei Huang @ 2025-10-17  2:11 UTC (permalink / raw)
  To: David Gibson; +Cc: Stefano Brivio, passt-dev

On Fri, Oct 17, 2025 at 7:16 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Fri, Oct 17, 2025 at 12:22:14AM +0200, Stefano Brivio wrote:
> > On Thu, 16 Oct 2025 15:49:39 +0800
> > Yumei Huang <yuhuang@redhat.com> wrote:
> >
> > > On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> > > <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:
> > > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> [snip]
> > > > > +     if (total_read == buf_size) {
> > > > > +             warn_perror("File %s truncated, buffer too small", path);
> > > > > +             return -2;
> > > > > +     }
> > > > > +
> > > > > +     buf[total_read] = '\0';
> > > > > +
> > > > > +     return (int)total_read;
> > > >
> > > > Probably makes more sense for total_read and the return type to be ssize_t.
> > >
> > > Just tried to be consistent with write_file().  I can change it to
> > > ssize_t if needed.
> >
> > ssize_t is the type designed for this, if write_file() has it wrong (I
> > didn't check), we should fix that as well.
>
> It does, and we should :).

Checked write_file(), seems the return value is not the same to
read_file(). It returns 0 or 1 depending on success or failure. So int
works fine here. I will update read_file() only.
>
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * read_file_integer() - Read an integer value from a file
> > > > > + * @path: File to read
> > > > > + * @fallback: Default value if file can't be read
> > > > > + *
> > > > > + * Return: Integer value, fallback on failure
> > > > > +*/
> > > > > +intmax_t read_file_integer(const char *path, intmax_t fallback)
> > > > > +{
> > > > > +     char buf[INTMAX_STRLEN];
> > > > > +     char *end;
> > > >
> > > > passt coding style is to list (where possible) local variables in
> > > > reverse order of line length, so this should go after bytes_read.
> > >
> > > Oh, I didn't notice that. Will update later.
> >
> > Rationale (added to my further list for CONTRIBUTING.md):
> >
> >   https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
> >
> > and see also https://lwn.net/Articles/758552/.
>
> If you want to update CONTRIBUTING.md to cover this, Yumei, that would
> be much appreciated.

Sure, will do it.
>
> > > > > +     intmax_t value;
> > > > > +     int bytes_read;
> > > > > +
> > > > > +     bytes_read = read_file(path, buf, sizeof(buf));
> > > > > +
> > > > > +     if (bytes_read < 0)
> > > > > +             return fallback;
> > > > > +
> > > > > +     if (bytes_read == 0) {
> > > > > +             debug("Empty file %s", path);
> > > > > +             return fallback;
> > > > > +     }
> > > > > +
> > > > > +     errno = 0;
> > > > > +     value = strtoimax(buf, &end, 10);
> > > > > +     if (*end && *end != '\n') {
> > > > > +             debug("Invalid format in %s", path);
> > > > > +             return fallback;
> > > > > +     }
> > > > > +     if (errno) {
> > > > > +             debug("Invalid value in %s: %s", path, buf);
> > > > > +             return fallback;
> > > > > +     }
> > > > > +
> > > > > +     return value;
> > > > > +}
> > > > > +
> > > > >  #ifdef __ia64__
> > > > >  /* Needed by do_clone() below: glibc doesn't export the prototype of __clone2(),
> > > > >   * use the description from clone(2).
> > > > > diff --git a/util.h b/util.h
> > > > > index 22eaac5..887d795 100644
> > > > > --- a/util.h
> > > > > +++ b/util.h
> > > > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > > > >  int __daemon(int pidfile_fd, int devnull_fd);
> > > > >  int fls(unsigned long x);
> > > > >  int write_file(const char *path, const char *buf);
> > > > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > > > >  int write_all_buf(int fd, const void *buf, size_t len);
> > > > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > > > >  int read_all_buf(int fd, void *buf, size_t len);
> > > > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > > > >  }
> > > > >
> > > > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))
> > > >
> > > > It's correct for now, and probably for any systems we're likely to run
> > > > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > > > like there must be a better way to derive the correct string length,
> > > > but I haven't figured out what it is yet :(.
> > >
> > > How about this:
> > >
> > >      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> > >
> > > Each byte can represent about 2.4 decimal digits as below,
> > > sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> > > terminator.
> > >
> > >   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
> > >   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits
>
> Works for me.
>
> > If it's sourced from https://stackoverflow.com/a/10536254 and comment,
> > don't forget to mention that in whatever implementation / commit
> > message.

Actually, it's a suggestion from Claude and I double checked the logic and math.
Maybe I should mention Claude and the logic in the commit message instead?

>
> Good point.
>
> > But I was thinking... what if we keep it much simpler, use BUFSIZ, and
> > error out if the buffer is too small? It would be good to be robust
> > against any potential kernel issue anyway, so I think we need a
> > mechanism like that in any case.
>
> It already handles the case where the buffer isn't big enough (in
> read_file()).  We could use BUFSIZ, but it's massive overkill for
> reading a single integer: 8192 versus ~21 bytes (or ~42 bytes if
> intmax_t were 128-bit).
>
> > It's not a security matter, because if the kernel was compromised,
> > we're compromised too, simply a matter of robustness.
> >
> > --
> > Stefano
> >
>
> --
> David Gibson (he or they)       | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
>                                 | around.
> http://www.ozlabs.org/~dgibson



-- 
Thanks,

Yumei Huang


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-17  2:11           ` Yumei Huang
@ 2025-10-17  2:29             ` David Gibson
  2025-10-17  2:44               ` Yumei Huang
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2025-10-17  2:29 UTC (permalink / raw)
  To: Yumei Huang; +Cc: Stefano Brivio, passt-dev

[-- Attachment #1: Type: text/plain, Size: 4445 bytes --]

On Fri, Oct 17, 2025 at 10:11:21AM +0800, Yumei Huang wrote:
> On Fri, Oct 17, 2025 at 7:16 AM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Oct 17, 2025 at 12:22:14AM +0200, Stefano Brivio wrote:
> > > On Thu, 16 Oct 2025 15:49:39 +0800
> > > Yumei Huang <yuhuang@redhat.com> wrote:
> > >
> > > > On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> > > > <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:
> > > > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > [snip]
> > > > > > +     if (total_read == buf_size) {
> > > > > > +             warn_perror("File %s truncated, buffer too small", path);
> > > > > > +             return -2;
> > > > > > +     }
> > > > > > +
> > > > > > +     buf[total_read] = '\0';
> > > > > > +
> > > > > > +     return (int)total_read;
> > > > >
> > > > > Probably makes more sense for total_read and the return type to be ssize_t.
> > > >
> > > > Just tried to be consistent with write_file().  I can change it to
> > > > ssize_t if needed.
> > >
> > > ssize_t is the type designed for this, if write_file() has it wrong (I
> > > didn't check), we should fix that as well.
> >
> > It does, and we should :).
> 
> Checked write_file(), seems the return value is not the same to
> read_file(). It returns 0 or 1 depending on success or failure. So int
> works fine here. I will update read_file() only.

Ah, good point.


[snip]
> > > Rationale (added to my further list for CONTRIBUTING.md):
> > >
> > >   https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
> > >
> > > and see also https://lwn.net/Articles/758552/.
> >
> > If you want to update CONTRIBUTING.md to cover this, Yumei, that would
> > be much appreciated.
> 
> Sure, will do it.

Thanks.

[snip]
> > > > > > diff --git a/util.h b/util.h
> > > > > > index 22eaac5..887d795 100644
> > > > > > --- a/util.h
> > > > > > +++ b/util.h
> > > > > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > > > > >  int __daemon(int pidfile_fd, int devnull_fd);
> > > > > >  int fls(unsigned long x);
> > > > > >  int write_file(const char *path, const char *buf);
> > > > > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > > > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > > > > >  int write_all_buf(int fd, const void *buf, size_t len);
> > > > > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > > > > >  int read_all_buf(int fd, void *buf, size_t len);
> > > > > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > > > > >  }
> > > > > >
> > > > > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > > > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))
> > > > >
> > > > > It's correct for now, and probably for any systems we're likely to run
> > > > > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > > > > like there must be a better way to derive the correct string length,
> > > > > but I haven't figured out what it is yet :(.
> > > >
> > > > How about this:
> > > >
> > > >      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> > > >
> > > > Each byte can represent about 2.4 decimal digits as below,
> > > > sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> > > > terminator.
> > > >
> > > >   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
> > > >   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits
> >
> > Works for me.
> >
> > > If it's sourced from https://stackoverflow.com/a/10536254 and comment,
> > > don't forget to mention that in whatever implementation / commit
> > > message.
> 
> Actually, it's a suggestion from Claude and I double checked the logic and math.
> Maybe I should mention Claude and the logic in the commit message instead?

Heh, so Claude got it from that stackoverflow page or one like it,
probably.  It's not correctness we're concerned about, but proper
attribution.  This is small enough that it's probably not
copyrightable, but that would be a concern for larger segments as
well.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-17  2:29             ` David Gibson
@ 2025-10-17  2:44               ` Yumei Huang
  2025-10-19 10:07                 ` Stefano Brivio
  0 siblings, 1 reply; 17+ messages in thread
From: Yumei Huang @ 2025-10-17  2:44 UTC (permalink / raw)
  To: David Gibson; +Cc: Stefano Brivio, passt-dev

On Fri, Oct 17, 2025 at 10:30 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Fri, Oct 17, 2025 at 10:11:21AM +0800, Yumei Huang wrote:
> > On Fri, Oct 17, 2025 at 7:16 AM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Fri, Oct 17, 2025 at 12:22:14AM +0200, Stefano Brivio wrote:
> > > > On Thu, 16 Oct 2025 15:49:39 +0800
> > > > Yumei Huang <yuhuang@redhat.com> wrote:
> > > >
> > > > > On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> > > > > <david@gibson.dropbear.id.au> wrote:
> > > > > >
> > > > > > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:
> > > > > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > [snip]
> > > > > > > +     if (total_read == buf_size) {
> > > > > > > +             warn_perror("File %s truncated, buffer too small", path);
> > > > > > > +             return -2;
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     buf[total_read] = '\0';
> > > > > > > +
> > > > > > > +     return (int)total_read;
> > > > > >
> > > > > > Probably makes more sense for total_read and the return type to be ssize_t.
> > > > >
> > > > > Just tried to be consistent with write_file().  I can change it to
> > > > > ssize_t if needed.
> > > >
> > > > ssize_t is the type designed for this, if write_file() has it wrong (I
> > > > didn't check), we should fix that as well.
> > >
> > > It does, and we should :).
> >
> > Checked write_file(), seems the return value is not the same to
> > read_file(). It returns 0 or 1 depending on success or failure. So int
> > works fine here. I will update read_file() only.
>
> Ah, good point.
>
>
> [snip]
> > > > Rationale (added to my further list for CONTRIBUTING.md):
> > > >
> > > >   https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
> > > >
> > > > and see also https://lwn.net/Articles/758552/.
> > >
> > > If you want to update CONTRIBUTING.md to cover this, Yumei, that would
> > > be much appreciated.
> >
> > Sure, will do it.
>
> Thanks.
>
> [snip]
> > > > > > > diff --git a/util.h b/util.h
> > > > > > > index 22eaac5..887d795 100644
> > > > > > > --- a/util.h
> > > > > > > +++ b/util.h
> > > > > > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > > > > > >  int __daemon(int pidfile_fd, int devnull_fd);
> > > > > > >  int fls(unsigned long x);
> > > > > > >  int write_file(const char *path, const char *buf);
> > > > > > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > > > > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > > > > > >  int write_all_buf(int fd, const void *buf, size_t len);
> > > > > > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > > > > > >  int read_all_buf(int fd, void *buf, size_t len);
> > > > > > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > > > > > >  }
> > > > > > >
> > > > > > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > > > > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))
> > > > > >
> > > > > > It's correct for now, and probably for any systems we're likely to run
> > > > > > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > > > > > like there must be a better way to derive the correct string length,
> > > > > > but I haven't figured out what it is yet :(.
> > > > >
> > > > > How about this:
> > > > >
> > > > >      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> > > > >
> > > > > Each byte can represent about 2.4 decimal digits as below,
> > > > > sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> > > > > terminator.
> > > > >
> > > > >   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
> > > > >   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits
> > >
> > > Works for me.
> > >
> > > > If it's sourced from https://stackoverflow.com/a/10536254 and comment,
> > > > don't forget to mention that in whatever implementation / commit
> > > > message.
> >
> > Actually, it's a suggestion from Claude and I double checked the logic and math.
> > Maybe I should mention Claude and the logic in the commit message instead?
>
> Heh, so Claude got it from that stackoverflow page or one like it,
> probably.  It's not correctness we're concerned about, but proper
> attribution.  This is small enough that it's probably not
> copyrightable, but that would be a concern for larger segments as
> well.

Yeah, I understand. For most of the time, I only asked claude to
explain the code instead of writing the code directly. For this piece,
I will add a comment explaining the logic before the define line.
Thanks.
>
> --
> David Gibson (he or they)       | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
>                                 | around.
> http://www.ozlabs.org/~dgibson



-- 
Thanks,

Yumei Huang


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function
  2025-10-17  2:44               ` Yumei Huang
@ 2025-10-19 10:07                 ` Stefano Brivio
  0 siblings, 0 replies; 17+ messages in thread
From: Stefano Brivio @ 2025-10-19 10:07 UTC (permalink / raw)
  To: Yumei Huang; +Cc: David Gibson, passt-dev

On Fri, 17 Oct 2025 10:44:42 +0800
Yumei Huang <yuhuang@redhat.com> wrote:

> On Fri, Oct 17, 2025 at 10:30 AM David Gibson
> <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Oct 17, 2025 at 10:11:21AM +0800, Yumei Huang wrote:  
> > > On Fri, Oct 17, 2025 at 7:16 AM David Gibson
> > > <david@gibson.dropbear.id.au> wrote:  
> > > >
> > > > On Fri, Oct 17, 2025 at 12:22:14AM +0200, Stefano Brivio wrote:  
> > > > > On Thu, 16 Oct 2025 15:49:39 +0800
> > > > > Yumei Huang <yuhuang@redhat.com> wrote:
> > > > >  
> > > > > > On Thu, Oct 16, 2025 at 2:30 PM David Gibson
> > > > > > <david@gibson.dropbear.id.au> wrote:  
> > > > > > >
> > > > > > > On Thu, Oct 16, 2025 at 10:34:21AM +0800, Yumei Huang wrote:  
> > > > > > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>  
> > > > [snip]  
> > > > > > > > +     if (total_read == buf_size) {
> > > > > > > > +             warn_perror("File %s truncated, buffer too small", path);
> > > > > > > > +             return -2;
> > > > > > > > +     }
> > > > > > > > +
> > > > > > > > +     buf[total_read] = '\0';
> > > > > > > > +
> > > > > > > > +     return (int)total_read;  
> > > > > > >
> > > > > > > Probably makes more sense for total_read and the return type to be ssize_t.  
> > > > > >
> > > > > > Just tried to be consistent with write_file().  I can change it to
> > > > > > ssize_t if needed.  
> > > > >
> > > > > ssize_t is the type designed for this, if write_file() has it wrong (I
> > > > > didn't check), we should fix that as well.  
> > > >
> > > > It does, and we should :).  
> > >
> > > Checked write_file(), seems the return value is not the same to
> > > read_file(). It returns 0 or 1 depending on success or failure. So int
> > > works fine here. I will update read_file() only.  
> >
> > Ah, good point.
> >
> >
> > [snip]  
> > > > > Rationale (added to my further list for CONTRIBUTING.md):
> > > > >
> > > > >   https://hisham.hm/2018/06/16/when-listing-repeated-things-make-pyramids/
> > > > >
> > > > > and see also https://lwn.net/Articles/758552/.  
> > > >
> > > > If you want to update CONTRIBUTING.md to cover this, Yumei, that would
> > > > be much appreciated.  
> > >
> > > Sure, will do it.  
> >
> > Thanks.
> >
> > [snip]  
> > > > > > > > diff --git a/util.h b/util.h
> > > > > > > > index 22eaac5..887d795 100644
> > > > > > > > --- a/util.h
> > > > > > > > +++ b/util.h
> > > > > > > > @@ -222,6 +222,8 @@ void pidfile_write(int fd, pid_t pid);
> > > > > > > >  int __daemon(int pidfile_fd, int devnull_fd);
> > > > > > > >  int fls(unsigned long x);
> > > > > > > >  int write_file(const char *path, const char *buf);
> > > > > > > > +int read_file(const char *path, char *buf, size_t buf_size);
> > > > > > > > +intmax_t read_file_integer(const char *path, intmax_t fallback);
> > > > > > > >  int write_all_buf(int fd, const void *buf, size_t len);
> > > > > > > >  int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip);
> > > > > > > >  int read_all_buf(int fd, void *buf, size_t len);
> > > > > > > > @@ -249,6 +251,7 @@ static inline const char *af_name(sa_family_t af)
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  #define UINT16_STRLEN                (sizeof("65535"))
> > > > > > > > +#define INTMAX_STRLEN                (sizeof("-9223372036854775808"))  
> > > > > > >
> > > > > > > It's correct for now, and probably for any systems we're likely to run
> > > > > > > on, but I dislike hard-assuming the size of intmax_t here.  I feel
> > > > > > > like there must be a better way to derive the correct string length,
> > > > > > > but I haven't figured out what it is yet :(.  
> > > > > >
> > > > > > How about this:
> > > > > >
> > > > > >      #define INTMAX_STRLEN (sizeof(intmax_t) * 3 + 2)
> > > > > >
> > > > > > Each byte can represent about 2.4 decimal digits as below,
> > > > > > sizeof(intmax_t) * 3 gives us a safe upper bound, +2 for sign and null
> > > > > > terminator.
> > > > > >
> > > > > >   1 bit = log₁₀(2) ≈ 0.30103 decimal digits
> > > > > >   1 byte = 8 bits = 8 × 0.30103 ≈ 2.408 decimal digits  
> > > >
> > > > Works for me.
> > > >  
> > > > > If it's sourced from https://stackoverflow.com/a/10536254 and comment,
> > > > > don't forget to mention that in whatever implementation / commit
> > > > > message.  
> > >
> > > Actually, it's a suggestion from Claude and I double checked the logic and math.

Please mention that the next time instead of making it look like it's
yours.

> > > Maybe I should mention Claude and the logic in the commit message instead?  

Using whatever tool to copy code or descriptions of techniques doesn't
make you exempt from licensing requirements, see:

  https://stackoverflow.com/help/licensing

> > Heh, so Claude got it from that stackoverflow page or one like it,
> > probably.  It's not correctness we're concerned about, but proper
> > attribution.  This is small enough that it's probably not
> > copyrightable, but that would be a concern for larger segments as
> > well.  
> 
> Yeah, I understand. For most of the time, I only asked claude to
> explain the code instead of writing the code directly. For this piece,
> I will add a comment explaining the logic before the define line.

That's not enough. Just to be very clear, I'm not going to merge
anything that violates licensing requirements, no matter if some tool
hides them from you.

-- 
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-10-19 10:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-16  2:34 [PATCH v4 0/4] Retry SYNs for inbound connections Yumei Huang
2025-10-16  2:34 ` [PATCH v4 1/4] tcp: Rename "retrans" to "retries" Yumei Huang
2025-10-16  2:34 ` [PATCH v4 2/4] util: Introduce read_file() and read_file_integer() function Yumei Huang
2025-10-16  6:30   ` David Gibson
2025-10-16  7:49     ` Yumei Huang
2025-10-16 22:22       ` Stefano Brivio
2025-10-16 23:16         ` David Gibson
2025-10-17  2:11           ` Yumei Huang
2025-10-17  2:29             ` David Gibson
2025-10-17  2:44               ` Yumei Huang
2025-10-19 10:07                 ` Stefano Brivio
2025-10-16  2:34 ` [PATCH v4 3/4] tcp: Resend SYN for inbound connections Yumei Huang
2025-10-16 22:22   ` Stefano Brivio
2025-10-16 23:34     ` David Gibson
2025-10-16 23:49   ` David Gibson
2025-10-16  2:34 ` [PATCH v4 4/4] tcp: Update data retransmission timeout Yumei Huang
2025-10-16 23:59   ` David Gibson

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).