[PATCH 0/2] Retry SYNs for inbound connections

public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed

* [PATCH 0/2] Retry SYNs for inbound connections
@ 2025-09-28  7:29 Yumei Huang
  2025-09-28  7:29 ` [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer Yumei Huang
  2025-09-28  7:29 ` [PATCH 2/2] tcp: Resend SYN for inbound connections Yumei Huang
  0 siblings, 2 replies; 9+ messages in thread
From: Yumei Huang @ 2025-09-28  7:29 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

When a client connects, SYN would be sent to guest only once. If the 
guest is not connected or ready at that time, the connection will be
reset in 10s. These patches introduce the SYN retry mechanism similar
to retransmits.

Tested two scenarios:

1) A guest is connected in a few seconds, the client connects
to the guest successfully.

2) No guest connects at all, the client gets "Connection reset" in 80s.

Also ran the testsuits, no regression issue introduced.

Yumei Huang (2):
  tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer
  tcp: Resend SYN for inbound connections

 tcp.c      | 21 ++++++++++++++-------
 tcp_conn.h |  8 ++++----
 2 files changed, 18 insertions(+), 11 deletions(-)

-- 
2.47.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer
  2025-09-28  7:29 [PATCH 0/2] Retry SYNs for inbound connections Yumei Huang
@ 2025-09-28  7:29 ` Yumei Huang
  2025-09-29  6:06   ` David Gibson
  2025-09-28  7:29 ` [PATCH 2/2] tcp: Resend SYN for inbound connections Yumei Huang
  1 sibling, 1 reply; 9+ messages in thread
From: Yumei Huang @ 2025-09-28  7:29 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer to
"retries", so it can be reused to retry for SYN_TIMEOUT.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
 tcp.c      | 10 +++++-----
 tcp_conn.h |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tcp.c b/tcp.c
index 48b1ef2..21b75a5 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1102,7 +1102,7 @@ static void tcp_update_seqack_from_tap(const struct ctx *c,
 		if (SEQ_LT(seq, conn->seq_to_tap))
 			conn_flag(c, conn, ACK_FROM_TAP_DUE);
 
-		conn->retrans = 0;
+		conn->retries = 0;
 		conn->seq_ack_from_tap = seq;
 	}
 }
@@ -2383,7 +2383,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 			flow_dbg(conn, "FIN timeout");
 			tcp_rst(c, conn);
-		} else if (conn->retrans == TCP_MAX_RETRANS) {
+		} else if (conn->retries == TCP_MAX_RETRANS) {
 			flow_dbg(conn, "retransmissions count exceeded");
 			tcp_rst(c, conn);
 		} else {
@@ -2392,7 +2392,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 			if (!conn->wnd_from_tap)
 				conn->wnd_from_tap = 1; /* Zero-window probe */
 
-			conn->retrans++;
+			conn->retries++;
 			if (tcp_rewind_seq(c, conn))
 				return;
 
@@ -3351,7 +3351,7 @@ static int tcp_flow_repair_opt(const struct tcp_tap_conn *conn,
 int tcp_flow_migrate_source(int fd, struct tcp_tap_conn *conn)
 {
 	struct tcp_tap_transfer t = {
-		.retrans		= conn->retrans,
+		.retries		= conn->retries,
 		.ws_from_tap		= conn->ws_from_tap,
 		.ws_to_tap		= conn->ws_to_tap,
 		.events			= conn->events,
@@ -3631,7 +3631,7 @@ int tcp_flow_migrate_target(struct ctx *c, int fd)
 	memcpy(&flow->f.side, &t.side, sizeof(flow->f.side));
 	conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp);
 
-	conn->retrans			= t.retrans;
+	conn->retries			= t.retries;
 	conn->ws_from_tap		= t.ws_from_tap;
 	conn->ws_to_tap			= t.ws_to_tap;
 	conn->events			= t.events;
diff --git a/tcp_conn.h b/tcp_conn.h
index 38b5c54..885d54a 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -13,7 +13,7 @@
  * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
  * @f:			Generic flow information
  * @in_epoll:		Is the connection in the epoll set?
- * @retrans:		Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries:		Number of retries occurred due to ACK_TIMEOUT or SYN_TIMEOUT
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @tap_mss:		MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
@@ -39,7 +39,7 @@ struct tcp_tap_conn {
 	bool		in_epoll	:1;
 
 #define TCP_RETRANS_BITS		3
-	unsigned int	retrans		:TCP_RETRANS_BITS;
+	unsigned int	retries		:TCP_RETRANS_BITS;
 #define TCP_MAX_RETRANS			MAX_FROM_BITS(TCP_RETRANS_BITS)
 
 #define TCP_WS_BITS			4	/* RFC 7323 */
@@ -102,7 +102,7 @@ struct tcp_tap_conn {
  * struct tcp_tap_transfer - Migrated TCP data, flow table part, network order
  * @pif:		Interfaces for each side of the flow
  * @side:		Addresses and ports for each side of the flow
- * @retrans:		Number of retransmissions occurred due to ACK_TIMEOUT
+ * @retries:		Number of retries occurred due to ACK_TIMEOUT or SYN_TIMEOUT
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @events:		Connection events, implying connection states
@@ -122,7 +122,7 @@ struct tcp_tap_transfer {
 	uint8_t		pif[SIDES];
 	struct flowside	side[SIDES];
 
-	uint8_t		retrans;
+	uint8_t		retries;
 	uint8_t		ws_from_tap;
 	uint8_t		ws_to_tap;
 	uint8_t		events;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-28  7:29 [PATCH 0/2] Retry SYNs for inbound connections Yumei Huang
  2025-09-28  7:29 ` [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer Yumei Huang
@ 2025-09-28  7:29 ` Yumei Huang
  2025-09-29  6:25   ` David Gibson
  2025-09-29 22:23   ` Stefano Brivio
  1 sibling, 2 replies; 9+ messages in thread
From: Yumei Huang @ 2025-09-28  7:29 UTC (permalink / raw)
  To: passt-dev, sbrivio; +Cc: david, yuhuang

If a client connects while guest is not connected or ready yet,
resend SYN instead of just resetting connection after SYN_TIMEOUT.

Signed-off-by: Yumei Huang <yuhuang@redhat.com>
---
 tcp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tcp.c b/tcp.c
index 21b75a5..6fe8678 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2378,8 +2378,15 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 		tcp_timer_ctl(c, conn);
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
 		if (!(conn->events & ESTABLISHED)) {
-			flow_dbg(conn, "handshake timeout");
-			tcp_rst(c, conn);
+			if (conn->retries == TCP_MAX_RETRANS){
+				flow_dbg(conn, "handshake timeout");
+				tcp_rst(c, conn);
+			} else {
+				flow_dbg(conn, "SYN timeout, retry");
+				tcp_send_flag(c, conn, SYN);
+				conn->retries++;
+				tcp_timer_ctl(c, conn);
+			}
 		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 			flow_dbg(conn, "FIN timeout");
 			tcp_rst(c, conn);
-- 
2.47.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer
  2025-09-28  7:29 ` [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer Yumei Huang
@ 2025-09-29  6:06   ` David Gibson
  0 siblings, 0 replies; 9+ messages in thread
From: David Gibson @ 2025-09-29  6:06 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, sbrivio

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On Sun, Sep 28, 2025 at 03:29:45PM +0800, Yumei Huang wrote:
> Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer to
> "retries", so it can be reused to retry for SYN_TIMEOUT.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

One nit,

[snip]
> diff --git a/tcp_conn.h b/tcp_conn.h
> index 38b5c54..885d54a 100644
> --- a/tcp_conn.h
> +++ b/tcp_conn.h
> @@ -13,7 +13,7 @@
>   * struct tcp_tap_conn - Descriptor for a TCP connection (not spliced)
>   * @f:			Generic flow information
>   * @in_epoll:		Is the connection in the epoll set?
> - * @retrans:		Number of retransmissions occurred due to ACK_TIMEOUT
> + * @retries:		Number of retries occurred due to ACK_TIMEOUT or SYN_TIMEOUT

This comment now goes past column 80.  I'd suggest shortening to
"Number of retries occurred due to timeouts".

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-28  7:29 ` [PATCH 2/2] tcp: Resend SYN for inbound connections Yumei Huang
@ 2025-09-29  6:25   ` David Gibson
  2025-09-29 22:23     ` Stefano Brivio
  2025-09-29 22:23   ` Stefano Brivio
  1 sibling, 1 reply; 9+ messages in thread
From: David Gibson @ 2025-09-29  6:25 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, sbrivio

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

On Sun, Sep 28, 2025 at 03:29:46PM +0800, Yumei Huang wrote:
> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after SYN_TIMEOUT.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>

Simpler than I thought.  Nice.

However, I think now that we're retrying we probably want to adjust
SYN_TIMEOUT.  I suspect the 10s was a generous amount to mitigate the
fact we didn't retry.  However, AFAICT most OSes resend SYNs faster
than that (after 1-3s initially).  They also typically slow down the
resents on subsequent retries.  I'm not sure if that last is important
in our case - since we're talking directly to a guest, we're unlikely
to flood the link this way.

In fact, I haven't read closely enough to be sure, but there was some
language in RFC 6298 and RFC 1122 that suggested to me maybe we should
be using the same backoff calculation for SYN retries as for regular
retransmits.  Which as a bonus might simplify our logic a little bit.

Documentation/networking/ip-sysctl.rst has some information on how
Linux handles this (tcp_syn_retries and tcp_syn_linear_timeouts in
particular).  I guess we could configure ourselves to match the host's
settings - we do something similar to determine what we consider
ephemeral ports.

Stefano, thoughts?

> ---
>  tcp.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 21b75a5..6fe8678 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2378,8 +2378,15 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  		tcp_timer_ctl(c, conn);
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
> -			flow_dbg(conn, "handshake timeout");
> -			tcp_rst(c, conn);
> +			if (conn->retries == TCP_MAX_RETRANS){
> +				flow_dbg(conn, "handshake timeout");
> +				tcp_rst(c, conn);
> +			} else {
> +				flow_dbg(conn, "SYN timeout, retry");
> +				tcp_send_flag(c, conn, SYN);
> +				conn->retries++;
> +				tcp_timer_ctl(c, conn);
> +			}
>  		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  			flow_dbg(conn, "FIN timeout");
>  			tcp_rst(c, conn);
> -- 
> 2.47.0
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-28  7:29 ` [PATCH 2/2] tcp: Resend SYN for inbound connections Yumei Huang
  2025-09-29  6:25   ` David Gibson
@ 2025-09-29 22:23   ` Stefano Brivio
  1 sibling, 0 replies; 9+ messages in thread
From: Stefano Brivio @ 2025-09-29 22:23 UTC (permalink / raw)
  To: Yumei Huang; +Cc: passt-dev, david

On Sun, 28 Sep 2025 15:29:46 +0800
Yumei Huang <yuhuang@redhat.com> wrote:

> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after SYN_TIMEOUT.
> 
> Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> ---
>  tcp.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tcp.c b/tcp.c
> index 21b75a5..6fe8678 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2378,8 +2378,15 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  		tcp_timer_ctl(c, conn);
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
> -			flow_dbg(conn, "handshake timeout");
> -			tcp_rst(c, conn);
> +			if (conn->retries == TCP_MAX_RETRANS){

Nit: missing whitespace between ) and {.

Now that you use conn->retries for this purpose as well, I guess you
should either rename TCP_MAX_RETRANS to TCP_MAX_RETRIES, or use another
constant.

This also depends on the timing behaviour we want to have here, see the
other part of the thread. In any case, you should update the "Theory of
Operation" documentation at the top of tcp.c, see the "Aging and timeout"
section.

> +				flow_dbg(conn, "handshake timeout");
> +				tcp_rst(c, conn);
> +			} else {
> +				flow_dbg(conn, "SYN timeout, retry");
> +				tcp_send_flag(c, conn, SYN);
> +				conn->retries++;

You should reset this to zero once the connection is established,
otherwise we might start the connection with a value of conn->retries
that indicates that we already retransmitted data a number of times,
even if we didn't.

> +				tcp_timer_ctl(c, conn);
> +			}
>  		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  			flow_dbg(conn, "FIN timeout");
>  			tcp_rst(c, conn);

-- 
Stefano


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-29  6:25   ` David Gibson
@ 2025-09-29 22:23     ` Stefano Brivio
  2025-09-30  1:05       ` David Gibson
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Brivio @ 2025-09-29 22:23 UTC (permalink / raw)
  To: David Gibson; +Cc: Yumei Huang, passt-dev

On Mon, 29 Sep 2025 16:25:39 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Sun, Sep 28, 2025 at 03:29:46PM +0800, Yumei Huang wrote:
> > If a client connects while guest is not connected or ready yet,
> > resend SYN instead of just resetting connection after SYN_TIMEOUT.
> > 
> > Signed-off-by: Yumei Huang <yuhuang@redhat.com>  
> 
> Simpler than I thought.  Nice.
> 
> However, I think now that we're retrying we probably want to adjust
> SYN_TIMEOUT.  I suspect the 10s was a generous amount to mitigate the
> fact we didn't retry.

That was the idea, yes.

> However, AFAICT most OSes resend SYNs faster than that (after 1-3s
> initially).

Right. Remember all the examples from Linux with one-second retries
I wanted to show? :)

> They also typically slow down the
> resents on subsequent retries.  I'm not sure if that last is important
> in our case - since we're talking directly to a guest, we're unlikely
> to flood the link this way.

Exponential back-off (or whatever is used by other implementations)
doesn't only serve the purpose of avoiding to flood the link. It's also
about functionality itself.

That is, if you waited one second, and you didn't get any reply, that's
a good indication that you might not get a reply in one second from
now, because the peer might need a little bit longer.

> In fact, I haven't read closely enough to be sure, but there was some
> language in RFC 6298 and RFC 1122 that suggested to me maybe we should
> be using the same backoff calculation for SYN retries as for regular
> retransmits.  Which as a bonus might simplify our logic a little bit.

Somewhat surprisingly, RFC 9293 doesn't say anything about this. :(

And while I'm fairly sure that RFC 2988 was intended to only cover
*data* retransmissions, RFC 6298 (which updates it) seems to simply
assume, in section 5., that it's also about SYN segments:

   (5.7) If the timer expires awaiting the ACK of a SYN segment and the
         TCP implementation is using an RTO less than 3 seconds, the RTO
         MUST be re-initialized to 3 seconds when data transmission
         begins (i.e., after the three-way handshake completes).

so, yes, I tend to agree with this. Let's just use the same logic.

Just note that it's an approximation of RFC 6298, in any case, because
we don't implement RTT measurements.

It's a rather complicated implementation that I originally decided to
skip because there's no actual data transmission between us and
guest/container, so there isn't much that can go wrong. Maybe we could
even assume that the RTT is zero.

As a result of that, we can't implement RFC 2988 / RFC 6298 exactly as
it is. But we can get quite close to it.

> Documentation/networking/ip-sysctl.rst

...in the unlikely case it's not clear: David means a Linux kernel tree
here. Yumei, it might be a good idea to have a kernel tree (maybe
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git) at
hand to check this kind of stuff.

> has some information on how
> Linux handles this (tcp_syn_retries and tcp_syn_linear_timeouts in
> particular).  I guess we could configure ourselves to match the host's
> settings - we do something similar to determine what we consider
> ephemeral ports.
> 
> Stefano, thoughts?

Yes, I think reading those values from the host makes sense because the
same thing that would happen if the host connects to the guest or
container with another implementation (veth, or tap).

It's also the least surprising behaviour for any kind of application
that was previously running outside guest or containers and now it's
moved there.

We have just 3 bits here, so we can only try 9 times (retry 8 times),
but other than that we can use tcp_syn_retries and
tcp_syn_linear_timeouts as they are.

Summing up, I would propose that we do something like this:

1. (optional, and might be in another series, but keeping it together
   with the rest might be more convenient): read tcp_syn_retries, limit
   to 8, and also read tcp_syn_linear_timeouts

2. always use one second as initial RTO (retransmission timeout). That's
   what the kernel does (even though rto_initial should be 3 seconds by
   default...? I'm not sure)

3. for SYN, implement the tcp_syn_linear_timeouts thing. That is,
   in tcp_timer_ctl(), use this timeout:

	if (conn->retries < c->tcp_ctx.syn_linear_timeouts)
		[one second]
	else
		[1 << conn->retries]

4. do the same for data retransmissions, but without the
   tcp_syn_linear_timeouts thing, start from one second (see Appendix A
   in RFC 6298)... and maybe limit it to 60 seconds as allowed by the
   same RFC:

   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
         seconds.

   ? Retransmitting data after 256 seconds doesn't make a lot of sense
   to me.

It shouldn't be much more complicated than the current patch, it just
involves some extra changes in tcp_timer_ctl(), I guess.

-- 
Stefano

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-29 22:23     ` Stefano Brivio
@ 2025-09-30  1:05       ` David Gibson
  2025-09-30  6:04         ` Yumei Huang
  0 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2025-09-30  1:05 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Yumei Huang, passt-dev

[-- Attachment #1: Type: text/plain, Size: 6180 bytes --]

On Tue, Sep 30, 2025 at 12:23:35AM +0200, Stefano Brivio wrote:
> On Mon, 29 Sep 2025 16:25:39 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Sun, Sep 28, 2025 at 03:29:46PM +0800, Yumei Huang wrote:
> > > If a client connects while guest is not connected or ready yet,
> > > resend SYN instead of just resetting connection after SYN_TIMEOUT.
> > > 
> > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>  
> > 
> > Simpler than I thought.  Nice.
> > 
> > However, I think now that we're retrying we probably want to adjust
> > SYN_TIMEOUT.  I suspect the 10s was a generous amount to mitigate the
> > fact we didn't retry.
> 
> That was the idea, yes.
> 
> > However, AFAICT most OSes resend SYNs faster than that (after 1-3s
> > initially).
> 
> Right. Remember all the examples from Linux with one-second retries
> I wanted to show? :)
> 
> > They also typically slow down the
> > resents on subsequent retries.  I'm not sure if that last is important
> > in our case - since we're talking directly to a guest, we're unlikely
> > to flood the link this way.
> 
> Exponential back-off (or whatever is used by other implementations)
> doesn't only serve the purpose of avoiding to flood the link. It's also
> about functionality itself.
> 
> That is, if you waited one second, and you didn't get any reply, that's
> a good indication that you might not get a reply in one second from
> now, because the peer might need a little bit longer.
> 
> > In fact, I haven't read closely enough to be sure, but there was some
> > language in RFC 6298 and RFC 1122 that suggested to me maybe we should
> > be using the same backoff calculation for SYN retries as for regular
> > retransmits.  Which as a bonus might simplify our logic a little bit.
> 
> Somewhat surprisingly, RFC 9293 doesn't say anything about this. :(

Right, it discusses RTO, and never explicitly talks about SYN resends,
kind of implying that SYN resends should use the same RTO calculations
as data retransmits.

> And while I'm fairly sure that RFC 2988 was intended to only cover
> *data* retransmissions, RFC 6298 (which updates it) seems to simply
> assume, in section 5., that it's also about SYN segments:
> 
>    (5.7) If the timer expires awaiting the ACK of a SYN segment and the
>          TCP implementation is using an RTO less than 3 seconds, the RTO
>          MUST be re-initialized to 3 seconds when data transmission
>          begins (i.e., after the three-way handshake completes).
> 
> so, yes, I tend to agree with this. Let's just use the same logic.
> 
> Just note that it's an approximation of RFC 6298, in any case, because
> we don't implement RTT measurements.
> 
> It's a rather complicated implementation that I originally decided to
> skip because there's no actual data transmission between us and
> guest/container, so there isn't much that can go wrong. Maybe we could
> even assume that the RTT is zero.
> 
> As a result of that, we can't implement RFC 2988 / RFC 6298 exactly as
> it is. But we can get quite close to it.
> 
> > Documentation/networking/ip-sysctl.rst
> 
> ...in the unlikely case it's not clear: David means a Linux kernel tree
> here. Yumei, it might be a good idea to have a kernel tree (maybe
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git) at
> hand to check this kind of stuff.
> 
> > has some information on how
> > Linux handles this (tcp_syn_retries and tcp_syn_linear_timeouts in
> > particular).  I guess we could configure ourselves to match the host's
> > settings - we do something similar to determine what we consider
> > ephemeral ports.
> > 
> > Stefano, thoughts?
> 
> Yes, I think reading those values from the host makes sense because the
> same thing that would happen if the host connects to the guest or
> container with another implementation (veth, or tap).

I think taking tcp_syn_retries from the host makes sense.  I'm a bit
less sure about tcp_syn_linear_timeouts, since that requires
implementing the more complex and specific linear backoff behaviour.

> It's also the least surprising behaviour for any kind of application
> that was previously running outside guest or containers and now it's
> moved there.

Well, probably the least surprising we can achieve.  It's not possible
for us to match the peer's SYN retry behaviour if it's not Linux or
has different settings to the host.

> We have just 3 bits here, so we can only try 9 times (retry 8 times),
> but other than that we can use tcp_syn_retries and
> tcp_syn_linear_timeouts as they are.
> 
> Summing up, I would propose that we do something like this:
> 
> 1. (optional, and might be in another series, but keeping it together
>    with the rest might be more convenient): read tcp_syn_retries, limit
>    to 8, and also read tcp_syn_linear_timeouts
> 
> 2. always use one second as initial RTO (retransmission timeout). That's
>    what the kernel does (even though rto_initial should be 3 seconds by
>    default...? I'm not sure)
> 
> 3. for SYN, implement the tcp_syn_linear_timeouts thing. That is,
>    in tcp_timer_ctl(), use this timeout:
> 
> 	if (conn->retries < c->tcp_ctx.syn_linear_timeouts)
> 		[one second]
> 	else
> 		[1 << conn->retries]

1 << conn->retries, or 1 << (conn->retries - syn_linear_timeouts) ?

> 4. do the same for data retransmissions, but without the
>    tcp_syn_linear_timeouts thing, start from one second (see Appendix A
>    in RFC 6298)... and maybe limit it to 60 seconds as allowed by the
>    same RFC:
> 
>    (2.5) A maximum value MAY be placed on RTO provided it is at least 60
>          seconds.
> 
>    ? Retransmitting data after 256 seconds doesn't make a lot of sense
>    to me.
> 
> It shouldn't be much more complicated than the current patch, it just
> involves some extra changes in tcp_timer_ctl(), I guess.
> 
> -- 
> Stefano
> 

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] tcp: Resend SYN for inbound connections
  2025-09-30  1:05       ` David Gibson
@ 2025-09-30  6:04         ` Yumei Huang
  0 siblings, 0 replies; 9+ messages in thread
From: Yumei Huang @ 2025-09-30  6:04 UTC (permalink / raw)
  To: David Gibson, Stefano Brivio; +Cc: passt-dev

Thank you both for the comments!

On Tue, Sep 30, 2025 at 9:06 AM David Gibson
<david@gibson.dropbear.id.au> wrote:
>
> On Tue, Sep 30, 2025 at 12:23:35AM +0200, Stefano Brivio wrote:
> > On Mon, 29 Sep 2025 16:25:39 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > On Sun, Sep 28, 2025 at 03:29:46PM +0800, Yumei Huang wrote:
> > > > If a client connects while guest is not connected or ready yet,
> > > > resend SYN instead of just resetting connection after SYN_TIMEOUT.
> > > >
> > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > >
> > > Simpler than I thought.  Nice.
> > >
> > > However, I think now that we're retrying we probably want to adjust
> > > SYN_TIMEOUT.  I suspect the 10s was a generous amount to mitigate the
> > > fact we didn't retry.
> >
> > That was the idea, yes.
> >
> > > However, AFAICT most OSes resend SYNs faster than that (after 1-3s
> > > initially).
> >
> > Right. Remember all the examples from Linux with one-second retries
> > I wanted to show? :)
> >
> > > They also typically slow down the
> > > resents on subsequent retries.  I'm not sure if that last is important
> > > in our case - since we're talking directly to a guest, we're unlikely
> > > to flood the link this way.
> >
> > Exponential back-off (or whatever is used by other implementations)
> > doesn't only serve the purpose of avoiding to flood the link. It's also
> > about functionality itself.
> >
> > That is, if you waited one second, and you didn't get any reply, that's
> > a good indication that you might not get a reply in one second from
> > now, because the peer might need a little bit longer.
> >
> > > In fact, I haven't read closely enough to be sure, but there was some
> > > language in RFC 6298 and RFC 1122 that suggested to me maybe we should
> > > be using the same backoff calculation for SYN retries as for regular
> > > retransmits.  Which as a bonus might simplify our logic a little bit.
> >
> > Somewhat surprisingly, RFC 9293 doesn't say anything about this. :(
>
> Right, it discusses RTO, and never explicitly talks about SYN resends,
> kind of implying that SYN resends should use the same RTO calculations
> as data retransmits.
>
> > And while I'm fairly sure that RFC 2988 was intended to only cover
> > *data* retransmissions, RFC 6298 (which updates it) seems to simply
> > assume, in section 5., that it's also about SYN segments:
> >
> >    (5.7) If the timer expires awaiting the ACK of a SYN segment and the
> >          TCP implementation is using an RTO less than 3 seconds, the RTO
> >          MUST be re-initialized to 3 seconds when data transmission
> >          begins (i.e., after the three-way handshake completes).
> >
> > so, yes, I tend to agree with this. Let's just use the same logic.
> >
> > Just note that it's an approximation of RFC 6298, in any case, because
> > we don't implement RTT measurements.
> >
> > It's a rather complicated implementation that I originally decided to
> > skip because there's no actual data transmission between us and
> > guest/container, so there isn't much that can go wrong. Maybe we could
> > even assume that the RTT is zero.
> >
> > As a result of that, we can't implement RFC 2988 / RFC 6298 exactly as
> > it is. But we can get quite close to it.
> >
> > > Documentation/networking/ip-sysctl.rst
> >
> > ...in the unlikely case it's not clear: David means a Linux kernel tree
> > here. Yumei, it might be a good idea to have a kernel tree (maybe
> > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git) at
> > hand to check this kind of stuff.

I found this through Google,
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
Having a repo at hand would be great too!

> >
> > > has some information on how
> > > Linux handles this (tcp_syn_retries and tcp_syn_linear_timeouts in
> > > particular).  I guess we could configure ourselves to match the host's
> > > settings - we do something similar to determine what we consider
> > > ephemeral ports.
> > >
> > > Stefano, thoughts?
> >
> > Yes, I think reading those values from the host makes sense because the
> > same thing that would happen if the host connects to the guest or
> > container with another implementation (veth, or tap).
>
> I think taking tcp_syn_retries from the host makes sense.  I'm a bit
> less sure about tcp_syn_linear_timeouts, since that requires
> implementing the more complex and specific linear backoff behaviour.
>
> > It's also the least surprising behaviour for any kind of application
> > that was previously running outside guest or containers and now it's
> > moved there.
>
> Well, probably the least surprising we can achieve.  It's not possible
> for us to match the peer's SYN retry behaviour if it's not Linux or
> has different settings to the host.
>
> > We have just 3 bits here, so we can only try 9 times (retry 8 times),
> > but other than that we can use tcp_syn_retries and
> > tcp_syn_linear_timeouts as they are.
> >
> > Summing up, I would propose that we do something like this:
> >
> > 1. (optional, and might be in another series, but keeping it together
> >    with the rest might be more convenient): read tcp_syn_retries, limit
> >    to 8, and also read tcp_syn_linear_timeouts
> >
> > 2. always use one second as initial RTO (retransmission timeout). That's
> >    what the kernel does (even though rto_initial should be 3 seconds by
> >    default...? I'm not sure)
> >
> > 3. for SYN, implement the tcp_syn_linear_timeouts thing. That is,
> >    in tcp_timer_ctl(), use this timeout:
> >
> >       if (conn->retries < c->tcp_ctx.syn_linear_timeouts)
> >               [one second]
> >       else
> >               [1 << conn->retries]
>
> 1 << conn->retries, or 1 << (conn->retries - syn_linear_timeouts) ?

I think it should be the latter.

>
> > 4. do the same for data retransmissions, but without the
> >    tcp_syn_linear_timeouts thing, start from one second (see Appendix A
> >    in RFC 6298)... and maybe limit it to 60 seconds as allowed by the
> >    same RFC:
> >
> >    (2.5) A maximum value MAY be placed on RTO provided it is at least 60
> >          seconds.
> >
> >    ? Retransmitting data after 256 seconds doesn't make a lot of sense
> >    to me.
> >
> > It shouldn't be much more complicated than the current patch, it just
> > involves some extra changes in tcp_timer_ctl(), I guess.

I'm afraid I won't be able to finish it today. Will work on this after
I come back from holiday.

> >
> > --
> > Stefano
> >
>
> --
> David Gibson (he or they)       | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
>                                 | around.
> http://www.ozlabs.org/~dgibson


-- 
Thanks,

Yumei Huang


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-09-30  6:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-28  7:29 [PATCH 0/2] Retry SYNs for inbound connections Yumei Huang
2025-09-28  7:29 ` [PATCH 1/2] tcp: Rename "retrans" of struct tcp_tap_conn and tcp_tap_transfer Yumei Huang
2025-09-29  6:06   ` David Gibson
2025-09-28  7:29 ` [PATCH 2/2] tcp: Resend SYN for inbound connections Yumei Huang
2025-09-29  6:25   ` David Gibson
2025-09-29 22:23     ` Stefano Brivio
2025-09-30  1:05       ` David Gibson
2025-09-30  6:04         ` Yumei Huang
2025-09-29 22:23   ` Stefano Brivio

Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).