From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ipagpYgp; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id A697B5A061B for ; Tue, 04 Nov 2025 05:42:39 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762231358; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pOJ+zW8DZVhFN6yZAdpBDXhchJbwtyAuRxjomjkCLsQ=; b=ipagpYgpCIfVJed8OY3lNhreOd6NWfhu46vw3a8vaGWvahKMmxl+2OKkwPdPl6ZFNQ0Fdp 7PJj9GPq5KtvvCYAfKgUh0hE7hOuUoy6+MiqdOuSpLISH4a9/uYc3DfDLlarRBm1gA6XCC TLPglgv0uqveu1ajj+/MCN6Rbdzhhts= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-592-hT8fopohPNWV-tcQZkkSwA-1; Mon, 03 Nov 2025 23:42:37 -0500 X-MC-Unique: hT8fopohPNWV-tcQZkkSwA-1 X-Mimecast-MFC-AGG-ID: hT8fopohPNWV-tcQZkkSwA_1762231356 Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-429c5da68e5so2579894f8f.1 for ; Mon, 03 Nov 2025 20:42:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762231356; x=1762836156; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=pOJ+zW8DZVhFN6yZAdpBDXhchJbwtyAuRxjomjkCLsQ=; b=v1X7q3UfZkkDI54NLl0NMdtuTTu+4olSVzdobLhpBKfVnwWNqrOdx/49PuTsqmzpo4 czixytD2zJUirLZlteMQJSVkUag9jyHt1RQZ0xTaMsWoXKn+w+YF39FxMK6HJjpoVOxL vdhsFdEf9yhallURFWdKSdUjkDetawMjKlDe5aTEHW+bAzRqOpyxwqBATBrOZDUMoQIw jUJOFcUilwmNlQt8+RbPLK1boq6z4R+kGdbWe9LQd/HVF6/NFDhmlW4phTwYxU2s6WKC ZfCVbwUAF+3JEFQhLPxEOPXp/0urXTSIUQ9BXeHZnAXUDQTPNbcoEShQW7N7NGC0R5DP uhDQ== X-Gm-Message-State: AOJu0Yw5fMGN9e/TWoUNEohC+KHts3M21Y9AXD71xr/4goexaqI/jdNt imu/Ykd5NsZPBuePzH5K6Qiq10t3z4+87pzTx4DtZ2qLl7yyPIxpM/sCgtJJ1fvJWnCttHQEX8K z79C2gxTcDE9xX+d5IvfXfQcVrs8VWBLR4bffmRkcVTkeNqeCvOuj3w== X-Gm-Gg: ASbGncuRTcWQBBAAv3rRnp769Knlgd5LN+SYRkYVryo+rqhtflk3e4/CNAaI1eWYJmv qvZWg9Yk3GFPLk0DBKfL1dXCSGW/blj6SDGMmcyYy0dy2d7ZodCk8morULjHJTGbDe7K3JeBMrt FZx4yK4TDCm5oGRWcUZQCVAi1kcGTbkgykleoQSdgYugUQMQyp4WY/rzzKUH0XCLe1hWnmVCvTq xsy3M8K8nlOYY/+OtzfbwCLCyfzu5461TWdiM71YgWpwGxUNVDw9+XoECOPQDxnXP9xsGqC7RTM 9SW+HBhyoB61Yzq+rftECRtQWeyhAqGRTBjsqPvd/ko6GZxXq2i42o21kPe+qOFXSrOdZHjfSyI mqDeTZoPqXZdXrLkSkLA14AyRLLM= X-Received: by 2002:a05:6000:430e:b0:429:cacf:1065 with SMTP id ffacd0b85a97d-429dbce08c1mr1587908f8f.17.1762231355797; Mon, 03 Nov 2025 20:42:35 -0800 (PST) X-Google-Smtp-Source: AGHT+IG/7IF9VVJiZ7W6wv1JUdXKlL/mwgRWg7yepp/xBqql3SVoeteztxhrS5rkjjzV3tnQiLw1Sg== X-Received: by 2002:a05:6000:430e:b0:429:cacf:1065 with SMTP id ffacd0b85a97d-429dbce08c1mr1587892f8f.17.1762231355349; Mon, 03 Nov 2025 20:42:35 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-429dc1f9cdbsm2194025f8f.34.2025.11.03.20.42.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Nov 2025 20:42:34 -0800 (PST) Date: Tue, 4 Nov 2025 05:42:33 +0100 From: Stefano Brivio To: Yumei Huang Subject: Re: [PATCH v7 3/5] tcp: Resend SYN for inbound connections Message-ID: <20251104054233.1dec4eb6@elisabeth> In-Reply-To: <20251031054242.7334-4-yuhuang@redhat.com> References: <20251031054242.7334-1-yuhuang@redhat.com> <20251031054242.7334-4-yuhuang@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: yP9Au1LunwhjbFjAvlCrgaUHd3GnJ0ykDBBqQ3a9zno_1762231356 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: ZYFH5L2CN46WMA7J4K2K2C3ILOODEDMH X-Message-ID-Hash: ZYFH5L2CN46WMA7J4K2K2C3ILOODEDMH X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, david@gibson.dropbear.id.au X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 31 Oct 2025 13:42:40 +0800 Yumei Huang wrote: > If a client connects while guest is not connected or ready yet, > resend SYN instead of just resetting connection after 10 seconds. > > Use the same backoff calculation for the timeout as Linux kernel. > > Link: https://bugs.passt.top/show_bug.cgi?id=153 > Signed-off-by: Yumei Huang > --- > tcp.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-------- > tcp.h | 4 ++++ > 2 files changed, 53 insertions(+), 8 deletions(-) > > diff --git a/tcp.c b/tcp.c > index 2ec4b0c..bada88a 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -179,9 +179,11 @@ > * > * Timeouts are implemented by means of timerfd timers, set based on flags: > * > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the > - * connection > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend > + * SYN. It's the starting timeout for the first SYN retry. Retry for > + * TCP_MAX_RETRIES or (tcp_syn_retries + tcp_syn_linear_timeouts) times, > + * reset the connection > * > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the > @@ -340,7 +342,7 @@ enum { > #define WINDOW_DEFAULT 14600 /* RFC 6928 */ > > #define ACK_INTERVAL 10 /* ms */ > -#define SYN_TIMEOUT 10 /* s */ > +#define SYN_TIMEOUT_INIT 1 /* s, RFC 6928 */ > #define ACK_TIMEOUT 2 > #define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > @@ -365,6 +367,12 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX]; > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */ > > +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" > + > +#define TCP_SYN_RETRIES_DEFAULT 6 > +#define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4 > + > /* "Extended" data (not stored in the flow table) for TCP flow migration */ > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX]; > > @@ -581,8 +589,10 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > if (conn->flags & ACK_TO_TAP_DUE) { > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; > } else if (conn->flags & ACK_FROM_TAP_DUE) { > - if (!(conn->events & ESTABLISHED)) > - it.it_value.tv_sec = SYN_TIMEOUT; > + if (!(conn->events & ESTABLISHED)) { > + int exp = conn->retries - c->tcp.syn_linear_timeouts; > + it.it_value.tv_sec = SYN_TIMEOUT_INIT << MAX(exp, 0); > + } > else > it.it_value.tv_sec = ACK_TIMEOUT; > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > tcp_timer_ctl(c, conn); > } else if (conn->flags & ACK_FROM_TAP_DUE) { > if (!(conn->events & ESTABLISHED)) { > - flow_dbg(conn, "handshake timeout"); > - tcp_rst(c, conn); > + if (conn->retries >= TCP_MAX_RETRIES || > + conn->retries >= (c->tcp.tcp_syn_retries + > + c->tcp.syn_linear_timeouts)) { This: int max; max = c->tcp.syn_retries + c->tcp.syn_linear_timeouts; max = MIN(TCP_MAX_RETRIES, max); if (conn->retries >= max) { is one more line but a bit easier to understand. Not a strong preference on my side though. > + flow_dbg(conn, "handshake timeout"); > + tcp_rst(c, conn); > + } else { > + flow_trace(conn, "SYN timeout, retry"); > + tcp_send_flag(c, conn, SYN); > + conn->retries++; > + tcp_timer_ctl(c, conn); > + } > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > flow_dbg(conn, "FIN timeout"); > tcp_rst(c, conn); > @@ -2766,6 +2785,26 @@ static socklen_t tcp_probe_tcp_info(void) > return sl; > } > > +/** > + * tcp_get_rto_params() - Get host kernel RTO parameters > + * @c: Execution context > +*/ > +void tcp_get_rto_params(struct ctx *c) > +{ > + intmax_t tcp_syn_retries, syn_linear_timeouts; > + > + tcp_syn_retries = read_file_integer( > + TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT); > + syn_linear_timeouts = read_file_integer( > + TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT); I think this is a bit hard to read. Now: tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT); syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT); would be a bit closer to the coding style we're adopting, but I wonder: - does read_file_integer() really need to have "integer" in the name? In a language where integers are called 'int', perhaps read_file_int() is clear enough? - the constants are defined here, in tcp.c, so they obviously refer to TCP, so maybe SYN_LINEAR_TIMEOUTS is clear enough - you don't really need to store both values in two different variables, one is enough as you're assigning them right away. And: v = read_file_int(SYN_RETRIES, SYN_RETRIES_DEFAULT); c->tcp.tcp_syn_retries = MIN(v, UINT8_MAX); v = read_file_int(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT); c->tcp.syn_linear_timeouts = MIN(v, UINT8_MAX); is four lines instead of six, and more readable if you ask me. > + > + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); > + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); > + > + debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %"PRIu8, > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); > +} > + > /** > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data > * @c: Execution context > @@ -2776,6 +2815,8 @@ int tcp_init(struct ctx *c) > { > ASSERT(!c->no_tcp); > > + tcp_get_rto_params(c); > + > tcp_sock_iov_init(c); > > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); > diff --git a/tcp.h b/tcp.h > index 234a803..befedde 100644 > --- a/tcp.h > +++ b/tcp.h > @@ -59,12 +59,16 @@ union tcp_listen_epoll_ref { > * @fwd_out: Port forwarding configuration for outbound packets > * @timer_run: Timestamp of most recent timer run > * @pipe_size: Size of pipes for spliced connections > + * @tcp_syn_retries: SYN retries using exponential backoff timeout > + * @syn_linear_timeouts: SYN retries before using exponential backoff timeout > */ > struct tcp_ctx { > struct fwd_ports fwd_in; > struct fwd_ports fwd_out; > struct timespec timer_run; > size_t pipe_size; > + uint8_t tcp_syn_retries; Why 'tcp' again? > + uint8_t syn_linear_timeouts; > }; > > #endif /* TCP_H */ -- Stefano