From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=PRsnQ4wG; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 3501D5A061C for ; Fri, 24 Oct 2025 01:04:43 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1761260682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MiR/QH4c4X14Zj2Y75HbmUtwBEfCGqb7WYxR0UVGrs4=; b=PRsnQ4wGPHZz9YUcTBTB9cW/8M7wyB2RXngd33wDmZjAqtOPrvvEWjCC4LH6VVQRAJndXJ +z8pOL6aG4xzWE4mzLILG8QiBMuJGwzwC4+21YVf/FhXENAFRcdaCs7bp7RxnFMxE+/UZY 32b6Pmz36OdGnMetHosQZJEBWiR5tkE= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-377-9Fso2Iz6M5ShSXCVILXldQ-1; Thu, 23 Oct 2025 19:04:40 -0400 X-MC-Unique: 9Fso2Iz6M5ShSXCVILXldQ-1 X-Mimecast-MFC-AGG-ID: 9Fso2Iz6M5ShSXCVILXldQ_1761260675 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-427013eb71dso1344833f8f.2 for ; Thu, 23 Oct 2025 16:04:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761260675; x=1761865475; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=MiR/QH4c4X14Zj2Y75HbmUtwBEfCGqb7WYxR0UVGrs4=; b=XUZzLUNVfY12kNHlTpHtyrHMiCTVJzxUCFmvk9eoZJNpIEfDwJMm2u5qAWpb7+FdSs 5vmbHfpTriqv+xq8eBUEvcRDn8u9T/I63/6ABSCqPcmIyv2Rv+U0kKLWGzlzLjOgvNfx +e5CSK39aoKwmxRG4wql22H+mJYnfr0ucsLeaXn9YA3Irmi4A8yCHyXtwtJKXpOIOL6j 2bTl/bBKndLynCkB0fp/GqpgJz18ayO/0BaA5iq7qjLqiSQ7QAT8uhGUokvDQT8Y7yR2 71UJguFJeZj9JLe7mrKEQwoKP7ffJRYdq/ZB5VkgaFuTK2HxKyNYNFdqYjBobKrNV6Gz Tgfg== X-Gm-Message-State: AOJu0YxDHRVTWH4PxeqP4Vf3tYZVkFWWec65Se5j27BYeGwif1xXVG1P nRGECxCUf7hS+3LKiuRLwfyNqy5Y41pJNCDbNiC01fGf0tiAMNdFjEz0lCddB+mlfonU63P6vOq ACNpHXGH4Z/YVfQrQtmI1ciuLmC8vTDQ8fHqidDGV3+duVE65Uxpx3jsNcJEVew== X-Gm-Gg: ASbGncsTdnlvChZboujzIZEhTDIYxJKTzOnqkOnB+oF8nF9yOzTWZJD5lPp55VCtZkE uhkoTbJCJWqIude7td47fUnxWeyVaZFNrZXeQimNGLoNPFhPjeRMO5Eo4Hxff66AvCssnR3QU57 SujyvuYZD0/qax4CGx1A5cA76acaB844cqYqoZGTszbwcwtfuXfVZKp5EiMPqB+ag/c1nMwzp3g zsM5XLi8cn/abaVSJnu/c52QSYsjjnMDKYkRjxqsVgqunx5EtyksxG7H2/zWdJH+286NvrlrQS8 clEPxBqTYPTQFuToOJTlzAxACs0hNikNSEFzYSs/2cmdw3Z5AhgTUrSMW1akMZpVnJ0rgwcHOIc JT6M1CFTSqGWiDrIuf56TjUXFDlg= X-Received: by 2002:a05:6000:4285:b0:426:ee08:8ea9 with SMTP id ffacd0b85a97d-42704d9b240mr18894835f8f.44.1761260674681; Thu, 23 Oct 2025 16:04:34 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEO7fmG9QXOrjYWfM41asD5v0hgwn6mcIKSCox0XZHntIX7I1M32kT9SdpiXpl3MKUbFHzqiA== X-Received: by 2002:a05:6000:4285:b0:426:ee08:8ea9 with SMTP id ffacd0b85a97d-42704d9b240mr18894818f8f.44.1761260674160; Thu, 23 Oct 2025 16:04:34 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-429897f57efsm6141772f8f.18.2025.10.23.16.04.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Oct 2025 16:04:33 -0700 (PDT) Date: Fri, 24 Oct 2025 01:04:31 +0200 From: Stefano Brivio To: Yumei Huang Subject: Re: [PATCH v6 3/4] tcp: Resend SYN for inbound connections Message-ID: <20251024010431.4329a843@elisabeth> In-Reply-To: <20251017062838.21041-4-yuhuang@redhat.com> References: <20251017062838.21041-1-yuhuang@redhat.com> <20251017062838.21041-4-yuhuang@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: nRbgywG_Kxi0anY3f3GzxIgzTPtD3lxNeIm8qwG11WE_1761260675 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: W57NHIX6WU4FZHTTFQP656SRIPQEASCN X-Message-ID-Hash: W57NHIX6WU4FZHTTFQP656SRIPQEASCN X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, david@gibson.dropbear.id.au X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 17 Oct 2025 14:28:37 +0800 Yumei Huang wrote: > If a client connects while guest is not connected or ready yet, > resend SYN instead of just resetting connection after 10 seconds. > > Use the same backoff calculation for the timeout as linux kernel. Linux. > > Link: https://bugs.passt.top/show_bug.cgi?id=153 > Signed-off-by: Yumei Huang > --- > tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- > tcp.h | 5 +++++ > 2 files changed, 52 insertions(+), 8 deletions(-) > > diff --git a/tcp.c b/tcp.c > index 2ec4b0c..9385132 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -179,9 +179,11 @@ > * > * Timeouts are implemented by means of timerfd timers, set based on flags: > * > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the > - * connection > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake > + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend > + * SYN. It's the starting timeout for the first SYN retry. If this persists "If this persists" makes sense for the existing ACK_TIMEOUT description but not here, because it looks like it refers to "starting timeout". Coupled with the next patch, it becomes increasingly difficult to understand what "this" persisting thing is. Maybe directly say "Retry for ..., then reset the connection"? It's shorter and clearer. > + * for more than TCP_MAX_RETRIES or (tcp_syn_retries + > + * tcp_syn_linear_timeouts) times in a row, reset the connection > * > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the > @@ -340,7 +342,7 @@ enum { > #define WINDOW_DEFAULT 14600 /* RFC 6928 */ > > #define ACK_INTERVAL 10 /* ms */ > -#define SYN_TIMEOUT 10 /* s */ > +#define SYN_TIMEOUT_INIT 1 /* s */ Maybe mention RFC 6928 as done above? That's where this value comes from. I just noticed you do that in 4/4, so it's slightly nicer if you do that right away here for ease of future reference, but not really needed. > #define ACK_TIMEOUT 2 > #define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX]; > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */ > > +#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \ This uses 121 columns. I'm not sure where all those tabs and \ come from. > + > /* "Extended" data (not stored in the flow table) for TCP flow migration */ > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX]; > > @@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > if (conn->flags & ACK_TO_TAP_DUE) { > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; > } else if (conn->flags & ACK_FROM_TAP_DUE) { > - if (!(conn->events & ESTABLISHED)) > - it.it_value.tv_sec = SYN_TIMEOUT; > + if (!(conn->events & ESTABLISHED)) { > + if (conn->retries < c->tcp.syn_linear_timeouts) > + it.it_value.tv_sec = SYN_TIMEOUT_INIT; > + else > + it.it_value.tv_sec = SYN_TIMEOUT_INIT << > + (conn->retries - c->tcp.syn_linear_timeouts); Probably more readable, but I haven't tried: always start from SYN_TIMEOUT_INIT, then multiply/shift if conn->retries >= c->tcp.syn_linear_timeouts. > + } > else > it.it_value.tv_sec = ACK_TIMEOUT; > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > tcp_timer_ctl(c, conn); > } else if (conn->flags & ACK_FROM_TAP_DUE) { > if (!(conn->events & ESTABLISHED)) { > - flow_dbg(conn, "handshake timeout"); > - tcp_rst(c, conn); > + if (conn->retries >= TCP_MAX_RETRIES || > + conn->retries >= (c->tcp.tcp_syn_retries + > + c->tcp.syn_linear_timeouts)) { > + flow_dbg(conn, "handshake timeout"); > + tcp_rst(c, conn); > + } else { > + flow_trace(conn, "SYN timeout, retry"); > + tcp_send_flag(c, conn, SYN); > + conn->retries++; I think I already raised this point on a previous revision: this needs to be zeroed as the connection is established, but I don't see that in the current version. > + tcp_timer_ctl(c, conn); > + } > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > flow_dbg(conn, "FIN timeout"); > tcp_rst(c, conn); > @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void) > return sl; > } > > +/** > + * tcp_syn_params_init() - Get initial SYN parameters for inbound connection They're not initial, they'll be used for all the connections if I understand correctly. Maybe "Get SYN retries sysctl values"? I think the _init() in the function name is also somewhat misleading. > + * @c: Execution context > +*/ > +void tcp_syn_params_init(struct ctx *c) > +{ > + intmax_t tcp_syn_retries, syn_linear_timeouts; > + > + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8); Why 8? Perhaps a #define would help? > + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1); > + > + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); > + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); > + > + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8, Similar to the comment above: these are not parameters of SYN segments (which would seem to imply TCP options, such as the MSS). We typically don't print C assignments, rather human-readable messages, so that could be "Read sysctl values tcp_syn_retries: ..., syn_linear_timeouts: ...". > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); > +} > + > /** > * tcp_init() - Get initial sequence, hash secret, initialise per-socket data > * @c: Execution context > @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c) > { > ASSERT(!c->no_tcp); > > + tcp_syn_params_init(c); > + > tcp_sock_iov_init(c); > > memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); > diff --git a/tcp.h b/tcp.h > index 234a803..4369b52 100644 > --- a/tcp.h > +++ b/tcp.h > @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref { > * @fwd_out: Port forwarding configuration for outbound packets > * @timer_run: Timestamp of most recent timer run > * @pipe_size: Size of pipes for spliced connections > + * @tcp_syn_retries: Number of SYN retries during handshake > + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout > + * before switching to exponential backoff timeout Maybe more compact: * @syn_linear_timeouts: SYN retries before using exponential timeout > */ > struct tcp_ctx { > struct fwd_ports fwd_in; > struct fwd_ports fwd_out; > struct timespec timer_run; > size_t pipe_size; > + uint8_t tcp_syn_retries; > + uint8_t syn_linear_timeouts; > }; > > #endif /* TCP_H */ -- Stefano