From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=U/d05dkq; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 3E64B5A0BC2 for ; Fri, 14 Nov 2025 01:01:28 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763078487; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dcHRlrlsJbl6ZI0h7jf3XTZUHNeXrtL8F33nv7AoBFQ=; b=U/d05dkq9L1Ov6GC0QNExg++YOm20+79CcgWxi9qOsISCBxrhpehmZmkbOwr3Pl2ildyrs J5GSJbcHIaMkMqLe+0Byx1FMOmFWot/73MKt6ZITuuKxT0Y03EHMdsaLBeP6lhxtY+0w32 jTmK8PvLR7q9sahsgoFi92CaclKz97U= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-635-kNQWuXS3PRygpqZm4GOMbg-1; Thu, 13 Nov 2025 19:01:25 -0500 X-MC-Unique: kNQWuXS3PRygpqZm4GOMbg-1 X-Mimecast-MFC-AGG-ID: kNQWuXS3PRygpqZm4GOMbg_1763078484 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-4777b59119dso15248535e9.2 for ; Thu, 13 Nov 2025 16:01:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763078484; x=1763683284; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dcHRlrlsJbl6ZI0h7jf3XTZUHNeXrtL8F33nv7AoBFQ=; b=RunlogQGhr53bLXsVduAZVL5yuGez4pLKcsCjVXez9GWiWgCmfgwiu9cC4zK2fp9+4 stol/lifv5XpFflrijzwc0J2IsbWsy3YFIDVnUUUuPX0vZW7QgM+6G+ccKHZ3/8hlrs9 ycSNa4/k0aFZ02Vrls67n5FIsrRNqUK8A1Rm3kxzWOvWaUKdmEuUDcqYOucP7okEsh3R 2fKV6pMhVWO8Pat7S8X6xlW+272KHlcpRI6n4xnJyhOZpA74kpZofBa+PlCCXoxVVxfv htQ4vDl1P7vW1fP6Cg+tXn/MG4QeWU7buVNN7VYXWHEmQDVd0uI5CX9PXCpF+hVQp/9e GlvA== X-Forwarded-Encrypted: i=1; AJvYcCUUTsLo41d9rnNfVv2QFdCRIgBwDMPVYTaKEJzbTC0KFdsk0AiWpfemjuqRpjjz6lZ8UrSz3rGQy48=@passt.top X-Gm-Message-State: AOJu0YyKLLdW3Jleeefwd/2iJ2NYstMe5oZj6FbxenWtS/a/mqqiZ81h Igd0+jD0N9Opo48VP+eEZpKAt4sDvurRz+QALsVSHHkTZcqgQoF4a7DBI6028EbG3P9WjpQ2xm/ xg4xCHUinw+YGVKWSdc8e0CUwxMoB4/sFaGsbCpLQ7Ds0jbtyxYt9iQ== X-Gm-Gg: ASbGncs1Qpdz1DVlKdaFDpXW6k0pRHYnKdAgKwNkOTSRti7N1IRL5iIrtLxpkTuehiM 95QlTdI9+suUBrcuHlxY/mKgnRmubU4badHXUs7QMHq0a7KEpMhrUnVFk3IRDV55nmNg3qEIonh ijjJE5KmqcfLphTfsUX2IXKAsFCrrGL6zNfxtGREpvTcZjsKo/kE3G/iedr9VRCs9FV7xKSmOy2 3qZgUwg9cLkCCQ4fnTx1/T5SqICCpVeWnE+6WKW17hDcHRfzgxw2IZKDW1HXKHZ0v3SK9YjbIXz qdUnlKYi4mE7Rs0i5zU6euI3DVG117ubDKJXZMxO1VlEsDr+6HzLYjKnv7qZRWDft99z1OLdXa5 piq4+bSPtupfpXXN7oEPmbjSyopujVlV5npKf+Q== X-Received: by 2002:a05:600c:4fd2:b0:477:76c2:49c9 with SMTP id 5b1f17b1804b1-4778fe4fa18mr12201355e9.2.1763078484157; Thu, 13 Nov 2025 16:01:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IE1y6OJ08B2DbooaQuFAVKRbFplsS73swLoSS+lU/Ij+0El6y/RfJYKvUByuZh1Q84zjGR9OA== X-Received: by 2002:a05:600c:4fd2:b0:477:76c2:49c9 with SMTP id 5b1f17b1804b1-4778fe4fa18mr12201135e9.2.1763078483555; Thu, 13 Nov 2025 16:01:23 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4778bd11165sm30812365e9.6.2025.11.13.16.01.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Nov 2025 16:01:22 -0800 (PST) Date: Fri, 14 Nov 2025 01:01:21 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH v8 6/6] tcp: Clamp the retry timeout Message-ID: <20251114010121.10dfb18a@elisabeth> In-Reply-To: References: <20251110093137.87705-7-yuhuang@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: a4_GksMziaOlRPpplI_PMwUxEIS489TwNfrR1ZT46Pc_1763078484 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: USX5YZMSIAHE5P7UMY46KYAHE2OYWEXN X-Message-ID-Hash: USX5YZMSIAHE5P7UMY46KYAHE2OYWEXN X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Yumei Huang , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, 10 Nov 2025 21:56:39 +1100 David Gibson wrote: > On Mon, Nov 10, 2025 at 05:31:37PM +0800, Yumei Huang wrote: > > Clamp the TCP retry timeout as Linux kernel does. If a retry occurs > > during the handshake and the RTO is below 3 seconds, re-initialise > > it to 3 seconds for data retransmissions according to RFC 6298. > > > > Suggested-by: Stefano Brivio > > Signed-off-by: Yumei Huang > > Looks correct, so > > Reviewed-by: David Gibson > > A few possible improvements (mostly comments/names) below. > > > --- > > tcp.c | 25 ++++++++++++++++++++----- > > tcp.h | 2 ++ > > tcp_conn.h | 1 + > > 3 files changed, 23 insertions(+), 5 deletions(-) > > > > diff --git a/tcp.c b/tcp.c > > index ee111e0..b015658 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -187,6 +187,9 @@ > > * for established connections, or (syn_retries + syn_linear_timeouts) times > > * during the handshake, reset the connection > > * > > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialise RTO to this for > > + * data retransmissions > > + * > > Now that this is conditional on SYN being retried, the description > doesn't look entirely accurate any more. If I read the whole thing again I would actually say it becomes misleading and worth fixing before applying this. Also because the relationship to the text in the RFC is not obvious anymore. This should be "if SYN retries happened during handshake, ...". > > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE > > * with TAP_FIN_SENT event), and no ACK is received within this time, reset > > * the connection > > @@ -340,6 +343,7 @@ enum { > > > > #define ACK_INTERVAL 10 /* ms */ > > #define RTO_INIT 1 /* s, RFC 6298 */ > > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */ > > The relevance of "_ACK" in the name is not obvious to me. What about RTO_INIT_AFTER_SYN_RETRIES? We use it only once, and it looks fine there. > > > #define FIN_TIMEOUT 60 > > #define ACT_TIMEOUT 7200 > > > -> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX]; > > > > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" > > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms" > > > > #define SYN_RETRIES_DEFAULT 6 > > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4 > > +#define RTO_MAX_MS_DEFAULT 120000 > > #define MAX_SYNCNT 127 /* derived from kernel's limit */ > > > > /* "Extended" data (not stored in the flow table) for TCP flow migration */ > > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((__unused__)) = { > > > > static const char *tcp_flag_str[] __attribute((__unused__)) = { > > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE", > > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", > > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED", > > }; > > > > /* Listening sockets, used for automatic port forwarding in pasta mode only */ > > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > > if (conn->flags & ACK_TO_TAP_DUE) { > > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; > > } else if (conn->flags & ACK_FROM_TAP_DUE) { > > - int exp = conn->retries; > > + int exp = conn->retries, timeout = RTO_INIT; > > if (!(conn->events & ESTABLISHED)) > > exp -= c->tcp.syn_linear_timeouts; > > - it.it_value.tv_sec = RTO_INIT << MAX(exp, 0); > > + else if (conn->flags & SYN_RETRIED) > > + timeout = MAX(timeout, RTO_INIT_ACK); > > + timeout <<= MAX(exp, 0); > > + it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max); > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > it.it_value.tv_sec = FIN_TIMEOUT; > > } else { > > @@ -2440,6 +2449,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > > flow_trace(conn, "SYN timeout, retry"); > > tcp_send_flag(c, conn, SYN); > > conn->retries++; > > + conn_flag(c, conn, SYN_RETRIED); > > tcp_timer_ctl(c, conn); > > } > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > @@ -2811,10 +2821,15 @@ void tcp_get_rto_params(struct ctx *c) > > v = read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT); > > c->tcp.syn_linear_timeouts = MIN(v, MAX_SYNCNT); > > > > + v = read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT); > > + c->tcp.rto_max = MIN(DIV_ROUND_CLOSEST(v, 1000), INT_MAX); > > Possibly we should verify this is => RTO_INIT. As a sanity check, maybe, but I don't see any harmful effect if it's < RTO_INIT, right? So I'm not sure if we should. No preference from my side really. > > + > > debug("Read sysctl values syn_retries: %"PRIu8 > > - ", syn_linear_timeouts: %"PRIu8, > > + ", syn_linear_timeouts: %"PRIu8 > > + ", rto_max: %d", > > c->tcp.syn_retries, > > - c->tcp.syn_linear_timeouts); > > + c->tcp.syn_linear_timeouts, > > + c->tcp.rto_max); > > } > > > > /** > > diff --git a/tcp.h b/tcp.h > > index 37d7758..c4945c3 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref { > > * @fwd_out: Port forwarding configuration for outbound packets > > * @timer_run: Timestamp of most recent timer run > > * @pipe_size: Size of pipes for spliced connections > > + * @rto_max: Maximal retry timeout (in s) As pointed out earlier, I think "maximum" is slightly more appropriate here. > > * @syn_retries: SYN retries using exponential backoff timeout > > * @syn_linear_timeouts: SYN retries before using exponential backoff timeout > > */ > > @@ -68,6 +69,7 @@ struct tcp_ctx { > > struct fwd_ports fwd_out; > > struct timespec timer_run; > > size_t pipe_size; > > + int rto_max; > > uint8_t syn_retries; > > uint8_t syn_linear_timeouts; > > }; > > diff --git a/tcp_conn.h b/tcp_conn.h > > index 923af36..e36910c 100644 > > --- a/tcp_conn.h > > +++ b/tcp_conn.h > > @@ -77,6 +77,7 @@ struct tcp_tap_conn { > > #define ACK_TO_TAP_DUE BIT(3) > > #define ACK_FROM_TAP_DUE BIT(4) > > #define ACK_FROM_TAP_BLOCKS BIT(5) > > +#define SYN_RETRIED BIT(6) > > > > #define SNDBUF_BITS 24 > > unsigned int sndbuf :SNDBUF_BITS; > > -- > > 2.51.0 -- Stefano