From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=BhsB0MPw; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 69B265A0619 for ; Thu, 27 Nov 2025 23:33:49 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1764282828; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eZH9kCojgZQS3PZOmoq/+UL868qh8iVdS9Q0wwaHViI=; b=BhsB0MPwWzXJz8/+F0X5bRspfa+k6NL7DSDQYhNDBKIGhVjFgZr/MedKs4qxz/0ZLbAhel BGu5wBc2NtvjQSeOSJCUyAiZkCUTSrM51lSdbHgYVhKAqX6igv3DpEq3NiXuCNjfmX4yzK U2hngmoGsByoGjWUxOIIuufeq0s+jVs= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-489-Fp3VQxmSNvqjZEn7GfVIRQ-1; Thu, 27 Nov 2025 17:33:46 -0500 X-MC-Unique: Fp3VQxmSNvqjZEn7GfVIRQ-1 X-Mimecast-MFC-AGG-ID: Fp3VQxmSNvqjZEn7GfVIRQ_1764282826 Received: by mail-wm1-f69.google.com with SMTP id 5b1f17b1804b1-4776b0ada3dso15247925e9.0 for ; Thu, 27 Nov 2025 14:33:46 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764282825; x=1764887625; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eZH9kCojgZQS3PZOmoq/+UL868qh8iVdS9Q0wwaHViI=; b=tsp/1gz8w6wWsinrw1cVqbMgIQJXaN12zA+Fm67zW4MwC+qZLKSgQF91ni2JMg6U02 UGMvA5tOlNiutuLuGXs4xmCOAp8YMqMkJXreGp7jIB+4gIz92KxBzVguHJi7f0uoi1Ue Yg9T7LEhdfuMgYRwNBhZI0aiAV+QerQUMXfDtCwf7n3ZvOI0JRWyC0uMD47aimKfNke8 0qcKebBBpFHyUaRy1/0pHbHl1SkBqNZt2IL8F45UPC1gc5m8jLI8bgOlD7zcQu4AfVGu zACwry3zvG1siafnHgFieptEwOUI4gC5/ZdPVlYBW2fsG55wB4UtwDgyyCD7U6T28mji JpGA== X-Gm-Message-State: AOJu0YzsentuKKq8lCbhOdior9gVYnYSUOcbcXRWNID4aNpDbL+gQIj8 0iq6DLI7FA29yNQY37qgYoukN2l0Sm+8+NH+hpWvjIhI6Yk67rZzaQ3cMYl96tU46mljL6miIfn toJWz7Sic1LBWasn8gWY34oEpUmqNtJ+nLKWfJB7KBV32NLfUh1zBOlJPlju+Qw== X-Gm-Gg: ASbGncsdGB4inF93MiYOp2p7ybjV07+o6dE4rA3Ax3ukYVUN2lk/EOjz337/5a3e+Ot 2mbiXmzsrL1TZXfTouC5vyF6DuRz6FX0tQTZt6dpgULOLTBfssg5ItkS3Ec6jLs/LCbWoYGtYn9 +Ma/UNWOzblwsG1v4OBMKGOR699IaD71YIpOBSbo6dMFIjo14gNjyxeajzXsOA6DlsMTkIz78+V jzurIAJbMsAVmRrCLvx4BNpfh+q77uadMoZdxp1feM814UbDnBeg2SjtjHps+v34TGdvsfFW4Ha 1xo1XhozJyJxHIk41Hcg5ikAHS76/uPDgRwEJRDBoUXrwynGkdQxIRaGFrnSyAvNCqWOg8Xrr8t kK5wnRC6Gk8hWUdC/YxQK52qRlE7oB2Lt4xqLvQ== X-Received: by 2002:a05:600c:21c7:b0:477:75b4:d2d1 with SMTP id 5b1f17b1804b1-477b9ef50b4mr220501775e9.15.1764282824662; Thu, 27 Nov 2025 14:33:44 -0800 (PST) X-Google-Smtp-Source: AGHT+IF13DLbQgCKA6kwCe+HLl9QXzgJX0xhCQC/Ly3KtNDJwqJ2weg4PP9KIDpXRSso6v/1HDLQRQ== X-Received: by 2002:a05:600c:21c7:b0:477:75b4:d2d1 with SMTP id 5b1f17b1804b1-477b9ef50b4mr220501605e9.15.1764282824145; Thu, 27 Nov 2025 14:33:44 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4790b0cc39csm122931655e9.14.2025.11.27.14.33.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Nov 2025 14:33:43 -0800 (PST) Date: Thu, 27 Nov 2025 23:33:42 +0100 From: Stefano Brivio To: Yumei Huang Subject: Re: [PATCH v9 5/5] tcp: Clamp the retry timeout Message-ID: <20251127233342.36986b16@elisabeth> In-Reply-To: <20251125072638.88896-6-yuhuang@redhat.com> References: <20251125072638.88896-1-yuhuang@redhat.com> <20251125072638.88896-6-yuhuang@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: XYmuZ5dJKQ14K76Xd5f9-yD0ooEVznCjQAxisdspaIY_1764282826 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: WP2HYPRFWN2FJQERXSL7FWQSI3QH4JWD X-Message-ID-Hash: WP2HYPRFWN2FJQERXSL7FWQSI3QH4JWD X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, david@gibson.dropbear.id.au X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 25 Nov 2025 15:26:38 +0800 Yumei Huang wrote: > Clamp the TCP retry timeout as Linux kernel does. If a retry occurs > during the handshake and the RTO is below 3 seconds, re-initialise > it to 3 seconds for data retransmissions according to RFC 6298. > > Suggested-by: Stefano Brivio > Signed-off-by: Yumei Huang > --- > tcp.c | 25 ++++++++++++++++++++----- > tcp.h | 2 ++ > tcp_conn.h | 1 + > 3 files changed, 23 insertions(+), 5 deletions(-) > > diff --git a/tcp.c b/tcp.c > index b3aa064..887b32f 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -187,6 +187,9 @@ > * for established connections, or (syn_retries + syn_linear_timeouts) times > * during the handshake, reset the connection > * > + * - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and > + * RTO is less than this, re-initialise RTO to this for data retransmissions > + * > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE > * with TAP_FIN_SENT event), and no ACK is received within this time, reset > * the connection > @@ -340,6 +343,7 @@ enum { > > #define ACK_INTERVAL 10 /* ms */ > #define RTO_INIT 1 /* s, RFC 6298 */ > +#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */ > #define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX]; > > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms" > > #define SYN_RETRIES_DEFAULT 6 > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4 > +#define RTO_MAX_MS_DEFAULT 120000 Nit: I think it would make more sense to have this in seconds, because you are rounding it anyway. If somebody changes this to 120001, you'll use 120000. At that point you can just say 120. More importantly, perhaps, if somebody changes this to 1200, you would be using 2000. I think this could be RTO_MAX_DEFAULT (with a comment saying it's in seconds) and later you can just multiply it by 1000 when you use it. The rest of the series looks good to me now. > #define MAX_SYNCNT 127 /* derived from kernel's limit */ > > /* "Extended" data (not stored in the flow table) for TCP flow migration */ > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((__unused__)) = { > > static const char *tcp_flag_str[] __attribute((__unused__)) = { > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE", > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED", > }; > > /* Listening sockets, used for automatic port forwarding in pasta mode only */ > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > if (conn->flags & ACK_TO_TAP_DUE) { > it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; > } else if (conn->flags & ACK_FROM_TAP_DUE) { > - int exp = conn->retries; > + int exp = conn->retries, timeout = RTO_INIT; > if (!(conn->events & ESTABLISHED)) > exp -= c->tcp.syn_linear_timeouts; > - it.it_value.tv_sec = RTO_INIT << MAX(exp, 0); > + else if (conn->flags & SYN_RETRIED) > + timeout = MAX(timeout, RTO_INIT_AFTER_SYN_RETRIES); > + timeout <<= MAX(exp, 0); > + it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max); > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > it.it_value.tv_sec = FIN_TIMEOUT; > } else { > @@ -2441,6 +2450,7 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > flow_trace(conn, "SYN timeout, retry"); > tcp_send_flag(c, conn, SYN); > conn->retries++; > + conn_flag(c, conn, SYN_RETRIED); > tcp_timer_ctl(c, conn); > } > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > @@ -2812,10 +2822,15 @@ static void tcp_get_rto_params(struct ctx *c) > v = read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT); > c->tcp.syn_linear_timeouts = MIN(v, MAX_SYNCNT); > > + v = read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT); > + c->tcp.rto_max = MIN(DIV_ROUND_UP(v, 1000), INT_MAX); > + > debug("Using TCP RTO parameters, syn_retries: %"PRIu8 > - ", syn_linear_timeouts: %"PRIu8, > + ", syn_linear_timeouts: %"PRIu8 > + ", rto_max: %d", > c->tcp.syn_retries, > - c->tcp.syn_linear_timeouts); > + c->tcp.syn_linear_timeouts, > + c->tcp.rto_max); > } > > /** > diff --git a/tcp.h b/tcp.h > index 37d7758..6fb6f92 100644 > --- a/tcp.h > +++ b/tcp.h > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref { > * @fwd_out: Port forwarding configuration for outbound packets > * @timer_run: Timestamp of most recent timer run > * @pipe_size: Size of pipes for spliced connections > + * @rto_max: Maximum retry timeout (in s) > * @syn_retries: SYN retries using exponential backoff timeout > * @syn_linear_timeouts: SYN retries before using exponential backoff timeout > */ > @@ -68,6 +69,7 @@ struct tcp_ctx { > struct fwd_ports fwd_out; > struct timespec timer_run; > size_t pipe_size; > + int rto_max; > uint8_t syn_retries; > uint8_t syn_linear_timeouts; > }; > diff --git a/tcp_conn.h b/tcp_conn.h > index 923af36..e36910c 100644 > --- a/tcp_conn.h > +++ b/tcp_conn.h > @@ -77,6 +77,7 @@ struct tcp_tap_conn { > #define ACK_TO_TAP_DUE BIT(3) > #define ACK_FROM_TAP_DUE BIT(4) > #define ACK_FROM_TAP_BLOCKS BIT(5) > +#define SYN_RETRIED BIT(6) > > #define SNDBUF_BITS 24 > unsigned int sndbuf :SNDBUF_BITS; -- Stefano