Date: Fri, 31 Oct 2025 09:38:22 +0100
From: Stefano Brivio
To: Yumei Huang
Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout
Message-ID: <20251031093822.6fa93b7e@elisabeth>
In-Reply-To: <20251031054242.7334-6-yuhuang@redhat.com>
References: <20251031054242.7334-1-yuhuang@redhat.com> <20251031054242.7334-6-yuhuang@redhat.com>
Organization: Red Hat
CC: passt-dev@passt.top, david@gibson.dropbear.id.au
List-Id: Development discussion and patches for passt

On Fri, 31 Oct 2025 13:42:42 +0800
Yumei Huang wrote:

> Clamp the TCP retry timeout as Linux kernel does. If RTO is less
> than 3 seconds, re-initialize it to 3 seconds for data retransmissions
> according to RFC 6298.
>
> Suggested-by: Stefano Brivio
> Signed-off-by: Yumei Huang
> ---
>  tcp.c | 25 ++++++++++++++++++++-----
>  tcp.h |  2 ++
>  2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 96ee56a..84a6700 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -187,6 +187,9 @@
>   * for established connections, or (tcp_syn_retries +
>   * tcp_syn_linear_timeouts) times during the handshake, reset the connection
>   *
> + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO to this for

Nit: we started with BrE (British English) spelling and later tried to
keep it consistent for the sake of grep, so we usually write
"initialise" (purely for consistency, at this point).

> + *   data retransmissions.
> + *
>   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
>   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
>   *   the connection
> @@ -340,6 +343,7 @@ enum {
>  
>  #define ACK_INTERVAL			10		/* ms */
>  #define RTO_INIT			1		/* s, RFC 6298 */
> +#define RTO_INIT_ACK			3		/* s, RFC 6298 */
>  #define FIN_TIMEOUT			60
>  #define ACT_TIMEOUT			7200
>  
> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
>  
>  #define TCP_SYN_RETRIES		"/proc/sys/net/ipv4/tcp_syn_retries"
>  #define TCP_SYN_LINEAR_TIMEOUTS	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> +#define TCP_RTO_MAX_MS		"/proc/sys/net/ipv4/tcp_rto_max_ms"
>  
>  #define TCP_SYN_RETRIES_DEFAULT		6
>  #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT	4
> +#define TCP_RTO_MAX_MS_DEFAULT		120000
>  
>  /* "Extended" data (not stored in the flow table) for TCP flow migration */
>  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>  	if (conn->flags & ACK_TO_TAP_DUE) {
>  		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> -		int exp = conn->retries;
> +		int exp = conn->retries, timeout = RTO_INIT;
>  		if (!(conn->events & ESTABLISHED))
>  			exp -= c->tcp.syn_linear_timeouts;
> -		it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> +		else
> +			timeout = MAX(timeout, RTO_INIT_ACK);
> +		timeout <<= MAX(exp, 0);
> +		it.it_value.tv_sec = MIN(timeout, c->tcp.tcp_rto_max);
>  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  		it.it_value.tv_sec = FIN_TIMEOUT;
>  	} else {
> @@ -2785,18 +2794,24 @@ static socklen_t tcp_probe_tcp_info(void)
>   */
>  void tcp_get_rto_params(struct ctx *c)
>  {
> -	intmax_t tcp_syn_retries, syn_linear_timeouts;
> +	intmax_t tcp_syn_retries, syn_linear_timeouts, tcp_rto_max_ms;
>  
>  	tcp_syn_retries = read_file_integer(
>  		TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT);
>  	syn_linear_timeouts = read_file_integer(
>  		TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT);
> +	tcp_rto_max_ms = read_file_integer(
> +		TCP_RTO_MAX_MS, TCP_RTO_MAX_MS_DEFAULT);
>  
>  	c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
>  	c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> +	c->tcp.tcp_rto_max = MIN(
> +		DIV_ROUND_CLOSEST(tcp_rto_max_ms, 1000), SIZE_MAX);

size_t looks like a rather weird choice for tcp_rto_max: size_t is used
to represent the size of objects, and nothing else (not a timeout
value). The corresponding sysctl, tcp_rto_max_ms, is also an int in
ipv4_net_table[], net/ipv4/sysctl_net_ipv4.c (Linux kernel).

Any reason for picking size_t here? I mean, it will work, it just looks
weird (and wastes a tiny bit of space, even though I guess we don't
care about 4 bytes there).
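
For illustration only, here is a minimal standalone sketch of how the
clamping could look with a plain int for the maximum. It is not the
patch and not passt code: the helper names rto_max_from_ms() and
rto_timeout() are hypothetical, the rounding is written by hand rather
than using DIV_ROUND_CLOSEST(), the sysctl value is assumed to be
non-negative, and the shift-overflow guard is an extra precaution that
the patch doesn't have.

```c
#include <limits.h>
#include <stdint.h>

#define RTO_INIT	1	/* s, RFC 6298 */
#define RTO_INIT_ACK	3	/* s, RFC 6298 */

#define MAX(a, b)	(((a) > (b)) ? (a) : (b))
#define MIN(a, b)	(((a) < (b)) ? (a) : (b))

/* Hypothetical helper: convert a non-negative sysctl value in milliseconds
 * to seconds, rounding to the closest second, and cap it at INT_MAX so that
 * it fits a plain int.
 */
static int rto_max_from_ms(intmax_t ms)
{
	intmax_t s = ms / 1000 + (ms % 1000 >= 500 ? 1 : 0);

	return (int)MIN(s, (intmax_t)INT_MAX);
}

/* Hypothetical helper mirroring the logic of the quoted hunk: pick the base
 * timeout, back off exponentially, clamp to rto_max (all values in seconds).
 */
static int rto_timeout(int retries, int syn_linear_timeouts, int established,
		       int rto_max)
{
	int exp = retries, timeout = RTO_INIT;

	if (!established)
		exp -= syn_linear_timeouts;	/* linear phase of handshake */
	else
		timeout = MAX(timeout, RTO_INIT_ACK);

	exp = MAX(exp, 0);

	/* Assumption, not in the patch: if the shift would overflow an int,
	 * the result exceeds any int maximum anyway, so return the maximum.
	 */
	if (exp >= 31 || timeout > (INT_MAX >> exp))
		return rto_max;

	timeout <<= exp;

	return MIN(timeout, rto_max);
}
```

With the default of 120000 ms, rto_max_from_ms() gives 120. An
established connection with retries at 6 would compute 3 << 6 = 192
seconds, which rto_timeout() caps at 120.
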
> -	debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %"PRIu8,
> -	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +	debug("Read sysctl values tcp_syn_retries: %"PRIu8
> +	      ", linear_timeouts: %"PRIu8", tcp_rto_max: %zu",
> +	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts,
> +	      c->tcp.tcp_rto_max);
>  }
>  
>  /**
> diff --git a/tcp.h b/tcp.h
> index befedde..a238bb7 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -59,6 +59,7 @@ union tcp_listen_epoll_ref {
>   * @fwd_out:		Port forwarding configuration for outbound packets
>   * @timer_run:		Timestamp of most recent timer run
>   * @pipe_size:		Size of pipes for spliced connections
> + * @tcp_rto_max:	Maximal retry timeout (in s)

Nit: "maximal" has a slightly different meaning compared to "maximum".
The highest value allowed for a field would typically be called
"maximum", while "maximal" is more commonly used to indicate a value /
element that's the biggest of all values. Yes, I know, it's complicated.

>   * @tcp_syn_retries:	SYN retries using exponential backoff timeout
>   * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
>   */
> @@ -67,6 +68,7 @@ struct tcp_ctx {
>  	struct fwd_ports fwd_out;
>  	struct timespec timer_run;
>  	size_t pipe_size;
> +	size_t tcp_rto_max;
>  	uint8_t tcp_syn_retries;
>  	uint8_t syn_linear_timeouts;
>  };

The rest of the series looks good to me at a *very* quick glance, but I
can't claim I really reviewed it yet.

-- 
Stefano