From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=L1MgQefA; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 422D65A0619 for ; Mon, 03 Nov 2025 11:55:30 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762167329; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=T8UyRS5/WPBZMe2+tMS7THf6yHUjvhHDW988Ch2xKt4=; b=L1MgQefAFRIsPW2j0b2tDpKh53rERFOPJgvVtBGsOS31sfEw3B5K81DneS7qHSZjyZd4mP dyggk38AK4XvUO0xwvGYZqf01i12YTQUakicPgEptloK0cC6HKJXvbSl7tb6BMMYyuzijA dRSJ9c41CIGZaDsm3sm9ivHmzk01kg8= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-99-zdIHWhy6MaCDgi1Y4NKtHA-1; Mon, 03 Nov 2025 05:55:27 -0500 X-MC-Unique: zdIHWhy6MaCDgi1Y4NKtHA-1 X-Mimecast-MFC-AGG-ID: zdIHWhy6MaCDgi1Y4NKtHA_1762167326 Received: by mail-wm1-f72.google.com with SMTP id 5b1f17b1804b1-475de1afec6so14154255e9.1 for ; Mon, 03 Nov 2025 02:55:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762167326; x=1762772126; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=4RTKY+lotZQJW8OFKDrWfBxLk+jOSv1fqIfcIcqgnW8=; b=udMRbq04IFi9F7yBMTsfO8dyWt0RhOyq1KH5Vh360/E3EkpO7Vm/kUZPeCxkGLHgEF ipUHni0vlkPBREk0KSHkEHFB4u29Lg5ejDzBy/+opXHmIIGWurUcrF34EWOekNcI/y9x p7E1DOsmIIZptYwSG6fK55gODRZMiQDkzSzfo3U/FjpK9VqA6ddfnBWrCJZaSS6kZr7k 85dHclvG73D0L6Eg+0Yszk/zEPXI0/MBUZUlG2YmdIM2xLUim6D/e425fTPM1ICHNj9e g6sJ49Pcxqx/wzzrgNVihrhHe2b/C/kQN5i939A7129jUyS4lQqMpvE+UK3Bl6DR3Ixj A6MQ== X-Gm-Message-State: AOJu0YwJEtm9WpdISnnDEKGNSPFR7SGL4GWpPnGUjQTDmf2aCFu/HETY hNxPKtK2HexFqagiHeAbsO/SkN+YUyJ+lP4kRi18F0cujm4uvl0fPjJi/jHPGVzXE2fneqXtJSD PptHCXaWIdQ82V7DVa2qgIH0Nqx8SWPGJLDMUZEdGK65rkEBxYXAskA== X-Gm-Gg: ASbGnctlf+0cP+MPTsn4gfpzWeeN9BjWAahjd4uNS5Tqe+tL9oAxHcxCzqRGRpPYbtB iuzV4OXKWFYXkUIjvGscfYo+IetncSEOqiiFoQi8J0zQOW67l3Gsdi/njjJQkjeaKzPOtDS//AD 8mvr9aA+vbhpDJnbQUoQqTnaGI+8+FLd5pwwN0YMhsYzqnn0R9uSmrhFYQ6nNwFsnzeJ33xJCpO GPC5szPZ6LGuTiO3I8vgQn2GccaLDSWvAiWFMgEpH2FPHdiKv7aGOiEQJzhnlGOfHVadHWSeR4+ GrDEV9TPMCU/GFZlDJsavUUyeTer71amGBqLiqbCxMppixlxL440PPy61iezL5lc0FeTqoCQ561 Sany7VmQTlQ== X-Received: by 2002:a05:600d:435a:b0:475:d278:1ab8 with SMTP id 5b1f17b1804b1-477262226d8mr93789805e9.2.1762167325937; Mon, 03 Nov 2025 02:55:25 -0800 (PST) X-Google-Smtp-Source: AGHT+IFFqeGH+yMTnC72ib48rhEGmsoqnW1mElbRdPICfwh0G4TRskm7Hr45AkRl8W9dR7sPTD4oyQ== X-Received: by 2002:a05:600d:435a:b0:475:d278:1ab8 with SMTP id 5b1f17b1804b1-477262226d8mr93789525e9.2.1762167325285; Mon, 03 Nov 2025 02:55:25 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4773c3a77b1sm150446375e9.17.2025.11.03.02.55.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Nov 2025 02:55:24 -0800 (PST) Date: Mon, 3 Nov 2025 11:55:23 +0100 From: Stefano Brivio To: Yumei Huang Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout Message-ID: <20251103115523.4a1d3e40@elisabeth> In-Reply-To: References: <20251031054242.7334-1-yuhuang@redhat.com> <20251031054242.7334-6-yuhuang@redhat.com> <20251031093822.6fa93b7e@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: v8dr2l_f1Oh7QmMpFeEMhLZ96iw1nD2lJHjsmXyv8dM_1762167326 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: RP74Z47IVZORI4MZEHBKIWM2IMAHKRMI X-Message-ID-Hash: RP74Z47IVZORI4MZEHBKIWM2IMAHKRMI X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, david@gibson.dropbear.id.au X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, 3 Nov 2025 11:11:10 +0800 Yumei Huang wrote: > On Fri, Oct 31, 2025 at 4:38=E2=80=AFPM Stefano Brivio wrote: > > > > On Fri, 31 Oct 2025 13:42:42 +0800 > > Yumei Huang wrote: > > =20 > > > Clamp the TCP retry timeout as Linux kernel does. If RTO is less > > > than 3 seconds, re-initialize it to 3 seconds for data retransmission= s > > > according to RFC 6298. > > > > > > Suggested-by: Stefano Brivio > > > Signed-off-by: Yumei Huang > > > --- > > > tcp.c | 25 ++++++++++++++++++++----- > > > tcp.h | 2 ++ > > > 2 files changed, 22 insertions(+), 5 deletions(-) > > > > > > diff --git a/tcp.c b/tcp.c > > > index 96ee56a..84a6700 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -187,6 +187,9 @@ > > > * for established connections, or (tcp_syn_retries + > > > * tcp_syn_linear_timeouts) times during the handshake, reset the = connection > > > * > > > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO t= o this for =20 > > > > Nit: we started with BrE (British English) spelling and later tried to > > keep that consistent for the sake of grep, so we'll usually use > > "initialise" (for consistency, at this point). =20 >=20 > Got it. > > =20 > > > + * data retransmissions. > > > + * > > > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_F= ROM_TAP_DUE > > > * with TAP_FIN_SENT event), and no ACK is received within this ti= me, reset > > > * the connection > > > @@ -340,6 +343,7 @@ enum { > > > > > > #define ACK_INTERVAL 10 /* ms */ > > > #define RTO_INIT 1 /* s, RFC 6298 = */ > > > +#define RTO_INIT_ACK 3 /* s, RFC 6298 = */ > > > #define FIN_TIMEOUT 60 > > > #define ACT_TIMEOUT 7200 > > > > > > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MI= GRATE_RCV_QUEUE_MAX]; > > > > > > #define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_ret= ries" > > > #define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_lin= ear_timeouts" > > > +#define TCP_RTO_MAX_MS "/proc/sys/net/ipv4/tcp= _rto_max_ms" > > > > > > #define TCP_SYN_RETRIES_DEFAULT 6 > > > #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4 > > > +#define TCP_RTO_MAX_MS_DEFAULT 120000 > > > > > > /* "Extended" data (not stored in the flow table) for TCP flow migra= tion */ > > > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX]; > > > @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, = struct tcp_tap_conn *conn) > > > if (conn->flags & ACK_TO_TAP_DUE) { > > > it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 100= 0; > > > } else if (conn->flags & ACK_FROM_TAP_DUE) { > > > - int exp =3D conn->retries; > > > + int exp =3D conn->retries, timeout =3D RTO_INIT; > > > if (!(conn->events & ESTABLISHED)) > > > exp -=3D c->tcp.syn_linear_timeouts; > > > - it.it_value.tv_sec =3D RTO_INIT << MAX(exp, 0); > > > + else > > > + timeout =3D MAX(timeout, RTO_INIT_ACK); > > > + timeout <<=3D MAX(exp, 0); > > > + it.it_value.tv_sec =3D MIN(timeout, c->tcp.tcp_rto_max)= ; > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > > it.it_value.tv_sec =3D FIN_TIMEOUT; > > > } else { > > > @@ -2785,18 +2794,24 @@ static socklen_t tcp_probe_tcp_info(void) > > > */ > > > void tcp_get_rto_params(struct ctx *c) > > > { > > > - intmax_t tcp_syn_retries, syn_linear_timeouts; > > > + intmax_t tcp_syn_retries, syn_linear_timeouts, tcp_rto_max_ms; > > > > > > tcp_syn_retries =3D read_file_integer( > > > TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT); > > > syn_linear_timeouts =3D read_file_integer( > > > TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAUL= T); > > > + tcp_rto_max_ms =3D read_file_integer( > > > + TCP_RTO_MAX_MS, TCP_RTO_MAX_MS_DEFAULT); > > > > > > c->tcp.tcp_syn_retries =3D MIN(tcp_syn_retries, UINT8_MAX); > > > c->tcp.syn_linear_timeouts =3D MIN(syn_linear_timeouts, UINT8_M= AX); > > > + c->tcp.tcp_rto_max =3D MIN( > > > + DIV_ROUND_CLOSEST(tcp_rto_max_ms, 1000), SIZE_MAX); =20 > > > > size_t looks like a rather weird choice for tcp_rto_max: size_t is used > > to represent the size of objects, and nothing else (not a timeout in > > milliseconds). > > > > It's also an int in ipv4_net_table[], net/ipv4/sysctl_net_ipv4.c > > (Linux kernel). Any reason for picking size_t here? =20 >=20 > No particular reason. Just thought it can be used for counting as > well. I can change it to int in v8. > > > > I mean, it will work, it just looks weird (and wastes a tiny bit of spa= ce, > > even though I guess we don't care about 4 bytes there). > > =20 > > > - debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_tim= eouts: %"PRIu8, > > > - c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); > > > + debug("Read sysctl values tcp_syn_retries: %"PRIu8 > > > + ", linear_timeouts: %"PRIu8", tcp_rto_max: %zu", > > > + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts, > > > + c->tcp.tcp_rto_max); > > > } > > > > > > /** > > > diff --git a/tcp.h b/tcp.h > > > index befedde..a238bb7 100644 > > > --- a/tcp.h > > > +++ b/tcp.h > > > @@ -59,6 +59,7 @@ union tcp_listen_epoll_ref { > > > * @fwd_out: Port forwarding configuration for outbound pack= ets > > > * @timer_run: Timestamp of most recent timer run > > > * @pipe_size: Size of pipes for spliced connections > > > + * @tcp_rto_max: Maximal retry timeout (in s) =20 > > > > Nit: "maximal" has a slightly different meaning compared to "maximum". > > > > The highest value allowed for a field would typically be called > > "maximum", while "maximal" is more commonly used to indicate a value / > > element that's the biggest of all values. Yes, I know, it's complicated= . =20 >=20 > Yeah, it's complicated. Actually I get the word from networking/ip-sysctl= .rst. >=20 > tcp_rto_max_ms - INTEGER > Maximal TCP retransmission timeout (in ms). Oh, that's from 1280c26228bd ("tcp: add tcp_rto_max_ms sysctl), so probably from the French adjective "maximal" (in French, "maximum" is only a noun). > "maximum" might be more appropriate as you explained. >=20 > > =20 > > > * @tcp_syn_retries: SYN retries using exponential backoff timeout > > > * @syn_linear_timeouts: SYN retries before using exponential backof= f timeout > > > */ > > > @@ -67,6 +68,7 @@ struct tcp_ctx { > > > struct fwd_ports fwd_out; > > > struct timespec timer_run; > > > size_t pipe_size; > > > + size_t tcp_rto_max; > > > uint8_t tcp_syn_retries; > > > uint8_t syn_linear_timeouts; > > > }; =20 > > > > The rest of the series looks good to me at a *very* quick glance, but I > > can't claim I really reviewed it yet. =20 >=20 > Thank you. Please take your time if you'd like to review them further. Yes, I already started, didn't quite finish yet. --=20 Stefano