From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20251031054242.7334-1-yuhuang@redhat.com> <20251031054242.7334-6-yuhuang@redhat.com>
From: Yumei Huang
Date: Mon, 3 Nov 2025 12:06:59 +0800
Subject: Re: [PATCH v7 5/5] tcp: Clamp the retry timeout
To: David Gibson
CC: passt-dev@passt.top, sbrivio@redhat.com
Content-Type: text/plain; charset="UTF-8"
List-Id: Development discussion and patches for passt

On Mon, Nov 3, 2025 at 9:38 AM David Gibson wrote:
>
> On Fri, Oct 31, 2025 at 01:42:42PM +0800, Yumei Huang wrote:
> > Clamp the TCP retry timeout as Linux kernel does. If RTO is less
> > than 3 seconds, re-initialize it to 3 seconds for data retransmissions
> > according to RFC 6298.
> >
> > Suggested-by: Stefano Brivio
> > Signed-off-by: Yumei Huang
> > ---
> >  tcp.c | 25 ++++++++++++++++++++-----
> >  tcp.h | 2 ++
> >  2 files changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index 96ee56a..84a6700 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -187,6 +187,9 @@
> >   *   for established connections, or (tcp_syn_retries +
> >   *   tcp_syn_linear_timeouts) times during the handshake, reset the connection
> >   *
> > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialize RTO to this for
> > + *   data retransmissions.
> > + *
> >   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> >   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
> >   *   the connection
> > @@ -340,6 +343,7 @@ enum {
> >
> >  #define ACK_INTERVAL 10 /* ms */
> >  #define RTO_INIT 1 /* s, RFC 6298 */
> > +#define RTO_INIT_ACK 3 /* s, RFC 6298 */
> >  #define FIN_TIMEOUT 60
> >  #define ACT_TIMEOUT 7200
> >
> > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
> >
> >  #define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
> >  #define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts"
> > +#define TCP_RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms"
> >
> >  #define TCP_SYN_RETRIES_DEFAULT 6
> >  #define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4
> > +#define TCP_RTO_MAX_MS_DEFAULT 120000
> >
> >  /* "Extended" data (not stored in the flow table) for TCP flow migration */
> >  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
> > @@ -585,10 +591,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
> >          if (conn->flags & ACK_TO_TAP_DUE) {
> >                  it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
> >          } else if (conn->flags & ACK_FROM_TAP_DUE) {
> > -                int exp = conn->retries;
> > +                int exp = conn->retries, timeout = RTO_INIT;
> >                  if (!(conn->events & ESTABLISHED))
> >                          exp -= c->tcp.syn_linear_timeouts;
> > -                it.it_value.tv_sec = RTO_INIT << MAX(exp, 0);
> > +                else
> > +                        timeout = MAX(timeout, RTO_INIT_ACK);
>
> Possibly I missed something, since I only skimmed your discussion of
> this behaviour with Stefano. But I'm not convinced this is a correct
> interpretation of the RFC. (5.7) says "If the timer expires awaiting
> the ACK of a SYN segment ...". That is, I think it's only suggesting
> increasing the RTO to 3 at the data stage *if* we had at least one
> retry during the handshake.
> That is, unfortunately, much fiddlier to
> implement, since we need to remember what happened during the
> handshake to apply it here.

Yes, you are right. Stefano thought it would be simpler than
re-introducing separate starting values.

>
> Additionally, if I'm reading the RFC correctly, it's treating this as
> a one-time adjustment of the RTO, which won't necessarily remain the
> case for the entire data phase. Here this minimum will apply for the
> entire data phase.
>
> Even though it's a "MUST" in the RFC, I kind of think we could just
> skip this for two reasons:

I will leave the discussion to the two of you :)

>
> 1) We already don't bother with RTT measurements, which the RFC
>    assumes the implementation is doing to adjust the RTO.
>
> 2) We expect to be talking to a guest. The chance of a high RTT is
>    vanishingly small compared to a path to potentially any host on
>    the 2011 internet.
>
> > +                timeout <<= MAX(exp, 0);
> > +                it.it_value.tv_sec = MIN(timeout, c->tcp.tcp_rto_max);
> >          } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> >                  it.it_value.tv_sec = FIN_TIMEOUT;
> >          } else {
> > @@ -2785,18 +2794,24 @@ static socklen_t tcp_probe_tcp_info(void)
> >   */
> >  void tcp_get_rto_params(struct ctx *c)
> >  {
> > -        intmax_t tcp_syn_retries, syn_linear_timeouts;
> > +        intmax_t tcp_syn_retries, syn_linear_timeouts, tcp_rto_max_ms;
> >
> >          tcp_syn_retries = read_file_integer(
> >                  TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT);
> >          syn_linear_timeouts = read_file_integer(
> >                  TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT);
> > +        tcp_rto_max_ms = read_file_integer(
> > +                TCP_RTO_MAX_MS, TCP_RTO_MAX_MS_DEFAULT);
> >
> >          c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> >          c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> > +        c->tcp.tcp_rto_max = MIN(
> > +                DIV_ROUND_CLOSEST(tcp_rto_max_ms, 1000), SIZE_MAX);
> >
> > -        debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %"PRIu8,
> > -              c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > +        debug("Read sysctl values tcp_syn_retries: %"PRIu8
> > +              ", linear_timeouts: %"PRIu8", tcp_rto_max: %zu",
> > +              c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts,
> > +              c->tcp.tcp_rto_max);
> >  }
> >
> >  /**
> > diff --git a/tcp.h b/tcp.h
> > index befedde..a238bb7 100644
> > --- a/tcp.h
> > +++ b/tcp.h
> > @@ -59,6 +59,7 @@ union tcp_listen_epoll_ref {
> >   * @fwd_out: Port forwarding configuration for outbound packets
> >   * @timer_run: Timestamp of most recent timer run
> >   * @pipe_size: Size of pipes for spliced connections
> > + * @tcp_rto_max: Maximal retry timeout (in s)
> >   * @tcp_syn_retries: SYN retries using exponential backoff timeout
> >   * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
> >   */
> > @@ -67,6 +68,7 @@ struct tcp_ctx {
> >          struct fwd_ports fwd_out;
> >          struct timespec timer_run;
> >          size_t pipe_size;
> > +        size_t tcp_rto_max;
> >          uint8_t tcp_syn_retries;
> >          uint8_t syn_linear_timeouts;
> >  };
> > --
> > 2.49.0
> >
>
> --
> David Gibson (he or they)      | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
>                                | around.
> http://www.ozlabs.org/~dgibson

--
Thanks,

Yumei Huang
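
For anyone following the thread, the timeout computation in the quoted tcp_timer_ctl() hunk can be sketched as a standalone function. This is only an illustration of the v7 behaviour under discussion, not code from passt: rto_seconds() and its parameters are hypothetical names, and EXP_CAP is an assumption added here so the shift stays well-defined for large retry counts. It does not address David's point that the 3 s floor applies for the whole data phase.

```c
/* Values taken from the quoted patch, in seconds */
#define RTO_INIT      1
#define RTO_INIT_ACK  3

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Assumption (not in the patch): cap the exponent so the shift can't overflow */
#define EXP_CAP       20

/* Hypothetical helper mirroring the ACK_FROM_TAP_DUE branch of the patch.
 * retries:             retransmissions so far (conn->retries in the patch)
 * established:         non-zero once the handshake has completed
 * syn_linear_timeouts: handshake retries sent before backoff starts
 * rto_max:             upper bound in seconds (c->tcp.tcp_rto_max in the patch)
 */
static long rto_seconds(int retries, int established,
                        int syn_linear_timeouts, long rto_max)
{
        int exp = retries;
        long timeout = RTO_INIT;

        if (!established)
                exp -= syn_linear_timeouts;     /* linear phase: exponent <= 0 */
        else
                timeout = MAX(timeout, RTO_INIT_ACK);   /* 3 s floor for data */

        exp = MIN(MAX(exp, 0), EXP_CAP);
        timeout <<= exp;                        /* exponential backoff */

        return MIN(timeout, rto_max);           /* clamp to the maximum */
}
```

With rto_max set to 120 (the 120000 ms default from the patch, expressed in seconds), an established connection would wait 3, 6, 12, ... seconds and never more than 120, while a handshake with syn_linear_timeouts of 4 stays at 1 second until retries exceeds 4.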