MIME-Version: 1.0
References: <20251014073836.18150-1-yuhuang@redhat.com> <20251014073836.18150-5-yuhuang@redhat.com> <20251017202812.173e9352@elisabeth>
In-Reply-To: <20251017202812.173e9352@elisabeth>
From: Yumei Huang
Date: Mon, 20 Oct 2025 18:57:45 +0800
Subject: Re: [PATCH v3 4/4] tcp: Update data retransmission timeout
To: Stefano Brivio
CC: David Gibson, passt-dev@passt.top
Content-Type: text/plain; charset="UTF-8"
List-Id: Development discussion and patches for passt

On Sat, Oct 18, 2025 at 2:28 AM Stefano Brivio wrote:
>
> On Thu, 16 Oct 2025 09:54:25 +1100
> David Gibson wrote:
>
> > On Wed, Oct 15, 2025 at 02:31:27PM +0800, Yumei Huang wrote:
> > > On Wed, Oct 15, 2025 at 8:05 AM David Gibson wrote:
> > > >
> > > > On Tue, Oct 14, 2025 at 03:38:36PM +0800, Yumei Huang wrote:
> > > > > According to RFC 2988 and RFC 6298, we should use an exponential
> > > > > backoff timeout for data retransmission starting from one second
> > > > > (see Appendix A in RFC 6298), and limit it to about 60 seconds
> > > > > as allowed by the same RFC:
> > > > >
> > > > >   (2.5) A maximum value MAY be placed on RTO provided it is at
> > > > >         least 60 seconds.
> > > >
> > > > The interpretation of this isn't entirely clear to me.  Does it mean
> > > > if the total retransmit delay exceeds 60s we give up and RST (what
> > > > this patch implements)?  Or does it mean that if the retransmit delay
> > > > reaches 60s we keep retransmitting, but don't increase the delay any
> > > > further?
> > > >
> > > > Looking at tcp_bound_rto() and related code in the kernel suggests the
> > > > second interpretation.
> > > >
> > > > > Combine the macros defining the initial timeout for both SYN and ACK.
> > > > > And add a macro ACK_RETRIES to limit the total timeout to about 60s.
> > > > >
> > > > > Signed-off-by: Yumei Huang
> > > > > ---
> > > > >  tcp.c | 32 ++++++++++++++++----------------
> > > > >  1 file changed, 16 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/tcp.c b/tcp.c
> > > > > index 3ce3991..84da069 100644
> > > > > --- a/tcp.c
> > > > > +++ b/tcp.c
> > > > > @@ -179,16 +179,12 @@
> > > > >   *
> > > > >   * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > > >   *
> > > > > - * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > > > - *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > > > - *   SYN. It's the starting timeout for the first SYN retry. If this persists
> > > > > - *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > > > - *   tcp_syn_linear_timeouts) times in a row, reset the connection
> > > > > - *
> > > > > - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > > > - *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > > > - *   socket and reset sequence to what was acknowledged. If this persists for
> > > > > - *   more than TCP_MAX_RETRIES times in a row, reset the connection
> > > > > + * - ACK_TIMEOUT_INIT: if no ACK segment was received from tap/guest, eiher
> > > > > + *   during handshake(flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
> > > > > + *   sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
> > > > > + *   from the socket and reset sequence to what was acknowledged. It's the
> > > > > + *   starting timeout for the first retry. If this persists for more than
> > > > > + *   allowed times in a row, reset the connection
> > > > >   *
> > > > >   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> > > > >   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
> > > > > @@ -342,8 +338,7 @@ enum {
> > > > >  #define WINDOW_DEFAULT                  14600           /* RFC 6928 */
> > > > >
> > > > >  #define ACK_INTERVAL                    10              /* ms */
> > > > > -#define SYN_TIMEOUT_INIT                1               /* s */
> > > > > -#define ACK_TIMEOUT                     2
> > > > > +#define ACK_TIMEOUT_INIT                1               /* s, RFC 6298 */
> > > >
> > > > I'd suggest calling this RTO_INIT to match the terminology used in the
> > > > RFCs.
> > >
> > > Sure.
> > > >
> > > > >  #define FIN_TIMEOUT                     60
> > > > >  #define ACT_TIMEOUT                     7200
> > > > >
> > > > > @@ -352,6 +347,11 @@ enum {
> > > > >
> > > > >  #define ACK_IF_NEEDED   0               /* See tcp_send_flag() */
> > > > >
> > > > > +/* Number of retries calculated from the exponential backoff formula, limited
> > > > > + * by a total timeout of about 60 seconds.
> > > > > + */
> > > > > +#define ACK_RETRIES                     5
> > > > > +
> > > >
> > > > As noted above, I think this is based on a misunderstanding of what
> > > > the RFC is saying.  TCP_MAX_RETRIES should be fine as it is, I think.
> > > > We could implement the clamping of the RTO, but it's a "MAY" in the
> > > > RFC, so we don't have to, and I don't really see a strong reason to do
> > > > so.
> > >
> > > If we use TCP_MAX_RETRIES and not clamping RTO, the total timeout
> > > could be 255 seconds.
> > >
> > > Stefano mentioned "Retransmitting data after 256 seconds doesn't make
> > > a lot of sense to me" in the previous comment.
> >
> > That's true, but it's pretty much true for 60s as well.  For the local
> > link we usually have between passt and guest, even 1s is an eternity.
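
For reference, the totals discussed above follow from summing the
doubling timer periods. A minimal sketch, not passt code: rto_total()
and RTO_INIT_S are hypothetical names, and it assumes the timer runs
once after the original transmission and once more after each retry,
with the connection reset when the last period expires. A non-zero
rto_max models the other reading of (2.5), where the RTO stops growing
but retransmissions continue:

```c
#include <assert.h>

#define RTO_INIT_S	1	/* s, initial RTO per RFC 6298 (hypothetical name) */

/* Sum, in seconds, of all timer periods before giving up: one period for
 * the original transmission plus one per retry, doubling each time.
 * If rto_max is non-zero, each period is clamped to rto_max seconds.
 */
static unsigned rto_total(unsigned retries, unsigned rto_max)
{
	unsigned rto = RTO_INIT_S, total = 0, i;

	assert(retries < 31);	/* keep the doubling within 32 bits */

	for (i = 0; i <= retries; i++) {
		total += rto;
		rto *= 2;
		if (rto_max && rto > rto_max)
			rto = rto_max;
	}

	return total;
}
```

Under these assumptions, rto_total(5, 0) is 63, which would match the
"about 60 seconds" of ACK_RETRIES, rto_total(7, 0) is 255, and
rto_total(7, 60) is 183.
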
>
> Rather than the local link I was thinking of whatever monitor or
> liveness probe in KubeVirt which might have a 60-second period, or some
> firewall agent, or how long it typically takes for guests to stop and
> resume again in KubeVirt.
>
> It's usually seconds or maybe minutes but not five minutes.
>
> > Basically I see no harm, but also no advantage to clamping or limiting
> > the RTO, so I'm suggesting going with the simplest code.
>
> The advantage I see is that we'll recover significantly faster in case
> something went wrong.
>
> > Note that there are (rare) situations where we could get a response
> > after minutes.
> >  - The interface on the guest was disabled for a while
> >  - An error in guest firewall configuration blocked packets for a while
> >  - A bug on the guest cause the kernel to wedge for a while
> >  - The user manually suspended the guest for a while (VM/passt only)
> >
> > These generally indicate something has gone fairly badly wrong, but a
> > long RTO gives the user a bit more time to realise their mistake and
> > fix things.
>
> True, it's just that to me five minutes sounds like "broken beyond
> repair", while one minute sounds like "oh we tried again and it worked".
>
> > These are niche cases, but given the cost of implementing
> > it is "do nothing"...
>
> ...anyway, it's not a strong preference from my side. It's mostly about
> experience but I won't be able to really come up with obvious evidence
> (at least not quickly), so if the code is significantly simpler...
> whatever. It's not provable so I won't insist.
>
> Note: the comments I'm replying to are from yesterday / Thursday, on
> v3, and today / Friday we're at v6. I don't expect a week grace period
> as you would on the kernel:
>
>   https://docs.kernel.org/process/submitting-patches.html#don-t-get-discouraged-or-impatient
>
> because we can surely move faster than that, but three versions in a
> day obviously before I get any chance to have a look means a
> substantial overhead for me, and I might miss the meaning and context of
> comments of other reviewers (David in this case). There are no
> changelogs in cover letters either.
>
> I plan to skip to v6 but don't expect a review soon, because of that
> overhead I just mentioned.

Sorry for the overhead I caused. It's just so different from what we do
with MRs or PRs (at least within our team), which we are supposed to
update as soon as possible, so that reviewers can review again whenever
they are available. The code up for review there is always the latest
(with less "problematic" code), not an outdated version. I thought it
was the same with patches sent by email, and that outdated versions were
no longer useful. Apparently I got it wrong.

I will keep this in mind, avoid sending too many versions in a short
time, and add changelogs to cover letters when necessary.

>
> --
> Stefano
>

--
Thanks,

Yumei Huang