From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com
Authentication-Results: passt.top;
	dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XyYoJn7t;
	dkim-atps=neutral
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by passt.top (Postfix) with ESMTPS id C6C2E5A026F
	for <passt-dev@passt.top>; Fri, 17 Oct 2025 20:28:19 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1760725698;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=jd5Ylv4N6JrqB4JE9BZJltMKeVsYi7cNK1r096XgtuU=;
	b=XyYoJn7tJko+TieGf51Vr7cdW5dzH6D9fwdEv9M9d4wvpNlFF09gh40ElCLnOziJB3g1SB
	iuwOaGT2wDTBBHmcY/nq9NN1uGRralmZSgCuOMPdaINVPaOj7NNaNpMrJFe2Hmt6p1tBJO
	ZND9wjwqgR1boyw/KKJGeplsjvgLk18=
Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com
 [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-451-J3a5Z8hnN0WZ7tQ_56Xntg-1; Fri, 17 Oct 2025 14:28:17 -0400
X-MC-Unique: J3a5Z8hnN0WZ7tQ_56Xntg-1
X-Mimecast-MFC-AGG-ID: J3a5Z8hnN0WZ7tQ_56Xntg_1760725696
Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-3f7b5c27d41so1367434f8f.0
        for <passt-dev@passt.top>; Fri, 17 Oct 2025 11:28:16 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1760725696; x=1761330496;
        h=content-transfer-encoding:mime-version:organization:references
         :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state
         :from:to:cc:subject:date:message-id:reply-to;
        bh=J5UNKhT1pnGeUCoks4rx4mfHSbUGJEZ/FfypvendERE=;
        b=XmqaIkR97PRbMs5j8i0dtNpb5EzxKXeo3EiS74p3dES2JypAPWRmCOFX03yiekYrBx
         dfC2ICXccq7dUy8MlYmOQL1goKV/Vsqj5ijD2furcVk1n3RkXqv0RDRuCDwd6DIAmVSL
         jfva5qKgBTirLxQtLhZkQ8lysg4gYlTivcMUZvrFSHpIwUQLEn/UM6JyHfyiP5W0t+Nr
         tow7o0juzEl+FHkhwsqVoJMUKuVNhQkDHW/hSwbOMze5DY3EGorNzFSqNy3cBUyhgh5I
         8vq5v/3FV4K3WIqjCrwCC94TRLkKSbEkpZkZYUsyRjFJxoF6apyVhZmJ0zaTBmon8QWN
         84eQ==
X-Gm-Message-State: AOJu0YwY/+KORzigQ1VsWoTukpmbQij6pQUUJEDXaRvPe3ikwbVDMpGx
	Qw0RgzeQku5Gqyp9iO0gV8BiY0pVLgW5v5KtOtWntWoSYtcOzZypTvPBR2C1B4l3hT15McfmwfR
	RkM8AUJoN2CzgKJogROdZZH6KlIm5oa977p93YumzWTcbLhozALIHIg==
X-Gm-Gg: ASbGncsF/GJqkKr1j8qH/CHKPRuX1dZq7tgJX5Fmd64kK7urYT8+5UTX1SbUswYyB+A
	m6XKMB8XcOcZVmnyUm8YHAq3QG7fgZJED3FuZx2YKEPLUPQvqWDwwxN0IZe7RoDrVQ9jw0Dml5r
	qBAPLH6bLW0bJt7pxiq+ZKg6wHrFyWOgpgWpzjCNlIDa1+MwcdP5ibqQrMEu3zIOigSz9lt3i0G
	pY2oyN9jHQ6z+M3fVGwdRW2dtoCyNOqcB0gQVr9/4XZOCWLtnkm/a8S4J4Wi8+J1ZmlB96RgIn6
	IRGAolYWfQT2VO/8vd7VxNo39W4eahPOdZR7DpILT5lriRqiQ2p2f2tQe96YO3B3Alq0ARdrPuI
	Br6TkugPuRQ==
X-Received: by 2002:a05:6000:2384:b0:425:58d0:483a with SMTP id ffacd0b85a97d-426fb6a1cbamr6475119f8f.3.1760725695646;
        Fri, 17 Oct 2025 11:28:15 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IElsKvcOU7/nplQzHSjs+GqkCDLqVZmyo/pBvRF4Nqggi9cKobbsi1kyAUr9ATq8zGXHxrk7A==
X-Received: by 2002:a05:6000:2384:b0:425:58d0:483a with SMTP id ffacd0b85a97d-426fb6a1cbamr6475094f8f.3.1760725695088;
        Fri, 17 Oct 2025 11:28:15 -0700 (PDT)
Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1])
        by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-4711c487dfesm62805505e9.17.2025.10.17.11.28.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 17 Oct 2025 11:28:14 -0700 (PDT)
Date: Fri, 17 Oct 2025 20:28:12 +0200
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>, Yumei Huang
 <yuhuang@redhat.com>
Subject: Re: [PATCH v3 4/4] tcp: Update data retransmission timeout
Message-ID: <20251017202812.173e9352@elisabeth>
In-Reply-To: <aPAmIX_ZqEPjuH7A@zatzit>
References: <20251014073836.18150-1-yuhuang@redhat.com>
	<20251014073836.18150-5-yuhuang@redhat.com>
	<aO7lMXUs9jWzBMO9@zatzit>
	<CANsz47kXgwXvbBR_=VTt2kjEy-ZZGDrqHDriP5LGv+-jM6VppQ@mail.gmail.com>
	<aPAmIX_ZqEPjuH7A@zatzit>
Organization: Red Hat
X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-MFC-PROC-ID: 3jn5C5quFbWLLL5cBnFVtIZm3MCVlQR4VnhuICB6R74_1760725696
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Message-ID-Hash: FDW2HERQWTZDZGDVCPPJNYH7SDXGBPER
X-Message-ID-Hash: FDW2HERQWTZDZGDVCPPJNYH7SDXGBPER
X-MailFrom: sbrivio@redhat.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: passt-dev@passt.top
X-Mailman-Version: 3.3.8
Precedence: list
List-Id: Development discussion and patches for passt <passt-dev.passt.top>
Archived-At: <https://archives.passt.top/passt-dev/20251017202812.173e9352@elisabeth/>
Archived-At: <https://passt.top/hyperkitty/list/passt-dev@passt.top/message/FDW2HERQWTZDZGDVCPPJNYH7SDXGBPER/>
List-Archive: <https://archives.passt.top/passt-dev/>
List-Archive: <https://passt.top/hyperkitty/list/passt-dev@passt.top/>
List-Help: <mailto:passt-dev-request@passt.top?subject=help>
List-Owner: <mailto:passt-dev-owner@passt.top>
List-Post: <mailto:passt-dev@passt.top>
List-Subscribe: <mailto:passt-dev-join@passt.top>
List-Unsubscribe: <mailto:passt-dev-leave@passt.top>

On Thu, 16 Oct 2025 09:54:25 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Oct 15, 2025 at 02:31:27PM +0800, Yumei Huang wrote:
> > On Wed, Oct 15, 2025 at 8:05 AM David Gibson
> > <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Tue, Oct 14, 2025 at 03:38:36PM +0800, Yumei Huang wrote:
> > > > According to RFC 2988 and RFC 6298, we should use an exponential
> > > > backoff timeout for data retransmission starting from one second
> > > > (see Appendix A in RFC 6298), and limit it to about 60 seconds
> > > > as allowed by the same RFC:
> > > >
> > > >    (2.5) A maximum value MAY be placed on RTO provided it is at
> > > >          least 60 seconds.
> > >
> > > The interpretation of this isn't entirely clear to me.  Does it mean
> > > if the total retransmit delay exceeds 60s we give up and RST (what
> > > this patch implements)?  Or does it mean that if the retransmit delay
> > > reaches 60s we keep retransmitting, but don't increase the delay any
> > > further?
> > >
> > > Looking at tcp_bound_rto() and related code in the kernel suggests the
> > > second interpretation.
> > >
> > > > Combine the macros defining the initial timeout for both SYN and ACK.
> > > > And add a macro ACK_RETRIES to limit the total timeout to about 60s.
> > > >
> > > > Signed-off-by: Yumei Huang <yuhuang@redhat.com>
> > > > ---
> > > >  tcp.c | 32 ++++++++++++++++----------------
> > > >  1 file changed, 16 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 3ce3991..84da069 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -179,16 +179,12 @@
> > > >   *
> > > >   * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > >   *
> > > > - * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > > - *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > > - *   SYN. It's the starting timeout for the first SYN retry. If this persists
> > > > - *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > > - *   tcp_syn_linear_timeouts) times in a row, reset the connection
> > > > - *
> > > > - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > > - *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > > - *   socket and reset sequence to what was acknowledged. If this persists for
> > > > - *   more than TCP_MAX_RETRIES times in a row, reset the connection
> > > > + * - ACK_TIMEOUT_INIT: if no ACK segment was received from tap/guest, eiher
> > > > + *   during handshake(flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
> > > > + *   sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
> > > > + *   from the socket and reset sequence to what was acknowledged. It's the
> > > > + *   starting timeout for the first retry. If this persists for more than
> > > > + *   allowed times in a row, reset the connection
> > > >   *
> > > >   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> > > >   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
> > > > @@ -342,8 +338,7 @@ enum {
> > > >  #define WINDOW_DEFAULT                       14600           /* RFC 6928 */
> > > >
> > > >  #define ACK_INTERVAL                 10              /* ms */
> > > > -#define SYN_TIMEOUT_INIT             1               /* s */
> > > > -#define ACK_TIMEOUT                  2
> > > > +#define ACK_TIMEOUT_INIT             1               /* s, RFC 6298 */
> > >
> > > I'd suggest calling this RTO_INIT to match the terminology used in the
> > > RFCs.
> >
> > Sure.
> > >
> > > >  #define FIN_TIMEOUT                  60
> > > >  #define ACT_TIMEOUT                  7200
> > > >
> > > > @@ -352,6 +347,11 @@ enum {
> > > >
> > > >  #define ACK_IF_NEEDED        0               /* See tcp_send_flag() */
> > > >
> > > > +/* Number of retries calculated from the exponential backoff formula, limited
> > > > + * by a total timeout of about 60 seconds.
> > > > + */
> > > > +#define ACK_RETRIES          5
> > > > +
> > >
> > > As noted above, I think this is based on a misunderstanding of what
> > > the RFC is saying.  TCP_MAX_RETRIES should be fine as it is, I think.
> > > We could implement the clamping of the RTO, but it's a "MAY" in the
> > > RFC, so we don't have to, and I don't really see a strong reason to do
> > > so.
> >
> > If we use TCP_MAX_RETRIES and not clamping RTO, the total timeout
> > could be 255 seconds.
> >
> > Stefano mentioned "Retransmitting data after 256 seconds doesn't make
> > a lot of sense to me" in the previous comment.
>
> That's true, but it's pretty much true for 60s as well.  For the local
> link we usually have between passt and guest, even 1s is an eternity.

Rather than the local link I was thinking of whatever monitor or
liveness probe in KubeVirt which might have a 60-second period, or some
firewall agent, or how long it typically takes for guests to stop and
resume again in KubeVirt.

It's usually seconds or maybe minutes but not five minutes.

> Basically I see no harm, but also no advantage to clamping or limiting
> the RTO, so I'm suggesting going with the simplest code.

The advantage I see is that we'll recover significantly faster in case
something went wrong.

> Note that there are (rare) situations where we could get a response
> after minutes.
>  - The interface on the guest was disabled for a while
>  - An error in guest firewall configuration blocked packets for a while
>  - A bug on the guest caused the kernel to wedge for a while
>  - The user manually suspended the guest for a while (VM/passt only)
>
> These generally indicate something has gone fairly badly wrong, but a
> long RTO gives the user a bit more time to realise their mistake and
> fix things.

True, it's just that to me five minutes sounds like "broken beyond
repair", while one minute sounds like "oh we tried again and it worked".
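To put rough numbers on the two readings of (2.5) discussed above, here
is a minimal sketch of the arithmetic. total_backoff() is a hypothetical
helper, not code from tcp.c, and the interval counts used below are
illustrative assumptions rather than the actual value of
TCP_MAX_RETRIES:

```c
/* Hypothetical helper, not part of tcp.c: sum of n retransmission
 * timeouts, in seconds, starting from rto_init and doubling each time.
 * If rto_max is non-zero, each single timeout is clamped to it (the
 * "keep retransmitting but stop increasing the delay" reading);
 * rto_max == 0 means no clamping.
 */
static unsigned total_backoff(unsigned rto_init, unsigned n,
			      unsigned rto_max)
{
	unsigned rto = rto_init, total = 0, i;

	for (i = 0; i < n; i++) {
		/* Clamp this interval only, the backoff keeps doubling */
		total += (rto_max && rto > rto_max) ? rto_max : rto;
		rto *= 2;
	}

	return total;
}
```

Assuming a 1 s initial timeout, total_backoff(1, 8, 0) gives 255, the
figure quoted above for the unclamped case, total_backoff(1, 6, 0)
gives 63, so "about 60 seconds", and total_backoff(1, 8, 60) gives 183,
that is, the same eight intervals with each one clamped to 60 s.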

> These are niche cases, but given the cost of implementing
> it is "do nothing"...

...anyway, it's not a strong preference from my side. It's mostly about
experience but I won't be able to really come up with obvious evidence
(at least not quickly), so if the code is significantly simpler...
whatever. It's not provable so I won't insist.

Note: the comments I'm replying to are from yesterday / Thursday, on
v3, and today / Friday we're at v6. I don't expect a week grace period
as you would on the kernel:

  https://docs.kernel.org/process/submitting-patches.html#don-t-get-discouraged-or-impatient

because we can surely move faster than that, but three versions in a
day, before I get any chance to have a look, mean substantial overhead
for me, and I might miss the meaning and context of comments from other
reviewers (David in this case). There are no changelogs in the cover
letters either.

I plan to skip to v6 but don't expect a review soon, because of that
overhead I just mentioned.

-- 
Stefano