Date: Tue, 21 Oct 2025 01:20:46 +0200
From: Stefano Brivio
To: Yumei Huang
Cc: David Gibson, passt-dev@passt.top
Subject: Re: [PATCH v3 4/4] tcp: Update data retransmission timeout
Message-ID: <20251021012046.6a1aa634@elisabeth>
References: <20251014073836.18150-1-yuhuang@redhat.com>
 <20251014073836.18150-5-yuhuang@redhat.com>
 <20251017202812.173e9352@elisabeth>
Organization: Red Hat

On Mon, 20 Oct 2025 18:57:45 +0800
Yumei Huang wrote:

> On Sat, Oct 18, 2025 at 2:28 AM Stefano Brivio wrote:
> >
> > On Thu, 16 Oct 2025 09:54:25 +1100
> > David Gibson wrote:
> >
> > > On Wed, Oct 15, 2025 at 02:31:27PM +0800, Yumei Huang wrote:
> > > > On Wed, Oct 15, 2025 at 8:05 AM David Gibson
> > > > wrote:
> > > > >
> > > > > On Tue, Oct 14, 2025 at 03:38:36PM +0800, Yumei Huang wrote:
> > > > > > According to RFC 2988 and RFC 6298, we should use an exponential
> > > > > > backoff timeout for data retransmission starting from one second
> > > > > > (see Appendix A in RFC 6298), and limit it to about 60 seconds
> > > > > > as allowed by the same RFC:
> > > > > >
> > > > > > (2.5) A maximum value MAY be placed on RTO provided it is at
> > > > > > least 60 seconds.
> > > > >
> > > > > The interpretation of this isn't entirely clear to me. Does it mean
> > > > > if the total retransmit delay exceeds 60s we give up and RST (what
> > > > > this patch implements)? Or does it mean that if the retransmit delay
> > > > > reaches 60s we keep retransmitting, but don't increase the delay any
> > > > > further?
> > > > >
> > > > > Looking at tcp_bound_rto() and related code in the kernel suggests the
> > > > > second interpretation.
> > > > >
> > > > > > Combine the macros defining the initial timeout for both SYN and ACK.
> > > > > > And add a macro ACK_RETRIES to limit the total timeout to about 60s.
> > > > > >
> > > > > > Signed-off-by: Yumei Huang
> > > > > > ---
> > > > > >  tcp.c | 32 ++++++++++++++++----------------
> > > > > >  1 file changed, 16 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/tcp.c b/tcp.c
> > > > > > index 3ce3991..84da069 100644
> > > > > > --- a/tcp.c
> > > > > > +++ b/tcp.c
> > > > > > @@ -179,16 +179,12 @@
> > > > > >   *
> > > > > >   * Timeouts are implemented by means of timerfd timers, set based on flags:
> > > > > >   *
> > > > > > - * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > > > > - *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > > > > - *   SYN. It's the starting timeout for the first SYN retry. If this persists
> > > > > > - *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> > > > > > - *   tcp_syn_linear_timeouts) times in a row, reset the connection
> > > > > > - *
> > > > > > - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
> > > > > > - *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> > > > > > - *   socket and reset sequence to what was acknowledged. If this persists for
> > > > > > - *   more than TCP_MAX_RETRIES times in a row, reset the connection
> > > > > > + * - ACK_TIMEOUT_INIT: if no ACK segment was received from tap/guest, eiher
> > > > > > + *   during handshake(flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after
> > > > > > + *   sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data
> > > > > > + *   from the socket and reset sequence to what was acknowledged. It's the
> > > > > > + *   starting timeout for the first retry. If this persists for more than
> > > > > > + *   allowed times in a row, reset the connection
> > > > > >   *
> > > > > >   * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
> > > > > >   *   with TAP_FIN_SENT event), and no ACK is received within this time, reset
> > > > > > @@ -342,8 +338,7 @@ enum {
> > > > > >  #define WINDOW_DEFAULT         14600   /* RFC 6928 */
> > > > > >
> > > > > >  #define ACK_INTERVAL           10      /* ms */
> > > > > > -#define SYN_TIMEOUT_INIT       1       /* s */
> > > > > > -#define ACK_TIMEOUT            2
> > > > > > +#define ACK_TIMEOUT_INIT       1       /* s, RFC 6298 */
> > > > >
> > > > > I'd suggest calling this RTO_INIT to match the terminology used in the
> > > > > RFCs.
> > > >
> > > > Sure.
> > > > >
> > > > > >  #define FIN_TIMEOUT            60
> > > > > >  #define ACT_TIMEOUT            7200
> > > > > >
> > > > > > @@ -352,6 +347,11 @@ enum {
> > > > > >
> > > > > >  #define ACK_IF_NEEDED  0               /* See tcp_send_flag() */
> > > > > >
> > > > > > +/* Number of retries calculated from the exponential backoff formula, limited
> > > > > > + * by a total timeout of about 60 seconds.
> > > > > > + */
> > > > > > +#define ACK_RETRIES    5
> > > > > > +
> > > > >
> > > > > As noted above, I think this is based on a misunderstanding of what
> > > > > the RFC is saying. TCP_MAX_RETRIES should be fine as it is, I think.
> > > > > We could implement the clamping of the RTO, but it's a "MAY" in the
> > > > > RFC, so we don't have to, and I don't really see a strong reason to do
> > > > > so.
> > > >
> > > > If we use TCP_MAX_RETRIES and not clamping RTO, the total timeout
> > > > could be 255 seconds.
> > > >
> > > > Stefano mentioned "Retransmitting data after 256 seconds doesn't make
> > > > a lot of sense to me" in the previous comment.
> > >
> > > That's true, but it's pretty much true for 60s as well. For the local
> > > link we usually have between passt and guest, even 1s is an eternity.
> >
> > Rather than the local link I was thinking of whatever monitor or
> > liveness probe in KubeVirt which might have a 60-second period, or some
> > firewall agent, or how long it typically takes for guests to stop and
> > resume again in KubeVirt.
> >
> > It's usually seconds or maybe minutes but not five minutes.
> >
> > > Basically I see no harm, but also no advantage to clamping or limiting
> > > the RTO, so I'm suggesting going with the simplest code.
> >
> > The advantage I see is that we'll recover significantly faster in case
> > something went wrong.
> >
> > > Note that there are (rare) situations where we could get a response
> > > after minutes.
> > > - The interface on the guest was disabled for a while
> > > - An error in guest firewall configuration blocked packets for a while
> > > - A bug on the guest cause the kernel to wedge for a while
> > > - The user manually suspended the guest for a while (VM/passt only)
> > >
> > > These generally indicate something has gone fairly badly wrong, but a
> > > long RTO gives the user a bit more time to realise their mistake and
> > > fix things.
> >
> > True, it's just that to me five minutes sounds like "broken beyond
> > repair", while one minute sounds like "oh we tried again and it worked".
> >
> > > These are niche cases, but given the cost of implementing
> > > it is "do nothing"...
> >
> > ...anyway, it's not a strong preference from my side. It's mostly about
> > experience but I won't be able to really come up with obvious evidence
> > (at least not quickly), so if the code is significantly simpler...
> > whatever. It's not provable so I won't insist.
> >
> > Note: the comments I'm replying to are from yesterday / Thursday, on
> > v3, and today / Friday we're at v6. I don't expect a week grace period
> > as you would on the kernel:
> >
> >   https://docs.kernel.org/process/submitting-patches.html#don-t-get-discouraged-or-impatient
> >
> > because we can surely move faster than that, but three versions in a
> > day obviously before I get any chance to have a look means a
> > substantial overhead for me, and I might miss the meaning and context of
> > comments of other reviewers (David in this case). There are no
> > changelogs in cover letters either.
> >
> > I plan to skip to v6 but don't expect a review soon, because of that
> > overhead I just mentioned.
>
> Sorry for the overhead I brought. It's just so different from what we
> do with MRs or PRs(at least within our team), which we are supposed to
> update as soon as possible, so reviewers could review again at any
> time they are available. And it's always the latest code (with less
> "problematic" code) there for review, not the outdated ones.

Oh, I see now. I also have some experience with contributing via git
forges, and I think it's a serious limitation (at least on GitHub) that
comes from the fact that you don't have (proper) threading.

You have it on discussions and issues/tickets, but not on code reviews.
You lose one dimension of discussion there, because it becomes entirely
"linear", and while you can see differences between revisions, it's not
really practical to review or discuss them.

There's also no space to record and describe changes if you just
force-push a branch.

I think code quality suffers because, if the author of the change and
just one reviewer are fast enough, the point of view of everybody else
will be ignored.

Other points of view can be re-evaluated later, but in that case you'll
waste more time writing yet another revision, which might now ignore a
comment you had already addressed, because it's not visible anymore.

* * *

Let's pick a practical example from this series: we were in the middle
of a discussion about whether we need to properly size a buffer to read
out sysctl values (David's idea), or if we can go for a larger buffer in
any case to keep things simpler (my proposal).

Before I had the chance to follow up on the discussion, you posted
another revision. And then another one.

On GitHub, it would be impossible for me to re-open that discussion, so
I would start a new one, and now David might miss the fact that it's the
same discussion. Maybe he was right, but it doesn't matter anymore.

With email, I can do that because we have threading and persistence,
but if the outcome of the discussion now changes, you wasted time on
another revision.

Or maybe I see that you're at v7 now and I forget that the discussion
was still open, so my previous point, even if valid, is now effectively
ignored and forgotten by everybody.
The workflow you have on GitHub works well if you have one author and
one reviewer, or several reviewers who are always right and always agree
with each other, but that's quite an unrealistic expectation.

I guess it also works well if code quantity is more important than
quality, because it gets merged faster that way, and because it's harder
to discuss it (no real threading). But here we're trying to have less
code and fewer bugs, not more.

> I thought
> it's the same with patches in emails, that outdated versions are no
> longer useful.

They are, but they're less practical to discuss, so they're not as
useful as the current one, which is why discussions should have a chance
to complete. Otherwise, you'll just be busy writing new revisions instead
of having time for something else in parallel. And reviewers have other
things to review too, so we don't really gain time if you re-post fast.

It's different if we have a critical issue affecting many users and we
want to fix it fast for them. But in that case it's usually a small
patch or series, and we don't care so much about discussing the best
approach as long as it's fixed and released quickly.

> Apparently I got it wrong. I will keep it in mind and
> not send too many versions in a short time, and add changelogs in
> cover letters when necessary.

It's not always necessary, I think, and sometimes you can keep things
short if they're obvious to everybody. These are the biggest series ever
posted for passt, in terms of number of patches:

  [PATCH v2 00/32] Use dual stack sockets to listen for inbound TCP connections
  https://archives.passt.top/passt-dev/20221117055908.2782981-1-david@gibson.dropbear.id.au/

  [PATCH v11 00/30] Introduce discontiguous frames management
  https://archives.passt.top/passt-dev/20250902075253.990038-1-lvivier@redhat.com/

...you'll see that, for some revisions, changes are very briefly
summarised. That's enough, especially if there was a single reviewer for
a given revision.
For this series, though, it's doable, and there are a few specific
changes between revisions, so I think you should add changelogs: they
help reviewers understand what you're doing.

-- 
Stefano