From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=dQqNJxOo; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 223D05A095E for ; Mon, 17 Nov 2025 03:39:08 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763347146; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AiXgAnyVNTpbBN9t9JJLzKjDKm0OgyBgwP0xnLq0c6A=; b=dQqNJxOoVHXmVjPGc7MQIZMWKm3S0CjVGmkdzbSdnSDM5o5CJs+Dnv+fOTiSOwugOZads0 Z7FfGMxJG80xtZw1023D8iT253DodB5ALvgfDpbjOgPGMwkRBZbRfnd0Te9hthYy9aaun3 SVl95DlwBTumhXGN5Ha8bAEj0MaQQD8= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-568-Q8vLWoOLPxyna6knY-pZlg-1; Sun, 16 Nov 2025 21:39:04 -0500 X-MC-Unique: Q8vLWoOLPxyna6knY-pZlg-1 X-Mimecast-MFC-AGG-ID: Q8vLWoOLPxyna6knY-pZlg_1763347144 Received: by mail-ed1-f69.google.com with SMTP id 4fb4d7f45d1cf-640b8d02165so5101004a12.2 for ; Sun, 16 Nov 2025 18:39:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763347143; x=1763951943; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=AiXgAnyVNTpbBN9t9JJLzKjDKm0OgyBgwP0xnLq0c6A=; b=djNUmoYfPyjsV9kP3DWRipAqTZqvtp6/sXUVu0K7JNtm9HzCF+hboHZsv1yHXwQLOz ozwMAcPWYITG4qcIeWhEEkIHzQsTwqh5bWrgyDyhjWCiICwkw6nGoW9rQwUcz2o1T1sD 7CJzXHXC3Vsnw5MpBKmmjrIfzQeuLorKd65MV1XaGbLhhxONyWjKP86MAyqSMvdVz2hi LE47J6qBwpEqohGvy7LURVcJu1Z9z+i+PtICmP1G5n/IoBAkwJiKE2sz+kQLQILbl0Ih mqguInUbmQi/HCoRbWOWrYs6RxsbSUUrCC6+LbYF9Q8ljsugWxGe+/J0KHb1bKVt99te DHTQ== X-Forwarded-Encrypted: i=1; AJvYcCU2wqFqXYeJE1d2uIN+yEImf2za9CHV+LCG2mgzX37CFw21wesftykgJPcmAaqfdxtvta0rAvQnsxw=@passt.top X-Gm-Message-State: AOJu0YySFmwyAsrLD4qlitue7LGulURvc2ZDmOkysZQEaAfevLhWJiEW ykBCpQCgPdMRfcRFeoKe5zDkxTzN9A5mF1l2dB1A/0ACNosedO5FG2zBKD7c75oIjFtkISkkUwS XxC2AKTE7ljyC5YIs7tAsN1om71jHLMU24tlaWULDY8v+bj8chHx94+JHu84iSnaMo5irSXDqF1 fgPenqK0NCqTN6/2uT+FseS+KEtVegx5Euxu4EAZI= X-Gm-Gg: ASbGnctxdo3EF5n8upxBVuHgUVh6Pcc7TSfV33JrztMcE7oMS55bVnaIQ8gt1dVZzb2 N/yK2yhkrn4J1YILSDB7pFrxZ+UNQ/HW8v1tpr5aXZIgB/uQIFLjMo0z/mU0tc7Xge8++8yAxPr OaTkUnSo85/yRCnSezr0tzaOtEMM5QzEyg7HeaAWzkzDkH9+Bvi42ZBqCg X-Received: by 2002:a05:6402:5c8:b0:63c:3efe:d98a with SMTP id 4fb4d7f45d1cf-64350e88869mr9957828a12.32.1763347143462; Sun, 16 Nov 2025 18:39:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IGTFaAyGRPN7lBpNJwQt/P2WAkgvXpPcsoJvm3WA1PTLhn/H0to7UDAhuTkO1OfvtKMIcs/3GmlnK0DPrOEgg4= X-Received: by 2002:a05:6402:5c8:b0:63c:3efe:d98a with SMTP id 4fb4d7f45d1cf-64350e88869mr9957818a12.32.1763347143021; Sun, 16 Nov 2025 18:39:03 -0800 (PST) MIME-Version: 1.0 References: <20251110093137.87705-7-yuhuang@redhat.com> <20251114010121.10dfb18a@elisabeth> In-Reply-To: From: Yumei Huang Date: Mon, 17 Nov 2025 10:38:52 +0800 X-Gm-Features: AWmQ_bnMypNzX9t1b7doRMXvtJKN4W5xzSmZmO5X2cKnbl4ddNBtel7REYJS1vs Message-ID: Subject: Re: [PATCH v8 6/6] tcp: Clamp the retry timeout To: David Gibson X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: dWGBevyTzCSTGAFrvFbJpsS1Jee4bDF6cl7Th8eE74c_1763347144 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: KI7PNMZMN6MXARAJEO2D24SFY6HPDXFM X-Message-ID-Hash: KI7PNMZMN6MXARAJEO2D24SFY6HPDXFM X-MailFrom: yuhuang@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Stefano Brivio , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, Nov 14, 2025 at 12:19=E2=80=AFPM David Gibson wrote: > > On Fri, Nov 14, 2025 at 11:05:51AM +0800, Yumei Huang wrote: > > On Fri, Nov 14, 2025 at 8:47=E2=80=AFAM David Gibson > > wrote: > > > > > > On Fri, Nov 14, 2025 at 01:01:21AM +0100, Stefano Brivio wrote: > > > > On Mon, 10 Nov 2025 21:56:39 +1100 > > > > David Gibson wrote: > > > > > > > > > On Mon, Nov 10, 2025 at 05:31:37PM +0800, Yumei Huang wrote: > > > > > > Clamp the TCP retry timeout as Linux kernel does. If a retry oc= curs > > > > > > during the handshake and the RTO is below 3 seconds, re-initial= ise > > > > > > it to 3 seconds for data retransmissions according to RFC 6298. > > > > > > > > > > > > Suggested-by: Stefano Brivio > > > > > > Signed-off-by: Yumei Huang > > > > > > > > > > Looks correct, so > > > > > > > > > > Reviewed-by: David Gibson > > > > > > > > > > A few possible improvements (mostly comments/names) below. > > > > > > > > > > > --- > > > > > > tcp.c | 25 ++++++++++++++++++++----- > > > > > > tcp.h | 2 ++ > > > > > > tcp_conn.h | 1 + > > > > > > 3 files changed, 23 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/tcp.c b/tcp.c > > > > > > index ee111e0..b015658 100644 > > > > > > --- a/tcp.c > > > > > > +++ b/tcp.c > > > > > > @@ -187,6 +187,9 @@ > > > > > > * for established connections, or (syn_retries + syn_linear= _timeouts) times > > > > > > * during the handshake, reset the connection > > > > > > * > > > > > > + * - RTO_INIT_ACK: if the RTO is less than this, re-initialise= RTO to this for > > > > > > + * data retransmissions > > > > > > + * > > > > > > > > > > Now that this is conditional on SYN being retried, the descriptio= n > > > > > doesn't look entirely accurate any more. > > > > > > > > If I read the whole thing again I would actually say it becomes > > > > misleading and worth fixing before applying this. Also because the > > > > relationship to the text in the RFC is not obvious anymore. > > > > > > > > This should be "if SYN retries happened during handshake, ...". > > > > > > > > > > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag= ACK_FROM_TAP_DUE > > > > > > * with TAP_FIN_SENT event), and no ACK is received within t= his time, reset > > > > > > * the connection > > > > > > @@ -340,6 +343,7 @@ enum { > > > > > > > > > > > > #define ACK_INTERVAL 10 /* ms= */ > > > > > > #define RTO_INIT 1 /* s, RFC 629= 8 */ > > > > > > +#define RTO_INIT_ACK 3 /* s,= RFC 6298 */ > > > > > > > > > > The relevance of "_ACK" in the name is not obvious to me. > > > > > > > > What about RTO_INIT_AFTER_SYN_RETRIES? We use it only once, and it > > > > looks fine there. > > > > > > Works for me. > > > > > > > > > #define FIN_TIMEOUT 60 > > > > > > #define ACT_TIMEOUT 7200 > > > > > > > > > > > -> @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue = [TCP_MIGRATE_RCV_QUEUE_MAX]; > > > > > > > > > > > > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_r= etries" > > > > > > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_l= inear_timeouts" > > > > > > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_m= ax_ms" > > > > > > > > > > > > #define SYN_RETRIES_DEFAULT 6 > > > > > > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4 > > > > > > +#define RTO_MAX_MS_DEFAULT 120000 > > > > > > #define MAX_SYNCNT 127 /* derived from k= ernel's limit */ > > > > > > > > > > > > /* "Extended" data (not stored in the flow table) for TCP flow= migration */ > > > > > > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attrib= ute((__unused__)) =3D { > > > > > > > > > > > > static const char *tcp_flag_str[] __attribute((__unused__)) = =3D { > > > > > > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE", > > > > > > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", > > > > > > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED", > > > > > > }; > > > > > > > > > > > > /* Listening sockets, used for automatic port forwarding in pa= sta mode only */ > > > > > > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ct= x *c, struct tcp_tap_conn *conn) > > > > > > if (conn->flags & ACK_TO_TAP_DUE) { > > > > > > it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 1= 000; > > > > > > } else if (conn->flags & ACK_FROM_TAP_DUE) { > > > > > > - int exp =3D conn->retries; > > > > > > + int exp =3D conn->retries, timeout =3D RTO_INIT; > > > > > > if (!(conn->events & ESTABLISHED)) > > > > > > exp -=3D c->tcp.syn_linear_timeouts; > > > > > > - it.it_value.tv_sec =3D RTO_INIT << MAX(exp, 0); > > > > > > + else if (conn->flags & SYN_RETRIED) > > > > > > + timeout =3D MAX(timeout, RTO_INIT_ACK); > > > > > > + timeout <<=3D MAX(exp, 0); > > > > > > + it.it_value.tv_sec =3D MIN(timeout, c->tcp.rto_max); > > > > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > > > > > it.it_value.tv_sec =3D FIN_TIMEOUT; > > > > > > } else { > > > > > > @@ -2440,6 +2449,7 @@ void tcp_timer_handler(const struct ctx *= c, union epoll_ref ref) > > > > > > flow_trace(conn, "SYN timeout, retry"= ); > > > > > > tcp_send_flag(c, conn, SYN); > > > > > > conn->retries++; > > > > > > + conn_flag(c, conn, SYN_RETRIED); > > > > > > tcp_timer_ctl(c, conn); > > > > > > } > > > > > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACK= ED)) { > > > > > > @@ -2811,10 +2821,15 @@ void tcp_get_rto_params(struct ctx *c) > > > > > > v =3D read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEO= UTS_DEFAULT); > > > > > > c->tcp.syn_linear_timeouts =3D MIN(v, MAX_SYNCNT); > > > > > > > > > > > > + v =3D read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT); > > > > > > + c->tcp.rto_max =3D MIN(DIV_ROUND_CLOSEST(v, 1000), INT_MAX); > > > > > > > > > > Possibly we should verify this is =3D> RTO_INIT. > > > > > > > > As a sanity check, maybe, but I don't see any harmful effect if it'= s > > > > < RTO_INIT, right? So I'm not sure if we should. No preference from= my > > > > side really. > > > > > > Sorry, describing this as >=3D RTO_INIT was misleading. What I'm > > > concerned about here is if the kernel value is set to 400ms, we'll > > > round it to... 0s. > > > > > > So, really what I'm concerned about is that we ensure this is > 0. > > > > That's a good point. > > Actually, thinking about it, I wonder if makes more sense to always > round up to 1s, rather than to the nearest 1s. So it's to replace DIV_ROUND_CLOSEST with DIV_ROUND_UP, right? > > > > > > > > + > > > > > > debug("Read sysctl values syn_retries: %"PRIu8 > > > > > > - ", syn_linear_timeouts: %"PRIu8, > > > > > > + ", syn_linear_timeouts: %"PRIu8 > > > > > > + ", rto_max: %d", > > > > > > c->tcp.syn_retries, > > > > > > - c->tcp.syn_linear_timeouts); > > > > > > + c->tcp.syn_linear_timeouts, > > > > > > + c->tcp.rto_max); > > > > > > } > > > > > > > > > > > > /** > > > > > > diff --git a/tcp.h b/tcp.h > > > > > > index 37d7758..c4945c3 100644 > > > > > > --- a/tcp.h > > > > > > +++ b/tcp.h > > > > > > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref { > > > > > > * @fwd_out: Port forwarding configuration for out= bound packets > > > > > > * @timer_run: Timestamp of most recent timer run > > > > > > * @pipe_size: Size of pipes for spliced connections > > > > > > + * @rto_max: Maximal retry timeout (in s) > > > > > > > > As pointed out earlier, I think "maximum" is slightly more appropri= ate > > > > here. > > > > > > Agreed, with nuance discussed earlier. > > > > Well, I must have misunderstood something in last week's meeting. I > > joined the second half meeting a few mins later and I caught Stefano > > saying he was wrong about maximal. Probably he was talking about the > > French meaning. Anyway, I will update. > > Sorry, yeah, the discussion there was probably fairly confusing - I > think both Stefano and I got a bit distracted from the question at > hand by language-nerding. > > The upshot is that while I think both "maximal" and "maximum" would be > correct, I think "maximum" is slightly clearer. Got it. > > > > > > > > > > > > > > * @syn_retries: SYN retries using exponential backoff timeout > > > > > > * @syn_linear_timeouts: SYN retries before using exponential = backoff timeout > > > > > > */ > > > > > > @@ -68,6 +69,7 @@ struct tcp_ctx { > > > > > > struct fwd_ports fwd_out; > > > > > > struct timespec timer_run; > > > > > > size_t pipe_size; > > > > > > + int rto_max; > > > > > > uint8_t syn_retries; > > > > > > uint8_t syn_linear_timeouts; > > > > > > }; > > > > > > diff --git a/tcp_conn.h b/tcp_conn.h > > > > > > index 923af36..e36910c 100644 > > > > > > --- a/tcp_conn.h > > > > > > +++ b/tcp_conn.h > > > > > > @@ -77,6 +77,7 @@ struct tcp_tap_conn { > > > > > > #define ACK_TO_TAP_DUE BIT(3) > > > > > > #define ACK_FROM_TAP_DUE BIT(4) > > > > > > #define ACK_FROM_TAP_BLOCKS BIT(5) > > > > > > +#define SYN_RETRIED BIT(6) > > > > > > > > > > > > #define SNDBUF_BITS 24 > > > > > > unsigned int sndbuf :SNDBUF_BITS; > > > > > > -- > > > > > > 2.51.0 > > > > > > > > -- > > > > Stefano > > > > > > > > > > -- > > > David Gibson (he or they) | I'll have my music baroque, and my = code > > > david AT gibson.dropbear.id.au | minimalist, thank you, not the othe= r way > > > | around. > > > http://www.ozlabs.org/~dgibson > > > > > > > > -- > > Thanks, > > > > Yumei Huang > > > > -- > David Gibson (he or they) | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you, not the other wa= y > | around. > http://www.ozlabs.org/~dgibson --=20 Thanks, Yumei Huang