From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=A7Y6yJBN; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 7FBE15A0274 for ; Mon, 01 Dec 2025 05:09:34 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1764562173; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Kf4/oNvR7N2X0VLhSKFQuoAJOffPDAApVtESMUCSCa4=; b=A7Y6yJBNms50dtzsO6aXSg1GaY6WkzGVhgUXQQwIjneWjPcRxrE3SHJ7bTrTP58zJEawdj eZ40nocosl/HcPkuOS4hyNOCCLIyYrWc6Iz2dbreDajeF1PRayH1q9W5hUUsL+RsJa1o6N cKY8phdmwoK0ENB4Oa7mo29fDjApipA= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-14-fMtffOs9MEa_GDcPpVD23g-1; Sun, 30 Nov 2025 23:09:30 -0500 X-MC-Unique: fMtffOs9MEa_GDcPpVD23g-1 X-Mimecast-MFC-AGG-ID: fMtffOs9MEa_GDcPpVD23g_1764562169 Received: by mail-ed1-f70.google.com with SMTP id 4fb4d7f45d1cf-64161017e34so4848892a12.1 for ; Sun, 30 Nov 2025 20:09:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764562169; x=1765166969; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=Kf4/oNvR7N2X0VLhSKFQuoAJOffPDAApVtESMUCSCa4=; b=jYYvQXYb0LVxgqq+SDlqjvmzG1fWbg4ucqwhJxsxfkh3pB4yxaHH4PM2jhLE1kM5tZ xnlh8rSO/frdFK1MX6aadkce6EH1pPdtSW6z6Ny90l99BgO/9WtJXplVSRnij6CPsk3C NFl0MgY5HAH5StSSqZ8Kqo561OtG144reTP4skQqq2BS7oQkqeiiH+9rYU1VIJKnGaJZ dR4yjTWqBmi4KULbhErYzwVH+mgKZAJU9CH5UqtSRHqUWZFmuGVKdaQ5oExQXYlOK6Gs IYHmBImpHsIslBbpvyXa1mdLSPLsPhUK3GNQYjrNehMyGCJsEyKZ4acAfiCGv+leq3xX V38A== X-Gm-Message-State: AOJu0YxSsPgMardzgWnRXUPPEIebT2dHyt3YUcsuuyosNwNZsbrF1NWS JD4UOKHYrSNeXtjCZzlgYPsx5JjBpfvX5oO2kFtMaM1xn7tfEs6IsDv3Xwwr5euadbQWXJQ67P5 cMqkvI3sRrHw0weBWTdNe07Zkbn/zMou0IaCG2LHRkj3JnpnG0K4UbvDeoyt4sj+9slP9nIwuW3 czqublc86tmon5MJAorxB+1RCzBB9CQJJXh5q6JpKvcA== X-Gm-Gg: ASbGncvBlX8Rf4os+pfWUPipIWcgGpXf+l4nSDC2W4J2Iza5gMb0DjdA8OKZw+DImFP l/KZfaLwsgd0tkKX4oIOgsJ0FRu5uA9bsQgQ++L//5Pm98dtilk4OPq2+QIs2w0T1l8wkJdWzY9 V6LoiQUy6nQO7I1hAY266sv2v6AG3Y5rCgln5CaFOL/YoCU55vufTe9qFfXXnpEvxxEqQ= X-Received: by 2002:a05:6402:50d0:b0:643:1659:759a with SMTP id 4fb4d7f45d1cf-645546767a7mr31655967a12.25.1764562169047; Sun, 30 Nov 2025 20:09:29 -0800 (PST) X-Google-Smtp-Source: AGHT+IGpGm69fSyajtVpTAQWrsuEQVzX+NaIbCjzYkqF1iNut8QYPQzCRlBcQvBhKZTRN/3KxNP3wwQpppUcHG5ZsBA= X-Received: by 2002:a05:6402:50d0:b0:643:1659:759a with SMTP id 4fb4d7f45d1cf-645546767a7mr31655949a12.25.1764562168595; Sun, 30 Nov 2025 20:09:28 -0800 (PST) MIME-Version: 1.0 References: <20251125072638.88896-1-yuhuang@redhat.com> <20251125072638.88896-6-yuhuang@redhat.com> <20251127233342.36986b16@elisabeth> In-Reply-To: <20251127233342.36986b16@elisabeth> From: Yumei Huang Date: Mon, 1 Dec 2025 12:09:17 +0800 X-Gm-Features: AWmQ_blYSgeXXWtjZY_Jt7Aw7Gk-DFzyfQsjVxksZ_qKo4pWpsRly936nEqqDaA Message-ID: Subject: Re: [PATCH v9 5/5] tcp: Clamp the retry timeout To: Stefano Brivio X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: Ndzu9H7U5Pl9xxHJT0068z_-dGeXK0Xd_I3XOOwH2Ag_1764562169 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: AX4ALHNUB2ZLRY7554HWCOBPWJABL2SY X-Message-ID-Hash: AX4ALHNUB2ZLRY7554HWCOBPWJABL2SY X-MailFrom: yuhuang@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, david@gibson.dropbear.id.au X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, Nov 28, 2025 at 6:33=E2=80=AFAM Stefano Brivio = wrote: > > On Tue, 25 Nov 2025 15:26:38 +0800 > Yumei Huang wrote: > > > Clamp the TCP retry timeout as Linux kernel does. If a retry occurs > > during the handshake and the RTO is below 3 seconds, re-initialise > > it to 3 seconds for data retransmissions according to RFC 6298. > > > > Suggested-by: Stefano Brivio > > Signed-off-by: Yumei Huang > > --- > > tcp.c | 25 ++++++++++++++++++++----- > > tcp.h | 2 ++ > > tcp_conn.h | 1 + > > 3 files changed, 23 insertions(+), 5 deletions(-) > > > > diff --git a/tcp.c b/tcp.c > > index b3aa064..887b32f 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -187,6 +187,9 @@ > > * for established connections, or (syn_retries + syn_linear_timeout= s) times > > * during the handshake, reset the connection > > * > > + * - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handsh= ake and > > + * RTO is less than this, re-initialise RTO to this for data retrans= missions > > + * > > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FRO= M_TAP_DUE > > * with TAP_FIN_SENT event), and no ACK is received within this time= , reset > > * the connection > > @@ -340,6 +343,7 @@ enum { > > > > #define ACK_INTERVAL 10 /* ms */ > > #define RTO_INIT 1 /* s, RFC 6298 */ > > +#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */ > > #define FIN_TIMEOUT 60 > > #define ACT_TIMEOUT 7200 > > > > @@ -365,9 +369,11 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGR= ATE_RCV_QUEUE_MAX]; > > > > #define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" > > #define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeou= ts" > > +#define RTO_MAX_MS "/proc/sys/net/ipv4/tcp_rto_max_ms" > > > > #define SYN_RETRIES_DEFAULT 6 > > #define SYN_LINEAR_TIMEOUTS_DEFAULT 4 > > +#define RTO_MAX_MS_DEFAULT 120000 > > Nit: I think it would make more sense to have this in seconds, because > you are rounding it anyway. I agree and will update. > If somebody changes this to 120001, you'll > use 120000. At that point you can just say 120. Just a little correction, as we are using DIV_ROUND_UP, when it's 120001, we will get 121000 instead of 120000. > > More importantly, perhaps, if somebody changes this to 1200, you would > be using 2000. > > I think this could be RTO_MAX_DEFAULT (with a comment saying it's in > seconds) and later you can just multiply it by 1000 when you use it. > > The rest of the series looks good to me now. Thanks for the review. > > > #define MAX_SYNCNT 127 /* derived from kernel's limi= t */ > > > > /* "Extended" data (not stored in the flow table) for TCP flow migrati= on */ > > @@ -392,7 +398,7 @@ static const char *tcp_state_str[] __attribute((__u= nused__)) =3D { > > > > static const char *tcp_flag_str[] __attribute((__unused__)) =3D { > > "STALLED", "LOCAL", "ACTIVE_CLOSE", "ACK_TO_TAP_DUE", > > - "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", > > + "ACK_FROM_TAP_DUE", "ACK_FROM_TAP_BLOCKS", "SYN_RETRIED", > > }; > > > > /* Listening sockets, used for automatic port forwarding in pasta mode= only */ > > @@ -590,10 +596,13 @@ static void tcp_timer_ctl(const struct ctx *c, st= ruct tcp_tap_conn *conn) > > if (conn->flags & ACK_TO_TAP_DUE) { > > it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 1000; > > } else if (conn->flags & ACK_FROM_TAP_DUE) { > > - int exp =3D conn->retries; > > + int exp =3D conn->retries, timeout =3D RTO_INIT; > > if (!(conn->events & ESTABLISHED)) > > exp -=3D c->tcp.syn_linear_timeouts; > > - it.it_value.tv_sec =3D RTO_INIT << MAX(exp, 0); > > + else if (conn->flags & SYN_RETRIED) > > + timeout =3D MAX(timeout, RTO_INIT_AFTER_SYN_RETRI= ES); > > + timeout <<=3D MAX(exp, 0); > > + it.it_value.tv_sec =3D MIN(timeout, c->tcp.rto_max); > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > it.it_value.tv_sec =3D FIN_TIMEOUT; > > } else { > > @@ -2441,6 +2450,7 @@ void tcp_timer_handler(const struct ctx *c, union= epoll_ref ref) > > flow_trace(conn, "SYN timeout, retry"); > > tcp_send_flag(c, conn, SYN); > > conn->retries++; > > + conn_flag(c, conn, SYN_RETRIED); > > tcp_timer_ctl(c, conn); > > } > > } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))= { > > @@ -2812,10 +2822,15 @@ static void tcp_get_rto_params(struct ctx *c) > > v =3D read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_= DEFAULT); > > c->tcp.syn_linear_timeouts =3D MIN(v, MAX_SYNCNT); > > > > + v =3D read_file_integer(RTO_MAX_MS, RTO_MAX_MS_DEFAULT); > > + c->tcp.rto_max =3D MIN(DIV_ROUND_UP(v, 1000), INT_MAX); > > + > > debug("Using TCP RTO parameters, syn_retries: %"PRIu8 > > - ", syn_linear_timeouts: %"PRIu8, > > + ", syn_linear_timeouts: %"PRIu8 > > + ", rto_max: %d", > > c->tcp.syn_retries, > > - c->tcp.syn_linear_timeouts); > > + c->tcp.syn_linear_timeouts, > > + c->tcp.rto_max); > > } > > > > /** > > diff --git a/tcp.h b/tcp.h > > index 37d7758..6fb6f92 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -60,6 +60,7 @@ union tcp_listen_epoll_ref { > > * @fwd_out: Port forwarding configuration for outbound packet= s > > * @timer_run: Timestamp of most recent timer run > > * @pipe_size: Size of pipes for spliced connections > > + * @rto_max: Maximum retry timeout (in s) > > * @syn_retries: SYN retries using exponential backoff timeout > > * @syn_linear_timeouts: SYN retries before using exponential backoff = timeout > > */ > > @@ -68,6 +69,7 @@ struct tcp_ctx { > > struct fwd_ports fwd_out; > > struct timespec timer_run; > > size_t pipe_size; > > + int rto_max; > > uint8_t syn_retries; > > uint8_t syn_linear_timeouts; > > }; > > diff --git a/tcp_conn.h b/tcp_conn.h > > index 923af36..e36910c 100644 > > --- a/tcp_conn.h > > +++ b/tcp_conn.h > > @@ -77,6 +77,7 @@ struct tcp_tap_conn { > > #define ACK_TO_TAP_DUE BIT(3) > > #define ACK_FROM_TAP_DUE BIT(4) > > #define ACK_FROM_TAP_BLOCKS BIT(5) > > +#define SYN_RETRIED BIT(6) > > > > #define SNDBUF_BITS 24 > > unsigned int sndbuf :SNDBUF_BITS; > > -- > Stefano > --=20 Thanks, Yumei Huang