From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=E2tKGh3G; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 010395A061E for ; Fri, 14 Nov 2025 01:01:00 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763078459; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FYeHhr/0Z7UVDnXal1eU2Tqwgs1fdIupu/5Scz9Yuoo=; b=E2tKGh3GKM3U0cchViwSkvncB1ChuTRjiKmf7FcDXvXAbsqsmKnmGYXgOZ5O9nLt5suuZz njdKG17t0n7sdGujrSovIfwubglcjRvOVjs4jrcmnhpI9as85qG78cF8PvYPcD50mMMixT U+bu1ycSCb+IrCkvib6SVPHAkAniTYU= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-262-mdcQrrGlO0CRvJxKvbUZ2Q-1; Thu, 13 Nov 2025 19:00:58 -0500 X-MC-Unique: mdcQrrGlO0CRvJxKvbUZ2Q-1 X-Mimecast-MFC-AGG-ID: mdcQrrGlO0CRvJxKvbUZ2Q_1763078457 Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-42b3c965ce5so997470f8f.2 for ; Thu, 13 Nov 2025 16:00:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763078457; x=1763683257; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OtrdRiHkDVB60b/79/HOMvJkSxqMpboRMtezJx0yJQo=; b=ovv/2YWBKdEVc6PVo5AxYFn5O6MmBG49+/C3jaMlNI7lZl8lqvq3mGOCgP+BH62oBD gXybm0CJUO770AkUsRK7Nb2pIoZRWaHbmHuo0EeUzhr0eOrcPvOUACjG3yUnS/B3aHvf Tqy5HzFxUwHypo2/LT+CRb0W+is59gT20rcUMM4oiGybjuTB6WcvCQ7Ywqd2YEDls2E5 WDbvm4t+RFq0cFYEP2N220Axo+MOXXbJY7xeetxnIY2d2Tz2KI9x5Ycxyv8Hb9E9HYm/ BQNrBKzilebOOWvzRbl5vAQzuf0vp6vsO5/4gOVnQBoipqMUWWRXHo5Ml/DnGjkNt4JQ P/Xw== X-Forwarded-Encrypted: i=1; AJvYcCVMJr2F2YysgrPind0Mma5aZdMJh5/Q8xR05iSzi/aGO4jspFmAlrriUvFK+qxpOZyKcwMixKbK4Xc=@passt.top X-Gm-Message-State: AOJu0Yzz7XOVfDNVLIABAxZ+pPQbcxm28EGNhjI6Q80OqhVVzQYv2Jor QlpLWjaSBrP6WcNGvqh/ZPAuTLqI2Ik1iq6ltR895wJA+oXRpR8WM2CyNoNJZec1a2YoQdnEXvS XHtz43rDfCehE20HenYVZdQXhyQB8RFvBpkht9SG4oklkZVRdRBawsA== X-Gm-Gg: ASbGncuH64bCmpHmYFmdMEWOCnVOE11ckHZAh0xxJ1AlFyFcIXOOpCX1Tr+s3HZgmpR keAcxhEjnHdXDtUTxipo3MinSfRM7uZNW/baRqdrEgl2SwQGyrAJtutaRrOoiQZXMgwTYH4rDS4 b+aQkfnk57FPQx3JtFoeqxh1ZhJV2/jHZCYn1PpVx+9/dBuSgeZy4frrVKPqr0ZTh1ULIcPU+ci rAUV6LQBYO0KROkkda0MNytnJ11mzME5iVJwcJvc6wZZQdma71bjyCFmaEJ+rTldvedB0q2Jb3o R0LwhJCJLfq3DpxuvH/pj7RLLQKA4lT+KTx4siw7Wv+GGX/ZE7zbl2uozwy7vXe4CVm+//vec5z 9732F3rKVSLDP4M3Qe9BP X-Received: by 2002:a05:6000:4284:b0:42b:30f9:79c9 with SMTP id ffacd0b85a97d-42b59386891mr977489f8f.37.1763078457218; Thu, 13 Nov 2025 16:00:57 -0800 (PST) X-Google-Smtp-Source: AGHT+IF/uRtQyL+7xhEWJuGCJkrHF7tmbcLzq9xSIzM/vx1G189lhRAC5gqRNZhNrN1bqJq9ftOfYQ== X-Received: by 2002:a05:6000:4284:b0:42b:30f9:79c9 with SMTP id ffacd0b85a97d-42b59386891mr977465f8f.37.1763078456632; Thu, 13 Nov 2025 16:00:56 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-42b53f0b8d6sm6797067f8f.28.2025.11.13.16.00.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Nov 2025 16:00:55 -0800 (PST) Date: Fri, 14 Nov 2025 01:00:53 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH v8 4/6] tcp: Resend SYN for inbound connections Message-ID: <20251114010053.62389abd@elisabeth> In-Reply-To: References: <20251110093137.87705-1-yuhuang@redhat.com> <20251110093137.87705-5-yuhuang@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: QukMBAjHHCp75BbxYs3b0NUuHuuVMNuJI4ntJhTorfQ_1763078457 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: ZDIJKTT34QKML5EWXODYHRQLFP6OI7FE X-Message-ID-Hash: ZDIJKTT34QKML5EWXODYHRQLFP6OI7FE X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Yumei Huang , passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, 10 Nov 2025 21:46:55 +1100 David Gibson wrote: > On Mon, Nov 10, 2025 at 05:31:35PM +0800, Yumei Huang wrote: > > If a client connects while guest is not connected or ready yet, > > resend SYN instead of just resetting connection after 10 seconds. > >=20 > > Use the same backoff calculation for the timeout as Linux kernel. > >=20 > > Link: https://bugs.passt.top/show_bug.cgi?id=3D153 > > Signed-off-by: Yumei Huang =20 >=20 > Reviewed-by: David Gibson >=20 > Though I have one small concern remaining >=20 > > --- > > tcp.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--------- > > tcp.h | 4 ++++ > > 2 files changed, 57 insertions(+), 9 deletions(-) > >=20 > > diff --git a/tcp.c b/tcp.c > > index 2f49327..da40a99 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -179,9 +179,11 @@ > > * > > * Timeouts are implemented by means of timerfd timers, set based on f= lags: > > * > > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshak= e (flag > > - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, res= et the > > - * connection > > + * - SYN_TIMEOUT_INIT: if no SYN,ACK is received from tap/guest during > > + * handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) withi= n > > + * this time, resend SYN. It's the starting timeout for the first SY= N > > + * retry. Retry for TCP_MAX_RETRIES or (syn_retries + syn_linear_tim= eouts) > > + * times, reset the connection > > * > > * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after= sending > > * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data= from the > > @@ -340,7 +342,7 @@ enum { > > #define WINDOW_DEFAULT=09=09=0914600=09=09/* RFC 6928 */ > > =20 > > #define ACK_INTERVAL=09=09=0910=09=09/* ms */ > > -#define SYN_TIMEOUT=09=09=0910=09=09/* s */ > > +#define SYN_TIMEOUT_INIT=09=091=09=09/* s, RFC 6928 */ > > #define ACK_TIMEOUT=09=09=092 > > #define FIN_TIMEOUT=09=09=0960 > > #define ACT_TIMEOUT=09=09=097200 > > @@ -365,6 +367,13 @@ uint8_t tcp_migrate_rcv_queue=09=09[TCP_MIGRATE_RC= V_QUEUE_MAX]; > > =20 > > #define TCP_MIGRATE_RESTORE_CHUNK_MIN=091024 /* Try smaller when above= this */ > > =20 > > +#define SYN_RETRIES=09=09"/proc/sys/net/ipv4/tcp_syn_retries" > > +#define SYN_LINEAR_TIMEOUTS=09"/proc/sys/net/ipv4/tcp_syn_linear_timeo= uts" > > + > > +#define SYN_RETRIES_DEFAULT=09=096 > > +#define SYN_LINEAR_TIMEOUTS_DEFAULT=094 > > +#define MAX_SYNCNT=09=09=09127 /* derived from kernel's limit */ > > + > > /* "Extended" data (not stored in the flow table) for TCP flow migrati= on */ > > static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX]; > > =20 > > @@ -585,10 +594,13 @@ static void tcp_timer_ctl(const struct ctx *c, st= ruct tcp_tap_conn *conn) > > =09if (conn->flags & ACK_TO_TAP_DUE) { > > =09=09it.it_value.tv_nsec =3D (long)ACK_INTERVAL * 1000 * 1000; > > =09} else if (conn->flags & ACK_FROM_TAP_DUE) { > > -=09=09if (!(conn->events & ESTABLISHED)) > > -=09=09=09it.it_value.tv_sec =3D SYN_TIMEOUT; > > -=09=09else > > +=09=09if (!(conn->events & ESTABLISHED)) { > > +=09=09=09int exp =3D conn->retries - c->tcp.syn_linear_timeouts; =20 >=20 > As discussed in the thread on the last version, the subtraction will > be done unsigned, so the final result depends on behaviour of casting > a "negative" (that is, close to max value) unsigned value into a > signed variable. I'm pretty sure that will do the right thing in > practice, but I don't believe that behaviour is guaranteed by the C > standard. Surprisingly to me, C99 6.2.6.2 "Integer types" admits three interpretations of the sign bit for signed types, quoting: =E2=80=94 the corresponding value with sign bit 0 is negated (sign and magnitude); =E2=80=94 the sign bit has the value =E2=88=92(2N ) (two=E2=80=99s comple= ment); =E2=80=94 the sign bit has the value =E2=88=92(2N =E2=88=92 1) (ones=E2= =80=99 complement). and this, together with the conversion rule from 6.3.1.3 "Signed and unsigned integers": Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. seems to confirm your interpretation, even though I'm really having a hard time convincing myself that we should actually do stuff like: =09=09=09int exp; =09=09=09exp =3D (int)conn->retries - c->tcp.syn_linear_timeouts; and I still have the suspicion we're missing something. Is this syntax what you're suggesting, by the way? > This probably isn't worth a respin, though. Maybe not but: > > +=09=09=09it.it_value.tv_sec =3D SYN_TIMEOUT_INIT << MAX(exp, 0); > > +=09=09} > > +=09=09else { this is another bit I would need to change. The coding style dictates: =09=09if (x) { =09=09=09... =09=09} else { =09=09=09... =09=09} *not*: =09=09if (x) { =09=09=09... =09=09} =09=09else { =09=09=09... =09=09} ...on the other hand, it goes away in 5/6, so I don't care particularly. It still matters a very tiny bit because if we need to revert the commit resulting from 5/6 for any reason we'll end up with broken coding style. But that's not a realistic scenario and it's very minor anyway. > > =09=09=09it.it_value.tv_sec =3D ACK_TIMEOUT; > > +=09=09} > > =09} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > =09=09it.it_value.tv_sec =3D FIN_TIMEOUT; > > =09} else { > > @@ -2425,8 +2437,18 @@ void tcp_timer_handler(const struct ctx *c, unio= n epoll_ref ref) > > =09=09tcp_timer_ctl(c, conn); > > =09} else if (conn->flags & ACK_FROM_TAP_DUE) { > > =09=09if (!(conn->events & ESTABLISHED)) { > > -=09=09=09flow_dbg(conn, "handshake timeout"); > > -=09=09=09tcp_rst(c, conn); > > +=09=09=09int max; > > +=09=09=09max =3D c->tcp.syn_retries + c->tcp.syn_linear_timeouts; > > +=09=09=09max =3D MIN(TCP_MAX_RETRIES, max); > > +=09=09=09if (conn->retries >=3D max) { > > +=09=09=09=09flow_dbg(conn, "handshake timeout"); > > +=09=09=09=09tcp_rst(c, conn); > > +=09=09=09} else { > > +=09=09=09=09flow_trace(conn, "SYN timeout, retry"); > > +=09=09=09=09tcp_send_flag(c, conn, SYN); > > +=09=09=09=09conn->retries++; > > +=09=09=09=09tcp_timer_ctl(c, conn); > > +=09=09=09} > > =09=09} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > > =09=09=09flow_dbg(conn, "FIN timeout"); > > =09=09=09tcp_rst(c, conn); > > @@ -2782,6 +2804,26 @@ static socklen_t tcp_probe_tcp_info(void) > > =09return sl; > > } > > =20 > > +/** > > + * tcp_get_rto_params() - Get host kernel RTO parameters > > + * @c:=09=09Execution context > > + */ > > +void tcp_get_rto_params(struct ctx *c) > > +{ > > +=09intmax_t v; > > + > > +=09v =3D read_file_integer(SYN_RETRIES, SYN_RETRIES_DEFAULT); > > +=09c->tcp.syn_retries =3D MIN(v, MAX_SYNCNT); > > + > > +=09v =3D read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DE= FAULT); > > +=09c->tcp.syn_linear_timeouts =3D MIN(v, MAX_SYNCNT); > > + > > +=09debug("Read sysctl values syn_retries: %"PRIu8 > > +=09 ", syn_linear_timeouts: %"PRIu8, > > +=09 c->tcp.syn_retries, > > +=09 c->tcp.syn_linear_timeouts); > > +} > > + > > /** > > * tcp_init() - Get initial sequence, hash secret, initialise per-sock= et data > > * @c:=09=09Execution context > > @@ -2792,6 +2834,8 @@ int tcp_init(struct ctx *c) > > { > > =09ASSERT(!c->no_tcp); > > =20 > > +=09tcp_get_rto_params(c); > > + > > =09tcp_sock_iov_init(c); > > =20 > > =09memset(init_sock_pool4,=09=090xff,=09sizeof(init_sock_pool4)); > > diff --git a/tcp.h b/tcp.h > > index 0082386..37d7758 100644 > > --- a/tcp.h > > +++ b/tcp.h > > @@ -60,12 +60,16 @@ union tcp_listen_epoll_ref { > > * @fwd_out:=09=09Port forwarding configuration for outbound packets > > * @timer_run:=09=09Timestamp of most recent timer run > > * @pipe_size:=09=09Size of pipes for spliced connections > > + * @syn_retries:=09SYN retries using exponential backoff timeout > > + * @syn_linear_timeouts: SYN retries before using exponential backoff = timeout > > */ > > struct tcp_ctx { > > =09struct fwd_ports fwd_in; > > =09struct fwd_ports fwd_out; > > =09struct timespec timer_run; > > =09size_t pipe_size; > > +=09uint8_t syn_retries; > > +=09uint8_t syn_linear_timeouts; > > }; > > =20 > > #endif /* TCP_H */ > > --=20 > > 2.51.0 --=20 Stefano