Date: Fri, 24 Oct 2025 10:37:17 +0200
From: Stefano Brivio
To: David Gibson
CC: Yumei Huang, passt-dev@passt.top
Subject: Re: [PATCH v6 3/4] tcp: Resend SYN for inbound connections
Message-ID: <20251024103717.715fe49e@elisabeth>
References: <20251017062838.21041-1-yuhuang@redhat.com> <20251017062838.21041-4-yuhuang@redhat.com> <20251024010431.4329a843@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Fri, 24 Oct 2025 14:30:09 +1100
David Gibson wrote:

> On Fri, Oct 24, 2025 at 01:04:31AM +0200, Stefano Brivio wrote:
> > On Fri, 17 Oct 2025 14:28:37 +0800
> > Yumei Huang wrote:
> >
> > > If a client connects while guest is not connected or ready yet,
> > > resend SYN instead of just resetting connection after 10 seconds.
> > >
> > > Use the same backoff calculation for the timeout as linux kernel.
> >
> > Linux.
> >
> > >
> > > Link: https://bugs.passt.top/show_bug.cgi?id=153
> > > Signed-off-by: Yumei Huang
> > > ---
> > >  tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
> > >  tcp.h |  5 +++++
> > >  2 files changed, 52 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 2ec4b0c..9385132 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -179,9 +179,11 @@
> > >   *
> > >   * Timeouts are implemented by means of timerfd timers, set based on flags:
> > >   *
> > > - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> > > - *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> > > - *   connection
> > > + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> > > + *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> > > + *   SYN. It's the starting timeout for the first SYN retry. If this persists
> >
> > "If this persists" makes sense for the existing ACK_TIMEOUT
> > description but not here, because it looks like it refers to "starting
> > timeout".
> >
> > Coupled with the next patch, it becomes increasingly difficult to
> > understand what "this" persisting thing is.
>
> Yeah.  This was my suggested wording, based on the existing wording
> for ACK_TIMEOUT.  It's not great, but I struggled a bit to find better
> wording.
>
> > Maybe directly say "Retry for ..., then reset the connection"? It's
> > shorter and clearer.
>
> There it is :).  "Retry (NNN) times, then reset the connection".
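
To make "retry NNN times, then reset" concrete, here is a rough standalone
model of a linear-then-exponential retry schedule. It is only a sketch, not
code from the patch: every sketch_ name, the 1 s initial timeout and the
120 s cap are assumptions made up for illustration, and the exact boundary
between the linear and exponential phases might differ from what Linux does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, for illustration only: not taken from the patch */
#define SKETCH_SYN_TIMEOUT_INIT	1	/* s, timeout before the first retry */
#define SKETCH_TIMEOUT_MAX	120	/* s, cap for any single timeout */

/**
 * sketch_syn_timeout() - Timeout to arm before the next SYN retry
 * @retries:	SYN retries sent so far
 * @linear:	Retries that keep using the initial timeout unchanged
 *
 * Return: timeout in seconds, doubled once per retry past @linear, capped
 */
static unsigned sketch_syn_timeout(uint8_t retries, uint8_t linear)
{
	unsigned timeout = SKETCH_SYN_TIMEOUT_INIT;
	unsigned i;

	/* Stop doubling once the cap is reached, so this can't overflow */
	for (i = linear; i < retries && timeout < SKETCH_TIMEOUT_MAX; i++)
		timeout *= 2;

	return timeout < SKETCH_TIMEOUT_MAX ? timeout : SKETCH_TIMEOUT_MAX;
}

/**
 * sketch_syn_give_up() - Reset the connection instead of retrying again?
 * @retries:		SYN retries sent so far
 * @syn_retries:	Retry budget, as in tcp_syn_retries
 * @linear:		Extra budget, as in tcp_syn_linear_timeouts
 *
 * Return: 1 if the retry budget is exhausted, 0 otherwise
 */
static int sketch_syn_give_up(uint8_t retries, uint8_t syn_retries,
			      uint8_t linear)
{
	return retries >= (unsigned)syn_retries + linear;
}

/* Usage example: the schedule for the fallback values 8 and 1 */
static void sketch_selfcheck(void)
{
	assert(sketch_syn_timeout(0, 1) == 1);	/* linear phase */
	assert(sketch_syn_timeout(3, 1) == 4);	/* doubled twice */
	assert(!sketch_syn_give_up(8, 8, 1));	/* one retry left */
	assert(sketch_syn_give_up(9, 8, 1));	/* budget used up */
}
```

With a schedule like this, the comment could simply state the total number
of retries (tcp_syn_retries + syn_linear_timeouts in the condition quoted
further down) without referring to anything "persisting".
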
> > [snip]
> > > @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
> > >  		tcp_timer_ctl(c, conn);
> > >  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> > >  		if (!(conn->events & ESTABLISHED)) {
> > > -			flow_dbg(conn, "handshake timeout");
> > > -			tcp_rst(c, conn);
> > > +			if (conn->retries >= TCP_MAX_RETRIES ||
> > > +			    conn->retries >= (c->tcp.tcp_syn_retries +
> > > +					      c->tcp.syn_linear_timeouts)) {
> > > +				flow_dbg(conn, "handshake timeout");
> > > +				tcp_rst(c, conn);
> > > +			} else {
> > > +				flow_trace(conn, "SYN timeout, retry");
> > > +				tcp_send_flag(c, conn, SYN);
> > > +				conn->retries++;
> >
> > I think I already raised this point on a previous revision: this needs
> > to be zeroed as the connection is established, but I don't see that in
> > the current version.
>
> Yes, you raised that, but then I realised it's already handled.  I
> think I put that in the thread, not just direct to Yumei, but maybe
> not?  Or it just got lost in the minutiae.

Yes, here:

  https://archives.passt.top/passt-dev/aOxFRfJjPWy0ZW0M@zatzit

this is another example of what I meant about (potential) advantages of
a fully threaded (email) workflow.

In this case, I didn't review v2, which came before you could post this
to my comment on v1, but in a normal case, we could have settled this
earlier, once and for all.

> When we receive a SYN-ACK, it will have th->ack_seq advanced a byte
> acknowledging the SYN.  tcp_tap_handler() calls
> tcp_update_seqack_from_tap() in the !ESTABLISHED case which will see
> the new ack_seq and clear retries (retrans before this series).

It doesn't look obvious at all to me. We're unlikely to break it in the
future, so I don't think it's fragile in the long term, but... can one
of you double check that it's actually the case with a manual one-off
test?
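
To spell out the mechanism as I understand it from the description above,
here is a minimal standalone model. It is not the actual passt code: struct
sketch_conn and the sketch_ helpers are hypothetical, and it assumes that
the last acknowledged sequence starts out at our initial sequence number,
so that a SYN-ACK acknowledging the SYN counts as forward progress. It
doesn't replace a check against the real code path.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence comparison that tolerates 32-bit wraparound */
#define SKETCH_SEQ_GT(a, b)	((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/**
 * struct sketch_conn - Hypothetical, minimal stand-in for a connection
 * @seq_ack_from_tap:	Last sequence acknowledged by the guest
 * @retries:		Retransmissions, SYN retries included, without progress
 */
struct sketch_conn {
	uint32_t seq_ack_from_tap;
	uint8_t retries;
};

/**
 * sketch_update_seqack() - Record an ACK from the guest
 * @conn:	Connection to update
 * @ack_seq:	Acknowledgement number from the received segment
 *
 * The retry counter is cleared only if the ACK moves forward.
 */
static void sketch_update_seqack(struct sketch_conn *conn, uint32_t ack_seq)
{
	if (!SKETCH_SEQ_GT(ack_seq, conn->seq_ack_from_tap))
		return;		/* Duplicate or old ACK: nothing changes */

	conn->seq_ack_from_tap = ack_seq;
	conn->retries = 0;
}

/* Usage example: three SYN retries, then the guest finally answers */
static void sketch_selfcheck(void)
{
	struct sketch_conn conn = { .seq_ack_from_tap = 1000, .retries = 3 };

	sketch_update_seqack(&conn, 1000);	/* No progress yet */
	assert(conn.retries == 3);

	sketch_update_seqack(&conn, 1001);	/* SYN-ACK covers our SYN */
	assert(conn.retries == 0);
	assert(conn.seq_ack_from_tap == 1001);
}
```

If the real function behaves like this, the counter would be cleared as
soon as the SYN-ACK is processed, before ESTABLISHED is set.
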
> > > +				tcp_timer_ctl(c, conn);
> > > +			}
> > >  		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> > >  			flow_dbg(conn, "FIN timeout");
> > >  			tcp_rst(c, conn);
> > > @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
> > >  	return sl;
> > >  }
> > >
> > > +/**
> > > + * tcp_syn_params_init() - Get initial SYN parameters for inbound connection
> >
> > They're not initial, they'll be used for all the connections if I
> > understand correctly.
> >
> > Maybe "Get SYN retries sysctl values"? I think the _init() in the
> > function name is also somewhat misleading.
>
> "Get host kernel RTO parameters"?  Since we're thinking of extending
> this to cover the RTO upper bound as well as the SYN specific
> parameters.

Ah, maybe yes, better.

> > > + * @c:		Execution context
> > > +*/
> > > +void tcp_syn_params_init(struct ctx *c)
> > > +{
> > > +	intmax_t tcp_syn_retries, syn_linear_timeouts;
> > > +
> > > +	tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
> >
> > Why 8? Perhaps a #define would help?
> >
> > > +	syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
> > > +
> > > +	c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
> > > +	c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
> > > +
> > > +	debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> >
> > Similar to the comment above: these are not parameters of SYN segments
> > (which would seem to imply TCP options, such as the MSS).
> >
> > We typically don't print C assignments, rather human-readable messages,
> > so that could be "Read sysctl values tcp_syn_retries: ...,
> > syn_linear_timeouts: ...".
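
Putting the comments on this function together (named fallback values, a
name that doesn't say "init" or "SYN parameters", a human-readable
message), something along these lines is what I have in mind. It's a
standalone sketch under assumptions, not a drop-in replacement: all the
sketch_ names are hypothetical, sketch_read_integer() is a simplified
stand-in for read_file_integer(), printf() stands in for debug(), and
clamping negative values to zero is an addition of this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Fallbacks used if the sysctl files can't be read (8 and 1 in the patch) */
#define SKETCH_SYN_RETRIES_DEFAULT		8
#define SKETCH_SYN_LINEAR_TIMEOUTS_DEFAULT	1

#define SKETCH_SYN_RETRIES_PATH						\
	"/proc/sys/net/ipv4/tcp_syn_retries"
#define SKETCH_SYN_LINEAR_TIMEOUTS_PATH					\
	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"

/* Read one integer from @path, return @fallback if that's not possible */
static intmax_t sketch_read_integer(const char *path, intmax_t fallback)
{
	FILE *f = fopen(path, "r");
	intmax_t v;

	if (!f)
		return fallback;

	if (fscanf(f, "%jd", &v) != 1)
		v = fallback;

	fclose(f);
	return v;
}

/* Clamp @v into the range of uint8_t */
static uint8_t sketch_clamp_u8(intmax_t v)
{
	if (v < 0)
		return 0;

	return v > UINT8_MAX ? UINT8_MAX : (uint8_t)v;
}

/* Get host kernel values affecting SYN retransmissions, and report them */
static void sketch_rto_params_get(uint8_t *syn_retries, uint8_t *linear)
{
	*syn_retries = sketch_clamp_u8(
		sketch_read_integer(SKETCH_SYN_RETRIES_PATH,
				    SKETCH_SYN_RETRIES_DEFAULT));
	*linear = sketch_clamp_u8(
		sketch_read_integer(SKETCH_SYN_LINEAR_TIMEOUTS_PATH,
				    SKETCH_SYN_LINEAR_TIMEOUTS_DEFAULT));

	printf("Read sysctl values tcp_syn_retries: %"PRIu8
	       ", syn_linear_timeouts: %"PRIu8"\n", *syn_retries, *linear);
}

/* Usage example: fallback and clamping behaviour */
static void sketch_selfcheck(void)
{
	assert(sketch_read_integer("/nonexistent/sketch", 8) == 8);
	assert(sketch_clamp_u8(300) == UINT8_MAX);
	assert(sketch_clamp_u8(6) == 6);
}
```

The named constants answer the "why 8?" question in one place, and the
message reads like a sentence instead of a list of assignments.
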
> >
> >
> > > +	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> > > +}
> > > +
> > >  /**
> > >   * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
> > >   * @c:		Execution context
> > > @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
> > >  {
> > >  	ASSERT(!c->no_tcp);
> > >
> > > +	tcp_syn_params_init(c);
> > > +
> > >  	tcp_sock_iov_init(c);
> > >
> > >  	memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> > > diff --git a/tcp.h b/tcp.h
> > > index 234a803..4369b52 100644
> > > --- a/tcp.h
> > > +++ b/tcp.h
> > > @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
> > >   * @fwd_out:		Port forwarding configuration for outbound packets
> > >   * @timer_run:		Timestamp of most recent timer run
> > >   * @pipe_size:		Size of pipes for spliced connections
> > > + * @tcp_syn_retries:	Number of SYN retries during handshake
> > > + * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
> > > + *			 before switching to exponential backoff timeout
> >
> > Maybe more compact:
> >
> >  * @syn_linear_timeouts: SYN retries before using exponential timeout
> >
> > >   */
> > >  struct tcp_ctx {
> > >  	struct fwd_ports fwd_in;
> > >  	struct fwd_ports fwd_out;
> > >  	struct timespec timer_run;
> > >  	size_t pipe_size;
> > > +	uint8_t tcp_syn_retries;
> > > +	uint8_t syn_linear_timeouts;
> > >  };
> > >
> > >  #endif /* TCP_H */

-- 
Stefano