Date: Fri, 17 Oct 2025 00:22:46 +0200
From: Stefano Brivio
To: Yumei Huang
CC: passt-dev@passt.top, david@gibson.dropbear.id.au
Subject: Re: [PATCH v4 3/4] tcp: Resend SYN for inbound connections
Message-ID: <20251017002246.20a6dc40@elisabeth>
In-Reply-To: <20251016023423.8923-4-yuhuang@redhat.com>
References: <20251016023423.8923-1-yuhuang@redhat.com> <20251016023423.8923-4-yuhuang@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Thu, 16 Oct 2025 10:34:22 +0800
Yumei Huang wrote:

> If a client connects while guest is not connected or ready yet,
> resend SYN instead of just resetting connection after 10 seconds.
>
> Use the same backoff calculation for the timeout as linux kernel.
>
> Signed-off-by: Yumei Huang
> ---
>  tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  tcp.h |  2 ++
>  2 files changed, 49 insertions(+), 8 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 2ec4b0c..3003333 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -179,9 +179,11 @@
>   *
>   * Timeouts are implemented by means of timerfd timers, set based on flags:
>   *
> - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
> - *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
> - *   connection
> + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
> + *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
> + *   SYN. It's the starting timeout for the first SYN retry. If this persists
> + *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
> + *   tcp_syn_linear_timeouts) times in a row, reset the connection
>   *
>   * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
>   *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
> @@ -340,7 +342,7 @@ enum {
>  #define WINDOW_DEFAULT			14600		/* RFC 6928 */
>
>  #define ACK_INTERVAL			10		/* ms */
> -#define SYN_TIMEOUT			10		/* s */
> +#define SYN_TIMEOUT_INIT		1		/* s */
>  #define ACK_TIMEOUT			2
>  #define FIN_TIMEOUT			60
>  #define ACT_TIMEOUT			7200
> @@ -365,6 +367,10 @@ uint8_t tcp_migrate_rcv_queue		[TCP_MIGRATE_RCV_QUEUE_MAX];
>
>  #define TCP_MIGRATE_RESTORE_CHUNK_MIN	1024 /* Try smaller when above this */
>
> +#define TCP_SYN_RETRIES_SYSCTL "/proc/sys/net/ipv4/tcp_syn_retries"
> +#define TCP_SYN_LINEAR_TIMEOUTS_SYSCTL \
> +	"/proc/sys/net/ipv4/tcp_syn_linear_timeouts"

It's quite obvious those are names of sysctl entries, I think we can
drop the _SYSCTL suffix and keep this shorter without losing any
information/indication.
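
Separately, to make the timeout schedule from the comment above concrete,
here is a minimal standalone sketch, not part of the patch. It assumes the
timeout stays at SYN_TIMEOUT_INIT for the first syn_linear_timeouts retries
and doubles after that, which is how I read the tcp_timer_ctl() hunk.
syn_timeout() is a hypothetical helper name, and the assert() is only there
to keep the shift well-defined in this sketch:

```c
#include <assert.h>

#define SYN_TIMEOUT_INIT	1	/* s, as in the patch */

/* Hypothetical helper, not in the patch: timeout (in seconds) to arm once
 * @retries SYN segments have already been resent, with @linear_timeouts
 * standing in for c->tcp.syn_linear_timeouts.
 */
static unsigned int syn_timeout(unsigned int retries,
				unsigned int linear_timeouts)
{
	/* Linear phase: keep using the initial timeout */
	if (retries < linear_timeouts)
		return SYN_TIMEOUT_INIT;

	/* Keep the shift well-defined for a 32-bit unsigned int */
	assert(retries - linear_timeouts < 32);

	/* Exponential phase: double for each retry past the linear ones */
	return (unsigned int)SYN_TIMEOUT_INIT << (retries - linear_timeouts);
}
```

With linear_timeouts set to 1 (the fallback value the patch passes to
read_file_integer()), this gives 1, 1, 2, 4 and 8 seconds for retries 0 to 4.
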
> +
>  /* "Extended" data (not stored in the flow table) for TCP flow migration */
>  static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
>
> @@ -581,8 +587,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
>  	if (conn->flags & ACK_TO_TAP_DUE) {
>  		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
> -		if (!(conn->events & ESTABLISHED))
> -			it.it_value.tv_sec = SYN_TIMEOUT;
> +		if (!(conn->events & ESTABLISHED)) {
> +			if (conn->retries < c->tcp.syn_linear_timeouts)
> +				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
> +			else
> +				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
> +					(conn->retries - c->tcp.syn_linear_timeouts);
> +		}
>  		else
>  			it.it_value.tv_sec = ACK_TIMEOUT;
>  	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
> @@ -2409,8 +2420,16 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
>  		tcp_timer_ctl(c, conn);
>  	} else if (conn->flags & ACK_FROM_TAP_DUE) {
>  		if (!(conn->events & ESTABLISHED)) {
> -			flow_dbg(conn, "handshake timeout");
> -			tcp_rst(c, conn);
> +			if (conn->retries >= MIN(TCP_MAX_RETRIES,
> +				(c->tcp.tcp_syn_retries + c->tcp.syn_linear_timeouts))) {

That doesn't seem to match the sysctl documentation for tcp_syn_retries,
which should be the *total* number of retries, not excluding the ones
with "linear timeouts".

This is pretty hard to read, by the way. It could be:

	if (conn->retries >= TCP_MAX_RETRIES ||
	    conn->retries >= ...) {

> +				flow_dbg(conn, "handshake timeout");
> +				tcp_rst(c, conn);
> +			} else {
> +				flow_trace(conn, "SYN timeout, retry");
> +				tcp_send_flag(c, conn, SYN);
> +				conn->retries++;
> +				tcp_timer_ctl(c, conn);
> +			}
>  		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
>  			flow_dbg(conn, "FIN timeout");
>  			tcp_rst(c, conn);
> @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
>  	return sl;
>  }
>
> +/**
> + * tcp_syn_params_init() - Get initial syn params for inbound connection

SYN, parameters

> + * @c:		Execution context
> +*/
> +void tcp_syn_params_init(struct ctx *c)
> +{
> +	long tcp_syn_retries, syn_linear_timeouts;
> +
> +	tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES_SYSCTL, 8);
> +	syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS_SYSCTL, 1);
> +
> +	c->tcp.tcp_syn_retries = (uint8_t)MIN(tcp_syn_retries, UINT8_MAX);
> +	c->tcp.syn_linear_timeouts = (uint8_t)MIN(syn_linear_timeouts, UINT8_MAX);

I don't think you need those casts, MIN(..., UINT8_MAX) already
guarantees that the number is <= UINT8_MAX, and in any case the cast
won't fix anything here.

> +
> +	debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
> +	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
> +}
> +
>  /**
>   * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
>   * @c:		Execution context
> @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
>  {
>  	ASSERT(!c->no_tcp);
>
> +	tcp_syn_params_init(c);
> +
>  	tcp_sock_iov_init(c);
>
>  	memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
> diff --git a/tcp.h b/tcp.h
> index 234a803..df699a4 100644
> --- a/tcp.h
> +++ b/tcp.h
> @@ -65,6 +65,8 @@ struct tcp_ctx {
>  	struct fwd_ports fwd_out;
>  	struct timespec timer_run;
>  	size_t pipe_size;
> +	uint8_t tcp_syn_retries;
> +	uint8_t syn_linear_timeouts;

These should be added to the documentation for struct tcp_ctx, above.

>  };
>
>  #endif /* TCP_H */

-- 
Stefano
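
For reference, the two remarks above about the retry limit and the casts can
be sketched together as below. This is only an illustration under stated
assumptions, not a proposed replacement. The second bound of the limit check
is taken to be tcp_syn_retries alone, which is one possible reading and is
left open in the review. The value of TCP_MAX_RETRIES is made up, MIN() is a
local stand-in for the existing macro, and both helper names are
hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the MIN() macro the patch already uses */
#define MIN(x, y)	(((x) < (y)) ? (x) : (y))

#define TCP_MAX_RETRIES	7	/* assumed value, for illustration only */

/* Hypothetical helper: store a value read from a sysctl file in a uint8_t.
 * No cast is needed: the result of MIN() is at most UINT8_MAX, and the
 * assignment converts it implicitly.
 */
static uint8_t syn_param_clamp(long v)
{
	uint8_t ret = MIN(v, UINT8_MAX);

	return ret;
}

/* Hypothetical helper: true once the connection should be reset instead of
 * resending SYN, assuming @syn_retries is the total number of retries,
 * linear ones included.
 */
static bool syn_retries_done(uint8_t retries, uint8_t syn_retries)
{
	return retries >= TCP_MAX_RETRIES || retries >= syn_retries;
}
```

Splitting the check into two comparisons joined by || gives the same result
as comparing against MIN() of the two bounds, and reads more directly.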