From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VJkiEDrf; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id EE26A5A0271 for ; Thu, 29 Jan 2026 18:33:49 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1769708028; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ThN5WIVpvcR+DvEmcGl0eJNi3b450HrXmXS+g1DkdOc=; b=VJkiEDrfvAubklOQbVaosZlGlUdVUjhznxLme4J7pQqSO7erroUfQAYfDMiyH/qo0MqSy4 CnTzm6CRIFjWm94zi8tKJjtC0z+uE5C9q4EdFxeMmVq+ASolitF77n2Hb34UawJAbFpjOL U4aTbuML3zwBbZOYFVTd9jPDGIUcj3Q= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-408-0t6UYTeuPjG93xGT5bJryQ-1; Thu, 29 Jan 2026 12:33:47 -0500 X-MC-Unique: 0t6UYTeuPjG93xGT5bJryQ-1 X-Mimecast-MFC-AGG-ID: 0t6UYTeuPjG93xGT5bJryQ_1769708026 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-42fd46385c0so850924f8f.0 for ; Thu, 29 Jan 2026 09:33:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769708026; x=1770312826; h=date:content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ThN5WIVpvcR+DvEmcGl0eJNi3b450HrXmXS+g1DkdOc=; b=Oh0CnDFDZe5u4jkuf6a+bAF+Epo5rZOuRPMTv/c6wGl8muSuyOthEt1tYN1vgVys7Z Cgzj4/43CAtwqAAMjptCFfzsNjj3JfwCLZuY77SrgDWB8EjG+t/Qk5PVl473JS6Jv/3I r55dwvVOS9Q7T5TKJtfutCb9D0YiqfCBsaO0YumpC3+9M0ZfcPuMCk7jZ/wXwLLXg6sS 5xcpk6GeKhaeG5SVIr1S2nVpKeXAu1JDb7GvtyccTvn+r9MbgRNgPFYascG1J8qAUmNJ 9iw/8Ih8oFWmUSlhRzmXmX3pW3pWB9WHvc+k2L9Sw3k80lLyrKoinU4YptwNfMnaDV1L Z8og== X-Gm-Message-State: AOJu0Yyms6iGjki4Z7mNDoLftv1xaJgMf/qI+40rQgNvefsTABVO/d81 ehT947vrbkaZgWi7ZcEOTIXBT3vhMMgepUiGc1g+MluECdQd1/dLU1zY8xkMAblBCbyhjHdkGnK NoC0hlkv3yiqlOBtO/RkvBzZF/fZEPfvUlZElDM39Gd40IXLwh2suOg== X-Gm-Gg: AZuq6aJ6TuapFbLJs7NXlSVzjJPWruKtp/1ElhOjr8sC9FGTy5QW/4HJcvKMPOticll 1VB3hwj9PIHpbspJWE79BxgwMdM1MOsFw6715k8TsaidFHQv76D0lf1LMEF0JWpVznya6I1PgnO QzyxPqjhcho675Ov6BIDqjznZQCZqd/ZzVsVfQ6L1PfAtIhkt+Awnfhp0eOh/aNkPrw73qLpaPd /mXROtD0SnPt6dusf2Da478Hvcf+4k+8y0h6FYZpXk31syDv+7DKkxdPhcupLstnS2YYKx9J6to N22vtaRIuRoGjPFN4f6etO36d11MYMVxZtManH3bMldPrBwdJ7lMyR5mDPC6PgET18cIZwRWUSd BVtqN8t+S2ZE6Xl+GLuBccfbn/ZrZk56Ubn9aTg== X-Received: by 2002:a05:6000:4205:b0:435:a501:359 with SMTP id ffacd0b85a97d-435f3aafd01mr529759f8f.41.1769708026246; Thu, 29 Jan 2026 09:33:46 -0800 (PST) X-Received: by 2002:a05:6000:4205:b0:435:a501:359 with SMTP id ffacd0b85a97d-435f3aafd01mr529712f8f.41.1769708025726; Thu, 29 Jan 2026 09:33:45 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [176.103.220.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-435e132368csm14654127f8f.31.2026.01.29.09.33.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Jan 2026 09:33:45 -0800 (PST) From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 2/2] tcp: Eliminate FIN_TIMEOUT Message-ID: <20260129183330.5632ceff@elisabeth> In-Reply-To: <20260129055355.1229700-3-david@gibson.dropbear.id.au> References: <20260129055355.1229700-1-david@gibson.dropbear.id.au> <20260129055355.1229700-3-david@gibson.dropbear.id.au> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 Date: Thu, 29 Jan 2026 18:33:44 +0100 (CET) X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: YMSKTPk-v8LbgV_hGKxFfcb2rtkfkYNYR22mbe_Yw_Q_1769708026 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: Y2FLEQMSTMISDLAI6IVJ34N5I2FQQTPZ X-Message-ID-Hash: Y2FLEQMSTMISDLAI6IVJ34N5I2FQQTPZ X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 29 Jan 2026 16:53:55 +1100 David Gibson wrote: > From the name, and it's value of 60s, FIN_TIMEOUT appears to be defined > as an equivalent to the kernel's net.ipv4.tcp_fin_timeout sysctl, which > controls how long the kernel will keep an orphaned (that is, closed by > userspace) socket which is in FIN_WAIT_2 state. Not really, that wasn't the intention. As you noted, we couldn't even implement that. > The theory of operation describes FIN_TIMEOUT thus: > > - FIN_TIMEOUT: if a FIN segment was acknowledged by tap/guest and a FIN > segment (write shutdown) was sent via socket (events SOCK_FIN_SENT and > TAP_FIN_ACKED), but no socket activity is detected from the socket within > this time, reset the connection This was the intention, and it used to match the implementation. The equivalent state would be FIN_WAIT_1. I accidentally dropped this case in commit be5bbb9b0681 ("tcp: Rework timers to use timerfd instead of periodic bitmap scan"): if (conn->events & SOCK_FIN_SENT && sock_act > FIN_TIMEOUT) goto rst; that is, if we detected the last socket activity (including EPOLLHUP) more than FIN_TIMEOUT seconds ago, and we (presumably) sent a FIN segment via the socket (calling shutdown()), we would reset the connection. > This description does not match what net.ipv4.tcp_fin_timeout does. The > test for SOCK_FIN_SENT does not check if we are in FIN_WAIT_2 or for an > orphaned socket. In fact SOCK_FIN_SENT means we *cannot* be in FIN_WAIT_2 > state (w.r.t. the tap side): we set it when we shutdown(SHUT_WR) the socket > connection. We do that only when we receive a FIN from the tap side, and > the TCP state diagram does not allow us to be in FIN_WAIT_2 state if we > already have a FIN from our peer. > > The description also doesn't match what the code does: in tcp_timer_ctl() > we only set FIN_TIMEOUT on our timer when when ACK_FROM_TAP_DUE is unset, > but we only act on the FIN_TIMEOUT if ACK_FROM_TAP_DUE *is* set. > > In fact it's impossible for us to implement something like > net.ipv4.tcp_fin_timeout, because recognizing an orphaned socket requires > out of band information (the fact the socket has been closed) that an > endpoint kernel has, but is not visible to us (via either tap or socket > interface). Therefore, entirely remove the FIN_TIMEOUT related logic. > > Signed-off-by: David Gibson > --- > tcp.c | 14 -------------- > 1 file changed, 14 deletions(-) > > diff --git a/tcp.c b/tcp.c > index dbfde2e0..0be871a4 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -190,11 +190,6 @@ > * - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and > * RTO is less than this, re-initialise RTO to this for data retransmissions > * > - * - FIN_TIMEOUT: if a FIN segment was acknowledged by tap/guest and a FIN > - * segment (write shutdown) was sent via socket (events SOCK_FIN_SENT and > - * TAP_FIN_ACKED), but no socket activity is detected from the socket within > - * this time, reset the connection I think the patch is correct, strictly speaking, but we should implement what's described here again. We won't otherwise have any mechanism to reset connections() if we call shutdown(), and no further events happen are reported by the socket. Should I apply the series anyway, or wait? > - * > * - ACT_TIMEOUT, in the presence of any event: if no activity is detected on > * either side, the connection is reset > * > @@ -341,7 +336,6 @@ enum { > > #define RTO_INIT 1 /* s, RFC 6298 */ > #define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */ > -#define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > > #define LOW_RTT_TABLE_SIZE 8 > @@ -594,8 +588,6 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > timeout = MAX(timeout, RTO_INIT_AFTER_SYN_RETRIES); > timeout <<= MAX(exp, 0); > it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max); > - } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > - it.it_value.tv_sec = FIN_TIMEOUT; > } else { > it.it_value.tv_sec = ACT_TIMEOUT; > } > @@ -715,9 +707,6 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, > flow_hash_remove(c, TAP_SIDX(conn)); > tcp_epoll_ctl(conn); > } > - > - if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) > - tcp_timer_ctl(c, conn); > } > > /** > @@ -2590,9 +2579,6 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) > conn_flag(c, conn, SYN_RETRIED); > tcp_timer_ctl(c, conn); > } > - } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > - flow_dbg(conn, "FIN timeout"); > - tcp_rst(c, conn); > } else if (conn->retries == TCP_MAX_RETRIES) { > flow_dbg(conn, "retransmissions count exceeded"); > tcp_rst(c, conn); -- Stefano