* [PATCH v2 0/3] Fix errors in FIN timeout logic
@ 2026-01-30 4:41 David Gibson
2026-01-30 4:41 ` [PATCH v2 1/3] tcp: Retransmit FINs like data segments David Gibson
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: David Gibson @ 2026-01-30 4:41 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
While investigating bug 179, I found a number of things that confused
me about the TCP timer handling. One of them, I think I figured out
what's going on and what should be done about it. So, here are the
changes. This is mostly about FIN handling and only tangentially
about the timer, but it does at least slightly simplify the timer
handling while I figure out the rest of it.
Changes in v2:
* Stefano pointed out some errors in my guesses at the history of
things, evised commit message of 2/3 accordingly
* Added 3/3 checking for shutdown(2) failures
David Gibson (3):
tcp: Retransmit FINs like data segments
tcp: Eliminate FIN_TIMEOUT
tcp, tcp_splice: Check for failures of shutdown(2)
tcp.c | 49 ++++++++++++++++++++++++-------------------------
tcp_buf.c | 1 +
tcp_splice.c | 3 ++-
tcp_vu.c | 1 +
4 files changed, 28 insertions(+), 26 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/3] tcp: Retransmit FINs like data segments
2026-01-30 4:41 [PATCH v2 0/3] Fix errors in FIN timeout logic David Gibson
@ 2026-01-30 4:41 ` David Gibson
2026-01-30 4:41 ` [PATCH v2 2/3] tcp: Eliminate FIN_TIMEOUT David Gibson
2026-01-30 4:41 ` [PATCH v2 3/3] tcp, tcp_splice: Check for failures of shutdown(2) David Gibson
2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2026-01-30 4:41 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
RFC 9293 doesn't distinguish between regular data segments and FIN segments
for the purposes of retransmissions. Our existing retransmission logic
will also work for FIN segments, except for one detail: we don't currently
set the ACK_FROM_TAP_DUE flag when we send a FIN. Add the flag, so that
we'll properly retransmit FIN segments like data segments.
Remove the section from the theory of operation comment that describes a
different way of handling FIN timeouts which (a) isn't correct behaviour
and (b) doesn't appear to be implemented.
I've tested this by adding logic to suppress sending the FIN if retries <
some non-zero value. We correctly resend the FIN and close normally after
the expected timeouts.
Link: https://bugs.passt.top/show_bug.cgi?id=195
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 4 ----
tcp_buf.c | 1 +
tcp_vu.c | 1 +
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/tcp.c b/tcp.c
index 17e5b006..dbfde2e0 100644
--- a/tcp.c
+++ b/tcp.c
@@ -190,10 +190,6 @@
* - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and
* RTO is less than this, re-initialise RTO to this for data retransmissions
*
- * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE
- * with TAP_FIN_SENT event), and no ACK is received within this time, reset
- * the connection
- *
* - FIN_TIMEOUT: if a FIN segment was acknowledged by tap/guest and a FIN
* segment (write shutdown) was sent via socket (events SOCK_FIN_SENT and
* TAP_FIN_ACKED), but no socket activity is detected from the socket within
diff --git a/tcp_buf.c b/tcp_buf.c
index 5d419d36..d2925410 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -407,6 +407,7 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
}
conn_event(c, conn, TAP_FIN_SENT);
+ conn_flag(c, conn, ACK_FROM_TAP_DUE);
}
return 0;
diff --git a/tcp_vu.c b/tcp_vu.c
index db9db78a..b9e9b55e 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -425,6 +425,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
}
conn_event(c, conn, TAP_FIN_SENT);
+ conn_flag(c, conn, ACK_FROM_TAP_DUE);
}
return 0;
--
2.52.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 2/3] tcp: Eliminate FIN_TIMEOUT
2026-01-30 4:41 [PATCH v2 0/3] Fix errors in FIN timeout logic David Gibson
2026-01-30 4:41 ` [PATCH v2 1/3] tcp: Retransmit FINs like data segments David Gibson
@ 2026-01-30 4:41 ` David Gibson
2026-01-30 4:41 ` [PATCH v2 3/3] tcp, tcp_splice: Check for failures of shutdown(2) David Gibson
2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2026-01-30 4:41 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
Despite the name and its value of 60s, FIN_TIMEOUT is not related to
the kernel's net.ipv4.tcp_fin_timeout sysctl. Indeed, we can't make
an equivalent to that, since it relies on information that endpoint
kernels have, but we do not.
Neither is it simply the time to wait for an ACK to a FIN. It may
have been intended as that at some point, but the implementation has
not matched that for some time. In any case RFC9293 makes no
distinction between ACKs to FIN segments and ACKs to data segments, so
we now implement handling of ACKs to FINs with the same code path as
ACKs to data segments.
The theory of operation describes FIN_TIMEOUT thus:
- FIN_TIMEOUT: if a FIN segment was acknowledged by tap/guest and a FIN
segment (write shutdown) was sent via socket (events SOCK_FIN_SENT and
TAP_FIN_ACKED), but no socket activity is detected from the socket within
this time, reset the connection
In other words, it's attempting to handle the case that we
shutdown(SHUT_WR) on the socket side (causing the kernel to send a
FIN), but the kernel never responds with an EPOLLHUP event indicating
the peer has acked the FIN.
The description doesn't match what the code does: in tcp_timer_ctl()
we only set FIN_TIMEOUT on our timer when when ACK_FROM_TAP_DUE is
unset, but we only act on the FIN_TIMEOUT if ACK_FROM_TAP_DUE *is*
(also) set.
In fact, there's no need to handle this case. Once we've called
shutdown(SHUT_WR), it's the kernel's responsibility to resend FINs as
needed (and reset the connection if that times out). Therefore,
entirely remove the FIN_TIMEOUT related logic.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/tcp.c b/tcp.c
index dbfde2e0..0be871a4 100644
--- a/tcp.c
+++ b/tcp.c
@@ -190,11 +190,6 @@
* - RTO_INIT_AFTER_SYN_RETRIES: if SYN retries happened during handshake and
* RTO is less than this, re-initialise RTO to this for data retransmissions
*
- * - FIN_TIMEOUT: if a FIN segment was acknowledged by tap/guest and a FIN
- * segment (write shutdown) was sent via socket (events SOCK_FIN_SENT and
- * TAP_FIN_ACKED), but no socket activity is detected from the socket within
- * this time, reset the connection
- *
* - ACT_TIMEOUT, in the presence of any event: if no activity is detected on
* either side, the connection is reset
*
@@ -341,7 +336,6 @@ enum {
#define RTO_INIT 1 /* s, RFC 6298 */
#define RTO_INIT_AFTER_SYN_RETRIES 3 /* s, RFC 6298 */
-#define FIN_TIMEOUT 60
#define ACT_TIMEOUT 7200
#define LOW_RTT_TABLE_SIZE 8
@@ -594,8 +588,6 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
timeout = MAX(timeout, RTO_INIT_AFTER_SYN_RETRIES);
timeout <<= MAX(exp, 0);
it.it_value.tv_sec = MIN(timeout, c->tcp.rto_max);
- } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
- it.it_value.tv_sec = FIN_TIMEOUT;
} else {
it.it_value.tv_sec = ACT_TIMEOUT;
}
@@ -715,9 +707,6 @@ void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
flow_hash_remove(c, TAP_SIDX(conn));
tcp_epoll_ctl(conn);
}
-
- if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED))
- tcp_timer_ctl(c, conn);
}
/**
@@ -2590,9 +2579,6 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
conn_flag(c, conn, SYN_RETRIED);
tcp_timer_ctl(c, conn);
}
- } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
- flow_dbg(conn, "FIN timeout");
- tcp_rst(c, conn);
} else if (conn->retries == TCP_MAX_RETRIES) {
flow_dbg(conn, "retransmissions count exceeded");
tcp_rst(c, conn);
--
2.52.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 3/3] tcp, tcp_splice: Check for failures of shutdown(2)
2026-01-30 4:41 [PATCH v2 0/3] Fix errors in FIN timeout logic David Gibson
2026-01-30 4:41 ` [PATCH v2 1/3] tcp: Retransmit FINs like data segments David Gibson
2026-01-30 4:41 ` [PATCH v2 2/3] tcp: Eliminate FIN_TIMEOUT David Gibson
@ 2026-01-30 4:41 ` David Gibson
2 siblings, 0 replies; 4+ messages in thread
From: David Gibson @ 2026-01-30 4:41 UTC (permalink / raw)
To: passt-dev, Stefano Brivio; +Cc: David Gibson
shutdown(2) should never fail, unless we give it bad parameters (e.g.
passing it an fd which isn't a connected socket). However, if it ever did,
we'd currently ignore the error and carry on which could lead to very
confusing behaviour.
In the interests of debugability, check for failure of shutdown(2), log an
error and:
- during runtime, reset the affected connection
- during migration, fail the migration
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
tcp.c | 31 ++++++++++++++++++++++++-------
tcp_splice.c | 3 ++-
2 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/tcp.c b/tcp.c
index 0be871a4..9dd02cd8 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2284,7 +2284,11 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
if (th->fin) {
conn->seq_from_tap++;
- shutdown(conn->sock, SHUT_WR);
+ if (shutdown(conn->sock, SHUT_WR) < 0) {
+ flow_dbg_perror(conn, "shutdown() failed");
+ goto reset;
+ }
+
tcp_send_flag(c, conn, ACK);
conn_event(c, conn, SOCK_FIN_SENT);
@@ -2359,7 +2363,11 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
socklen_t sl;
struct tcp_info tinfo;
- shutdown(conn->sock, SHUT_WR);
+ if (shutdown(conn->sock, SHUT_WR) < 0) {
+ flow_dbg_perror(conn, "shutdown() failed");
+ goto reset;
+ }
+
conn_event(c, conn, SOCK_FIN_SENT);
tcp_send_flag(c, conn, ACK);
ack_due = 0;
@@ -3831,10 +3839,15 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
int v;
v = TCP_SEND_QUEUE;
- if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v)))
+ if (setsockopt(s, SOL_TCP, TCP_REPAIR_QUEUE, &v, sizeof(v))) {
flow_perror(conn, "Selecting repair queue");
- else
- shutdown(s, SHUT_WR);
+ } else {
+ if (shutdown(s, SHUT_WR) < 0) {
+ flow_perror(conn,
+ "Repair mode shutdown() failed");
+ goto fail;
+ }
+ }
}
if (tcp_flow_repair_wnd(conn, &t))
@@ -3861,8 +3874,12 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd
* Call shutdown(x, SHUT_WR) *not* in repair mode, which moves us to
* TCP_FIN_WAIT1.
*/
- if (t.tcpi_state == TCP_FIN_WAIT1)
- shutdown(s, SHUT_WR);
+ if (t.tcpi_state == TCP_FIN_WAIT1) {
+ if (shutdown(s, SHUT_WR) < 0) {
+ flow_perror(conn, "Post-repair shutdown() failed");
+ goto fail;
+ }
+ }
if (tcp_set_peek_offset(conn, peek_offset))
goto fail;
diff --git a/tcp_splice.c b/tcp_splice.c
index 8806523a..d60981ca 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -627,7 +627,8 @@ retry:
flow_foreach_sidei(sidei) {
if ((conn->events & FIN_RCVD(sidei)) &&
!(conn->events & FIN_SENT(!sidei))) {
- shutdown(conn->s[!sidei], SHUT_WR);
+ if (shutdown(conn->s[!sidei], SHUT_WR) < 0)
+ goto reset;
conn_event(conn, FIN_SENT(!sidei));
}
}
--
2.52.0
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-01-30 4:41 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-30 4:41 [PATCH v2 0/3] Fix errors in FIN timeout logic David Gibson
2026-01-30 4:41 ` [PATCH v2 1/3] tcp: Retransmit FINs like data segments David Gibson
2026-01-30 4:41 ` [PATCH v2 2/3] tcp: Eliminate FIN_TIMEOUT David Gibson
2026-01-30 4:41 ` [PATCH v2 3/3] tcp, tcp_splice: Check for failures of shutdown(2) David Gibson
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).