* [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state
@ 2025-09-30 20:51 Stefano Brivio
2025-09-30 20:58 ` Stefano Brivio
2025-09-30 21:55 ` Stefano Brivio
0 siblings, 2 replies; 3+ messages in thread
From: Stefano Brivio @ 2025-09-30 20:51 UTC (permalink / raw)
To: passt-dev
If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and
send a FIN segment to the guest / container acknowledging a sequence
number that's behind what we received so far, we won't have any
further trigger to send an updated ACK segment, as we are now
switching the epoll socket monitoring to edge-triggered mode.
To avoid this situation, in tcp_update_seqack_wnd(), we set the next
acknowledgement sequence to the current observed sequence, regardless
of what was acknowledged socket-side.
However, we don't necessarily call tcp_update_seqack_wnd() before
sending the FIN segment, which might lead to this exchange, where
192.168.10.102 is a HTTP server in a Podman container, and
192.168.10.44 is a client fetching approximately 160 KB of data
from it:
82 2.026811 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [FIN, ACK] Seq=121441 Ack=143 Win=65536 Len=0
the server is done sending
83 2.026898 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [ACK] Seq=143 Ack=114394 Win=216192 Len=0
pasta (client) acknowledges a previous sequence
84 2.027324 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [FIN, ACK] Seq=143 Ack=114394 Win=216192 Len=0
pasta (client) sends FIN, ACK as the client has no more data to
send (a single GET request), while still acknowledging a previous
sequence (because that's what the client acknowledged so far)
85 2.027349 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [ACK] Seq=121442 Ack=144 Win=65536 Len=0
the server acknowledges the FIN, ACK
86 2.224125 192.168.10.102 → 192.168.10.44 4150 TCP [TCP Retransmission] 55414 → 44992 [ACK] Seq=114394 Ack=144 Win=65536 Len=4096 [TCP segment of a reassembled PDU]
...and nothing happens for a while, until, approximately 200ms
later, the server concludes that it needs to retransmit some data
as it wasn't acknowledged.
This is totally unnecessary, so avoid that by setting the ACK
sequence to whatever we received from the container / guest, before
sending a FIN segment and switching to EPOLLET.
Further on:
87 2.224202 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [RST] Seq=144 Win=0 Len=0
as we see the retransmission, we conclude that this was the final
ACK and close the connection. But the retransmission should instead
tell us that the server is waiting for an ACK segment we never sent.
To fix this, revert the TAP_FIN_RCVD event if we get a retransmission,
because the peer is rewinding the sequence, which implies that the FIN
will be retransmitted as well, eventually. And, until then, we
shouldn't handle that side of the connection as closed. We need to
process (silently discard) this data, instead.
Link: https://github.com/containers/podman/issues/27179
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
I couldn't reproduce the original issue, and the excerpts from the
capture in the commit message are from the original report. I'm fairly
sure that this is correct, but some testing in the problematic case is
definitely needed.
tcp.c | 4 ++++
tcp_buf.c | 14 +++++++++++++-
tcp_vu.c | 7 ++++++-
3 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/tcp.c b/tcp.c
index 48b1ef2..bfee56b 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2128,6 +2128,10 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
return 1;
}
+ /* Retransmissions revert any FIN segment (last data segment) */
+ if (SEQ_LT(ntohl(th->seq), conn->seq_from_tap))
+ conn_event(c, conn, ~TAP_FIN_RCVD);
+
/* Established connections not accepting data from tap */
if (conn->events & TAP_FIN_RCVD) {
bool retr;
diff --git a/tcp_buf.c b/tcp_buf.c
index a493b5a..cc106bc 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -368,7 +368,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
conn_flag(c, conn, STALLED);
} else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) ==
SOCK_FIN_RCVD) {
- int ret = tcp_buf_send_flag(c, conn, FIN | ACK);
+ int ret;
+
+ /* On TAP_FIN_SENT, we won't get further data events
+ * from the socket, and this might be the last ACK
+ * segment we send to the tap, so update its sequence to
+ * include everything we received until now.
+ *
+ * See also the special handling on CONN_IS_CLOSING() in
+ * tcp_update_seqack_wnd().
+ */
+ conn->seq_ack_to_tap = conn->seq_from_tap;
+
+ ret = tcp_buf_send_flag(c, conn, FIN | ACK);
if (ret) {
tcp_rst(c, conn);
return ret;
diff --git a/tcp_vu.c b/tcp_vu.c
index ebd3a1e..3ec3538 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -410,7 +410,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
conn_flag(c, conn, STALLED);
} else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) ==
SOCK_FIN_RCVD) {
- int ret = tcp_vu_send_flag(c, conn, FIN | ACK);
+ int ret;
+
+ /* See tcp_buf_data_from_sock() */
+ conn->seq_ack_to_tap = conn->seq_from_tap;
+
+ ret = tcp_vu_send_flag(c, conn, FIN | ACK);
if (ret) {
tcp_rst(c, conn);
return ret;
--
2.43.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state
2025-09-30 20:51 [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state Stefano Brivio
@ 2025-09-30 20:58 ` Stefano Brivio
2025-09-30 21:55 ` Stefano Brivio
1 sibling, 0 replies; 3+ messages in thread
From: Stefano Brivio @ 2025-09-30 20:58 UTC (permalink / raw)
To: passt-dev
On Tue, 30 Sep 2025 22:51:32 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:
> However, we don't necessarily call tcp_update_seqack_wnd() before
> sending the FIN segment, which might lead to this exchange, where
> 192.168.10.102 is a HTTP server in a Podman container, and
> 192.168.10.44 is a client fetching approximately 160 KB of data
...sorry, this is actually 121 KB.
--
Stefano
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state
2025-09-30 20:51 [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state Stefano Brivio
2025-09-30 20:58 ` Stefano Brivio
@ 2025-09-30 21:55 ` Stefano Brivio
1 sibling, 0 replies; 3+ messages in thread
From: Stefano Brivio @ 2025-09-30 21:55 UTC (permalink / raw)
To: passt-dev
On Tue, 30 Sep 2025 22:51:32 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:
> I couldn't reproduce the original issue, and the excerpts from the
> capture in the commit message are from the original report. I'm fairly
> sure that this is correct, but some testing in the problematic case is
> definitely needed.
...nope, not fixed, see:
https://github.com/containers/podman/issues/27179#issuecomment-3353911720
--
Stefano
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-09-30 21:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-30 20:51 [PATCH RFT] tcp: Fix ACK sequence on FIN to tap, handle retransmission in half-closed state Stefano Brivio
2025-09-30 20:58 ` Stefano Brivio
2025-09-30 21:55 ` Stefano Brivio
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).