On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote:
> If a guest or container sends us a FIN segment but its sequence number
> doesn't match the highest sequence of data we *accepted* (not
> necessarily the highest sequence we received), that is,
> conn->seq_from_tap, plus any data we're accepting in the current
> batch, we should discard the flag (not necessarily the segment),
> because there's still data we need to receive (again) before the end
> of the stream.
>
> If we consider those FIN flags as such, we'll end up in the
> situation described below.
>
> Here, 192.168.10.102 is a HTTP server in a Podman container, and
> 192.168.10.44 is a client fetching approximately 121 KB of data from
> it:
>
>   82 2.026811 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [FIN, ACK] Seq=121441 Ack=143 Win=65536 Len=0
>
> the server is done sending
>
>   83 2.026898 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [ACK] Seq=143 Ack=114394 Win=216192 Len=0
>
> pasta (client) acknowledges a previous sequence, because of
> a short sendmsg()
>
>   84 2.027324 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [FIN, ACK] Seq=143 Ack=114394 Win=216192 Len=0
>
> pasta (client) sends FIN, ACK as the client has no more data to
> send (a single GET request), while still acknowledging a previous
> sequence, because the retransmission didn't happen yet
>
>   85 2.027349 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [ACK] Seq=121442 Ack=144 Win=65536 Len=0
>
> the server acknowledges the FIN, ACK
>
>   86 2.224125 192.168.10.102 → 192.168.10.44 4150 TCP [TCP Retransmission] 55414 → 44992 [ACK] Seq=114394 Ack=144 Win=65536 Len=4096 [TCP segment of a reassembled PDU]
>
> and finally a retransmission comes, but as we wrongly switched to
> the CLOSE-WAIT state,
>
>   87 2.224202 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [RST] Seq=144 Win=0 Len=0
>
> we consider frame #86 as an acknowledgement for the FIN segment we
> sent, and close the connection, while we still had to re-receive
> (and finally send) the missing data segment, instead.
>
> Link: https://github.com/containers/podman/issues/27179
> Signed-off-by: Stefano Brivio
> ---
>  tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcp.c b/tcp.c
> index 3f7dc82..5a7a607 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			}
>  		}
>
> -		if (th->fin)
> +		if (th->fin && seq == seq_from_tap)
>  			fin = 1;

Can a FIN segment also contain data?  My quick googling suggests yes.
If so, doesn't this check need to go after the data processing, so
that seq_from_tap points to the end of the packet's data, rather than
the beginning?  (And the handling of zero-length packets would also
need revision to match).

>
>  		if (!len)
> --
> 2.43.0
>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
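
To make the ordering question above concrete, here is a minimal sketch of
the comparison as it might look if done after the data processing. It is
not taken from tcp.c: fin_in_sequence() and its parameters are
hypothetical names, and it assumes accepted_end is the value seq_from_tap
would hold once the current segment's payload has been handled. The
sequence numbers in the comments are the relative ones from the trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of passt.
 *
 * seq:          sequence number of the segment carrying the FIN flag
 * len:          payload length of that segment (may be non-zero)
 * accepted_end: highest sequence accepted *after* this segment's payload
 *               was processed (assumed equivalent of seq_from_tap)
 *
 * The FIN sits right after the payload, at seq + len, so it only marks
 * the end of the stream if everything up to that point was accepted.
 * Unsigned 32-bit arithmetic takes care of sequence wrap-around.
 */
bool fin_in_sequence(uint32_t seq, uint32_t len, uint32_t accepted_end)
{
	return (uint32_t)(seq + len) == accepted_end;
}

/* With the values from the trace: the FIN at Seq=121441, Len=0 arrives
 * while only data up to 114394 was accepted, so the flag is discarded.
 * Once the retransmitted data is accepted up to 121441, the same FIN
 * would be honoured.
 */
```

Comparing against seq alone gives the same answer for a zero-length FIN,
but differs as soon as the FIN segment carries data and the check runs
after that data has been accepted.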