From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VkGTKjSQ; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 67E085A0272 for ; Thu, 02 Oct 2025 13:51:11 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759405870; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sT3IcmWqCi6z9ejx0osjsXD7PSBhJtQ70CJaMdEDBlE=; b=VkGTKjSQ76vK7MNQnf8vO0U5C+wcCeWedeu5zKG6DzXj4Ru/qMVurSZpLNPth6pcRK4iCP df3tyyHDmx6P4FQewon6pUc9ijYLfr0Bfb8a5CsvHzkOnuokEn6rNAyHn8xJHLhseoSefK uQEfbJV8IfIii+HwyReiTXpwQFHKpJU= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-125-Fclr4XdGMHuEcOKC1u9vjw-1; Thu, 02 Oct 2025 07:51:09 -0400 X-MC-Unique: Fclr4XdGMHuEcOKC1u9vjw-1 X-Mimecast-MFC-AGG-ID: Fclr4XdGMHuEcOKC1u9vjw_1759405868 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-46e2c11b94cso4604565e9.3 for ; Thu, 02 Oct 2025 04:51:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759405867; x=1760010667; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=cz0FR+VEY61AFeFDHNC1MmaR9w1dqFy0Ms9iZAHtnYA=; b=EwkqbQNg5VELD4lpTSYbIlenNHv1fbAp51tsIltO/f+WU+bDtn7TVGhgGhq9Ws2x9f EVzr0gPx38LxkIdEkniTVhAl3T3MvZ04fphr3tyaNsnkJnp0unwpA9Mb3ouBwDuCeXjP aurJQBBH80m9wxaqoTZ+oOPDkEKvyM6LyPVLtmWI51HSmKLIqv9WEy4AUjdV4tEGBTF3 3OsUpHLsl+QA3kUUnSwNG9NlgkMyHI6Wmbk14M9wrW3J3xF6j8qxKTlPtU/GSvtd+HCk 2LzhAmPOZryyKCkUXEEzphggZUpOA4q7ZvNs2zjy27HZUwlmtNCF+8kUCCAaAfl8+sy6 ZCog== X-Gm-Message-State: AOJu0Yx0L7Q0lrh9Asj89W7D6T7uGgfQ95d97UBDJEN2NennKGKGD16P xfMkRCtFg3t2yfT7mDNqrVJF+o4zzZyiEVLqhVw7OAqC3HfNPUrC51xgwCoyBem1iPei7LxpNot DcLnRba2qFAFCSGlhJgBYa6yyF7/QtWEX5d1b9ME0QL0VWYBhHF6+pUi68FMLXg== X-Gm-Gg: ASbGncsWmn2Hn+NA+uibXyh+/JDujAlSD/WdHGmwa9cU0pUVoMAGj5ZVtuYS4c/dyff m5xA4i+c5s7R7BLN/VlKx1Fs0W0jTaEYhZ2z7oPhqM/MX7yXNbdW78NvkXZ6R3JbFu20xYnfYjy yD0BjoxwHlb9AvUMG7PNxjo4VvBFzspfgqc7wJ3/xov9b1wwxi51KlJnLzKmJLnh5kBTh4qi4S3 /S9Vevw+OYQKYaM/46yRuH9I4hWZAUGDZl/znK4E//KEzZLWwsHKvbTi97HEV/1dBfWHJSRRh70 BX7Frx0v+MaTew/k1Bn8A4/zaPMIAv1FWvNYxnu00LkuH7rY1n9BlEI9 X-Received: by 2002:a05:600c:4ec6:b0:46c:e3df:529e with SMTP id 5b1f17b1804b1-46e612ba9ecmr46287085e9.19.1759405867348; Thu, 02 Oct 2025 04:51:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFFDyUQJUudaqlicPh+A5ggyZM7RxrnbVyNyhR5+meWmzbPBRc5rfUKrxRbH/CZXx59b04SaA== X-Received: by 2002:a05:600c:4ec6:b0:46c:e3df:529e with SMTP id 5b1f17b1804b1-46e612ba9ecmr46286865e9.19.1759405866735; Thu, 02 Oct 2025 04:51:06 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46e693bd8bfsm31941165e9.11.2025.10.02.04.51.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Oct 2025 04:51:06 -0700 (PDT) Date: Thu, 2 Oct 2025 13:51:04 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Message-ID: <20251002135104.1a662029@elisabeth> In-Reply-To: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-4-sbrivio@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: -gSeOeg3cSeOYdSaH3SJzVc_wbpzIHACj2L08VTegLQ_1759405868 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Message-ID-Hash: EEJSLGKNWYIKMT2EIBIOUG7LLGVHKQG4 X-Message-ID-Hash: EEJSLGKNWYIKMT2EIBIOUG7LLGVHKQG4 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu, 2 Oct 2025 13:02:09 +1000 David Gibson wrote: > On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote: > > On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote: =20 > > > If a guest or container sends us a FIN segment but its sequence numbe= r > > > doesn't match the highest sequence of data we *accepted* (not > > > necessarily the highest sequence we received), that is, > > > conn->seq_from_tap, plus any data we're accepting in the current > > > batch, we should discard the flag (not necessarily the segment), > > > because there's still data we need to receive (again) before the end > > > of the stream. > > >=20 > > > If we consider those FIN flags as such, we'll end up in the > > > situation described below. > > >=20 > > > Here, 192.168.10.102 is a HTTP server in a Podman container, and > > > 192.168.10.44 is a client fetching approximately 121 KB of data from > > > it: > > >=20 > > > 82 2.026811 192.168.10.102 =E2=86=92 192.168.10.44 54 TCP 55414 = =E2=86=92 44992 [FIN, ACK] Seq=3D121441 Ack=3D143 Win=3D65536 Len=3D0 > > >=20 > > > the server is done sending > > >=20 > > > 83 2.026898 192.168.10.44 =E2=86=92 192.168.10.102 54 TCP 44992 = =E2=86=92 55414 [ACK] Seq=3D143 Ack=3D114394 Win=3D216192 Len=3D0 > > >=20 > > > pasta (client) acknowledges a previous sequence, because of > > > a short sendmsg() > > >=20 > > > 84 2.027324 192.168.10.44 =E2=86=92 192.168.10.102 54 TCP 44992 = =E2=86=92 55414 [FIN, ACK] Seq=3D143 Ack=3D114394 Win=3D216192 Len=3D0 > > >=20 > > > pasta (client) sends FIN, ACK as the client has no more data to > > > send (a single GET request), while still acknowledging a previous > > > sequence, because the retransmission didn't happen yet > > >=20 > > > 85 2.027349 192.168.10.102 =E2=86=92 192.168.10.44 54 TCP 55414 = =E2=86=92 44992 [ACK] Seq=3D121442 Ack=3D144 Win=3D65536 Len=3D0 > > >=20 > > > the server acknowledges the FIN, ACK > > >=20 > > > 86 2.224125 192.168.10.102 =E2=86=92 192.168.10.44 4150 TCP [TCP= Retransmission] 55414 =E2=86=92 44992 [ACK] Seq=3D114394 Ack=3D144 Win=3D6= 5536 Len=3D4096 [TCP segment of a reassembled PDU] > > >=20 > > > and finally a retransmission comes, but as we wrongly switched to > > > the CLOSE-WAIT state, > > >=20 > > > 87 2.224202 192.168.10.44 =E2=86=92 192.168.10.102 54 TCP 44992 = =E2=86=92 55414 [RST] Seq=3D144 Win=3D0 Len=3D0 > > >=20 > > > we consider frame #86 as an acknowledgement for the FIN segment we > > > sent, and close the connection, while we still had to re-receive > > > (and finally send) the missing data segment, instead. > > >=20 > > > Link: https://github.com/containers/podman/issues/27179 > > > Signed-off-by: Stefano Brivio > > > --- > > > tcp.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > >=20 > > > diff --git a/tcp.c b/tcp.c > > > index 3f7dc82..5a7a607 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *= c, struct tcp_tap_conn *conn, > > > =09=09=09} > > > =09=09} > > > =20 > > > -=09=09if (th->fin) > > > +=09=09if (th->fin && seq =3D=3D seq_from_tap) > > > =09=09=09fin =3D 1; =20 > >=20 > > Can a FIN segment also contain data? My quick googling suggests yes. Yes, absolutely, my slow wiresharking over the years also confirms, and it's so often the case that (I think) this issue doesn't happen so frequently as it only occurs if we have a FIN segment without data. If we have a data segment, with FIN set, that we can't fully transmit, we already set 'partial_send' and won't set TAP_FIN_RCVD as a result. Another case where we want to ignore a FIN segment with data is if we have a gap before it, but in that case we'll eventually set 'keep' and return early. > > If so, doesn't this logic need to go after we process the data > > processing, so that seq_from_tap points to the end of the packet's > > data, rather than the beginning? (And the handling of zero-length > > packets would also need revision to match). =20 This made sense to me for a moment but now I'm struggling to understand or remember why. What I want to check here is that a FIN segment without data (I should have specified in the commit message) is acceptable because its sequence is as expected. But going back to FIN segments with data: why should we sum the length to seq_from_tap before comparing the sequence? I don't understand what additional check you want to introduce, or what case you want to cover. > Following on from that, it seems to me like it would make sense for > FIN segments to also participate in the 'keep' mechanism. It should > work eventually, but I expect it would be smoother in the case that we > get a final burst of packets in a stream out of order. FIN segments with data already go through that dance. Without data, I guess you're right, we might have in the same batch (not that I've ever seen it happening in practice) a FIN segment without data that we process first (and now discard because of the sequence number), and some data before that we process later, so we shouldn't throw away the FIN segment because of that. We should, conceptually, reorder it as well. It probably makes things more complicated for a reason that's not so critical (ignoring a FIN is fine, we'll get another one), and I wanted to have the simplest possible fix here. Let me see if I can make this entirely correct without a substantially bigger change, I haven't really tried. --=20 Stefano