From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202510 header.b=t9m6Ox2S; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 801F55A0619 for ; Tue, 07 Oct 2025 01:34:13 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202510; t=1759793650; bh=zTfdzg0E2YyLMflDDRZ2cdM/xAnMV8TDHC9y2S0aLDU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=t9m6Ox2SfONI8hMiUKtk6OALo+Xo7WQ8zn/d5IrklnHs3gqP6SFjnrvn9MvV5yZWh fiGOXHAvj+VnQqb5Fr9y8RLOTMk93vnTvF27ZmXvZUOu31GS9Uilcm+q4Wg7iyxkWk b59gL0ZbPZThGXyKIemuAwDr11c7LTR//HxqusBHrF7U3aMZqd0kGb+P6bqDoNeMko AgM4qdAnedM0qajranrJsqTX76trd8IgCjxsRq27N+xExHkeBCLwphBwbxD0aCovEF AxFumdS2lVt6N+wzxj3VO3czlc0Uzirgg+g6GyiUeCpGi2o2F4EO1jUt0taSM8Ddj/ KiJVUzoRzk3Hg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cgbFZ3m2kz4w1v; Tue, 7 Oct 2025 10:34:10 +1100 (AEDT) Date: Tue, 7 Oct 2025 10:34:01 +1100 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Message-ID: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-4-sbrivio@redhat.com> <20251002135104.1a662029@elisabeth> <20251007003254.57ce5e26@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="HqHwIsh7q3PJBWhU" Content-Disposition: inline In-Reply-To: <20251007003254.57ce5e26@elisabeth> Message-ID-Hash: D524OSB7GCVHGHDGURMLVVEDG3SO6XNP X-Message-ID-Hash: D524OSB7GCVHGHDGURMLVVEDG3SO6XNP X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --HqHwIsh7q3PJBWhU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 07, 2025 at 12:32:54AM +0200, Stefano Brivio wrote: > On Fri, 3 Oct 2025 13:43:32 +1000 > David Gibson wrote: >=20 > > On Thu, Oct 02, 2025 at 01:51:04PM +0200, Stefano Brivio wrote: > > > On Thu, 2 Oct 2025 13:02:09 +1000 > > > David Gibson wrote: =20 > > > > On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote: =20 > > > > > On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote: = =20 > > [snip] > > > > > > diff --git a/tcp.c b/tcp.c > > > > > > index 3f7dc82..5a7a607 100644 > > > > > > --- a/tcp.c > > > > > > +++ b/tcp.c > > > > > > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct= ctx *c, struct tcp_tap_conn *conn, > > > > > > } > > > > > > } > > > > > > =20 > > > > > > - if (th->fin) > > > > > > + if (th->fin && seq =3D=3D seq_from_tap) > > > > > > fin =3D 1; =20 > > > > >=20 > > > > > Can a FIN segment also contain data? My quick googling suggests = yes. =20 > > >=20 > > > Yes, absolutely, my slow wiresharking over the years also confirms, a= nd > > > it's so often the case that (I think) this issue doesn't happen so > > > frequently as it only occurs if we have a FIN segment without data. = =20 > >=20 > > Makes sense. > >=20 > > > If we have a data segment, with FIN set, that we can't fully transmit, > > > we already set 'partial_send' and won't set TAP_FIN_RCVD as a result. > > >=20 > > > Another case where we want to ignore a FIN segment with data is if we > > > have a gap before it, but in that case we'll eventually set 'keep' and > > > return early. =20 > >=20 > > Ah, right. I'd noticed we set fin =3D 1 in that case, but forgotten > > about the exit before setting TAP_FIN_RCVD if keep is set. > >=20 > > > > > If so, doesn't this logic need to go after we process the data > > > > > processing, so that seq_from_tap points to the end of the packet's > > > > > data, rather than the beginning? (And the handling of zero-length > > > > > packets would also need revision to match). =20 > > >=20 > > > This made sense to me for a moment but now I'm struggling to understa= nd > > > or remember why. What I want to check here is that a FIN segment > > > without data (I should have specified in the commit message) is > > > acceptable because its sequence is as expected. =20 > >=20 > > Right. This is correct for zero-data FIN segments, but I think as a > > side-effect you've made it ignore certain FIN segments _with_ data. > > It will work in the common case where the data exactly follows on from > > what we already have. But in the case where the segment has some data > > we already have and some new data, the fin =3D 1 won't trip because seq > > !=3D seq_from_tap. There isn't another place that will catch it > > instead, AFAICT. > >=20 > > I guess it will be fine in the end, because with all the data acked, > > the guest should retransmit the FIN with no data. > >=20 > > > But going back to FIN segments with data: why should we sum the length > > > to seq_from_tap before comparing the sequence? I don't understand what > > > additional check you want to introduce, or what case you want to cove= r. =20 > >=20 > > I was thinking about the case above, but I didn't explain it very > > well. > >=20 > > > > Following on from that, it seems to me like it would make sense for > > > > FIN segments to also participate in the 'keep' mechanism. It should > > > > work eventually, but I expect it would be smoother in the case that= we > > > > get a final burst of packets in a stream out of order. =20 > > >=20 > > > FIN segments with data already go through that dance. > > >=20 > > > Without data, I guess you're right, we might have in the same batch > > > (not that I've ever seen it happening in practice) a FIN segment > > > without data that we process first (and now discard because of the > > > sequence number), and some data before that we process later, so we > > > shouldn't throw away the FIN segment because of that. We should, > > > conceptually, reorder it as well. > > >=20 > > > It probably makes things more complicated for a reason that's not so > > > critical (ignoring a FIN is fine, we'll get another one), and I wanted > > > to have the simplest possible fix here. > > >=20 > > > Let me see if I can make this entirely correct without a substantially > > > bigger change, I haven't really tried. =20 > >=20 > > How about this: > >=20 > > diff --git a/tcp.c b/tcp.c > > index 7da41797..42e576b4 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -1774,10 +1774,7 @@ static int tcp_data_from_tap(const struct ctx *c= , struct tcp_tap_conn *conn, > > } > > } > > =20 > > - if (th->fin) > > - fin =3D 1; > > - > > - if (!len) > > + if (!len && !th->fin) > > continue; > > =20 > > seq_offset =3D seq_from_tap - seq; > > @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c,= struct tcp_tap_conn *conn, > > break; > > seq_from_tap +=3D size; > > iov_i +=3D count; > > + if (th->fin) > > + fin =3D 1; > > =20 > > if (keep =3D=3D i) > > keep =3D -1; > >=20 > >=20 > > We'd need to double check that the "accept data segment" path is safe > > with len =3D=3D 0, of course. >=20 > For sure it's not before d2c33f45f7be ("tcp: Convert > tcp_data_from_tap() to use iov_tail"), because we might add > zero-length segments to the tcp_iov array, and that would make > backporting an otherwise simple and critical fix to slightly older > versions rather complicated. Kinda. It's not that complicated to deal with that case, by wrapping the actual data processing in an `if (len) { ... }` > After that commit, I'm not sure about the behaviour of > iov_tail_clone(). I think it will return 0, but it should be tested > (that assumption, itself, but also that the fix still works). Right, I think it's safe in that case - that's what I was looking at. > > But I think that will treat dataless and > > with-data FINs the same way, and let them use the keep mechanism. >=20 > Given that the only advantage of doing this would be to handle a rare > (I guess) corner case, that is, an out-of-sequence FIN segment with > data, which is not critical anyway because the FIN will be > retransmitted, I would rather keep this (critical) fix as it is. >=20 > I would suggest filing a separate ticket or anyway sending a separate > patch if you want to fix that other case. Fair enough. I'll wait for you to get the basic fix merge, then I might batch this cleanup/fixup with the reorg to clarify bytes_acked handling. --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --HqHwIsh7q3PJBWhU Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjkUegACgkQzQJF27ox 2Gf6aA/+OBCE4C+7Gkia+1+nl+S3+o9XQ6+bjXquDmXNP3X0kH6VIBRiZIFbACoI /tL0uwxM2V6/+ej9CP2Nln1OmntSC4BY+uDkwr/E4ImByGYOcwA7AMa76wrk6wuj 7m0tlYiZCIVv4vtHeSvOYJiNbRcStXwaieJpZLTqNeXB1Mj5JjiPH0RUIeS608NO /mKuedwnYwznoHn0HNmOj/BALIXNfmMqJCvRD9W7TvtGqE2IezcCHyskyhcSrrKK 3SfYsgYdF4y6n1peAw1aeRZrJHtMIxwVGDLgMNjmEy0kHkA9ZF+A3fOYY5OCDDE5 rpIDD872Kd2nqeeCX2BIknuR08nbJ0CuvhoWdH/eNhNrw1QGWFycms/ERF4wHoFd wWVJRfh6fOXCOEILgQzzTDTFynS+u49WD89Emq0x140J9GPNX9+aaNegEdersjAG sr82Jb+taxgtmia2YmuamnLTNQZwLChk+DRC1KJby4GBzrr+enA5pXqQEsAqY1bf Cm9Zz6T6jl4SXg60aElfZw0UVuhQutY48VBqSgSXIt4M3x0KtkiDvxp9r6gGUkGc FXC/ClcXkQmd/ieLkfWDnn4qU7pNEqqAHNC0W/tUesDjbZkx4jMpXahoqAaOVFLt 7k9t7Oli+PGbj5UaotzwbS4KkEy878VKAsAewkHvlrEWatupCng= =Qjhw -----END PGP SIGNATURE----- --HqHwIsh7q3PJBWhU--