From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=CAhkMEPK; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 69D2F5A026F for ; Wed, 08 Oct 2025 00:42:58 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759876977; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XkeanslxNL7SeqJnTWpZXPDjmsbJuHbjK9iAonfQFxA=; b=CAhkMEPKoTmlL12w0H9MlVK8hsuWuWpiJSD6MqoEAF1+n9cOvzGhm4uONZC7Y93KxR7yad W5yf4RkfMdSmkROWPsf15icYJ5r8SNKfoZ4ouFU84rVtNiR6YEF3KQ/ZZfRM1BZeo5YqSJ 9rbj228e/fg9hJTmX7416U66rZOQXDQ= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-146-7HAaCTY4OwCgmOE0_o4jng-1; Tue, 07 Oct 2025 18:42:53 -0400 X-MC-Unique: 7HAaCTY4OwCgmOE0_o4jng-1 X-Mimecast-MFC-AGG-ID: 7HAaCTY4OwCgmOE0_o4jng_1759876972 Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-3e997eb7232so2829712f8f.3 for ; Tue, 07 Oct 2025 15:42:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759876972; x=1760481772; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=XkeanslxNL7SeqJnTWpZXPDjmsbJuHbjK9iAonfQFxA=; b=nXfDvnc5oDWFkbsL28uP6A2fNLJxpAJ7JPFinsuty/Hwo7hTYYmSZyLqvEictWHPEs wHF9sA/G2yDPa8YtG7yRMZDkpwTI4b8ukTF3SzqEgcUUYfrt64WSo/sDIvr5ucQVNs1s rrmXmstoswflt0yPS7GbWzJqR9HvpQlUpdXgZRyUFkEaYhm/DhcwsB/ECukzpHQMgU8U 25aQmc1OnL8XMecZ+RfhY3rH6jIX0xB/Ii3V3YJSizaC+L6rlJnW5ufEOJsrS/TRKEUG 1C7+sjwGnLNEWuO5Pb0955ItW2q+cVapXzYtHPWJLaYTvCD8jzUPzrQGC455H3OsQTSy SBZg== X-Gm-Message-State: AOJu0Yx7Qb/xSeTluL7yHEvtSmCnU7qCt7g6nDrLz0QRuVNkQmGh6hXa sbvjYbGAVgJRN1vuZGfdbcjwwcf6bHOC0VyZ6lbgAEs61FQ2o/1LPKYX9GA79BoNvkKoTUdFArJ OgMBp+WSC/K4x9UVxIrfZoKPIUpPjzYz2H5LB/Rl4Mc17VE2THf1VaVjh4l6Wmw== X-Gm-Gg: ASbGncu43YQYiM8W6dvK1rJk+MedrW2z0iQsI3m/wL7m0alzZF+4/6G49g0N59cZ6SH bEy5aU+AmxpAyIS6tD+NxBXpw3q0UVgjirSgXgu8/LVcWbKrcfi5rF0biJMyybJ2ldrH/SirT7O bKejNDA9kbDF2cBjiueOVgkaCcAb2ufJjcjeeqoE4JzUCcctgZEfWCzseFxK29k2PZXA/mQiGX1 S6dcVGcaZhQ3cd9hm8lRr3phnG5JbKOQ28QG8MWtv2hZvrCQ6GKL4uhzd8dbUFJurYx5jPa/NzP LZo842GEZ/YoJeFHcW5HmsImmjNwTXN9giA4FEb0W1D6qY8nEHreqMjp X-Received: by 2002:a05:600c:4506:b0:45b:8a0e:cda9 with SMTP id 5b1f17b1804b1-46fa9a8638dmr8363855e9.2.1759876971814; Tue, 07 Oct 2025 15:42:51 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHDTUez/BbITeWNiZa0LolcBDiyo2BXZX8R3DEZ4U6pwp3nvGMz19bY2hE9eMoKhqb0pep2Uw== X-Received: by 2002:a05:600c:4506:b0:45b:8a0e:cda9 with SMTP id 5b1f17b1804b1-46fa9a8638dmr8363795e9.2.1759876971367; Tue, 07 Oct 2025 15:42:51 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46fa9d71489sm10783775e9.19.2025.10.07.15.42.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Oct 2025 15:42:50 -0700 (PDT) Date: Wed, 8 Oct 2025 00:42:49 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Message-ID: <20251008004249.6dc894fa@elisabeth> In-Reply-To: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-4-sbrivio@redhat.com> <20251002135104.1a662029@elisabeth> <20251007003254.57ce5e26@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: YQw1MXnrnSZ8ufLFk8v68CQMTAxievHvCQkn7Eo7fTc_1759876972 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: LFGFYYFUMBGYXZCXJ2R36THKI3WHMNFK X-Message-ID-Hash: LFGFYYFUMBGYXZCXJ2R36THKI3WHMNFK X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue, 7 Oct 2025 10:34:01 +1100 David Gibson wrote: > On Tue, Oct 07, 2025 at 12:32:54AM +0200, Stefano Brivio wrote: > > On Fri, 3 Oct 2025 13:43:32 +1000 > > David Gibson wrote: > > > > > On Thu, Oct 02, 2025 at 01:51:04PM +0200, Stefano Brivio wrote: > > > > On Thu, 2 Oct 2025 13:02:09 +1000 > > > > David Gibson wrote: > > > > > On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote: > > > > > > On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote: > > > [snip] > > > > > > > diff --git a/tcp.c b/tcp.c > > > > > > > index 3f7dc82..5a7a607 100644 > > > > > > > --- a/tcp.c > > > > > > > +++ b/tcp.c > > > > > > > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > - if (th->fin) > > > > > > > + if (th->fin && seq == seq_from_tap) > > > > > > > fin = 1; > > > > > > > > > > > > Can a FIN segment also contain data? My quick googling suggests yes. > > > > > > > > Yes, absolutely, my slow wiresharking over the years also confirms, and > > > > it's so often the case that (I think) this issue doesn't happen so > > > > frequently as it only occurs if we have a FIN segment without data. > > > > > > Makes sense. > > > > > > > If we have a data segment, with FIN set, that we can't fully transmit, > > > > we already set 'partial_send' and won't set TAP_FIN_RCVD as a result. > > > > > > > > Another case where we want to ignore a FIN segment with data is if we > > > > have a gap before it, but in that case we'll eventually set 'keep' and > > > > return early. > > > > > > Ah, right. I'd noticed we set fin = 1 in that case, but forgotten > > > about the exit before setting TAP_FIN_RCVD if keep is set. > > > > > > > > > If so, doesn't this logic need to go after we process the data > > > > > > processing, so that seq_from_tap points to the end of the packet's > > > > > > data, rather than the beginning? (And the handling of zero-length > > > > > > packets would also need revision to match). > > > > > > > > This made sense to me for a moment but now I'm struggling to understand > > > > or remember why. What I want to check here is that a FIN segment > > > > without data (I should have specified in the commit message) is > > > > acceptable because its sequence is as expected. > > > > > > Right. This is correct for zero-data FIN segments, but I think as a > > > side-effect you've made it ignore certain FIN segments _with_ data. > > > It will work in the common case where the data exactly follows on from > > > what we already have. But in the case where the segment has some data > > > we already have and some new data, the fin = 1 won't trip because seq > > > != seq_from_tap. There isn't another place that will catch it > > > instead, AFAICT. > > > > > > I guess it will be fine in the end, because with all the data acked, > > > the guest should retransmit the FIN with no data. > > > > > > > But going back to FIN segments with data: why should we sum the length > > > > to seq_from_tap before comparing the sequence? I don't understand what > > > > additional check you want to introduce, or what case you want to cover. > > > > > > I was thinking about the case above, but I didn't explain it very > > > well. > > > > > > > > Following on from that, it seems to me like it would make sense for > > > > > FIN segments to also participate in the 'keep' mechanism. It should > > > > > work eventually, but I expect it would be smoother in the case that we > > > > > get a final burst of packets in a stream out of order. > > > > > > > > FIN segments with data already go through that dance. > > > > > > > > Without data, I guess you're right, we might have in the same batch > > > > (not that I've ever seen it happening in practice) a FIN segment > > > > without data that we process first (and now discard because of the > > > > sequence number), and some data before that we process later, so we > > > > shouldn't throw away the FIN segment because of that. We should, > > > > conceptually, reorder it as well. > > > > > > > > It probably makes things more complicated for a reason that's not so > > > > critical (ignoring a FIN is fine, we'll get another one), and I wanted > > > > to have the simplest possible fix here. > > > > > > > > Let me see if I can make this entirely correct without a substantially > > > > bigger change, I haven't really tried. > > > > > > How about this: > > > > > > diff --git a/tcp.c b/tcp.c > > > index 7da41797..42e576b4 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -1774,10 +1774,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > } > > > } > > > > > > - if (th->fin) > > > - fin = 1; > > > - > > > - if (!len) > > > + if (!len && !th->fin) > > > continue; > > > > > > seq_offset = seq_from_tap - seq; > > > @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > break; > > > seq_from_tap += size; > > > iov_i += count; > > > + if (th->fin) > > > + fin = 1; > > > > > > if (keep == i) > > > keep = -1; > > > > > > > > > We'd need to double check that the "accept data segment" path is safe > > > with len == 0, of course. > > > > For sure it's not before d2c33f45f7be ("tcp: Convert > > tcp_data_from_tap() to use iov_tail"), because we might add > > zero-length segments to the tcp_iov array, and that would make > > backporting an otherwise simple and critical fix to slightly older > > versions rather complicated. > > Kinda. It's not that complicated to deal with that case, by wrapping > the actual data processing in an `if (len) { ... }` That's needed for sure, but do we risk looping forever on a particular batch of FIN segments without data with a given series of sequence numbers? Right now the loop terminates because the sequence moves forward. I'm not sure what happens without data while moving 'keep' around. Maybe it takes a few minutes to figure out (I haven't tried) but I wouldn't call that trivial. > > After that commit, I'm not sure about the behaviour of > > iov_tail_clone(). I think it will return 0, but it should be tested > > (that assumption, itself, but also that the fix still works). > > Right, I think it's safe in that case - that's what I was looking at. Note that I didn't test this. > > > But I think that will treat dataless and > > > with-data FINs the same way, and let them use the keep mechanism. > > > > Given that the only advantage of doing this would be to handle a rare > > (I guess) corner case, that is, an out-of-sequence FIN segment with > > data, which is not critical anyway because the FIN will be > > retransmitted, I would rather keep this (critical) fix as it is. > > > > I would suggest filing a separate ticket or anyway sending a separate > > patch if you want to fix that other case. > > Fair enough. I'll wait for you to get the basic fix merge, then I > might batch this cleanup/fixup with the reorg to clarify bytes_acked > handling. Okay, thanks, merged now. -- Stefano