From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=WW91ReZI; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id EE9465A0619 for ; Tue, 07 Oct 2025 00:33:00 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759789980; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3QZvqARiE4cVFrWJUUqVcU+Q8Y80pfpE230CBaRslg8=; b=WW91ReZI74hnWRb2slOqaFvdk4tI+RheyRpwCt0qJJgQUsZuD1qp633tp6momWTtlHc0T6 M+oaa7V4aaYsoUrKFpvjUeUm3JHhMF+TCFgYdJ43P7MrnxlyLlIB55LGrbWYAbd+nXOKIB +Cm0oSyWMtW3GCm644t88dl2bwyWHZM= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-519-7szxw4_DMnSG8L9UDSPPEQ-1; Mon, 06 Oct 2025 18:32:59 -0400 X-MC-Unique: 7szxw4_DMnSG8L9UDSPPEQ-1 X-Mimecast-MFC-AGG-ID: 7szxw4_DMnSG8L9UDSPPEQ_1759789978 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-46e4c8fa2b1so23381915e9.0 for ; Mon, 06 Oct 2025 15:32:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759789977; x=1760394777; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=3QZvqARiE4cVFrWJUUqVcU+Q8Y80pfpE230CBaRslg8=; b=DA4ZogE56o4CIheSw2h49B7ZSMOxqse8CprRRBu8RJkXmsdcz3OKLgTFKGQ39M2S8J Scd0Rl8NMfAh8LMgty1vSPm/ypC80r7EkFSdTLXIO+Sb6BB0gdltDSeKGVJeCLmSufYl HwdEDZzgXNunaRmpP+Ui6QkcOyd2qaCQ9Nwn4e6Z/1qvKyWjX0BshKmIREo/wTf3RW1P CTQ6PC/gQuqDNmUlq52df4ZGlEI9ioWJk1L1aGLUs8Bo8otj0YiCLOs9BSnTyOxPG52z lBp5bpxXBl9fxO5tV5UbEo1jOn+tsV1FZG6TyaDvmZMPIrMjHugFHGG7XbZOh+g2PWjk Cbug== X-Gm-Message-State: AOJu0Yxfk6V4uUlDbvNgAsBcjd34ynUlruPGTsYkfMimoAN7IO3/yHik fKfUJPx+OAoIzx9psTB8DlH3MyBjZzdnOTiwFvCeautwzOLqEQShlxt8f5txLwf85qEuexpVqVc HhzPgaOtV1EW9sFPxu/B7Ct73TC7MCKhyXw+PpqXKqM0KuPujVfGr1l6FWypeKw== X-Gm-Gg: ASbGncsN8+KEv31dEoQsxIVpnRS+sy4ZjztijkYMmzbsUNpU3SjnjzTQSmUV/8M/E+/ Lg0YjF+j7iCh8zolATgrfhwqzZKdIlW62pyvAFmZuGNKneUG8Fn5mqZc++D32F24Srif2lw8qQF XW0nmVuOpN7Bj9QNR5Vu1Uuf0wT7uKZ0xZC8W5LRl6x7vmT4MlFiz7rIPRDqQ6s7ydXsHi9itj/ df5txDWoNrDj4JCm7G+zQdK7B7JOOEIWVP5267y+RAPUfFWThfLlxok106MCmXkP3qNHxiTnctw 0nAhuSitCcuaBnXLviq+zYASo841JZzYNRV+8nyvyU070/bU6n4rkyGi X-Received: by 2002:a05:600d:108:10b0:46e:4cd3:7d54 with SMTP id 5b1f17b1804b1-46e7271cd6emr72054085e9.18.1759789976778; Mon, 06 Oct 2025 15:32:56 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHSJZuQkpsoZSfqouxd/xjHrpZz266dSkaip8efk+SZjZlaaUcrJ7bKXW0wSCuqtKK+4f84aA== X-Received: by 2002:a05:600d:108:10b0:46e:4cd3:7d54 with SMTP id 5b1f17b1804b1-46e7271cd6emr72053975e9.18.1759789976281; Mon, 06 Oct 2025 15:32:56 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-46e61a020a3sm277730325e9.10.2025.10.06.15.32.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Oct 2025 15:32:55 -0700 (PDT) Date: Tue, 7 Oct 2025 00:32:54 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Message-ID: <20251007003254.57ce5e26@elisabeth> In-Reply-To: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-4-sbrivio@redhat.com> <20251002135104.1a662029@elisabeth> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: rRi5vaFvk_pXISB6feTYKqVkqMLc7val7egtT-1P1TI_1759789978 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: ZTLUEQO5YGPFW4Z7NF24B6TOLPFC5EJJ X-Message-ID-Hash: ZTLUEQO5YGPFW4Z7NF24B6TOLPFC5EJJ X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, 3 Oct 2025 13:43:32 +1000 David Gibson wrote: > On Thu, Oct 02, 2025 at 01:51:04PM +0200, Stefano Brivio wrote: > > On Thu, 2 Oct 2025 13:02:09 +1000 > > David Gibson wrote: > > > On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote: > > > > On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote: > [snip] > > > > > diff --git a/tcp.c b/tcp.c > > > > > index 3f7dc82..5a7a607 100644 > > > > > --- a/tcp.c > > > > > +++ b/tcp.c > > > > > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > > > } > > > > > } > > > > > > > > > > - if (th->fin) > > > > > + if (th->fin && seq == seq_from_tap) > > > > > fin = 1; > > > > > > > > Can a FIN segment also contain data? My quick googling suggests yes. > > > > Yes, absolutely, my slow wiresharking over the years also confirms, and > > it's so often the case that (I think) this issue doesn't happen so > > frequently as it only occurs if we have a FIN segment without data. > > Makes sense. > > > If we have a data segment, with FIN set, that we can't fully transmit, > > we already set 'partial_send' and won't set TAP_FIN_RCVD as a result. > > > > Another case where we want to ignore a FIN segment with data is if we > > have a gap before it, but in that case we'll eventually set 'keep' and > > return early. > > Ah, right. I'd noticed we set fin = 1 in that case, but forgotten > about the exit before setting TAP_FIN_RCVD if keep is set. > > > > > If so, doesn't this logic need to go after we process the data > > > > processing, so that seq_from_tap points to the end of the packet's > > > > data, rather than the beginning? (And the handling of zero-length > > > > packets would also need revision to match). > > > > This made sense to me for a moment but now I'm struggling to understand > > or remember why. What I want to check here is that a FIN segment > > without data (I should have specified in the commit message) is > > acceptable because its sequence is as expected. > > Right. This is correct for zero-data FIN segments, but I think as a > side-effect you've made it ignore certain FIN segments _with_ data. > It will work in the common case where the data exactly follows on from > what we already have. But in the case where the segment has some data > we already have and some new data, the fin = 1 won't trip because seq > != seq_from_tap. There isn't another place that will catch it > instead, AFAICT. > > I guess it will be fine in the end, because with all the data acked, > the guest should retransmit the FIN with no data. > > > But going back to FIN segments with data: why should we sum the length > > to seq_from_tap before comparing the sequence? I don't understand what > > additional check you want to introduce, or what case you want to cover. > > I was thinking about the case above, but I didn't explain it very > well. > > > > Following on from that, it seems to me like it would make sense for > > > FIN segments to also participate in the 'keep' mechanism. It should > > > work eventually, but I expect it would be smoother in the case that we > > > get a final burst of packets in a stream out of order. > > > > FIN segments with data already go through that dance. > > > > Without data, I guess you're right, we might have in the same batch > > (not that I've ever seen it happening in practice) a FIN segment > > without data that we process first (and now discard because of the > > sequence number), and some data before that we process later, so we > > shouldn't throw away the FIN segment because of that. We should, > > conceptually, reorder it as well. > > > > It probably makes things more complicated for a reason that's not so > > critical (ignoring a FIN is fine, we'll get another one), and I wanted > > to have the simplest possible fix here. > > > > Let me see if I can make this entirely correct without a substantially > > bigger change, I haven't really tried. > > How about this: > > diff --git a/tcp.c b/tcp.c > index 7da41797..42e576b4 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1774,10 +1774,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > } > } > > - if (th->fin) > - fin = 1; > - > - if (!len) > + if (!len && !th->fin) > continue; > > seq_offset = seq_from_tap - seq; > @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > break; > seq_from_tap += size; > iov_i += count; > + if (th->fin) > + fin = 1; > > if (keep == i) > keep = -1; > > > We'd need to double check that the "accept data segment" path is safe > with len == 0, of course. For sure it's not before d2c33f45f7be ("tcp: Convert tcp_data_from_tap() to use iov_tail"), because we might add zero-length segments to the tcp_iov array, and that would make backporting an otherwise simple and critical fix to slightly older versions rather complicated. After that commit, I'm not sure about the behaviour of iov_tail_clone(). I think it will return 0, but it should be tested (that assumption, itself, but also that the fix still works). > But I think that will treat dataless and > with-data FINs the same way, and let them use the keep mechanism. Given that the only advantage of doing this would be to handle a rare (I guess) corner case, that is, an out-of-sequence FIN segment with data, which is not critical anyway because the FIN will be retransmitted, I would rather keep this (critical) fix as it is. I would suggest filing a separate ticket or anyway sending a separate patch if you want to fix that other case. -- Stefano