public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence
Date: Tue, 7 Oct 2025 00:32:54 +0200	[thread overview]
Message-ID: <20251007003254.57ce5e26@elisabeth> (raw)
In-Reply-To: <aN9GZNswivXxOwr3@zatzit>

On Fri, 3 Oct 2025 13:43:32 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Oct 02, 2025 at 01:51:04PM +0200, Stefano Brivio wrote:
> > On Thu, 2 Oct 2025 13:02:09 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:  
> > > On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote:  
> > > > On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote:  
> [snip]
> > > > > diff --git a/tcp.c b/tcp.c
> > > > > index 3f7dc82..5a7a607 100644
> > > > > --- a/tcp.c
> > > > > +++ b/tcp.c
> > > > > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> > > > >  			}
> > > > >  		}
> > > > >  
> > > > > -		if (th->fin)
> > > > > +		if (th->fin && seq == seq_from_tap)
> > > > >  			fin = 1;    
> > > > 
> > > > Can a FIN segment also contain data?  My quick googling suggests yes.  
> > 
> > Yes, absolutely, my slow wiresharking over the years also confirms, and
> > it's so often the case that (I think) this issue doesn't happen so
> > frequently as it only occurs if we have a FIN segment without data.  
> 
> Makes sense.
> 
> > If we have a data segment, with FIN set, that we can't fully transmit,
> > we already set 'partial_send' and won't set TAP_FIN_RCVD as a result.
> > 
> > Another case where we want to ignore a FIN segment with data is if we
> > have a gap before it, but in that case we'll eventually set 'keep' and
> > return early.  
> 
> Ah, right.  I'd noticed we set fin = 1 in that case, but forgotten
> about the exit before setting TAP_FIN_RCVD if keep is set.
> 
> > > > If so, doesn't this logic need to go after we process the data
> > > > processing, so that seq_from_tap points to the end of the packet's
> > > > data, rather than the beginning?  (And the handling of zero-length
> > > > packets would also need revision to match).    
> > 
> > This made sense to me for a moment but now I'm struggling to understand
> > or remember why. What I want to check here is that a FIN segment
> > without data (I should have specified in the commit message) is
> > acceptable because its sequence is as expected.  
> 
> Right.  This is correct for zero-data FIN segments, but I think as a
> side-effect you've made it ignore certain FIN segments _with_ data.
> It will work in the common case where the data exactly follows on from
> what we already have.  But in the case where the segment has some data
> we already have and some new data, the fin = 1 won't trip because seq
> != seq_from_tap.  There isn't another place that will catch it
> instead, AFAICT.
> 
> I guess it will be fine in the end, because with all the data acked,
> the guest should retransmit the FIN with no data.
> 
> > But going back to FIN segments with data: why should we sum the length
> > to seq_from_tap before comparing the sequence? I don't understand what
> > additional check you want to introduce, or what case you want to cover.  
> 
> I was thinking about the case above, but I didn't explain it very
> well.
> 
> > > Following on from that, it seems to me like it would make sense for
> > > FIN segments to also participate in the 'keep' mechanism.  It should
> > > work eventually, but I expect it would be smoother in the case that we
> > > get a final burst of packets in a stream out of order.  
> > 
> > FIN segments with data already go through that dance.
> > 
> > Without data, I guess you're right, we might have in the same batch
> > (not that I've ever seen it happening in practice) a FIN segment
> > without data that we process first (and now discard because of the
> > sequence number), and some data before that we process later, so we
> > shouldn't throw away the FIN segment because of that. We should,
> > conceptually, reorder it as well.
> > 
> > It probably makes things more complicated for a reason that's not so
> > critical (ignoring a FIN is fine, we'll get another one), and I wanted
> > to have the simplest possible fix here.
> > 
> > Let me see if I can make this entirely correct without a substantially
> > bigger change, I haven't really tried.  
> 
> How about this:
> 
> diff --git a/tcp.c b/tcp.c
> index 7da41797..42e576b4 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1774,10 +1774,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			}
>  		}
>  
> -		if (th->fin)
> -			fin = 1;
> -
> -		if (!len)
> +		if (!len && !th->fin)
>  			continue;
>  
>  		seq_offset = seq_from_tap - seq;
> @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
>  			break;
>  		seq_from_tap += size;
>  		iov_i += count;
> +		if (th->fin)
> +			fin = 1;
>  
>  		if (keep == i)
>  			keep = -1;
> 
> 
> We'd need to double check that the "accept data segment" path is safe
> with len == 0, of course.

For sure it's not before d2c33f45f7be ("tcp: Convert
tcp_data_from_tap() to use iov_tail"), because we might add
zero-length segments to the tcp_iov array, and that would make
backporting an otherwise simple and critical fix to slightly older
versions rather complicated.

After that commit, I'm not sure about the behaviour of
iov_tail_clone(). I think it will return 0, but it should be tested
(that assumption, itself, but also that the fix still works).

> But I think that will treat dataless and
> with-data FINs the same way, and let them use the keep mechanism.

Given that the only advantage of doing this would be to handle a rare
(I guess) corner case, that is, an out-of-sequence FIN segment with
data, which is not critical anyway because the FIN will be
retransmitted, I would rather keep this (critical) fix as it is.

I would suggest filing a separate ticket or anyway sending a separate
patch if you want to fix that other case.

-- 
Stefano


  reply	other threads:[~2025-10-06 22:33 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-02  0:06 [PATCH 0/4] tcp: Fix bad switch to CLOSE-WAIT state and surrounding issues Stefano Brivio
2025-10-02  0:06 ` [PATCH 1/4] tcp: Fix ACK sequence on FIN to tap Stefano Brivio
2025-10-02  2:41   ` David Gibson
2025-10-02 11:58     ` Stefano Brivio
2025-10-03  3:19       ` David Gibson
2025-10-06 22:32         ` Stefano Brivio
2025-10-06 23:31           ` David Gibson
2025-10-02  0:06 ` [PATCH 2/4] tcp: Completely ignore data segment in CLOSE-WAIT state, log a message Stefano Brivio
2025-10-02  2:44   ` David Gibson
2025-10-02  0:06 ` [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Stefano Brivio
2025-10-02  2:52   ` David Gibson
2025-10-02  3:02     ` David Gibson
2025-10-02 11:51       ` Stefano Brivio
2025-10-03  3:43         ` David Gibson
2025-10-06 22:32           ` Stefano Brivio [this message]
2025-10-06 23:34             ` David Gibson
2025-10-02  0:06 ` [PATCH 4/4] tcp: On partial send (incomplete sendmsg()), request a retransmission right away Stefano Brivio
2025-10-02  3:00   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251007003254.57ce5e26@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).