public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Tim Besard <tim.besard@gmail.com>
Subject: Re: [PATCH 2/2] tcp: Acknowledge keep-alive segments, ignore them for the rest
Date: Thu, 21 Nov 2024 17:21:12 +1100	[thread overview]
Message-ID: <Zz7RWEq02Zc3mSuM@zatzit> (raw)
In-Reply-To: <20241121052617.50cf96ef@elisabeth>

[-- Attachment #1: Type: text/plain, Size: 4616 bytes --]

On Thu, Nov 21, 2024 at 05:26:17AM +0100, Stefano Brivio wrote:
> On Thu, 21 Nov 2024 13:38:09 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Nov 20, 2024 at 07:43:44AM +0100, Stefano Brivio wrote:
> > > On Wed, 20 Nov 2024 12:02:00 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Nov 19, 2024 at 08:53:44PM +0100, Stefano Brivio wrote:  
> > > > > RFC 9293, 3.8.4 says:
> > > > > 
> > > > >    Implementers MAY include "keep-alives" in their TCP implementations
> > > > >    (MAY-5), although this practice is not universally accepted.  Some
> > > > >    TCP implementations, however, have included a keep-alive mechanism.
> > > > >    To confirm that an idle connection is still active, these
> > > > >    implementations send a probe segment designed to elicit a response
> > > > >    from the TCP peer.  Such a segment generally contains SEG.SEQ =
> > > > >    SND.NXT-1 and may or may not contain one garbage octet of data.  If
> > > > >    keep-alives are included, the application MUST be able to turn them
> > > > >    on or off for each TCP connection (MUST-24), and they MUST default to
> > > > >    off (MUST-25).
> > > > > 
> > > > > but currently, tcp_data_from_tap() is not aware of this and will
> > > > > schedule a fast re-transmit on the second keep-alive (because it's
> > > > > also a duplicate ACK), ignoring the fact that the sequence number was
> > > > > rewinded to SND.NXT-1.
> > > > > 
> > > > > ACK these keep-alive segments, reset the activity timeout, and ignore
> > > > > them for the rest.
> > > > > 
> > > > > At some point, we could think of implementing an approximation of
> > > > > keep-alive segments on outbound sockets, for example by setting
> > > > > TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
> > > > > keep-alive segment at approximately the same time, and never reset the
> > > > > connection. That's beyond the scope of this fix, though.
> > > > > 
> > > > > Reported-by: Tim Besard <tim.besard@gmail.com>
> > > > > Link: https://github.com/containers/podman/discussions/24572
> > > > > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > > > > ---
> > > > >  tcp.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/tcp.c b/tcp.c
> > > > > index f357920..1eb85bb 100644
> > > > > --- a/tcp.c
> > > > > +++ b/tcp.c
> > > > > @@ -1763,6 +1763,20 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> > > > >  			continue;
> > > > >  
> > > > >  		seq = ntohl(th->seq);
> > > > > +		if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
> > > > > +			flow_trace(conn,
> > > > > +				   "keep-alive sequence: %u, previous: %u",
> > > > > +				   seq, conn->seq_from_tap);
> > > > > +
> > > > > +			tcp_send_flag(c, conn, ACK);
> > > > > +			tcp_timer_ctl(c, conn);
> > > > > +
> > > > > +			if (p->count == 1)
> > > > > +				return 1;    
> > > > 
> > > > I'm not sure what this test is for.  Shouldn't the continue be sufficient?  
> > > 
> > > I don't think we want to go through tcp_update_seqack_from_tap(),
> > > tcp_tap_window_update() and the like on a keep-alive segment.  
> > 
> > Ah, I see.  But that is an optimisation, right?  It shouldn't be
> > necessary for correctness.
> 
> *Shouldn't*.
> 
> > > But if we receive something else in this batch, that's going to be a
> > > data segment that happened to arrive just after the keep-alive, so, in
> > > that case, we have to do the normal processing, by ignoring just this
> > > segment and hitting 'continue'.
> > > 
> > > Strictly speaking, the 'continue' is enough and correct, but I think
> > > that returning early in the obviously common case is simpler and more
> > > robust.  
> > 
> > Hrm.  Doesn't seem simpler to me, but I can see the point of the
> > change so,
> 
> The code itself is two lines longer, of course, with an additional
> early return. Considering all the possible side effects of looking at
> window values from a keep-alive segment looks to me more complicated
> than the alternative, though.

Except that we *will* consider them if there happen to be other data
packets in the batch.  That seems like it will just make any problems
from processing the keepalive sequence values harder to track down,
not make them go away.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-11-21  8:53 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-19 19:53 [PATCH 0/2] tcp: Handle keep-alives, avoid unnecessary timer scheduling Stefano Brivio
2024-11-19 19:53 ` [PATCH 1/2] tcp: Reset ACK_TO_TAP_DUE flag whenever an ACK isn't needed anymore Stefano Brivio
2024-11-20  0:58   ` David Gibson
2024-11-19 19:53 ` [PATCH 2/2] tcp: Acknowledge keep-alive segments, ignore them for the rest Stefano Brivio
2024-11-20  1:02   ` David Gibson
2024-11-20  6:43     ` Stefano Brivio
2024-11-21  2:38       ` David Gibson
2024-11-21  4:26         ` Stefano Brivio
2024-11-21  4:30           ` Stefano Brivio
2024-11-21  6:21           ` David Gibson [this message]
2024-11-21  9:23             ` Stefano Brivio
2024-11-21  9:32               ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zz7RWEq02Zc3mSuM@zatzit \
    --to=david@gibson.dropbear.id.au \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    --cc=tim.besard@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).