public inbox for passt-dev@passt.top
From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Jon Maloy <jmaloy@redhat.com>,
	passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH 1/2] tcp: leverage support of SO_PEEK_OFF socket option when available
Date: Mon, 29 Apr 2024 11:46:01 +1000
Message-ID: <Zi772TTk2vwWttso@zatzit>
In-Reply-To: <20240426075832.093aac78@elisabeth>


On Fri, Apr 26, 2024 at 07:58:32AM +0200, Stefano Brivio wrote:
> On Fri, 26 Apr 2024 13:27:11 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Apr 24, 2024 at 08:30:44PM +0200, Stefano Brivio wrote:
> > > On Wed, 24 Apr 2024 10:48:05 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Apr 23, 2024 at 07:50:10PM +0200, Stefano Brivio wrote:  
> > > > > On Sat, 20 Apr 2024 15:19:19 -0400
> > > > > Jon Maloy <jmaloy@redhat.com> wrote:    
> > > > [snip]  
> > > > > > +	set_peek_offset(s, 0);    
> > > > > 
> > > > > Do we really need to initialise it to zero on a new connection? Extra
> > > > > system calls on this path matter for latency of connection
> > > > > establishment.    
> > > > 
> > > > Sort of, yes: we need to enable the SO_PEEK_OFF behaviour by setting
> > > > it to 0, rather than the default -1.  
> > > 
> > > By the way of which, this is not documented at this point -- a man page
> > > patch (linux-man and linux-api lists) would be nice.
> > >   
> > > > We could lazily enable it, but
> > > > we'd need either to a) do it later in the handshake (maybe when we set
> > > > ESTABLISHED), but we'd need to be careful it is always set before the
> > > > first MSG_PEEK  
> > > 
> > > I was actually thinking that we could set it only as we receive data
> > > (not every connection will receive data), and keep this out of the
> > > handshake (which we want to keep "faster", I think).  
> > 
> > That makes sense, but I think it would need a per-connection flag.
> 
> Definitely.
> 
> > > And setting it as we mark a connection as ESTABLISHED should have the
> > > same effect on latency as setting it on a new connection -- that's not
> > > really lazy. So, actually:  
> > 
> > Good point.
> > 
> > > > or b) keep track of whether it's set on a per-socket
> > > > basis (this would have the advantage of robustness if we ever
> > > > encountered a kernel that weirdly allows it for some but not all TCP
> > > > sockets).  
> > > 
> > > ...this could be done as we receive data in tcp_data_from_sock(), with
> > > a new flag in tcp_tap_conn::flags, to avoid adding latency to the
> > > handshake. It also looks more robust to me, and done/checked in a
> > > single place where we need it.
> > > 
> > > We have just three bits left there which isn't great, but if we need to
> > > save one at a later point, we can drop this new flag easily.  
> > 
> > I just realised that folding the feature detection into this is a bit
> > costlier than I thought.  If we globally probe the feature we just
> > need one bit per connection: is SO_PEEK_OFF set yet or not.  If we
> > tried to probe per-connection we'd need a tristate: haven't tried /
> > SO_PEEK_OFF enabled / tried and failed.
> 
> I forgot to mention this part: what I wanted to propose was actually
> still a global probe, so that we don't waste one system call per
> connection on kernels not supporting this (a substantial use case for a
> couple of years from now?), which probably outweighs the advantage of
> the weird, purely theoretical kernel not supporting the feature for
> some sockets only.
> 
> And then something like PEEK_OFFSET_SET (SO_PEEK_OFF_SET sounds awkward
> to me) on top. Another advantage is avoiding the tristate you described.

Right, having thought it through I agree this is a better approach.
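
For concreteness, here is a minimal sketch of how the two pieces could fit
together: one global probe at startup, plus a single per-connection bit
that is set lazily the first time data is read. Apart from SO_PEEK_OFF and
the set_peek_offset() and PEEK_OFFSET_SET names mentioned in this thread,
everything here is an assumption made for illustration (struct conn_sketch,
peek_offset_cap, peek_offset_probe(), conn_peek_offset_lazy(), and the bit
position). It is not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef SO_PEEK_OFF
#define SO_PEEK_OFF 42		/* Linux value, for older libc headers */
#endif

/* Hypothetical flag bit: SO_PEEK_OFF already enabled on this socket */
#define PEEK_OFFSET_SET	(1U << 0)

/* Hypothetical stand-in for the real connection structure */
struct conn_sketch {
	int sock;
	uint8_t flags;
};

/* Result of the one-time global probe */
static bool peek_offset_cap;

/* Set the peek offset on socket s; returns 0 on success, -1 on error */
static int set_peek_offset(int s, int offset)
{
	return setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &offset, sizeof(offset));
}

/* Probe once, on a throwaway TCP socket, whether SO_PEEK_OFF is accepted */
static void peek_offset_probe(void)
{
	int s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

	if (s < 0) {
		peek_offset_cap = false;
		return;
	}

	peek_offset_cap = !set_peek_offset(s, 0);
	close(s);
}

/* Call before the first MSG_PEEK on a connection, not during the handshake.
 * Costs no system call if the kernel lacks support or the bit is set.
 */
static int conn_peek_offset_lazy(struct conn_sketch *conn)
{
	if (!peek_offset_cap || (conn->flags & PEEK_OFFSET_SET))
		return 0;

	if (set_peek_offset(conn->sock, 0))
		return -1;

	conn->flags |= PEEK_OFFSET_SET;
	return 0;
}
```

With this shape, a kernel without support pays for one setsockopt() at
startup and none per connection, and the per-connection state stays a
single bit rather than a tristate.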

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
