On Fri, Apr 26, 2024 at 07:58:32AM +0200, Stefano Brivio wrote: > On Fri, 26 Apr 2024 13:27:11 +1000 > David Gibson wrote: > > > On Wed, Apr 24, 2024 at 08:30:44PM +0200, Stefano Brivio wrote: > > > On Wed, 24 Apr 2024 10:48:05 +1000 > > > David Gibson wrote: > > > > > > > On Tue, Apr 23, 2024 at 07:50:10PM +0200, Stefano Brivio wrote: > > > > > On Sat, 20 Apr 2024 15:19:19 -0400 > > > > > Jon Maloy wrote: > > > > [snip] > > > > > > + set_peek_offset(s, 0); > > > > > > > > > > Do we really need to initialise it to zero on a new connection? Extra > > > > > system calls on this path matter for latency of connection > > > > > establishment. > > > > > > > > Sort of, yes: we need to enable the SO_PEEK_OFF behaviour by setting > > > > it to 0, rather than the default -1. > > > > > > By the way of which, this is not documented at this point -- a man page > > > patch (linux-man and linux-api lists) would be nice. > > > > > > > We could lazily enable it, but > > > > we'd need either to a) do it later in the handshake (maybe when we set > > > > ESTABLISHED), but we'd need to be careful it is always set before the > > > > first MSG_PEEK > > > > > > I was actually thinking that we could set it only as we receive data > > > (not every connection will receive data), and keep this out of the > > > handshake (which we want to keep "faster", I think). > > > > That makes sense, but I think it would need a per-connection flag. > > Definitely. > > > > And setting it as we mark a connection as ESTABLISHED should have the > > > same effect on latency as setting it on a new connection -- that's not > > > really lazy. So, actually: > > > > Good point. > > > > > > or b) keep track of whether it's set on a per-socket > > > > basis (this would have the advantage of robustness if we ever > > > > encountered a kernel that weirdly allows it for some but not all TCP > > > > sockets). > > > > > > ...this could be done as we receive data in tcp_data_from_sock(), with > > > a new flag in tcp_tap_conn::flags, to avoid adding latency to the > > > handshake. It also looks more robust to me, and done/checked in a > > > single place where we need it. > > > > > > We have just three bits left there which isn't great, but if we need to > > > save one at a later point, we can drop this new flag easily. > > > > I just realised that folding the feature detection into this is a bit > > costlier than I thought. If we globally probe the feature we just > > need one bit per connection: is SO_PEEK_OFF set yet or not. If we > > tried to probe per-connection we'd need a tristate: haven't tried / > > SO_PEEK_OFF enabled / tried and failed. > > I forgot to mention this part: what I wanted to propose was actually > still a global probe, so that we don't waste one system call per > connection on kernels not supporting this (a substantial use case for a > couple of years from now?), which probably outweighs the advantage of > the weird, purely theoretical kernel not supporting the feature for > some sockets only. > And then something like PEEK_OFFSET_SET (SO_PEEK_OFF_SET sounds awkward > to me) on top. Another advantage is avoiding the tristate you described. Right, having thought it through I agree this is a better approach. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson