public inbox for passt-dev@passt.top
* tcp_splice SO_RCVLOWAT code; never invoked?
@ 2025-04-11  4:54 David Gibson
  2025-04-11  6:13 ` Stefano Brivio
  0 siblings, 1 reply; 3+ messages in thread
From: David Gibson @ 2025-04-11  4:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: passt-dev


Hi Stefano,

When debugging the splice EINTR bug I fixed the other day, I found the
whole tcp_splice_sock_handler() pretty confusing to follow.  So, I was
working on some cleanups.  But then I noticed something more
specifically odd here.

We've discussed the use of SO_RCVLOWAT previously.  AIUI, you found it
essential to achieve reasonable throughput and load for spliced
connections.  I think we've agreed before that it's not entirely the
right tool for the job; just the only one available.

Except... as far as I can tell, it's never invoked.  AFAICT the only
place we enable the RCVLOWAT stuff is in a block under this if:

	if (!(conn->flags & lowat_set_flag) &&
	    readlen > (long)c->tcp.pipe_size / 10) {

But... this occurs immediately after:
	if (readlen >= (long)c->tcp.pipe_size * 10 / 100)
		continue;

... which is a strictly more inclusive condition, so we'll never reach
the RCVLOWAT block.
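
To spell out the arithmetic (my own worked example, not code from the
tree):

	/* e.g. with pipe_size == 1048576 (1 MiB):
	 *   1048576 / 10       == 104857
	 *   1048576 * 10 / 100 == 104857
	 * so readlen > pipe_size / 10 implies
	 * readlen >= pipe_size * 10 / 100, and we take the
	 * 'continue' before ever testing the RCVLOWAT guard.
	 */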

To confirm, I tried putting an ASSERT(0) in that block, and didn't hit
it with spliced iperf3 runs.

Am I missing something?

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson


* Re: tcp_splice SO_RCVLOWAT code; never invoked?
  2025-04-11  4:54 tcp_splice SO_RCVLOWAT code; never invoked? David Gibson
@ 2025-04-11  6:13 ` Stefano Brivio
  2025-04-11  6:15   ` Stefano Brivio
  0 siblings, 1 reply; 3+ messages in thread
From: Stefano Brivio @ 2025-04-11  6:13 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri, 11 Apr 2025 14:54:50 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Hi Stefano,
> 
> When debugging the splice EINTR bug I fixed the other day, I found the
> whole tcp_splice_sock_handler() pretty confusing to follow.  So, I was
> working on some cleanups.  But then I noticed something more
> specifically odd here.
> 
> We've discussed the use of SO_RCVLOWAT previously.  AIUI, you found it
> essential to achieve reasonable throughput and load for spliced
> connections.

From my tests back then (never on what I ended up committing, it seems)
it wasn't needed in general; it helped only with bulk transfers that
never feel the pipe for some reason. With iperf3, I needed to play with
parameters quite a bit to reproduce something like that. You would need
(at least) to disable Nagle's algorithm (-N) and send small messages
(say, -l 4k instead of -l 1M).
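
Something along these lines, as a starting point (illustrative command
line, not the exact one I used back then; <target> is a placeholder):

	$ iperf3 -c <target> -N -l 4k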

> I think we've agreed before that it's not entirely the
> right tool for the job; just the only one available.
> 
> Except... as far as I can tell, it's never invoked.  AFAICT the only
> place we enable the RCVLOWAT stuff is in a block under this if:
> 
> 	if (!(conn->flags & lowat_set_flag) &&
> 	    readlen > (long)c->tcp.pipe_size / 10) {
> 
> But... this occurs immediately after:
> 	if (readlen >= (long)c->tcp.pipe_size * 10 / 100)
> 		continue;
> 
> ... which is a strictly more inclusive condition, so we'll never reach
> the RCVLOWAT block.

Right, yes, I think we noticed this a while ago, a bit after trying to
restore the functionality with 01b6a164d94f ("tcp_splice: A typo three
years ago and SO_RCVLOWAT is gone"). That fix wasn't sufficient.

> To confirm, I tried putting an ASSERT(0) in that block, and didn't hit
> it with spliced iperf3 runs.
> 
> Am I missing something?

No, not really, and that error has been there forever, since I "added"
(not really) the feature in 904b86ade7db ("tcp: Rework window handling,
timers, add SO_RCVLOWAT and pools for sockets/pipes").

My intention was actually:

			if (readlen >= (long)c->tcp.pipe_size * 90 / 100)
				continue;

or something like that, perhaps even 50%. The idea behind it was: if we
already have good pipe utilisation, there's no need for SO_RCVLOWAT, and
we should retry calling splice() right away. But at this point we should
try again with iperf3 and smaller messages. Perhaps even limiting the
throughput (-b) with multiple flows...
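
Roughly the shape I had in mind, as a sketch only: 's' (the receiving
socket), the /4 low-water mark, and the flag update are all illustrative
here, not actual tcp_splice.c code:

	if (readlen >= (long)c->tcp.pipe_size * 90 / 100)
		continue;	/* pipe well utilised: retry splice() */

	if (!(conn->flags & lowat_set_flag) &&
	    readlen > (long)c->tcp.pipe_size / 10) {
		int lowat = c->tcp.pipe_size / 4;	/* made-up value */

		/* don't wake us up until a decent batch is ready */
		if (setsockopt(s, SOL_SOCKET, SO_RCVLOWAT,
			       &lowat, sizeof(lowat)))
			;	/* best effort: carry on without it */
		conn->flags |= lowat_set_flag;
	}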

-- 
Stefano



* Re: tcp_splice SO_RCVLOWAT code; never invoked?
  2025-04-11  6:13 ` Stefano Brivio
@ 2025-04-11  6:15   ` Stefano Brivio
  0 siblings, 0 replies; 3+ messages in thread
From: Stefano Brivio @ 2025-04-11  6:15 UTC (permalink / raw)
  To: David Gibson; +Cc: passt-dev

On Fri, 11 Apr 2025 08:13:23 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri, 11 Apr 2025 14:54:50 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Hi Stefano,
> > 
> > When debugging the splice EINTR bug I fixed the other day, I found the
> > whole tcp_splice_sock_handler() pretty confusing to follow.  So, I was
> > working on some cleanups.  But then I noticed something more
> > specifically odd here.
> > 
> > We've discussed the use of SO_RCVLOWAT previously.  AIUI, you found it
> > essential to achieve reasonable throughput and load for spliced
> > connections.  
> 
> From my tests back then (never on what I ended up committing, it seems)
> it wasn't needed in general, it helped only with bulk transfers that
> never feel the pipe for some reason. With iperf3, I needed to play with

fill, of course

-- 
Stefano


