On Wed, Sep 27, 2023 at 07:06:16PM +0200, Stefano Brivio wrote:
> On Mon, 25 Sep 2023 14:57:40 +1000
> David Gibson wrote:
> 
> > On Sat, Sep 23, 2023 at 12:06:10AM +0200, Stefano Brivio wrote:
> > > Signed-off-by: Stefano Brivio
> > > ---
> > >  passt.1 | 33 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > > 
> > > diff --git a/passt.1 b/passt.1
> > > index 1ad4276..bcbe6fd 100644
> > > --- a/passt.1
> > > +++ b/passt.1
> > > @@ -926,6 +926,39 @@ If the sending window cannot be queried, it will always be announced as the
> > >  current sending buffer size to guest or target namespace. This might affect
> > >  throughput of TCP connections.
> > >  
> > > +.SS Tuning for high throughput
> > > +
> > > +On Linux, by default, the maximum memory that can be set for receive and send
> > > +socket buffers is 208 KiB. Those limits are set by the
> > > +\fI/proc/sys/net/core/rmem_max\fR and \fI/proc/sys/net/core/wmem_max\fR files,
> > > +see \fBsocket\fR(7).
> > > +
> > > +As of Linux 6.5, while the TCP implementation can dynamically shrink buffers
> > > +depending on utilisation even above those limits, such a small limit will
> > 
> > "shrink buffers" and "even above those limits" don't seem to quite
> > work together.
> 
> Oops. I guess I should simply s/shrink/grow/ here.

Or "resize" would work too.

> > > +reflect on the advertised TCP window at the beginning of a connection, and the
> > 
> > Hmmm.... while [rw]mem_max might limit that initial window size, I
> > wouldn't expect increasing the limits alone to increase that initial
> > window size: wouldn't that instead be affected by the TCP default
> > buffer size i.e. the middle value in net.ipv4.tcp_rmem?
> 
> If we don't use SO_RCVBUF, yes... but we currently do, and with that,
> we can get a much larger initial window (as we do now).

Good point.

> On the other hand, maybe, as mentioned in my follow-up about 3/5, we
> should drop SO_RCVBUF for TCP sockets.

Ok.

> > > +buffer size of the UNIX domain socket buffer used by \fBpasst\fR cannot exceed
> > > +these limits anyway.
> > > +
> > > +Further, as of Linux 6.5, using socket options \fBSO_RCVBUF\fR and
> > > +\fBSO_SNDBUF\fR will prevent TCP buffers from expanding above the \fIrmem_max\fR and
> > > +\fIwmem_max\fR limits because the automatic adjustment provided by the TCP
> > > +implementation is then disabled.
> > > +
> > > +As a consequence, \fBpasst\fR and \fBpasta\fR probe these limits at start-up and
> > > +will not set TCP socket buffer sizes if they are lower than 2 MiB, because this
> > > +would affect the maximum size of TCP buffers for the whole duration of a
> > > +connection.
> > > +
> > > +Note that 208 KiB is, accounting for kernel overhead, enough to fit less than
> > > +three TCP packets at the default MSS. In applications where high throughput is
> > > +expected, it is therefore advisable to increase those limits to at least 2 MiB,
> > > +or even 16 MiB:
> > > +
> > > +.nf
> > > +	sysctl -w net.core.rmem_max=$((16 << 20))
> > > +	sysctl -w net.core.wmem_max=$((16 << 20))
> > > +.fi
> > 
> > As noted in a previous mail, empirically, this doesn't necessarily
> > seem to work better for me. I'm wondering if we'd be better off never
> > touching SO_RCVBUF and SO_SNDBUF for TCP sockets, and letting the
> > kernel do its adaptive thing. We probably still want to expand the
> > buffers as much as we can for the Unix socket, though. And we likely
> > still want expanded limits for the tests so that iperf3 can use large
> > buffers.
> 
> Right. Let's keep this patch for a later time then, and meanwhile check
> if we should drop SO_RCVBUF, SO_SNDBUF, or both, for TCP sockets.

Makes sense to me.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
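
For concreteness, the direction I have in mind is roughly the sketch
below. This isn't passt's actual code, and it's untested:
sysctl_read_long() and unix_sock_grow_sndbuf() are made-up names for
illustration. The idea is to probe the limit from /proc and enlarge
only the Unix domain socket buffer, while never calling setsockopt()
with SO_RCVBUF or SO_SNDBUF on TCP sockets, so the kernel keeps
resizing those dynamically:

#include <limits.h>
#include <stdio.h>
#include <sys/socket.h>

/* Read a numeric sysctl limit, e.g. /proc/sys/net/core/wmem_max */
static long sysctl_read_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long v = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}

/* Grow the send buffer of the guest-facing Unix domain socket as far
 * as wmem_max allows. TCP sockets are deliberately left untouched:
 * setting SO_SNDBUF/SO_RCVBUF on them would pin the buffer size and
 * disable the kernel's automatic adjustment.
 */
static void unix_sock_grow_sndbuf(int s)
{
	long max = sysctl_read_long("/proc/sys/net/core/wmem_max");
	int v;

	if (max <= 0)
		return;

	v = max > INT_MAX ? INT_MAX : (int)max;
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &v, sizeof(v)))
		perror("setsockopt SO_SNDBUF");
}

Requesting the probed limit directly should be fine, since the kernel
clamps unprivileged SO_SNDBUF requests to wmem_max anyway.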