On Sat, Sep 23, 2023 at 12:06:10AM +0200, Stefano Brivio wrote:
> Signed-off-by: Stefano Brivio
> ---
>  passt.1 | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/passt.1 b/passt.1
> index 1ad4276..bcbe6fd 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -926,6 +926,39 @@ If the sending window cannot be queried, it will always be announced as the
>  current sending buffer size to guest or target namespace. This might affect
>  throughput of TCP connections.
>  
> +.SS Tuning for high throughput
> +
> +On Linux, by default, the maximum memory that can be set for receive and send
> +socket buffers is 208 KiB. Those limits are set by the
> +\fI/proc/sys/net/core/rmem_max\fR and \fI/proc/sys/net/core/wmem_max\fR files,
> +see \fBsocket\fR(7).
> +
> +As of Linux 6.5, while the TCP implementation can dynamically shrink buffers
> +depending on utilisation even above those limits, such a small limit will

"shrink buffers" and "even above those limits" don't seem to quite work
together.

> +reflect on the advertised TCP window at the beginning of a connection, and the

Hmmm.... while [rw]mem_max might limit that initial window size, I
wouldn't expect increasing the limits alone to increase that initial
window size: wouldn't that instead be affected by the TCP default
buffer size, i.e. the middle value in net.ipv4.tcp_rmem?

> +buffer size of the UNIX domain socket buffer used by \fBpasst\fR cannot exceed
> +these limits anyway.
> +
> +Further, as of Linux 6.5, using socket options \fBSO_RCVBUF\fR and
> +\fBSO_SNDBUF\fR will prevent TCP buffers from expanding above the \fIrmem_max\fR
> +and \fIwmem_max\fR limits because the automatic adjustment provided by the TCP
> +implementation is then disabled.
> +
> +As a consequence, \fBpasst\fR and \fBpasta\fR probe these limits at start-up and
> +will not set TCP socket buffer sizes if they are lower than 2 MiB, because this
> +would affect the maximum size of TCP buffers for the whole duration of a
> +connection.
> +
> +Note that 208 KiB is, accounting for kernel overhead, enough to fit less than
> +three TCP packets at the default MSS. In applications where high throughput is
> +expected, it is therefore advisable to increase those limits to at least 2 MiB,
> +or even 16 MiB:
> +
> +.nf
> +	sysctl -w net.core.rmem_max=$((16 << 20))
> +	sysctl -w net.core.wmem_max=$((16 << 20))
> +.fi

As noted in a previous mail, empirically, this doesn't necessarily seem
to work better for me. I'm wondering if we'd be better off never
touching SO_RCVBUF and SO_SNDBUF for TCP sockets, and letting the
kernel do its adaptive thing. We probably still want to expand the
buffers as much as we can for the Unix socket, though.

And we likely still want expanded limits for the tests so that iperf3
can use large buffers.

> +
>  .SH LIMITATIONS
>  
>  Currently, IGMP/MLD proxying (RFC 4605) and support for SCTP (RFC 4960) are not

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you. NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
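
For reference, a minimal sketch of the probe-then-set behaviour the new
section describes: read the net.core limits, and only clamp
SO_RCVBUF/SO_SNDBUF when they allow at least 2 MiB. This is
illustrative C, not passt's actual implementation; the helper names and
the standalone main() are made up, only the /proc paths and the 2 MiB
threshold come from the text above.

#include <stdio.h>
#include <sys/socket.h>

#define BUF_MIN		(2UL << 20)	/* 2 MiB threshold from passt.1 text */

/* Read a single numeric sysctl from /proc, return -1 on failure */
static long sysctl_read(const char *path)
{
	FILE *f = fopen(path, "r");
	long v = -1;

	if (f) {
		if (fscanf(f, "%ld", &v) != 1)
			v = -1;
		fclose(f);
	}
	return v;
}

/* Clamp buffers to the limits, but only if the limits are generous:
 * once SO_RCVBUF/SO_SNDBUF are set, autotuning no longer applies to
 * the socket, and its buffers can never grow past rmem_max/wmem_max
 * again.
 */
static void maybe_set_bufs(int s)
{
	long rmax = sysctl_read("/proc/sys/net/core/rmem_max");
	long wmax = sysctl_read("/proc/sys/net/core/wmem_max");
	int v;

	if (rmax >= (long)BUF_MIN) {
		v = (int)rmax;
		/* The kernel doubles this value to allow space for
		 * bookkeeping overhead, and getsockopt() reports the
		 * doubled value, see socket(7).
		 */
		setsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, sizeof(v));
	}

	if (wmax >= (long)BUF_MIN) {
		v = (int)wmax;
		setsockopt(s, SOL_SOCKET, SO_SNDBUF, &v, sizeof(v));
	}
}

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s >= 0)
		maybe_set_bufs(s);

	return 0;
}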
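
And on the initial-window question above, a quick way to check which
knob actually governs a fresh TCP socket's buffer, assuming the usual
three-value "min default max" format of tcp_rmem as documented in
tcp(7):

#include <stdio.h>

/* Print the three tcp_rmem values: per tcp(7), the middle (default)
 * value, not rmem_max, is what a new TCP socket starts out with
 * before autotuning grows or shrinks it.
 */
int main(void)
{
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");
	long min, def, max;

	if (f && fscanf(f, "%ld %ld %ld", &min, &def, &max) == 3)
		printf("tcp_rmem: min %ld default %ld max %ld\n",
		       min, def, max);
	if (f)
		fclose(f);

	return 0;
}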