Date: Mon, 25 Sep 2023 14:57:40 +1000
From: David Gibson
To: Stefano Brivio
Cc: Matej Hrica, passt-dev@passt.top
Subject: Re: [PATCH RFT 5/5] passt.1: Add note about tuning rmem_max and wmem_max for throughput
In-Reply-To: <20230922220610.58767-6-sbrivio@redhat.com>
References: <20230922220610.58767-1-sbrivio@redhat.com> <20230922220610.58767-6-sbrivio@redhat.com>

On Sat, Sep 23, 2023 at 12:06:10AM +0200, Stefano Brivio wrote:
> Signed-off-by: Stefano Brivio
> ---
>  passt.1 | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/passt.1 b/passt.1
> index 1ad4276..bcbe6fd 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -926,6 +926,39 @@ If the sending window cannot be queried, it will always be announced as the
>  current sending buffer size to guest or target namespace. This might affect
>  throughput of TCP connections.
>  
> +.SS Tuning for high throughput
> +
> +On Linux, by default, the maximum memory that can be set for receive and send
> +socket buffers is 208 KiB. Those limits are set by the
> +\fI/proc/sys/net/core/rmem_max\fR and \fI/proc/sys/net/core/wmem_max\fR files,
> +see \fBsocket\fR(7).
> +
> +As of Linux 6.5, while the TCP implementation can dynamically shrink buffers
> +depending on utilisation even above those limits, such a small limit will

"shrink buffers" and "even above those limits" don't seem to quite work
together.

> +reflect on the advertised TCP window at the beginning of a connection, and the

Hmmm.... while [rw]mem_max might limit that initial window size, I
wouldn't expect increasing the limits alone to increase that initial
window size: wouldn't that instead be affected by the TCP default
buffer size, i.e. the middle value in net.ipv4.tcp_rmem?

> +buffer size of the UNIX domain socket buffer used by \fBpasst\fR cannot exceed
> +these limits anyway.
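To make that concrete, here's a rough, untested sketch (not passt code,
just an illustration): on a fresh TCP socket, getsockopt() reports the
tcp_rmem default, while an explicit SO_RCVBUF request is clamped to
rmem_max (and then doubled by the kernel to account for bookkeeping
overhead):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int v;
	socklen_t len = sizeof(v);

	if (s < 0)
		return 1;

	/* Default receive buffer: the middle value of net.ipv4.tcp_rmem
	 * (typically 131072), regardless of net.core.rmem_max */
	getsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, &len);
	printf("default SO_RCVBUF: %d\n", v);

	/* Explicit request: clamped to net.core.rmem_max, then doubled by
	 * the kernel; this also disables receive buffer auto-tuning for
	 * this socket */
	v = 16 << 20;
	setsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, sizeof(v));
	len = sizeof(v);
	getsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, &len);
	printf("SO_RCVBUF after asking for 16 MiB: %d\n", v);

	close(s);
	return 0;
}

So, if I have this right, raising rmem_max alone wouldn't change what a
connection starts with; it only raises the ceiling for explicit
setsockopt() requests.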
> +
> +Further, as of Linux 6.5, using socket options \fBSO_RCVBUF\fR and
> +\fBSO_SNDBUF\fR will prevent TCP buffers to expand above the \fIrmem_max\fR and
> +\fIwmem_max\fR limits because the automatic adjustment provided by the TCP
> +implementation is then disabled.
> +
> +As a consequence, \fBpasst\fR and \fBpasta\fR probe these limits at start-up and
> +will not set TCP socket buffer sizes if they are lower than 2 MiB, because this
> +would affect the maximum size of TCP buffers for the whole duration of a
> +connection.
> +
> +Note that 208 KiB is, accounting for kernel overhead, enough to fit less than
> +three TCP packets at the default MSS. In applications where high throughput is
> +expected, it is therefore advisable to increase those limits to at least 2 MiB,
> +or even 16 MiB:
> +
> +.nf
> +	sysctl -w net.core.rmem_max=$((16 << 20))
> +	sysctl -w net.core.wmem_max=$((16 << 20))
> +.fi

As noted in a previous mail, empirically, this doesn't necessarily seem
to work better for me.  I'm wondering if we'd be better off never
touching SO_RCVBUF and SO_SNDBUF for TCP sockets, and letting the kernel
do its adaptive thing.  We probably still want to expand the buffers as
much as we can for the Unix socket, though.  And we likely still want
expanded limits for the tests, so that iperf3 can use large buffers.

> +
>  .SH LIMITATIONS
>  
>  Currently, IGMP/MLD proxying (RFC 4605) and support for SCTP (RFC 4960) are not

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson