From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 28 Sep 2023 12:02:27 +1000
From: David Gibson
To: Stefano Brivio
Cc: Matej Hrica, passt-dev@passt.top
Subject: Re: [PATCH RFT 5/5] passt.1: Add note about tuning rmem_max and wmem_max for throughput
References: <20230922220610.58767-1-sbrivio@redhat.com>
 <20230922220610.58767-6-sbrivio@redhat.com>
 <20230927190616.24821407@elisabeth>
In-Reply-To: <20230927190616.24821407@elisabeth>
List-Id: Development discussion and patches for passt
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="YVBjtjKaipL0oMvn"
Content-Disposition: inline

--YVBjtjKaipL0oMvn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

On Wed, Sep 27, 2023 at 07:06:16PM +0200, Stefano Brivio wrote:
> On Mon, 25 Sep 2023 14:57:40 +1000
> David Gibson wrote:
>
> > On Sat, Sep 23, 2023 at 12:06:10AM +0200, Stefano Brivio wrote:
> > > Signed-off-by: Stefano Brivio
> > > ---
> > >  passt.1 | 33 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > >
> > > diff --git a/passt.1 b/passt.1
> > > index 1ad4276..bcbe6fd 100644
> > > --- a/passt.1
> > > +++ b/passt.1
> > > @@ -926,6 +926,39 @@ If the sending window cannot be queried, it will always be announced as the
> > >  current sending buffer size to guest or target namespace. This might affect
> > >  throughput of TCP connections.
> > >
> > > +.SS Tuning for high throughput
> > > +
> > > +On Linux, by default, the maximum memory that can be set for receive and send
> > > +socket buffers is 208 KiB. Those limits are set by the
> > > +\fI/proc/sys/net/core/rmem_max\fR and \fI/proc/sys/net/core/wmem_max\fR files,
> > > +see \fBsocket\fR(7).
> > > +
> > > +As of Linux 6.5, while the TCP implementation can dynamically shrink buffers
> > > +depending on utilisation even above those limits, such a small limit will
> >
> > "shrink buffers" and "even above those limits" don't seem to quite
> > work together.
>
> Oops. I guess I should simply s/shrink/grow/ here.

Or "resize" would work too.

> > > +reflect on the advertised TCP window at the beginning of a connection, and the
> >
> > Hmmm.... while [rw]mem_max might limit that initial window size, I
> > wouldn't expect increasing the limits alone to increase that initial
> > window size: wouldn't that instead be affected by the TCP default
> > buffer size i.e. the middle value in net.ipv4.tcp_rmem?
>
> If we don't use SO_RCVBUF, yes... but we currently do, and with that,
> we can get a much larger initial window (as we do now).

Good point.
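
For anyone following along, here is a rough way to see that interaction
from userspace. This is only an illustrative sketch, not passt code: the
helper names read_rmem_max() and probe_rcvbuf() are made up for this
mail, and it assumes a Linux host where /proc/sys/net/core/rmem_max is
readable. It compares the receive buffer size a fresh TCP socket starts
with (the tcp_rmem default) against what getsockopt() reports once
SO_RCVBUF has been set explicitly.

```python
import socket

RMEM_MAX_PATH = "/proc/sys/net/core/rmem_max"  # Linux-specific location


def read_rmem_max(path=RMEM_MAX_PATH):
    """Return net.core.rmem_max in bytes, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def probe_rcvbuf(requested):
    """Return (default, effective) SO_RCVBUF sizes for a fresh TCP socket.

    'default' is the size reported before any setsockopt() call;
    'effective' is the size reported after asking for 'requested' bytes.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return default, effective


if __name__ == "__main__":
    limit = read_rmem_max()
    default, effective = probe_rcvbuf(16 << 20)  # ask for 16 MiB
    print(f"rmem_max={limit} default={default} effective={effective}")
```

Per socket(7), Linux doubles the value passed to SO_RCVBUF to allow for
bookkeeping overhead and caps the request at rmem_max, so on an
unprivileged process I'd expect "effective" to track rmem_max rather
than the 16 MiB asked for, unless the limit has been raised.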

> On the other hand, maybe, as mentioned in my follow-up about 3/5, we
> should drop SO_RCVBUF for TCP sockets.

Ok.

> > > +buffer size of the UNIX domain socket buffer used by \fBpasst\fR cannot exceed
> > > +these limits anyway.
> > > +
> > > +Further, as of Linux 6.5, using socket options \fBSO_RCVBUF\fR and
> > > +\fBSO_SNDBUF\fR will prevent TCP buffers to expand above the \fIrmem_max\fR and
> > > +\fIwmem_max\fR limits because the automatic adjustment provided by the TCP
> > > +implementation is then disabled.
> > > +
> > > +As a consequence, \fBpasst\fR and \fBpasta\fR probe these limits at start-up and
> > > +will not set TCP socket buffer sizes if they are lower than 2 MiB, because this
> > > +would affect the maximum size of TCP buffers for the whole duration of a
> > > +connection.
> > > +
> > > +Note that 208 KiB is, accounting for kernel overhead, enough to fit less than
> > > +three TCP packets at the default MSS. In applications where high throughput is
> > > +expected, it is therefore advisable to increase those limits to at least 2 MiB,
> > > +or even 16 MiB:
> > > +
> > > +.nf
> > > +	sysctl -w net.core.rmem_max=$((16 << 20)
> > > +	sysctl -w net.core.wmem_max=$((16 << 20)
> > > +.fi
> >
> > As noted in a previous mail, empirically, this doesn't necessarily
> > seem to work better for me.  I'm wondering if we'd be better off never
> > touching RCFBUF and SNDBUF for TCP sockets, and letting the kernel do
> > its adaptive thing.  We probably still want to expand the buffers as
> > much as we can for the Unix socket, though.  And we likely still want
> > expanded limits for the tests so that iperf3 can use large buffers
>
> Right. Let's keep this patch for a later time then, and meanwhile check
> if we should drop SO_RCVBUF, SO_SNDBUF, or both, for TCP sockets.

Makes sense to me.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you.
NOT _the_ _other_ | _way_ _around_!
http://www.ozlabs.org/~dgibson

--YVBjtjKaipL0oMvn
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmUU3qwACgkQzQJF27ox
2Gc3pg//bIMfBUSf6LmLn4zOL8o0bh17c+cKR5wsaa+zb0tLEOCGPNmwZ4nRmuS3
FpVYbgtoEdjYD0ZDSfddSG+ezTVVRXVSdNSRP+/y4eqMDFs52OGawi7nkp6z7bHN
FDci/Ab6pPRu6MLt+ygm1Qopc1IT5EPnD1J2PzPnz+gEMfHwzMyrDMhDmvvUUJn7
8/svXyi89BGqIOFZRYQPRVX+Dkyq4CRdhN9v8Sl2XDANEqVp7wQ1l70Q/FdQeOwK
DPOWXuFuRWLzaxsWFLx7IGcc47FRA4CVWqOyUaGgYjT5LnTPu0jvBDhy3GTPwQ+2
ugXzMalk4z3+7hOHrGTDyOHooN9/43s/R07gcW4zg2FpbbXLEUf7yr0CVJ3Jd7Zw
16SQAtsUOBDi67qaVa3XIHkar8xNuuC5cVXNSMzFXkQdIIcq0WCQA7ZRAcMKmJ98
P+woQ2mzG9m82uXJLuBOrk3LzATEnTt4phF7nwG4OMihmFup3OIIu+5a/TZelmII
gXWYE5ZLTbAdUkcz7R/WjCVq4BxMfO0AW37brJRXSbLRMISAaTn6/esm4foHHjOz
QP/lm6HaSGdx7FwIALZ06WJoNFJ+tsVCwTXW3B01aQivDcDpl1gpZg5GOXmQQTnk
hQh/C/Pg4Q+0CdGMKDOJyYY8UxVQeHPc5giHr81tMdqjE3ehn60=
=dbgv
-----END PGP SIGNATURE-----

--YVBjtjKaipL0oMvn--