Date: Mon, 25 Sep 2023 15:52:11 +1000
From: David Gibson
To: Stefano Brivio
Cc: Matej Hrica, passt-dev@passt.top
Subject: Re: [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers
In-Reply-To: <20230922220610.58767-1-sbrivio@redhat.com>

On Sat, Sep 23, 2023 at 12:06:05AM +0200, Stefano Brivio wrote:
> The fundamental patch here is 3/5, which is a workaround for a rather
> surprising kernel behaviour we seem to be hitting. This all comes from
> the investigation around https://bugs.passt.top/show_bug.cgi?id=74.
>
> I can't hit stalls anymore and throughput looks finally good to me
> (~3.5gbps with 208 KiB rmem_max and wmem_max), but... please test.

Write-side issues, testing results:

I replied with a bunch of test information already, but that was all
related specifically to the read-side issue: I used 16 MiB wmem_max
throughout, but limited the read-side buffer either with rmem_max or
SO_RCVBUF.

I've now done some tests looking specifically for write-side issues. I
basically reversed the setup, with rmem_max set to 4 MiB throughout,
but wmem_max limited to 256 KiB (the per-socket analogue of this setup
is sketched below).
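For reference, here is a minimal C sketch of that per-socket analogue,
capping the buffers with SO_SNDBUF/SO_RCVBUF instead of the global
sysctls. It's purely illustrative, not the harness actually used for
the numbers below:

  /* Illustrative only: cap a TCP socket's buffers the way
   * wmem_max/rmem_max cap them globally */
  #include <stdio.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <unistd.h>

  int main(void)
  {
          int s = socket(AF_INET, SOCK_STREAM, 0);
          int sndbuf = 256 * 1024;        /* 256 KiB, as in the test */
          int rcvbuf = 4 * 1024 * 1024;   /* 4 MiB */
          socklen_t len = sizeof(sndbuf);

          if (s < 0) {
                  perror("socket");
                  return 1;
          }

          /* Per tcp(7), set buffer sizes before connect() or listen()
           * for them to take effect; requests above wmem_max/rmem_max
           * are silently clamped to those limits */
          if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                         &sndbuf, sizeof(sndbuf)))
                  perror("setsockopt(SO_SNDBUF)");
          if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                         &rcvbuf, sizeof(rcvbuf)))
                  perror("setsockopt(SO_RCVBUF)");

          /* The kernel doubles the requested value to allow for
           * bookkeeping overhead, and reports the doubled value back */
          if (!getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len))
                  printf("effective SO_SNDBUF: %d\n", sndbuf);

          close(s);
          return 0;
  }

(SO_SNDBUFFORCE/SO_RCVBUFFORCE can exceed the sysctl limits, but need
CAP_NET_ADMIN.)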
With no patches applied, I easily get a stall, although the exact
details are a little different from the read-side stall: rather than
traffic staying consistently at zero, there are a few small bursts on
both sides.

With 2/5 applied, there doesn't appear to be much difference in
behaviour.

With 3/5 applied, I can no longer reproduce stalls, but throughput
isn't very good.

With 4/5 applied, throughput improves notably (from ~300 Mbps to
~2.5 Gbps, though, not surprisingly, it varies from second to second).

Tentative conclusions:

 * The primary cause of the stalls appears to be the kernel bug
   identified, where the window isn't properly recalculated after
   MSG_TRUNC. 3/5 appears to successfully work around that bug. I
   think getting that merged is our top priority.

 * 2/5 makes logical sense to me, but I don't see much evidence of it
   changing the behaviour here. I think we should hold it back for
   now, polish it a bit, and maybe reconsider it as part of a broader
   rethink of the STALLED flag.

 * 4/5 doesn't appear to be linked to the stalls per se, but does
   appear to generally improve behaviour with limited wmem_max. I
   think we can improve the implementation a bit, then look at merging
   it as the second priority.

Open questions:

 * Even with the fixes, why does a very large rmem_max seem to cause
   wildly variable and not particularly good throughput?

 * Why does explicitly limiting RCVBUF usually, but not always, cause
   very poor throughput, though without stalling?

 * Given the above oddities, is there any value in us setting RCVBUF
   for TCP sockets, rather than just letting the kernel adapt it?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson