Date: Mon, 25 Sep 2023 15:52:11 +1000
From: David Gibson
To: Stefano Brivio
Cc: Matej Hrica, passt-dev@passt.top
Subject: Re: [PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers
In-Reply-To: <20230922220610.58767-1-sbrivio@redhat.com>

On Sat, Sep 23, 2023 at 12:06:05AM +0200, Stefano Brivio wrote:
> The fundamental patch here is 3/5, which is a workaround for a rather
> surprising kernel behaviour we seem to be hitting. This all comes from
> the investigation around https://bugs.passt.top/show_bug.cgi?id=74.
>
> I can't hit stalls anymore and throughput looks finally good to me
> (~3.5gbps with 208 KiB rmem_max and wmem_max), but... please test.

Write-side issues, testing results:

I replied with a bunch of test information already, but that was all
related specifically to the read-side issue: I used 16 MiB wmem_max
throughout, but limited the read-side buffer either with rmem_max or
SO_RCVBUF.

I've now done some tests looking specifically for write-side issues. I
basically reversed the setup, with rmem_max set to 4 MiB throughout,
but wmem_max limited to 256 KiB (the per-socket analogue of this setup
is sketched below).
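For reference, here is a minimal C sketch of that per-socket analogue,
capping the buffers with SO_SNDBUF/SO_RCVBUF instead of the global
sysctls. It's purely illustrative, not the harness actually used for
the numbers below:

  /* Illustrative only: cap a TCP socket's buffers the way
   * wmem_max/rmem_max cap them globally */
  #include <stdio.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <unistd.h>

  int main(void)
  {
          int s = socket(AF_INET, SOCK_STREAM, 0);
          int sndbuf = 256 * 1024;        /* 256 KiB, as in the test */
          int rcvbuf = 4 * 1024 * 1024;   /* 4 MiB */
          socklen_t len = sizeof(sndbuf);

          if (s < 0) {
                  perror("socket");
                  return 1;
          }

          /* Per tcp(7), set buffer sizes before connect() or listen()
           * for them to take effect; requests above wmem_max/rmem_max
           * are silently clamped to those limits */
          if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                         &sndbuf, sizeof(sndbuf)))
                  perror("setsockopt(SO_SNDBUF)");
          if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                         &rcvbuf, sizeof(rcvbuf)))
                  perror("setsockopt(SO_RCVBUF)");

          /* The kernel doubles the requested value to allow for
           * bookkeeping overhead, and reports the doubled value back */
          if (!getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len))
                  printf("effective SO_SNDBUF: %d\n", sndbuf);

          close(s);
          return 0;
  }

(SO_SNDBUFFORCE/SO_RCVBUFFORCE can exceed the sysctl limits, but need
CAP_NET_ADMIN.)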
With no patches applied, I easily get a stall, although the exact
details are a little different from the read-side stall: rather than
traffic staying consistently at zero, there are a few small bursts on
both sides.

With 2/5 applied, there doesn't appear to be much difference in
behaviour.

With 3/5 applied, I can no longer reproduce stalls, but throughput
isn't very good.

With 4/5 applied, throughput improves notably (from ~300 Mbps to
~2.5 Gbps, though, not surprisingly, it varies from second to second).

Tentative conclusions:

 * The primary cause of the stalls appears to be the kernel bug
   identified, where the window isn't properly recalculated after
   MSG_TRUNC. 3/5 appears to successfully work around that bug. I
   think getting that merged is our top priority.

 * 2/5 makes logical sense to me, but I don't see much evidence of it
   changing the behaviour here. I think we should hold it back for
   now, polish it a bit, and maybe reconsider it as part of a broader
   rethink of the STALLED flag.

 * 4/5 doesn't appear to be linked to the stalls per se, but does
   appear to generally improve behaviour with limited wmem_max. I
   think we can improve the implementation a bit, then look at merging
   it as the second priority.

Open questions:

 * Even with the fixes, why does a very large rmem_max seem to cause
   wildly variable and not particularly good throughput?

 * Why does explicitly limiting RCVBUF usually, but not always, cause
   very poor throughput, though without stalling?

 * Given the above oddities, is there any value in us setting RCVBUF
   for TCP sockets, rather than just letting the kernel adapt it?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson