From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202412 header.b=Ym0J2wZn; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id B1D1F5A0276 for ; Tue, 21 Jan 2025 13:42:41 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202412; t=1737463346; bh=sTQqnkmgYLzAIZB5HJtP9J2O/jnMujGm4J3GWjf9WZg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Ym0J2wZniajsj1hqDTIXMRNfIJbs1jCCpsecY2tmWdmI8bxjRZkLe6/co6SW52en6 PYdJArsO4yCiqGE7G8xM3TCRefFUdNmIOKO03KL6pyxEKK36Ln4aVi1mGHGjlNyxFo Kri7CjDFKWDmuHj8ZSI90cJYEYaNFtx3R9OCXf4nvBaouxDk7gA8VTabnfDciHFzay JJZQ1wt+mHcLBSjEpXXozwR+eSbktBQIqLl/PQ63FEXywjosJH6wAzTj1hRI9AMyAj rPg0VH9RWmJS2sJpHRy9KUcJRYazWE/4gMLsSADi9VrBpLrU+r2RFd2C5s1z6aL3Lw rNnJqKiWVrDhg== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4Ycn0f2Rrcz4x8Y; Tue, 21 Jan 2025 23:42:26 +1100 (AEDT) Date: Tue, 21 Jan 2025 23:12:05 +1030 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH v2] tcp: Disable Nagle's algorithm (set TCP_NODELAY) on all sockets Message-ID: References: <20250120172816.2102833-1-sbrivio@redhat.com> <20250121094002.7d73f1be@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="GdTH7qoI159LA7s7" Content-Disposition: inline In-Reply-To: <20250121094002.7d73f1be@elisabeth> Message-ID-Hash: MUPL2BCDNHGEAXWQZYGJDZMM7IMBZWD5 X-Message-ID-Hash: MUPL2BCDNHGEAXWQZYGJDZMM7IMBZWD5 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --GdTH7qoI159LA7s7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 21, 2025 at 09:40:02AM +0100, Stefano Brivio wrote: > On Tue, 21 Jan 2025 13:31:03 +1030 > David Gibson wrote: >=20 > > On Mon, Jan 20, 2025 at 06:28:16PM +0100, Stefano Brivio wrote: > > > Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on > > > both sides"), David argues that, in general, we don't know what kind > > > of TCP traffic we're dealing with, on any side or path. > > >=20 > > > TCP segments might have been delivered to our socket with a PSH flag, > > > but we don't have a way to know about it. > > >=20 > > > Similarly, the guest might send us segments with PSH or URG set, but > > > we don't know if we should generally TCP_CORK sockets and uncork on > > > those flags, because that would assume they're running a Linux kernel > > > (and a particular version of it) matching the kernel that delivers > > > outbound packets for us. > > >=20 > > > Given that we can't make any assumption and everything might very well > > > be interactive traffic, disable Nagle's algorithm on all non-spliced > > > sockets as well. > > >=20 > > > After all, John Nagle himself is nowadays recommending that delayed > > > ACKs should never be enabled together with his algorithm, but we > > > don't have a practical way to ensure that our environment is free from > > > delayed ACKs (TCP_QUICKACK is not really usable for this purpose): > > >=20 > > > https://news.ycombinator.com/item?id=3D34180239 > > >=20 > > > Suggested-by: David Gibson > > > Signed-off-by: Stefano Brivio > > > --- > > > v2: Set TCP_NODELAY on inbound socket after accept4(), not on the > > > listening sockets, and change failure message to debug() > > > instead of trace() =20 > >=20 > > Uh.. you've moved it to after accept() for incoming connections, but > > it looks like you've accidentally removed the call for outgoing > > connections (tcp_conn_new_sock()). >=20 > Uh... no? It's here: >=20 > > > tcp.c | 15 +++++++++++++++ > > > 1 file changed, 15 insertions(+) > > >=20 > > > diff --git a/tcp.c b/tcp.c > > > index a012b81..4d6a6b3 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -756,6 +756,19 @@ static void tcp_sock_set_bufsize(const struct ct= x *c, int s) > > > trace("TCP: failed to set SO_SNDBUF to %i", v); > > > } > > > =20 > > > +/** > > > + * tcp_sock_set_nodelay() - Set TCP_NODELAY option (disable Nagle's = algorithm) > > > + * @s: Socket, can be -1 to avoid check in the caller > > > + */ > > > +static void tcp_sock_set_nodelay(int s) > > > +{ > > > + if (s =3D=3D -1) > > > + return; > > > + > > > + if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int))) > > > + debug("TCP: failed to set TCP_NODELAY on socket %i", s); > > > +} > > > + > > > /** > > > * tcp_update_csum() - Calculate TCP checksum > > > * @psum: Unfolded partial checksum of the IPv4 or IPv6 pseudo-header > > > @@ -1285,6 +1298,7 @@ static int tcp_conn_new_sock(const struct ctx *= c, sa_family_t af) > ^^^^^^^^^^^^^^^^^ >=20 > > > return -errno; > > > =20 > > > tcp_sock_set_bufsize(c, s); > > > + tcp_sock_set_nodelay(s); > ^^^^^^^^^^^^^^^^^^^^^^^^ >=20 > ...am I missing something? Oops, no, I am. The page break fell at an inconvenient spot which made my eye skate over the relevant hunk, sorry. >=20 > > > =20 > > > return s; > > > } > > > @@ -2058,6 +2072,7 @@ void tcp_listen_handler(const struct ctx *c, un= ion epoll_ref ref, > > > goto cancel; > > > =20 > > > tcp_sock_set_bufsize(c, s); > > > + tcp_sock_set_nodelay(s); > > > =20 > > > /* FIXME: When listening port has a specific bound address, record = that > > > * as our address =20 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --GdTH7qoI159LA7s7 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmePlhEACgkQzQJF27ox 2GePHxAAnbDaXgaR63gQs153gPBRJROSWaoPTQ2BANdh8dsbMtUjqFY0+ZCim2CU 2wr7XtZGcD8+bZr5OmAdrjRTEJmpE2PqaL4JqM3ZBUGA1pmtYwcmxrGoNFaL7wo4 miLX6g4Mxd/iKd1XoSkoznAjv9iY55Crgu48rE44pesnRg3PJtXF2Pmd2J9U5n0y /kQp7qmG095wphDJoHod7hRNBAWZLZG5sUhxI543riC4bEi4RzxYTy8R1sW/FsHy wAUFqI8fgP4E0QfIS30h5OKQP0+MPU7U47Uu+OTyuIIwxP/VnYQfwQsMBC1dW3E6 430j0Jaf5VqwBj+RRCZiKq59D2YNnEJFomvv0TuK1Eoqi0P11Io/F90H+y/oyhx8 GpC05XEuVwNUZ0adqUKCfYpF3R7k410HK3Cs8DbXdKNRHDkhFQhOlh7ercdminNg 9KIP7Rwp2QC9CDl48PgzKS6E71qYq5813Hv59G4Y9GQmYxPssObPCaFnFaLGZLL8 YDTWL8hMic/kgvX0aO+k7TBv9CG1Ior8/DVQzYKtLvSP6DrXfM44gu/8BYAA/RSk wGSLKZBEMBX8L80TMrEjudCedGsWrQd2N4JBKgkuoh4VvpLYq3XzYHyIsnjAUbjf W2JOJOM3Hrh7zSrXKt2xjMC479Phff/HkGnVdRWKA9nDmbsb4hM= =J2rC -----END PGP SIGNATURE----- --GdTH7qoI159LA7s7--