Date: Tue, 21 Jan 2025 23:12:25 +1030
From: David Gibson
To: Stefano Brivio
CC: passt-dev@passt.top
Subject: Re: [PATCH v2] tcp: Disable Nagle's algorithm (set TCP_NODELAY) on all sockets
In-Reply-To: <20250120172816.2102833-1-sbrivio@redhat.com>
References: <20250120172816.2102833-1-sbrivio@redhat.com>
List-Id: Development discussion and patches for passt

On Mon, Jan 20, 2025 at 06:28:16PM +0100, Stefano Brivio wrote:
> Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on
> both sides"), David argues that, in general, we don't know what kind
> of TCP traffic we're dealing with, on any side or path.
>
> TCP segments might have been delivered to our socket with a PSH flag,
> but we don't have a way to know about it.
>
> Similarly, the guest might send us segments with PSH or URG set, but
> we don't know if we should generally TCP_CORK sockets and uncork on
> those flags, because that would assume they're running a Linux kernel
> (and a particular version of it) matching the kernel that delivers
> outbound packets for us.
>
> Given that we can't make any assumption and everything might very well
> be interactive traffic, disable Nagle's algorithm on all non-spliced
> sockets as well.
>
> After all, John Nagle himself is nowadays recommending that delayed
> ACKs should never be enabled together with his algorithm, but we
> don't have a practical way to ensure that our environment is free from
> delayed ACKs (TCP_QUICKACK is not really usable for this purpose):
>
> 	https://news.ycombinator.com/item?id=34180239
>
> Suggested-by: David Gibson
> Signed-off-by: Stefano Brivio

Reviewed-by: David Gibson

> ---
> v2: Set TCP_NODELAY on inbound socket after accept4(), not on the
>     listening sockets, and change failure message to debug()
>     instead of trace()
>
>  tcp.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/tcp.c b/tcp.c
> index a012b81..4d6a6b3 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -756,6 +756,19 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>  		trace("TCP: failed to set SO_SNDBUF to %i", v);
>  }
>
> +/**
> + * tcp_sock_set_nodelay() - Set TCP_NODELAY option (disable Nagle's algorithm)
> + * @s:		Socket, can be -1 to avoid check in the caller
> + */
> +static void tcp_sock_set_nodelay(int s)
> +{
> +	if (s == -1)
> +		return;
> +
> +	if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
> +		debug("TCP: failed to set TCP_NODELAY on socket %i", s);
> +}
> +
>  /**
>   * tcp_update_csum() - Calculate TCP checksum
>   * @psum:	Unfolded partial checksum of the IPv4 or IPv6 pseudo-header
> @@ -1285,6 +1298,7 @@ static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
>  		return -errno;
>
>  	tcp_sock_set_bufsize(c, s);
> +	tcp_sock_set_nodelay(s);
>
>  	return s;
>  }
> @@ -2058,6 +2072,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref,
>  		goto cancel;
>
>  	tcp_sock_set_bufsize(c, s);
> +	tcp_sock_set_nodelay(s);
>
>  	/* FIXME: When listening port has a specific bound address, record that
>  	 * as our address

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
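
For readers who want to try the option outside of passt, here is a minimal standalone sketch of what the patch does per socket: set TCP_NODELAY, then read it back with getsockopt() to confirm it took effect. The names nodelay_enable(), nodelay_is_set() and nodelay_demo() are hypothetical and not part of passt. The sketch uses IPPROTO_TCP as the option level where the patch uses SOL_TCP, and reports failures through return values instead of passt's debug().

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from passt): disable Nagle's algorithm on
 * socket s. Returns 0 on success, -1 on failure (errno set).
 */
static int nodelay_enable(int s)
{
	int one = 1;

	return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: read the option back. Returns 1 if TCP_NODELAY
 * is set, 0 if it is not, -1 if getsockopt() fails.
 */
static int nodelay_is_set(int s)
{
	int v = 0;
	socklen_t len = sizeof(v);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &v, &len))
		return -1;

	return v != 0;
}

/* Create an unconnected TCP socket, enable TCP_NODELAY on it and check
 * that the kernel reports the option as set. Returns 0 on success, -1
 * if the socket could not be created or the option could not be set.
 */
int nodelay_demo(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;

	if (nodelay_enable(s)) {
		close(s);
		return -1;
	}

	assert(nodelay_is_set(s) == 1);

	close(s);
	return 0;
}
```

Calling nodelay_demo() from a small main() is enough to exercise it. As in the patch, the option is applied to each connected or accepted socket individually, so the sketch does not rely on a listening socket passing it on.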