Date: Mon, 29 Apr 2024 11:46:01 +1000
From: David Gibson
To: Stefano Brivio
Cc: Jon Maloy, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH 1/2] tcp: leverage support of SO_PEEK_OFF socket option when available
References: <20240420191920.104876-1-jmaloy@redhat.com> <20240420191920.104876-2-jmaloy@redhat.com> <20240423195010.2b4d5c13@elisabeth> <20240424203044.2df748d7@elisabeth> <20240426075832.093aac78@elisabeth>
In-Reply-To: <20240426075832.093aac78@elisabeth>
List-Id: Development discussion and patches for passt

On Fri, Apr 26, 2024 at 07:58:32AM +0200, Stefano Brivio wrote:
> On Fri, 26 Apr 2024 13:27:11 +1000
> David Gibson wrote:
>
> > On Wed, Apr 24, 2024 at 08:30:44PM +0200, Stefano Brivio wrote:
> > > On Wed, 24 Apr 2024 10:48:05 +1000
> > > David Gibson wrote:
> > >
> > > > On Tue, Apr 23, 2024 at 07:50:10PM +0200, Stefano Brivio wrote:
> > > > > On Sat, 20 Apr 2024 15:19:19 -0400
> > > > > Jon Maloy wrote:
> > > > [snip]
> > > > > > +	set_peek_offset(s, 0);
> > > > >
> > > > > Do we really need to initialise it to zero on a new connection? Extra
> > > > > system calls on this path matter for latency of connection
> > > > > establishment.
> > > >
> > > > Sort of, yes: we need to enable the SO_PEEK_OFF behaviour by setting
> > > > it to 0, rather than the default -1.
> > >
> > > By the way of which, this is not documented at this point -- a man page
> > > patch (linux-man and linux-api lists) would be nice.
> > >
> > > > We could lazily enable it, but
> > > > we'd need either to a) do it later in the handshake (maybe when we set
> > > > ESTABLISHED), but we'd need to be careful it is always set before the
> > > > first MSG_PEEK
> > >
> > > I was actually thinking that we could set it only as we receive data
> > > (not every connection will receive data), and keep this out of the
> > > handshake (which we want to keep "faster", I think).
> >
> > That makes sense, but I think it would need a per-connection flag.
>
> Definitely.
>
> > > And setting it as we mark a connection as ESTABLISHED should have the
> > > same effect on latency as setting it on a new connection -- that's not
> > > really lazy. So, actually:
> >
> > Good point.
> >
> > > > or b) keep track of whether it's set on a per-socket
> > > > basis (this would have the advantage of robustness if we ever
> > > > encountered a kernel that weirdly allows it for some but not all TCP
> > > > sockets).
> > >
> > > ...this could be done as we receive data in tcp_data_from_sock(), with
> > > a new flag in tcp_tap_conn::flags, to avoid adding latency to the
> > > handshake. It also looks more robust to me, and done/checked in a
> > > single place where we need it.
> > >
> > > We have just three bits left there which isn't great, but if we need to
> > > save one at a later point, we can drop this new flag easily.
> >
> > I just realised that folding the feature detection into this is a bit
> > costlier than I thought. If we globally probe the feature we just
> > need one bit per connection: is SO_PEEK_OFF set yet or not. If we
> > tried to probe per-connection we'd need a tristate: haven't tried /
> > SO_PEEK_OFF enabled / tried and failed.
>
> I forgot to mention this part: what I wanted to propose was actually
> still a global probe, so that we don't waste one system call per
> connection on kernels not supporting this (a substantial use case for a
> couple of years from now?), which probably outweighs the advantage of
> the weird, purely theoretical kernel not supporting the feature for
> some sockets only.
> And then something like PEEK_OFFSET_SET (SO_PEEK_OFF_SET sounds awkward
> to me) on top. Another advantage is avoiding the tristate you described.

Right, having thought it through I agree this is a better approach.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
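
For reference, a minimal sketch of the approach agreed on in this thread: one global probe for SO_PEEK_OFF support, plus a single per-connection bit recording whether the offset has been enabled yet, set lazily on the data path rather than during the handshake. This is not the actual passt code. `struct conn` is a hypothetical stand-in for `struct tcp_tap_conn`, and `peek_offset_cap`, `probe_peek_offset()` and `conn_enable_peek_offset()` are illustrative names. The `set_peek_offset()` signature is assumed from the quoted call, and the bit value of `PEEK_OFFSET_SET` is arbitrary.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Name proposed in the thread; the bit position here is an assumption */
#define PEEK_OFFSET_SET	(1U << 0)

/* Hypothetical stand-in for struct tcp_tap_conn */
struct conn {
	int sock;
	uint8_t flags;
};

/* Result of the global probe: does this kernel accept SO_PEEK_OFF on TCP? */
static bool peek_offset_cap;

/* Assumed signature, inferred from the set_peek_offset(s, 0) call in the patch */
static int set_peek_offset(int s, int offset)
{
#ifdef SO_PEEK_OFF
	return setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &offset, sizeof(offset));
#else
	(void)s;
	(void)offset;
	errno = ENOPROTOOPT;
	return -1;
#endif
}

/* Run once at start-up, so unsupported kernels never pay a per-connection
 * system call.
 */
static void probe_peek_offset(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0) {
		peek_offset_cap = false;
		return;
	}

	/* Setting the offset to 0 (default is -1) enables the behaviour */
	peek_offset_cap = (set_peek_offset(s, 0) == 0);
	close(s);
}

/* Call on the data path (e.g. before the first MSG_PEEK), not in the
 * handshake. Only one bit of state is needed per connection.
 */
static void conn_enable_peek_offset(struct conn *c)
{
	if (!peek_offset_cap || (c->flags & PEEK_OFFSET_SET))
		return;

	/* On failure the flag stays clear and the next call retries:
	 * recording "tried and failed" would need the tristate discussed
	 * above.
	 */
	if (set_peek_offset(c->sock, 0) == 0)
		c->flags |= PEEK_OFFSET_SET;
}
```

With this split, a connection that never receives data costs no extra system call, and a kernel without the feature costs exactly one failed setsockopt() at start-up.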