From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 14 Feb 2024 10:34:50 +1100
From: David Gibson
To: Eric Dumazet
Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
References: <20240209221233.3150253-1-jmaloy@redhat.com>
 <8d77d8a4e6a37e80aa46cd8df98de84714c384a5.camel@redhat.com>
 <20072ba530b34729589a3d527c420a766b49e205.camel@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="lI5RsEVqDHCNRRnC"
Content-Disposition: inline
CC: Paolo Abeni, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com,
 lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com,
 netdev@vger.kernel.org, davem@davemloft.net
List-Id: Development discussion and patches for passt

--lI5RsEVqDHCNRRnC
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Tue, Feb 13, 2024 at 04:49:01PM +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni wrote:
> >
> > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni wrote:
> > > >
> > > > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni wrote:
> > > > >
> > > > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > > >  		WRITE_ONCE(*seq, *seq + used);
> > > > > > >  		copied += used;
> > > > > > >  		len -= used;
> > > > > > > -
> > > > > > > +		if (flags & MSG_PEEK)
> > > > > > > +			sk_peek_offset_fwd(sk, used);
> > > > > > > +		else
> > > > > > > +			sk_peek_offset_bwd(sk, used);
> > > > >
> > > > > Yet another cache miss in TCP fast path...
> > > > >
> > > > > We need to move sk_peek_off in a better location before we accept this patch.
> > > > >
> > > > > I always thought MSK_PEEK was very inefficient, I am surprised we
> > > > > allow arbitrary loops in recvmsg().
> > > >
> > > > Let me double check I read the above correctly: are you concerned by
> > > > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > > > touch a lot of skbs/cachelines before reaching the relevant skb?
> > > >
> > > > The end goal here is allowing an user-space application to read
> > > > incrementally/sequentially the received data while leaving them in
> > > > receive buffer.
> > > >
> > > > I don't see a better option than MSG_PEEK, am I missing something?
> > >
> > >
> > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > MSG_PEEK case is very strange IMO.
> > >
> > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > is used by the caller.
> > >
> > > That would only touch non fast paths.
> > >
> > > Since the API is mono-threaded anyway, the caller should not rely on
> > > the fact that normal recvmsg() call
> > > would 'consume' sk_peek_offset.
> >
> > Storing in sk_peek_seq the tcp next sequence number to be peeked should
> > avoid changes in the non MSG_PEEK cases.
> >
> > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > that be acceptable?
>
> We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
>
> The new semantic would be : Supported by TCP (so far), and tcp
> recvmsg() only reads/writes this field when MSG_PEEK is used.
> Applications would have to clear the values themselves.

Those semantics would likely defeat the purpose of using SO_PEEK_OFF
for our use case, since we'd need an additional setsockopt() for every
non-PEEK recv() (which are all MSG_TRUNC in our case).

> BTW I see the man pages say SO_PEEK_OFF is "is currently supported
> only for unix(7) sockets"

Yes, this patch is explicitly aiming to change that.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--lI5RsEVqDHCNRRnC
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmXL/HMACgkQzQJF27ox
2GdsixAAoZ5PJR+8aM96Vzj7TC/TuVc4buPXC0Hx6Q7rEkbNOaWZJHTRziyzbO8x
edfyHa+383wtNI3XkOyExm5HPS5X59Smc0WKvpukm1hv/JsWHhjf3dGuSoO+HKgy
3D6mfuXfCz3rCTpJHqpkEvMk084771XzDE4KfSUdRVu3m3Bhl7HnAlhBS0PssOMr
c2cHxmuzFASfk6lMO5jqRjZkgFN0o5LOL14/EoWLZBN/c2VkIh0Ie1FeyqsZQfa8
GDS5StjkJKsmXWL4Wc0lVQHxorqIPjDDLsT/8IBDn7aYVqTzHKSo63WpklXKzNeh
pcNWU9d7Jz6wcLlPVKAbzDrOcJhhg82oS+RQPGAxu2lH6r+h6rUV5TnOvAZf3mZj
eWJjuJEO6uOpn4grP+4jQ49sDCOWTIPvkBqjGmQP8XjDadIzwerZ1w7+w+k4zSVd
GD7OutqQP2UOpILXEEct6VVICm4GGzUbH8/fnqJlSh38fw/1PJjJiTrOg2SL2OoU
yDZnFYi9xDhy60xd0vq8wSqNDxKwbEgNOF3RVgfF9UaDdOv70PqntOmMjG5UKB40
S/nA2sQohEVgUoqizCW7H3oWtpzIgpeQ428BCNIfda8NwiKnQhGxnpu3j1miOngs
8Hy7Q1zfPoeG68c6bke8PAcHHlZyzhjgSsO17yWD/iQVigLM6qE=
=7qtX
-----END PGP SIGNATURE-----

--lI5RsEVqDHCNRRnC--