From: Eric Dumazet
Date: Tue, 13 Feb 2024 14:34:48 +0100
Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
To: Paolo Abeni
CC: kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com, netdev@vger.kernel.org, davem@davemloft.net
References: <20240209221233.3150253-1-jmaloy@redhat.com> <8d77d8a4e6a37e80aa46cd8df98de84714c384a5.camel@redhat.com>
List-Id: Development discussion and patches for passt

On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni wrote:
>
> On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni wrote:
> >
> > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > >  		WRITE_ONCE(*seq, *seq + used);
> > > >  		copied += used;
> > > >  		len -= used;
> > > > -
> > > > +		if (flags & MSG_PEEK)
> > > > +			sk_peek_offset_fwd(sk, used);
> > > > +		else
> > > > +			sk_peek_offset_bwd(sk, used);
> >
> > Yet another cache miss in TCP fast path...
> >
> > We need to move sk_peek_off in a better location before we accept this patch.
> >
> > I always thought MSG_PEEK was very inefficient, I am surprised we
> > allow arbitrary loops in recvmsg().
>
> Let me double check I read the above correctly: are you concerned by
> the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> touch a lot of skbs/cachelines before reaching the relevant skb?
>
> The end goal here is allowing a user-space application to read
> incrementally/sequentially the received data while leaving them in
> receive buffer.
>
> I don't see a better option than MSG_PEEK, am I missing something?

This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the
non-MSG_PEEK case, is very strange IMO.

Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
is used by the caller.

That would only touch non fast paths.

Since the API is mono-threaded anyway, the caller should not rely on
the fact that a normal recvmsg() call would 'consume' sk_peek_offset.
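
To make the two behaviours concrete, here is a rough user-space sketch. It is a toy model and not kernel code: struct toy_sock, toy_recv(), the peek_off_touches counter and the two policy names are all hypothetical, made up for illustration. TOY_CURRENT mimics what the quoted hunk does (advance the offset on MSG_PEEK, rewind it on a plain read, clamped at zero, which is an assumption about sk_peek_offset_bwd()). TOY_PEEK_ONLY is one possible reading of the suggestion above, where a plain read never looks at the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a stream socket receive queue; all names are hypothetical. */
struct toy_sock {
	const char *queue;         /* unread bytes, head of the queue */
	size_t len;                /* number of unread bytes */
	long peek_off;             /* -1: peek offset disabled, else next peek position */
	unsigned peek_off_touches; /* how many receive calls accessed peek_off */
};

enum toy_policy {
	TOY_CURRENT,   /* plain reads rewind peek_off, as in the quoted hunk */
	TOY_PEEK_ONLY, /* plain reads never touch peek_off (assumed reading) */
};

/* Copy up to 'want' bytes into buf; 'peek' plays the role of MSG_PEEK. */
size_t toy_recv(struct toy_sock *s, char *buf, size_t want,
		int peek, enum toy_policy policy)
{
	size_t start = 0, used;

	assert(s && buf);

	if (peek) {
		s->peek_off_touches++;
		if (s->peek_off >= 0)
			start = (size_t)s->peek_off;
	}
	if (start >= s->len)
		return 0;

	used = s->len - start;
	if (used > want)
		used = want;
	memcpy(buf, s->queue + start, used);

	if (peek) {
		/* analogue of sk_peek_offset_fwd(): data stays queued */
		if (s->peek_off >= 0)
			s->peek_off += (long)used;
		return used;
	}

	/* plain read: data is removed from the head of the queue */
	s->queue += used;
	s->len -= used;

	if (policy == TOY_CURRENT) {
		/* analogue of sk_peek_offset_bwd(): extra access on every read */
		s->peek_off_touches++;
		if (s->peek_off >= 0) {
			s->peek_off -= (long)used;
			if (s->peek_off < 0)
				s->peek_off = 0;
		}
	}
	return used;
}
```

With TOY_CURRENT, every plain read bumps peek_off_touches even when the offset is disabled (-1), which is the fast-path access being objected to. With TOY_PEEK_ONLY the counter only moves on peeks, at the price that the caller has to reset the offset itself after consuming data.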