From: Eric Dumazet <edumazet@google.com>
Date: Tue, 13 Feb 2024 16:49:01 +0100
Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
To: Paolo Abeni
CC: kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20072ba530b34729589a3d527c420a766b49e205.camel@redhat.com>
List-Id: Development discussion and patches for passt

On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni wrote:
>
> On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni wrote:
> > >
> > > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni wrote:
> > > >
> > > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > >                 WRITE_ONCE(*seq, *seq + used);
> > > > > >                 copied += used;
> > > > > >                 len -= used;
> > > > > > -
> > > > > > +               if (flags & MSG_PEEK)
> > > > > > +                       sk_peek_offset_fwd(sk, used);
> > > > > > +               else
> > > > > > +                       sk_peek_offset_bwd(sk, used);
> > > >
> > > > Yet another cache miss in TCP fast path...
> > > >
> > > > We need to move sk_peek_off in a better location before we accept this patch.
> > > >
> > > > I always thought MSG_PEEK was very inefficient, I am surprised we
> > > > allow arbitrary loops in recvmsg().
> > >
> > > Let me double check I read the above correctly: are you concerned by
> > > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > > touch a lot of skbs/cachelines before reaching the relevant skb?
> > >
> > > The end goal here is allowing a user-space application to read
> > > incrementally/sequentially the received data while leaving them in
> > > receive buffer.
> > >
> > > I don't see a better option than MSG_PEEK, am I missing something?
> >
> >
> > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > MSG_PEEK case is very strange IMO.
> >
> > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > is used by the caller.
> >
> > That would only touch non fast paths.
> >
> > Since the API is mono-threaded anyway, the caller should not rely on
> > the fact that normal recvmsg() call
> > would 'consume' sk_peek_offset.
>
> Storing in sk_peek_seq the tcp next sequence number to be peeked should
> avoid changes in the non MSG_PEEK cases.
>
> AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> that be acceptable?

We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.

The new semantic would be: supported by TCP (so far), and tcp
recvmsg() only reads/writes this field when MSG_PEEK is used.
Applications would have to clear the values themselves.

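To make the difference between the two semantics concrete, here is a rough user-space model (not kernel code). Everything named model_* is hypothetical and exists only for this illustration. The backward adjustment follows my understanding of sk_peek_offset_bwd() (decrement, clamped at zero, only when the offset is enabled). How a parallel SO_PEEK_OFFSET option would interpret the field is an assumption here: the model simply keeps it as an offset from the current queue head and leaves it untouched on a normal read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a receive queue with a peek offset. */
struct model_sock {
	const char *data;	/* queued bytes, head first */
	size_t len;		/* number of queued bytes */
	int peek_off;		/* -1 means peek offset disabled */
};

/* Assumed to mirror sk_peek_offset_bwd(): decrement, clamp at 0. */
static void model_peek_offset_bwd(struct model_sock *sk, int val)
{
	if (sk->peek_off >= 0) {
		int off = sk->peek_off - val;

		sk->peek_off = off > 0 ? off : 0;
	}
}

/* Counterpart of sk_peek_offset_fwd(): advance an enabled offset. */
static void model_peek_offset_fwd(struct model_sock *sk, int val)
{
	if (sk->peek_off >= 0)
		sk->peek_off += val;
}

/* MSG_PEEK-like read: copy from the peek offset, then advance it. */
static size_t model_peek(struct model_sock *sk, char *buf, size_t n)
{
	size_t skip = sk->peek_off > 0 ? (size_t)sk->peek_off : 0;

	assert(buf != NULL);
	if (skip >= sk->len)
		return 0;
	if (n > sk->len - skip)
		n = sk->len - skip;
	memcpy(buf, sk->data + skip, n);
	model_peek_offset_fwd(sk, (int)n);
	return n;
}

/*
 * Normal read: consume n bytes from the queue head.
 * adjust_offset != 0 models the existing SO_PEEK_OFF behaviour,
 * adjust_offset == 0 models the proposal where the fast path never
 * touches the field.
 */
static size_t model_read(struct model_sock *sk, char *buf, size_t n,
			 int adjust_offset)
{
	assert(buf != NULL);
	if (n > sk->len)
		n = sk->len;
	memcpy(buf, sk->data, n);
	sk->data += n;
	sk->len -= n;
	if (adjust_offset)
		model_peek_offset_bwd(sk, (int)n);
	return n;
}
```

In this model, with "abcdefgh" queued, peeking 4 bytes and then reading 2 leaves the offset at 2 under the existing semantic, so the next peek continues at "ef". Without the adjustment the offset stays at 4 and the next peek returns "gh", which is why the application would have to reset the value itself after a normal read.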
BTW, I see the man pages say SO_PEEK_OFF "is currently supported only for unix(7) sockets".