From: Eric Dumazet
To: Paolo Abeni
CC: kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com, netdev@vger.kernel.org, davem@davemloft.net
Date: Tue, 13 Feb 2024 20:31:34 +0100
Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
In-Reply-To: <725a92b4813242549f2316e6682d3312b5e658d8.camel@redhat.com>

On Tue, Feb 13, 2024 at 7:39 PM Paolo Abeni wrote:
>
> On Tue, 2024-02-13 at 16:49 +0100, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 4:28 PM Paolo Abeni wrote:
> > > On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> > > > This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> > > > MSG_PEEK case is very strange IMO.
> > > >
> > > > Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> > > > is used by the caller.
> > > >
> > > > That would only touch non-fast paths.
> > > >
> > > > Since the API is mono-threaded anyway, the caller should not rely on
> > > > the fact that a normal recvmsg() call would 'consume' sk_peek_offset.
> > >
> > > Storing in sk_peek_seq the next TCP sequence number to be peeked should
> > > avoid changes in the non-MSG_PEEK cases.
> > >
> > > AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
> > > (in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
> > > that be acceptable?
> >
> > We could have a parallel SO_PEEK_OFFSET option, reusing the same socket field.
> >
> > The new semantic would be: supported by TCP (so far), and tcp
> > recvmsg() only reads/writes this field when MSG_PEEK is used.
> > Applications would have to clear the values themselves.
>
> I feel like there is some misunderstanding, or at least I can't follow.
> Let me be more verbose, to try to clarify my reasoning.
>
> Two consecutive recvmsg(MSG_PEEK) calls for TCP after SO_PEEK_OFF will
> return adjacent data. AFAICS this is the same semantic currently
> implemented by UDP and unix sockets.
>
> Currently 'sk_peek_off' maintains the next offset to be peeked into the
> current receive queue. To implement the above behaviour, tcp_recvmsg()
> has to update 'sk_peek_off' after MSG_PEEK, to move the offset to the
> next data, and after a plain read, to account for the data removed from
> the receive queue.
>
> I proposed to introduce a tcp-specific set_peek_off doing something
> like:
>
> WRITE_ONCE(sk->sk_peek_off, tcp_sk(sk)->copied_seq + val);
>
> so that recvmsg will need to update sk_peek_off only for MSG_PEEK,
> while retaining the semantic described above.
>
> To keep the userspace interface unchanged, that will need a paired
> tcp_get_peek_off(), so that getsockopt(SO_PEEK_OFF) could return to the
> user a plain offset. An additional bit flag will be needed to store the
> information "the user-space enabled peek with offset".
>
> I don't understand how a setsockopt(PEEK_OFFSET) variant would help
> avoid touching sk->sk_peek_offset?
>

I was trying to avoid using extra storage; I was not trying to
implement the alternative myself :0)

If recvmsg(MSG_PEEK) is supposed to auto-advance the peek_offset,
we probably need more than a mere 32-bit field.

> Thanks!
>
> Paolo
>
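To compare the two bookkeeping schemes discussed in this thread, here is a hypothetical userspace model; it is not kernel code, and every struct and function name in it is an assumption for illustration. Scheme A keeps a relative offset that each plain read must shift backwards (as sk_peek_offset_bwd() does). Scheme B, loosely following the proposed set_peek_off()/tcp_get_peek_off() pair, stores an absolute sequence number plus a separate enable bit, so only the MSG_PEEK path writes peek state. The clamp-at-zero rule in scheme B is an assumption chosen to mirror scheme A's behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: names only echo struct sock / struct tcp_sock.
 * Sequence numbers are u32 and wrap modulo 2^32. */
struct peek_model {
	uint32_t copied_seq;   /* next byte a plain recvmsg() would return */
	int32_t  peek_off;     /* scheme A: offset from copied_seq, <0 = off */
	uint32_t peek_seq;     /* scheme B: absolute seq of next byte to peek */
	bool     peek_enabled; /* scheme B: extra "SO_PEEK_OFF enabled" bit */
};

/* ---- Scheme A: relative offset, fixed up on every plain read ---- */
static void a_set_peek_off(struct peek_model *m, int32_t val)
{
	m->peek_off = val;
}

/* MSG_PEEK of n bytes; returns the sequence number the peek starts at. */
static uint32_t a_peek(struct peek_model *m, uint32_t n)
{
	if (m->peek_off < 0)
		return m->copied_seq;          /* no SO_PEEK_OFF: peek at head */
	assert(n <= (uint32_t)(INT32_MAX - m->peek_off)); /* no overflow */
	uint32_t start = m->copied_seq + (uint32_t)m->peek_off;
	m->peek_off += (int32_t)n;             /* cf. sk_peek_offset_fwd() */
	return start;
}

static void a_read(struct peek_model *m, uint32_t n)
{
	m->copied_seq += n;
	if (m->peek_off >= 0)                  /* cf. sk_peek_offset_bwd() */
		m->peek_off = (uint32_t)m->peek_off > n ?
			      m->peek_off - (int32_t)n : 0;
}

static int32_t a_get_peek_off(const struct peek_model *m)
{
	return m->peek_off;
}

/* ---- Scheme B: absolute sequence number, written only on MSG_PEEK ---- */
static void b_set_peek_off(struct peek_model *m, int32_t val)
{
	m->peek_enabled = val >= 0;
	m->peek_seq = m->copied_seq + (uint32_t)(val < 0 ? 0 : val);
}

/* peek_seq - copied_seq as a signed 32-bit difference (before()-style),
 * clamped at 0 once plain reads have consumed past peek_seq. */
static uint32_t b_rel(const struct peek_model *m)
{
	int32_t d = (int32_t)(m->peek_seq - m->copied_seq);

	return d > 0 ? (uint32_t)d : 0;
}

static uint32_t b_peek(struct peek_model *m, uint32_t n)
{
	if (!m->peek_enabled)
		return m->copied_seq;
	uint32_t start = m->copied_seq + b_rel(m);
	m->peek_seq = start + n;               /* only MSG_PEEK writes state */
	return start;
}

static void b_read(struct peek_model *m, uint32_t n)
{
	m->copied_seq += n;                    /* peek state left untouched */
}

/* getsockopt(SO_PEEK_OFF) still reports a plain relative offset. */
static int32_t b_get_peek_off(const struct peek_model *m)
{
	return m->peek_enabled ? (int32_t)b_rel(m) : -1;
}
```

Starting both models from the same copied_seq (for instance 0xFFFFFFF0, so that the sequence space wraps) and applying the same peeks and reads should yield identical reported offsets. Scheme B needs its enable bit next to the 32-bit sequence number because every u32 value is a valid sequence number, so none can double as "disabled" the way -1 does for the relative offset.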