From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <20072ba530b34729589a3d527c420a766b49e205.camel@redhat.com>
Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF
From: Paolo Abeni
To: Eric Dumazet
CC: kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com,
 lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com,
 netdev@vger.kernel.org, davem@davemloft.net
Date: Tue, 13 Feb 2024 16:28:36 +0100
References: <20240209221233.3150253-1-jmaloy@redhat.com>
 <8d77d8a4e6a37e80aa46cd8df98de84714c384a5.camel@redhat.com>
User-Agent: Evolution 3.50.3 (3.50.3-1.fc39)
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
List-Id: Development discussion and patches for passt

On Tue, 2024-02-13 at 14:34 +0100, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 2:02 PM Paolo Abeni wrote:
> >
> > On Tue, 2024-02-13 at 13:24 +0100, Eric Dumazet wrote:
> > > On Tue, Feb 13, 2024 at 11:49 AM Paolo Abeni wrote:
> > >
> > > > > @@ -2508,7 +2508,10 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> > > > >  		WRITE_ONCE(*seq, *seq + used);
> > > > >  		copied += used;
> > > > >  		len -= used;
> > > > > -
> > > > > +		if (flags & MSG_PEEK)
> > > > > +			sk_peek_offset_fwd(sk, used);
> > > > > +		else
> > > > > +			sk_peek_offset_bwd(sk, used);
> > >
> > > Yet another cache miss in TCP fast path...
> > >
> > > We need to move sk_peek_off in a better location before we accept this patch.
> > >
> > > I always thought MSK_PEEK was very inefficient, I am surprised we
> > > allow arbitrary loops in recvmsg().
> >
> > Let me double check I read the above correctly: are you concerned by
> > the 'skb_queue_walk(&sk->sk_receive_queue, skb) {' loop that could
> > touch a lot of skbs/cachelines before reaching the relevant skb?
> >
> > The end goal here is allowing an user-space application to read
> > incrementally/sequentially the received data while leaving them in
> > receive buffer.
> >
> > I don't see a better option than MSG_PEEK, am I missing something?
>
>
> This sk_peek_offset protocol, needing sk_peek_offset_bwd() in the non
> MSG_PEEK case is very strange IMO.
>
> Ideally, we should read/write over sk_peek_offset only when MSG_PEEK
> is used by the caller.
>
> That would only touch non fast paths.
>
> Since the API is mono-threaded anyway, the caller should not rely on
> the fact that normal recvmsg() call
> would 'consume' sk_peek_offset.

Storing in sk_peek_seq the tcp next sequence number to be peeked should
avoid changes in the non MSG_PEEK cases.

AFAICS that would need a new get_peek_off() sock_op and a bit somewhere
(in sk_flags?) to discriminate when sk_peek_seq is actually set. Would
that be acceptable?

Thanks!

Paolo
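
To make the sk_peek_seq idea above more concrete, here is a rough
user-space toy model, not kernel code and not a proposed patch. Only
sk_peek_seq and get_peek_off() are named in this thread; the toy_*
identifiers, the struct layout and the clamping to 0 when a read
overtakes the peek point are all assumptions made for illustration.
The point it tries to show is that the normal read path only advances
copied_seq, while the offset is derived on demand as
peek_seq - copied_seq.

```c
/* Toy user-space model of the idea; every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_sock {
	uint32_t copied_seq;	/* next sequence number a normal read consumes */
	uint32_t rcv_nxt;	/* one past the last byte received */
	uint32_t peek_seq;	/* next sequence number to peek, if peek_set */
	bool peek_set;		/* stands in for the "bit somewhere" */
};

/* Wrap-safe "a is before b" comparison on 32-bit sequence numbers. */
static bool toy_seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Models setting SO_PEEK_OFF: a negative value disables offset peeking. */
static void toy_set_peek_off(struct toy_sock *sk, int off)
{
	sk->peek_set = off >= 0;
	if (sk->peek_set)
		sk->peek_seq = sk->copied_seq + (uint32_t)off;
}

/* Models the hypothetical get_peek_off(): derive the offset on demand. */
static int toy_get_peek_off(const struct toy_sock *sk)
{
	if (!sk->peek_set)
		return -1;
	if (toy_seq_before(sk->peek_seq, sk->copied_seq))
		return 0;	/* a read already consumed past the peek point */
	return (int)(sk->peek_seq - sk->copied_seq);
}

/* Normal read: touches only copied_seq, never the peek state. */
static uint32_t toy_read(struct toy_sock *sk, uint32_t len)
{
	uint32_t avail, used;

	assert(!toy_seq_before(sk->rcv_nxt, sk->copied_seq));
	avail = sk->rcv_nxt - sk->copied_seq;
	used = len < avail ? len : avail;
	sk->copied_seq += used;
	return used;
}

/* MSG_PEEK read: the only path that reads or writes peek_seq. */
static uint32_t toy_peek(struct toy_sock *sk, uint32_t len)
{
	uint32_t start = sk->copied_seq;
	uint32_t avail, used;

	/* Start at the peek point unless a read has already overtaken it. */
	if (sk->peek_set && !toy_seq_before(sk->peek_seq, start))
		start = sk->peek_seq;

	/* Nothing to peek if the start lies at or beyond the received data. */
	avail = toy_seq_before(start, sk->rcv_nxt) ? sk->rcv_nxt - start : 0;
	used = len < avail ? len : avail;
	if (sk->peek_set)
		sk->peek_seq = start + used;
	return used;
}
```

In this model a normal read still appears to "consume" the offset,
because the derived value shrinks as copied_seq advances, yet
toy_read() never loads or stores the peek fields. Whether that matches
what the real fast path would need is exactly the open question above.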