From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by passt.top (Postfix) with ESMTPS id 747945A027B for ; Fri, 16 Feb 2024 10:22:09 +0100 (CET) Received: by mail-ed1-x534.google.com with SMTP id 4fb4d7f45d1cf-563dfa4ccdcso5692a12.1 for ; Fri, 16 Feb 2024 01:22:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1708075329; x=1708680129; darn=passt.top; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3bBdfctTPwqXx3EHXILft1vvn13/jEBJ3nvmxfdVnVk=; b=MWnnqTR8DxNpqq2regeUC3lT6/gPpeEwjeYaOYRJrwp79IF1EFvUY6I90DBMmdw3w2 uCPIuS4UJtufCker0j49ICRXKZ9jSX183NGHy+Z1vx5U9OqFkzQuG+NTqGOQqCyZYbat U7S0nmoxvGlO011ZD0P8QaHd1bqAgvM6s6HYWBgih3gvvfZBS61IFDx4IRWQL/0gBWty 2m4PcjW/9C4O0AGHezeTEt2Ii9CjPdVBzt2oLFP29ij0vycFcS0ZseNk6D0MVCsk/+Ou kNPFVm6Sr5luTH/sJ6wM2wetobuSTqp/x5L5Tx6IZcPTSvIrMERsp3N/H4we04MNpcxK hQeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708075329; x=1708680129; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3bBdfctTPwqXx3EHXILft1vvn13/jEBJ3nvmxfdVnVk=; b=uQfGFIfmZi3Qr9cpJxXGW0+ndABZ+ybKDVs8XKXdespXKJdYna5OILgRjwYDUYEhQX Kabkj0AuXpcBU2ltQDeO6tsbOTsANnFU/NwcpC1BiIxcPPCeMUaakwBXAD9naUZRtpxK 1VOAU5eWhQJzeN7SPK7rX1NgRCrJ2FV+Ijq1wniyyMJpxIvUXRjFbFYrY1vvJE37xAaM 5lt7DTV1oVlNUUBxCixdiwxH/EvNQeb+w7EzgibDDdbV6PLfa0cszvGLPPid3vHYDpIv pcU0zxRAyHbtGGLTPfkzvmHpn9lwrVWuqFoHO3jRCXXpu0uN2+lu2JENV3VzwMY/0tK/ 7+sg== X-Forwarded-Encrypted: i=1; AJvYcCW667oad14ERtdUXUyWfiwAfCEBhEIn/P7Axx2ZPZLiERfVTZuAWawkL8gSw7taTBLAAr5WYWVSbNWnVLjrO1Jq7OTE X-Gm-Message-State: AOJu0YwMgZxpr0cvSt0TMV04eHzDKy3oDshkQl2aFCIGkzeRYr5vpG2w aZJbJvTW44arKYM1AeGus9xOQbPLgLt0AeJ2d7XOdZVFs2hj3W2M6Y6UCdd2UWuK0TQmww8vtBj mojnPtgp8eyZQEc3RhO7Y8AQGzQGfIhbApsKK X-Google-Smtp-Source: AGHT+IHjg/nhYCKRW2LdnagfUJiZzLg6NULr5aVrf4xAGyWJAnai7crxUzqTw55wKXII2jt2TLCJfRpuzv1oGVJKHZQ= X-Received: by 2002:a50:d607:0:b0:563:adf3:f5f4 with SMTP id x7-20020a50d607000000b00563adf3f5f4mr127188edi.1.1708075328663; Fri, 16 Feb 2024 01:22:08 -0800 (PST) MIME-Version: 1.0 References: <20240209221233.3150253-1-jmaloy@redhat.com> <8d77d8a4e6a37e80aa46cd8df98de84714c384a5.camel@redhat.com> <20072ba530b34729589a3d527c420a766b49e205.camel@redhat.com> <725a92b4813242549f2316e6682d3312b5e658d8.camel@redhat.com> <20687849-ec5c-9ce5-0a18-cc80f5b64816@redhat.com> <178b9f2dbb3c56fcfef46a97ea395bdd13ebfb59.camel@redhat.com> <89f263be-3403-8404-69ed-313539d59669@redhat.com> <9cb12376da3f6cd316320b29f294cc84eaba6cfa.camel@redhat.com> In-Reply-To: <9cb12376da3f6cd316320b29f294cc84eaba6cfa.camel@redhat.com> From: Eric Dumazet Date: Fri, 16 Feb 2024 10:21:54 +0100 Message-ID: Subject: Re: [PATCH v3] tcp: add support for SO_PEEK_OFF To: Paolo Abeni Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-MailFrom: edumazet@google.com X-Mailman-Rule-Hits: nonmember-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation Message-ID-Hash: QTYIGVFJ7OK3SO7IOSYRFVTZIQ3Q2BK7 X-Message-ID-Hash: QTYIGVFJ7OK3SO7IOSYRFVTZIQ3Q2BK7 X-Mailman-Approved-At: Fri, 16 Feb 2024 11:51:52 +0100 CC: Jon Maloy , kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, netdev@vger.kernel.org, davem@davemloft.net X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Fri, Feb 16, 2024 at 10:14=E2=80=AFAM Paolo Abeni wr= ote: > > On Thu, 2024-02-15 at 17:24 -0500, Jon Maloy wrote: > > > > On 2024-02-15 12:46, Eric Dumazet wrote: > > > On Thu, Feb 15, 2024 at 6:41=E2=80=AFPM Paolo Abeni wrote: > > > > Note: please send text-only email to netdev. > > > > > > > > On Thu, 2024-02-15 at 10:11 -0500, Jon Maloy wrote: > > > > > I wonder if the following could be acceptable: > > > > > > > > > > if (flags & MSG_PEEK) > > > > > sk_peek_offset_fwd(sk, used); > > > > > else if (peek_offset > 0) > > > > > sk_peek_offset_bwd(sk, used); > > > > > > > > > > peek_offset is already present in the data cache, and if it has= the value > > > > > zero it means either that that sk->sk_peek_off is unused (-1) o= r actually is zero. > > > > > Either way, no rewind is needed in that case. > > > > I agree the above should avoid touching cold cachelines in the > > > > fastpath, and looks functionally correct to me. > > > > > > > > The last word is up to Eric :) > > > > > > > An actual patch seems needed. > > > > > > In the current form, local variable peek_offset is 0 when !MSG_PEEK. > > > > > > So the "else if (peek_offset > 0)" would always be false. > > > > > Yes, of course. This wouldn't work unless we read sk->sk_peek_off at th= e > > beginning of the function. > > I will look at the other suggestions. > > I *think* that moving sk_peek_off this way: > > --- > diff --git a/include/net/sock.h b/include/net/sock.h > index a9d99a9c583f..576a6a6abb03 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -413,7 +413,7 @@ struct sock { > unsigned int sk_napi_id; > #endif > int sk_rcvbuf; > - int sk_disconnects; > + int sk_peek_off; > > struct sk_filter __rcu *sk_filter; > union { > @@ -439,7 +439,7 @@ struct sock { > struct rb_root tcp_rtx_queue; > }; > struct sk_buff_head sk_write_queue; > - __s32 sk_peek_off; > + int sk_disconnects; > int sk_write_pending; > __u32 sk_dst_pending_confirm; > u32 sk_pacing_status; /* see enum sk_pacing *= / > --- > > should avoid problematic accesses, > > The relevant cachelines layout is as follow: > > /* --- cacheline 4 boundary (256 bytes) --- */ > struct sk_buff * tail; /* 256 8 *= / > } sk_backlog; /* 240 24 *= / > int sk_forward_alloc; /* 264 4 *= / > u32 sk_reserved_mem; /* 268 4 *= / > unsigned int sk_ll_usec; /* 272 4 *= / > unsigned int sk_napi_id; /* 276 4 *= / > int sk_rcvbuf; /* 280 4 *= / > int sk_disconnects; /* 284 4 *= / > // will become sk_peek_off > struct sk_filter * sk_filter; /* 288 8 *= / > union { > struct socket_wq * sk_wq; /* 296 8 *= / > struct socket_wq * sk_wq_raw; /* 296 8 *= / > }; /* 296 8 *= / > struct xfrm_policy * sk_policy[2]; /* 304 16 *= / > /* --- cacheline 5 boundary (320 bytes) --- */ > > // ... > > /* --- cacheline 6 boundary (384 bytes) --- */ > __s32 sk_peek_off; /* 384 4 *= / > // will become sk_diconnects > int sk_write_pending; /* 388 4 *= / > __u32 sk_dst_pending_confirm; /* 392 4= */ > u32 sk_pacing_status; /* 396 4 *= / > long int sk_sndtimeo; /* 400 8 *= / > struct timer_list sk_timer; /* 408 40 *= / > > /* XXX last struct has 4 bytes of padding */ > > /* --- cacheline 7 boundary (448 bytes) --- */ > > sk_peek_off will be in the same cachline of sk_forward_alloc / > sk_reserved_mem / backlog tail, that are already touched by the > tcp_recvmsg_locked() main loop. > > WDYT? I was about to send a similar change, also moving sk_rcvtimeo, and adding __cacheline_group_begin()/__cacheline_group_end annotations. I can finish this today.