Date: Fri, 26 Apr 2024 07:58:32 +0200
From: Stefano Brivio
To: David Gibson
CC: Jon Maloy, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH 1/2] tcp: leverage support of SO_PEEK_OFF socket option when available
Message-ID: <20240426075832.093aac78@elisabeth>
References: <20240420191920.104876-1-jmaloy@redhat.com> <20240420191920.104876-2-jmaloy@redhat.com> <20240423195010.2b4d5c13@elisabeth> <20240424203044.2df748d7@elisabeth>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Fri, 26 Apr 2024 13:27:11 +1000
David Gibson wrote:

> On Wed, Apr 24, 2024 at 08:30:44PM +0200, Stefano Brivio wrote:
> > On Wed, 24 Apr 2024 10:48:05 +1000
> > David Gibson wrote:
> >
> > > On Tue, Apr 23, 2024 at 07:50:10PM +0200, Stefano Brivio wrote:
> > > > On Sat, 20 Apr 2024 15:19:19 -0400
> > > > Jon Maloy wrote:
> > > [snip]
> > > > > +	set_peek_offset(s, 0);
> > > >
> > > > Do we really need to initialise it to zero on a new connection? Extra
> > > > system calls on this path matter for latency of connection
> > > > establishment.
> > >
> > > Sort of, yes: we need to enable the SO_PEEK_OFF behaviour by setting
> > > it to 0, rather than the default -1.
> >
> > By the way of which, this is not documented at this point -- a man page
> > patch (linux-man and linux-api lists) would be nice.
> >
> > > We could lazily enable it, but
> > > we'd need either to a) do it later in the handshake (maybe when we set
> > > ESTABLISHED), but we'd need to be careful it is always set before the
> > > first MSG_PEEK
> >
> > I was actually thinking that we could set it only as we receive data
> > (not every connection will receive data), and keep this out of the
> > handshake (which we want to keep "faster", I think).
>
> That makes sense, but I think it would need a per-connection flag.

Definitely.

> > And setting it as we mark a connection as ESTABLISHED should have the
> > same effect on latency as setting it on a new connection -- that's not
> > really lazy. So, actually:
>
> Good point.
>
> > > or b) keep track of whether it's set on a per-socket
> > > basis (this would have the advantage of robustness if we ever
> > > encountered a kernel that weirdly allows it for some but not all TCP
> > > sockets).
> >
> > ...this could be done as we receive data in tcp_data_from_sock(), with
> > a new flag in tcp_tap_conn::flags, to avoid adding latency to the
> > handshake. It also looks more robust to me, and done/checked in a
> > single place where we need it.
> >
> > We have just three bits left there which isn't great, but if we need to
> > save one at a later point, we can drop this new flag easily.
>
> I just realised that folding the feature detection into this is a bit
> costlier than I thought.  If we globally probe the feature we just
> need one bit per connection: is SO_PEEK_OFF set yet or not.  If we
> tried to probe per-connection we'd need a tristate: haven't tried /
> SO_PEEK_OFF enabled / tried and failed.

I forgot to mention this part: what I wanted to propose was actually
still a global probe, so that we don't waste one system call per
connection on kernels not supporting this (a substantial use case for a
couple of years from now?), which probably outweighs the advantage of
the weird, purely theoretical kernel not supporting the feature for
some sockets only.

And then something like PEEK_OFFSET_SET (SO_PEEK_OFF_SET sounds awkward
to me) on top. Another advantage is avoiding the tristate you described.

-- 
Stefano