From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bLB3pZUg; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTPS id 3B69F5A0272 for ; Mon, 17 Feb 2025 08:12:17 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739776336; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+hiyGkPf/D8kdZhiCaXADY4p4WACQ2/Eqm3Gp/KSVeI=; b=bLB3pZUgPSD1VpuS5pm/l9bhjsmqcelEw56qcONjHw1++/ZvkegNXGEcpCZ5dzJ+Ju2yR+ Q5Vruqu9NZdJ8O/CxSvaxxr0jn5HqpG9kqzt/2qlS+gAVI3/itAroX/9SYoP8wFROLiAVG LhlvhvHuqZawbhEbWGuK42yZ8a18U+U= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-57-39zDsR07PTW2njGIf2251g-1; Mon, 17 Feb 2025 02:12:14 -0500 X-MC-Unique: 39zDsR07PTW2njGIf2251g-1 X-Mimecast-MFC-AGG-ID: 39zDsR07PTW2njGIf2251g_1739776333 Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-38f34b3f9f1so1100692f8f.0 for ; Sun, 16 Feb 2025 23:12:14 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739776333; x=1740381133; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=+hiyGkPf/D8kdZhiCaXADY4p4WACQ2/Eqm3Gp/KSVeI=; b=VcOV7OnqVx2CbZrf6mW94KZnl1UIJeBH96p1V1/ocNIiTQ+5bs1foOUJAM1c/fS42t 2lLm2O/4S4EG2K5wFm1pJrN6vlY8C9F2sw6uyzpva6kV0vvKek/0feBe/8zAT4iEzhRs Yp/ol7LxJ3clljEM7SZE2SDNZar3oUfsnUZAPJoEhQ/1BzxARVZuXFMsfuWVAfU7ALgZ mmqU/2D2JfWELho03Hr9dm5a9zL8ZKCs012+bs72cRyg7547PIiGxw+azEIEP2cG+Inq jLd16U3ELqfy0PkF8ia3P4A9Bx0CgnLPLs3L94u+7beOA8TOMCL2cQu/PLkib/LhfkqP Hu/A== X-Gm-Message-State: AOJu0Yyb7VADUP251n3Kw9MZ808lG8aFIIM5q9bczSHe9bS+jYZdzMgo 72GDoI933+UuXzhn01CAtAGYufajWJiuBBi+FtNpmHk6zwGBLMVu9S5Dp+qZ/UMNtceoF9tcUyV ew3Wc7qvRU37KgU6pGoUUYzMv9CNXUGm+39eGHMuymR0t34MICnqvFDQjlg== X-Gm-Gg: ASbGnct5ni7fSozmO09+SZqrFm9CBDuXq5QqUPwKfZEzw4pTIG4KOl70WbxYxCQp/tz RH295wHnUct2nyKDIJQbNTAVeFEtWnqoIDh4fH8uCVdJ86AKmHcWNmy15ztXCQsASzMPZ5Q6toy YBRR5kWBjpY4RNbnxDcD0fQ9jaMA+9w+hHyzto4wMl4Wv4c78NwOPXFSHYk6frq0/bFyWOycfe5 Tvh+uSjBWPOItRRGpuge8DTkBGKlrUSs0fUYqPpoqEYnwVc+aE2GIAMgFvYZW3+ya8mBhzKWUGq sz/tOCUd0Qxbhvmr X-Received: by 2002:a05:6000:18a9:b0:38f:217f:b225 with SMTP id ffacd0b85a97d-38f33f2fc4emr8613291f8f.29.1739776332909; Sun, 16 Feb 2025 23:12:12 -0800 (PST) X-Google-Smtp-Source: AGHT+IE7pUR+X7XCdBf68oDaReNW3aPhLhPfUJIzc/BhJkZG15HskFnrL1HDd5aulldxvRChBvkXmQ== X-Received: by 2002:a05:6000:18a9:b0:38f:217f:b225 with SMTP id ffacd0b85a97d-38f33f2fc4emr8613265f8f.29.1739776332421; Sun, 16 Feb 2025 23:12:12 -0800 (PST) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38f259d5eeesm11543812f8f.63.2025.02.16.23.12.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2025 23:12:11 -0800 (PST) Date: Mon, 17 Feb 2025 08:12:10 +0100 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH] tcp_splice: A typo three years ago and SO_RCVLOWAT is gone Message-ID: <20250217081210.52b3ba3b@elisabeth> In-Reply-To: References: <20250216221216.2014593-1-sbrivio@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.41; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 2iZ8_4sPcxGyDBJ1T6CoTNARYooZuBL7_X3HVgAdDas_1739776333 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: BIZQXTZCTEZOXOPYFB7JPK46DLKDCNY2 X-Message-ID-Hash: BIZQXTZCTEZOXOPYFB7JPK46DLKDCNY2 X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, 17 Feb 2025 14:49:39 +1100 David Gibson wrote: > On Sun, Feb 16, 2025 at 11:12:15PM +0100, Stefano Brivio wrote: > > In commit e5eefe77435a ("tcp: Refactor to use events instead of > > states, split out spliced implementation"), this: > > > > if (!bitmap_isset(rcvlowat_set, conn - ts) && > > readlen > (long)c->tcp.pipe_size / 10) { > > > > (note the !) became: > > > > if (conn->flags & lowat_set_flag && > > readlen > (long)c->tcp.pipe_size / 10) { > > > > in the new tcp_splice_sock_handler(). > > > > We want to check, there, if we should set SO_RCVLOWAT, only if we > > haven't set it already. > > > > But, instead, we're checking if it's already set before we set it, so > > we'll never set it, of course. > > > > Fix the check and re-enable the functionality, which should give us > > improved CPU utilisation in non-interactive cases where we are not > > transferring at full pipe capacity. > > > > Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation") > > Signed-off-by: Stefano Brivio > > Ouch. > > Reviewed-by: David Gibson > > At least insofar as this clearly corrects towards the intended > behaviour. Given that we inadvertently bee using RCVLOWAT for so > long, I am a bit worried that this might expose deadlocks or stalls. > But, I guess we debug that when we come to it. Yeah, I was undecided as well, then I tested and tested, and I realised that commit 904b86ade7db ("tcp: Rework window handling, timers, add SO_RCVLOWAT and pools for sockets/pipes") added this gem, still there: if (read >= (long)c->tcp.pipe_size * 10 / 100) continue; if (!bitmap_isset(rcvlowat_set, conn - ts) && read > (long)c->tcp.pipe_size / 10) { int lowat = c->tcp.pipe_size / 4; setsockopt(move_from, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)); which means that we'll not set SO_RCVLOWAT anyway, because if read > c->tcp.pipe_size / 10, we'll skip the second block, and if not, we'll skip it anyway. Now, I have a clear memory of characterising those 10% and 25% values over a wide range of pipe sizes, message sizes, etc. Other than SSH and installing packages in a container (and check that nothing gets stuck for ~one second), my basic test idea was: $ iperf3 -c localhost -p 5202 -l 100 with port 5202 forwarded to the network namespace and iperf3 server listening there. I think I added another typo while cleaning up. I probably meant: [...] read < (long)c->tcp.pipe_size / 10) { ...and if I do that, it finally works. But anyway, it's too many typos at this point and especially we never had a release with it enabled, so I'm not "fixing" this for the moment, it needs a lot more testing than I can do now. -- Stefano