From: Eric Dumazet
Date: Mon, 27 Jan 2025 10:53:51 +0100
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
To: Jon Maloy
CC: Neal Cardwell, netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, eric.dumazet@gmail.com, Menglong Dong
References: <20250117214035.2414668-1-jmaloy@redhat.com>

On Fri, Jan 24, 2025 at 6:40 PM Jon Maloy wrote:
>
> On 2025-01-20 11:22, Eric Dumazet wrote:
> > On Mon, Jan 20, 2025 at 5:10 PM Jon Maloy wrote:
> >>
> >> On 2025-01-20 00:03, Jon Maloy wrote:
> >>>
> > [...]
> >
> >>>> I agree with Eric that probably tp->pred_flags should be cleared, and
> >>>> a packetdrill test for this would be super-helpful.
> >>>
> >>> I must admit I have never used packetdrill, but I can make an effort.
> >>
> >> I hear from other sources that you cannot force a memory exhaustion with
> >> packetdrill anyway, so this sounds like a pointless exercise.
> >
> > We certainly can and should add a feature like that to packetdrill.
> >
> > Documentation/fault-injection/ has some relevant information.
> >
> > Even without this, tcp_try_rmem_schedule() is reading sk->sk_rcvbuf
> > that could be lowered by a packetdrill script I think.
> >
> Neal, Eric,
> How do you suggest we proceed with this?
> I downloaded packetdrill and tried it a bit, but to understand it well
> enough to introduce a new feature would require more time than I am
> able to spend on this. Maybe Neal, who I see is one of the contributors
> to packetdrill could help out?

I will spend some time this week preparing for some tests.

I would prefer not merging new code without a clear understanding of the issue.

Thanks.

>
> I can certainly clear tp->pred_flags and post it again, maybe with
> an improved and shortened log. Would that be acceptable?
>
> I also made a run where I looked into why __tcp_select_window()
> ignores all the space that has been freed up:
>
>
> tcp_recvmsg_locked(->)
>    __tcp_cleanup_rbuf(->) (copied 131072)
>      tp->rcv_wup: 1788299855, tp->rcv_wnd: 5812224,
>      tp->rcv_nxt 1793800175
>      __tcp_select_window(->)
>        tcp_space(->)
>        tcp_space(<-) returning 458163
>        free_space = round_down(458163, 4096) = 454656
>        (free_space > tp->rcv_ssthresh) -->
>          free_space = tp->rcv_ssthresh = 261920
>        window = ALIGN(261920, 4096) = 262144
>      __tcp_select_window(<-) returning 262144
>      [rcv_win_now 311904, 2 * rcv_win_now 623808, new_window 262144]
>      (new_window >= (2 * rcv_win_now)) ? --> time_to_ack 0
>      NOT calling tcp_send_ack()
>    __tcp_cleanup_rbuf(<-)
>    [tp->rcv_wup 1788299855, tp->rcv_wnd 5812224,
>     tp->rcv_nxt 1793800175]
> tcp_recvmsg_locked(<-) returning 131072 bytes.
> [tp->rcv_nxt 1793800175, tp->rcv_wnd 5812224,
>  tp->rcv_wup 1788299855, sk->last_ack 0, tcp_receive_win() 311904,
>  copied_seq 1788299855->1788395953 (96098), unread 5404222,
>  sk_rcv_qlen 83, ofo_qlen 0]
>
>
> As we see tp->rcv_ssthresh is the limiting factor, causing
> a consistent situation where (new_window < (rcv_win_now * 2)),
> and even (new_window < rcv_win_now).
>
> To me, it looks like tp->rcv_ssthresh should have a higher value
> in this situation, or maybe we should alter this test.
>
> The combination of these two issues, not updating tp->rcv_wnd and
> __tcp_select_window() returning a wrong value, is what is causing
> this whole problem.
>
> ///jon
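
The window arithmetic in the quoted trace can be replayed outside the kernel to check the numbers. The following is only a minimal sketch, not the kernel implementation: it mirrors just the steps the trace prints and omits everything else __tcp_select_window() and __tcp_cleanup_rbuf() do. The Python function names are hypothetical, and the window-scale shift of 12 is an assumption inferred from the 4096-byte granularity seen in the trace.

```python
# Simplified replay of the steps printed in the trace. This is NOT the
# kernel code; helper names and WSCALE are assumptions for illustration.

WSCALE = 12  # assumed: 1 << 12 == 4096, the granularity seen in the trace


def round_down(value, granularity):
    """Round value down to a multiple of granularity."""
    return value - (value % granularity)


def align_up(value, granularity):
    """Round value up to a multiple of granularity (like the kernel's ALIGN)."""
    return -(-value // granularity) * granularity


def select_window(free_space, rcv_ssthresh, wscale=WSCALE):
    """Clamp the free space to rcv_ssthresh, as the trace shows."""
    granularity = 1 << wscale
    free_space = round_down(free_space, granularity)
    if free_space > rcv_ssthresh:
        free_space = rcv_ssthresh
    return align_up(free_space, granularity)


def receive_window(rcv_wup, rcv_wnd, rcv_nxt):
    """Window still advertised to the peer: rcv_wup + rcv_wnd - rcv_nxt."""
    return max(rcv_wup + rcv_wnd - rcv_nxt, 0)


def time_to_ack(new_window, rcv_win_now):
    """The test from the trace: ACK only if the window at least doubles."""
    return new_window > 0 and new_window >= 2 * rcv_win_now


# Values copied from the trace.
new_window = select_window(free_space=458163, rcv_ssthresh=261920)
rcv_win_now = receive_window(1788299855, 5812224, 1793800175)

print(new_window, rcv_win_now, time_to_ack(new_window, rcv_win_now))
```

With the traced values, the clamp to rcv_ssthresh keeps new_window below rcv_win_now itself, so the doubling test cannot pass no matter how much space tcp_space() reports.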