From: Eric Dumazet
Date: Mon, 27 Jan 2025 17:37:02 +0100
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
To: Jon Maloy
CC: Neal Cardwell, netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, eric.dumazet@gmail.com, Menglong Dong
References: <20250117214035.2414668-1-jmaloy@redhat.com>

On Fri, Jan 24, 2025 at 6:40 PM Jon Maloy wrote:
>
>
>
> On 2025-01-20 11:22, Eric Dumazet wrote:
> > On Mon, Jan 20, 2025 at 5:10 PM Jon Maloy wrote:
> >>
> >>
> >>
> >> On 2025-01-20 00:03, Jon Maloy wrote:
> >>>
> >>>
> >
> > [...]
> >
> >>>> I agree with Eric that probably tp->pred_flags should be cleared, and
> >>>> a packetdrill test for this would be super-helpful.
> >>>
> >>> I must admit I have never used packetdrill, but I can make an effort.
> >>
> >> I hear from other sources that you cannot force a memory exhaustion with
> >> packetdrill anyway, so this sounds like a pointless exercise.
> >
> > We certainly can and should add a feature like that to packetdrill.
> >
> > Documentation/fault-injection/ has some relevant information.
> >
> > Even without this, tcp_try_rmem_schedule() is reading sk->sk_rcvbuf
> > that could be lowered by a packetdrill script I think.
> >
> Neal, Eric,
> How do you suggest we proceed with this?
> I downloaded packetdrill and tried it a bit, but to understand it well
> enough to introduce a new feature would require more time than I am
> able to spend on this. Maybe Neal, who I see is one of the contributors
> to packetdrill, could help out?
>
> I can certainly clear tp->pred_flags and post it again, maybe with
> an improved and shortened log. Would that be acceptable?

Yes.

>
> I also made a run where I looked into why __tcp_select_window()
> ignores all the space that has been freed up:
>
>
> tcp_recvmsg_locked(->)
>    __tcp_cleanup_rbuf(->) (copied 131072)
>      tp->rcv_wup: 1788299855, tp->rcv_wnd: 5812224,
>      tp->rcv_nxt 1793800175
>      __tcp_select_window(->)
>        tcp_space(->)
>        tcp_space(<-) returning 458163
>        free_space = round_down(458163, 1 << 12) = 454656
>        (free_space > tp->rcv_ssthresh) -->
>          free_space = tp->rcv_ssthresh = 261920
>        window = ALIGN(261920, 4096) = 262144
>      __tcp_select_window(<-) returning 262144
>      [rcv_win_now 311904, 2 * rcv_win_now 623808, new_window 262144]
>      (new_window >= (2 * rcv_win_now)) ? --> time_to_ack 0
>      NOT calling tcp_send_ack()
>    __tcp_cleanup_rbuf(<-)
>    [tp->rcv_wup 1788299855, tp->rcv_wnd 5812224,
>     tp->rcv_nxt 1793800175]
> tcp_recvmsg_locked(<-) returning 131072 bytes.
> [tp->rcv_nxt 1793800175, tp->rcv_wnd 5812224,
>  tp->rcv_wup 1788299855, sk->last_ack 0, tcp_receive_win() 311904,
>  copied_seq 1788299855->1788395953 (96098), unread 5404222,
>  sk_rcv_qlen 83, ofo_qlen 0]
>
>
> As we see, tp->rcv_ssthresh is the limiting factor, causing
> a consistent situation where (new_window < (rcv_win_now * 2)),
> and even (new_window < rcv_win_now).

Your changelog could simply explain this in one sentence, instead of
lengthy traces.
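
For reference, the arithmetic in the quoted trace can be replayed with a
small user-space sketch. This is not the kernel code: the *_sketch names
are hypothetical, only the steps visible in the trace are modeled (round
down, clamp to rcv_ssthresh, align up, compare against twice the current
window), and the window scale of 12 is an assumption inferred from the
4096-byte granularity of the numbers above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down / up to a multiple of a power-of-two granule. */
static uint32_t round_down_pow2(uint32_t x, uint32_t granule)
{
	return x & ~(granule - 1);
}

static uint32_t align_up_pow2(uint32_t x, uint32_t granule)
{
	return (x + granule - 1) & ~(granule - 1);
}

/*
 * Simplified model of the window selection steps shown in the trace.
 * 'space' stands in for the tcp_space() result; rcv_wscale = 12 is an
 * assumption, not something the trace states directly.
 */
static uint32_t select_window_sketch(uint32_t space, uint32_t rcv_ssthresh,
				     unsigned int rcv_wscale)
{
	assert(rcv_wscale < 32);

	uint32_t granule = 1u << rcv_wscale;
	uint32_t free_space = round_down_pow2(space, granule);

	/* rcv_ssthresh caps the window regardless of freed space. */
	if (free_space > rcv_ssthresh)
		free_space = rcv_ssthresh;

	return align_up_pow2(free_space, granule);
}

/* Window currently offered to the peer: rcv_wup + rcv_wnd - rcv_nxt. */
static uint32_t receive_window_sketch(uint32_t rcv_wup, uint32_t rcv_wnd,
				      uint32_t rcv_nxt)
{
	int32_t win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* A window update is sent only if the new window at least doubles. */
static bool time_to_ack_sketch(uint32_t new_window, uint32_t rcv_win_now)
{
	return new_window && new_window >= 2 * (uint64_t)rcv_win_now;
}
```

With the trace values, select_window_sketch(458163, 261920, 12) gives
262144 and receive_window_sketch(1788299855, 5812224, 1793800175) gives
311904, so the doubling test fails and no window update is sent, which
matches "time_to_ack 0" above.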