From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 27 Jan 2025 15:03:03 +0100
From: Stefano Brivio
To: Menglong Dong
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
Message-ID: <20250127150303.46c9d9f5@elisabeth>
References: <20250117214035.2414668-1-jmaloy@redhat.com> <20250127110121.1f53b27d@elisabeth> <20250127113214.294bcafb@elisabeth>
Organization: Red Hat
CC: Jason Xing, Jon Maloy, Eric Dumazet, Neal Cardwell, netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com, eric.dumazet@gmail.com
List-Id: Development discussion and patches for passt

On Mon, 27 Jan 2025 21:37:23 +0800
Menglong Dong wrote:

> On Mon, Jan 27, 2025 at 6:32 PM Stefano Brivio wrote:
> >
> > On Mon, 27 Jan 2025 18:17:28 +0800
> > Jason Xing wrote:
> >
> > > I'm not that sure if it's a bug belonging to the Linux kernel.
> >
> > It is, because for at least 20-25 years (before that it's a bit hard to
> > understand from history) a non-zero window would be announced, as
> > obviously expected, once there's again space in the receive window.
>
> Sorry for the late reply. I think the key of this problem is
> what should we do when we receive a tcp packet and we are
> out of memory.
>
> The RFC doesn't define such a thing,

Why not? RFC 9293, 3.8.6:

  There is an assumption that this is related to the data buffer space
  currently available for this connection.

That is, out-of-memory -> zero window.

> so in the commit
> e2142825c120 ("net: tcp: send zero-window ACK when no memory"),
> I reply with a zero-window ACK to the peer.

Your patch is fundamentally correct; nobody is disputing that. The
problem is that it introduces a side effect because it gets the notion
of "current window" out of sync by sending a one-off packet with a
zero window, without recording that.

> And the peer will keep
> probing the window by retransmitting the packet that we dropped if
> the peer is a LINUX SYSTEM.
>
> As I said, the RFC doesn't define such a case, so the behavior of
> the peer is undefined if it is not a LINUX SYSTEM. If the peer doesn't
> keep retransmitting the packet, it will hang the connection, just like
> the problem that described in this commit log.

It's not undefined. RFC 9293 3.8.6.1 (just like RFC 1122 4.2.2.17, RFC
793 3.7) requires zero-window probes.

But keeping the window closed indefinitely if there's no zero-window
probe is a regression anyway:

- a retransmission timeout must elapse (RFC 9293 3.8.1) before the
  zero-window probe is sent, so relying on zero-window probes means
  introducing an unnecessary delay

- if the peer (as was the case here) fails to send a zero-window
  probe for whatever reason, things break.
  This is a userspace breakage, regardless of the fact that the peer
  should send a zero-window probe.

> However, we can make some optimization to make it more
> adaptable. We can send a ACK with the right window to the
> peer when the memory is available, and __tcp_cleanup_rbuf()
> is a good choice.
>
> Generally speaking, I think this patch makes sense. However,
> I'm not sure if there is any other influence if we make
> "tp->rcv_wnd=0", but it can trigger a ACK in __tcp_cleanup_rbuf().

I don't understand what your concern is with the patch that was
proposed (and tested quite thoroughly, by the way).

> Following is the code that I thought before to optimize this
> case (the code is totally not tested):
>
> [...]

-- 
Stefano
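
P.S. The "out of sync" side effect discussed above can be illustrated
with a toy model. This is only a sketch in Python, not kernel code: the
ToyReceiver class, its fields and its method names are all made up for
illustration, and it ignores sequence numbers, window scaling, timers
and the actual thresholds the kernel uses before sending a window
update. It only contrasts a receiver that sends a one-off zero-window
ACK without recording it against one that does record it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReceiver:
    """Hypothetical, heavily simplified model of a TCP receiver."""
    buf_size: int                 # total receive buffer space
    record_zero_window: bool      # assumption: switch between the two behaviours
    rcv_wnd: int = 0              # window the receiver believes it advertised
    peer_view: int = 0            # window the peer actually saw last
    acks: list = field(default_factory=list)

    def __post_init__(self):
        # Connection starts with the full buffer advertised.
        self._send_ack(self.buf_size, record=True)

    def _send_ack(self, window, record):
        self.acks.append(window)
        self.peer_view = window
        if record:
            self.rcv_wnd = window

    def drop_segment_no_memory(self):
        # Out of memory: drop the segment, answer with a zero-window ACK.
        self._send_ack(0, record=self.record_zero_window)

    def memory_available(self):
        # Send a window update only if the window grew compared to
        # what the receiver thinks it advertised.
        if self.buf_size > self.rcv_wnd:
            self._send_ack(self.buf_size, record=True)


def run(record_zero_window):
    r = ToyReceiver(buf_size=65535, record_zero_window=record_zero_window)
    r.drop_segment_no_memory()
    r.memory_available()
    return r.peer_view


print(run(record_zero_window=False))  # peer still sees a closed window
print(run(record_zero_window=True))   # peer sees the window reopened
```

In the first case the receiver still believes it advertised the full
window, so it sees no reason to send an update, and the peer is left
with a zero window until it probes. In the second case the recorded
window is zero, so freeing memory triggers an update right away.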