From: Neal Cardwell
Date: Sat, 18 Jan 2025 15:04:20 -0500
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
To: jmaloy@redhat.com
CC: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, imagedong@tencent.com, eric.dumazet@gmail.com, edumazet@google.com
References: <20250117214035.2414668-1-jmaloy@redhat.com>
In-Reply-To: <20250117214035.2414668-1-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt

On Fri, Jan 17, 2025 at 4:41 PM
wrote:
>
> From: Jon Maloy
>
> Testing with iperf3 using the "pasta" protocol splicer has revealed
> a bug in the way tcp handles window advertising in extreme memory
> squeeze situations.
>
> Under memory pressure, a socket endpoint may temporarily advertise
> a zero-sized window, but this is not stored as part of the socket data.
> The reasoning behind this is that it is considered a temporary setting
> which shouldn't influence any further calculations.
>
> However, if we happen to stall at an unfortunate value of the current
> window size, the algorithm selecting a new value will consistently fail
> to advertise a non-zero window once we have freed up enough memory.

The "if we happen to stall at an unfortunate value of the current
window size" phrase is a little vague... :-) Do you have a sense of
what might count as "unfortunate" here? That might help in crafting a
packetdrill test to reproduce this and have an automated regression
test.

> This means that this side's notion of the current window size is
> different from the one last advertised to the peer, causing the latter
> to not send any data to resolve the situation.

Since the peer last saw a zero receive window at the time of the
memory-pressure drop, shouldn't the peer be sending repeated zero
window probes, and shouldn't the local host respond to a ZWP with an
ACK with the correct non-zero window?

Do you happen to have a tcpdump .pcap of one of these cases that you
can share?

> The problem occurs on the iperf3 server side, and the socket in question
> is a completely regular socket with the default settings for the
> fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket.
>
> The following excerpt of a logging session, with own comments added,
> shows more in detail what is happening:
>
> // tcp_v4_rcv(->)
> //   tcp_rcv_established(->)
> [5201<->39222]:     ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ====
> [5201<->39222]:     tcp_data_queue(->)
> [5201<->39222]:        DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM
>                        [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]

What is "win_now"? That doesn't seem to correspond to any variable
name in the Linux source tree. Can this be renamed to the
tcp_select_window() variable it is printing, like "cur_win" or
"effective_win" or "new_win", etc?

Or perhaps you can attach your debugging patch in some email thread? I
agree with Eric that these debug dumps are a little hard to parse
without seeing the patch that allows us to understand what some of
these fields are...

I agree with Eric that probably tp->pred_flags should be cleared, and
a packetdrill test for this would be super-helpful.

thanks,
neal
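
For reference, the numbers in the quoted log line can be checked against the receive-window arithmetic the kernel uses in tcp_receive_window(), which computes rcv_wup + rcv_wnd - rcv_nxt and clamps negative results to zero. The sketch below assumes that the logged "snt_ack" is tp->rcv_wup; that mapping is a guess, since the debugging patch has not been posted. Under that assumption the result matches the logged "win_now" value, which suggests (but does not prove) that "win_now" is the current receive window. The Python function name is hypothetical and only mirrors the 32-bit arithmetic.

```python
U32_MASK = 0xFFFFFFFF


def receive_window(rcv_wup: int, rcv_wnd: int, rcv_nxt: int) -> int:
    """Mirror of the kernel's rcv_wup + rcv_wnd - rcv_nxt computation.

    The sum is taken modulo 2**32, reinterpreted as a signed 32-bit
    value, and clamped to zero if negative.
    """
    raw = (rcv_wup + rcv_wnd - rcv_nxt) & U32_MASK
    # Reinterpret the unsigned 32-bit result as signed (s32).
    win = raw - (1 << 32) if raw & 0x80000000 else raw
    return max(win, 0)


if __name__ == "__main__":
    # Values from the quoted log line.
    # Assumption: "snt_ack" in the log is tp->rcv_wup.
    print(receive_window(rcv_wup=265469200, rcv_wnd=262144, rcv_nxt=265600160))
```

With the logged values this prints 131184, the same number the log reports as "win_now".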