From: Menglong Dong
X-MailFrom: menglong8.dong@gmail.com
Subject: Re: [net-next 2/2] tcp: correct handling of extreme menory squeeze
To: Jason Xing, Eric Dumazet, jmaloy@redhat.com
CC: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, eric.dumazet@gmail.com, dongmenglong.8@bytedance.com
References: <20240406182107.261472-1-jmaloy@redhat.com> <20240406182107.261472-3-jmaloy@redhat.com>
List-Id: Development discussion and patches for passt
Date: Sun, 07 Apr 2024 07:51:35
X-Original-Date: Sun, 7 Apr 2024 15:51:22 +0800

On Sun, Apr 7, 2024 at 2:52 PM Jason Xing wrote:
>
> On Sun, Apr 7, 2024 at 2:38 AM Eric Dumazet wrote:
> >
> > On Sat, Apr 6, 2024 at 8:21 PM wrote:
> > >
> > > From: Jon Maloy
> > >
> > > Testing of the previous commit ("tcp: add support for SO_PEEK_OFF")
> > > in this series along with the pasta protocol splicer revealed a bug in
> > > the way tcp handles window advertising during extreme memory squeeze
> > > situations.
> > >
> > > The excerpt of the below logging session shows what is happening:
> > >
> > > [5201<->54494]: ==== Activating log @ tcp_select_window()/268 ====
> > > [5201<->54494]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) --> TRUE
> > > [5201<->54494]: tcp_select_window(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354, returning 0
> > > [5201<->54494]: ADVERTISING WINDOW SIZE 0
> > > [5201<->54494]: __tcp_transmit_skb(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]: NOT calling tcp_send_ack()
> > > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 83
> > >
> > > [...]
> >
> > I would prefer a packetdrill test, it is not clear what is happening...
> >
> > In particular, have you used SO_RCVBUF ?
> >
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]: NOT calling tcp_send_ack()
> > > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 1
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]: NOT calling tcp_send_ack()
> > > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now: 250164, qlen: 0
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: NOT calling tcp_send_ack()
> > > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now: 250164, qlen: 0
> > >
> > > We can see that although we are advertising a window size of zero,
> > > tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
> > > between this side's and the peer's view of the current window size.
> > > - The peer thinks the window is zero, and stops sending.

Hi!

In my original logic, the client sends a zero-window ACK when it
drops the skb because it is out of memory, and the peer SHOULD keep
retransmitting the dropped packet. Does the peer do the retransmission
in this case? The receive window of the peer SHOULD recover once the
retransmission is successful.

> > > - This side ends up in a cycle where it repeatedly calculates a new
> > >   window size it finds too small to advertise.

Yeah, the zero window suppressed the sending of the ACK in
__tcp_cleanup_rbuf(), which I wasn't aware of. That ACK will recover
the receive window of the peer. Does it make the peer retransmit the
dropped data immediately? In my opinion, the peer still won't
retransmit the dropped packet until its retransmission timer expires,
will it?

If so, maybe we can do the retransmission immediately when the zero
window resulted from a window shrink, which would make the recovery
faster.

[......]

> > Any particular reason to not cc Menglong Dong ?
> > (I just did)
>
> He is not working at Tencent any more. Let me CC here one more time.

Thanks for CC'ing my new email address, it's very kind of you, Xing :/
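
For reference, the arithmetic behind the quoted log can be reproduced
with a small user-space sketch. This is not kernel code: the helper
names receive_window() and time_to_ack() are made up for illustration,
and the "new_win >= 2 * win_now" test is simply the condition the log
prints. The window computation assumes the usual "right edge of the
last advertised window minus rcv_nxt, clamped at zero" definition.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the window this side still believes it has
 * advertised, i.e. rcv_wup + rcv_wnd - rcv_nxt, clamped at zero.
 * The subtraction is done modulo 2^32 and reinterpreted as signed so
 * that a right edge behind rcv_nxt yields 0 rather than a huge value.
 */
static inline uint32_t receive_window(uint32_t rcv_wup, uint32_t rcv_wnd,
				      uint32_t rcv_nxt)
{
	int32_t win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/*
 * Hypothetical helper mirroring the condition printed in the log:
 * a window update is only worth an ACK if the newly computed window
 * is non-zero and at least twice the currently assumed window.
 */
static inline bool time_to_ack(uint32_t win_now, uint32_t new_win)
{
	return new_win != 0 && (uint64_t)new_win >= 2 * (uint64_t)win_now;
}
```

With the values from the log, receive_window(2812454294, 5812224,
2818016354) gives 250164, and time_to_ack(250164, 262144) is false
because 262144 < 500328, so no ACK is sent even though the peer was
told the window is zero. As a purely hypothetical comparison, with
rcv_wnd set to 0 the same computation yields a window of 0, and
time_to_ack(0, 262144) is true.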