From: Eric Dumazet
Date: Sat, 6 Apr 2024 20:37:35 +0200
Subject: Re: [net-next 2/2] tcp: correct handling of extreme menory squeeze
To: jmaloy@redhat.com, Menglong Dong
CC: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, eric.dumazet@gmail.com
References: <20240406182107.261472-1-jmaloy@redhat.com> <20240406182107.261472-3-jmaloy@redhat.com>
In-Reply-To: <20240406182107.261472-3-jmaloy@redhat.com>

On Sat, Apr 6, 2024 at 8:21 PM wrote:
>
> From: Jon Maloy
>
> Testing of the previous commit ("tcp: add support for SO_PEEK_OFF")
> in this series along with the pasta protocol splicer revealed a bug in
> the way tcp handles window advertising during extreme memory squeeze
> situations.
>
> The excerpt of the below logging session shows what is happening:
>
> [5201<->54494]: ==== Activating log @ tcp_select_window()/268 ====
> [5201<->54494]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) --> TRUE
> [5201<->54494]: tcp_select_window(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354, returning 0
> [5201<->54494]: ADVERTISING WINDOW SIZE 0
> [5201<->54494]: __tcp_transmit_skb(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 83
>
> [...]

I would prefer a packetdrill test, it is not clear what is happening...

In particular, have you used SO_RCVBUF ?

>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 1
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now: 250164, qlen: 0
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now: 250164, qlen: 0
>
> We can see that although we are advertising a window size of zero,
> tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
> between this side's and the peer's view of the current window size.
> - The peer thinks the window is zero, and stops sending.
> - This side ends up in a cycle where it repeatedly calculates a new
>   window size it finds too small to advertise.
>
> Hence no messages are received, and no acknowledgments are sent, and
> the situation remains locked even after the last queued receive buffer
> has been consumed.
>
> We fix this by setting tp->rcv_wnd to 0 before we return from the
> function tcp_select_window() in this particular case.
> Further testing shows that the connection recovers neatly from the
> squeeze situation, and traffic can continue indefinitely.
>
> Reviewed-by: Stefano Brivio
> Signed-off-by: Jon Maloy
> ---
>  net/ipv4/tcp_output.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 9282fafc0e61..57ead8f3c334 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -263,11 +263,15 @@ static u16 tcp_select_window(struct sock *sk)
>  	u32 cur_win, new_win;
>
>  	/* Make the window 0 if we failed to queue the data because we
> -	 * are out of memory. The window is temporary, so we don't store
> -	 * it on the socket.
> +	 * are out of memory. The window needs to be stored in the socket
> +	 * for the connection to recover.
>  	 */
> -	if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM))
> -		return 0;
> +	if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM)) {
> +		new_win = 0;
> +		tp->rcv_wnd = 0;
> +		tp->rcv_wup = tp->rcv_nxt;
> +		goto out;
> +	}
>
>  	cur_win = tcp_receive_window(tp);
>  	new_win = __tcp_select_window(sk);
> @@ -301,7 +305,7 @@ static u16 tcp_select_window(struct sock *sk)
>
>  	/* RFC1323 scaling applied */
>  	new_win >>= tp->rx_opt.rcv_wscale;
> -
> +out:
>  	/* If we advertise zero window, disable fast path. */
>  	if (new_win == 0) {
>  		tp->pred_flags = 0;
> --
> 2.42.0
>

Any particular reason to not cc Menglong Dong ?
(I just did)

This code was added in

commit e2142825c120d4317abf7160a0fc34b3de532586
Author: Menglong Dong
Date:   Fri Aug 11 10:55:27 2023 +0800

    net: tcp: send zero-window ACK when no memory

    For now, skb will be dropped when no memory, which makes client keep
    retrans util timeout and it's not friendly to the users.

    In this patch, we reply an ACK with zero-window in this case to update
    the snd_wnd of the sender to 0. Therefore, the sender won't timeout the
    connection and will probe the zero-window with the retransmits.

    Signed-off-by: Menglong Dong
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller