public inbox for passt-dev@passt.top
From: Menglong Dong <menglong8.dong@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	jmaloy@redhat.com
Cc: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com, eric.dumazet@gmail.com,
	dongmenglong.8@bytedance.com
Subject: Re: [net-next 2/2] tcp: correct handling of extreme menory squeeze
Date: Sun, 07 Apr 2024 07:51:35
Message-ID: <CADxym3ZfC5WF7C2B8oYq=38rsLnQ-DOfvhH3iSk6+L0g2=XWDQ@mail.gmail.com>
In-Reply-To: <CAL+tcoC8LBQGe7ES01bxKFkU15GoFpEgT5jx1tnwb2Yb_BOKfw@mail.gmail.com>

On Sun, Apr 7, 2024 at 2:52 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Apr 7, 2024 at 2:38 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sat, Apr 6, 2024 at 8:21 PM <jmaloy@redhat.com> wrote:
> > >
> > > From: Jon Maloy <jmaloy@redhat.com>
> > >
> > > Testing of the previous commit ("tcp: add support for SO_PEEK_OFF")
> > > in this series along with the pasta protocol splicer revealed a bug in
> > > the way tcp handles window advertising during extreme memory squeeze
> > > situations.
> > >
> > > The excerpt of the logging session below shows what is happening:
> > >
> > > [5201<->54494]:     ==== Activating log @ tcp_select_window()/268 ====
> > > [5201<->54494]:     (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) --> TRUE
> > > [5201<->54494]:   tcp_select_window(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354, returning 0
> > > [5201<->54494]:   ADVERTISING WINDOW SIZE 0
> > > [5201<->54494]: __tcp_transmit_skb(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]:   __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]:     (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]:     NOT calling tcp_send_ack()
> > > [5201<->54494]:   __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 83
> > >
> > > [...]
> >
> > I would prefer a packetdrill test, it is not clear what is happening...
> >
> > In particular, have you used SO_RCVBUF ?
> >
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]:   __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]:     (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]:     NOT calling tcp_send_ack()
> > > [5201<->54494]:   __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 1
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]:   __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]:     (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > > [5201<->54494]:     NOT calling tcp_send_ack()
> > > [5201<->54494]:   __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now: 250164, qlen: 0
> > >
> > > [5201<->54494]: tcp_recvmsg_locked(->)
> > > [5201<->54494]:   __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]:     NOT calling tcp_send_ack()
> > > [5201<->54494]:   __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > > [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now: 250164, qlen: 0
> > >
> > > We can see that although we are advertising a window size of zero,
> > > tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
> > > between this side's and the peer's view of the current window size.
> > > - The peer thinks the window is zero, and stops sending.

Hi!

In my original logic, the client sends a zero-window
ACK when it drops the skb because it is out of memory,
and the peer SHOULD keep retransmitting the dropped
packet.

Does the peer retransmit in this case? The peer's view
of the receive window SHOULD recover once the
retransmission succeeds.

> > > - This side ends up in a cycle where it repeatedly calculates a new
> > >   window size it finds too small to advertise.

Yeah, the zero window suppresses the sending of the ACK in
__tcp_cleanup_rbuf(), which I wasn't aware of.
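To make the arithmetic in the log concrete, here is a small userspace
sketch of the check as I read it from the trace. It is not the kernel
code: struct rcv_state, receive_window() and time_to_ack() are names I
made up for illustration, and only the three fields printed in the log
are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified copy of the receive-side fields printed in the log.
 * Hypothetical struct, not the kernel's struct tcp_sock.
 */
struct rcv_state {
	uint32_t rcv_wup;	/* rcv_nxt at the last window update sent */
	uint32_t rcv_wnd;	/* window we believe we last advertised */
	uint32_t rcv_nxt;	/* next sequence number expected */
};

/* Window we think the peer may still use, clamped at zero.
 * The subtraction is done in 32-bit sequence space.
 */
static uint32_t receive_window(const struct rcv_state *s)
{
	int32_t win = (int32_t)(s->rcv_wup + s->rcv_wnd - s->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* The "new_win >= 2 * win_now" test from the log: ACK only if the
 * new window is non-zero and at least twice the current one.
 */
static bool time_to_ack(const struct rcv_state *s, uint32_t new_win)
{
	return new_win && new_win >= 2 * receive_window(s);
}
```

With the values from the log (rcv_wup 2812454294, rcv_wnd 5812224,
rcv_nxt 2818016354), receive_window() gives 250164, so a new window of
262144 is below 500328 and no ACK is sent. In this sketch, if rcv_wnd
were 0 and rcv_wup equal to rcv_nxt, matching the zero window that was
actually advertised, the same new window would pass the test.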

The ACK will recover the receive window on the peer. Does
it make the peer retransmit the dropped data immediately?
As I understand it, the peer still has to wait for the
retransmission timer to expire before it retransmits the
dropped packet. Is that right?

If so, maybe we can retransmit immediately when the
zero window came from a window shrink, which would
make recovery faster.
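Roughly what I have in mind on the sender side, as a userspace sketch
only. Nothing here exists in the kernel: struct snd_state, the
shrunk_to_zero flag and window_update() are hypothetical names, and the
sketch ignores everything except the window field of an incoming ACK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender-side state; not the kernel's struct tcp_sock. */
struct snd_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t snd_wnd;	/* window last advertised by the peer */
	bool shrunk_to_zero;	/* window went to zero with data in flight */
};

/* Process the window field of an incoming ACK.  Returns true if the
 * sender should retransmit from snd_una right away instead of waiting
 * for the retransmission timer.
 */
static bool window_update(struct snd_state *s, uint32_t new_wnd)
{
	bool in_flight = s->snd_nxt != s->snd_una;
	bool retransmit_now = false;

	if (new_wnd == 0 && s->snd_wnd > 0 && in_flight) {
		/* Peer took the window back while data was outstanding. */
		s->shrunk_to_zero = true;
	} else if (new_wnd > 0 && s->shrunk_to_zero) {
		/* Window reopened after a shrink: retransmit now. */
		retransmit_now = in_flight;
		s->shrunk_to_zero = false;
	}
	s->snd_wnd = new_wnd;
	return retransmit_now;
}
```

The flag is there so that an ordinary window update, with no earlier
shrink to zero, does not trigger an extra retransmission.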

[......]
> > Any particular reason to not cc Menglong Dong ?
> > (I just did)
>
> He is not working at Tencent any more. Let me CC here one more time.

Thanks for CCing my new email address, it's very kind of you,
Xing :/
