public inbox for passt-dev@passt.top
From: Jon Maloy <jmaloy@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com, eric.dumazet@gmail.com
Subject: Re: [net-next 2/2] tcp: correct handling of extreme menory squeeze
Date: Mon, 8 Apr 2024 07:13:37 -0400
Message-ID: <3b78aff5-a7d3-5af0-ec27-035d99cb1bd7@redhat.com>
In-Reply-To: <CANn89i+UjuasDbqH2tUu0wv=m+roHocBHwzcV4VS+Wotz-8hng@mail.gmail.com>



On 2024-04-08 06:03, Eric Dumazet wrote:
> On Sat, Apr 6, 2024 at 8:37 PM Eric Dumazet <edumazet@google.com> wrote:
>> On Sat, Apr 6, 2024 at 8:21 PM <jmaloy@redhat.com> wrote:
[...]
>>> [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now: 250164, qlen: 0
>>>
>>> [5201<->54494]: tcp_recvmsg_locked(->)
>>> [5201<->54494]:   __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
>>> [5201<->54494]:     NOT calling tcp_send_ack()
>>> [5201<->54494]:   __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
>>> [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now: 250164, qlen: 0
>>>
>>> We can see that although we are advertising a window size of zero,
>>> tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
>>> between this side's and the peer's view of the current window size.
>>> - The peer thinks the window is zero, and stops sending.
>>> - This side ends up in a cycle where it repeatedly calculates a new
>>>    window size it finds too small to advertise.
>>>
>>> Hence no messages are received, no acknowledgements are sent, and
>>> the situation remains locked even after the last queued receive buffer
>>> has been consumed.
>>>
>>> We fix this by setting tp->rcv_wnd to 0 before we return from the
>>> function tcp_select_window() in this particular case.
>>> Further testing shows that the connection recovers neatly from the
>>> squeeze situation, and traffic can continue indefinitely.
>>>
>>> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
>>> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> I do not think this patch is good. If we reach zero window, it is a
> sign something is wrong.
>
> TCP has heuristics to slow down the sender if the receiver does not
> drain the receive queue fast enough.
>
> MSG_PEEK is an obvious reason, and SO_RCVLOWAT too.
>
> I suggest you take a look at tcp_set_rcvlowat(), see what is needed
> for SO_PEEK_OFF (ab)use ?
>
> In short, when SO_PEEK_OFF is in action :
> - TCP needs to not delay ACK when receive queue starts to fill
> - TCP needs to make sure sk_rcvbuf and tp->window_clamp grow (if
> autotuning is enabled)
>
We are not talking about the same socket here. The one being
overloaded is the terminating socket on the guest side, which is
just a regular socket using neither MSG_PEEK nor SO_PEEK_OFF.

SO_PEEK_OFF is used on the intermediate socket, which terminates
the connection towards the remote end. We want to keep the data
in its receive queue until it has been acknowledged by the guest
side, so that we don't need to keep a copy of it in user space.
This seems to work flawlessly.

Anyway, I think this is worth taking a closer look at, as you say.
I don't think this situation should occur at all.

///jon



Thread overview: 13+ messages
2024-04-06 18:21 [net-next 0/2] tcp: add support for SO_PEEK_OFF socket option jmaloy
2024-04-06 18:21 ` [net-next 1/2] " jmaloy
2024-04-08  9:46   ` Eric Dumazet
2024-04-06 18:21 ` [net-next 2/2] tcp: correct handling of extreme menory squeeze jmaloy
2024-04-06 16:37   ` Eric Dumazet
2024-04-07  4:52     ` Jason Xing
2024-04-07  5:51       ` Menglong Dong
2024-04-08 11:01         ` Jon Maloy
2024-04-08  8:03     ` Eric Dumazet
2024-04-08 11:13       ` Jon Maloy [this message]
  -- strict thread matches above, loose matches on Subject: below --
2024-04-03 22:58 [net-next 0/2] tcp: add support for SO_PEEK_OFF socket option Jon Maloy
2024-04-03 22:58 ` [net-next 2/2] tcp: correct handling of extreme menory squeeze Jon Maloy
2024-04-05 17:55   ` Stefano Brivio
2024-04-05 19:37     ` Jon Maloy
