public inbox for passt-dev@passt.top
From: Stefano Brivio <sbrivio@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Jon Maloy <jmaloy@redhat.com>,
	passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH v9 2/2] tcp: handle shrunk window advertisements from guest
Date: Mon, 15 Jul 2024 19:08:13 +0200
Message-ID: <20240715190656.0581b764@elisabeth>
In-Reply-To: <ZpRuj-39cCmxaAIi@zatzit>

On Mon, 15 Jul 2024 10:34:23 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jul 12, 2024 at 03:04:50PM -0400, Jon Maloy wrote:
> > A bug in kernel TCP may lead to a deadlock where a zero window is sent
> > from the guest peer, while it is unable to send out window updates even
> > after socket reads have freed up enough buffer space to permit a larger
> > window. In this situation, new window advertisements from the peer can
> > only be triggered by data packets arriving from this side.
> > 
> > However, currently such packets are never sent, because the zero-window
> > condition prevents this side from sending out any packets whatsoever
> > to the peer.
> > 
> > We notice that the above bug is triggered *only* after the peer has
> > dropped one or more arriving packets because of severe memory squeeze,
> > and that we hence always enter a retransmission situation when this
> > occurs. This also means that the implementation goes against the
> > RFC-9293 recommendation that a previously advertised window never
> > should shrink.
> > 
> > RFC-9293 seems to permit that we can continue sending up to the right
> > edge of the last advertised non-zero window in such situations, so that
> > is what we do to resolve this situation.
> > 
> > It turns out that this solution is extremely simple to implement in the
> > code: We just omit to save the advertised zero-window when we see that
> > it has shrunk, i.e., if the acknowledged sequence number in the
> > advertisement message is lower than that of the last data byte sent
> > from our side.
> > 
> > When that is the case, the following happens:
> > - The 'retr' flag in tcp_data_from_tap() will be 'false', so no
> >   retransmission will occur at this occasion.
> > - The data stream will soon reach the right edge of the previously
> >   advertised window. In fact, in all observed cases we have seen that
> >   it is already there when the zero-advertisement arrives.
> > - At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set,
> >   unless they already have been, meaning that only the next timer
> >   expiration will open for data retransmission or transmission.
> > - When that happens, the memory squeeze at the guest will normally have
> >   abated, and the data flow can resume.
> > 
> > It should be noted that although this solves the problem we have at
> > hand, it is a work-around, and not a genuine solution to the described
> > kernel bug.
> > 
> > Suggested-by: Stefano Brivio <sbrivio@redhat.com>
> > Signed-off-by: Jon Maloy <jmaloy@redhat.com>  
> 
> I only half-understand the problem here

Long story short(er): we fill up the socket receive buffer in a Linux
guest, completely, complying with the window.

At that point, since kernel commit e2142825c120, on memory pressure, we
get an ACK segment from the Linux guest *not* acknowledging all the
data we sent (a bit less), but reporting zero as window (as if we sent
"too much" data, which is not the case).

After that, we don't get any further segment at all (second issue
introduced by e2142825c120), and whatever pending transfer times out.
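
To make the check from the commit message concrete, here is a minimal
sketch of the idea, not the actual patch. The struct, its fields and both
function names are hypothetical and only loosely modelled on the
description above: a zero window is not saved if the accompanying ACK
covers less than what we already sent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state: NOT passt's actual struct */
struct conn_sketch {
	uint32_t seq_to_tap;	/* Next sequence number we would send to guest */
	uint32_t wnd_from_tap;	/* Last window from the guest that we saved */
};

/* True if sequence a comes before b, allowing for 32-bit wrap-around */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Save the advertised window, unless it's a zero window arriving with an
 * ACK that doesn't cover all the data we sent: in that case the window
 * shrank, and we keep the previously saved value so that the sender isn't
 * blocked. Returns true if the window was saved.
 */
static bool sketch_update_wnd(struct conn_sketch *c,
			      uint32_t ack_seq, uint32_t wnd)
{
	if (!wnd && seq_before(ack_seq, c->seq_to_tap))
		return false;

	c->wnd_from_tap = wnd;
	return true;
}
```

With the shrunk zero window ignored, the saved window stays non-zero and
the next timer expiration can still trigger a transmission, as described
in the commit message. A zero window that acknowledges everything we sent
is still saved as usual.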

-- 
Stefano

