public inbox for passt-dev@passt.top
 help / color / mirror / code / Atom feed
From: Jon Maloy <jmaloy@redhat.com>
To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com,
	dgibson@redhat.com, jmaloy@redhat.com
Subject: [PATCH v8 2/2] tcp: handle shrunk window advertisemenst from guest
Date: Thu, 11 Jul 2024 18:26:31 -0400	[thread overview]
Message-ID: <20240711222631.1073408-3-jmaloy@redhat.com> (raw)
In-Reply-To: <20240711222631.1073408-1-jmaloy@redhat.com>

A bug in kernel TCP may lead to a deadlock where a zero window is sent
from the guest peer, while it is unable to send out window updates even
after socket reads have freed up enough buffer space to permit a larger
window. In this situation, new window advertisements from the peer can
only be triggered by data packets arriving from this side.

However, currently such packets are never sent, because the zero-window
condition prevents this side from sending out any packets whatsoever
to the peer.

We notice that the above bug is triggered *only* after the peer has
dropped one or more arriving packets because of severe memory squeeze,
and that we hence always enter a retransmission situation when this
occurs. This also means that the implementation goes against the
RFC-9293 recommendation that a previously advertised window never
should shrink.

RFC-9293 seems to permit that we can continue sending up to the right
edge of the last advertised non-zero window in such situations, so that
is what we do to resolve this situation.

It turns out that this solution is extremely simple to implememt in the
code: We just omit to save the advertised zero-window when we see that
it has shrunk, i.e., if the acknowledged sequence number in the
advertisement message is lower than that of the last data byte sent
from our side.

When that is the case, the following happens:
- The 'retr' flag in tcp_data_from_tap() will be 'false', so no
  retransmission will occur at this occasion.
- The data stream will soon reach the right edge of the previously
  advertised window. In fact, in all observed cases we have seen that
  it is already there when the zero-advertisement arrives.
- At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set,
  unless they already have been, meaning that only the next timer
  expiration will open for data retransmission or transmission.
- When that happens, the memory squeeze at the guest will normally have
  abated, and the data flow can resume.

It should be noted that although this solves the problem we have at
hand, it is a work-around, and not a genuine solution to the described
kernel bug.

Suggested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
 tcp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcp.c b/tcp.c
index 1a8a8df..4e58a37 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1421,6 +1421,11 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn,
 static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
 {
 	wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
+
+	/* Work-around for peer bug: Don't update if window shrank to zero */
+	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap))
+		return;
+
 	conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX);
 
 	/* FIXME: reflect the tap-side receiver's window back to the sock-side
-- 
@@ -1421,6 +1421,11 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn,
 static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
 {
 	wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
+
+	/* Work-around for peer bug: Don't update if window shrank to zero */
+	if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap))
+		return;
+
 	conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX);
 
 	/* FIXME: reflect the tap-side receiver's window back to the sock-side
-- 
2.45.2


  parent reply	other threads:[~2024-07-11 22:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-11 22:26 [PATCH v8 0/2] Add support for SO_PEEK_OFF Jon Maloy
2024-07-11 22:26 ` [PATCH v8 1/2] tcp: leverage support of SO_PEEK_OFF socket option when available Jon Maloy
2024-07-12 14:19   ` Stefano Brivio
2024-07-11 22:26 ` Jon Maloy [this message]
2024-07-12 14:20   ` [PATCH v8 2/2] tcp: handle shrunk window advertisemenst from guest Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240711222631.1073408-3-jmaloy@redhat.com \
    --to=jmaloy@redhat.com \
    --cc=dgibson@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=passt-dev@passt.top \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://passt.top/passt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).