From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTP id E52815A031A for ; Fri, 12 Jul 2024 00:26:40 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1720736799; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5ebLK9dDfn2849Tyioq92y4b7NZ8ekcV7r/Ik2ftxdQ=; b=FlcMp8D3AU9HCD+vcFL1l/lQQfEMSG1ETV0KUUOEZd8xpuHwRd+/g7gx/nr/4qH/CezDzJ YWD4RiJGHSlAl/P781e1Uq2eRZfz6hlYvvR6LQ9rM0s6rU3UANbWJRb1V+dffqwO7+qKBN UCFzOUHbf6gb/TbURUfI92WlKpV8VEY= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-680-NaAuLwWsNt2W5N-mVw1nzg-1; Thu, 11 Jul 2024 18:26:38 -0400 X-MC-Unique: NaAuLwWsNt2W5N-mVw1nzg-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A9E2E1954B12 for ; Thu, 11 Jul 2024 22:26:37 +0000 (UTC) Received: from jmaloy-thinkpadp16vgen1.rmtcaqc.csb (unknown [10.22.18.101]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E84551955F3B; Thu, 11 Jul 2024 22:26:35 +0000 (UTC) From: Jon Maloy To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com Subject: [PATCH v8 2/2] tcp: handle shrunk window advertisemenst from guest Date: Thu, 11 Jul 2024 18:26:31 -0400 Message-ID: <20240711222631.1073408-3-jmaloy@redhat.com> In-Reply-To: <20240711222631.1073408-1-jmaloy@redhat.com> References: <20240711222631.1073408-1-jmaloy@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true Message-ID-Hash: BF4G6SDTISQH3LRMEBL3M23JHPM7GZUO X-Message-ID-Hash: BF4G6SDTISQH3LRMEBL3M23JHPM7GZUO X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: A bug in kernel TCP may lead to a deadlock where a zero window is sent from the guest peer, while it is unable to send out window updates even after socket reads have freed up enough buffer space to permit a larger window. In this situation, new window advertisements from the peer can only be triggered by data packets arriving from this side. However, currently such packets are never sent, because the zero-window condition prevents this side from sending out any packets whatsoever to the peer. We notice that the above bug is triggered *only* after the peer has dropped one or more arriving packets because of severe memory squeeze, and that we hence always enter a retransmission situation when this occurs. This also means that the implementation goes against the RFC-9293 recommendation that a previously advertised window never should shrink. RFC-9293 seems to permit that we can continue sending up to the right edge of the last advertised non-zero window in such situations, so that is what we do to resolve this situation. It turns out that this solution is extremely simple to implememt in the code: We just omit to save the advertised zero-window when we see that it has shrunk, i.e., if the acknowledged sequence number in the advertisement message is lower than that of the last data byte sent from our side. When that is the case, the following happens: - The 'retr' flag in tcp_data_from_tap() will be 'false', so no retransmission will occur at this occasion. - The data stream will soon reach the right edge of the previously advertised window. In fact, in all observed cases we have seen that it is already there when the zero-advertisement arrives. - At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set, unless they already have been, meaning that only the next timer expiration will open for data retransmission or transmission. - When that happens, the memory squeeze at the guest will normally have abated, and the data flow can resume. It should be noted that although this solves the problem we have at hand, it is a work-around, and not a genuine solution to the described kernel bug. Suggested-by: Stefano Brivio Signed-off-by: Jon Maloy --- tcp.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tcp.c b/tcp.c index 1a8a8df..4e58a37 100644 --- a/tcp.c +++ b/tcp.c @@ -1421,6 +1421,11 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd) { wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); + + /* Work-around for peer bug: Don't update if window shrank to zero */ + if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) + return; + conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX); /* FIXME: reflect the tap-side receiver's window back to the sock-side -- 2.45.2