From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by passt.top (Postfix) with ESMTP id 427075A004F for ; Fri, 12 Jul 2024 21:05:01 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1720811100; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0sz8pdoXob95fBuqb5BDNMVNPSLk21MqFS+UiMDc5F8=; b=RKqQdEdn3hmLVLUDn2zOrgrV6760uBbkwBWusHjz85p4eTomVQVuUAfckvvjP+hcfGTikY KIH0v1cjEDD3+hfZ5VheP43oQujfrtI+t4E+Xli5wRDKNdUtDc4EVxJ8+pmoAkkJx0Q0b9 G2avmmT0Tumr8pM/HBNZeoqv1jAHvVg= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-683-RAcZKPzPPlCGwf343x23YQ-1; Fri, 12 Jul 2024 15:04:58 -0400 X-MC-Unique: RAcZKPzPPlCGwf343x23YQ-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C725419560A2 for ; Fri, 12 Jul 2024 19:04:57 +0000 (UTC) Received: from jmaloy-thinkpadp16vgen1.rmtcaqc.csb (unknown [10.22.18.101]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 428811955F3B; Fri, 12 Jul 2024 19:04:55 +0000 (UTC) From: Jon Maloy To: passt-dev@passt.top, sbrivio@redhat.com, lvivier@redhat.com, dgibson@redhat.com, jmaloy@redhat.com Subject: [PATCH v9 2/2] tcp: handle shrunk window advertisemenst from guest Date: Fri, 12 Jul 2024 15:04:50 -0400 Message-ID: <20240712190450.1261907-3-jmaloy@redhat.com> In-Reply-To: <20240712190450.1261907-1-jmaloy@redhat.com> References: <20240712190450.1261907-1-jmaloy@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII"; x-default=true Message-ID-Hash: RJR7BFHXTIFZMC2PL5SLQJGM25AIYTX2 X-Message-ID-Hash: RJR7BFHXTIFZMC2PL5SLQJGM25AIYTX2 X-MailFrom: jmaloy@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: A bug in kernel TCP may lead to a deadlock where a zero window is sent from the guest peer, while it is unable to send out window updates even after socket reads have freed up enough buffer space to permit a larger window. In this situation, new window advertisements from the peer can only be triggered by data packets arriving from this side. However, currently such packets are never sent, because the zero-window condition prevents this side from sending out any packets whatsoever to the peer. We notice that the above bug is triggered *only* after the peer has dropped one or more arriving packets because of severe memory squeeze, and that we hence always enter a retransmission situation when this occurs. This also means that the implementation goes against the RFC-9293 recommendation that a previously advertised window never should shrink. RFC-9293 seems to permit that we can continue sending up to the right edge of the last advertised non-zero window in such situations, so that is what we do to resolve this situation. It turns out that this solution is extremely simple to implememt in the code: We just omit to save the advertised zero-window when we see that it has shrunk, i.e., if the acknowledged sequence number in the advertisement message is lower than that of the last data byte sent from our side. When that is the case, the following happens: - The 'retr' flag in tcp_data_from_tap() will be 'false', so no retransmission will occur at this occasion. - The data stream will soon reach the right edge of the previously advertised window. In fact, in all observed cases we have seen that it is already there when the zero-advertisement arrives. - At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set, unless they already have been, meaning that only the next timer expiration will open for data retransmission or transmission. - When that happens, the memory squeeze at the guest will normally have abated, and the data flow can resume. It should be noted that although this solves the problem we have at hand, it is a work-around, and not a genuine solution to the described kernel bug. Suggested-by: Stefano Brivio Signed-off-by: Jon Maloy -- v2: Updated comment with reference to kernel commit where bug was introduced. --- tcp.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tcp.c b/tcp.c index e7e0a98..bda6758 100644 --- a/tcp.c +++ b/tcp.c @@ -1426,6 +1426,14 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd) { wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); + + /* Work-around for bug introduced in peer kernel code, commit + * d4317abf7160 ("net: tcp: send zero-window ACK when no memory"). + * We don't update if window shrank to zero. + */ + if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) + return; + conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX); /* FIXME: reflect the tap-side receiver's window back to the sock-side -- 2.45.2