From: David Gibson
To: Jon Maloy
Cc: Stefano Brivio, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Date: Thu, 16 May 2024 17:19:25 +1000
Subject: Re: [PATCH v4 3/3] tcp: allow retransmit when peer receive window is zero
References: <20240515153429.859185-1-jmaloy@redhat.com> <20240515153429.859185-4-jmaloy@redhat.com> <20240515222417.72ce256b@elisabeth>
List-Id: Development discussion and patches for passt

On Wed, May 15, 2024 at 07:10:49PM -0400, Jon Maloy wrote:
>
>
> On 2024-05-15 16:24, Stefano Brivio wrote:
> > On Wed, 15 May 2024 11:34:29 -0400
> > Jon Maloy wrote:
> >
> > > A bug in kernel TCP may lead to a deadlock where a zero window is sent
> > > from the peer, while it is unable to send out window updates even after
> > > reads have freed up enough buffer space to permit a larger window.
> > > In this situation, new window advertisements from the peer can only be
> > > triggered by packets arriving from this side.
> > >
> > > However, such packets are never sent, because the zero-window condition
> > > currently prevents this side from sending out any packets whatsoever
> > > to the peer.
> > >
> > > We notice that the above bug is triggered *only* after the peer has
> > > dropped an arriving packet because of severe memory squeeze, and that we
> > > hence always enter a retransmission situation when this occurs. This
> > > also means that it goes against the RFC 9293 recommendation that a
> > > previously advertised window never should shrink.
> > >
> > > RFC 9293 gives the solution to this situation. In chapter 3.6.1 we find
> > > the following statement:
> > > "A TCP receiver SHOULD NOT shrink the window, i.e., move the right
> > > window edge to the left (SHLD-14). However, a sending TCP peer MUST
> > > be robust against window shrinking, which may cause the
> > > "usable window" (see Section 3.8.6.2.1) to become negative (MUST-34).
> > >
> > > If this happens, the sender SHOULD NOT send new data (SHLD-15), but
> > > SHOULD retransmit normally the old unacknowledged data between SND.UNA
> > > and SND.UNA+SND.WND (SHLD-16).
> > > The sender MAY also retransmit old data
> > > beyond SND.UNA+SND.WND (MAY-7)"
> > >
> > > We never see the window become negative, but we interpret this as a
> > > recommendation to use the previously available window during
> > > retransmission even when the currently advertised window is zero.
> > >
> > > We use the above mechanism only at timer-induced retransmits.
> > > In the case we receive duplicate ack and a zero window, but
> > > still know we have outstanding data acks waiting, we send out an
> > > empty "fast probe" instead of doing fast retransmit. This averts
> > > the risk of overwhelming a memory squeezed peer with retransmits,
> > > while still forcing it to send out a new window update when the
> > > probe is received. This entails a theoretical risk of redundant
> > > retransmits from the peer, but that is a risk worth taking.
> > >
> > > In case of a zero-window non-retransmission situation where there
> > > is no new data to be sent, we also add a simple zero-window probing
> > > feature. By sending an empty packet at regular timeout events we
> > > resolve the situation described above, since the peer receives the
> > > necessary trigger to advertise its window once it becomes non-zero
> > > again.
> > >
> > > It should be noted that although this solves the problem we have at
> > > hand, it is not a genuine solution to the kernel bug. There may well
> > > be TCP stacks around in other OS-es which don't do this, nor have
> > > keep-alive probing as an alternative way to solve the situation.
> > >
> > > Signed-off-by: Jon Maloy
> > >
> > > ---
> > > v2: - Using previously advertised window during retransmission, instead
> > >       of highest send sequence number in the cycle.
> > > v3: - Rebased to newest code
> > >     - Changes based on feedback from PASST team
> > >     - Sending out empty probe message at timer expiration when
> > >       we are not in retransmit situation.
> > > v4: - Some small changes based on feedback from PASST team.
> > >     - Replaced fast retransmit with a one-time 'fast probe' when
> > >       window is zero.
> > > ---
> > >  tcp.c      | 32 +++++++++++++++++++++++++++-----
> > >  tcp_conn.h |  2 ++
> > >  2 files changed, 29 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/tcp.c b/tcp.c
> > > index 4163bf9..a33f494 100644
> > > --- a/tcp.c
> > > +++ b/tcp.c
> > > @@ -1761,9 +1761,15 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn,
> > >   */
> > >  static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd)
> > >  {
> > > +	uint32_t wnd_edge;
> > > +
> > >  	wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
> > >  	conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX);
> > > +	wnd_edge = conn->seq_ack_from_tap + wnd;
> > > +	if (wnd && SEQ_GT(wnd_edge, conn->seq_wnd_edge_from_tap))
> > Here, cppcheck ('make cppcheck') says:
> >
> > tcp.c:1770:6: style: Condition 'wnd' is always true [knownConditionTrueFalse]
> >  if (wnd && SEQ_GT(wnd_edge, conn->seq_wnd_edge_from_tap))
> >      ^
> > tcp.c:1766:8: note: Assignment 'wnd=((1<<(16+8))<(wnd<<conn->ws_from_tap))?(1<<(16+8)):(wnd<<conn->ws_from_tap)', assigned value is less than 1
> >  wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap);
> >        ^
> > tcp.c:1770:6: note: Condition 'wnd' is always true
> >  if (wnd && SEQ_GT(wnd_edge, conn->seq_wnd_edge_from_tap))
> >      ^
> >
> > See the comment in tcp_update_seqack_wnd() and related suppression.
> >
> > It's clearly a false positive (if you omit the MIN() macro, it goes
> > away), so we need that same suppression here.
> Ok. I'll change it. Still a little annoying when our tools are causing us
> extra job because they aren't up to the task.

Yeah, it's frustrating, particularly the fact that this was reported
ages ago and there's no sign of motion on fixing it. But, I'm pretty
sure cppcheck has caught considerably more than two bugs for me that
might have taken a while to catch otherwise, so I still think it's
worth using on balance.
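
For reference, here is a minimal, standalone sketch of the window-edge
idea under discussion: remember the furthest right edge the peer has
advertised, and don't move it left when a zero (or smaller) window
arrives, so a timer-driven retransmit still has room to work with.
This is not the patch itself. The names seq_after(), struct win_state,
win_update(), win_retransmit_room() and SKETCH_MAX_WINDOW are all
hypothetical, and the inline suppression comment only shows cppcheck's
generic syntax (it needs --inline-suppr); whether passt spells its
suppression exactly this way is an assumption.

```c
#include <stdint.h>

#define MIN(x, y)		(((x) < (y)) ? (x) : (y))
#define SKETCH_MAX_WINDOW	(1U << (16 + 8))	/* hypothetical cap */

/* True if sequence a comes after b, modulo 2^32 (hypothetical helper,
 * standing in for a SEQ_GT()-style comparison)
 */
static int seq_after(uint32_t a, uint32_t b)
{
	return a != b && (uint32_t)(a - b) < 0x80000000U;
}

/* Hypothetical per-connection state, for this sketch only */
struct win_state {
	uint32_t ack;		/* last sequence acknowledged by the peer */
	uint32_t wnd_edge;	/* furthest right window edge seen so far */
};

/* Record an advertisement: the edge only ever moves to the right */
static void win_update(struct win_state *s, uint32_t ack, uint32_t wnd)
{
	uint32_t edge;

	wnd = MIN(SKETCH_MAX_WINDOW, wnd);
	edge = ack + wnd;	/* wraps modulo 2^32, as sequences do */
	s->ack = ack;

	/* Only needed if cppcheck reports the false positive here */
	/* cppcheck-suppress knownConditionTrueFalse */
	if (wnd && seq_after(edge, s->wnd_edge))
		s->wnd_edge = edge;
}

/* Bytes that could still be retransmitted up to the remembered edge */
static uint32_t win_retransmit_room(const struct win_state *s)
{
	return seq_after(s->wnd_edge, s->ack) ? s->wnd_edge - s->ack : 0;
}
```

With this, a peer that advertised 4000 bytes from sequence 1000 and
then sends ack 2000 with a zero window still leaves 3000 bytes of
room for a retransmit, which is the interpretation of SHLD-16 the
commit message describes.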

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson