From: Stefano Brivio
To: David Gibson
CC: Jon Maloy, passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Date: Mon, 15 Jul 2024 19:08:13 +0200
Subject: Re: [PATCH v9 2/2] tcp: handle shrunk window advertisemenst from guest
Message-ID: <20240715190656.0581b764@elisabeth>
References: <20240712190450.1261907-1-jmaloy@redhat.com> <20240712190450.1261907-3-jmaloy@redhat.com>
Organization: Red Hat
List-Id: Development discussion and patches for passt

On Mon, 15 Jul 2024 10:34:23 +1000
David Gibson wrote:

> On Fri, Jul 12, 2024 at 03:04:50PM -0400, Jon Maloy wrote:
> > A bug in kernel TCP may lead to a deadlock where a zero window is sent
> > from the guest peer, while it is unable to send out window updates even
> > after socket reads have freed up enough buffer space to permit a larger
> > window. In this situation, new window advertisements from the peer can
> > only be triggered by data packets arriving from this side.
> >
> > However, currently such packets are never sent, because the zero-window
> > condition prevents this side from sending out any packets whatsoever
> > to the peer.
> >
> > We notice that the above bug is triggered *only* after the peer has
> > dropped one or more arriving packets because of severe memory squeeze,
> > and that we hence always enter a retransmission situation when this
> > occurs. This also means that the implementation goes against the
> > RFC-9293 recommendation that a previously advertised window never
> > should shrink.
> >
> > RFC-9293 seems to permit that we can continue sending up to the right
> > edge of the last advertised non-zero window in such situations, so that
> > is what we do to resolve this situation.
> >
> > It turns out that this solution is extremely simple to implement in the
> > code: We just omit to save the advertised zero-window when we see that
> > it has shrunk, i.e., if the acknowledged sequence number in the
> > advertisement message is lower than that of the last data byte sent
> > from our side.
> >
> > When that is the case, the following happens:
> > - The 'retr' flag in tcp_data_from_tap() will be 'false', so no
> >   retransmission will occur at this occasion.
> > - The data stream will soon reach the right edge of the previously
> >   advertised window. In fact, in all observed cases we have seen that
> >   it is already there when the zero-advertisement arrives.
> > - At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set,
> >   unless they already have been, meaning that only the next timer
> >   expiration will open for data retransmission or transmission.
> > - When that happens, the memory squeeze at the guest will normally have
> >   abated, and the data flow can resume.
> >
> > It should be noted that although this solves the problem we have at
> > hand, it is a work-around, and not a genuine solution to the described
> > kernel bug.
> >
> > Suggested-by: Stefano Brivio
> > Signed-off-by: Jon Maloy
>
> I only half-understand the problem here

Long story short(er): we fill up the socket receive buffer in a Linux
guest, completely, complying with the window.

At that point, since kernel commit e2142825c120, on memory pressure, we
get an ACK segment from the Linux guest *not* acknowledging all the
data we sent (a bit less), but reporting zero as window (as if we sent
"too much" data, which is not the case).

After that, we don't get any further segment at all (second issue
introduced by e2142825c120), and any pending transfer times out.

-- 
Stefano
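
For illustration only, here is a minimal sketch of the check described in
the quoted commit message: a zero window is not saved when the same segment
acknowledges less than what we already sent. This is not the code from the
patch. The struct, its field names and both helpers are hypothetical, and
comparing the ACK number against the next sequence to send (rather than the
last byte sent) is a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not passt's actual struct) */
struct conn_sketch {
	uint32_t seq_to_tap;       /* next sequence number we would send */
	uint32_t seq_ack_from_tap; /* highest sequence acknowledged by guest */
	uint32_t wnd_from_tap;     /* last window accepted from guest, bytes */
};

/* Wrap-safe "a < b" comparison for 32-bit TCP sequence numbers */
static bool seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Handle the ACK number and window of a segment coming from the guest.
 * Returns true if the advertised window was stored, false if it was
 * ignored because it looks like a shrunk (zero) window.
 */
static bool ack_update_window(struct conn_sketch *c, uint32_t ack,
			      uint32_t wnd)
{
	/* Only ever move the acknowledged sequence forward */
	if (seq_lt(c->seq_ack_from_tap, ack))
		c->seq_ack_from_tap = ack;

	/*
	 * Zero window while data sent within the previous window is still
	 * unacknowledged: the window shrank. Keep the old value, so that
	 * sending up to its right edge remains possible.
	 */
	if (wnd == 0 && seq_lt(ack, c->seq_to_tap))
		return false;

	c->wnd_from_tap = wnd;
	return true;
}
```

With this sketch, a zero window that arrives together with a full
acknowledgement is still stored as usual, so ordinary zero-window handling
is unaffected.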