From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: passt.top; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=eYLWD9uv; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by passt.top (Postfix) with ESMTPS id 539965A0278 for ; Wed, 10 Sep 2025 08:37:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1757486273; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fLHT595QdKTCiZgRGqOTBg6UHrwV0LVPlsab0/ESXN4=; b=eYLWD9uvTqH5mZMDz/Y1Rica2z9y/Cmn9JqHLESTwUbjckkbheQuDupfpBsIsn6fNgNxxZ vWP04+7gzkVprUesok7YKaIIkI+vu33DvQtEV9lBEZk9TjcqQ0GWqKwsZMEajbzydXXgcb Forq9mTkIl0nBx6ouj1BnYHK8kH7rQI= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-634-2qE8jbBQMWSNSlomV15ZLw-1; Wed, 10 Sep 2025 02:37:51 -0400 X-MC-Unique: 2qE8jbBQMWSNSlomV15ZLw-1 X-Mimecast-MFC-AGG-ID: 2qE8jbBQMWSNSlomV15ZLw_1757486271 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-45b9912a07dso38096145e9.3 for ; Tue, 09 Sep 2025 23:37:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757486270; x=1758091070; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=fLHT595QdKTCiZgRGqOTBg6UHrwV0LVPlsab0/ESXN4=; b=R1UFUPDYT/9mkRc5IjMFQjkhzmGdzYpOwwoYTswOjaLMjflOMLy5veipLiu+/MqDUP eZohqKzy5bLrVABdgfWhLVuSOzqQYmcAbiGKWEbLDuw0KB5h1fEIqB5x/HuaNI63/CcB kPXflSnvl/vKxSNLc9WmlWBGncHfH0BPInpA8qnCnQ6P8miTAzzt3iXl+W0F8F5WYYQ6 u+A5CDTezSBrC/FOcSRiQ82sjiXbLG+mZvVxLRZSRtN0CBE/ZXNpdt0SdI+KZvIaoNzI 6+S+pyX3ry4ZwZkgomAU5+nB9+W/1Xp9tDx23Kc9DUlpNui6NEs2iu01rvgJkAALpUoo bUzg== X-Gm-Message-State: AOJu0Yw8QdcXFMmClrPx0PwikN6qkw3vv1pHFL+V8lpotCzkLpn1fnyg C7GuJmhzbfi6os3Nr84Sl7qVbuq4yzF+iEhO4ffvLkXzZYI198ejjTxWZ1Y+IUPsF2OwGd6QicJ i7ISKRCESy64t4kGZqsERjMowYZv0RevbUa+b/zfIS/SMvipJkAnPeg== X-Gm-Gg: ASbGncu6diDe2YJd8USs14Jyolj6Y3Uzn3nPNO/Cx/8XkwacaGyNxiFNqPJigf4P3cS Cu1n9y7gQf5QQsc/CtjK1sPCnGWRYqUysIUfOww46vkPsfghAYKp/nM65+eYh0RAfxA/qzCthyv aLkG6p0MH3HfiiqrDW7/qIC/I+dvc0S6OBeHzw8JgsDhsaTpI6Gra6gQlhPPd/a77LQmf38rJNZ OjeFpoE5A3+aLKiEhuiXJ88A6yOIkDeVPH5x7I01dJBXH3SzvV6I0iuQgJ0dmmP7NAlFaIKfJ4c ondncpZBfn71RpKHCJ6je0QFBt//ckMmErjQqwS3Vpa2aCajmjM= X-Received: by 2002:a05:600c:3112:b0:45b:47e1:ef67 with SMTP id 5b1f17b1804b1-45ddded3dfemr116942035e9.34.1757486270574; Tue, 09 Sep 2025 23:37:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGfhluJdyNkqk7XgCwHxH0QbHuUkCDF5rxF2Cft2T6ViLxRxMkLhGoKwxynKSYt+4hVhKFklg== X-Received: by 2002:a05:600c:3112:b0:45b:47e1:ef67 with SMTP id 5b1f17b1804b1-45ddded3dfemr116941925e9.34.1757486270099; Tue, 09 Sep 2025 23:37:50 -0700 (PDT) Received: from maya.myfinge.rs (ifcgrfdd.trafficplex.cloud. [2a10:fc81:a806:d6a9::1]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3e752238910sm5348986f8f.41.2025.09.09.23.37.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Sep 2025 23:37:49 -0700 (PDT) Date: Wed, 10 Sep 2025 08:37:48 +0200 From: Stefano Brivio To: David Gibson Subject: Re: [PATCH v4 3/8] tcp: Rewind sequence when guest shrinks window to zero Message-ID: <20250910083748.6b566880@elisabeth> In-Reply-To: References: <20250909181655.2990223-1-sbrivio@redhat.com> <20250909181655.2990223-4-sbrivio@redhat.com> Organization: Red Hat X-Mailer: Claws Mail 4.2.0 (GTK 3.24.49; x86_64-pc-linux-gnu) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 3JwJRM-vG5DJm3zz4UviIOiTmYpjKZBq7wSJgfubdvI_1757486271 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID-Hash: CCTDCO3FCB5LTBPKDAVX7SHC4JMRC4AN X-Message-ID-Hash: CCTDCO3FCB5LTBPKDAVX7SHC4JMRC4AN X-MailFrom: sbrivio@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top, Jon Maloy , Paul Holzinger X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, 10 Sep 2025 12:20:19 +1000 David Gibson wrote: > On Tue, Sep 09, 2025 at 08:16:50PM +0200, Stefano Brivio wrote: > > A window shrunk to zero means by definition that anything else that > > might be in flight is now out of window. Restart from the currently > > acknowledged sequence. > > > > We need to do that both in tcp_tap_window_update(), where we already > > check for zero-window updates, as well as in tcp_data_from_tap(), > > because we might get one of those updates in a batch of packets that > > also contains a non-zero window update. > > > > Suggested-by: Jon Maloy > > Signed-off-by: Stefano Brivio > > Reviewed-by: David Gibson > > Though a couple of documentation nits below. > > > --- > > tcp.c | 34 +++++++++++++++++++++++++--------- > > 1 file changed, 25 insertions(+), 9 deletions(-) > > > > diff --git a/tcp.c b/tcp.c > > index 86e08f1..12d42e0 100644 > > --- a/tcp.c > > +++ b/tcp.c > > @@ -1268,19 +1268,25 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, > > > > /** > > * tcp_tap_window_update() - Process an updated window from tap side > > + * @c: Execution context > > * @conn: Connection pointer > > * @wnd: Window value, host order, unscaled > > */ > > -static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd) > > +static void tcp_tap_window_update(const struct ctx *c, > > + struct tcp_tap_conn *conn, unsigned wnd) > > { > > wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); > > > > /* Work-around for bug introduced in peer kernel code, commit > > - * e2142825c120 ("net: tcp: send zero-window ACK when no memory"). > > - * We don't update if window shrank to zero. > > + * e2142825c120 ("net: tcp: send zero-window ACK when no memory"): don't > > + * update the window if it shrank to zero, so that we'll eventually > > + * retry to send data, but rewind the sequence as that obviously implies > > + * that no data beyond the updated window will ever be acknowledged. > > As noted earlier "will ever be acknowledged" might be a bit > misleading. Maybe "no data beyond the window will be acknowledged > until it is retransmitted". Sorry, I actually fixed the patch after your same comment on v3 and then for some reason I threw away the changes together with some of your Reviewed-by: tags. Fixed as we discussed on v3 now. > > */ > > - if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) > > + if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) { > > + tcp_rewind_seq(c, conn); > > return; > > + } > > > > conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX); > > > > @@ -1709,7 +1715,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > tcp_timer_ctl(c, conn); > > > > if (p->count == 1) { > > - tcp_tap_window_update(conn, ntohs(th->window)); > > + tcp_tap_window_update(c, conn, > > + ntohs(th->window)); > > return 1; > > } > > > > @@ -1728,6 +1735,15 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > ack_seq == max_ack_seq && > > ntohs(th->window) == max_ack_seq_wnd; > > > > + /* See tcp_tap_window_update() for details. On > > + * top of that, we also need to check here if a > > + * zero-window update is contained in a batch of > > + * packets that includes a non-zero window as > > + * well. > > I'm not 100% convinced of this reasoning. But at worst this should > result in some unnecessary but mostly harmless retransmits, and it > seems to fix the problem empirically, so I'm not suggesting changing > it at this time. Right, strictly speaking, it *might* be that the peer didn't actually throw data away, which should be indicated by the fact that the same data is acknowledged in another segment of this same batch. But handling that without making this part unreasonably complicated implies a refactor of this whole thing: we should extract a function dealing with sequences and perhaps another one with window updates from here. I plan to file a ticket or two with stuff that emerged from this series. -- Stefano