From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=FypW869n; dkim-atps=neutral Received: from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by passt.top (Postfix) with ESMTPS id 54F745A0275 for ; Thu, 02 Oct 2025 04:53:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1759373583; bh=fTah40IvJQ7UPMtGHp7YzfSQwiY2D5oCA6vpi0g0qZc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=FypW869naw/MMHlA7r+qfQPrPfB97FSHv6HpRSjxqYlE/XipT3GfN7rqEZz9G4wEH QGj+AqmbGgMQAIW0lF3bnjE4OL6XfYFlM0RzaIaCN+dA3kZNM1utmMozOEpFuPkCBh I/RBdncUn/3WZwsF1L4x4qDyXxMw6y9mTZxbQg69dhFmQ6LkRGXy2TUUMTRvrxQyNM DjbzyUMBhLP/9bSr1bR8lN1RNo16VO2rBK3xdW2bzmQc+WdhSXV8VK/1KytDxx2JuA S63Ayhk7s9f/yjcJD/TFwFuDfU12vsb5MR3my2RWZeDrlmsUYm+IhhlLmMcY+ae3Ey PHA4l4qM+A0ig== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4ccbvM2L5Kz4w9w; Thu, 2 Oct 2025 12:53:03 +1000 (AEST) Date: Thu, 2 Oct 2025 12:41:08 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence on FIN to tap Message-ID: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-2-sbrivio@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="fJDpDn69+dH0nA5m" Content-Disposition: inline In-Reply-To: <20251002000646.2136202-2-sbrivio@redhat.com> Message-ID-Hash: NYTF6YD2YZ7VMBZRT7QCU6I3HWNG3REG X-Message-ID-Hash: NYTF6YD2YZ7VMBZRT7QCU6I3HWNG3REG X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --fJDpDn69+dH0nA5m Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 02, 2025 at 02:06:43AM +0200, Stefano Brivio wrote: > If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and > send a FIN segment to the guest / container acknowledging a sequence > number that's behind what we received so far, we won't have any > further trigger to send an updated ACK segment, as we are now > switching the epoll socket monitoring to edge-triggered mode. >=20 > To avoid this situation, in tcp_update_seqack_wnd(), we set the next > acknowledgement sequence to the current observed sequence, regardless > of what was acknowledged socket-side. To double check my understanding: things should work if we always acknowledged everything we've received. Acknowledging only what the peer has acked is a refinement to give the guest a view that's closer to what it would be end-to-end with the peer (which might improve the operation of flow control). We can't use that refined mechanism when the socket is closing (amongst other cases), because while we can get the peer acked bytes =66rom TCP_INFO, we can't get events when that changes, so we have no mechanism to provide updates to the guest at the right time. So we fall back to the simpler method. Is that correct? > However, we don't necessarily call tcp_update_seqack_wnd() before > sending the FIN segment, which might potentially lead to a situation, > not observed in practice, where we unnecessarily cause a > retransmission at some point after our FIN segment. >=20 > Avoid that by setting the ACK sequence to whatever we received from > the container / guest, before sending a FIN segment and switching to > EPOLLET. >=20 > Signed-off-by: Stefano Brivio Based on my understanding above, this looks correct to me, so, Reviewed-by: David Gibson My only concern is whether we could instead insert an extra call to tcp_update_seqack_wnd() to reduce duplicated logic. > --- > tcp_buf.c | 14 +++++++++++++- > tcp_vu.c | 7 ++++++- > 2 files changed, 19 insertions(+), 2 deletions(-) >=20 > diff --git a/tcp_buf.c b/tcp_buf.c > index a493b5a..cc106bc 100644 > --- a/tcp_buf.c > +++ b/tcp_buf.c > @@ -368,7 +368,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, stru= ct tcp_tap_conn *conn) > conn_flag(c, conn, STALLED); > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) =3D=3D > SOCK_FIN_RCVD) { > - int ret =3D tcp_buf_send_flag(c, conn, FIN | ACK); > + int ret; > + > + /* On TAP_FIN_SENT, we won't get further data events > + * from the socket, and this might be the last ACK > + * segment we send to the tap, so update its sequence to > + * include everything we received until now. > + * > + * See also the special handling on CONN_IS_CLOSING() in > + * tcp_update_seqack_wnd(). > + */ > + conn->seq_ack_to_tap =3D conn->seq_from_tap; > + > + ret =3D tcp_buf_send_flag(c, conn, FIN | ACK); > if (ret) { > tcp_rst(c, conn); > return ret; > diff --git a/tcp_vu.c b/tcp_vu.c > index ebd3a1e..3ec3538 100644 > --- a/tcp_vu.c > +++ b/tcp_vu.c > @@ -410,7 +410,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, struc= t tcp_tap_conn *conn) > conn_flag(c, conn, STALLED); > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) =3D=3D > SOCK_FIN_RCVD) { > - int ret =3D tcp_vu_send_flag(c, conn, FIN | ACK); > + int ret; > + > + /* See tcp_buf_data_from_sock() */ > + conn->seq_ack_to_tap =3D conn->seq_from_tap; > + > + ret =3D tcp_vu_send_flag(c, conn, FIN | ACK); > if (ret) { > tcp_rst(c, conn); > return ret; > --=20 > 2.43.0 >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --fJDpDn69+dH0nA5m Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjd5kMACgkQzQJF27ox 2GdCgQ//RlnU0sfsRGFZQWvqbq1BWWS5acoP6StSM3gqDzcm9YgUnKpvpLRlfYuP GXl//rQPWdkAgdR3z0p/uXwNPoAGB9egzN7XWJdb7gVqMjB4JOqEcLtt5IdnMmhN 9tl9glcfTldK/bMQ4OVR+YbJSUnojTQFlyM2dHcUrKI/j1uMAX6x1RnVFaoPQwnd 1TqvBOyQLhrWRwQGQZfuDCXtaEbBr/3axRodPDupdCVk8ForDRI/+JFMEMgbSP5A 7BKPSIh4rIxjFGn3sLIxseZaRzu4B+DpVYY1517gLMPFz1aP6hL0pJS9241Zu/VA kwNMIs5tBuBq3+TsZ6ifqGN4xKpT4MhJ+PmHM2UKrUvYvozCANZauuA86dYchb6B vSYiyPHJChRQNtfRTRykp7DjNtGeJQCYPdwYRGgZyeCyJcH2dSN8jLRRMZnq6IMh WtgIoBtNG3VuGWkoy7n5k5WUmaQZt99KEU+q+PjZ5aVfDNiZheUeBT59pcI1DShB 7JX/IOdHNJh5ivIyBzqaMoTpgh81T9ndzCLBwGWFx37EO5k+pRuI8kxIWAdTU+4F upM9Tez7AJkndI2ac1y5ZcIt8raHxt0TOZQ8vybzM3G0CQTwK1IEjFXrmb1UKBeS zRPR1I12bJuWsxW/pMrM7dq84r6a5LjcQQspeEjJiCBMutc9TKM= =aytT -----END PGP SIGNATURE----- --fJDpDn69+dH0nA5m--