From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: passt.top; dmarc=none (p=none dis=none) header.from=gibson.dropbear.id.au Authentication-Results: passt.top; dkim=pass (2048-bit key; secure) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.a=rsa-sha256 header.s=202508 header.b=iwf7aF6f; dkim-atps=neutral Received: from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by passt.top (Postfix) with ESMTPS id 49BE65A0275 for ; Fri, 03 Oct 2025 05:43:41 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gibson.dropbear.id.au; s=202508; t=1759463017; bh=o1+AOiKpK7wwe9LkWU2aYEHaZkPV1g+oUKt46kr0W04=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=iwf7aF6fk/qRfQGY9VveOrhjuqTe2d2Sl9wHTjRPnTK+v3DQbMKBeC7rvH6h/3ATS yUHf4fvMHnq2B9bZFgliHIpKuhvsmw8I1HxG6UyKJ0cqFHecqAczNHngq+6aTIOere BCRp5+F+xTGQJAyem9XiLGBxNjPYgxDuTYrjAgVX8+oaEH2PKHOz9UWZRqjVBQgoxO wuCzDYlaUS1lS8aMuzYlVHGUfwj6lTYwk7cpEIENzvHN4CPeOft21F+dxhqz5upC6Y O5HkectWgB3wCuOuRAjVkJ60r/IcYFlIxvCpnf5iy3O7ZOM6h8bgBZbIHv65YgH9UL AsrmX0MYTn/UA== Received: by gandalf.ozlabs.org (Postfix, from userid 1007) id 4cdDzF57z2z4wBC; Fri, 3 Oct 2025 13:43:37 +1000 (AEST) Date: Fri, 3 Oct 2025 13:19:17 +1000 From: David Gibson To: Stefano Brivio Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence on FIN to tap Message-ID: References: <20251002000646.2136202-1-sbrivio@redhat.com> <20251002000646.2136202-2-sbrivio@redhat.com> <20251002135841.112eb4d3@elisabeth> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="/TEz2dVvBfC26Mag" Content-Disposition: inline In-Reply-To: <20251002135841.112eb4d3@elisabeth> Message-ID-Hash: ADWVPPLWIQ6QV6NVP5J7O5KNUPZUO3A6 X-Message-ID-Hash: ADWVPPLWIQ6QV6NVP5J7O5KNUPZUO3A6 X-MailFrom: dgibson@gandalf.ozlabs.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: passt-dev@passt.top X-Mailman-Version: 3.3.8 Precedence: list List-Id: Development discussion and patches for passt Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --/TEz2dVvBfC26Mag Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 02, 2025 at 01:58:41PM +0200, Stefano Brivio wrote: > On Thu, 2 Oct 2025 12:41:08 +1000 > David Gibson wrote: >=20 > > On Thu, Oct 02, 2025 at 02:06:43AM +0200, Stefano Brivio wrote: > > > If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and > > > send a FIN segment to the guest / container acknowledging a sequence > > > number that's behind what we received so far, we won't have any > > > further trigger to send an updated ACK segment, as we are now > > > switching the epoll socket monitoring to edge-triggered mode. > > >=20 > > > To avoid this situation, in tcp_update_seqack_wnd(), we set the next > > > acknowledgement sequence to the current observed sequence, regardless > > > of what was acknowledged socket-side. =20 > >=20 > > To double check my understanding: things should work if we always > > acknowledged everything we've received. Acknowledging only what the > > peer has acked is a refinement to give the guest a view that's closer > > to what it would be end-to-end with the peer (which might improve the > > operation of flow control). >=20 > Right. >=20 > > We can't use that refined mechanism when the socket is closing > > (amongst other cases), because while we can get the peer acked bytes > > from TCP_INFO, we can't get events when that changes, so we have no > > mechanism to provide updates to the guest at the right time. So we > > fall back to the simpler method. > >=20 > > Is that correct? >=20 > Also correct, yes. If you have a better idea to summarise this in the > comment in tcp_buf_data_from_sock() let me know. Hm, I might. Or actually a way to reorganise the code that I think will be a bit clearer and probably allow a clearer comment too. > Maybe I could mention > EPOLLET explicitly there? I don't think EPOLLET is actually relevant. Even if we had level triggered events, a change in bytes_acked doesn't count as an event (AFAIK). So either some other event is on, in which case we'd effectively be busy polling bytes_acked, or it's not in which case we don't get updates, just like now. I principle we could implement some sort of timer based polling, but that sounds like way more trouble than it's worth. > > > However, we don't necessarily call tcp_update_seqack_wnd() before > > > sending the FIN segment, which might potentially lead to a situation, > > > not observed in practice, where we unnecessarily cause a > > > retransmission at some point after our FIN segment. > > >=20 > > > Avoid that by setting the ACK sequence to whatever we received from > > > the container / guest, before sending a FIN segment and switching to > > > EPOLLET. > > >=20 > > > Signed-off-by: Stefano Brivio =20 > >=20 > > Based on my understanding above, this looks correct to me, so, > >=20 > > Reviewed-by: David Gibson > >=20 > > My only concern is whether we could instead insert an extra call to > > tcp_update_seqack_wnd() to reduce duplicated logic. >=20 > Hmm, maybe, but on the other hand we're closing the connection. Should > we really spend time querying TCP_INFO to recalculate the window at > this point? I wouldn't. Good point. I mean tcp_update_seqack_wnd() could skip the TCP_INFO in that case, but that does look a bit fiddly. On the other hand, in favour of not duplicating logic... [snip] > > > @@ -368,7 +368,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, = struct tcp_tap_conn *conn) > > > conn_flag(c, conn, STALLED); > > > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) =3D=3D > > > SOCK_FIN_RCVD) { > > > - int ret =3D tcp_buf_send_flag(c, conn, FIN | ACK); > > > + int ret; > > > + > > > + /* On TAP_FIN_SENT, we won't get further data events > > > + * from the socket, and this might be the last ACK > > > + * segment we send to the tap, so update its sequence to > > > + * include everything we received until now. > > > + * > > > + * See also the special handling on CONN_IS_CLOSING() in > > > + * tcp_update_seqack_wnd(). > > > + */ > > > + conn->seq_ack_to_tap =3D conn->seq_from_tap; =2E.. the equivalent bits in tcp_update_seqack_wnd() have after them: if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap)) conn->seq_ack_to_tap =3D prev_ack_to_tap; Don't we need that here as well, in case the guest is retransmitting when we get the sock side FIN? > > > + > > > + ret =3D tcp_buf_send_flag(c, conn, FIN | ACK); > > > if (ret) { > > > tcp_rst(c, conn); > > > return ret; > > > diff --git a/tcp_vu.c b/tcp_vu.c > > > index ebd3a1e..3ec3538 100644 > > > --- a/tcp_vu.c > > > +++ b/tcp_vu.c > > > @@ -410,7 +410,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, s= truct tcp_tap_conn *conn) > > > conn_flag(c, conn, STALLED); > > > } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) =3D=3D > > > SOCK_FIN_RCVD) { > > > - int ret =3D tcp_vu_send_flag(c, conn, FIN | ACK); > > > + int ret; > > > + > > > + /* See tcp_buf_data_from_sock() */ > > > + conn->seq_ack_to_tap =3D conn->seq_from_tap; > > > + > > > + ret =3D tcp_vu_send_flag(c, conn, FIN | ACK); > > > if (ret) { > > > tcp_rst(c, conn); > > > return ret; > > > --=20 > > > 2.43.0 > > > =20 >=20 > --=20 > Stefano >=20 --=20 David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson --/TEz2dVvBfC26Mag Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEO+dNsU4E3yXUXRK2zQJF27ox2GcFAmjfQKkACgkQzQJF27ox 2GcPbRAAhEAddFBgJLqkmMngVvOmq0vqvve2dBHElLgkVdPWpzgyc+ZR+INaOf7E S0Wx8Ca3vVZaW80Hvy2Ch5YK6uncjYjbrTugHKvkRUU5DfEC1ixfqlWhkcfUR9D+ I2nywsudyztX5Mfk9P4o5HT8qPRcjtTgtX4YEjoR9Nni3N8KuX+kwiid8L+zlTEN agSUik+ZILn9Hn9CXuuQd4PgxemxDhf/QnHG8KYBHZzA/Q6bxqFUZBP6phr1XDHK BrGXdE/0ffXGIroN5FTEN+1iJtrxKuJFtA0ET5ewMNsAOneVEiu+BtDupb0BoFdP cgvPfFOLYWVoKLvGvlem2TnsWHm0yabF4s6ndDc0Ahmj5o17h5WBSrLn3zbwHD7L XqJnthAo1BF3EkKcOJGuk8dBFCOV5cZxda9a/kLaaroio3UlIe5KryeN5C2DVQA1 q6jbPMZoYgUaga0QjnQQZP485cVIJoChGxmA2pRbU+vDoYdjGUNFe8bOJu987kdn 4lzF1FW7i54fU5EJskbaGZzoF68/ayLi1VonWUMQbnxmoq3iGn9srTwSltA3UA+G 2dmlmJrp21Z6lP4NPXDNrPzbtw0Uf/gcmPZXjiJp6nNFScqM4sMHc8FKa7PC8nRu niPFsthvQt6U13owPdB26IqZMhmrQpHj5qwi/isdMWbxx/HVFM0= =+qMi -----END PGP SIGNATURE----- --/TEz2dVvBfC26Mag--