From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top
Subject: Re: [PATCH 1/4] tcp: Fix ACK sequence on FIN to tap
Date: Thu, 2 Oct 2025 12:41:08 +1000 [thread overview]
Message-ID: <aN3mRIPkCSIbOkXs@zatzit> (raw)
In-Reply-To: <20251002000646.2136202-2-sbrivio@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3871 bytes --]
On Thu, Oct 02, 2025 at 02:06:43AM +0200, Stefano Brivio wrote:
> If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and
> send a FIN segment to the guest / container acknowledging a sequence
> number that's behind what we received so far, we won't have any
> further trigger to send an updated ACK segment, as we are now
> switching the epoll socket monitoring to edge-triggered mode.
>
> To avoid this situation, in tcp_update_seqack_wnd(), we set the next
> acknowledgement sequence to the current observed sequence, regardless
> of what was acknowledged socket-side.
To double check my understanding: things should work if we always
acknowledged everything we've received. Acknowledging only what the
peer has acked is a refinement to give the guest a view that's closer
to what it would be end-to-end with the peer (which might improve the
operation of flow control).
We can't use that refined mechanism when the socket is closing
(amongst other cases), because while we can get the peer acked bytes
from TCP_INFO, we can't get events when that changes, so we have no
mechanism to provide updates to the guest at the right time. So we
fall back to the simpler method.
Is that correct?
> However, we don't necessarily call tcp_update_seqack_wnd() before
> sending the FIN segment, which might potentially lead to a situation,
> not observed in practice, where we unnecessarily cause a
> retransmission at some point after our FIN segment.
>
> Avoid that by setting the ACK sequence to whatever we received from
> the container / guest, before sending a FIN segment and switching to
> EPOLLET.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Based on my understanding above, this looks correct to me, so,
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
My only concern is whether we could instead insert an extra call to
tcp_update_seqack_wnd() to reduce duplicated logic.
> ---
> tcp_buf.c | 14 +++++++++++++-
> tcp_vu.c | 7 ++++++-
> 2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/tcp_buf.c b/tcp_buf.c
> index a493b5a..cc106bc 100644
> --- a/tcp_buf.c
> +++ b/tcp_buf.c
> @@ -368,7 +368,19 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> conn_flag(c, conn, STALLED);
> } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) ==
> SOCK_FIN_RCVD) {
> - int ret = tcp_buf_send_flag(c, conn, FIN | ACK);
> + int ret;
> +
> + /* On TAP_FIN_SENT, we won't get further data events
> + * from the socket, and this might be the last ACK
> + * segment we send to the tap, so update its sequence to
> + * include everything we received until now.
> + *
> + * See also the special handling on CONN_IS_CLOSING() in
> + * tcp_update_seqack_wnd().
> + */
> + conn->seq_ack_to_tap = conn->seq_from_tap;
> +
> + ret = tcp_buf_send_flag(c, conn, FIN | ACK);
> if (ret) {
> tcp_rst(c, conn);
> return ret;
> diff --git a/tcp_vu.c b/tcp_vu.c
> index ebd3a1e..3ec3538 100644
> --- a/tcp_vu.c
> +++ b/tcp_vu.c
> @@ -410,7 +410,12 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
> conn_flag(c, conn, STALLED);
> } else if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) ==
> SOCK_FIN_RCVD) {
> - int ret = tcp_vu_send_flag(c, conn, FIN | ACK);
> + int ret;
> +
> + /* See tcp_buf_data_from_sock() */
> + conn->seq_ack_to_tap = conn->seq_from_tap;
> +
> + ret = tcp_vu_send_flag(c, conn, FIN | ACK);
> if (ret) {
> tcp_rst(c, conn);
> return ret;
> --
> 2.43.0
>
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2025-10-02 2:53 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-02 0:06 [PATCH 0/4] tcp: Fix bad switch to CLOSE-WAIT state and surrounding issues Stefano Brivio
2025-10-02 0:06 ` [PATCH 1/4] tcp: Fix ACK sequence on FIN to tap Stefano Brivio
2025-10-02 2:41 ` David Gibson [this message]
2025-10-02 11:58 ` Stefano Brivio
2025-10-03 3:19 ` David Gibson
2025-10-06 22:32 ` Stefano Brivio
2025-10-06 23:31 ` David Gibson
2025-10-02 0:06 ` [PATCH 2/4] tcp: Completely ignore data segment in CLOSE-WAIT state, log a message Stefano Brivio
2025-10-02 2:44 ` David Gibson
2025-10-02 0:06 ` [PATCH 3/4] tcp: Don't consider FIN flags with mismatching sequence Stefano Brivio
2025-10-02 2:52 ` David Gibson
2025-10-02 3:02 ` David Gibson
2025-10-02 11:51 ` Stefano Brivio
2025-10-03 3:43 ` David Gibson
2025-10-06 22:32 ` Stefano Brivio
2025-10-06 23:34 ` David Gibson
2025-10-02 0:06 ` [PATCH 4/4] tcp: On partial send (incomplete sendmsg()), request a retransmission right away Stefano Brivio
2025-10-02 3:00 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aN3mRIPkCSIbOkXs@zatzit \
--to=david@gibson.dropbear.id.au \
--cc=passt-dev@passt.top \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).