From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: passt-dev@passt.top, Matej Hrica <mhrica@redhat.com>
Subject: Re: [PATCH v2 2/3] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag
Date: Tue, 3 Oct 2023 13:47:13 +1100
Message-ID: <ZRuAsfoj8JGjGb5S@zatzit>
In-Reply-To: <20230929150446.2671959-3-sbrivio@redhat.com>
On Fri, Sep 29, 2023 at 05:04:45PM +0200, Stefano Brivio wrote:
> It looks like we need it as a workaround for this situation, readily
> reproducible at least with a 6.5 Linux kernel, with default rmem_max
> and wmem_max values:
>
> - an iperf3 client on the host sends about 160 KiB, typically
> segmented into five frames by passt. We read this data using
> MSG_PEEK
>
> - the iperf3 server on the guest starts receiving
>
> - meanwhile, the host kernel advertised a zero-sized window to the
> receiver, as expected
>
> - eventually, the guest acknowledges all the data sent so far, and
> we drop it from the buffer, courtesy of tcp_sock_consume(), using
> recv() with MSG_TRUNC
>
> - the client, however, doesn't get an updated window value, and
> even keepalive packets are answered with zero-window segments,
> until the connection is closed
>
> It looks like dropping data from a socket using MSG_TRUNC doesn't
> cause a recalculation of the window, which would be expected as a
> result of any receiving operation that invalidates data on a buffer
> (that is, not with MSG_PEEK).
>
> Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
> the previous value we clamped to, forces a recalculation of the
> window which is advertised to the guest.
>
> I couldn't quite confirm this issue by following all the possible
> code paths in the kernel, yet. If confirmed, this should be fixed in
> the kernel, but meanwhile this workaround looks robust to me (and it
> will be needed for backward compatibility anyway).
>
> Reported-by: Matej Hrica <mhrica@redhat.com>
> Link: https://bugs.passt.top/show_bug.cgi?id=74
> Analysed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> tcp.c | 29 ++++++++++++++++++++++++-----
> 1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index dff5e79..32917c8 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn,
> wnd <<= conn->ws_from_tap;
> wnd = MIN(MAX_WINDOW, wnd);
>
> - if (conn->flags & WND_CLAMPED) {
> + /* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up
> + * with a zero-sized window on a TCP socket, dropping data (once
> + * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear
> + * to be enough to make the kernel advertise a non-zero window to the
> + * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing
> + * value, fixes this.
> + *
> + * The STALLED flag on a connection is a sufficient indication that we
> + * might have a zero-sized window on the socket, because it's set if we
> + * exhausted the tap-side window, or if everything we receive from a
> + * socket is already in flight to the guest.
> + *
> + * So, if STALLED is set, and we received a window value from the tap,
> + * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated
> + * further and fixed in the kernel instead, if confirmed.
> + */
> + if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) {
> if (prev_scaled == wnd)
> return;
>
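
As a reading aid for the condition changed in this hunk, the following sketch isolates the visible part of the decision: the early return on an unchanged window is taken only when the connection is clamped and not stalled. The flag bit values and the function name are assumptions made for illustration, and the rest of tcp_clamp_window(), which this hunk doesn't show, is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, for illustration only */
#define STALLED		(1U << 0)
#define WND_CLAMPED	(1U << 1)

/* Hypothetical predicate: true if the setsockopt() call should go ahead,
 * false if the unchanged-window shortcut applies.
 */
static bool clamp_should_proceed(uint8_t flags, uint32_t prev_scaled,
				 uint32_t wnd)
{
	if (!(flags & STALLED) && (flags & WND_CLAMPED)) {
		if (prev_scaled == wnd)
			return false;	/* same window, not stalled: skip */
	}

	return true;	/* stalled, not yet clamped, or window changed */
}
```

With STALLED set, an identical window value no longer short-circuits the call, which is the forced TCP_WINDOW_CLAMP the comment describes.
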
> @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn,
> i = keep - 1;
> }
>
> - tcp_clamp_window(c, conn, max_ack_seq_wnd);
> -
> /* On socket flush failure, pretend there was no ACK, try again later */
> if (ack && !tcp_sock_consume(conn, max_ack_seq))
> tcp_update_seqack_from_tap(c, conn, max_ack_seq);
>
> + tcp_clamp_window(c, conn, max_ack_seq_wnd);
> +
> if (retr) {
> trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u",
> max_ack_seq, conn->seq_to_tap);
> @@ -2572,8 +2588,6 @@ int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
> if (th->ack && !(conn->events & ESTABLISHED))
> tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));
>
> - conn_flag(c, conn, ~STALLED);
> -
> /* Establishing connection from socket */
> if (conn->events & SOCK_ACCEPTED) {
> if (th->syn && th->ack && !th->fin) {
> @@ -2628,6 +2642,11 @@ int tcp_tap_handler(struct ctx *c, int af, const void *saddr, const void *daddr,
> if (count == -1)
> goto reset;
>
> + /* Note: STALLED matters for tcp_clamp_window(): unset it only after
> + * processing data (and window) from the tap side
> + */
> + conn_flag(c, conn, ~STALLED);
> +
> if (conn->seq_ack_to_tap != conn->seq_from_tap)
> ack_due = 1;
>
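
To illustrate why the position of the STALLED reset matters in the last two hunks, here is a toy model of the handler ordering. It is a sketch under simplifying assumptions: the struct, the flag values and the function names are hypothetical, and a counter stands in for the actual setsockopt() call.

```c
#include <stdbool.h>

/* Assumed bit values, for illustration only */
#define STALLED		(1U << 0)
#define WND_CLAMPED	(1U << 1)

struct conn_model {
	unsigned flags;
	unsigned prev_wnd;
	unsigned clamp_calls;	/* counts simulated setsockopt() calls */
};

/* Simplified stand-in for the clamping step */
static void model_clamp(struct conn_model *conn, unsigned wnd)
{
	if (!(conn->flags & STALLED) && (conn->flags & WND_CLAMPED) &&
	    conn->prev_wnd == wnd)
		return;

	conn->prev_wnd = wnd;
	conn->flags |= WND_CLAMPED;
	conn->clamp_calls++;
}

/* clear_first == true models the old ordering (flag reset before tap
 * data is processed), false models the ordering after this patch.
 */
static void model_handler(struct conn_model *conn, unsigned wnd,
			  bool clear_first)
{
	if (clear_first)
		conn->flags &= ~STALLED;

	model_clamp(conn, wnd);

	conn->flags &= ~STALLED;
}
```

Starting from a stalled, clamped connection with an unchanged window, the old ordering never reaches the simulated setsockopt(), while the new ordering reaches it once and only then clears the flag.
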
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
Thread overview: 7+ messages
2023-09-29 15:04 [PATCH v2 0/3] Fixes and a workaround for TCP stalls with small buffers Stefano Brivio
2023-09-29 15:04 ` [PATCH v2 1/3] tcp: Fix comment to tcp_sock_consume() Stefano Brivio
2023-09-29 15:04 ` [PATCH v2 2/3] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag Stefano Brivio
2023-09-29 15:44 ` Stefano Brivio
2023-10-03 2:47 ` David Gibson [this message]
2023-09-29 15:04 ` [PATCH v2 3/3] tcp, tap: Don't increase tap-side sequence counter for dropped frames Stefano Brivio
2023-10-03 2:50 ` David Gibson