From: David Gibson <david@gibson.dropbear.id.au>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Matej Hrica <mhrica@redhat.com>, passt-dev@passt.top
Subject: Re: [PATCH RFT 3/5] tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag
Date: Sat, 23 Sep 2023 17:55:44 +1000
Message-ID: <ZQ6aACKk3LpHl7F4@zatzit>
In-Reply-To: <20230922220610.58767-4-sbrivio@redhat.com>
On Sat, Sep 23, 2023 at 12:06:08AM +0200, Stefano Brivio wrote:
> It looks like we need it as workaround for this situation, readily
> reproducible at least with a 6.5 Linux kernel, with default rmem_max
> and wmem_max values:
>
> - an iperf3 client on the host sends about 160 KiB, typically
>   segmented into five frames by passt. We read this data using
>   MSG_PEEK
>
> - the iperf3 server on the guest starts receiving
>
> - meanwhile, the host kernel advertised a zero-sized window to the
>   receiver, as expected
You noted the s/receiver/sender/ here..
>
> - eventually, the guest acknowledges all the data sent so far, and
>   we drop it from the buffer, courtesy of tcp_sock_consume(), using
>   recv() with MSG_TRUNC
>
> - the client, however, doesn't get an updated window value, and
>   even keepalive packets are answered with zero-window segments,
>   until the connection is closed
>
> It looks like dropping data from a socket using MSG_TRUNC doesn't
> cause a recalculation of the window, which would be expected as a
> result of any receiving operation that invalidates data on a buffer
> (that is, not with MSG_PEEK).
>
> Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
> the previous value we clamped to, forces a recalculation of the
> window which is advertised to the guest.
..and the s/guest/sender/ here..
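
For anyone following along, the three socket operations discussed above
can be sketched roughly as below. This is only an illustration, not the
code in tcp.c: the helper names peek_data(), discard_acked() and
force_window_clamp() are made up for this sketch, and it assumes Linux
semantics for recv() with a NULL buffer and MSG_TRUNC on a TCP socket
(the data is discarded rather than copied).

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: look at pending data without dequeuing it, so the
 * kernel keeps it until the peer on the other side has acknowledged it.
 */
ssize_t peek_data(int s, void *buf, size_t len)
{
	return recv(s, buf, len, MSG_DONTWAIT | MSG_PEEK);
}

/* Hypothetical helper: drop 'len' already-acknowledged bytes from the
 * receive queue. On Linux TCP sockets, MSG_TRUNC with a NULL buffer
 * discards the data instead of copying it out.
 */
ssize_t discard_acked(int s, size_t len)
{
	return recv(s, NULL, len, MSG_DONTWAIT | MSG_TRUNC);
}

/* Hypothetical helper: set TCP_WINDOW_CLAMP to 'wnd', even if that is the
 * value already in place. Per the commit message, this appears to make
 * the kernel recalculate the advertised window.
 */
int force_window_clamp(int s, int wnd)
{
	return setsockopt(s, IPPROTO_TCP, TCP_WINDOW_CLAMP, &wnd, sizeof(wnd));
}
```

The workaround then amounts to calling something like
force_window_clamp() after discard_acked() whenever the connection might
be sitting on a zero-sized window.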
>
> I couldn't quite confirm this issue by following all the possible
> code paths in the kernel, yet. If confirmed, this should be fixed in
> the kernel, but meanwhile this workaround looks robust to me (and it
> will be needed for backward compatibility anyway).
>
> Reported-by: Matej Hrica <mhrica@redhat.com>
> Link: https://bugs.passt.top/show_bug.cgi?id=74
> Analysed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> tcp.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 5528e05..4606f17 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn,
> wnd <<= conn->ws_from_tap;
> wnd = MIN(MAX_WINDOW, wnd);
>
> - if (conn->flags & WND_CLAMPED) {
> + /* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up
> + * with a zero-sized window on a TCP socket, dropping data (once
> + * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear
> + * to be enough to make the kernel advertise a non-zero window to the
> + * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing
..but you need another s/receiver/sender/ here too.
> + * value, fixes this.
> + *
> + * The STALLED flag on a connection is a sufficient indication that we
> + * might have a zero-sized window on the socket, because it's set if we
> + * exhausted the tap-side window, or if everything we receive from a
> + * socket is already in flight to the guest.
> + *
> + * So, if STALLED is set, and we received a window value from the tap,
> + * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated
> + * further and fixed in the kernel instead, if confirmed.
> + */
> + if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) {
> if (prev_scaled == wnd)
> return;
>
> @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn,
> i = keep - 1;
> }
>
> - tcp_clamp_window(c, conn, max_ack_seq_wnd);
> -
> /* On socket flush failure, pretend there was no ACK, try again later */
> if (ack && !tcp_sock_consume(conn, max_ack_seq))
> tcp_update_seqack_from_tap(c, conn, max_ack_seq);
>
> + tcp_clamp_window(c, conn, max_ack_seq_wnd);
> +
> if (retr) {
> trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u",
> max_ack_seq, conn->seq_to_tap);
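
To spell out what the changed condition in tcp_clamp_window() does, here
is a minimal standalone model of just the part visible in the hunk above.
It is a sketch under assumptions: the bit values for STALLED and
WND_CLAMPED are placeholders, clamp_needs_setsockopt() is a made-up name,
and whatever the real function does after the lines shown in the context
is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only */
#define STALLED		(1U << 0)
#define WND_CLAMPED	(1U << 1)

/* Hypothetical model of the early return in the hunk: returns false when
 * the setsockopt() would be skipped, true when we would go on to it.
 */
bool clamp_needs_setsockopt(unsigned flags, uint32_t prev_scaled,
			    uint32_t wnd)
{
	/* Only a connection that is already clamped and not stalled may skip
	 * the call, and only if the window didn't change.
	 */
	if (!(flags & STALLED) && (flags & WND_CLAMPED)) {
		if (prev_scaled == wnd)
			return false;
	}

	/* STALLED set: force the clamp even with an unchanged value */
	return true;
}
```

With STALLED set, an unchanged window no longer short-circuits the call,
which is the point of the workaround. Moving the tcp_clamp_window() call
after tcp_sock_consume() in the second hunk means the clamp is re-applied
only once the acknowledged data has been dropped from the socket.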
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson