On Sat, Sep 23, 2023 at 12:06:08AM +0200, Stefano Brivio wrote: > It looks like we need it as workaround for this situation, readily > reproducible at least with a 6.5 Linux kernel, with default rmem_max > and wmem_max values: > > - an iperf3 client on the host sends about 160 KiB, typically > segmented into five frames by passt. We read this data using > MSG_PEEK > > - the iperf3 server on the guest starts receiving > > - meanwhile, the host kernel advertised a zero-sized window to the > receiver, as expected You noted the s/receiver/sender/ here.. > > - eventually, the guest acknowledges all the data sent so far, and > we drop it from the buffer, courtesy of tcp_sock_consume(), using > recv() with MSG_TRUNC > > - the client, however, doesn't get an updated window value, and > even keepalive packets are answered with zero-window segments, > until the connection is closed > > It looks like dropping data from a socket using MSG_TRUNC doesn't > cause a recalculation of the window, which would be expected as a > result of any receiving operation that invalidates data on a buffer > (that is, not with MSG_PEEK). > > Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to > the previous value we clamped to, forces a recalculation of the > window which is advertised to the guest. ..and the s/guest/sender/ here.. > > I couldn't quite confirm this issue by following all the possible > code paths in the kernel, yet. If confirmed, this should be fixed in > the kernel, but meanwhile this workaround looks robust to me (and it > will be needed for backward compatibility anyway). > > Reported-by: Matej Hrica > Link: https://bugs.passt.top/show_bug.cgi?id=74 > Analysed-by: David Gibson > Signed-off-by: Stefano Brivio > --- > tcp.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) > > diff --git a/tcp.c b/tcp.c > index 5528e05..4606f17 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1780,7 +1780,23 @@ static void tcp_clamp_window(const struct ctx *c, struct tcp_tap_conn *conn, > wnd <<= conn->ws_from_tap; > wnd = MIN(MAX_WINDOW, wnd); > > - if (conn->flags & WND_CLAMPED) { > + /* TODO: With (at least) Linux kernel versions 6.1 to 6.5, if we end up > + * with a zero-sized window on a TCP socket, dropping data (once > + * acknowledged by the guest) with recv() and MSG_TRUNC doesn't appear > + * to be enough to make the kernel advertise a non-zero window to the > + * receiver. Forcing a TCP_WINDOW_CLAMP setting, even with the existing .. but you need another s/receiver/sender/ here too. > + * value, fixes this. > + * > + * The STALLED flag on a connection is a sufficient indication that we > + * might have a zero-sized window on the socket, because it's set if we > + * exhausted the tap-side window, or if everything we receive from a > + * socket is already in flight to the guest. > + * > + * So, if STALLED is set, and we received a window value from the tap, > + * force a TCP_WINDOW_CLAMP setsockopt(). This should be investigated > + * further and fixed in the kernel instead, if confirmed. > + */ > + if (!(conn->flags & STALLED) && conn->flags & WND_CLAMPED) { > if (prev_scaled == wnd) > return; > > @@ -2409,12 +2425,12 @@ static int tcp_data_from_tap(struct ctx *c, struct tcp_tap_conn *conn, > i = keep - 1; > } > > - tcp_clamp_window(c, conn, max_ack_seq_wnd); > - > /* On socket flush failure, pretend there was no ACK, try again later */ > if (ack && !tcp_sock_consume(conn, max_ack_seq)) > tcp_update_seqack_from_tap(c, conn, max_ack_seq); > > + tcp_clamp_window(c, conn, max_ack_seq_wnd); > + > if (retr) { > trace("TCP: fast re-transmit, ACK: %u, previous sequence: %u", > max_ack_seq, conn->seq_to_tap); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson