On Wed, Sep 10, 2025 at 08:37:48AM +0200, Stefano Brivio wrote: > On Wed, 10 Sep 2025 12:20:19 +1000 > David Gibson wrote: > > > On Tue, Sep 09, 2025 at 08:16:50PM +0200, Stefano Brivio wrote: > > > A window shrunk to zero means by definition that anything else that > > > might be in flight is now out of window. Restart from the currently > > > acknowledged sequence. > > > > > > We need to do that both in tcp_tap_window_update(), where we already > > > check for zero-window updates, as well as in tcp_data_from_tap(), > > > because we might get one of those updates in a batch of packets that > > > also contains a non-zero window update. > > > > > > Suggested-by: Jon Maloy > > > Signed-off-by: Stefano Brivio > > > > Reviewed-by: David Gibson > > > > Though a couple of documentation nits below. > > > > > --- > > > tcp.c | 34 +++++++++++++++++++++++++--------- > > > 1 file changed, 25 insertions(+), 9 deletions(-) > > > > > > diff --git a/tcp.c b/tcp.c > > > index 86e08f1..12d42e0 100644 > > > --- a/tcp.c > > > +++ b/tcp.c > > > @@ -1268,19 +1268,25 @@ static void tcp_get_tap_ws(struct tcp_tap_conn *conn, > > > > > > /** > > > * tcp_tap_window_update() - Process an updated window from tap side > > > + * @c: Execution context > > > * @conn: Connection pointer > > > * @wnd: Window value, host order, unscaled > > > */ > > > -static void tcp_tap_window_update(struct tcp_tap_conn *conn, unsigned wnd) > > > +static void tcp_tap_window_update(const struct ctx *c, > > > + struct tcp_tap_conn *conn, unsigned wnd) > > > { > > > wnd = MIN(MAX_WINDOW, wnd << conn->ws_from_tap); > > > > > > /* Work-around for bug introduced in peer kernel code, commit > > > - * e2142825c120 ("net: tcp: send zero-window ACK when no memory"). > > > - * We don't update if window shrank to zero. > > > + * e2142825c120 ("net: tcp: send zero-window ACK when no memory"): don't > > > + * update the window if it shrank to zero, so that we'll eventually > > > + * retry to send data, but rewind the sequence as that obviously implies > > > + * that no data beyond the updated window will ever be acknowledged. > > > > As noted earlier "will ever be acknowledged" might be a bit > > misleading. Maybe "no data beyond the window will be acknowledged > > until it is retransmitted". > > Sorry, I actually fixed the patch after your same comment on v3 and > then for some reason I threw away the changes together with some of > your Reviewed-by: tags. Fixed as we discussed on v3 now. Ah, right. It's happened to all of us. > > > > */ > > > - if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) > > > + if (!wnd && SEQ_LT(conn->seq_ack_from_tap, conn->seq_to_tap)) { > > > + tcp_rewind_seq(c, conn); > > > return; > > > + } > > > > > > conn->wnd_from_tap = MIN(wnd >> conn->ws_from_tap, USHRT_MAX); > > > > > > @@ -1709,7 +1715,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > tcp_timer_ctl(c, conn); > > > > > > if (p->count == 1) { > > > - tcp_tap_window_update(conn, ntohs(th->window)); > > > + tcp_tap_window_update(c, conn, > > > + ntohs(th->window)); > > > return 1; > > > } > > > > > > @@ -1728,6 +1735,15 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > > > ack_seq == max_ack_seq && > > > ntohs(th->window) == max_ack_seq_wnd; > > > > > > + /* See tcp_tap_window_update() for details. On > > > + * top of that, we also need to check here if a > > > + * zero-window update is contained in a batch of > > > + * packets that includes a non-zero window as > > > + * well. > > > > I'm not 100% convinced of this reasoning. But at worst this should > > result in some unnecessary but mostly harmless retransmits, and it > > seems to fix the problem empirically, so I'm not suggesting changing > > it at this time. > > Right, strictly speaking, it *might* be that the peer didn't actually > throw data away, which should be indicated by the fact that the same > data is acknowledged in another segment of this same batch. > > But handling that without making this part unreasonably complicated > implies a refactor of this whole thing: we should extract a function > dealing with sequences and perhaps another one with window updates from > here. > > I plan to file a ticket or two with stuff that emerged from this series. Ok, sounds good. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson