From: Stefano Brivio <sbrivio@redhat.com>
To: Jon Maloy <jmaloy@redhat.com>
Cc: passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com
Subject: Re: [PATCH v5 3/3] tcp: allow retransmit when peer receive window is zero
Date: Fri, 17 May 2024 14:15:19 +0200 [thread overview]
Message-ID: <20240517141519.499f362f@elisabeth> (raw)
In-Reply-To: <bd793a8a-02aa-47f0-17b7-d1ab188902b4@redhat.com>
On Thu, 16 May 2024 22:39:53 -0400
Jon Maloy <jmaloy@redhat.com> wrote:
> On 2024-05-16 18:21, Stefano Brivio wrote:
> > On Thu, 16 May 2024 17:33:28 -0400
> > Jon Maloy <jmaloy@redhat.com> wrote:
> >
> >> A bug in kernel TCP may lead to a deadlock where a zero window is sent
> >> from the peer, while it is unable to send out window updates even after
> >> reads have freed up enough buffer space to permit a larger window.
> >> In this situation, new window advertisemnts from the peer can only be
> >> triggered by packets arriving from this side.
> >>
> >> However, such packets are never sent, because the zero-window condition
> >> currently prevents this side from sending out any packets whatsoever
> >> to the peer.
> >>
> >> We notice that the above bug is triggered *only* after the peer has
> >> dropped an arriving packet because of severe memory squeeze, and that we
> >> hence always enter a retransmission situation when this occurs. This
> >> also means that it goes against the RFC 9293 recommendation that a
> >> previously advertised window never should shrink.
> >>
> >> RFC 9293 gives the solution to this situation. In chapter 3.6.1 we find
> >> the following statement:
> >> "A TCP receiver SHOULD NOT shrink the window, i.e., move the right
> >> window edge to the left (SHLD-14). However, a sending TCP peer MUST
> >> be robust against window shrinking, which may cause the
> >> "usable window" (see Section 3.8.6.2.1) to become negative (MUST-34).
> >>
> >> If this happens, the sender SHOULD NOT send new data (SHLD-15), but
> >> SHOULD retransmit normally the old unacknowledged data between SND.UNA
> >> and SND.UNA+SND.WND (SHLD-16). The sender MAY also retransmit old data
> >> beyond SND.UNA+SND.WND (MAY-7)"
> >>
> >> We never see the window become negative, but we interpret this as a
> >> recommendation to use the previously available window during
> >> retransmission even when the currently advertised window is zero.
> >>
> >> We use the above mechanism only for timer-induced retransmits, while
> >> the fast-retransmit mechanism won't trigger on this condition.
> >>
> >> It should be noted that although this solves the problem we have at
> >> hand, it is not a genuine solution to the kernel bug. There may well
> >> be TCP stacks around in other OS-es which don't do this, nor have
> >> keep-alive probing as an alternatve way to solve the situation.
> >>
> >> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> >>
> >> ---
> >> v2: - Using previously advertised window during retransmission, instead
> >> highest send sequencece number in the cycle.
> >> v3: - Rebased to newest code
> >> - Changes based on feedback from PASST team
> >> - Sending out empty probe message at timer expiration when
> >> we are not in retransmit situation.
> >> v4: - Some small changes based on feedback from PASST team.
> >> - Replaced fast retransmit with a one-time 'fast probe' when
> >> window is zero.
> >> v5: - Gave up on 'fast probing' for now. When I got the sequence
> >> numbers right in the flag message (after having emptied the tap
> >> queue), it turns out an empty message does *not* force a new peer
> >> window update as was my previous understanding of the code.
> >> - Added cppcheck suppression line (which I was unable to verify)
> > ...hmm, why? All it takes is a 'make cppcheck'. Or is your version of
> > cppcheck not showing it?
> Exactly. My version is too recent, I think.
> >
> >> as suggested by S. Brivio.
> >> - Removed sending of empty probe when window from tap side is zero.
> >> It looks pointless at the moment, at least for solving the above
> >> described situation.
> > So, as far as I understand, this patch now doesn't introduce any
> > functional change, except that if an existing timer happened to be
> > pending (this patch never schedules it),
> > and the window _we_ are
> > advertising (wnd_to_tap) is zero (I'm not sure why that's relevant),
> > then we'll send an ACK segment.
> The change is that we really will do a retransmit from the timer handler,
> ignoring that current window is zero.
...but how does the window *we* advertise matter here? You're checking
conn->wnd_to_tap now, not conn->wnd_from_tap. Anyway, if you're now
abandoning this approach, it doesn't matter.
> This solves the deadlock problem
> I have been describing, but it is not a good solution.
>
> One problem is that new incoming data will cause an EPOLLIN event,
> which will cause a call to tcp_data_from_sock(), which will also ignore
> the zero-window.
Right now it doesn't ignore it. Do you plan to use that function to
check if the usable window is now negative (because the window value is
zero but we have unacknowledged data), and re-send something in that
case?
> This is easy to fix, I realize:
>
> if (events & EPOLLIN && conn->wnd_from_tap)
> tcp_data_from_sock(c, conn);
>
> The next is that I would like to start a short timer when I receive
> the zero-window and seq_to_tap > seq_ack_from_tap (same condition
> as in previous versions, but without sending the flag packet).
> As far as I can see there is no simple way to schedule the timer to
> the value I want. I am open to suggestions here.
You could reuse/abuse ACK_TO_TAP_DUE (after all, you want to send a
keep-alive in a while), and schedule a different timeout in
tcp_timer_ctl() if the peer window is zero.
Then, in the timer handler, depending on the peer window, you would
check if you want to use ACK_IF_NEEDED (no keepalive), or ACK (force a
keepalive in any case).
--
Stefano
next parent reply other threads:[~2024-05-17 12:15 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240516213328.1049176-1-jmaloy@redhat.com>
[not found] ` <20240516213328.1049176-4-jmaloy@redhat.com>
[not found] ` <20240517002110.1c957b6d@elisabeth>
[not found] ` <bd793a8a-02aa-47f0-17b7-d1ab188902b4@redhat.com>
2024-05-17 12:15 ` Stefano Brivio [this message]
2024-05-17 14:13 ` [PATCH v5 3/3] tcp: allow retransmit when peer receive window is zero Jon Maloy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240517141519.499f362f@elisabeth \
--to=sbrivio@redhat.com \
--cc=dgibson@redhat.com \
--cc=jmaloy@redhat.com \
--cc=lvivier@redhat.com \
--cc=passt-dev@passt.top \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://passt.top/passt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for IMAP folder(s).