On Fri, Jul 12, 2024 at 03:04:50PM -0400, Jon Maloy wrote:
> A bug in kernel TCP may lead to a deadlock where a zero window is sent
> from the guest peer, while it is unable to send out window updates even
> after socket reads have freed up enough buffer space to permit a larger
> window. In this situation, new window advertisements from the peer can
> only be triggered by data packets arriving from this side.
>
> However, currently such packets are never sent, because the zero-window
> condition prevents this side from sending out any packets whatsoever
> to the peer.
>
> We notice that the above bug is triggered *only* after the peer has
> dropped one or more arriving packets because of severe memory squeeze,
> and that we hence always enter a retransmission situation when this
> occurs. This also means that the implementation goes against the
> RFC-9293 recommendation that a previously advertised window never
> should shrink.
>
> RFC-9293 seems to permit that we can continue sending up to the right
> edge of the last advertised non-zero window in such situations, so that
> is what we do to resolve this situation.
>
> It turns out that this solution is extremely simple to implement in the
> code: We just omit to save the advertised zero-window when we see that
> it has shrunk, i.e., if the acknowledged sequence number in the
> advertisement message is lower than that of the last data byte sent
> from our side.
>
> When that is the case, the following happens:
> - The 'retr' flag in tcp_data_from_tap() will be 'false', so no
>   retransmission will occur at this occasion.
> - The data stream will soon reach the right edge of the previously
>   advertised window. In fact, in all observed cases we have seen that
>   it is already there when the zero-advertisement arrives.
> - At that moment, the flags STALLED and ACK_FROM_TAP_DUE will be set,
>   unless they already have been, meaning that only the next timer
>   expiration will open for data retransmission or transmission.
> - When that happens, the memory squeeze at the guest will normally have
>   abated, and the data flow can resume.
>
> It should be noted that although this solves the problem we have at
> hand, it is a work-around, and not a genuine solution to the described
> kernel bug.
>
> Suggested-by: Stefano Brivio
> Signed-off-by: Jon Maloy

I only half-understand the problem here, but the fix LGTM.

Reviewed-by: David Gibson

--
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
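
For anyone following the thread, the check described in the quoted commit
message could be sketched roughly as below. This is an illustration only,
not the actual patch: struct conn_sketch, its fields, seq_lt() and
update_wnd_sketch() are hypothetical names made up for this example, and
the real code tracks considerably more state. The only idea taken from the
commit message is that a zero window is not saved when the acknowledged
sequence number is lower than that of the last data byte sent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-connection state (not the real struct) */
struct conn_sketch {
	uint32_t seq_last_sent;	/* sequence number of last data byte sent */
	uint32_t wnd_from_peer;	/* last window we saved from the peer */
};

/* Wrap-safe "a is lower than b" for 32-bit TCP sequence numbers */
static bool seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Save the advertised window, unless it is a zero window that shrinks a
 * previously advertised one, that is, the peer acknowledges less than we
 * have already sent. Returns true if the window was saved.
 */
static bool update_wnd_sketch(struct conn_sketch *c, uint32_t ack_seq,
			      uint32_t wnd)
{
	if (wnd == 0 && seq_lt(ack_seq, c->seq_last_sent))
		return false;	/* keep the old right edge */

	c->wnd_from_peer = wnd;
	return true;
}
```

With the old window kept, the sender can still run up to the previously
advertised right edge, and the timer path takes over from there as the
commit message describes.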