On Thu, Jan 16, 2025 at 09:32:49PM +0100, Stefano Brivio wrote:
> Before SO_PEEK_OFF support was introduced by commit e63d281871ef
> ("tcp: leverage support of SO_PEEK_OFF socket option when available"),
> we would peek data from sockets using a "discard" buffer as first
> iovec element, so that, unless we had no pending data at all, we would
> always get a positive return code from recvmsg() (except for closing
> connections or errors).
>
> If we couldn't send more data to the guest, in the window, we would
> set the STALLED flag (causing the epoll descriptor to switch to
> edge-triggered mode), and return early from tcp_data_from_sock().
>
> With SO_PEEK_OFF, we don't have a discard buffer, and if there's data
> on the socket, but nothing beyond our current peeking offset, we'll
> get EAGAIN instead of our current "discard" length. In that case, we
> return even earlier, and we don't set EPOLLET on the socket as a
> result.
>
> As reported by Asahi Lina, this causes event loops where the kernel is
> signalling socket readiness, because there's data we didn't dequeue
> yet (waiting for the guest to acknowledge it), but we won't actually
> peek anything new, and return early without setting EPOLLET.
>
> This is the original report, mentioning the originally proposed fix:

Reviewed-by: David Gibson

-- 
David Gibson (he or they)      | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
                               | around.
http://www.ozlabs.org/~dgibson