From: Stefano Brivio <sbrivio@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jmaloy@redhat.com>,
Neal Cardwell <ncardwell@google.com>,
netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com,
eric.dumazet@gmail.com, Menglong Dong <menglong8.dong@gmail.com>
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
Date: Mon, 27 Jan 2025 11:27:12 +0100
Message-ID: <20250127112712.50bb6341@elisabeth>
In-Reply-To: <CANn89iJ4u5QBfhc1LC6ipmmmiEG0bCWhRG1obm3=05A_BsPt4w@mail.gmail.com>

On Mon, 27 Jan 2025 11:06:07 +0100
Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Jan 27, 2025 at 11:01 AM Stefano Brivio <sbrivio@redhat.com> wrote:
> >
> > On Fri, 24 Jan 2025 12:40:16 -0500
> > Jon Maloy <jmaloy@redhat.com> wrote:
> >
> > > I can certainly clear tp->pred_flags and post it again, maybe with
> > > an improved and shortened log. Would that be acceptable?
> >
> > Talking about an improved log, what strikes me the most of the whole
> > problem is:
> >
> > $ tshark -r iperf3_jon_zero_window.pcap -td -Y 'frame.number in { 1064 .. 1068 }'
> > 1064 0.004416 192.168.122.1 → 192.168.122.198 TCP 65534 34482 → 5201 [ACK] Seq=1611679466 Ack=1 Win=36864 Len=65480
> > 1065 0.007334 192.168.122.1 → 192.168.122.198 TCP 65534 34482 → 5201 [ACK] Seq=1611744946 Ack=1 Win=36864 Len=65480
> > 1066 0.005104 192.168.122.1 → 192.168.122.198 TCP 56382 [TCP Window Full] 34482 → 5201 [ACK] Seq=1611810426 Ack=1 Win=36864 Len=56328
> > 1067 0.015226 192.168.122.198 → 192.168.122.1 TCP 54 [TCP ZeroWindow] 5201 → 34482 [ACK] Seq=1 Ack=1611090146 Win=0 Len=0
> > 1068 6.298138 fe80::44b3:f5ff:fe86:c529 → ff02::2 ICMPv6 70 Router Solicitation from 46:b3:f5:86:c5:29
> >
> > ...and then the silence, 192.168.122.198 never announces that its
> > window is not zero, so the peer gives up 15 seconds later:
> >
> > $ tshark -r iperf3_jon_zero_window_cut.pcap -td -Y 'frame.number in { 1069 .. 1070 }'
> > 1069 8.709313 192.168.122.1 → 192.168.122.198 TCP 55 34466 → 5201 [ACK] Seq=166 Ack=5 Win=36864 Len=1
> > 1070 0.008943 192.168.122.198 → 192.168.122.1 TCP 54 5201 → 34482 [FIN, ACK] Seq=1 Ack=1611090146 Win=778240 Len=0
> >
> > Data in frame #1069 is iperf3 ending the test.
> >
> > This didn't happen before e2142825c120 ("net: tcp: send zero-window
> > ACK when no memory") so it's a relatively recent (17 months) regression.
> >
> > It actually looks pretty simple (and rather serious) to me.
>
> With all that, it should be pretty easy to cook a packetdrill test, right ?

Not really :( because to reproduce this exact condition you need to
somehow get the right amount of memory pressure so that you can
actually establish a connection, start the transfer, and then exhaust
the receive buffer at the right moment.

And packetdrill doesn't do that. Sure, it would be great if it did, and
it's probably a nice feature to implement... given enough time. Given
less time, I guess fixing regressions has a higher priority.

One could perhaps tweak sk->sk_rcvbuf as you suggested but that just
artificially reproduces one part of it. It's not really a fitting test.
For example: when would you increase it back?
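
As a rough illustration of that artificial approach, here is a minimal
userspace sketch, assuming a loopback TCP connection on Linux. The
function name, buffer sizes and timeouts are made up for the example.
It shrinks the receive buffer with SO_RCVBUF (which caps sk->sk_rcvbuf,
subject to the kernel's doubling and minimum), leaves the receiver idle
until the sender stalls, then drains the receiver and checks that the
sender can make progress again. No real memory pressure is involved, so
this only mimics the "window closes, window reopens" part:

```python
import select
import socket
import time


def zero_window_roundtrip(rcvbuf=4096, sndbuf=4096, timeout=10.0):
    """Stall a sender against an idle receiver, then drain and re-check.

    Returns (bytes_sent_before_stall, resumed). All names and sizes here
    are illustrative assumptions, not taken from the patch or the thread.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before listen() so the accepted socket inherits the small buffer.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Keep the send buffer small too, so the stall is reached quickly.
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    sender.setblocking(False)
    receiver.setblocking(False)

    try:
        # Phase 1: the receiver never reads. Keep sending until the
        # sender's socket stays unwritable for half a second.
        sent = 0
        while select.select([], [sender], [], 0.5)[1]:
            try:
                sent += sender.send(b"x" * 1024)
            except BlockingIOError:
                break

        # Phase 2: drain everything that was queued; reading frees
        # receive buffer space, so the window can open again.
        received = 0
        deadline = time.monotonic() + timeout
        while received < sent and time.monotonic() < deadline:
            if select.select([receiver], [], [], 0.5)[0]:
                received += len(receiver.recv(65536))

        # Phase 3: the sender should become writable again.
        writable = bool(select.select([], [sender], [], timeout)[1])
        return sent, (received == sent and writable)
    finally:
        sender.close()
        receiver.close()
```

Calling zero_window_roundtrip() returns the number of bytes queued
before the stall and whether the sender recovered after the drain. It
says nothing about the regression itself, which needs the receiver to
run out of memory at the right moment.
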
> packetdrill tests are part of tools/testing/selftests/net/ already, we
> are not asking for something unreasonable.

I would agree, in general, except that I don't see a way to craft a
test like this with packetdrill. At least not trivially with the
current feature set.

On top of that, this is not a new feature, it's a fix for a regression
(that was introduced without adding any test, of course). And the fix
itself was definitely tested, just not with packetdrill.

Requesting that tests be 1. automated and 2. written with a specific
tool is something I can quite understand for general convenience, but
I don't think it always makes sense.

Especially as this fix has been blocked for about 9 months now because
automating a test for it is quite hard.

--
Stefano