From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Brivio
To: passt-dev@passt.top
Subject: Re: Weird timeout issue with passt
Date: Thu, 23 Jun 2022 11:32:12 +0200
Message-ID: <20220623113212.727162c6@elisabeth>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3572742779308525474=="

--===============3572742779308525474==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

On Thu, 23 Jun 2022 15:59:20 +1000
David Gibson wrote:

> I alluded to this in my last patchset, but here's the information I've
> gathered on the current problem I'm hitting running the passt distro
> tests. It's pretty weird.
>
> On at least two occasions the tests have stalled during the Fedora 30,
> aarch64 test, with the guest getting a timeout downloading the package
> lists before installing new packages. For me this is selecting the
> mirror mirror.2degrees.nz. I'm not sure which factors are
> relevant to reproducing the problem though.
>
> * The problem seems to me that the download suddenly stops progressing
>   partway through, causing dnf to eventually time out
>
> * If I manually try a "dnf clean all && dnf makecache -v" using the
>   same guest image, it doesn't fail every time, but it fails
>   significantly more often than not
>
> * It doesn't fail on the same file every time
>
> * I haven't been able to reproduce manually downloading the failing
>   file with curl (tried repeatedly)
>
> * If I restrict dnf to a single repository rather than the whole set,
>   I haven't managed to reproduce the problem

If I remember correctly, dnf downloads from multiple repositories at
the same time, which might explain these two points.

>
> * If I use qemu's -net user slirp instead of passt with the same disk
>   image, I haven't been able to reproduce the problem (tried a bunch
>   of times)
>
> * I've reproduced with the guest using both IPv4 and IPv6
>
> * I have reproduced what looks like the same problem with an x86 guest
>   image under KVM (also Fedora 30), but it seems to happen much less
>   often (seen once in 10 or more attempts)
>
> * Seems to reproduce fairly readily with an x86 guest under TCG
>   though, so I'm guessing the difference is timing related.

...hmm, I never hit this, and I guess our versions of qemu eventually
crossed at some point -- I'm using 7.0.50 (v7.0.0-937-gd6900f445e)
right now.

Passing a capture file via --pcap for that instance of passt (started
at the beginning of fedora/tests) might help shine some light on this.
You could also run ./test with PCAP=1, but that would capture
everything, which will take a ton of space.

-- 
Stefano

--===============3572742779308525474==--