On Wed, Aug 07, 2024 at 03:06:44PM +0200, Stefano Brivio wrote:
> On Wed, 7 Aug 2024 20:51:08 +1000
> David Gibson wrote:
> 
> > On Wed, Aug 07, 2024 at 12:11:26AM +0200, Stefano Brivio wrote:
> > > On Mon, 5 Aug 2024 22:36:45 +1000
> > > David Gibson wrote:
> > > 
> > > > Add a new test script to run the equivalent of the tests in build/all
> > > > using exeter and Avocado. This new version of the tests is more robust
> > > > than the original, since it makes a temporary copy of the source tree so
> > > > will not be affected by concurrent manual builds.
> > > 
> > > I think this is much more readable than the previous Python attempt.
> > 
> > That's encouraging.
> > 
> > > On the other hand, I guess it's not an ideal candidate for a fair
> > > comparison because this is exactly the kind of stuff where shell
> > > scripting shines: it's a simple test that needs a few basic shell
> > > commands.
> > 
> > Right.
> > 
> > > On that subject, the shell test is about half the lines of code (just
> > > skipping headers, it's 48 lines instead of 90... and yes, this version
> > 
> > Even ignoring the fact that this case is particularly suited to shell,
> > I don't think that's really an accurate comparison, but getting to one
> > is pretty hard.
> > 
> > The existing test isn't 48 lines of shell, but of "passt test DSL".
> > There are several hundred additional lines of shell to interpret that.
> 
> Yeah, but the 48 lines is all I have to look at, which is what matters
> I would argue. That's exactly why I wrote that interpreter.
> 
> Here, it's 90 lines of *test file*.

Fair point. Fwiw, it's down to 77 so far for my next draft.

> > Now obviously we don't need all of that for just this test. Likewise
> > the new Python test needs at least exeter - that's only a couple of
> > hundred lines - but also Avocado (huge, but only a small amount is
> > really relevant here).
> > 
> > > now uses a copy of the source code, but that would be two lines).
> > 
> > I feel like it would be a bit more than two lines, to copy exactly
> > what you want, and to clean up after yourself.
> 
> host mkdir __STATEDIR__/sources
> host cp --parents $(git ls-files) __STATEDIR__/sources
> 
> ...which is actually an improvement on the original as __STATEDIR__ can
> be handled in a centralised way, if one wants to keep that after the
> single test case, after the whole test run, or not at all.

Huh, I didn't know about cp --parents, which does exactly what's needed.
In the Python library there are, alas, several things that do almost but
not quite what's needed. I guess I could just invoke 'cp --parents'
myself.

> > > In terms of time overhead, dropping delays to make the display capture
> > > nice (a feature that we would anyway lose with exeter plus Avocado, if
> > > I understood correctly):
> > 
> > Yes. Unlike you, I'm really not convinced of the value of the display
> > capture versus log files, at least in the majority of cases.
> 
> Well, but I use that...
> 
> By the way, openQA nowadays takes periodic screenshots. That's certainly
> not as useful, but I'm indeed not the only one who benefits from
> _seeing_ tests as they run instead of correlating log files from
> different contexts, especially when you have a client, a server, and
> what you're testing in between.

If you have to correlate multiple logs that's a pain, yes. My approach
here is, as much as possible, to have a single "log" (actually stdout &
stderr) from the top level test logic, so the logical ordering is kind
of built in.
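Going back to the source tree copy above: invoking 'cp --parents' from
Python could look roughly like the sketch below. This is only
illustrative (the function name and prefix are made up, not the actual
test code), but it shows the same trick as the two DSL lines, restricted
to what git tracks.

    import subprocess
    import tempfile

    def copy_source_tree():
        # Scratch directory for this run; cleanup is left to whatever
        # plays the role of __STATEDIR__ handling in the real tests.
        destdir = tempfile.mkdtemp(prefix='passt-sources-')
        # Only copy the files git tracks, preserving relative paths,
        # just like 'cp --parents $(git ls-files)' above.
        files = subprocess.check_output(['git', 'ls-files'],
                                        text=True).split('\n')
        files = [f for f in files if f]
        subprocess.check_call(['cp', '--parents'] + files + [destdir])
        return destdir

Building in that scratch copy keeps the test from being affected by
concurrent manual builds in the real source tree, same as the shell
version.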
> > I certainly don't think it's worth slowing down the test running in the
> > normal case.
> 
> It doesn't significantly slow things down,

It does if you explicitly add delays to make the display capture nice
as mentioned above.

> but it certainly makes it
> more complicated to run test cases in parallel... which you can't do
> anyway for throughput and latency tests (which take 22 out of the 37
> minutes of a current CI run), unless you set up VMs with CPU pinning and
> cgroups, or a server farm.

So, yes, the perf tests take the majority of the runtime for CI, but I'm
less concerned about runtime for CI tests. I'm more interested in
runtime for a subset of functional tests you can run repeatedly while
developing. I routinely disable the perf and other slow tests, to get a
subset taking 5-7 minutes. That's ok, but I'm pretty confident I can get
better coverage in significantly less time using parallel tests.

> I mean, I see the value of running things in parallel in a general
> case, but I don't think you should just ignore everything else.
> 
> > > $ time (make clean; make passt; make clean; make pasta; make clean; make qrap; make clean; make; d=$(mktemp -d); prefix=$d make install; prefix=$d make uninstall; )
> > > [...]
> > > real	0m17.449s
> > > user	0m15.616s
> > > sys	0m2.136s
> > 
> > On my system:
> > [...]
> > real	0m20.325s
> > user	0m15.595s
> > sys	0m5.287s
> > 
> > > compared to:
> > > 
> > > $ time ./run
> > > [...]
> > > real	0m18.217s
> > > user	0m0.010s
> > > sys	0m0.001s
> > > 
> > > ...which I would call essentially no overhead. I didn't try out this
> > > version yet, I suspect it would be somewhere in between.
> > 
> > Well..
> > 
> > $ time PYTHONPATH=test/exeter/py3 test/venv/bin/avocado run test/build/build.json
> > [...]
> > RESULTS : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> > JOB TIME : 10.85 s
> > 
> > real	0m11.000s
> > user	0m23.439s
> > sys	0m7.315s
> > 
> > Because parallel. It looks like the avocado start up time is
> > reasonably substantial too, so that should look better with a larger
> > set of tests.
> 
> With the current set of tests, I doubt it's ever going to pay off. Even
> if you run the non-perf tests in 10% of the time, it's going to be 24
> minutes instead of 37.

Including the perf tests, probably not. Excluding them (which is
extremely useful when actively coding) I think it will.

> I guess it will start making sense with larger matrices of network
> environments, or with more test cases (but really a lot of them).

We could certainly do with a lot more tests, though I expect it will
take a while to get them.

-- 
David Gibson (he or they)		| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson