

Re: [lmi] Robust timings in unit tests

From: Vadim Zeitlin
Subject: Re: [lmi] Robust timings in unit tests
Date: Thu, 11 May 2017 18:45:54 +0200

On Thu, 11 May 2017 14:57:18 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2017-05-07 22:14, Vadim Zeitlin wrote:
GC> > On Sun, 7 May 2017 17:49:29 +0000 Greg Chicares <address@hidden> wrote:
GC> [...]
GC> > GC> An argument might be made for reporting the lowest measurement rather
GC> > GC> than the mean.
GC> > 
GC> >  This seems a good argument to me and this is exactly what I do when
GC> > measuring CPU-bound code.
GC> Commit 1a629bf changed AliquotTimer so that it reports the minimum.

 Thanks! Is it expected that I (reproducibly) get the warning below in this
test's output now:

  5.268e-08 s mean;          0 us least of 189833 runs
  7.143e-06 s mean;          7 us least of 140 runs
  7.088e-05 s mean;         70 us least of 100 runs
  1.000e-02 s: first trial took longer than 1.000e-02 s desired limit
  1.000e-02 s mean;      10001 us least of  10 runs
  1.000e-02 s mean;      10001 us least of  11 runs
  1.000e+00 s mean;    1000065 us least of   2 runs
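For reference, the mean-versus-minimum idea can be sketched as follows. This is not lmi's AliquotTimer, only a minimal C++ sketch under my own assumptions: time_repeatedly() and timing_summary are hypothetical names, and the stopping rule (always do at least one run, then stop once the accumulated time reaches the limit) is a guess at the general approach rather than what commit 1a629bf actually does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <limits>

// Hypothetical result type, not lmi's AliquotTimer interface.
struct timing_summary
{
    double mean_seconds;
    double least_seconds;
    int    runs;
};

// Call f() repeatedly until the time spent inside f() reaches max_seconds
// (or max_runs is hit), always doing at least one run. Noise such as
// interrupts and context switches mostly adds time, so for CPU-bound code
// the least duration is usually more repeatable than the mean.
template<typename F>
timing_summary time_repeatedly(F f, double max_seconds, int max_runs = 1000000)
{
    using clock = std::chrono::steady_clock;
    assert(0.0 < max_seconds && 0 < max_runs);
    double total = 0.0;
    double least = std::numeric_limits<double>::infinity();
    int runs = 0;
    while(runs < max_runs && (0 == runs || total < max_seconds))
    {
        auto const t0 = clock::now();
        f();
        auto const t1 = clock::now();
        double const elapsed = std::chrono::duration<double>(t1 - t0).count();
        if(0 == runs && max_seconds < elapsed)
        {
            std::printf
                ("%.3e s: first trial took longer than %.3e s desired limit\n"
                ,elapsed
                ,max_seconds
                );
        }
        total += elapsed;
        if(elapsed < least)
        {
            least = elapsed;
        }
        ++runs;
    }
    return {total / runs, least, runs};
}
```

In this sketch, a function that sleeps for exactly the limit will normally trip the warning, because std::this_thread::sleep_for() blocks for at least the requested duration and the clock calls add a little more; whether that is what happens in the output above depends on AliquotTimer's actual logic.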


GC> I was especially reluctant to give up the "Third" illusion above for
GC> 'expression_template_0_test', because I had once set great store by its
GC> results, though that was probably with gcc-3.0 on a one-core 3.5 GHz
GC> CPU...because now there's no difference to measure on x86_64-linux-gnu
GC> except for the "STL" methods that are known to be awful:
GC>   Speed tests: array length 1000
GC>   C               : 9.877e-07 s mean;          0 us least of 10125 runs
GC>   STL plain       : 4.006e-06 s mean;          3 us least of 2496 runs
GC>   STL fancy       : 1.462e-06 s mean;          1 us least of 6840 runs
GC>   valarray        : 9.952e-07 s mean;          0 us least of 10048 runs
GC>   uBLAS           : 1.020e-06 s mean;          0 us least of 9801 runs
GC>   PETE            : 8.727e-07 s mean;          0 us least of 11459 runs

 At least PETE is still the best, even if only by a rather small margin.

GC> ...and the corresponding results for array lengths of {1, 10, 100} just
GC> look ridiculous now: they're all zero.
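The all-zero figures come from reporting the least time in whole microseconds for calls that take well under one. One way around that is batching: time `reps` consecutive calls as a single sample, divide by `reps`, and keep the least per-call time across samples, reported in nanoseconds. A minimal sketch, assuming a hypothetical least_ns_per_call() helper that is not part of lmi:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper: estimate the per-call time of a very short function
// by timing batches of `reps` consecutive calls, so that each measured
// interval is long compared with the clock's resolution. Returns the least
// per-call time, in nanoseconds, over `samples` batches.
template<typename F>
double least_ns_per_call(F f, std::size_t reps, std::size_t samples)
{
    using clock = std::chrono::steady_clock;
    assert(0 < reps && 0 < samples);
    double least = std::numeric_limits<double>::infinity();
    for(std::size_t s = 0; s < samples; ++s)
    {
        auto const t0 = clock::now();
        for(std::size_t r = 0; r < reps; ++r)
        {
            f();
        }
        auto const t1 = clock::now();
        double const ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        least = std::min(least, ns / static_cast<double>(reps));
    }
    return least;
}
```

Loop overhead is included in each sample, and the compiler may drop calls whose results are never used, so the function under test should write to something observable, such as a volatile variable.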

 I've just seen your next email saying that you're going to change this by
running the tests many times to obtain more meaningful results, so I won't
suggest that myself. But I'll still mention that, now that the tests can
be run under Linux, it is also possible to use "perf" to measure much more
than just the elapsed time. It works even for code that executes for a
very short time, because perf uses hardware CPU counters, which let you
see, for example, the number of instructions retired per cycle: arguably
the single most important number for estimating code performance. And
perf is a pretty good classic sampling profiler too: its default sampling
frequency is ~50kHz, but I can increase it to at least twice that on an
otherwise idle system, so you can collect samples at 10μs intervals.
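Hardware counters can also be read from inside a test program through the perf_event_open() system call. Here is a Linux-only sketch that counts user-space instructions and cycles around one call and derives instructions per cycle. The names count_instructions_and_cycles() and counter_readings are hypothetical; perf_event_open(), the perf_event_attr fields and the PERF_EVENT_IOC_* ioctls are the documented kernel interface. It returns an empty optional, without calling f(), when the counters can't be opened, e.g. in a VM that doesn't expose hardware counters or when kernel.perf_event_paranoid forbids it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical result type: raw counts for one measured region.
struct counter_readings
{
    std::uint64_t instructions;
    std::uint64_t cycles;
    // Instructions retired per cycle; 0 if no cycles were counted.
    double ipc() const
    {
        return cycles
            ? static_cast<double>(instructions) / static_cast<double>(cycles)
            : 0.0;
    }
};

// Open one user-space-only hardware counter for the calling thread on any
// CPU. glibc provides no perf_event_open() wrapper, hence syscall().
// The group leader (group_fd == -1) starts disabled; members follow it.
static int open_hw_counter(std::uint64_t config, int group_fd)
{
    perf_event_attr attr{};
    attr.type           = PERF_TYPE_HARDWARE;
    attr.size           = sizeof attr;
    attr.config         = config;
    attr.disabled       = (-1 == group_fd) ? 1 : 0;
    attr.exclude_kernel = 1;
    attr.exclude_hv     = 1;
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0UL));
}

// Count instructions and cycles while f() runs. Returns std::nullopt,
// without calling f(), if the counters can't be opened.
template<typename F>
std::optional<counter_readings> count_instructions_and_cycles(F f)
{
    int const leader = open_hw_counter(PERF_COUNT_HW_INSTRUCTIONS, -1);
    int const member = (0 <= leader)
        ? open_hw_counter(PERF_COUNT_HW_CPU_CYCLES, leader)
        : -1;
    std::optional<counter_readings> result;
    if(0 <= member)
    {
        // Reset, enable and disable both counters at once via the leader.
        ioctl(leader, PERF_EVENT_IOC_RESET , PERF_IOC_FLAG_GROUP);
        ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
        f();
        ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);
        std::uint64_t insn = 0;
        std::uint64_t cyc  = 0;
        auto const want = static_cast<ssize_t>(sizeof(std::uint64_t));
        if(want == read(leader, &insn, sizeof insn) && want == read(member, &cyc, sizeof cyc))
        {
            result = counter_readings{insn, cyc};
        }
    }
    if(0 <= member) close(member);
    if(0 <= leader) close(leader);
    return result;
}
```

Grouping the two counters means the kernel schedules them together, so the instruction and cycle counts cover the same interval even if counters are being multiplexed.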

 Unfortunately it's not very convenient to use the perf tool with the
current lmi performance tests, as they typically try several strategies
during the same run and so all the results are mixed up; I usually comment
out all parts of the test except the one I'm interested in. A much nicer,
but also more involved, way to do it would be to use the perf API
(perf_event_open() etc.) to measure the statistics of each region of
interest directly from the test itself. Perhaps a more realistic
compromise, i.e. something requiring some, but not too much, work while
still allowing the performance of each code version to be measured
independently, would be to add command-line options for selecting the
tests to run, e.g. "./expression_template_0_test -a --run=PETE".
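Such a --run option could be as simple as the following sketch. Everything in it is hypothetical (the named_test type, requested_tests() and run_selected() don't exist in lmi): it collects every "--run=NAME" argument, leaves other options such as "-a" to the existing handling, and runs all tests when no --run option is given.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a named speed test, e.g. {"PETE", some_timing_function}.
using named_test = std::pair<std::string, std::function<void()>>;

// Collect the value of every "--run=NAME" argument; other arguments are
// ignored here and left for the program's existing option handling.
std::vector<std::string> requested_tests(int argc, char const* const* argv)
{
    std::vector<std::string> names;
    char const prefix[] = "--run=";
    std::size_t const n = sizeof prefix - 1;
    for(int i = 1; i < argc; ++i)
    {
        if(0 == std::strncmp(argv[i], prefix, n))
        {
            names.emplace_back(argv[i] + n);
        }
    }
    return names;
}

// Run the tests whose names were requested, or all of them if nothing was
// requested. Returns the number of tests actually run.
int run_selected
    (std::vector<named_test> const& tests
    ,std::vector<std::string> const& wanted
    )
{
    int count = 0;
    for(auto const& t : tests)
    {
        bool const selected =
               wanted.empty()
            || wanted.end() != std::find(wanted.begin(), wanted.end(), t.first)
            ;
        if(selected)
        {
            t.second();
            ++count;
        }
    }
    return count;
}
```

In main() this would be used as run_selected(tests, requested_tests(argc, argv)), with one entry in `tests` per strategy ("C", "STL plain", ..., "PETE").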

