Re: our benchmark-suite
Tue, 24 Apr 2012 10:26:03 +0200
I pushed a change to the format of the text logged to the console when
you run ./benchmark-guile. It seems that this affected your
benchmarking bot. I was hoping that this would not be the case, because
the benchmark suite also writes a log to `guile-benchmark.log', and I
tried to avoid changing the format of that file.
Can you take a look at your bot and see if it's possible to switch to
using guile-benchmark.log instead of the console output?
Other suggestions as to a solution are also most welcome.
On Mon 23 Apr 2012 11:22, Andy Wingo <address@hidden> writes:
> I was going to try to optimize vhash-assoc, but I wanted a good
> benchmark first, so I started to look at our benchmark suite. We have
> some issues to deal with.
> For those of you who are not familiar with the benchmark suite, we have
> a bunch of benchmarks in benchmark-suite/benchmarks/: those files that
> end in ".bm". The format of a .bm file is like our .test files, except
> that instead of `pass-if' and the like, we have `benchmark'. You run
> benchmarks via ./benchmark-guile in the $top_builddir.
> The benchmarking framework tries to be appropriate for microbenchmarks,
> as the `benchmark' form includes a suggested number of iterations.
> Ideally when you create a benchmark, you give it a number of iterations
> that makes it run approximately as long as the other benchmarks.
> When the benchmarking suite was first made, 10 years ago, there was an
> empty "reference" benchmark that was created to run for approximately 1
> second. Currently it runs in 0.012 seconds. This is one problem: the
> overall suite has old iteration counts. There is a facility for scaling
> the iteration counts of the suite as a whole, but it is unused.
> Another problem is that the actual runtime of the various benchmarks
> varies quite a lot, from 3.3 seconds for assoc (srfi-1), to 0.012 for
> Short runtimes magnify imprecisions in measurement. It used to be that
> the measurement function was "times", but I just changed that to the
> higher-precision get-internal-real-time / get-internal-run-time. Still,
> though, there is nothing you can do for a benchmark that runs in a few
> milliseconds or less.
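To put a rough number on the point about short runtimes, here is a small
Python sketch (not part of the benchmark suite). The two runtimes are the
ones quoted above; the tick sizes are assumptions for illustration only:
10 ms is a typical clock-tick granularity for "times" on Linux, and 1 us
stands in for a higher-precision timer.

```python
def relative_error(runtime_seconds, tick_seconds):
    """Error of one timer tick, expressed as a fraction of the measured runtime."""
    return tick_seconds / runtime_seconds

# Runtimes taken from the message above; tick sizes are assumed values.
for runtime in (3.3, 0.012):
    for tick in (0.01, 1e-6):
        err = relative_error(runtime, tick)
        print(f"runtime {runtime} s, tick {tick:g} s: about {err:.4%} per tick")
```

With a coarse tick, a run of a few milliseconds can be off by most of its
own duration, while a run of several seconds is barely affected.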
> Another big problem is that some effect-free microbenchmarks optimize
> away. For example, the computations in arithmetic.bm fold entirely.
> The same goes for if.bm. These benchmarks do not measure anything
> useful.
> The benchmarking suite attempts to compensate for the overhead of the
> test by providing for "core time": the time taken to run a benchmark,
> minus the time taken to run an empty benchmark with the same number of
> iterations. The benchmark itself is compiled as a thunk, and the
> framework calls the thunk repeatedly. In theory this sounds good. In
> practice however, for high-iteration microbenchmarks, the overhead of
> the thunk call outweighs any micro-benchmark being called.
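The "core time" idea described above can be sketched as follows. This is
a minimal Python illustration under my own assumptions, not the
framework's actual code: run_thunk and core_time are hypothetical names,
and the empty lambda stands in for the empty benchmark.

```python
import time

def run_thunk(thunk, iterations):
    """Call thunk `iterations` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        thunk()
    return time.perf_counter() - start

def core_time(thunk, iterations):
    """Return (total, overhead, core), where core = total - overhead.

    `overhead` is the time for an empty thunk called the same number of times.
    """
    total = run_thunk(thunk, iterations)
    overhead = run_thunk(lambda: None, iterations)
    return total, overhead, total - overhead

total, overhead, core = core_time(lambda: 1 + 1, 100_000)
print(f"total {total:.6f} s, overhead {overhead:.6f} s, core {core:.6f} s")
```

For a body as small as the one in this usage example, the two
measurements are nearly equal, so the difference is mostly noise and can
even come out negative.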
> For what it's worth, the current overhead of the benchmark appears to be
> about 35 microseconds per iteration, on my laptop. If we inline the
> iteration into the benchmark itself, rather than calling a thunk
> repeatedly, we can bring that down to around 13 microseconds. However
> it's probably best to leave it as it is, because if we inline the loop,
> it's liable to be optimized out.
> So, those are the problems: benchmarks running for inappropriate,
> inconsistent durations; inappropriate benchmarks; and benchmarks being
> optimized out.
> My proposal is to rebase the iteration count in 0-reference.bm to run
> for 0.5s on some modern machine, and adjust all benchmarks to match,
> removing those benchmarks that do not measure anything useful. Finally
> we should perhaps enable automatic scaling of the iteration count. What
> do folks think about that?
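One way the rebasing and automatic scaling could work, sketched in
Python. This is only an illustration of the arithmetic, not a proposal
for the actual implementation; measure, scale_iterations and calibrate
are hypothetical names, and the starting count of 1000 is an arbitrary
assumption.

```python
import time

def measure(thunk, iterations):
    """Return elapsed wall-clock seconds for `iterations` calls of thunk."""
    start = time.perf_counter()
    for _ in range(iterations):
        thunk()
    return time.perf_counter() - start

def scale_iterations(iterations, elapsed_seconds, target_seconds):
    """Scale an iteration count linearly so the run would take target_seconds."""
    return max(1, round(iterations * target_seconds / elapsed_seconds))

def calibrate(thunk, target_seconds=0.5, iterations=1000):
    """Pick an iteration count that makes thunk run for roughly target_seconds."""
    elapsed = measure(thunk, iterations)
    # Grow the trial run until it is long enough to be timed reliably.
    while elapsed < target_seconds / 10:
        iterations *= 10
        elapsed = measure(thunk, iterations)
    return scale_iterations(iterations, elapsed, target_seconds)

print(calibrate(lambda: None, target_seconds=0.05))
```

Since results are reported as time per number of iterations, changing
the count this way only changes how long each benchmark runs.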
> On the positive side, all of our benchmarks are very clear that they are
> a time per number of iterations, and so this change should not affect
> users that measure time per iteration.