Re: our benchmark-suite

guile-devel

From:	Andy Wingo
Subject:	Re: our benchmark-suite
Date:	Tue, 24 Apr 2012 10:26:03 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Heya Neil,

I pushed a change to the format of the text logged to the console when
you do a ./benchmark-guile.  It seems that this affected your
benchmarking bot.  I was hoping that this would not be the case, because
the benchmark suite also writes a log to `guile-benchmark.log', and I
tried to avoid changing the format of that file.

Can you take a look at your bot and see if it's possible to switch to
using benchmark-guile.log instead of the console output?

Other suggestions as to a solution are also most welcome.

Thanks!

Andy

On Mon 23 Apr 2012 11:22, Andy Wingo <address@hidden> writes:

> Hi,
>
> I was going to try to optimize vhash-assoc, but I wanted a good
> benchmark first, so I started to look at our benchmark suite.  We have
> some issues to deal with.
>
> For those of you who are not familiar with the benchmark suite, we have
> a bunch of benchmarks in benchmark-suite/benchmarks/: those files that
> end in ".bm".  The format of a .bm file is like our .test files, except
> that instead of `pass-if' and the like, we have `benchmark'.  You run
> benchmarks via ./benchmark-guile in the $top_builddir.
>
> The benchmarking framework tries to be appropriate for microbenchmarks,
> as the `benchmark' form includes a suggested number of iterations.
> Ideally when you create a benchmark, you give it a number of iterations
> that makes it run approximately as long as the other benchmarks.
>
> When the benchmarking suite was first made, 10 years ago, there was an
> empty "reference" benchmark that was created to run for approximately 1
> second.  Currently it runs in 0.012 seconds.  This is one problem: the
> overall suite has old iteration counts.  There is a facility for scaling
> the iteration counts of the suite as a whole, but it is unused.
>
> Another problem is that the actual runtime of the various benchmarks
> varies quite a lot, from 3.3 seconds for assoc (srfi-1) to 0.012
> seconds for if.bm.
>
> Short runtimes magnify imprecisions in measurement.  It used to be that
> the measurement function was "times", but I just changed that to the
> higher-precision get-internal-real-time / get-internal-run-time.  Still,
> though, there is nothing you can do for a benchmark that runs in a few
> milliseconds or less.
>
> Another big problem is that some effect-free microbenchmarks optimize
> away.  For example, the computations in arithmetic.bm fold entirely.
> The same goes for if.bm.  These benchmarks do not measure anything
> useful.
>
> The benchmarking suite attempts to compensate for the overhead of the
> test by providing for "core time": the time taken to run a benchmark,
> minus the time taken to run an empty benchmark with the same number of
> iterations.  The benchmark itself is compiled as a thunk, and the
> framework calls the thunk repeatedly.  In theory this sounds good.  In
> practice, however, for high-iteration microbenchmarks, the overhead of
> the thunk call outweighs the microbenchmark being measured.
>
> For what it's worth, the current overhead of the benchmark appears to be
> about 35 microseconds per iteration, on my laptop.  If we inline the
> iteration into the benchmark itself, rather than calling a thunk
> repeatedly, we can bring that down to around 13 microseconds.  However,
> it's probably best to leave it as it is, because if we inline the loop,
> it's liable to be optimized out.
>
> So, those are the problems: benchmarks running for inappropriate,
> inconsistent durations; inappropriate benchmarks; and benchmarks being
> optimized out.
>
> My proposal is to rebase the iteration count in 0-reference.bm to run
> for 0.5s on some modern machine, and adjust all benchmarks to match,
> removing those benchmarks that do not measure anything useful.  Finally
> we should perhaps enable automatic scaling of the iteration count.  What
> do folks think about that?
>
> On the positive side, all of our benchmarks are very clear that they are
> a time per number of iterations, and so this change should not affect
> users that measure time per iteration.
>
> Regards,
>
> Andy
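
To make the "core time" idea from the quoted message concrete, here is a
minimal sketch. It is written in Python rather than Guile, purely as a
neutral illustration, and it is not the benchmark suite's code: run_thunk,
core_time and the clock parameter are hypothetical names chosen for this
example. It times a thunk called repeatedly, times an empty thunk with the
same iteration count, and subtracts the two.

```python
import time


def run_thunk(thunk, iterations, clock=time.perf_counter):
    """Call thunk `iterations` times and return the elapsed clock time."""
    start = clock()
    for _ in range(iterations):
        thunk()
    return clock() - start


def core_time(thunk, iterations, clock=time.perf_counter):
    """Return (core, total, overhead) for `iterations` calls of thunk.

    overhead is measured by running an empty thunk the same number of
    times; core is total minus that overhead.
    """
    total = run_thunk(thunk, iterations, clock)
    overhead = run_thunk(lambda: None, iterations, clock)
    return total - overhead, total, overhead


if __name__ == "__main__":
    # A tiny body: the loop and call overhead is likely to dominate here.
    core, total, overhead = core_time(lambda: 1 + 1, 100_000)
    print(f"total={total:.6f}s overhead={overhead:.6f}s core={core:.6f}s")
```

When the body is as small as the one in the usage example, core is the
difference of two nearly equal, noisy measurements, so it can come out
close to zero or even negative. That is the failure mode described above
for high-iteration microbenchmarks.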

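The rebasing proposal amounts to simple arithmetic per benchmark: scale the
iteration count by the ratio of the target runtime to the measured runtime.
A rough sketch follows, again in Python and again with hypothetical names
(rebased_iterations is not part of the suite). It assumes that runtime
grows linearly with the iteration count, which will not hold for
benchmarks that the compiler optimizes away.

```python
def rebased_iterations(iterations, measured_seconds, target_seconds=0.5):
    """Scale an iteration count so the run takes about target_seconds.

    Assumes runtime grows linearly with the iteration count.
    """
    if iterations < 1 or measured_seconds <= 0:
        raise ValueError("need iterations >= 1 and measured_seconds > 0")
    return max(1, round(iterations * target_seconds / measured_seconds))


if __name__ == "__main__":
    # Runtimes are the ones quoted above; the iteration counts are made up.
    measurements = {
        "assoc (srfi-1)": (1_000, 3.3),
        "if.bm": (100_000, 0.012),
    }
    for name, (count, seconds) in measurements.items():
        print(name, count, "->", rebased_iterations(count, seconds))
```

The 0.5 second default mirrors the target proposed for 0-reference.bm; a
suite-wide scaling factor would be the same calculation applied once to
the reference benchmark and then reused for every count.
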
-- 
http://wingolog.org/
