avr-gcc-list


From: Dave N6NZ
Subject: Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
Date: Sun, 13 Jan 2008 15:19:29 -0800
User-agent: Thunderbird 1.5 (X11/20051201)



Weddington, Eric wrote:

Hi John, Dave, others,

Here are some random thoughts about a benchmark test suite:

- GCC has a page on benchmarks:
<http://gcc.gnu.org/benchmarks/>
However all of those are geared towards larger processors and host
systems. There is a link to a benchmark that focuses on code size,
CSiBE, <http://www.inf.u-szeged.hu/csibe/>. Again, that benchmark is
geared towards larger processors.

This creates a need to have a benchmark that is geared towards 8-bit
microcontroller environments in general, and specifically for the AVR.

What would we like to test?

Code size for sure. Everyone always seems to be interested in code size.
There is an interest in seeing how the GCC compiler performs from one
version to the next, to see if optimizations have improved or if they
have regressed.

Which I would call regression tests, not "benchmarks", per se. Of performance regressions, I would guess that code size regressions under -Os are the #1 priority for the typical user. (A friend is currently tearing his hair out over a code size regression in a commercial PIC C compiler -- he needs to release a minor firmware update to the field... but not even the original code fits his flash any more...)

It's worth drawing a distinction between benchmarks and regression tests. They need to be written differently. A regression test needs to sensitize a particular condition, and needs to be small enough to be debuggable. A benchmark needs to be "realistic", which often makes them harder to debug. I say we need both. The performance regression tests can easily roll into release criteria. A suite of performance benchmarks is more useful as a confirmatory "measure of goodness" -- but actual mysteries in the aggregate score will most likely be chased with smaller tests.
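
To make the distinction concrete, here is a rough sketch (mine, not an agreed-upon test) of what a targeted size-regression test could look like: a tiny function that exercises exactly one code-generation pattern -- in this case, keeping 8-bit arithmetic out of 16-bit registers -- with pass/fail decided outside the C code by comparing the .text size of the compiled object against a recorded reference value.

    #include <stdint.h>

    /* Entirely 8-bit math; ideally the compiler should not need to
       widen any of it to 16 bits.  The particular pattern is only
       illustrative. */
    uint8_t mix8(uint8_t a, uint8_t b)
    {
        return (uint8_t)((a ^ b) + (a >> 1));
    }

Something that small can have its generated assembly read by hand when it regresses, which is exactly what you cannot do with a benchmark-sized test.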

My guess is that existing tests may help us a lot in the benchmark category, but the regression tests will require some elbow grease on our part to get a good set. There's a good chance we can extract good regression tests from existing benchmark-sized tests.

A semi-related question: how many of these tests can be pushed upstream? If we could get a handful of uCtlr-oriented code size regression tests packaged up so that the developers of the generic optimizer could run them as release criteria, it would, I would think, improve the overall quality of gcc for all uCtlr targets.


There is also an interest in comparing AVR compilers, such as how GCC
compares to IAR, Codevision or ImageCraft compilers.

Who is interested? gcc developers, as a means to keep gcc competitive? Or potential users? The former is benchmarking, the latter is moving towards bench-marketing. Not that marketing is bad, but that sort of thing can be a distraction. In any case, the tests that are meaningful here are the benchmark "overall goodness" test suite, not the targeted test suite.


And sometimes there is an interest in comparing AVR against other
microcontrollers, notably Microchip's PIC and TI's MSP430.

Different processor with same compiler? Different processor with best compiler? -- Now this is beginning to sound like SPEC.


Because there are these different interests, it is challenging to come
up with appropriate code samples to showcase and benchmark these
different issues. But we could also implement this in stages, and focus
on AVR-specific code, and GCC-specific AVR code at that.

Clarity of classification is important. Different buckets for different issues.


If we are going to put together a benchmark test suite, like others
benchmarks for GCC (for larger processors), then I would think that it
would be better to model it somewhat after those other benchmarks. I see
that they tend to use publicly available code, and a variety of
different types of applications.

For benchmarking, and bench-marketing, that's a good approach. I'll be redundant and say those are probably not what you want to be debugging. It would make sense for what I'll call an "avr-gcc dashboard". I see a web page with a bunch of bar graphs on it: a summary bar at the top that is the weighted sum of the individual test bars. As an avr-gcc user, that kind of summary page would be very useful from one release to the next for setting expectations about performance on your own application. As an avr-gcc release master, it's a good dashboard for tracking progress and release-worthiness.
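
Purely as a sketch of how that summary bar might be computed (the test names, weights, and numbers below are invented for illustration, not proposed values): each test contributes its measured size, or run time, relative to a baseline release, scaled by a weight, and the summary is the weighted mean of those ratios.

    #include <stdio.h>

    struct result {
        const char *name;
        double baseline;   /* e.g. .text bytes from the reference release */
        double measured;   /* same metric from the release under test */
        double weight;
    };

    static double composite(const struct result *r, int n)
    {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            num += r[i].weight * (r[i].measured / r[i].baseline);
            den += r[i].weight;
        }
        return num / den;   /* 1.0 = no change, below 1.0 = improvement */
    }

    int main(void)
    {
        struct result r[] = {
            { "uip",      10240.0, 10180.0, 2.0 },
            { "freertos",  8192.0,  8300.0, 1.0 },
        };
        printf("composite: %.3f\n", composite(r, 2));
        return 0;
    }

Whether the ratios get an arithmetic or a geometric mean, and where the weights come from, is exactly the sort of thing that needs the consensus you mention below.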

We should have something similar. Some
suggested projects: FreeRTOS (for the AVR)
Sounds good.
uIP (however, we need to pick a specific implementation of it for the AVR; I have a copy of uIP-Crumb644),
Another good one

the Atmel 802.15.4 MAC,
Need to check license on that one -- but a good choice otherwise

and the GCC version of the
Butterfly firmware. I also have a copy of the "TI Competitive
Benchmark", which they, and other semiconductor companies, have used to
do comparisons between processors.
Not familiar with it. Also, check the license. Processor manufacturers (like, oh, for instance, *all* the several I have worked for) are very touchy about benchmarks and benchmark publications. My sea charts have a notation: "Here be lawyers".


Does anyone have other suggestions on projects to include in the
Benchmark? One area that seems to be lacking is some application that
uses floating point. Any help to find some application in this area
would be much appreciated.
Yup. Floating point is important, but we could probably put together some meaningful synthetic benchmarks pretty quickly. Need to watch the data sets, though, since run time can vary greatly once you get into gradual underflow or NaNs and such. Also, remember these may need to run on a simulator, and need to complete in our lifetime.
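
Something along these lines might serve as a starting point (a sketch only; the kernel and sizes are made up): a small multiply-accumulate loop whose driver keeps the input values in a tame range, say [0.5, 2.0], so no denormals or NaNs ever appear, and whose iteration count is small enough to finish quickly under a simulator.

    #include <stdint.h>

    #define N 32

    /* Plain float multiply-accumulate over well-behaved values. */
    float fir_like(const float *x, const float *c)
    {
        float acc = 0.0f;
        for (uint8_t i = 0; i < N; i++)
            acc += x[i] * c[i];
        return acc;
    }

The driver (not shown) would fill x[] and c[] deterministically so that results are repeatable across runs and across compilers.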


There needs to be some consensus on what we measure, how we measure it,
what output files we want generated, and hopefully some way to
automatically generate composite results. I'm certainly open to anything
in this area. I would think that we need to be as open as possible on
this, with documentation (minimal, it can be a text file) on what are
our methods, how the results were arrived at, but importantly that the
secondary/generated files be available for others to review and verify
the results.
Agree completely.


On practicalities: I am certainly willing to host the benchmark test
suite on the WinAVR project on SourceForge and use its CVS repository.
If it is desired to have it in a more neutral place, such as avr-libc,
I'm open to that too, if Joerg Wunsch is willing.
Seems to me that as long as they are publicly available under an appropriate license, it doesn't really matter much who backs them up :)


Thoughts?

Test categories:
1. float v. scalar
2. targeted test v. benchmark v. published dashboard metric
3. member of quick v. extended v. full test list
4. size v. speed

That unrolls into 36 test lists (2 x 3 x 3 x 2), but the same test may appear multiple times (in both quick and extended, perhaps in both size and speed).

As to priorities, IMO the top two priorities are:
1. targeted scalar size
2. targeted scalar speed

Why? To get tests that target specific optimization regressions. A size regression is more painful to an embedded developer than a speed regression, and floating point math lives largely in a library, so it is less at risk from a compiler optimization regression.

I'm not saying other things are not important; that's just my take on what to tackle first (after infrastructure, of course).

-dave

BTW -- having a defined place to put a performance regression test is a good start. Any performance regression that pops up should have a test written for it and cataloged in the framework.


Thanks,
Eric Weddington





