Re: [Qemu-devel] performance monitor


From: Rob Landley
Subject: Re: [Qemu-devel] performance monitor
Date: Fri, 4 Jan 2008 02:49:22 -0600
User-agent: KMail/1.9.6 (enterprise 0.20070907.709405)

On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> Does anyone have an idea on how I can measure performance in qemu to a
> somewhat accurate level?

hwclock --show > time1
tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make 
cd ..
hwclock --show > time2

Do that on the host and in the guest, and you've got a ratio of qemu's 
performance to your host's that should be good to within a few percent.
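
If you don't want to do the subtraction by hand, something like this should 
work (untested sketch: the sed depends on what your hwclock output looks 
like on your system, and GNU date has to be able to parse what's left):

t1=$(date -d "$(hwclock --show | sed 's/ *-*[0-9.]* seconds$//')" +%s)
tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make
cd ..
t2=$(date -d "$(hwclock --show | sed 's/ *-*[0-9.]* seconds$//')" +%s)
echo "build took $((t2 - t1)) seconds"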

> I have modified qemu (the memory handling) and the 
> linux kernel and want to find out the penalty this introduced... does
> anyone have any comments / ideas on this?

If it's something big, you can compare the results in minutes and seconds.  
That's probably the best you're going to do.  (Although really you want 
hwclock --show before and after, and then do the math.  That tunnels out to 
the host system to get its idea of the time, so it doesn't get thrown off 
when timer interrupt delivery, which arrives as a signal, gets deferred by 
the host system's scheduler.)  Of course, the fact that hwclock _takes_ a 
second or so to read the clock is a bit of a downer, but anything that takes 
less than a minute or so to run isn't going to give you a very accurate time 
anyway, because the performance of qemu isn't constant and your results are 
going to skew all over the place.
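
The math itself is just a division.  With made-up numbers, say the build 
took 847 seconds in the guest and 212 on the host:

echo "scale=2; 847 / 212" | bc   # -> 3.99, i.e. qemu ran ~4x slower here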

Especially for small things, the performance varies from run to run.  Start by 
imagining qemu as having the mother of all page fault latencies.  The cost of 
faulting code into the L2 cache includes dynamic recompilation, which is 
expensive.

Worse, when the dynamic recompilation buffer fills up it blanks the whole 
thing, and recompiles every new page it hits one at a time until the buffer 
fills up again.  (What is it these days, 16 megs of translated code before it 
resets?)  No LRU or anything, no cache management at _all_, just "when the 
bucket fills up, dump it and start over".  (Well, that's what it did back 
around the last stable release anyway.  It has been almost a year since then, 
so maybe it's changed.  I've been busy with other things and not really 
keeping track of changes that didn't affect what I could and couldn't get to 
run.)

So anyway, depending on what code you run in what order, the performance can 
_differ_ from one run to the next due to when the cache gets blanked and 
stuff gets retranslated.  By a lot.  There's no obvious way to predict this 
or control it.  And the "software" clock inside your emulated system can lie 
to you about it if timer interrupts get deferred.

All this should pretty much average out if you do something big with lots of 
execs (like building a linux kernel from source).  But if you do something 
small, expect serious butterfly effects.  Expect microbenchmarks to swing 
around wildly.
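
If you want to see the swing for yourself, just time the same small workload 
several times in a row.  Rough sketch (it uses the guest's own clock for 
brevity, so all the timer caveats above apply, and ./small_benchmark stands 
in for whatever workload you're testing):

for i in 1 2 3 4 5; do
    t1=$(date +%s)
    ./small_benchmark              # substitute your actual workload
    t2=$(date +%s)
    echo "run $i: $((t2 - t1))s"
done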

Quick analogy: you know the performance difference between faulting your 
executable in from disk and running it out of the page cache?  Imagine a 
daemon that makes random intermittent calls to "echo 1 > 
/proc/sys/vm/drop_caches", and now try to do a sane benchmark.  No matter 
what you use to measure, what you're measuring isn't going to be consistent 
from one run to the next.
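
In case anyone wants to feel that analogy on real hardware, here's the 
daemon made literal (needs bash for $RANDOM, needs root, and don't run it 
anywhere you care about performance):

while true; do
    sleep $((RANDOM % 30))              # random intermittent interval
    echo 1 > /proc/sys/vm/drop_caches   # evict the page cache
done &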

Performance should be better (and more stable) with kqemu or kvm.  Maybe 
you can benchmark those sanely; I wouldn't know.  Ask somebody else. :)

P.S.  Take the above with a large grain of salt; I'm not close to an expert 
in this area...

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.



