qemu-devel
From: Lluís Vilanova
Subject: Re: [Qemu-devel] How to measure guest memory access (qemu_ld/qemu_st) time?
Date: Fri, 15 Jun 2012 01:30:55 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

陳韋任 (Wei-Ren Chen) writes:

> On Wed, Jun 13, 2012 at 12:43:28PM +0200, Laurent Desnogues wrote:
>> On Wed, Jun 13, 2012 at 5:14 AM, 陳韋任 (Wei-Ren Chen)
>> <address@hidden> wrote:
>> > Hi all,
>> >
>> >  I suspect that guest memory accesses (qemu_ld/qemu_st) account for the
>> > majority of the time spent in system mode. I would like to know precisely
>> > how much (if possible). We have used tools like perf [1] before, but since
>> > the guest memory access logic is also embedded in the host binary, not
>> > only in helper functions, the results cannot be relied on. The current
>> > idea is to add helper functions before/after the guest memory access
>> > logic. Taking an ARM guest on an x86_64 host as an example, should I add
>> > the helper functions before/after tcg_gen_qemu_{ld,st} in
>> > target-arm/translate.c or tcg_out_qemu_{ld,st} in tcg/i386/tcg-target.c?
>> > Or is there a better way to know how much time QEMU spends on handling
>> > guest memory accesses?
>> 
>> I'm afraid there's no easy way to measure that: any change you make
>> to generated code will completely change the timing given that the ld/st
>> fast path is only a few instructions long.

>   Lluis, what's your opinion on that? Do your tracepoints have the same
> timing issue, too?

They just give you a set of well-known events and a public API to insert
whatever you want in there, so whatever overhead you would get by hacking
QEMU directly, you will also get when using trace instrumentation.

Now that I think of it, you will have problems generating code to surround each
qemu_ld/st with a lightweight mechanism to get the time. On x86 that would be
rdtsc, but you need a host rdtsc instruction inside the code generated by
QEMU's TCG, so you would also have to hack TCG (or the code generation
pointers) to emit it.
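
To get a feeling for how much the timer read itself would disturb a fast path
that is only a few instructions long, something like the following could be run
on the host. This is just a minimal standalone sketch, not QEMU code:
read_ticks() and min_timer_overhead() are names made up for the example, and
the clock_gettime() fallback for non-x86 hosts is my own assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the host timestamp counter (result is in TSC ticks). */
static inline uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
/* Assumed fallback for non-x86 hosts: monotonic clock, in nanoseconds. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * Smallest delta observed between two back-to-back timer reads.
 * This approximates the cost that a pair of timestamps would add
 * around every instrumented operation (hypothetical helper).
 */
static uint64_t min_timer_overhead(int iters)
{
    uint64_t best = UINT64_MAX;

    assert(iters > 0);
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        /* Skip samples where the counter appears to go backwards. */
        if (t1 >= t0 && t1 - t0 < best) {
            best = t1 - t0;
        }
    }
    return best;
}
```

Calling min_timer_overhead() with a few thousand iterations and comparing the
result with the expected cost of the ld/st fast path should show whether
per-access timestamps can give meaningful numbers at all.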


>> Another approach might be to run the program in user mode and then in system
>> mode (provided the guest OS is very light).

>   We ran the SPEC2006 test input in both user and system mode (ARM guest
> OS). The result is that system mode is roughly 2x slower than user mode.
> Not sure if the result is reasonable.

Well, you have all the MMU checks in system mode.

You might try checking what percentage of the application is actually
performing memory operations. This could help you accept or dismiss your
theory.
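
As a back-of-the-envelope check, once you have that percentage you can ask how
much slower each memory access would need to be for memory accesses alone to
explain the slowdown. The sketch below assumes a very simple model that is not
a measurement: user-mode run time is normalized to 1, a fraction f of it is
spent in guest memory accesses, and everything else costs the same in system
mode. Then S = (1 - f) + f * k, so k = 1 + (S - 1) / f. The function name is
made up for the example, and using the share of memory instructions as a
stand-in for f is a further assumption.

```c
#include <assert.h>

/*
 * Factor by which guest memory accesses must slow down (k) so that they
 * alone explain a total slowdown S, given that a fraction f of user-mode
 * time is spent in memory accesses. Hypothetical helper, simple model only.
 */
static double required_mem_slowdown(double mem_fraction, double total_slowdown)
{
    assert(mem_fraction > 0.0 && mem_fraction <= 1.0);
    assert(total_slowdown >= 1.0);
    return 1.0 + (total_slowdown - 1.0) / mem_fraction;
}
```

For instance, with the roughly 2x slowdown mentioned above and a purely
hypothetical f of 0.5, required_mem_slowdown(0.5, 2.0) gives 3.0. If the real
fraction turns out to be small, the required factor grows quickly, which would
argue against memory accesses being the only cause.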


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth
