Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation


From: Lluís Vilanova
Subject: Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation
Date: Sat, 30 Sep 2017 00:46:33 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Emilio G Cota writes:

> On Fri, Sep 29, 2017 at 16:16:41 +0300, Lluís Vilanova wrote:
>> Lluís Vilanova writes:
>> [...]
>> > This was working on a much older version of instrumentation for QEMU, but I
>> > can implement something that does the first use-case point above and some
>> > filtering example (second use-case point) to see what's the performance
>> > difference.
>> 
>> Ok, so here's some numbers for the discussion (booting Emilio's ARM full
>> system image that immediately shuts down):
>> 
>> * Without instrumentation
>> 
>> real 0m10,099s
>> user 0m9,876s
>> sys  0m0,128s
>> 
>> * Count number of memory access writes, by instrumenting only when they are
>> executed
>> 
>> real 0m15,896s
>> user 0m15,752s
>> sys  0m0,108s
>> 
>> * Counting same, but the filtering is done at translation time (i.e., not
>> generate an execute-time callback if it's not a write)
>> 
>> real 0m11,084s
>> user 0m10,880s
>> sys  0m0,112s
>> 
>> As Peter said, the filtering can be added into the API to take advantage of
>> this "speedup", without exposing translation vs execution time callbacks.

> I'm not sure I understand this concept of filtering. Are you saying that in
> the first case, all memory accesses are instrumented, and then in the
> "access helper" we only call the user's callback if it's a memory write?
> And in the second case, we simply just generate a "write helper" instead
> of an "access helper". Am I understanding this correctly?

In the first case (no filtering), the user callback is called every time a
memory access is *executed*, and the callback itself checks whether the access
mode is a write to decide whether to increment a counter.

In the second case (with filtering), a user callback is called when a memory
access is *translated*. If the access mode is a write, that callback generates a
call to a second callback, which runs every time the access is executed. The
execution-time callback is therefore generated only for memory writes, the ones
we care about.

Is this clearer?


>> * Counting number of executed instructions, by instrumenting the beginning of
>> each one of them
>> 
>> real 0m24,583s
>> user 0m24,352s
>> sys  0m0,184s
>> 
>> * Counting same, but per-TB numbers are collected at translation-time, and we
>> only generate a per-TB execution time callback to add the corresponding
>> number of instructions for that TB
>> 
>> real 0m11,151s
>> user 0m10,952s
>> sys  0m0,092s
>> 
>> This really needs to expose translation vs execution time callbacks to take
>> advantage of this "speedup".

> Clearly instrumenting per-TB is a significant net gain. I think we should
> definitely allow instrumenters to use this option.
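
For reference, the two instruction-counting schemes measured above can be
sketched with a toy Python model. This is not the API from this series: all
names (Stats, translate_per_insn, translate_per_tb, execute) are hypothetical,
a TB is just a list of instructions, and the model assumes every instruction in
a TB runs once the TB is entered:

```python
# Toy model only: none of these names exist in QEMU or in this patch series.
from dataclasses import dataclass


@dataclass
class Stats:
    insns: int = 0           # executed-instruction count being collected
    callbacks_run: int = 0   # execution-time callbacks actually invoked


def translate_per_insn(tb, stats):
    """Generate one execution-time callback per instruction in the TB."""
    def on_insn():
        stats.callbacks_run += 1
        stats.insns += 1
    return [on_insn for _ in tb]


def translate_per_tb(tb, stats):
    """Count the TB's instructions at translation time; one callback per TB."""
    n_insns = len(tb)                # computed once, when the TB is translated
    def on_tb():
        stats.callbacks_run += 1
        stats.insns += n_insns       # add the whole TB's count in one step
    return [on_tb]


def execute(translated, times):
    """Run the translated TB `times` times, firing its callbacks."""
    for _ in range(times):
        for callback in translated:
            callback()


if __name__ == "__main__":
    tb = ["insn"] * 20               # a TB holding 20 guest instructions
    per_insn, per_tb = Stats(), Stats()
    execute(translate_per_insn(tb, per_insn), 1000)
    execute(translate_per_tb(tb, per_tb), 1000)
    print(per_insn)  # Stats(insns=20000, callbacks_run=20000)
    print(per_tb)    # Stats(insns=20000, callbacks_run=1000)
```

The totals match, but the per-TB variant only works if the instrumenter can
register work at translation time, which is why this case needs both kinds of
callbacks exposed.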

> FWIW my experiments so far show similar numbers for instrumenting each
> instruction (haven't done the per-tb yet). The difference is that I'm
> exposing to instrumenters a copy of the guest instructions (const void *data,
> size_t size). These copies are kept around until TB's are flushed.
> Luckily there seems to be very little overhead in keeping these around,
> apart from the memory overhead -- but in terms of performance, the
> necessary allocations do not induce significant overhead.

To keep this use-case simpler, I added the memory access API I posted in this
series, where instrumenters can read guest memory (more general than passing a
copy of the current instruction).


Cheers,
  Lluis


