qemu-devel

Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation


From: Lluís Vilanova
Subject: Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation
Date: Thu, 05 Oct 2017 02:28:12 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Emilio G Cota writes:

> On Sat, Sep 30, 2017 at 00:46:33 +0300, Lluís Vilanova wrote:
>> Emilio G Cota writes:
>> > I'm not sure I understand this concept of filtering. Are you saying that in
>> > the first case, all memory accesses are instrumented, and then in the
>> > "access helper" we only call the user's callback if it's a memory write?
>> > And in the second case, we simply just generate a "write helper" instead
>> > of an "access helper". Am I understanding this correctly?
>> 
>> In the previous case (no filtering), the user callback is always called when a
>> memory access is *executed*, and the user then checks if the access mode is a
>> write to decide whether to increment a counter.
>> 
>> In this case (with filtering), a user callback is called when a memory access
>> is *translated*, and if the access mode is a write, the user generates a call
>> to a second callback that runs every time that memory access is executed (so
>> the execution-time call is only generated for memory writes, the ones we care
>> about).
>> 
>> Is this clearer?

> I get it now, thanks!
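
A minimal sketch of the two schemes discussed above, as a self-contained toy
model in C. Every name here (mem_insn, trans_access, exec_write, ...) is
hypothetical and is not the API from this series; the "translated block" is
just an array of access descriptors. It only illustrates where the mode check
happens: on every execution (no filtering) or once at translation (filtering).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU or in this series. */
typedef enum { ACCESS_READ, ACCESS_WRITE } access_mode;

typedef void (*exec_cb)(void *opaque);

typedef struct {
    access_mode mode;
    exec_cb on_exec;    /* call generated at translation time, or NULL */
    void *opaque;
} mem_insn;

typedef struct {
    unsigned long writes;    /* what the tool wants to measure */
    unsigned long cb_calls;  /* execution-time callbacks that actually ran */
} stats;

/* No filtering: runs for every executed access and checks the mode itself. */
static void exec_any_access(stats *s, access_mode mode)
{
    s->cb_calls++;
    if (mode == ACCESS_WRITE) {
        s->writes++;
    }
}

/* Filtering: execution-time callback, only ever generated for writes. */
static void exec_write(void *opaque)
{
    stats *s = opaque;
    s->cb_calls++;
    s->writes++;
}

/* Filtering: translation-time callback, runs once per translated access. */
static void trans_access(mem_insn *insn, stats *s)
{
    if (insn->mode == ACCESS_WRITE) {
        insn->on_exec = exec_write;
        insn->opaque = s;
    }
}

/* Execute the block `iters` times, calling back on every access. */
static void run_unfiltered(const mem_insn *tb, size_t n, unsigned iters,
                           stats *s)
{
    assert(tb != NULL && s != NULL);
    for (unsigned i = 0; i < iters; i++) {
        for (size_t j = 0; j < n; j++) {
            exec_any_access(s, tb[j].mode);
        }
    }
}

/* Translate once, then execute `iters` times with the generated calls only. */
static void run_filtered(mem_insn *tb, size_t n, unsigned iters, stats *s)
{
    assert(tb != NULL && s != NULL);
    for (size_t j = 0; j < n; j++) {
        tb[j].on_exec = NULL;
        trans_access(&tb[j], s);
    }
    for (unsigned i = 0; i < iters; i++) {
        for (size_t j = 0; j < n; j++) {
            if (tb[j].on_exec != NULL) {
                tb[j].on_exec(tb[j].opaque);
            }
        }
    }
}
```

Both variants count the same number of writes; the filtered one simply never
enters a callback for reads, which is where the saving comes from.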

>> > FWIW my experiments so far show similar numbers for instrumenting each
>> > instruction (haven't done the per-tb yet). The difference is that I'm
>> > exposing to instrumenters a copy of the guest instructions
>> > (const void *data, size_t size). These copies are kept around until TBs are
>> > flushed.
>> > Luckily there seems to be very little overhead in keeping these around,
>> > apart from the memory overhead -- but in terms of performance, the
>> > necessary allocations do not induce significant overhead.
>> 
>> To keep this use-case simpler, I added the memory access API I posted in this
>> series, where instrumenters can read guest memory (more general than passing a
>> copy of the current instruction).

> I see some potential problems with this:
> 1. Instrumenters' accesses could generate exceptions. I presume we'd want to
>    avoid this, or leave it as a debug-only kind of option.

The API tells you whether the access could be performed successfully. If you
access the instruction's memory representation at translation time, the access
should succeed, since QEMU's translation loop just had to do the same in order
to fetch that instruction. I should still check the corner case where another
guest CPU changes the page table, since I'm not sure whether the address
translation functions I'm using in QEMU go through the per-vCPU TLB cache or
always traverse the page table.
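
To illustrate "the API tells you whether the access succeeded", here is a
minimal sketch under stated assumptions: toy_guest and toy_mem_read are
hypothetical names, and the page-mapped flags stand in for whatever address
translation the real implementation does. The point is only the contract: a
failed read returns false and leaves the guest untouched, instead of raising a
guest exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy guest memory: hypothetical, not QEMU's memory model. */
#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

typedef struct {
    uint8_t data[TOY_NUM_PAGES * TOY_PAGE_SIZE];
    bool mapped[TOY_NUM_PAGES];   /* stand-in for a successful translation */
} toy_guest;

/*
 * Copy `size` bytes starting at `vaddr` into `buf`.
 * Returns false (and copies nothing) if the range is empty, out of bounds,
 * or touches an unmapped page. No fault is ever injected into the guest.
 */
static bool toy_mem_read(const toy_guest *g, uint64_t vaddr, size_t size,
                         void *buf)
{
    const uint64_t limit = (uint64_t)TOY_NUM_PAGES * TOY_PAGE_SIZE;

    assert(g != NULL && buf != NULL);

    if (size == 0 || vaddr >= limit || size > limit - vaddr) {
        return false;
    }

    /* Every page covered by the range must be mapped. */
    uint64_t first = vaddr / TOY_PAGE_SIZE;
    uint64_t last = (vaddr + size - 1) / TOY_PAGE_SIZE;
    for (uint64_t page = first; page <= last; page++) {
        if (!g->mapped[page]) {
            return false;
        }
    }

    memcpy(buf, &g->data[vaddr], size);
    return true;
}
```

An instrumenter that reads instruction bytes at translation time would check
the return value and skip (or retry later) on failure, rather than assume the
bytes are valid.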


> 2. Instrumenters won't know where the end of an instruction (for
>    variable-length ISAs) or of a TB is (TB != basic block). For instructions one
>    could have a loop where we read byte-by-byte and pass it to the decoder,
>    something similar to what we have in the capstone code recently posted to the
>    list (v4). For TBs, we really should have a way to delimit the length of the
>    TB. This is further complicated if we want instrumentation to be inserted
>    *before* a TB is translated.

> Some thoughts on the latter problem: if we want a tb_trans_pre callback, like
> Pin/DynamoRIO provide, instead of doing two passes (one to delimit the TB and
> call the tb_trans_pre callback, to then generate the translated TB), we could:
>   - have a tb_trans_pre callback. This callback inserts an exec-time callback
>     with a user-defined pointer (let's call it **tb_info). The callback has
>     no arguments, perhaps just the pc.
>   - have a tb_trans_post callback. This one passes a copy of the guest
>     instructions. The instrumenter then can allocate whatever data structure
>     to represent the TB (*tb_info), and copies this pointer to **tb_info, so
>     that at execution time, we can obtain tb_info _before_ the TB is executed.
>     After the callback returns, the copy of the guest instructions can be
>     freed.
>   This has two disadvantages:
>   - We have an extra dereference to find tb_info
>   - If it turns out that the TB should not be instrumented, we have generated
>     a callback for nothing.

That's precisely one of the reasons why I proposed adding instrumentation points
before and after events happen (e.g., instrument right after translating an
instruction, where you know its size).

What you propose is actually a broader issue: how to allow instrumenters to pass
their own data to execution-time functions "after the fact". For this, I
implemented "promises", a kind of generalization of what gen_icount() does (you
pass the execution-time callback a value that is only computed later during
translation).
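
As a rough illustration of the idea (not the implementation in this series),
the sketch below models a promise as a slot that is handed to the
execution-time callback when the call is generated at the start of a TB, and
filled in only once translation of that TB has finished. All names (promise,
toy_tb, toy_translate, ...) are hypothetical, and the instruction count is
used as the late value only because it is the obvious example of something
unknown until the end of translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a "promise"; not the API from this series. */
typedef struct {
    uint64_t value;
    bool resolved;
} promise;

typedef void (*tb_exec_cb)(uint64_t arg, void *opaque);

typedef struct {
    tb_exec_cb cb;        /* call generated at the start of the TB */
    const promise *arg;   /* argument, filled in later during translation */
    void *opaque;
    unsigned num_insns;
} toy_tb;

/* Translate a block of `guest_insns` instructions. */
static void toy_translate(toy_tb *tb, promise *p, tb_exec_cb cb, void *opaque,
                          unsigned guest_insns)
{
    /* 1. Start of TB: generate the call while its argument is unknown. */
    p->resolved = false;
    tb->cb = cb;
    tb->arg = p;
    tb->opaque = opaque;

    /* 2. Translate the instructions; the TB length is known only after. */
    tb->num_insns = 0;
    for (unsigned i = 0; i < guest_insns; i++) {
        tb->num_insns++;
    }

    /* 3. End of TB: resolve the promise with the now-known value. */
    p->value = tb->num_insns;
    p->resolved = true;
}

/* Execute the TB: the callback runs first and sees the resolved value. */
static void toy_execute(const toy_tb *tb)
{
    assert(tb->arg->resolved);
    tb->cb(tb->arg->value, tb->opaque);
}

/* Example user callback: accumulate executed guest instructions. */
static void count_insns(uint64_t n, void *opaque)
{
    *(uint64_t *)opaque += n;
}
```

In this toy version the promise object must outlive the TB, because the value
is read through a pointer at execution time; a real translator could instead
patch the value directly into the generated code.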


Cheers,
  Lluis


