qemu-devel
Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH v6 01/22] instrument: Add documentation
Date: Sat, 30 Sep 2017 14:09:41 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Sat, Sep 30, 2017 at 00:46:33 +0300, Lluís Vilanova wrote:
> Emilio G Cota writes:
> > I'm not sure I understand this concept of filtering. Are you saying that in
> > the first case, all memory accesses are instrumented, and then in the
> > "access helper" we only call the user's callback if it's a memory write?
> > And in the second case, we simply just generate a "write helper" instead
> > of an "access helper". Am I understanding this correctly?
> 
> In the previous case (no filtering), the user callback is always called when a
> memory access is *executed*, and the user then checks if the access mode is a
> write to decide whether to increment a counter.
> 
> In this case (with filtering), a user callback is called when a memory access
> is *translated*, and if the access mode is a write, the user generates a call
> to a second callback that is executed every time a memory access is executed
> (only that it is only generated for memory writes, the ones we care about).
> 
> Is this clearer?

I get it now, thanks!
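
To make the distinction concrete, here is a minimal sketch in plain C. It is
a toy model and not QEMU's API: every name in it (mem_mode, translate_filtered,
run, ...) is hypothetical. It translates a short list of accesses once,
executes them several times, and counts how often an exec-time callback fires
with and without filtering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is QEMU's API. */
typedef enum { MEM_READ, MEM_WRITE } mem_mode;

typedef struct {
    unsigned long writes;      /* the counter the instrumenter cares about */
    unsigned long exec_calls;  /* how many exec-time callbacks fired */
} stats;

typedef void (*exec_cb)(stats *s, mem_mode mode);

/* One translated access plus the exec-time callback attached to it, if any. */
typedef struct {
    mem_mode mode;
    exec_cb cb;
} translated_access;

/* No filtering: runs on every executed access and checks the mode itself. */
static void on_access_exec(stats *s, mem_mode mode)
{
    s->exec_calls++;
    if (mode == MEM_WRITE) {
        s->writes++;
    }
}

/* Filtering: only ever attached to writes, so no mode check is needed. */
static void on_write_exec(stats *s, mem_mode mode)
{
    assert(mode == MEM_WRITE);
    s->exec_calls++;
    s->writes++;
}

/* Translation-time hooks: called once per access, when it is translated. */
static exec_cb translate_unfiltered(mem_mode mode)
{
    (void)mode;
    return on_access_exec;
}

static exec_cb translate_filtered(mem_mode mode)
{
    return mode == MEM_WRITE ? on_write_exec : NULL;
}

/* Translate the accesses once, then execute them `iterations` times. */
static stats run(const mem_mode *guest, size_t n, int filtered, int iterations)
{
    translated_access tb[16];
    stats s = {0, 0};
    size_t i;
    int it;

    assert(n <= 16);
    for (i = 0; i < n; i++) {
        tb[i].mode = guest[i];
        tb[i].cb = filtered ? translate_filtered(guest[i])
                            : translate_unfiltered(guest[i]);
    }
    for (it = 0; it < iterations; it++) {
        for (i = 0; i < n; i++) {
            if (tb[i].cb) {
                tb[i].cb(&s, tb[i].mode);
            }
        }
    }
    return s;
}
```

With one write among four accesses, both variants report the same number of
writes, but the filtered variant only pays for an exec-time call on the write.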

> > FWIW my experiments so far show similar numbers for instrumenting each
> > instruction (haven't done the per-tb yet). The difference is that I'm
> > exposing to instrumenters a copy of the guest instructions (const void *data,
> > size_t size). These copies are kept around until TB's are flushed.
> > Luckily there seems to be very little overhead in keeping these around,
> > apart from the memory overhead -- but in terms of performance, the
> > necessary allocations do not induce significant overhead.
> 
> To keep this use-case simpler, I added the memory access API I posted in this
> series, where instrumenters can read guest memory (more general than passing a
> copy of the current instruction).

I see some potential problems with this:
1. Instrumenters' accesses could generate exceptions. I presume we'd want to
   avoid this, or leave it as a debug-only kind of option.
2. Instrumenters won't know where the end of an instruction (for variable-length
   ISAs) or of a TB is (TB != basic block). For instructions one could have a
   loop where we read byte-by-byte and pass it to the decoder, something similar
   to what we have in the capstone code recently posted to the list (v4). For
   TBs, we really should have a way to delimit the length of the TB. This is
   further complicated if we want instrumentation to be inserted *before* a TB
   is translated.
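
A rough sketch of the byte-by-byte loop from point 2, under loud assumptions:
guest_mem and guest_read_u8 stand in for whatever read API instrumenters would
get, toy_decode is a made-up decoder (low two bits of the first byte encode
length - 1) and not capstone, and MAX_INSN_LEN is an assumed bound. A failed
read is reported as an error return here, which sidesteps rather than answers
the exception question from point 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_LEN 15  /* assumed upper bound on instruction length */

/* Hypothetical guest memory; stands in for an instrumenter-facing read API. */
typedef struct {
    const uint8_t *bytes;
    size_t size;
} guest_mem;

/* Read one byte; returns 0 on success, -1 if the address is not readable. */
static int guest_read_u8(const guest_mem *m, size_t addr, uint8_t *out)
{
    if (addr >= m->size) {
        return -1;
    }
    *out = m->bytes[addr];
    return 0;
}

/*
 * Toy decoder: the low two bits of the first byte encode (length - 1).
 * Returns the instruction length once buf holds a complete instruction,
 * or 0 if more bytes are needed.
 */
static size_t toy_decode(const uint8_t *buf, size_t len)
{
    size_t need;

    assert(buf != NULL);
    if (len == 0) {
        return 0;
    }
    need = (size_t)(buf[0] & 3) + 1;
    return len >= need ? need : 0;
}

/*
 * Feed the decoder one byte at a time until it recognizes an instruction.
 * Returns the length, or 0 if a read fails or MAX_INSN_LEN is exceeded.
 */
static size_t insn_length(const guest_mem *m, size_t pc)
{
    uint8_t buf[MAX_INSN_LEN];
    size_t len = 0;
    size_t n;

    while (len < MAX_INSN_LEN) {
        if (guest_read_u8(m, pc + len, &buf[len]) != 0) {
            return 0;
        }
        len++;
        n = toy_decode(buf, len);
        if (n != 0) {
            return n;
        }
    }
    return 0;
}
```

Note that this only finds the end of one instruction; it says nothing about
where the TB ends, which is the harder half of the problem.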

Some thoughts on the latter problem: if we want a tb_trans_pre callback, like
Pin/DynamoRIO provide, then instead of doing two passes (one to delimit the TB
and call the tb_trans_pre callback, and another to generate the translated TB),
we could:
  - have a tb_trans_pre callback. This callback inserts an exec-time callback
    with a user-defined pointer (let's call it **tb_info). The callback has
    no arguments, or perhaps just the pc.
  - have a tb_trans_post callback. This one passes a copy of the guest
    instructions. The instrumenter can then allocate whatever data structure
    it needs to represent the TB (*tb_info), and copy this pointer to
    **tb_info, so that at execution time, we can obtain tb_info _before_ the
    TB is executed. After the callback returns, the copy of the guest
    instructions can be freed.
  This has two disadvantages:
  - We have an extra dereference to find tb_info
  - If it turns out that the TB should not be instrumented, we have generated
    a callback for nothing.
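
A minimal sketch of the **tb_info idea, assuming hypothetical signatures for
all three hooks (tb_trans_pre, tb_trans_post, tb_exec); nothing here is an
existing QEMU interface, and the contents of struct tb_info are made up. The
pre hook allocates an empty slot whose address the exec-time callback keeps,
and the post hook fills the slot in once the guest instructions are known.

```c
#include <assert.h>
#include <stdlib.h>

/* Whatever the instrumenter wants to remember about a TB (made up here). */
typedef struct {
    size_t size;          /* bytes of guest code in the TB */
    unsigned long execs;  /* number of times the TB was executed */
} tb_info;

/*
 * Pre-translation hook: nothing is known about the TB yet, so only an empty
 * slot is allocated. The exec-time callback is generated with this address.
 */
static tb_info **tb_trans_pre(void)
{
    return calloc(1, sizeof(tb_info *));
}

/*
 * Post-translation hook: receives a copy of the guest instructions, builds
 * the per-TB data and publishes it through the slot. The caller may free
 * `data` once this returns.
 */
static int tb_trans_post(tb_info **slot, const void *data, size_t size)
{
    tb_info *info;

    assert(slot != NULL && *slot == NULL);
    (void)data;  /* a real instrumenter would decode the instructions here */
    info = calloc(1, sizeof(*info));
    if (info == NULL) {
        return -1;
    }
    info->size = size;
    *slot = info;
    return 0;
}

/* Exec-time callback, run before the TB executes. */
static void tb_exec(tb_info **slot)
{
    tb_info *info = *slot;  /* the extra dereference */

    if (info != NULL) {     /* NULL: TB was not instrumented after all */
        info->execs++;
    }
}

/* Release the slot and the data it points to, e.g. when TBs are flushed. */
static void tb_free(tb_info **slot)
{
    if (slot != NULL) {
        free(*slot);
        free(slot);
    }
}
```

Both disadvantages show up in tb_exec: the load of *slot is the extra
dereference, and the NULL check is the callback that was generated for nothing
when tb_trans_post decides not to instrument the TB.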

                Emilio



