qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disa


From: Alex Bennée
Subject: Re: [PATCH v2 20/21] accel/tcg: allow plugin instrumentation to be disable via cflags
Date: Fri, 12 Feb 2021 11:22:42 +0000
User-agent: mu4e 1.5.8; emacs 28.0.50

Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Feb 10 22:10, Alex Bennée wrote:
>> When icount is enabled and we recompile an MMIO access we end up
>> double counting the instruction execution. To avoid this we introduce
>> the CF_NOINSTR cflag which disables instrumentation for the next TB.
>> As this is part of the hashed compile flags we will only execute the
>> generated TB while coming out of a cpu_io_recompile.
>
> Unfortunately this patch works a little too well!
>
> With this change, the memory access callbacks registered via
> `qemu_plugin_register_vcpu_mem_cb()` are never called for the
> re-translated instruction making the IO access, since we've disabled all
> instrumentation.

Hmm well we correctly don't instrument stores (as we have already
executed the plugin for them) - but of course the load instrumentation
is after the fact so we are now missing them.


> Is it possible to selectively disable only instruction callbacks using
> this mechanism, while still allowing others that would not yet have been
> called for the re-translated instruction?

Hmmm let me see if I can finesse the CF_NOINSTR logic to allow
plugin_gen_insn_end() without the rest? It probably needs a better name
for the flag as well. 

>
> -Aaron
>
>> While we are at it delete the old TODO. We might as well keep the
>> translation handy as it's likely you will repeatedly hit it on each
>> MMIO access.
>> 
>> Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Message-Id: <20210209182749.31323-12-alex.bennee@linaro.org>
>> 
>> ---
>> v2
>>   - squashed CH_HASHMASK to ~CF_INVALID
>> ---
>>  include/exec/exec-all.h   |  6 +++---
>>  accel/tcg/translate-all.c | 17 ++++++++---------
>>  accel/tcg/translator.c    |  2 +-
>>  3 files changed, 12 insertions(+), 13 deletions(-)
>> 
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index e08179de34..299282cc59 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -454,14 +454,14 @@ struct TranslationBlock {
>>      uint32_t cflags;    /* compile flags */
>>  #define CF_COUNT_MASK  0x00007fff
>>  #define CF_LAST_IO     0x00008000 /* Last insn may be an IO access.  */
>> +#define CF_NOINSTR     0x00010000 /* Disable instrumentation of TB */
>>  #define CF_USE_ICOUNT  0x00020000
>>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
>>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
>>  #define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
>>  #define CF_CLUSTER_SHIFT 24
>> -/* cflags' mask for hashing/comparison */
>> -#define CF_HASH_MASK   \
>> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | 
>> CF_CLUSTER_MASK)
>> +/* cflags' mask for hashing/comparison, basically ignore CF_INVALID */
>> +#define CF_HASH_MASK   (~CF_INVALID)
>>  
>>      /* Per-vCPU dynamic tracing state used to generate this TB */
>>      uint32_t trace_vcpu_dstate;
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index 0666f9ef14..32a3d8fe24 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -2399,7 +2399,8 @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t 
>> retaddr)
>>  }
>>  
>>  #ifndef CONFIG_USER_ONLY
>> -/* in deterministic execution mode, instructions doing device I/Os
>> +/*
>> + * In deterministic execution mode, instructions doing device I/Os
>>   * must be at the end of the TB.
>>   *
>>   * Called by softmmu_template.h, with iothread mutex not held.
>> @@ -2430,19 +2431,17 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t 
>> retaddr)
>>          n = 2;
>>      }
>>  
>> -    /* Generate a new TB executing the I/O insn.  */
>> -    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
>> +    /*
>> +     * Exit the loop and potentially generate a new TB executing the
>> +     * just the I/O insns. We also disable instrumentation so we don't
>> +     * double count the instruction.
>> +     */
>> +    cpu->cflags_next_tb = curr_cflags() | CF_NOINSTR | CF_LAST_IO | n;
>>  
>>      qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
>>                             "cpu_io_recompile: rewound execution of TB to "
>>                             TARGET_FMT_lx "\n", tb->pc);
>>  
>> -    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
>> -     * the first in the TB) then we end up generating a whole new TB and
>> -     *  repeating the fault, which is horribly inefficient.
>> -     *  Better would be to execute just this insn uncached, or generate a
>> -     *  second new TB.
>> -     */
>>      cpu_loop_exit_noexc(cpu);
>>  }
>>  
>> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> index a49a794065..14d1ea795d 100644
>> --- a/accel/tcg/translator.c
>> +++ b/accel/tcg/translator.c
>> @@ -58,7 +58,7 @@ void translator_loop(const TranslatorOps *ops, 
>> DisasContextBase *db,
>>      ops->tb_start(db, cpu);
>>      tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>>  
>> -    plugin_enabled = plugin_gen_tb_start(cpu, tb);
>> +    plugin_enabled = !(tb_cflags(db->tb) & CF_NOINSTR) && 
>> plugin_gen_tb_start(cpu, tb);
>>  
>>      while (true) {
>>          db->num_insns++;
>> -- 
>> 2.20.1
>> 


-- 
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]