From: Daniel Henrique Barboza
Subject: Re: [PATCH 12/19] target/ppc/pmu_book3s_helper.c: enable PMC1 counter negative EBB
Date: Thu, 12 Aug 2021 18:24:03 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0



On 8/12/21 7:17 AM, Daniel Henrique Barboza wrote:


On 8/12/21 1:56 AM, Richard Henderson wrote:
On 8/11/21 6:45 PM, Richard Henderson wrote:
On 8/11/21 5:39 PM, David Gibson wrote:
I mean, nothing is stopping us from calculating cycles using time, but in the
end we would do the same thing we're already doing today.

Oh.. ok.  I had assumed that icount worked by instrumenting the
generated TCG code to actually count instructions, rather than working
off the time.

David, you're right, icount instruments the generated tcg code.
You also have to add -icount to the command-line.
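(For reference, enabling it looks roughly like '-accel tcg,thread=single
-icount shift=auto'; the shift value there is just illustrative.)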

Oh, and btw, icount disables multi-threaded tcg, so you're going to be running 
that guest in round-robin mode.

Icount affects so many aspects of qemu that I really do not think it is the 
best option for a PMU.

Using icount in the PMU is my fallback plan. I spent some time trying to
count instructions using translationOps but never got it right. I got to
the point where the tests were counting instructions the first time they
ran in the guest, but nothing was counted in consecutive runs.

I was able to figure out that it had to do with how the translation block
cache works. If a previously run TB was found via lookup, then the
translationOps callbacks I was using weren't being called. I know I was
missing a piece of info there, but since I'm trying to deal with other
aspects of the PMU logic I fell back to using icount to get things off
the ground.


So, I made an attempt to remove all icount() references from the PMU code
and instead did the following (posting just the relevant diff):


+
+void helper_insns_inc(CPUPPCState *env)
+{
+    env->pmu_insns_count++;
+}
+
+void helper_insns_dec(CPUPPCState *env)
+{
+    env->pmu_insns_count--;
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 60f35360b7..c56c656694 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -8689,6 +8689,7 @@ static void ppc_tr_tb_start(DisasContextBase *db, CPUState *cs)
static void ppc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
 {
+    gen_helper_insns_inc(cpu_env);
     tcg_gen_insn_start(dcbase->pc_next);
 }
@@ -8755,6 +8756,8 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
         return;
     }
+    gen_helper_insns_dec(cpu_env);
+
     /* Honor single stepping. */
     sse = ctx->singlestep_enabled & (CPU_SINGLE_STEP | GDBSTUB_SINGLE_STEP);
     if (unlikely(sse)) {


And then I used 'env->pmu_insns_count' to update the instruction counters.
And it's working, with slightly better performance than we have with the
icount() version. I'm not sure why it's working now since I tried something
very similar before and it didn't. Figures.

It's still overshooting the instructions a bit, but then there's the
optimization you mentioned previously with the tb lookup logic that could
be done to improve that as well.
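Just to make the idea concrete, the counter could be folded into an
instruction-counting PMC along these lines (a minimal sketch; the helper
name and the overflow handling are assumptions, not the actual code from
this series):

/*
 * Hypothetical helper, not from the series: fold the accumulated
 * instruction count into a PMC and check for the counter negative
 * condition.  PMCs are 32-bit, so bit 31 set means "counter negative".
 */
static void pmu_fold_insns_into_pmc(CPUPPCState *env, int sprn)
{
    env->spr[sprn] = (uint32_t)(env->spr[sprn] + env->pmu_insns_count);
    env->pmu_insns_count = 0;

    if (env->spr[sprn] & 0x80000000) {
        /* counter negative: raise the performance monitor alert / EBB */
    }
}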


I'm not sure if this is in line with what you proposed or close to it, but
it's already better than the icount version that I posted here.


Thanks,


Daniel



All this said ....


If you want to count instructions retired, then just do that.  Stuff values 
into tcg_gen_insn_start so they're available for exception unwind, and 
otherwise decrement the counter at the end of a TB.
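
(Roughly, that suggestion could look like the sketch below. This assumes
TARGET_INSN_START_EXTRA_WORDS is bumped so insn_start can carry a second
word, and that helper_insns_inc() is changed to take the count as an
argument; it's only an illustration of the idea, not tested code:)

static void ppc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
{
    /*
     * Record the instruction's index within the TB alongside the PC, so
     * restore_state_to_opc() can recover an exact count when an exception
     * unwinds in the middle of the TB.
     */
    tcg_gen_insn_start(dcbase->pc_next, dcbase->num_insns);
}

static void ppc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
{
    /* ... */
    /* One helper call per TB instead of one per instruction. */
    gen_helper_insns_inc(cpu_env, tcg_constant_i32(dcbase->num_insns));
    /* ... */
}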

... I haven't given this a try yet. Can you clarify what you mean by
"exception unwind"?

I tried something similar with tcg_gen_insn_start() (called via
ppc_tr_insn_start()). Since this op is called inside translator_loop(), and
translator_loop() isn't run if the TB lookup returns an already-translated
TB, I was observing the behavior I described above of a test counting
instructions only in its first run.
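
(Heavily simplified, the flow in accel/tcg/cpu-exec.c that causes this is
something like the sketch below; arguments and locking are omitted.)

    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
    if (tb == NULL) {
        /* Only on a miss does tb_gen_code() run, which is what ends up
         * calling translator_loop() and the per-insn callbacks. */
        tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
    }
    /* A cached TB is executed directly, without re-translating. */
    cpu_loop_exec_tb(cpu, tb, ...);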



If you really must interrupt exactly at 0 (and not simply at some point past 
underflow), then we can adjust the tb lookup logic to let you reduce 
tb->cflags.CF_COUNT_MASK in cpu_get_tb_cpu_state.

That would be good, but depending on the amount of work I would consider
doing this as a follow-up to this work. It's ok if the PMU overshoots the
instruction count a bit for our current purposes ATM.


Thanks,


Daniel



r~


