Re: Question about direct block chaining
From: Alex Bennée
Subject: Re: Question about direct block chaining
Date: Tue, 19 Apr 2022 11:24:22 +0100
User-agent: mu4e 1.7.13; emacs 28.1.50
Taylor Simpson <tsimpson@quicinc.com> writes:
>> -----Original Message-----
>> From: Richard Henderson <richard.henderson@linaro.org>
>> Sent: Monday, April 18, 2022 10:38 AM
>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> Subject: Re: Question about direct block chaining
>>
>> On 4/18/22 07:54, Taylor Simpson wrote:
>> > I implemented both approaches for inner loops and didn't see speedup
>> > in my benchmark. So, I have a couple of questions
>> > 1) What are the pros and cons of the two approaches
>> > (lookup_and_goto_ptr and goto_tb + exit_tb)?
>>
>> goto_tb can only be used within a single page (plus other restrictions, see
>> translator_use_goto_tb). In addition, as documented, the change in cpu
>> state must be constant, beginning with a direct jump.
>>
>> lookup_and_goto_ptr can handle any change in cpu state, including indirect
>> jumps.
>>
>>
>> > 2) How can I verify that direct block chaining is working properly?
>> > With -d exec, I see lines like the following with goto_tb + exit_tb
>> > but NOT lookup_and_goto_ptr
>> > Linking TBs 0x7fda44172e00 [0050ac38] index 1 -> 0x7fda44173b40 [0050ac6c]
>>
>> Well, that's one way. I would have also suggested simply looking at -d op
>> output, for the various branchy cases you're considering, to see that all
>> of the exits are as expected.
>
> Thanks!!
>
> I created a synthetic benchmark with a loop with a very small body and a very
> high number of iterations. I can see differences in execution time.
>
> Here are my observations:
> - goto_tb + exit_tb gives the fastest execution time because it will
> patch the native jump address
As we would expect.
> - lookup_and_goto_ptr is an improvement over tcg_gen_exit_tb(NULL, 0)
Yes - mainly saving the cost of the prologue and of coming out of
generated code to the main loop. However, once we get to tb_lookup and
miss in the tb_jump_cache, it's going to take some time to get a block
via the QHT.

The tb_jump_cache is pretty simple in its implementation, but I don't
know if we've ever decently characterised the hit rate and whether it
could be improved. I think we already have slightly different hashing
functions for user-mode vs softmmu.
(As an aside, I suspect the trace_vcpu_dstate check can now be removed,
which should save a bit of time on the hash function.)
--
Alex Bennée