Niek Linnenbank <address@hidden> writes:
> Hi Alex,
> On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <address@hidden> wrote:
>> While 32MiB is certainly usable, a full system boot ends up flushing the
>> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
>> to take advantage of all that spare memory. After this change I can
>> boot my test system without any TB flushes.
> That's great. With this change I'm seeing a performance improvement when
> running the avocado tests for cubieboard:
> it runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit.
> I don't know much about the internals of TCG nor how it actually uses the
> cache, but it seems logical to me that increasing the cache size would improve
> performance. What I'm wondering is: will this also result in TCG translating
> larger chunks in one shot, so potentially
> taking more time to do the translation? If so, could it perhaps affect more
> latency sensitive code?
No - the size of the translation blocks is governed by the guest code
and where it ends a basic block. In system mode we also care about
crossing guest page boundaries.
>> Signed-off-by: Alex Bennée <address@hidden>
> Tested-by: Niek Linnenbank <address@hidden>
>> accel/tcg/translate-all.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index 4ce5d1b3931..f7baa512059 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
>>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>> +#if TCG_TARGET_REG_BITS == 32
>>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>> +#else
>> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
>> +#endif
> The QEMU process now takes up more virtual memory, about ~2.5GiB in my
> test, which is to be expected with this change.
> Is it very likely that the TCG cache will be filled quickly and completely?
> I'm asking because I also use QEMU to do automated testing,
> where the nodes are 64-bit but each has only 2GiB of physical RAM.
Well, this is the interesting question and, as ever, it depends.
For system emulation the buffer will just slowly fill up over time until
it is exhausted, at which point it will be flushed and reset. Each time
the guest needs to flush a page and load fresh code in, we will generate
more translated code. If the guest isn't under load and never uses all
its RAM for code, then in theory the pages of the mmap that are never
filled never need to be actualised by the host kernel.
You can view the behaviour by running "info jit" from the HMP monitor in
your tests. The "TB Flush" value shows the number of times this has
happened along with other information about translation state.
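The fill-until-exhausted-then-reset behaviour described above amounts to a bump allocator with a flush counter. A minimal sketch in C, with made-up names (`codegen_buf`, `buf_alloc`) that are not QEMU's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the codegen buffer: translated code is
 * appended until the buffer is exhausted, then the whole buffer is
 * discarded and reused from the start. */
struct codegen_buf {
    size_t size;               /* total buffer size */
    size_t used;               /* bytes of translated code so far */
    unsigned long flush_count; /* what "info jit" reports as TB Flush */
};

/* Returns the offset at which n bytes of translated code are placed,
 * flushing the whole buffer first if they would not fit. */
static size_t buf_alloc(struct codegen_buf *b, size_t n)
{
    if (b->used + n > b->size) {
        /* Exhausted: throw away every translation and start over. */
        b->used = 0;
        b->flush_count++;
    }
    size_t off = b->used;
    b->used += n;
    return off;
}
```

A bigger buffer therefore doesn't change the steady-state behaviour, it just makes the flush (and the retranslation that follows it) rarer.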
Thanks for clarifying this, now it all starts to make more sense to me.
>> #define DEFAULT_CODE_GEN_BUFFER_SIZE \
>> (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
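The (truncated) macro quoted above clamps the per-host default against the TCG backend's maximum. As a sketch of that selection logic, expressed as a function rather than a macro; the function name is illustrative, not QEMU's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB ((size_t)1024 * 1024)
#define GiB ((size_t)1024 * MiB)

/* Sketch of the clamp DEFAULT_CODE_GEN_BUFFER_SIZE performs: use the
 * per-host default (32 MiB on 32-bit hosts, 2 GiB on 64-bit hosts
 * after this patch) only if it does not exceed the backend maximum. */
static size_t default_code_gen_buffer_size(size_t default_size, size_t max_size)
{
    return default_size < max_size ? default_size : max_size;
}
```

On a 64-bit host with `MAX_CODE_GEN_BUFFER_SIZE` of `(size_t)-1`, the new 2 GiB default passes through unchanged; backends with a smaller maximum would get clamped to it.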