qemu-devel
Re: [Qemu-devel] RFC: Code fetch optimisation


From: J. Mayer
Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
Date: Mon, 15 Oct 2007 14:09:00 +0200

On Mon, 2007-10-15 at 03:30 +0100, Paul Brook wrote:
> On Sunday 14 October 2007, J. Mayer wrote:
> > Here's an updated version of the code fetch optimisation patch against
> > current CVS.
> > As a reminder, this patch avoids the use of softmmu helpers to fetch the
> > code in most cases. A new target define, TARGET_HAS_VLE_INSNS, has been
> > added to handle the case of an instruction that spans 2 pages when the
> > target CPU uses a variable-length instruction encoding.
> > For pure RISC, the code fetch is done using raw access routines.
> 
> > +    unsigned long phys_pc;
> > +    unsigned long phys_pc_start;
> 
> These are ram offsets, not physical addresses. I recommend naming them as
> such to avoid confusion.

Well, those are host addresses. Fabrice even suggested that I replace
them with void * to prevent confusion, but I kept unsigned long because
the _p functions API does not use pointers. As those values are defined
as phys_ram_base + offset, they should be host addresses, not RAM
offsets, and they are used directly to dereference host pointers in the
ldxxx_p functions. Did I miss something?
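
To make the distinction concrete, here is a minimal sketch of a RAM
offset versus a host address, assuming a flat guest RAM block. All the
sketch_* names are hypothetical stand-ins for illustration, not the
actual QEMU definitions; sketch_ram plays the role of phys_ram_base and
sketch_ldl_raw mimics the style of the ldxxx_p raw loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only, not QEMU code. */
#define SKETCH_RAM_SIZE 4096u

/* Plays the role of phys_ram_base: the host buffer backing guest RAM. */
static uint8_t sketch_ram[SKETCH_RAM_SIZE];

/* A RAM offset is only an index into the guest RAM block. */
typedef uint32_t sketch_ram_offset_t;

/*
 * Turn a RAM offset into a host address (base + offset), kept as an
 * unsigned long as in the patch. This assumes unsigned long is wide
 * enough to hold a host pointer, which is not true on every ABI.
 */
static unsigned long sketch_host_addr(sketch_ram_offset_t offset)
{
    assert(offset < SKETCH_RAM_SIZE);
    return (unsigned long)(uintptr_t)(sketch_ram + offset);
}

/*
 * Raw 32-bit load: the argument is used directly as a host pointer,
 * with no MMU or TLB lookup. memcpy avoids unaligned-access problems.
 */
static uint32_t sketch_ldl_raw(unsigned long host_addr)
{
    uint32_t value;
    memcpy(&value, (const void *)(uintptr_t)host_addr, sizeof(value));
    return value;
}
```

With this layout, passing a bare offset to sketch_ldl_raw would
dereference a meaningless host pointer, which is why the name of the
variable matters.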

> > +    opc = glue(glue(lds,SUFFIX),MEMSUFFIX)(virt_pc);
> > +    /* Avoid softmmu access on next load */
> > +    /* XXX: dont: phys PC is not correct anymore
> > +     *      We could call get_phys_addr_code(env, pc); and remove the else
> > +     *      condition, here. 
> > +     */
> > +    //*start_pc = phys_pc;
> 
> The commented out code is completely bogus, please remove it. The comment is 
> also somewhat misleading/incorrect. The else would still be required for 
> accesses that span a page boundary.

I guess trying to optimize this case by retrieving the physical address
would not bring any real gain, as in fact only the last translated
instruction of a TB (and thus only a few code loads) can hit this case.
I'd like to keep a comment here to show that it may not be a good idea
(or may not be as simple as it seems at first sight) to try to optimise
further here, but you're right that this comment is not correct.
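
For reference, the page-boundary case discussed above can be sketched
as follows. This is only an illustration under assumed values: the
SKETCH_* page constants and the sketch_* functions are hypothetical and
do not come from the patch. It shows why a slow path remains necessary
for a fetch that crosses into the next page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4 KiB page geometry, for illustration only. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_SIZE (1u << SKETCH_PAGE_BITS)
#define SKETCH_PAGE_MASK (~(uint32_t)(SKETCH_PAGE_SIZE - 1))

typedef enum {
    SKETCH_FETCH_RAW,     /* direct host access, whole fetch in one page */
    SKETCH_FETCH_SOFTMMU  /* slow path, fetch crosses a page boundary */
} sketch_fetch_path;

/* True when the first and last byte of the fetch lie in different pages. */
static bool sketch_spans_page(uint32_t pc, uint32_t size)
{
    return size > 0 && ((pc ^ (pc + size - 1)) & SKETCH_PAGE_MASK) != 0;
}

/* Pick the fetch path for an instruction of `size` bytes at `pc`. */
static sketch_fetch_path sketch_pick_path(uint32_t pc, uint32_t size)
{
    assert(size > 0 && size <= SKETCH_PAGE_SIZE);
    return sketch_spans_page(pc, size) ? SKETCH_FETCH_SOFTMMU
                                       : SKETCH_FETCH_RAW;
}
```

For a fixed-length, naturally aligned instruction set the slow path is
never selected, which matches the "pure RISC" case using raw accesses
only.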

> The code itself looks ok, though I'd be surprised if it made a significant 
> difference. We're always going to hit the fast-path TLB lookup case anyway.

It seems that the generated code for the code fetch is much more
efficient than the code we get when using the softmmu routines. But
it's true we do not get any significant performance boost. As was
previously mentioned, the idea of the patch is more 'don't do unneeded
things during code translation' than a great performance improvement.

-- 
J. Mayer <address@hidden>
Never organized




