qemu-devel

Re: [Qemu-devel] qemu vs gcc4


From: Paul Brook
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Wed, 1 Nov 2006 03:22:09 +0000
User-agent: KMail/1.9.5

On Wednesday 01 November 2006 01:51, Rob Landley wrote:
> On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote:
> > > Actually it sounds additive rather than multiplicative.  Does each
> > > target have an entirely unrelated set of ops, or is there a shared set
> > > of primitive ops plus some oddballs?
> >
> > The shared set of primitive ops is basically qops :-)
> > You probably could figure out a single common set of qops, then write
>
> assembly
>
> > and glue them together like we do with dyngen. However once you've done
> > that you've implemented most of what's needed for fully dynamic qops, so
> > it doesn't really seem worth it.
>
> I missed a curve.  What's "fully dynamic qops"?  (There's no translation
> cache?)

I mean all the qop stuff I've implemented.

> > > > It corresponds to "T0" in dyngen. In addition to the actual CPU
> > > > state, dyngen
> > > > uses 3 fixed register as scratch workspace. for qop purposes these
> > > > are part of the guest CPU state. They're only there to aid conversion
> > > > of the translation code, they'll go away eventually.
> > >
> > > Presumably the m68k target is pure qop, and hasn't got this sort of
> > > thing?
> >
> > Correct.
> > There is one use of T0 left for communicating with the TB chaining code,
> > but that's it and will probably go away eventually.
>
> Any idea where I can get a toolchain that can output a "hello world"
> program for m68k nommu?  (Or perhaps you have a statically linked "hello
> world" program for the platform lying around?)

Funnily enough I do :-)
http://www.codesourcery.com/gnu_toolchains/coldfire/

> > Theoretically possible, but not so easy in practice. Especially when you
> > get things like partial flag clobbers, and lazy flag evaluation. Doing it
> > as a target specific hack is much simpler and quicker.
>
> I think I know what partial flag clobbers are (although if you're working
> your way back, in theory you could handle it with a mask of exposed bits),
> but what's lazy flag evaluation?  (I thought that was the point of
> eliminating the unused flag setting.  Are you saying the hardware also does
> this and we have to emulate that?)

Lazy flag evaluation is where you don't bother calculating the actual flags 
when executing the flag-setting instruction. Instead you save the 
operands/result and compute the flags when you actually need them.
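A minimal C sketch of the idea, assuming a hypothetical 32-bit guest with only Z, N and C flags and only ADD and SUB as flag-setting instructions. The names (`struct lazy_cc`, `guest_add`, `guest_sub`, `compute_flags`) are illustrative, not QEMU's: the instruction just records its operands and result, and the flags are computed only when something reads them.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_Z 0x1u  /* result is zero */
#define FLAG_N 0x2u  /* result is negative (bit 31 set) */
#define FLAG_C 0x4u  /* carry out of ADD, borrow from SUB */

enum cc_op { CC_OP_ADD, CC_OP_SUB };

/* Instead of a flags register, remember the last flag-setting operation. */
struct lazy_cc {
    enum cc_op op;
    uint32_t src1, src2, dst;
};

/* Executing the instruction only records operands and result. */
static uint32_t guest_add(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    cc->op = CC_OP_ADD;
    cc->src1 = a;
    cc->src2 = b;
    cc->dst = r;
    return r;
}

static uint32_t guest_sub(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    cc->op = CC_OP_SUB;
    cc->src1 = a;
    cc->src2 = b;
    cc->dst = r;
    return r;
}

/* Called only when something actually reads the flags (e.g. a branch). */
static uint32_t compute_flags(const struct lazy_cc *cc)
{
    uint32_t f = 0;
    if (cc->dst == 0)
        f |= FLAG_Z;
    if (cc->dst & 0x80000000u)
        f |= FLAG_N;
    switch (cc->op) {
    case CC_OP_ADD:
        if (cc->dst < cc->src1)   /* result wrapped: unsigned carry out */
            f |= FLAG_C;
        break;
    case CC_OP_SUB:
        if (cc->src1 < cc->src2)  /* borrow */
            f |= FLAG_C;
        break;
    default:
        assert(0 && "unknown cc_op");
    }
    return f;
}
```

If a block does several ADDs and SUBs and only then branches on Z, only the last record is ever turned into flags; the earlier instructions cost a few stores each and no flag arithmetic.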

> > > > There are three fairly independent stages:
> > > > 1) target-*/translate.c converts guest code into qops.
> > > > 2) translate-all.c messes about with those qops a bit (allocates host
> > > > registers, etc).
> > > > 3) translate-op.c, translate-qop.c and target-*/ turn those qops into
> > > > host code.
> > >
> > > Is pass 2 where the flag elimination pass goes (and presumably any
> > > other optimizations that might get added)?  No, that can't be the case
> > > or the m68k code wouldn't need its own implementation of the flag
> > > elimination pass...
> >
> > Flag elimination is at the end of step 1.
>
> Because it's platform specific?

Yes.
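A rough sketch of such a pass, run over the op list at the end of stage 1. It assumes flag computation is emitted as separate ops whose only effect is writing flags; the struct layout and `eliminate_dead_flags` are hypothetical, not the m68k target's actual code. It uses per-bit masks, as suggested above, so a partial clobber simply narrows an op to the bits that are still live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define F_Z 0x1u
#define F_N 0x2u
#define F_C 0x4u
#define F_V 0x8u
#define F_ALL (F_Z | F_N | F_C | F_V)

/* Hypothetical op record produced by stage 1 (guest code -> ops). */
struct op {
    const char *name;
    uint8_t uses;   /* flag bits this op reads */
    uint8_t defs;   /* flag bits this op writes (0 = not a flag op) */
    int dead;       /* set by the pass when the op can be dropped */
};

/* Backward liveness walk over one translation block. All flags are treated
   as live at block exit, since the next block may read them. A flag op that
   writes only dead bits is marked dead; one that writes a mix of live and
   dead bits (a partial clobber) is narrowed to the live bits. */
static void eliminate_dead_flags(struct op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    uint8_t live = F_ALL;
    for (size_t i = n; i-- > 0; ) {
        struct op *o = &ops[i];
        if (o->defs) {
            o->defs &= live;          /* drop bits nobody will read */
            if (o->defs == 0) {
                o->dead = 1;          /* writes nothing useful: remove */
                continue;
            }
        }
        /* bits written here are dead above this point, bits read are live */
        live = (uint8_t)((live & ~o->defs) | o->uses);
    }
}
```

For example, given the ops `add_flags` (writes all bits), `sub_flags` (writes all bits), `beq` (reads Z) and `and_flags` (writes Z and N), the pass drops `add_flags` entirely and narrows `sub_flags` to Z, C and V: Z feeds the branch, and C and V are still live at the block exit because `and_flags` does not overwrite them.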

> > > > qops and dyngen ops are both small "functions" that are represented
> > > > in a similar way. The difference is that dyngen ops are target
> > > > specific fixed functions, whereas qops are generic parameterized
> > > > functions.
> > >
> > > So the 11x11 exponential complexity of qemu producing its own assembly
> > > output might not be as much of a problem after switching to qops?
> >
> > Right. The exponential complexity is if you write the assembly by hand
> > instead of using gcc to generate it.
>
> The exponential complexity is if you have to write different code for each
> combination of host and target.  If every target disassembles to the same
> set of target QOPs, then you could have a hand-written assembly version of
> each QOP for each host platform, and still have N rather than N^2 of them.

Right, but by the time you've got everything to use the same set of ops you 
may as well teach qemu how to generate code instead of using potted 
fragments.
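Put as arithmetic, taking the thread's 11x11 figure as roughly 11 guest targets (T) and 11 hosts (H), which is an assumption for illustration: per-pair code grows with the product, while a shared op set needs one front end per target plus one back end per host.

```latex
\underbrace{T \cdot H}_{\text{one per (target, host) pair}} = 11 \cdot 11 = 121
\qquad\text{vs.}\qquad
\underbrace{T + H}_{\text{shared op set}} = 11 + 11 = 22
```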

Using hand-written assembly fragments probably doesn't make qemu any faster, 
it just removes the gcc dependency. Using qops also allows qemu to generate 
better (faster) translated code.
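To make the dyngen-op vs qop distinction concrete, here is a small C sketch. Everything in it is hypothetical (not QEMU's actual types or op names). T0/T1 are modelled as extra slots in the guest state, as described earlier for qop purposes, and the qop is run by a tiny interpreter instead of being turned into host code, purely to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest state: 16 guest registers plus dyngen-style scratch
   registers T0/T1, treated here as ordinary state slots. */
enum { SLOT_T0 = 16, SLOT_T1 = 17, NUM_SLOTS = 18 };

struct cpu_state {
    uint32_t slot[NUM_SLOTS];
};

/* dyngen style: fixed functions, one per operation *and* operand choice.
   A real target needs a separate function for every such combination. */
static void op_movl_T0_r3(struct cpu_state *env) { env->slot[SLOT_T0] = env->slot[3]; }
static void op_movl_T1_r4(struct cpu_state *env) { env->slot[SLOT_T1] = env->slot[4]; }
static void op_addl_T0_T1(struct cpu_state *env) { env->slot[SLOT_T0] += env->slot[SLOT_T1]; }
static void op_movl_r3_T0(struct cpu_state *env) { env->slot[3] = env->slot[SLOT_T0]; }

/* qop style: one generic operation, parameterized by which slots it uses. */
enum qop_kind { QOP_ADD, QOP_SUB };

struct qop {
    enum qop_kind kind;
    int dst, src1, src2; /* indices into cpu_state.slot */
};

static void qop_exec(struct cpu_state *env, const struct qop *q)
{
    assert(q->dst >= 0 && q->dst < NUM_SLOTS);
    assert(q->src1 >= 0 && q->src1 < NUM_SLOTS);
    assert(q->src2 >= 0 && q->src2 < NUM_SLOTS);
    switch (q->kind) {
    case QOP_ADD: env->slot[q->dst] = env->slot[q->src1] + env->slot[q->src2]; break;
    case QOP_SUB: env->slot[q->dst] = env->slot[q->src1] - env->slot[q->src2]; break;
    }
}
```

Here `r3 += r4` takes four fixed dyngen ops, with T0/T1 shuttling the values, while a single `{ QOP_ADD, 3, 3, 4 }` does the same thing directly on the guest registers. That hints at why a generic, parameterized op set leaves more room for generating better code.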

Paul
