qemu-devel

Re: [Qemu-devel] [RFC] Streamlining endian handling in TCG


From: Peter Maydell
Subject: Re: [Qemu-devel] [RFC] Streamlining endian handling in TCG
Date: Wed, 28 Aug 2013 17:38:07 +0100

On 28 August 2013 16:26, Richard Henderson <address@hidden> wrote:
> On 08/28/2013 07:34 AM, Peter Maydell wrote:
>> On 28 August 2013 15:31, Richard Henderson <address@hidden> wrote:
>>> On 08/28/2013 01:15 AM, Peter Maydell wrote:
>>>> [*] not impossible, we already do something on the ppc
>>>> that's similar; however I'd really want to take the time to
>>>> figure out how to do endianness swapping "properly"
>>>> and what qemu does currently before messing with it.
>>>
>>> I've got a loose plan in my head for how to clean up handling of
>>> reverse-endian load/store instructions at both the translator and
>>> tcg backend levels.
>>
>> Nice. Will it allow us to get rid of TARGET_WORDS_BIGENDIAN?
>
> I don't know, as I don't know off-hand what all that implies.
>
> Let me lay out my idea and see what you think:
>
> Currently, at the TCG level we have 8 qemu_ld* opcodes and 4 qemu_st*
> opcodes that always operate on target_ulong-sized values, and always in the
> guest-declared endianness.
>
> There are several problems I want to address:
>
> (1) I want explicit _i32 and _i64 sizes for the loads and stores.  This will
> clean up a number of places in several translators where we have to load to
> _tl and then truncate or extend to an explicit size.
>
> (2) I want explicit endianness for the loads and stores.  E.g. when a sparc
> guest does a byte-swapped store, there's little point in doing two offsetting
> bswaps to make that happen.
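A byte-level sketch of point (2), assuming a big-endian guest running on a little-endian host. Today a byte-reversed store is a bswap emitted by the translator followed by the guest-endian (BE) store, which on an LE host performs a second bswap internally; a store that names little-endian order explicitly writes the same bytes with no swap at all. The function names are hypothetical, and stores are modelled byte by byte so the sketch behaves the same on any machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable 32-bit byte swap. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Little-endian store, written byte by byte so the result does not
 * depend on the byte order of the machine running this sketch. */
static void store_le32(uint8_t *mem, uint32_t v)
{
    mem[0] = (uint8_t)v;
    mem[1] = (uint8_t)(v >> 8);
    mem[2] = (uint8_t)(v >> 16);
    mem[3] = (uint8_t)(v >> 24);
}

/* On a little-endian host a big-endian store is a bswap plus a
 * native (LE) store. */
static void store_be32(uint8_t *mem, uint32_t v)
{
    store_le32(mem, bswap32(v));
}

/* Current shape: the translator swaps for the byte-reversed
 * instruction, then the guest-endian (BE) store swaps again.
 * The two swaps cancel each other out. */
static void reversed_store_today(uint8_t *mem, uint32_t v)
{
    store_be32(mem, bswap32(v));
}

/* Proposed shape: one store that names its endianness explicitly. */
static void reversed_store_explicit(uint8_t *mem, uint32_t v)
{
    store_le32(mem, v);
}
```

Both functions leave identical bytes in memory; the second simply never emits the pair of swaps.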

I like both of these. (Do we have much code that relies on being able
to do "just load whatever the natural register width is"? In any case,
we can rewrite that without too much trouble, I think.)

> (3) For hosts that do not support byte-swapped loads and stores themselves,
> the need to allocate extra registers during the memory operation in order to
> hold the swapped results is an unnecessary burden.  Better to expose the bswap
> operation at the tcg opcode level and let normal register allocation happen.
>
> Now, naively implementing 1 and 2 would result in 32 opcodes for qemu_ld*.
> That is obviously a non-starter.  However, the very first thing that each tcg
> backend does is map the current 8 opcodes into a bitmask ("opc" and "s_bits"
> in the source).  Let us make that official, and then extend it.
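One way to picture the "opc" value mentioned above, as a sketch with hypothetical names: the low two bits hold log2 of the access size (what the backends call "s_bits") and bit 2 requests sign extension, so each sized, signed or unsigned load variant collapses to a small integer. The enum of load variants is a stand-in and does not reproduce TCG's exact opcode list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the backend-internal "opc" value:
 * bits 0-1 = log2(access size in bytes) ("s_bits"), bit 2 = sign-extend. */
enum {
    OPC_SIZE_8    = 0,
    OPC_SIZE_16   = 1,
    OPC_SIZE_32   = 2,
    OPC_SIZE_64   = 3,
    OPC_SIZE_MASK = 3,
    OPC_SIGN      = 4,
};

/* Hypothetical stand-ins for the separate qemu_ld* opcodes. */
enum ld_variant { LD8U, LD8S, LD16U, LD16S, LD32U, LD32S, LD64 };

/* The first step each backend performs: opcode -> bitmask. */
static int ld_variant_to_opc(enum ld_variant v)
{
    switch (v) {
    case LD8U:  return OPC_SIZE_8;
    case LD8S:  return OPC_SIZE_8 | OPC_SIGN;
    case LD16U: return OPC_SIZE_16;
    case LD16S: return OPC_SIZE_16 | OPC_SIGN;
    case LD32U: return OPC_SIZE_32;
    case LD32S: return OPC_SIZE_32 | OPC_SIGN;
    case LD64:  return OPC_SIZE_64;
    }
    return -1; /* not reached for valid enum values */
}

static int opc_s_bits(int opc)     { return opc & OPC_SIZE_MASK; }
static int opc_size_bytes(int opc) { return 1 << opc_s_bits(opc); }
static bool opc_is_signed(int opc) { return (opc & OPC_SIGN) != 0; }
```

Making this value the opcode's constant argument is what lets one load opcode per result width replace the whole family.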
>
> Therefore:
>
> (A) Compress qemu_ld* into two qemu_ld_{i32,i64}, with an additional constant
> argument that describes the actual load, exactly as "opc" does today.
> Adjusting the translators to match can be done in stages, or we might decide
> to leave the existing translator-level interface in place permanently.

I think I'd prefer to remove the old interface; having multiple
ways to do something is a problem we have across the codebase.

> (B) Add an additional bit to the "opc" to indicate which endianness is
> desired, e.g. 0 = LE, 8 = BE.  Expose the opc interface to the translators,
> at which point generating a load becomes more like
>
>     tcg_gen_qemu_ld_tl(dest, addr, size | sign | dc->big_endian);
>
> and the current endianness of the guest becomes a bit on the TB, to be copied
> into the DisasContext at the beginning of translation.
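A reference model of the combined constant argument from (A) and (B), assuming bits 0-1 encode log2(size), bit 2 sign extension and bit 3 endianness (0 = LE, 8 = BE, as suggested above). The MOP_* names and model_qemu_ld_i64 are hypothetical; the function only computes the value a qemu_ld_i64 with that argument would produce from the guest bytes, not how a backend would emit it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memop layout following the proposal. */
#define MOP_SIZE_8    0u
#define MOP_SIZE_16   1u
#define MOP_SIZE_32   2u
#define MOP_SIZE_64   3u
#define MOP_SIZE_MASK 3u
#define MOP_SIGN      4u
#define MOP_LE        0u
#define MOP_BE        8u

/* Value a single qemu_ld_i64 with constant argument `memop` would
 * yield for the guest bytes at `mem`: assemble `size` bytes in the
 * requested order, then zero- or sign-extend to 64 bits. */
static uint64_t model_qemu_ld_i64(const uint8_t *mem, unsigned memop)
{
    unsigned size = 1u << (memop & MOP_SIZE_MASK);
    uint64_t val = 0;

    for (unsigned i = 0; i < size; i++) {
        /* BE: first byte is most significant; LE: last byte is. */
        unsigned idx = (memop & MOP_BE) ? i : size - 1 - i;
        val = (val << 8) | mem[idx];
    }
    if ((memop & MOP_SIGN) && size < 8) {
        /* Sign-extend using only unsigned arithmetic: (v ^ m) - m. */
        uint64_t m = 1ull << (8 * size - 1);
        val = (val ^ m) - m;
    }
    return val;
}
```

Under this encoding the translator call above would pass something like `MOP_SIZE_16 | MOP_SIGN | big_endian`, where `big_endian` holds either MOP_LE or MOP_BE as copied from the TB flags.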

I guess we deal with ARMv5-style BE32 by having the target
emit an explicit XOR TCG op?
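A minimal model of the usual word-invariant BE32 scheme, in which a sub-word access flips its byte lane by XORing the address with 4 minus the access size before an otherwise ordinary load. It assumes aligned words are held in little-endian order in guest RAM, so word accesses read the same value in LE and BE32 modes. The helper names are hypothetical, and whether a real translator would pair the XOR with an LE or a BE memory op depends on how guest RAM is laid out.

```c
#include <assert.h>
#include <stdint.h>

/* Model assumption: aligned 32-bit words in guest RAM are stored
 * little-endian.  Only naturally aligned accesses are handled. */
static uint32_t ram_ld_le(const uint8_t *ram, uint32_t addr, unsigned size)
{
    uint32_t v = 0;
    for (unsigned i = size; i-- > 0; )
        v = (v << 8) | ram[addr + i];
    return v;
}

/* The explicit XOR a translator could emit before a sub-word access:
 * bytes use addr ^ 3, halfwords addr ^ 2, words are unchanged. */
static uint32_t be32_munge_addr(uint32_t addr, unsigned size)
{
    return size < 4 ? addr ^ (4u - size) : addr;
}

static uint32_t be32_ld(const uint8_t *ram, uint32_t addr, unsigned size)
{
    assert((addr & (size - 1)) == 0); /* aligned accesses only */
    return ram_ld_le(ram, be32_munge_addr(addr, size), size);
}
```

With the word 0x11223344 in RAM, a BE32 byte load from offset 0 returns 0x11 and a halfword load returns 0x1122, which is what a big-endian view of that word expects.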

> (C) Examine the endian bit in the tcg-op.h expander, and check a
> TCG_TARGET_HAS_foo flag to see if the tcg backend supports reverse endian
> memory ops.  If not, break out the bswap into the opcode stream as a 
> temporary.
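A toy version of the expander decision in (C), restricted to full 32-bit loads so the separate swap needs no extra sign or zero extension afterwards. The op stream, the op names and `backend_has_reverse_ldst` (standing in for a TCG_TARGET_HAS_foo flag) are all hypothetical; the point is only that the swap becomes its own op when the backend cannot fold it into the memory access.

```c
#include <assert.h>
#include <stdbool.h>

#define MOP_SIZE_32 2u
#define MOP_BE      8u

enum op_kind { OP_QEMU_LD_I32, OP_BSWAP32_I32 };

struct op {
    enum op_kind kind;
    unsigned memop;   /* only meaningful for OP_QEMU_LD_I32 */
};

struct op_stream {
    struct op ops[8];
    int n;
};

static void emit(struct op_stream *s, enum op_kind kind, unsigned memop)
{
    assert(s->n < (int)(sizeof s->ops / sizeof s->ops[0]));
    s->ops[s->n].kind = kind;
    s->ops[s->n].memop = memop;
    s->n++;
}

/* Expand a 32-bit guest load into the op stream. */
static void expand_qemu_ld_i32(struct op_stream *s, unsigned memop,
                               bool host_is_be, bool backend_has_reverse_ldst)
{
    bool want_be = (memop & MOP_BE) != 0;

    if (want_be == host_is_be || backend_has_reverse_ldst) {
        /* Host order, or the backend can swap inside the memory op. */
        emit(s, OP_QEMU_LD_I32, memop);
        return;
    }
    /* Otherwise load in host order and swap as a separate op, so the
     * swapped value lives in an ordinarily allocated temporary. */
    emit(s, OP_QEMU_LD_I32, (memop & ~MOP_BE) | (host_is_be ? MOP_BE : 0u));
    emit(s, OP_BSWAP32_I32, 0);
}
```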
>
> The corollary here is that we must have a full set of bi-endian tcg helper
> functions.  At the moment, the helper functions are all keyed to the
> hard-coded guest endianness.  That means the typical LE/BE host/guest memory
> op looks like
>
>         if (tlb hit) {
>             t = bswap(data);
>             store t;
>         } else {
>             helper_store_be(data);
>         }
>
> If we hoist the bswap it'll need to be
>
>         t = bswap(data);
>         if (tlb hit) {
>             store t;
>         } else {
>             helper_store_le(t);
>         }
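A sketch checking that hoisting the swap preserves the stored bytes on both paths, for a big-endian guest on a little-endian host. It relies on the identity that a big-endian store of `data` writes the same bytes as a little-endian store of `bswap(data)`, which is why the slow path has to switch to an LE helper once the swap is hoisted. The `helper_store_le32`/`helper_store_be32` names are hypothetical stand-ins for the bi-endian helper set described above, and the host's native store is modelled as an explicit little-endian byte store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical bi-endian slow-path helpers. */
static void helper_store_le32(uint8_t *mem, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(v >> (8 * i));
}

static void helper_store_be32(uint8_t *mem, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Fast-path store on a little-endian host, modelled as an LE store. */
static void host_native_store32(uint8_t *mem, uint32_t v)
{
    helper_store_le32(mem, v);
}

/* Current shape: swap only on the fast path; the slow path calls a
 * helper keyed to the guest's (big) endianness. */
static void guest_be_store_current(uint8_t *mem, uint32_t data, bool tlb_hit)
{
    if (tlb_hit) {
        uint32_t t = bswap32(data);
        host_native_store32(mem, t);
    } else {
        helper_store_be32(mem, data);
    }
}

/* Hoisted shape: swap first as its own op; both paths then store t
 * in little-endian order. */
static void guest_be_store_hoisted(uint8_t *mem, uint32_t data, bool tlb_hit)
{
    uint32_t t = bswap32(data);
    if (tlb_hit) {
        host_native_store32(mem, t);
    } else {
        helper_store_le32(mem, t);
    }
}
```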

Do we need to overhaul the C interface to the
memory system too? (ie ldl_p and friends).

> (D) Profit!  I'm not sure what will be left of TARGET_WORDS_BIGENDIAN at this
> point.  Possibly only if we leave the current translator interface in place in
> step A.

I think there are a number of devices and boards which use it
as a convenient shortcut, but we can fix those -- the TCG
reliance on knowing about the target endianness is the
hard part of the problem, I think.

-- PMM


