qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/4] tcg/aarch64: implement new TCG target for a


From: Claudio Fontana
Subject: Re: [Qemu-devel] [PATCH 2/4] tcg/aarch64: implement new TCG target for aarch64
Date: Fri, 24 May 2013 10:53:35 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130509 Thunderbird/17.0.6

On 23.05.2013 18:29, Richard Henderson wrote:
> On 05/23/2013 01:18 AM, Claudio Fontana wrote:
>> +static inline void patch_reloc(uint8_t *code_ptr, int type,
>> +                               tcg_target_long value, tcg_target_long 
>> addend)
>> +{
>> +    switch (type) {
>> +    case R_AARCH64_JUMP26:
>> +    case R_AARCH64_CALL26:
>> +        reloc_pc26(code_ptr, value);
>> +        break;
>> +    case R_AARCH64_CONDBR19:
>> +        reloc_pc19(code_ptr, value);
>> +        break;
> 
> The addend operand may always be zero atm, but honor it anyway.

Ok, I will add addend to value.

> 
>> +static inline void tcg_out_ldst_9(TCGContext *s,
>> +                                  enum aarch64_ldst_op_data op_data,
>> +                                  enum aarch64_ldst_op_type op_type,
>> +                                  int rd, int rn, tcg_target_long offset)
> 
> Universally, use TCGReg for arguments that must be registers.

Good catch, thanks.

>> +static inline void tcg_out_movi32(TCGContext *s, int ext, int rd,
>> +                                  uint32_t value)
>> +{
>> +    uint32_t half, base, movk = 0;
>> +    if (!value) {
>> +        tcg_out_movr(s, ext, rd, TCG_REG_XZR);
>> +        return;
>> +    }
> 
> No real need to special case zero; it's just an extra test slowing down the
> compiler.

Yes, we need to handle the special case zero.
Otherwise no instruction at all would be emitted for value 0.

>> +    /* construct halfwords of the immediate with MOVZ with LSL */
>> +    /* using MOVZ 0x52800000 | extended reg.. */
>> +    base = ext ? 0xd2800000 : 0x52800000;
> 
> Why is ext an argument to movi32?  Don't we know just because of the name that
> we only case about 32-bit data?  And thus you should always be writing to the
> Wn registers, which automatically zero the high bits of the Xn register?
> 
> Although honestly, if you wanted to keep "ext", you could just merge the two
> movi routines.  For most tcg targets that's what we already do -- the arm port
> from whence you copied this is the outlier, because it wanted to add a
> condition argument.

I think that the idea to merge the two is worth considering, it will reduce 
confusion.
I will give it a try, and review 32/64 movi use in the whole thing.

I actually don't know whether to prefer ext=0 or ext=1,
in the sense that it would be useful to know whether using the extended 
registers
with a small constant is performance-wise preferable to using the 32bit 
operation,
and relying on 0-extension. See also the rotation comment below.

>> +/* solve the whole ldst problem */
>> +static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data 
>> data,
>> +                                enum aarch64_ldst_op_type type,
>> +                                int rd, int rn, tcg_target_long offset)
>> +{
>> +    if (offset > -256 && offset < 256) {
>> +        tcg_out_ldst_9(s, data, type, rd, rn, offset);
> 
> Ouch, that's not much room.  You'll overflow that regularly getting to the
> various register slots in env.  You really are going to want to be able to 
> make
> use of the scaled 12-bit offset.
> 
> That said, this is certainly ok for now.
> 
>> +/* mov alias implemented with add immediate, useful to move to/from SP */
>> +static inline void tcg_out_movr_sp(TCGContext *s, int ext, int rd, int rn)
>> +{
>> +    /* using ADD 0x11000000 | (ext) | rn << 5 | rd */
>> +    unsigned int base = ext ? 0x91000000 : 0x11000000;
>> +    tcg_out32(s, base | rn << 5 | rd);
>> +}
> 
> Any reason not to just make this the move register function?  That is,
> assuming there's a real reason you set up that frame pointer...

The reason I have separate functions is to keep both the possibilities to move
XZR to a register, and to move to/from SP.
The Frame Pointer is not involved specifically in movr_sp.

There is reason to use FP, although we could use SP instead, in my view.
Using FP to reach the local callee-saved data seems more obvious to me,
and less surprising for the reader. It makes the code in the prologue also
easier to understand, as we can restore SP without losing the ability
to reach the callee-saved data, keeping the order of operations consistent. 

> It's starting to look like "ext" could be set to 0x80000000 (or really a
> symbolic alias of that) and written as x | ext everwhere, instead of
> conditionals.  At least in most cases.

Hmm it could be, although I would need two defines,
since there seem to be two formats in which ext is encoded, the one with the
(0x80 << 24) and the one with the 0x4.

>> +    /* all arguments passed via registers */
>> +    tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
>> +    tcg_out_movr(s, 1, TCG_REG_X1, addr_reg);
> 
> addr_reg almost certainly needs to be zero-extended for 32-bit guests, easily
> done by setting ext = 0 here.

I can easily put an #ifdef just to be sure.

>> +        unsigned int bits; bits = 8 * (1 << s_bits) - 1;
> 
>   unsigned int bits = ...
> 
>> +#else /* !CONFIG_SOFTMMU */
>> +    tcg_abort(); /* TODO */
>> +#endif
> 
> This really is even easier: zero-extend (if needed), add GUEST_BASE (often 
> held
> in a reserved register for 64-bit targets), perform the load/store.  And of
> course for aarch64, the add can be done via reg+reg addressing.
> 
> See e.g. ppc64 for how to conditionally reserve a register containing 
> GUEST_BASE.
> 

That's ok, but can we keep user mode for a separate patch? Testing is the 
reason.

>> +    case INDEX_op_rotl_i64: ext = 1;
>> +    case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
>> +        if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
>> +            tcg_out_rotl(s, ext, args[0], args[1], args[2]);
>> +        } else {
>> +            tcg_out_arith(s, ARITH_SUB, ext, args[2], TCG_REG_XZR, args[2]);
>> +            tcg_out_shiftrot_reg(s, SRR_ROR, ext, args[0], args[1], 
>> args[2]);
> 
> You can't clobber the args[2] register here.  You need to use the TMP 
> register.

Ok.

> 
> And fwiw, you can always use ext = 0 for that negation, since we don't care
> about the high bits.

Yes. See comment above about movi32/movi64.

>> +    case INDEX_op_setcond_i64: ext = 1;
>> +    case INDEX_op_setcond_i32:
>> +        tcg_out_cmp(s, ext, args[1], args[2]);
>> +        tcg_out_cset(s, ext, args[0], args[3]);
>> +        break;
> 
> ext = 0 for the cset, since the result is always 0/1.

Another instance of the ext/noext: here you seem to assume that using ext = 0 
with small constants is better,
and I assume better here means better performance. Should I assume that that is 
generally the case?

>> +static void tcg_target_qemu_prologue(TCGContext *s)
>> +{
>> +    /* NB: frame sizes are in 16 byte stack units! */
>> +    int frame_size_callee_saved, frame_size_tcg_locals;
>> +    int r;
>> +
>> +    /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
>> +    frame_size_callee_saved = (1) + (TCG_REG_X28 - TCG_REG_X19) / 2 + 1;
>> +
>> +    /* frame size requirement for TCG local variables */
>> +    frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
>> +        + CPU_TEMP_BUF_NLONGS * sizeof(long)
>> +        + (TCG_TARGET_STACK_ALIGN - 1);
>> +    frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
>> +    frame_size_tcg_locals /= TCG_TARGET_STACK_ALIGN;
>> +
>> +    /* push (FP, LR) and update sp */
>> +    tcg_out_push_pair(s, TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
>> +
>> +    /* FP -> callee_saved */
>> +    tcg_out_movr_sp(s, 1, TCG_REG_FP, TCG_REG_SP);
> 
> You initialize FP, but you don't reserve the register, so it's going to get
> clobbered.  We don't actually use the frame pointer in the translated code, so
> I don't think there's any call to actually initialize it either.

The FP is not going to be clobbered, not by code here and not by called code.

It is not going to be clobbered between our use before the jump and after the
jump, because all the called functions need to preserve FP as mandated by the
calling conventions.

It is not going to be clobbered from the point of view of our caller,
because we save (FP, LR) along with (X19, X20) .. (X27, X28) and restore them
before returning.

We use FP to point to the callee_saved registers, and to move to/from them
in the tcg_out_store_pair and tcg_out_load_pair functions.

> r~
> 

Claudio




reply via email to

[Prev in Thread] Current Thread [Next in Thread]