From: Max Filippov
Subject: Re: [Qemu-devel] [RFC 12/28] target-xtensa: implement shifts (ST1 and RST1 groups)
Date: Thu, 5 May 2011 12:40:22 +0400

>> To track immediate values written to SAR? You mean that there may be
>> some performance difference of fixed size shift vs indirect shift and
>> TCG is able to tell them apart?
>
> Well, not really fixed vs indirect, but if you know that the value
> in the SAR register is in the right range, you can avoid using a
> 64-bit shift.
>
> For instance,
>
>        SSL     ar2
>        SLL     ar0, ar1
>
> could be implemented with
>
>        tcg_gen_shl_i32(ar0, ar1, ar2);
>
> assuming we have enough context.
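
A plain-C sketch of why that substitution is safe, for readers who want to check the arithmetic. It is not QEMU code: the helper names (ssl_to_sar, sll_generic, sll_fast, count_sll_mismatches) are made up for illustration, and it assumes the usual Xtensa semantics, namely that SSL sets SAR = 32 - (as & 31) and SLL takes bits [31+SAR..SAR] of the 64-bit value formed by the source register followed by 32 zero bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SSL as: SAR = 32 - (as & 31), so SAR always lands in 1..32. */
static uint32_t ssl_to_sar(uint32_t as)
{
    return 32 - (as & 31);
}

/* Generic SLL: low 32 bits of ((as:0^32) >> SAR), needs a 64-bit shift. */
static uint32_t sll_generic(uint32_t as, uint32_t sar)
{
    assert(sar <= 32);
    return (uint32_t)(((uint64_t)as << 32) >> sar);
}

/* Fast path: a 32-bit left shift by 32 - SAR, legal only for 0..31. */
static uint32_t sll_fast(uint32_t as, uint32_t sar_m32)
{
    assert(sar_m32 <= 31);
    return as << sar_m32;
}

/* Count (value, amount) pairs where the two paths disagree. */
static int count_sll_mismatches(void)
{
    static const uint32_t samples[] = {
        0, 1, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    int bad = 0;

    for (uint32_t amount = 0; amount < 64; amount++) {
        uint32_t sar = ssl_to_sar(amount);   /* what SSL stores in SAR */
        uint32_t sar_m32 = amount & 31;      /* 32 - SAR, kept on the side */

        for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
            if (sll_generic(samples[i], sar) != sll_fast(samples[i], sar_m32)) {
                bad++;
            }
        }
    }
    return bad;
}
```

The fast path only works because 32 - SAR is known to be in 0..31 after SSL; SAR itself can be 32, which is why the value cannot simply be fed to a 32-bit shift.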
>
> Let us decompose the SAR register into two parts, storing both the
> true value and 32 minus that value.
>
>    struct DisasContext {
>        // Current Stuff
>        // ...
>
>        // When valid, holds 32-SAR.
>        TCGv sar_m32;
>        bool sar_m32_alloc;
>        bool sar_m32_valid;
>        bool sar_5bit;
>    };
>
> At the beginning of the TB:
>
>        TCGV_UNUSED_I32(dc->sar_m32);
>        dc->sar_m32_alloc = false;
>        dc->sar_m32_valid = false;
>        dc->sar_5bit = false;
>
>
>
> static void gen_set_sra_m32(DisasContext *dc, TCGv val)
> {
>    if (!dc->sar_m32_alloc) {
>        dc->sar_m32_alloc = true;
>        dc->sar_m32 = tcg_temp_local_new_i32();
>    }
>    dc->sar_m32_valid = true;
>
>    /* Clear 5 bit because the SAR value could be 32.  */
>    dc->sar_5bit = false;
>
>    tcg_gen_movi_i32(cpu_SR[SAR], 32);
>    tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
>    tcg_gen_mov_i32(dc->sar_m32, val);
> }
>
> static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
> {
>    if (dc->sar_m32_alloc && dc->sar_m32_valid) {
>        tcg_gen_discard_i32(dc->sar_m32);
>    }
>    dc->sar_m32_valid = false;
>    dc->sar_5bit = is_5bit;
>
>    tcg_gen_mov_i32(cpu_SR[SAR], val);
> }
>
>        /* SSL */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra_m32(dc, tmp);
>        break;
>
>        /* SSR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* WSR.SAR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
>        gen_set_sra(dc, tmp, false);
>        break;
>
>        /* SSAI */
>        tcg_gen_movi_i32(tmp, constant);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* SLL */
>        if (dc->sar_m32_valid) {
>            tcg_gen_shl_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>        break;
>
>        /* SRL */
>        if (dc->sar_5bit) {
>            tcg_gen_shr_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>
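
The translation-time bookkeeping in the quoted code can be modelled without TCG at all. The following is only a sketch: struct sar_state, the enums, note_sar_write, path_for_sll and path_for_srl are hypothetical names standing in for the sar_m32_valid and sar_5bit flags and the per-instruction cases above, and it assumes the SSAI immediate fits in 5 bits, as the is_5bit = true argument in the quoted code implies.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags kept in DisasContext. */
struct sar_state {
    bool m32_valid;  /* 32 - SAR is available and known to be in 0..31 */
    bool is_5bit;    /* SAR itself is known to be in 0..31 */
};

enum sar_writer { W_SSL, W_SSR, W_SSAI, W_WSR_SAR };
enum shift_path { PATH_32BIT, PATH_64BIT };

/* Update the flags when the translator sees an instruction writing SAR. */
static void note_sar_write(struct sar_state *s, enum sar_writer w)
{
    switch (w) {
    case W_SSL:        /* SAR = 32 - (as & 31): SAR may be exactly 32 */
        s->m32_valid = true;
        s->is_5bit = false;
        break;
    case W_SSR:        /* SAR = as & 31 */
    case W_SSAI:       /* SAR = immediate, assumed 0..31 */
        s->m32_valid = false;
        s->is_5bit = true;
        break;
    case W_WSR_SAR:    /* SAR = as & 63: nothing useful is known */
        s->m32_valid = false;
        s->is_5bit = false;
        break;
    }
}

/* SLL may use a 32-bit shift only when 32 - SAR is tracked. */
static enum shift_path path_for_sll(const struct sar_state *s)
{
    return s->m32_valid ? PATH_32BIT : PATH_64BIT;
}

/* SRL may use a 32-bit shift only when SAR is known to be below 32. */
static enum shift_path path_for_srl(const struct sar_state *s)
{
    return s->is_5bit ? PATH_32BIT : PATH_64BIT;
}
```

Both flags start out false at the beginning of each TB, so a shift that appears before any SAR write in the block falls back to the 64-bit emulation.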
>
> A couple of points: The use of the local temp avoids problems with
> intervening insns that might generate branch opcodes.  For the
> simplest cases, as with the case at the start of the message, we
> ought to be able to propagate the values into the TCG shift insn
> directly.
>
> Does that make sense?

Yes it does. Thanks for the good explanation.
I tried to keep it all as simple as possible to have a working
prototype quickly. Now that it works, optimizations should be no
problem.

Thanks.
-- Max


