Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions

From: Mark Cave-Ayland
Subject: Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
Date: Mon, 17 Dec 2018 18:49:47 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0

On 17/12/2018 17:39, Richard Henderson wrote:

> On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
>> NOTE: there are a lot of instructions that cannot (yet) be optimised to use
>> TCG vector operations, however it struck me that there may be some potential
>> for converting saturating add/sub and cmp instructions if there were a
>> mechanism to return a set of flags indicating the result of the
>> saturation/comparison.
> There are also a lot of instructions that can be converted, but aren't:
> * vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.
> * vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.
>   Note that you'll need something like vec_reg_offset from
>   target/arm/translate-a64.h to compute the offset of the
>   specific byte/word/long from which we are to splat.

Oh okay, thanks for the hints - I remember thinking that I couldn't do much with
these, but I'll go and look again.

> * vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
>   For ARM, we do special case this during translation.
>   But since tcg/tcg-op.c does these things for tcg_gen_or_i64,
>   we should probably handle the same set of transformations.
> * vnot would need to be handled by actually adding a tcg_gen_gvec_nor
>   and then also noticing aofs == bofs.

And I'll revisit these ones too.

> For saturation, I think the easiest thing to do is represent SAT as a
> ppc_avr_t.  We notice saturation by also computing normal arithmetic and
> comparing to see if they differ.  E.g.
>     tcg_gen_gvec_add(vece, offsetof_avr_tmp,
>                      offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_ssadd(vece, offsetof(rt),
>                        offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
>                      offsetof_avr_tmp, offsetof(rt), 16, 16);
>     tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
>                     offsetof_avr_tmp, 16, 16);
> You only need to convert the ppc_avr_t to a single bit when reading VSCR.

I actually had a PoC that looked somewhat similar to this, except that I dropped
the idea as I thought that the penalty of doing the add twice (plus comparison)
would slow everything down by several orders of magnitude for backends that
didn't support vector instructions. What's the best way to handle this?
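
For reference, here is a scalar C sketch of what the quoted sequence computes
per lane, which is roughly the work a backend without vector support would end
up doing. It is only a model under my own assumptions: avr_model_t,
vaddsbs_model and vscr_sat_bit are names made up for illustration (not QEMU
API), and it only covers signed byte lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LANES 16

/* Hypothetical stand-in for ppc_avr_t: 16 signed byte lanes. */
typedef struct {
    int8_t s8[LANES];
} avr_model_t;

/* Signed saturating add of one byte lane. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int sum = a + b;

    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/*
 * Model of the add / ssadd / cmp(NE) / or sequence: rt receives the
 * saturating sum, and sat accumulates a sticky per-lane mask (all ones
 * where the wrapping and saturating results differ).
 */
static void vaddsbs_model(avr_model_t *rt, const avr_model_t *ra,
                          const avr_model_t *rb, avr_model_t *sat)
{
    assert(rt && ra && rb && sat);

    for (int i = 0; i < LANES; i++) {
        /* Wrapping add done on unsigned bytes to avoid signed overflow. */
        uint8_t wrapped = (uint8_t)((uint8_t)ra->s8[i] + (uint8_t)rb->s8[i]);
        int8_t clamped = ssadd8(ra->s8[i], rb->s8[i]);
        int differs = (wrapped != (uint8_t)clamped) ? -1 : 0;

        rt->s8[i] = clamped;
        sat->s8[i] = (int8_t)(sat->s8[i] | differs);
    }
}

/* Collapse the sticky mask to a single bit, as when reading VSCR. */
static bool vscr_sat_bit(const avr_model_t *sat)
{
    for (int i = 0; i < LANES; i++) {
        if (sat->s8[i] != 0) {
            return true;
        }
    }
    return false;
}
```

The point of keeping sat as a full-width mask is that the reduction to one bit
only happens in vscr_sat_bit, i.e. on the (rare) VSCR read rather than on every
saturating instruction.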

> For comparisons... that's tricky.  I wonder if there's anything better than
>     tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
>                      offsetof(ra), offsetof(rb), 16, 16);
>     if (rc) {
>         TCGv_i64 hi, lo, t, f;
>         tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
>         tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);
>         tcg_gen_and_i64(t, hi, lo);
>         tcg_gen_or_i64(f, hi, lo);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);
>         // truncate to i32, shift, or, and set to cr6.
>     }

Certainly I can look at this approach, but again my concern is that we end up 
penalising the backends without vector instruction support :/
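
To check my understanding of the Rc=1 part, here is a plain C sketch of the
reduction from the 128-bit compare result to CR6. cr6_from_mask is a made-up
name, and the bit positions (all-true in bit 3, all-false in bit 1 of the 4-bit
field) are my reading of the ISA, so please correct me if that's wrong.

```c
#include <assert.h>
#include <stdint.h>

/*
 * hi/lo are the two 64-bit halves of the compare result, where every
 * element is either all ones (true) or all zeros (false).
 */
static uint32_t cr6_from_mask(uint64_t hi, uint64_t lo)
{
    uint64_t t = hi & lo;                    /* all ones only if every element is true */
    uint64_t f = hi | lo;                    /* zero only if every element is false */
    uint32_t all_true = (t == UINT64_MAX);   /* setcondi EQ against -1 */
    uint32_t all_false = (f == 0);           /* setcondi EQ against 0 */

    return (all_true << 3) | (all_false << 1);
}

/* Small usage example covering the three possible outcomes. */
static void cr6_selfcheck(void)
{
    assert(cr6_from_mask(UINT64_MAX, UINT64_MAX) == 0x8); /* all true */
    assert(cr6_from_mask(0, 0) == 0x2);                   /* all false */
    assert(cr6_from_mask(UINT64_MAX, 0) == 0x0);          /* mixed */
}
```

So the scalar cost of the flag computation itself is two loads, an and, an or
and two setconds, independent of the element size.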


