qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 13/13] target-ppc: Add xxperm and xxpermr instru


From: Bharata B Rao
Subject: Re: [Qemu-devel] [PATCH 13/13] target-ppc: Add xxperm and xxpermr instructions
Date: Tue, 6 Dec 2016 14:25:41 +0530
User-agent: Mutt/1.7.1 (2016-10-04)

On Tue, Dec 06, 2016 at 03:11:22PM +1100, David Gibson wrote:
> On Mon, Dec 05, 2016 at 04:55:30PM +0530, Nikunj A Dadhania wrote:
> > From: Bharata B Rao <address@hidden>
> > 
> > xxperm:  VSX Vector Permute
> > xxpermr: VSX Vector Permute Right-indexed
> > 
> > Signed-off-by: Bharata B Rao <address@hidden>
> > Signed-off-by: Nikunj A Dadhania <address@hidden>
> > ---
> >  target-ppc/fpu_helper.c             | 50 
> > +++++++++++++++++++++++++++++++++++++
> >  target-ppc/helper.h                 |  2 ++
> >  target-ppc/translate/vsx-impl.inc.c |  2 ++
> >  target-ppc/translate/vsx-ops.inc.c  |  2 ++
> >  4 files changed, 56 insertions(+)
> > 
> > diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> > index 3b867cf..be552c7 100644
> > --- a/target-ppc/fpu_helper.c
> > +++ b/target-ppc/fpu_helper.c
> > @@ -2869,3 +2869,53 @@ uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb)
> >      float_check_status(env);
> >      return xt;
> >  }
> > +
> > +static void vsr_copy_256(ppc_vsr_t *xa, ppc_vsr_t *xt, int8_t *src)
> > +{
> > +#if defined(HOST_WORDS_BIGENDIAN)
> > +    memcpy(src, xa, sizeof(*xa));
> > +    memcpy(src + 16, xt, sizeof(*xt));
> > +#else
> > +    memcpy(src, xt, sizeof(*xt));
> > +    memcpy(src + 16, xa, sizeof(*xa));
> 
> Is this right?  I thought the order of the bytes within each word
> varied with the host endianness as well.

Since we are already working with 2 16 byte vectors xa and xb here, I thought
we don't have to worry about order of bytes within each vector, but instead can
construct the 32 byte vector as above based on host endianness.

> 
> > +#endif
> > +}
> > +
> > +static int8_t vsr_get_byte(int8_t *src, int bound, int idx)
> > +{
> > +    if (idx >= bound) {
> > +        return 0xFF;
> > +    }
> 
> AFAICT you don't need this check.  For both xxperm and xxpermr you're
> already masking the index to 5 bits, so it can't exceed 31.

Was thinking of making it a generic API and hence had that boundary
check but yes, no point for the check in the context of this instruction.

> 
> > +#if defined(HOST_WORDS_BIGENDIAN)
> > +    return src[idx];
> > +#else
> > +    return src[bound - 1 - idx];
> > +#endif
> > +}
> > +
> > +#define VSX_XXPERM(op, indexed)                                    \
> > +void helper_##op(CPUPPCState *env, uint32_t opcode)                \
> > +{                                                                  \
> > +    ppc_vsr_t xt, xa, pcv;                                         \
> > +    int i, idx;                                                    \
> > +    int8_t src[32];                                                \
> > +                                                                   \
> > +    getVSR(xA(opcode), &xa, env);                                  \
> > +    getVSR(xT(opcode), &xt, env);                                  \
> > +    getVSR(xB(opcode), &pcv, env);                                 \
> > +                                                                   \
> > +    vsr_copy_256(&xa, &xt, src);                                   \
> 
> You have a double copy here AFAICT - first from the actual env
> structure to xt and xa, then to the src array.  That seems like it
> would be good to avoid.
> 
> It seems like it would nice in any case to avoid even the one copy.
> You'd need a temporary for the output of course and to copy that, but
> you should be able to combine indexed with host endianness to
> translate each index to retrieve directly from the VSR values in env.

I am not sure it would be good to retrieve byte values directly from
env as getVSR nicely abstracts out from which fields
(env->[fpr, vsr, avr] the data is fetched based on the register specified
in the opcode.

I can reduce one copy though by not constructing a 32 byte vector (src)
but instead retrieving the bytes directly from xa and xt based on
the index.

Regards,
Bharata.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]