qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 0/9] tcg: reorganize bswap* functions


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 0/9] tcg: reorganize bswap* functions
Date: Thu, 12 Mar 2009 13:56:32 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

On Thu, Mar 12, 2009 at 12:29:46PM +0000, Paul Brook wrote:
> > This patch series tries to reorganize a bit the bswap* TCG functions.
> 
> In principle this looks ok, however several implementation issues:

Thanks for the review.

> > +/* Note: we assume the six high bytes are set to zero */
> > +static inline void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg)
> > +{
> > +    tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg));
> > +}
> > +
> > +/* Note: we assume the four high bytes are set to zero */
> > +static inline void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
> > +{
> > +    tcg_gen_bswap32_i32(TCGV_LOW(ret), TCGV_LOW(arg));
> > +}
> 
> I think we want to preserve the zero extension of the value, i.e. you want 
> something along the lines of:
> 
> if (!TCGV_EQUAL_I64(ret, arg))
>   tcg_gen_movi_i32(TCGV_HIGH(ret), 0);

That's why I have modified the documentation to say the high bytes have
to be zero, similar to the current bswap16_i32 implementation.

I think we want to keep this behavior as when operating on 16-bit or
32-bit values, the high bytes most often stay 0, so this operation
is useless.

I have checked on the PowerPC target that the high bytes are always 
zero when need (either explicitely, or as part as a load operation).

For the i386 target, this is something missing, IMHO I should update
the "target-i386: use the new bswap* TCG ops" instead and put the zero
extension there.

> - I'm not sure whether it's more efficient to do movi(ret, 0) or mov(ret, 
> arg). With a bit of luck they'll both be optimized away most of the time.

idem.

> > +/* Note: we assume the four high bytes are set to zero */
> > +static inline void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
> > +{
> >...
> > +    tcg_gen_shli_i64(t0, arg, 24);
> 
> Likewise this is missing a mask operation. ext32u_i64(t0, t0) is probably the 
> most efficient way).

idem.

> > +++ b/tcg/i386/tcg-target.c
> > +    case INDEX_op_bswap16_i32:
> > +        tcg_out8(s, 0x66);
> > +        tcg_out_modrm(s, 0xc1, SHIFT_ROL, args[0]);
> > +        tcg_out8(s, 8);
> 
> > +    { INDEX_op_bswap16_i32, { "r", "0" } },
> 
> This is wrong. The r/m field of the modrm byte can only address the first 4 
> registers (AL-DL). Values 4-7 address te second bytes of these registers 
> (a.k.a. AH-DH). I suspect you need to use the "q" constraint.

That's what I did first when using XCHG xH, xL. That's why I used ROLW
instead which is able to work on the 8 registers by accessing 16-bit
registers.

> > +++ b/tcg/x86_64/tcg-target.c
> > +    case INDEX_op_bswap16_i32:
> > +    case INDEX_op_bswap16_i64:
> > +        tcg_out8(s, 0x66);
> > +        tcg_out_modrm(s, 0xc1, SHIFT_ROL, args[0]);
> > +        tcg_out8(s, 8);
> > +        break;
> 
> You need to use tcg_out_opc here to get REX prefixes. You need P_REXB to 
> avoid 
> the legacy encoding issues mentioned above, and the high bit of the r/m field 
> also goes in the REX byte.

Same here.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net




reply via email to

[Prev in Thread] Current Thread [Next in Thread]