qemu-devel

From: Vineet Gupta
Subject: Re: [PULL 11/26] target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci
Date: Wed, 13 Oct 2021 09:20:51 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 10/13/21 6:49 AM, Philipp Tomsich wrote:
On Wed, 13 Oct 2021 at 15:44, Vincent Palatin <vpalatin@rivosinc.com> wrote:

On Wed, Oct 13, 2021 at 3:13 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:

I had a much simpler version initially (using 3 x mask/shift/or, for
12 instructions after setup of constants), but took up the suggestion
to optimize based on haszero(v)...
Indeed, this does not appear to do what we expect when a byte holds
just 0x01.
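The thread doesn't show the earlier haszero(v)-based code, so the following is only a sketch under an assumption: that per-byte zero flags were taken from the classic expression (v - 0x01..01) & ~v & 0x80..80. That expression reliably answers "is any byte zero?", but a borrow out of a zero byte can also flag the 0x01 byte directly above it as zero. orc_b_ref, zero_flags_haszero, orc_b_haszero and demo_0x01_above_zero are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Byte-wise reference for orc.b: non-zero byte -> 0xff, zero byte -> 0x00. */
static uint64_t orc_b_ref(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        if ((v >> i) & 0xff) {
            r |= (uint64_t)0xff << i;
        }
    }
    return r;
}

/* Hypothetical per-byte use of the classic haszero(v) expression:
 * bit 7 of each byte is meant to flag "this byte is zero". */
static uint64_t zero_flags_haszero(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Hypothetical orc.b built on those flags: move each flag to bit 0,
 * invert it to mean "non-zero", then spread it across the byte. */
static uint64_t orc_b_haszero(uint64_t v)
{
    uint64_t nonzero_lsb = (zero_flags_haszero(v) >> 7) ^ ONES;
    return nonzero_lsb * 0xff;
}

/* Failing case: byte 1 is 0x01 and sits directly above a zero byte.
 * The borrow out of byte 0 turns byte 1 into 0xff after the subtraction,
 * so byte 1 is flagged as zero even though it is not. */
static void demo_0x01_above_zero(void)
{
    uint64_t v = 0x0100;
    assert(orc_b_ref(v) == 0xff00);
    assert(orc_b_haszero(v) == 0);
}
```

Under this assumption, a lone 0x01 in the lowest byte is handled correctly; only a 0x01 byte above a zero byte is misclassified.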

The less optimized form, with a single constant, that would still do
what we want is:
    /* set high-bit for non-zero bytes */
    constant = dup_const_tl(MO_8, 0x7f);
    tmp = v & constant;     // AND
    tmp += constant;        // ADD
    tmp |= v;               // OR
    /* extract high-bit to low-bit, for each byte */
    tmp &= ~constant;       // ANDC
    tmp >>= 7;              // SHR
    /* multiply with 0xff to populate entire byte where the low-bit is set */
    tmp *= 0xff;            // MUL
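A plain-C version of the sequence above on a 64-bit value, with a brute-force comparison against a byte-wise reference, might look like the sketch below (function names are hypothetical). Correctness rests on per-byte independence: (b & 0x7f) + 0x7f peaks at 0xfe, and the final multiply only ever sees 0x00 or 0x01 per byte, so no carry crosses a byte boundary. The pair check merely exercises that argument.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise reference for orc.b: non-zero byte -> 0xff, zero byte -> 0x00. */
static uint64_t orc_b_ref(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        if ((v >> i) & 0xff) {
            r |= (uint64_t)0xff << i;
        }
    }
    return r;
}

/* Single-constant sequence from the mail, applied to a 64-bit value. */
static uint64_t orc_b_7f(uint64_t v)
{
    const uint64_t c = 0x7f7f7f7f7f7f7f7fULL; /* 0x7f in every byte */
    uint64_t tmp = v & c; /* AND: low 7 bits of each byte */
    tmp += c;             /* ADD: bit 7 set iff low 7 bits non-zero (max 0xfe) */
    tmp |= v;             /* OR: also set bit 7 if it was already set in v */
    tmp &= ~c;            /* ANDC: keep only bit 7 of each byte */
    tmp >>= 7;            /* SHR: 0x01 in each non-zero byte */
    return tmp * 0xff;    /* MUL: 0x01 * 0xff = 0xff, no cross-byte carries */
}

/* Compare against the reference for every pair of adjacent byte values
 * at every byte position (7 positions x 65536 pairs). */
static void orc_b_check_pairs(void)
{
    for (int pos = 0; pos <= 48; pos += 8) {
        for (uint64_t pair = 0; pair < 0x10000; pair++) {
            uint64_t v = pair << pos;
            assert(orc_b_7f(v) == orc_b_ref(v));
        }
    }
}
```

Unlike the haszero form, the 0x0100 input (0x01 above a zero byte) yields 0xff00 here.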

I'll submit a patch with this one later today, once I have had a chance
to pass it through a full test.


Thanks for the insight.

I have tried it, implemented as:
```
static void gen_orc_b(TCGv ret, TCGv source1)
{
    TCGv tmp = tcg_temp_new();
    TCGv constant = tcg_constant_tl(dup_const_tl(MO_8, 0x7f));

    /* Set the high bit of each non-zero byte. */
    tcg_gen_and_tl(tmp, source1, constant);
    tcg_gen_add_tl(tmp, tmp, constant);
    tcg_gen_or_tl(tmp, tmp, source1);
    /* Extract the high bit to the low bit, for each byte. */
    tcg_gen_andc_tl(tmp, tmp, constant);
    tcg_gen_shri_tl(tmp, tmp, 7);

    /* Replicate the lsb of each byte across the byte. */
    tcg_gen_muli_tl(ret, tmp, 0xff);

    tcg_temp_free(tmp);
}
```

It does pass my own test sequences.
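The _tl ops work on target_long, so on RV32 the same sequence runs on 32-bit values and dup_const_tl(MO_8, 0x7f) should produce 0x7f7f7f7f. A plain-C model of gen_orc_b() at that width, one statement per TCG op, could be checked as below; dup_byte32, orc_b_ref32, orc_b_model32 and check_model32 are hypothetical helpers, not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for dup_const_tl(MO_8, c) with a 32-bit
 * target_long: replicate an 8-bit value into every byte. */
static uint32_t dup_byte32(uint8_t c)
{
    return (uint32_t)c * 0x01010101u;
}

/* Reference: non-zero byte -> 0xff, zero byte -> 0x00. */
static uint32_t orc_b_ref32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8) {
        if ((v >> i) & 0xff) {
            r |= 0xffu << i;
        }
    }
    return r;
}

/* Plain-C model of gen_orc_b() at XLEN=32, one statement per TCG op. */
static uint32_t orc_b_model32(uint32_t source1)
{
    const uint32_t constant = dup_byte32(0x7f); /* 0x7f7f7f7f */
    uint32_t tmp;

    tmp = source1 & constant;  /* tcg_gen_and_tl  */
    tmp = tmp + constant;      /* tcg_gen_add_tl  */
    tmp = tmp | source1;       /* tcg_gen_or_tl   */
    tmp = tmp & ~constant;     /* tcg_gen_andc_tl */
    tmp = tmp >> 7;            /* tcg_gen_shri_tl */
    return tmp * 0xffu;        /* tcg_gen_muli_tl */
}

/* Check every value of two adjacent bytes at each byte position. */
static void check_model32(void)
{
    for (int pos = 0; pos <= 16; pos += 8) {
        for (uint32_t pair = 0; pair < 0x10000; pair++) {
            uint32_t v = pair << pos;
            assert(orc_b_model32(v) == orc_b_ref32(v));
        }
    }
}
```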

I am running it against SPEC at the moment, with optimized
strlen/strcpy/strcmp functions that use orc.b.
The verdict on that should be available later today...
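For context on why orc.b helps string routines: it maps every non-zero byte to 0xff, so the complement of its result is non-zero exactly when a word contains a NUL, and the lowest set bit (in little-endian byte order) marks the first one. A minimal word-at-a-time strlen along those lines is sketched below. It emulates orc.b in software, is not the code used in the SPEC runs, and assumes the buffer can be read in whole 8-byte words up to the terminator; orc_b, load_le64 and strlen_orcb are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for orc.b on a 64-bit word (single-constant form). */
static uint64_t orc_b(uint64_t v)
{
    const uint64_t c = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t tmp = ((((v & c) + c) | v) & ~c) >> 7;
    return tmp * 0xff;
}

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++) {
        w |= (uint64_t)p[i] << (8 * i);
    }
    return w;
}

/* Word-at-a-time strlen sketch. Assumption: the buffer can be read in
 * whole 8-byte words up to and including the word holding the NUL
 * (e.g. zero-padded to a multiple of 8 bytes). */
static size_t strlen_orcb(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    for (size_t off = 0;; off += 8) {
        /* Non-zero bytes become 0xff, so only zero bytes survive the
         * complement. */
        uint64_t zeros = ~orc_b(load_le64(p + off));
        if (zeros != 0) {
            /* __builtin_ctzll is a GCC/Clang builtin (undefined for 0,
             * excluded above); the lowest set bit is in the first NUL. */
            return off + (size_t)__builtin_ctzll(zeros) / 8;
        }
    }
}
```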

Off topic, but related: for Zb (and similar extensions in the future), what's the strategy for change management/discovery? I understand you can hardcode things for a quick test, but a proper glibc implementation would use an IFUNC, and there seems to be no architectural way, per the spec, for software (or the kernel) to discover this.

The same issue applies to building the Linux kernel with Zb: how do we make sure the hardware/simulator supports Zb when running the corresponding software?

It seems some generic discovery/enumeration scheme is in the works, but what should we do in the interim?

Thx,
-Vineet


