Re: [Qemu-devel] [PATCH v5 33/35] target/arm: Implement SVE dot product (indexed)

From:	Peter Maydell
Subject:	Re: [Qemu-devel] [PATCH v5 33/35] target/arm: Implement SVE dot product (indexed)
Date:	Tue, 26 Jun 2018 17:30:02 +0100

On 26 June 2018 at 17:17, Richard Henderson
<address@hidden> wrote:
> On 06/26/2018 08:30 AM, Peter Maydell wrote:
>> On 21 June 2018 at 02:53, Richard Henderson
>> <address@hidden> wrote:
>>> Signed-off-by: Richard Henderson <address@hidden>
>>> ---
>>>  target/arm/helper.h        |  5 ++
>>>  target/arm/translate-sve.c | 18 +++++++
>>>  target/arm/vec_helper.c    | 96 ++++++++++++++++++++++++++++++++++++++
>>>  target/arm/sve.decode      |  8 +++-
>>>  4 files changed, 126 insertions(+), 1 deletion(-)
>>>
>>
>>> +void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
>>> +{
>>> +    intptr_t i, j, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
>>> +    intptr_t index = simd_data(desc);
>>> +    uint32_t *d = vd;
>>> +    int8_t *n = vn, *m = vm;
>>> +
>>> +    for (i = 0; i < opr_sz_4; i = j) {
>>> +        int8_t m0 = m[(i + index) * 4 + 0];
>>> +        int8_t m1 = m[(i + index) * 4 + 1];
>>> +        int8_t m2 = m[(i + index) * 4 + 2];
>>> +        int8_t m3 = m[(i + index) * 4 + 3];
>>> +
>>> +        j = i;
>>> +        do {
>>> +            d[j] += n[j * 4 + 0] * m0
>>> +                  + n[j * 4 + 1] * m1
>>> +                  + n[j * 4 + 2] * m2
>>> +                  + n[j * 4 + 3] * m3;
>>> +        } while (++j < MIN(i + 4, opr_sz_4));
>>> +    }
>>> +    clear_tail(d, opr_sz, simd_maxsz(desc));
>>> +}
>>
>> Maybe I'm just half asleep this afternoon, but this is pretty
>> confusing -- nested loops where the outer loop's increment
>> uses the inner loop's index, and the inner loop's conditions
>> depend on the outer loop index...
>
> Yeah, well.
>
> There is an edge case of aa64 advsimd, reusing this same helper,
>
>         sdot    v0.2s, v1.8b, v0.4b[0]
>
> where m values must be read (and held) before writing d results,
> and there are not 16/4=4 elements to process but only 2.
>
> I suppose I could special-case oprsz == 8 in order to simplify
> iteration of what is otherwise a multiple of 16.
>
> I thought iterating J from I to I+4 was easier to read than
> writing out I+J everywhere.  Perhaps not.

Mmm. I did indeed fail to notice the symmetry between the
indexes into m[] and those into n[].
The other bit that threw me is where the outer loop on i
updates using j.

A comment describing the intent might help?
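
For illustration, here is roughly the kind of commenting I have in mind,
as an untested standalone sketch rather than a proposed replacement.
The function name and the plain opr_sz/index parameters are made up for
the example (the real helper takes desc and calls clear_tail()), and it
steps the outer loop by 4 instead of assigning i = j, which should visit
the same elements when opr_sz is 8 or a multiple of 16:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone variant of the indexed signed dot product.
 * opr_sz is the operation size in bytes; index selects one group of
 * four bytes of m within each 128-bit segment.  Tail clearing omitted.
 */
static void sdot_idx_b_sketch(uint32_t *d, const int8_t *n, const int8_t *m,
                              size_t opr_sz, size_t index)
{
    size_t opr_sz_4 = opr_sz / 4;   /* number of 32-bit result elements */
    size_t seg, e;

    /* Outer loop: one 128-bit segment (four 32-bit elements) per pass. */
    for (seg = 0; seg < opr_sz_4; seg += 4) {
        /*
         * Read and hold the selected m group before writing any d
         * element of this segment: d may be the same register as m.
         */
        int8_t m0 = m[(seg + index) * 4 + 0];
        int8_t m1 = m[(seg + index) * 4 + 1];
        int8_t m2 = m[(seg + index) * 4 + 2];
        int8_t m3 = m[(seg + index) * 4 + 3];

        /* A 64-bit AdvSIMD operation has only 2 elements, not 4. */
        size_t end = seg + 4 < opr_sz_4 ? seg + 4 : opr_sz_4;

        /* Inner loop: each element of the segment uses the same m group. */
        for (e = seg; e < end; e++) {
            d[e] += n[e * 4 + 0] * m0
                  + n[e * 4 + 1] * m1
                  + n[e * 4 + 2] * m2
                  + n[e * 4 + 3] * m3;
        }
    }
}
```

The comments on the held m values and on the 2-element case are the
parts that would have saved me the confusion; whether the loops are
written with i = j or with a fixed stride is a secondary matter.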

thanks
-- PMM
