Re: ARM SVE issues with non "standard" vector lengths
From: Richard Henderson
Subject: Re: ARM SVE issues with non "standard" vector lengths
Date: Sat, 25 Apr 2020 10:59:12 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 4/23/20 6:59 AM, Laurent Desnogues wrote:
> Hello,
>
> I found the following issues in SVE while playing with a vector length
> of 640 bits.
>
> 1. sve_uzp_p
>
> I think the comment about VL not being a power of 2 should be that VL
> is not a multiple of 512-bit elements with VL > 512 (not sure how to
> phrase that properly).
>
>     if (oprsz & 15) {
>         d[i] = compress_bits(n[2 * i] >> odd, esz);
>
> Here n[2 * i + 1] should be taken into account.
>
>     for (i = 0; i < oprsz_16; i++) {
>         l = m[2 * i + 0];
>         h = m[2 * i + 1];
>         l = compress_bits(l >> odd, esz);
>         h = compress_bits(h >> odd, esz);
>         tmp_m.p[i] = l + (h << 32);
>     }
>     tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
>
> Here m[2 * i + 1] should be taken into account.

Fixed. This was obvious once you pointed it out.
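
To illustrate the tail case described above: with a 640-bit vector the
predicate is 80 bits, so the last output word needs contributions from both
n[2 * i] and n[2 * i + 1]. The sketch below is a simplified stand-in, not the
code in sve_helper.c. compress_units is a hypothetical bit-loop replacement
for compress_bits, uzp_tail_low_only and uzp_tail_both are made-up names for
the two ways of handling the tail, and the shift parameter plays the role of
"odd" in the quoted code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for compress_bits: keep the low (1 << esz) bits
 * of every 2 * (1 << esz)-bit group and pack them together.
 * Assumes esz is in the range 0..3. The result fits in 32 bits.
 */
static uint64_t compress_units(uint64_t x, int esz)
{
    unsigned unit = 1u << esz;
    uint64_t mask = (UINT64_C(1) << unit) - 1;
    uint64_t out = 0;

    for (unsigned k = 0; k < 64 / (2 * unit); k++) {
        out |= ((x >> (2 * unit * k)) & mask) << (unit * k);
    }
    return out;
}

/* Tail handling that only looks at the first input word. */
static uint64_t uzp_tail_low_only(const uint64_t *n, int shift, int esz)
{
    return compress_units(n[0] >> shift, esz);
}

/* Tail handling that also folds in the second input word. */
static uint64_t uzp_tail_both(const uint64_t *n, int shift, int esz)
{
    uint64_t l = compress_units(n[0] >> shift, esz);
    uint64_t h = compress_units(n[1] >> shift, esz);

    return l | (h << 32);
}
```

For an all-true 80-bit predicate (n[0] all ones, n[1] = 0xffff) with esz = 0
and shift = 0, the first variant yields 32 set bits while the second yields
the expected 40.
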
> This generates extraneous data in the higher part of the result.
>
> I hit this when I got a wrong result on an instruction that ends up
> using sve_cntp which counts all bits set in each 64-bit chunk. There
> might be some other instructions beyond ZIP that generate extra data
> that would break sve_cntp. So perhaps it'd be easier to fix sve_cntp
> (and hope that it's the only function that uses bits beyond vector
> length...).
>
> I hope I got all of this right; I'm not familiar with that
> implementation of SVE :)
This is not so obvious. I'll write a test case to try and find out, but
this is perhaps fixed by:
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index f0c9f81db9..a7ffd9a655 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3071,7 +3071,7 @@ void HELPER(sve_zip_p)
             high = oprsz >> 1;
         }
-        if ((high & 3) == 0) {
+        if ((high & 7) == 0) {
             uint32_t *n = vn, *m = vm;
             high >>= 2;
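
As a sketch of the alternative raised in the quoted text (making the count
itself ignore bits beyond the vector length), the following masks the last
partial 64-bit word before counting. It is an assumption-laden illustration
and not how sve_cntp is written: count_pred_bits, popcount64 and the
pred_bits parameter are hypothetical names, and element-size masking is
left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count: clears the lowest set bit each iteration. */
static unsigned popcount64(uint64_t x)
{
    unsigned c = 0;

    while (x) {
        x &= x - 1;
        c++;
    }
    return c;
}

/*
 * Count the set bits in the first pred_bits bits of a predicate stored
 * as an array of 64-bit words. Bits at or above pred_bits are ignored,
 * so stale data in the tail of the last word does not affect the count.
 */
static unsigned count_pred_bits(const uint64_t *p, size_t pred_bits)
{
    size_t full = pred_bits / 64;
    unsigned rem = (unsigned)(pred_bits % 64);
    unsigned total = 0;

    for (size_t i = 0; i < full; i++) {
        total += popcount64(p[i]);
    }
    if (rem) {
        uint64_t mask = (UINT64_C(1) << rem) - 1;
        total += popcount64(p[full] & mask);
    }
    return total;
}
```

With a 640-bit vector the predicate is 80 bits wide, so two all-ones words
count as 80 here, where counting each 64-bit chunk in full would give 128.
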
r~