From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
Date: Sat, 9 Apr 2016 15:45:43 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1
On 04/07/2016 02:58 AM, address@hidden wrote:
+#elif defined __aarch64__
+#include "arm_neon.h"
A better test is __NEON__, which asserts that neon is available at compile time (which will be true basically always for aarch64), so you don't need a runtime test for neon.
You also get support for armv7 with neon.
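A minimal sketch of that compile-time selection follows. It assumes the ACLE feature macro `__ARM_NEON` is the spelling the compiler defines (the message above writes `__NEON__`); `HAVE_NEON_PATH` and `pair_has_nonzero` are hypothetical names used only for illustration, and the scalar branch is there so the same file still builds on hosts without NEON.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: __ARM_NEON is defined whenever Advanced SIMD is enabled for
 * the target, on both aarch64 and armv7 built with NEON. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_PATH 1
#else
#define HAVE_NEON_PATH 0
#endif

/* Hypothetical helper: OR two 16-byte blocks together and report whether
 * any bit is set. No runtime CPU feature check is involved. */
static bool pair_has_nonzero(const uint64_t *a, const uint64_t *b)
{
#if HAVE_NEON_PATH
    uint64x2_t v = vorrq_u64(vld1q_u64(a), vld1q_u64(b));
    /* Lane extraction, as in the quoted patch. */
    return (vgetq_lane_u64(v, 0) | vgetq_lane_u64(v, 1)) != 0;
#else
    /* Portable fallback for hosts without NEON. */
    return (a[0] | a[1] | b[0] | b[1]) != 0;
#endif
}
```

Because the choice is made by the preprocessor, the armv7 and aarch64 builds share the same source and neither needs a runtime probe.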
+#define NEON_VECTYPE uint64x2_t
+#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2))
+#define NEON_ORR(v1, v2) ((v1) | (v2))
+#define NEON_NOT_EQ_ZERO(v1) \
+    ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
FWIW, I think that vmaxvq_u32 would be a better reduction for aarch64. Extracting the individual lanes isn't as efficient as one would like.
For armv7, folding via vget_lane_u64(vget_high_u64(v1) | vget_low_u64(v1), 0) is probably best.
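The two reductions could be sketched roughly as below. This is an illustration under assumptions, not the patch's code: `block16_nonzero` is a hypothetical helper, `__ARM_NEON` is assumed to be the feature macro the compiler defines, and the scalar branch exists only so the file builds on non-ARM hosts. Since `vmaxvq_u32` takes a `uint32x4_t`, the `uint64x2_t` value is reinterpreted first.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: true if any bit in the 16 bytes at p is set. */
static bool block16_nonzero(const uint64_t *p)
{
#if defined(__ARM_NEON) && defined(__aarch64__)
    uint64x2_t v = vld1q_u64(p);
    /* Across-lanes maximum of the four 32-bit lanes; the result is
     * nonzero exactly when at least one lane is nonzero. */
    return vmaxvq_u32(vreinterpretq_u32_u64(v)) != 0;
#elif defined(__ARM_NEON)
    uint64x2_t v = vld1q_u64(p);
    /* armv7: fold the high half into the low half, then read one lane. */
    uint64x1_t folded = vorr_u64(vget_high_u64(v), vget_low_u64(v));
    return vget_lane_u64(folded, 0) != 0;
#else
    /* Portable fallback for hosts without NEON. */
    return (p[0] | p[1]) != 0;
#endif
}
```

Either NEON branch ends with a single transfer to a general-purpose register, instead of the two lane extractions in the quoted macro.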
r~