Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking

From:	Peter Maydell
Subject:	Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
Date:	Thu, 7 Apr 2016 11:44:05 +0100

On 7 April 2016 at 11:30, Paolo Bonzini <address@hidden> wrote:
>
>> +#elif defined __aarch64__
>> +#include "arm_neon.h"
>> +
>> +#define NEON_VECTYPE               uint64x2_t
>> +#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
>
> Why is the load and orr necessary?  Is ((v1) | (v2)) enough?
>
>> +#define NEON_ORR(v1, v2)           ((v1) | (v2))
>> +#define NEON_NOT_EQ_ZERO(v1) \
>> +        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
>> +
>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16
>
> Unless you have numbers saying that a 16-unroll is better than an 8-unroll
> (and then you should put those in the commit message), you do not need to
> duplicate code, just add aarch64 definitions for the existing code.

This pure-neon code is also not doing the initial short-loop to
test for non-zero buffers, which means it's not an apples-to-apples
comparison. It seems unlikely that workload balances are going
to be different on ARM vs x86 such that it's worth doing the
small loop on one but not the other. (This is also why it's helpful
to explain your benchmarking method -- the short loop will slow
things down for some cases like "large and untouched RAM", but be
faster again for cases like "large RAM of which most pages have
been dirtied".)
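
To make the trade-off concrete, here is a minimal sketch of the
"short loop first, then unrolled loop" shape. It uses plain uint64_t
words rather than Neon vectors so it builds anywhere. The function
name and both constants are hypothetical, picked for illustration,
and this is not the code from the patch or from QEMU's existing
implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define SHORT_LOOP_WORDS 8   /* words tested one at a time up front */
#define UNROLL_FACTOR    8   /* words OR-ed together per main-loop step */

/*
 * Return a word offset at or before the first nonzero word, or nwords
 * if the whole buffer is zero.  In the main loop the result is only
 * accurate to the start of the chunk that holds the nonzero word.
 */
static size_t find_nonzero_offset_sketch(const uint64_t *buf, size_t nwords)
{
    size_t i;

    /* Short loop: a mostly-dirtied buffer usually exits here quickly. */
    for (i = 0; i < SHORT_LOOP_WORDS && i < nwords; i++) {
        if (buf[i] != 0) {
            return i;
        }
    }

    /* Main loop: OR a whole chunk together, then test once per chunk. */
    for (; i + UNROLL_FACTOR <= nwords; i += UNROLL_FACTOR) {
        uint64_t acc = 0;
        for (size_t j = 0; j < UNROLL_FACTOR; j++) {
            acc |= buf[i + j];
        }
        if (acc != 0) {
            return i;
        }
    }

    /* Tail: words left over after the last full chunk. */
    for (; i < nwords; i++) {
        if (buf[i] != 0) {
            return i;
        }
    }
    return nwords;
}
```

For an all-zero buffer the short loop is pure overhead on top of the
main loop; for a buffer with data near the start it returns before
the wider loop is ever entered. Which case dominates depends on the
workload, so benchmark numbers only mean something alongside a
description of what was measured.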

thanks
-- PMM
