Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx pla

qemu-devel

From:	Richard Henderson
Subject:	Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
Date:	Fri, 12 Aug 2016 14:20:43 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/12/2016 12:32 PM, Vijay Kilari wrote:

On Sat, Aug 6, 2016 at 3:47 PM, Richard Henderson <address@hidden> wrote:

On 08/02/2016 03:50 PM, address@hidden wrote:


+#define VEC_PREFETCH(base, index) \
+        asm volatile ("prfm pldl1strm, [%x[a]]\n" : : [a]"r"(&base[(index)]))



Is this not __builtin_prefetch(base + index) ?

I.e. you can define this generically for all targets.
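
A minimal sketch of what such a generic definition could look like, assuming a GCC- or Clang-compatible compiler. The macro name VEC_PREFETCH comes from the patch; the no-op fallback, the helper sum_bytes, and the 64-byte stride are hypothetical and exist only to make the example self-contained. The arguments (0, 0) request a read prefetch with no temporal locality, which on aarch64 GCC typically lowers to a streaming prfm hint similar to the hand-written asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic prefetch: rw = 0 (read), locality = 0 (streaming, no reuse expected). */
#if defined(__GNUC__) || defined(__clang__)
#define VEC_PREFETCH(base, index) __builtin_prefetch(&(base)[(index)], 0, 0)
#else
#define VEC_PREFETCH(base, index) ((void)0)   /* assumed fallback: no prefetch */
#endif

/* Hypothetical helper: sums a buffer while prefetching one stride ahead. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    const size_t stride = 64;   /* assumed stride, not taken from the patch */
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        /* Prefetch once per stride, and only while the target is in bounds. */
        if (i % stride == 0 && i + stride < len) {
            VEC_PREFETCH(buf, i + stride);
        }
        sum += buf[i];
    }
    return sum;
}
```

Because the builtin is only a hint, the same source compiles for every target; a target without a prefetch instruction simply emits nothing for it.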


__builtin_prefetch() is available only in gcc 5.3 for arm64.

So? You can't really defend the position that you care about aarch64 code quality if you're using gcc 4.x. Essentially all of the performance work has been done for later versions.

I'll note that you're also prefetching too much, off the end of the block,
and that you're probably not prefetching far enough.  You'd need to break
off the last iteration(s) of the loop.
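
One way to read "break off the last iteration(s)" is sketched below: a main loop that prefetches a fixed distance ahead and stops early enough that the prefetch address stays inside the buffer, followed by a tail loop that does no prefetching. This is not the QEMU code; CHUNK, PREFETCH_AHEAD, chunk_is_zero and find_nonzero_chunk are hypothetical names, and the distance of four chunks is an arbitrary placeholder that would need measuring on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHUNK = 128, PREFETCH_AHEAD = 4 };   /* assumed values, for illustration */

/* Returns 1 if all CHUNK bytes starting at p are zero. */
static int chunk_is_zero(const uint8_t *p)
{
    uint8_t acc = 0;
    for (size_t k = 0; k < CHUNK; k++) {
        acc |= p[k];
    }
    return acc == 0;
}

/* Returns the offset of the first chunk containing a nonzero byte,
 * or len if the whole buffer is zero. len must be a multiple of CHUNK. */
static size_t find_nonzero_chunk(const uint8_t *buf, size_t len)
{
    const size_t ahead = (size_t)PREFETCH_AHEAD * CHUNK;
    const size_t main_end = len > ahead ? len - ahead : 0;
    size_t i = 0;

    assert(len % CHUNK == 0);

    /* Main loop: i + ahead < len, so the prefetch never leaves the block. */
    for (; i < main_end; i += CHUNK) {
        __builtin_prefetch(buf + i + ahead, 0, 0);
        if (!chunk_is_zero(buf + i)) {
            return i;
        }
    }

    /* Tail: the last PREFETCH_AHEAD chunks were already requested above. */
    for (; i < len; i += CHUNK) {
        if (!chunk_is_zero(buf + i)) {
            return i;
        }
    }
    return len;
}
```

Splitting the loop keeps the bounds check out of the hot path, at the cost of duplicating the loop body once.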

I'll note that you're also prefetching too close.  The loop operates on
8*vecsize units.  In the case of aarch64, 128 byte units.  Both i+32 and


128 unit is specific to thunder. I will move this to thunder
specific function


No, you misunderstand.

While it's true that thunderx is unique among aarch64 implementations in having a 128-byte cacheline size, the "128" I mention above has nothing to do with that.

The loop is operating on BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR bytes, which is defined above as 8 * sizeof(vector), which happens to be 128.
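
The arithmetic can be checked with a small sketch, assuming a 16-byte (128-bit) vector type as on aarch64 NEON. The names example_vec, EXAMPLE_UNROLL_BYTES and example_unroll_bytes are hypothetical stand-ins, not the identifiers used in QEMU.

```c
#include <stdint.h>

/* Stand-in for a 128-bit SIMD vector, using the GCC/Clang vector extension. */
typedef uint8_t example_vec __attribute__((vector_size(16)));

/* Bytes consumed per loop iteration: 8 vectors of 16 bytes each. */
enum { EXAMPLE_UNROLL_BYTES = 8 * sizeof(example_vec) };

/* Fails at compile time if the assumed vector size does not give 128. */
_Static_assert(EXAMPLE_UNROLL_BYTES == 128,
               "8 * sizeof(vector) should be 128 for 16-byte vectors");

static unsigned example_unroll_bytes(void)
{
    return EXAMPLE_UNROLL_BYTES;
}
```

The result depends only on the vector width and the unroll count, so it is 128 on any aarch64 core, whatever its cacheline size.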

r~


