Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking
Date: Tue, 13 Sep 2016 09:27:02 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
On 09/13/2016 09:10 AM, Paolo Bonzini wrote:
> @@ -177,16 +231,15 @@ bool test_buffer_is_zero_next_accel(void)
>
> static bool select_accel_fn(const void *buf, size_t len)
> {
> - uintptr_t ibuf = (uintptr_t)buf;
> #ifdef CONFIG_AVX2_OPT
> - if (len % 128 == 0 && ibuf % 32 == 0 && (cpuid_cache & CACHE_AVX2)) {
> + if (len >= 128 && (cpuid_cache & CACHE_AVX2)) {
> return buffer_zero_avx2(buf, len);
> }
> - if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE4)) {
> + if (len >= 64 && (cpuid_cache & CACHE_SSE4)) {
> return buffer_zero_sse4(buf, len);
> }
> #endif
> - if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE2)) {
> + if (len >= 64 && (cpuid_cache & CACHE_SSE2)) {
> return buffer_zero_sse2(buf, len);
> }
You've dropped a major change to select_accel_fn here:
(1) The avx2 routine, as written, can support len >= 64; therefore a single
common length test works for all of the vectorized functions.
(2) I had saved the pointer to the routine, so that we didn't have to
repeatedly test multiple cpuid_cache bits.
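A minimal sketch of the dispatch shape described in (1) and (2), assuming the CPU feature bits are passed in once at startup (init_accel and the CACHE_* values here are hypothetical stand-ins, not QEMU's actual code). The vector routines are stubbed with a byte-wise loop, since only the selection logic is being illustrated: the routine pointer is chosen once from the cpuid bits, and each call then performs only one common len >= 64 check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits standing in for cpuid_cache. */
enum { CACHE_SSE2 = 1, CACHE_SSE4 = 2, CACHE_AVX2 = 4 };

/* Portable fallback: handles any length, including len < 64. */
static bool buffer_zero_int(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Stubs for the vector routines. Real versions would use SIMD and are
 * assumed to require len >= 64; here they just reuse the fallback. */
static bool buffer_zero_sse2(const void *buf, size_t len)
{
    return buffer_zero_int(buf, len);
}

static bool buffer_zero_sse4(const void *buf, size_t len)
{
    return buffer_zero_int(buf, len);
}

static bool buffer_zero_avx2(const void *buf, size_t len)
{
    return buffer_zero_int(buf, len);
}

typedef bool (*zero_fn)(const void *, size_t);

/* Saved routine pointer: cpuid bits are tested once, not per call. */
static zero_fn buffer_accel = buffer_zero_int;

/* Hypothetical init hook: pick the best routine from the feature bits. */
static void init_accel(unsigned cache)
{
    if (cache & CACHE_AVX2) {
        buffer_accel = buffer_zero_avx2;
    } else if (cache & CACHE_SSE4) {
        buffer_accel = buffer_zero_sse4;
    } else if (cache & CACHE_SSE2) {
        buffer_accel = buffer_zero_sse2;
    } else {
        buffer_accel = buffer_zero_int;
    }
}

/* One common length test covers every vectorized routine. */
static bool select_accel_fn(const void *buf, size_t len)
{
    if (len >= 64) {
        return buffer_accel(buf, len);
    }
    return buffer_zero_int(buf, len);
}
```

With this shape, adding or reordering a vector routine only touches init_accel; the hot path stays a single compare plus an indirect call.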
r~