Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible
Date: Tue, 19 Mar 2013 10:08:03 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4

On 03/15/2013 09:50 AM, Peter Lieven wrote:
> performance gain on SSE2 is approx. 20-25%. altivec
> is not tested. performance for unsigned long arithmetic
> is unchanged.
> 
> Signed-off-by: Peter Lieven <address@hidden>
> ---
>  util/cutils.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/util/cutils.c b/util/cutils.c
> index 857dd7d..00d98fb 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -190,6 +190,13 @@ size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>   */
>  bool buffer_is_zero(const void *buf, size_t len)
>  {
> +    /* use vector optimized zero check if possible */
> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0 
> +          && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +             * sizeof(VECTYPE)) == 0) {

Is it worth factoring this check into something more reusable, by adding
something like 'bool buffer_can_use_vectors(buf, len)' in patch 2/9?
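To make the suggestion concrete, here is a rough sketch of what such a helper could look like. The VECTYPE typedef and the unroll factor value below are stand-ins I made up so the snippet compiles on its own; the real definitions would come from patch 2/9, and the helper name is only the one proposed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: placeholder for the real vector type from patch 2/9.
 * Only its size (16 bytes here) matters for this sketch. */
typedef struct { uint64_t lane[2]; } VECTYPE;

/* Assumption: placeholder value, not taken from the patch. */
#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8

/*
 * True if buf is aligned to the vector size and len is a multiple of
 * one unrolled iteration, i.e. the same test the hunk above open-codes.
 */
static inline bool buffer_can_use_vectors(const void *buf, size_t len)
{
    return (uintptr_t)buf % sizeof(VECTYPE) == 0
        && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
                  * sizeof(VECTYPE)) == 0;
}
```

With something like this, the caller would reduce to a single 'if (buffer_can_use_vectors(buf, len))' and other users of buffer_find_nonzero_offset could share the same precondition.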

> +        return buffer_find_nonzero_offset(buf, len)==len;

Please put spaces around binary operators ('== len').

Is it worth rewriting this function into a simpler:

 1. check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs until
    we are aligned
 2. check buffer_find_nonzero_offset on the aligned middle
 3. check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs at tail

instead of having two instances of code that can loop over the entire
buffer?  Otherwise, searching for zeros on an unaligned buffer will
remain slower, even though the bulk of the search could still benefit
from the vector operations.
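A minimal sketch of that head/middle/tail split, to show the shape I have in mind. Everything suffixed _sketch is hypothetical: find_nonzero_offset_sketch is a plain byte loop standing in for the vectorized buffer_find_nonzero_offset, the two constants are assumed values, and for brevity the head and tail are checked byte by byte rather than in longs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: stand-ins for sizeof(VECTYPE) and the unroll factor. */
#define VEC_SIZE_SKETCH 16
#define UNROLL_SKETCH   8
#define CHUNK_SKETCH    (VEC_SIZE_SKETCH * UNROLL_SKETCH)

/* Stand-in for buffer_find_nonzero_offset(): returns the offset of the
 * first nonzero byte, or len if there is none. The real function would
 * use vector loads and requires an aligned buf and chunk-multiple len. */
static size_t find_nonzero_offset_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i]) {
            return i;
        }
    }
    return len;
}

/* Scalar check used for the unaligned head and the short tail. */
static bool bytes_are_zero_sketch(const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    /* Bytes needed to reach the next vector-aligned address. */
    size_t head = (VEC_SIZE_SKETCH - (uintptr_t)p % VEC_SIZE_SKETCH)
                  % VEC_SIZE_SKETCH;
    size_t middle;

    if (head > len) {
        head = len;
    }
    if (!bytes_are_zero_sketch(p, head)) {
        return false;
    }
    p += head;
    len -= head;

    /* Largest aligned span that is a whole number of unrolled chunks. */
    middle = len - len % CHUNK_SKETCH;
    if (middle && find_nonzero_offset_sketch(p, middle) != middle) {
        return false;
    }

    /* Whatever is left is shorter than one chunk. */
    return bytes_are_zero_sketch(p + middle, len - middle);
}
```

This way only one loop ever walks the bulk of the buffer, and an unaligned caller pays the scalar cost for less than one vector at the head and less than one chunk at the tail.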

> +    }
> +
>      /*
>       * Use long as the biggest available internal data type that fits into the
>       * CPU register and unroll the loop to smooth out the effect of memory

Your patch results in C99 declarations after statements; while we
require C99, I'm not sure if qemu prefers to stick to the C89 style of
declarations before statements.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

