[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations
From: |
Peter Lieven |
Subject: |
Re: [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible |
Date: |
Tue, 19 Mar 2013 17:14:40 +0100 |
Am 19.03.2013 um 17:08 schrieb Eric Blake <address@hidden>:
> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>>
>> Signed-off-by: Peter Lieven <address@hidden>
>> ---
>> util/cutils.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 857dd7d..00d98fb 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -190,6 +190,13 @@ size_t buffer_find_nonzero_offset(const void *buf,
>> size_t len)
>> */
>> bool buffer_is_zero(const void *buf, size_t len)
>> {
>> + /* use vector optimized zero check if possible */
>> + if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>> + && len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>> + * sizeof(VECTYPE)) == 0) {
>
> Is it worth factoring this check into something more reusable, by adding
> something like 'bool buffer_can_use_vectors(buf, len)' in patch 2/9?
good idea!
>
>> + return buffer_find_nonzero_offset(buf, len)==len;
>
> Spaces around binary operators.
>
> Is it worth rewriting this function into a simpler:
>
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs until
> we are aligned
> check buffer_find_nonzero_offset on the aligned middle
> check up to (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR - 1) longs at tail
>
> instead of having two instances of code that can loop over the entire
> buffer? Otherwise, searching for zeros on an unaligned buffer will
> remain slower, even though the bulk of the search could still benefit
> from the vector operations.
i do not think that this is necessary as in all cases I know the buffers are
aligned properly.
a) ram pages (4k)
b) block migration pages (1M)
c) bdrv sectors (512)
At least on x86_64 they have all 16-byte aligned addresses.
It would just add more branches to the case where the optimized version can be
used maybe
making it slower.
Peter
>
>> + }
>> +
>> /*
>> * Use long as the biggest available internal data type that fits into
>> the
>> * CPU register and unroll the loop to smooth out the effect of memory
>
> Your patch results in C99 declarations after statements; while we
> require C99, I'm not sure if qemu prefers to stick to the C89 style of
> declarations before statements.
the code was slower if I made the checks afterwards.
Peter
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
- [Qemu-devel] [PATCHv2 9/9] migration: use XBZRLE only after bulk stage, (continued)
- [Qemu-devel] [PATCHv2 9/9] migration: use XBZRLE only after bulk stage, Peter Lieven, 2013/03/15
- [Qemu-devel] [PATCHv2 8/9] migration: do not search dirty pages in bulk stage, Peter Lieven, 2013/03/15
- [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit(), Peter Lieven, 2013/03/15
- [Qemu-devel] [PATCHv2 3/9] buffer_is_zero: use vector optimizations if possible, Peter Lieven, 2013/03/15
- [Qemu-devel] [PATCHv2 2/9] cutils: add a function to find non-zero content in a buffer, Peter Lieven, 2013/03/15
[Qemu-devel] [PATCHv2 7/9] migration: do not sent zero pages in bulk stage, Peter Lieven, 2013/03/15
[Qemu-devel] [PATCHv2 5/9] migration: search for zero instead of dup pages, Peter Lieven, 2013/03/15