[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations |
Date: |
Mon, 25 Mar 2013 15:34:16 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4 |
Il 25/03/2013 14:32, Peter Lieven ha scritto:
>
> Am 25.03.2013 um 14:23 schrieb Peter Lieven <address@hidden>:
>
>>
>> Am 25.03.2013 um 14:02 schrieb Paolo Bonzini <address@hidden>:
>>
>>>> Maybe I should have explained the output more detailed. The percentages
>>>> are added. 35.8% in the second last column means that
>>>> 35.8% have a return value that is less than TARGET_PAGE_SIZE.
>>>> This was meant to illustrate at how many 64-bit chunks you have
>>>> to look to grab a certain percentage of non-zero pages.
>>>
>>> Ok, I wrongly understood that many pages had 4088 zero bytes but
>>> the last 8 were not zero. Now it's clearer, and more logical too. :)
>>>
>>>> Looking e.g. at the third value it means that looking at the first
>>>> three 64-bit chunks it will catch 34.0% of all pages.
>>>> It turns out that the non-zeroness of a page can be detected looking
>>>> at the first 256 or so bits and only a low
>>>> percentage turns out to be non-zero at a later position. So after
>>>> having checked the first chunks one by one
>>>> there is no big penalty looking at the remaining chunks with the
>>>> vectorized loop.
>>>
>>> I think it makes most sense to unroll the first four non-vectorized
>>> iterations, i.e. not use SSE and use three or four ifs. Either:
>>>
>>> if (foo[0]) return 0;
>>> if (foo[1]) return 8;
>>> if (foo[2]) return 16;
>>> if (foo[3]) return 24;
>>>
>>> or
>>>
>>> if (foo[0]) return 0;
>>> if (foo[1] | foo[2] | foo[3]) return 8;
>>>
>>> and then proceed on the remaining 4096-4*sizeof(long) bytes with
>>> the vectorized loop. foo+4 is aligned for SIMD operations on both
>>> 32- and 64-bit machines, which makes this a nice choice.
>>
>> i can't start at foo+4 since the remaining X-4*sizeof(long) bytes
>> are not dividable by 8*sizeof(VECTYPE).
Hmm, right. What about just processing the first few longs twice, i.e.
the above followed by "for (i = 0; i < len / sizeof(sizeof(VECTYPE); i
+= BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR)"?
Paolo
>>
>> for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>> i < len / sizeof(VECTYPE);
>> i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>> …
>> }
>
> performance of the above is bad compared to:
>
> for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
> if (!ALL_EQ(p[i], zero)) {
> return i * sizeof(VECTYPE);
> }
> }
>
> …
>
> The above is basically what old is_dup_page is doing, but after the first
> 8 iterations the optimized version kicks in.
>
> Peter
>
>
>
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, (continued)
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Paolo Bonzini, 2013/03/22
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/22
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Paolo Bonzini, 2013/03/22
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/23
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Paolo Bonzini, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Paolo Bonzini, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations,
Paolo Bonzini <=
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/25
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Peter Lieven, 2013/03/26
- Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations, Paolo Bonzini, 2013/03/26