From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v2 0/8] Improve buffer_is_zero
Date: Wed, 24 Aug 2016 13:31:38 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/24/2016 12:18 PM, Eric Blake wrote:
> On 08/24/2016 12:48 PM, Richard Henderson wrote:
>> Patches 1-4 remove the use of ifunc from the implementation.
>>
>> Patch 6 adjusts the x86 implementation a bit more to take advantage of
>> ptest (in sse4.1) and unaligned accesses (in avx1).
>
> Do we really care about unaligned access? Or can we guarantee that all
> our calls to buffer_is_zero are already aligned, and make optimizations
> along those lines?

The old code asserted alignment of at least sizeof(long), although a survey of call sites doesn't make this obvious. I could imagine that we get alignment consistent with that of malloc, but can't prove it.
However, we're certainly not going to be able to assert arbitrary alignment, such as the 32-byte for AVX2, or the 64-byte for AVX512 (when that comes along).
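To illustrate what a check that makes no alignment assumption can look like, here is a minimal portable sketch. It is not the code from the patch series: the name `sketch_buffer_is_zero` is hypothetical, and the 8-byte word size is an arbitrary choice for the example. It uses `memcpy` to express the unaligned loads, which compilers typically lower to a plain load on targets that allow it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only; not QEMU's implementation.
 * Returns true if all LEN bytes at BUF are zero, for any alignment of BUF.
 */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned char tail = 0;

    assert(buf != NULL || len == 0);

    /* Whole 8-byte words: memcpy is a portable way to do an unaligned load. */
    while (len >= sizeof(uint64_t)) {
        uint64_t w;

        memcpy(&w, p, sizeof(w));
        if (w != 0) {
            return false;
        }
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Remaining 0..7 bytes: OR them together and test once. */
    for (; len > 0; len--) {
        tail |= *p++;
    }
    return tail == 0;
}
```

A caller can pass a pointer at any offset, for example `sketch_buffer_is_zero(base + 1, 37)`, without first proving anything about how `base` was allocated.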
Thankfully, at least AVX-capable CPUs are very efficient with unaligned accesses.

r~