[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PULL 07/15] util/bufferiszero: improve avx2 accelerator
From: |
Paolo Bonzini |
Subject: |
[PULL 07/15] util/bufferiszero: improve avx2 accelerator |
Date: |
Thu, 2 Apr 2020 15:06:32 -0400 |
From: Robert Hoo <address@hidden>
By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce
a
branch.
The authorship of this patch actually belongs to Richard Henderson
<address@hidden>, I just fixed a boundary case on his
original patch.
Suggested-by: Richard Henderson <address@hidden>
Signed-off-by: Robert Hoo <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Paolo Bonzini <address@hidden>
---
util/bufferiszero.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index b8012532e4..695bb4ce28 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -158,27 +158,19 @@ buffer_zero_avx2(const void *buf, size_t len)
__m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
__m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
- if (likely(p <= e)) {
- /* Loop over 32-byte aligned blocks of 128. */
- do {
- __builtin_prefetch(p);
- if (unlikely(!_mm256_testz_si256(t, t))) {
- return false;
- }
- t = p[-4] | p[-3] | p[-2] | p[-1];
- p += 4;
- } while (p <= e);
- } else {
- t |= _mm256_loadu_si256(buf + 32);
- if (len <= 128) {
- goto last2;
+ /* Loop over 32-byte aligned blocks of 128. */
+ while (p <= e) {
+ __builtin_prefetch(p);
+ if (unlikely(!_mm256_testz_si256(t, t))) {
+ return false;
}
- }
+ t = p[-4] | p[-3] | p[-2] | p[-1];
+ p += 4;
+ } ;
/* Finish the last block of 128 unaligned. */
t |= _mm256_loadu_si256(buf + len - 4 * 32);
t |= _mm256_loadu_si256(buf + len - 3 * 32);
- last2:
t |= _mm256_loadu_si256(buf + len - 2 * 32);
t |= _mm256_loadu_si256(buf + len - 1 * 32);
@@ -263,7 +255,7 @@ static void init_accel(unsigned cache)
}
if (cache & CACHE_AVX2) {
fn = buffer_zero_avx2;
- length_to_accel = 64;
+ length_to_accel = 128;
}
#endif
#ifdef CONFIG_AVX512F_OPT
--
2.18.2
- [PULL 00/15] Misc patches for 5.0-rc2, Paolo Bonzini, 2020/04/02
- [PULL 02/15] hw/isa/superio: Correct the license text, Paolo Bonzini, 2020/04/02
- [PULL 01/15] hw/scsi/vmw_pvscsi: Remove assertion for kick after reset, Paolo Bonzini, 2020/04/02
- [PULL 03/15] virtio-iommu: depend on PCI, Paolo Bonzini, 2020/04/02
- [PULL 04/15] softmmu: fix crash with invalid -M memory-backend=, Paolo Bonzini, 2020/04/02
- [PULL 06/15] util/bufferiszero: assign length_to_accel value for each accelerator case, Paolo Bonzini, 2020/04/02
- [PULL 07/15] util/bufferiszero: improve avx2 accelerator,
Paolo Bonzini <=
- [PULL 08/15] vl: fix broken IPA range for ARM -M virt with KVM enabled, Paolo Bonzini, 2020/04/02
- [PULL 11/15] target/i386: do not set unsupported VMX secondary execution controls, Paolo Bonzini, 2020/04/02
- [PULL 12/15] migration: fix cleanup_bh leak on resume, Paolo Bonzini, 2020/04/02
- [PULL 09/15] i386: hvf: Reset IRQ inhibition after moving RIP, Paolo Bonzini, 2020/04/02
- [PULL 14/15] object-add: don't create return value if failed, Paolo Bonzini, 2020/04/02
- [PULL 05/15] MAINTAINERS: Add an entry for the HVF accelerator, Paolo Bonzini, 2020/04/02
- [PULL 13/15] qmp: fix leak on callbacks that return both value and error, Paolo Bonzini, 2020/04/02
- [PULL 10/15] serial: Fix double migration data, Paolo Bonzini, 2020/04/02
- [PULL 15/15] xen: fixup RAM memory region initialization, Paolo Bonzini, 2020/04/02
- Re: [PULL 00/15] Misc patches for 5.0-rc2, no-reply, 2020/04/02