From: OHMURA Kei
Subject: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Date: Fri, 12 Feb 2010 11:03:54 +0900
On 02/11/2010 Anthony Liguori <address@hidden> wrote:
> Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
> sense.

Maybe I'm missing something here.
I couldn't find leul_to_cpu(), so I have defined it in bswap.h.

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) glue(glue(le,HOST_LONG_BITS),_to_cpu)(v)
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
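For reference, here is a standalone sketch of how leul_to_cpu() is meant to resolve to le32_to_cpu() or le64_to_cpu() depending on the width of unsigned long. Pasting le ## HOST_LONG_BITS directly does not work, because ## does not macro-expand its operands. The result would be the undefined identifier leHOST_LONG_BITS_to_cpu, so an extra level of indirection is needed. The xglue()/glue() pair follows the QEMU pattern. Deriving HOST_LONG_BITS from ULONG_MAX is an assumption made only for this sketch, and the le32_to_cpu()/le64_to_cpu() bodies here are portable byte-reassembly stand-ins, not QEMU's own definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Two-level token pasting: glue() expands its arguments first, so
 * HOST_LONG_BITS becomes 32 or 64 before xglue() applies ##. */
#define xglue(x, y) x ## y
#define glue(x, y) xglue(x, y)

/* Assumption for this sketch only: derive HOST_LONG_BITS from ULONG_MAX. */
#if ULONG_MAX == 0xffffffffUL
#define HOST_LONG_BITS 32
#else
#define HOST_LONG_BITS 64
#endif

/* Portable little-endian -> host conversions: read the bytes in memory
 * order and rebuild the value.  This is the identity on little-endian
 * hosts and a byte swap on big-endian hosts. */
static inline uint32_t le32_to_cpu(uint32_t v)
{
    unsigned char b[4];

    memcpy(b, &v, sizeof b);
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static inline uint64_t le64_to_cpu(uint64_t v)
{
    unsigned char b[8];
    uint64_t r = 0;

    memcpy(b, &v, sizeof b);
    for (int k = 7; k >= 0; k--)    /* b[7] is the most significant byte */
        r = (r << 8) | b[k];
    return r;
}

/* Expands to le32_to_cpu(v) or le64_to_cpu(v). */
#define leul_to_cpu(v) glue(glue(le, HOST_LONG_BITS), _to_cpu)(v)
```

On a little-endian host, every leulN_to_cpu() call reduces to the identity, which is why the #else branch can simply be (v).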

On 02/10/2010 Ulrich Drepper <address@hidden> wrote:
> If you're optimizing this code you might want to do it all.  The
> compiler might not see through the bswap call and create unnecessary
> data dependencies.  Especially problematic if the bitmap is really
> sparse.  Also, the outer test is != while the inner test is >.  Be
> consistent.  I suggest to replace the inner loop with
>      do {
>        ...
>      } while (c != 0);
> Depending on how sparse the bitmap is populated this might reduce the
> number of data dependencies quite a bit.

Combining all the comments, the code would look like this:
 if (bitmap_ul[i] != 0) {
     c = leul_to_cpu(bitmap_ul[i]);  /* word -> host byte order */
     do {
         j = ffsl(c) - 1;            /* index of the lowest set bit */
         c &= ~(1ul << j);           /* clear it */
         page_number = i * HOST_LONG_BITS + j;
         addr1 = page_number * TARGET_PAGE_SIZE;
         addr = offset + addr1;
         ram_addr = cpu_get_physical_page_desc(addr);
     } while (c != 0);
 }
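To check the loop shape in isolation, here is a self-contained sketch of the same walk outside qemu-kvm. The helper collect_dirty_pages() and the WORD_BITS macro are hypothetical. The helper records page numbers instead of calling cpu_get_physical_page_desc(), and it assumes the words are already in host byte order. For portability it uses the GCC/Clang builtin __builtin_ctzl(c), which equals ffsl(c) - 1 for nonzero c, instead of the glibc-specific ffsl(). It also clears the bit with c &= c - 1, an equivalent idiom swapped in for c &= ~(1ul << j). The zero-word test keeps the inner loop out of sparse regions, and the do/while uses the same != 0 condition as the outer test.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits per bitmap word, standing in for HOST_LONG_BITS. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: store the index of every set bit in out[] (up to
 * max entries) and return the total number of set bits found. */
static size_t collect_dirty_pages(const unsigned long *bitmap, size_t nwords,
                                  unsigned long *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long c = bitmap[i];

        if (c == 0)
            continue;           /* clean word: one test, no inner loop */
        do {
            /* Lowest set bit, same as ffsl(c) - 1; c is nonzero here. */
            unsigned long j = (unsigned long)__builtin_ctzl(c);

            c &= c - 1;         /* clear the lowest set bit */
            if (n < max)
                out[n] = i * WORD_BITS + j;
            n++;
        } while (c != 0);
    }
    return n;
}
```

For example, the bitmap {0x5, 0, 0x1} yields pages 0, 2 and 2 * WORD_BITS.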

