Re: [Qemu-devel] [RFC] optimization for qcow2 cache get/put


From: Max Reitz
Subject: Re: [Qemu-devel] [RFC] optimization for qcow2 cache get/put
Date: Mon, 26 Jan 2015 09:11:25 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 2015-01-26 at 08:20, Zhang Haoyu wrote:
Hi, all

With very large qcow2 images, e.g. 2 TB, a long disruption occurs when taking a snapshot,
caused by cache updates and I/O waits.
perf top data is shown below:
    PerfTop:    2554 irqs/sec  kernel: 0.4%  exact:  0.0% [4000Hz cycles],  
(target_pid: 34294)
------------------------------------------------------------------------------------------------------------------------

     33.80%  qemu-system-x86_64  [.] qcow2_cache_do_get
     27.59%  qemu-system-x86_64  [.] qcow2_cache_put
     15.19%  qemu-system-x86_64  [.] qcow2_cache_entry_mark_dirty
      5.49%  qemu-system-x86_64  [.] update_refcount
      3.02%  libpthread-2.13.so  [.] pthread_getspecific
      2.26%  qemu-system-x86_64  [.] get_refcount
      1.95%  qemu-system-x86_64  [.] coroutine_get_thread_state
      1.32%  qemu-system-x86_64  [.] qcow2_update_snapshot_refcount
      1.20%  qemu-system-x86_64  [.] qemu_coroutine_self
      1.16%  libz.so.1.2.7       [.] 0x0000000000003018
      0.95%  qemu-system-x86_64  [.] qcow2_update_cluster_refcount
      0.91%  qemu-system-x86_64  [.] qcow2_cache_get
      0.76%  libc-2.13.so        [.] 0x0000000000134e49
      0.73%  qemu-system-x86_64  [.] bdrv_debug_event
      0.16%  qemu-system-x86_64  [.] address@hidden
      0.12%  [kernel]            [k] _raw_spin_unlock_irqrestore
      0.10%  qemu-system-x86_64  [.] vga_draw_line24_32
      0.09%  [vdso]              [.] 0x000000000000060c
      0.09%  qemu-system-x86_64  [.] qcow2_check_metadata_overlap
      0.08%  [kernel]            [k] do_blockdev_direct_IO

Expanding the cache table size decreases the I/O, but the lookup time grows,
so it is worth optimizing the qcow2 cache get and put algorithms.
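For context, the hot get path at the time was essentially a linear scan over the whole table; a simplified, hypothetical sketch (not the actual QEMU code, structure and field names are stand-ins for those in block/qcow2-cache.c):

```c
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the qcow2 cache structures. */
typedef struct LinearEntry {
    uint64_t offset;
} LinearEntry;

typedef struct LinearCache {
    LinearEntry *entries;
    int size;
} LinearCache;

/* Every get walks the table until it finds the offset, so the cost
 * grows linearly with the table size -- which is why simply enlarging
 * the table trades less I/O for more CPU time, as the profile above
 * (qcow2_cache_do_get at 33.80%) suggests. */
static int cache_get_linear(LinearCache *c, uint64_t offset)
{
    for (int i = 0; i < c->size; i++) {
        if (c->entries[i].offset == offset) {
            return i;   /* hit after scanning i + 1 entries */
        }
    }
    return -1;          /* miss after scanning all c->size entries */
}
```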

My proposal:
get:
use ((offset >> cluster_bits) % c->size) to locate the cache entry;
a raw implementation:
index = (offset >> cluster_bits) % c->size;
if (c->entries[index].offset == offset) {
     goto found;
}

replace:
c->entries[(offset >> cluster_bits) % c->size].offset = offset;
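Written out as a compilable sketch (DmEntry/DmCache are simplified, hypothetical stand-ins for the real Qcow2Cache structures), the proposed direct-mapped get path would look something like:

```c
#include <stdint.h>

typedef struct DmEntry {
    uint64_t offset;    /* offset of the cached table cluster; 0 = empty */
} DmEntry;

typedef struct DmCache {
    DmEntry *entries;
    int size;           /* number of table entries */
    int cluster_bits;   /* log2 of the cluster size */
} DmCache;

/* The table index is a pure function of the offset, so the hot path is
 * a single comparison instead of a scan over all c->size entries. */
static int dm_cache_get(DmCache *c, uint64_t offset)
{
    int index = (int)((offset >> c->cluster_bits) % c->size);

    if (c->entries[index].offset == offset) {
        return index;   /* hit */
    }
    return -1;          /* miss: the caller evicts and reloads this slot */
}
```

The trade-off is that two hot offsets whose indices collide will evict each other on every access, no matter how large the rest of the table is.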

Well, direct-mapped caches do have their benefits, but remember that they have disadvantages, too. As far as CPU caches are concerned, set-associative caches seem to be largely favored, so that may be a better idea here as well.
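A minimal sketch of that set-associative alternative (again with hypothetical, simplified structures): each offset maps to a small set of ways, so colliding offsets can coexist while a lookup still touches only ASSOC entries.

```c
#include <stdint.h>

#define ASSOC 2   /* ways per set; 2- or 4-way is typical */

typedef struct SaEntry {
    uint64_t offset;
} SaEntry;

typedef struct SaCache {
    SaEntry *entries;    /* num_sets * ASSOC entries */
    int num_sets;
    int cluster_bits;
} SaCache;

static int sa_cache_get(SaCache *c, uint64_t offset)
{
    int set = (int)((offset >> c->cluster_bits) % c->num_sets);
    int base = set * ASSOC;

    /* Scan only the ways of this set, not the whole table. */
    for (int way = 0; way < ASSOC; way++) {
        if (c->entries[base + way].offset == offset) {
            return base + way;   /* hit */
        }
    }
    return -1;   /* miss: evict a victim within this set (e.g. LRU) */
}
```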

CC'ing Kevin, because it's his code.

Max

...

put:
use a 64-entry table to cache the recently fetched c->entries
(i.e., a cache for the cache); during put, first search this
64-entry table, and only fall back to searching c->entries on a miss.
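That put-side idea could be sketched like this (hypothetical names, not QEMU's): remember the last 64 (table pointer, index) pairs handed out by get in a small ring, so put can usually map a table pointer back to its entry with at most 64 comparisons, regardless of how large c->entries grows.

```c
#include <stdint.h>
#include <stddef.h>

#define MRU_SIZE 64

/* A small "cache for the cache": recently handed-out entries. */
typedef struct MruEntry {
    void *table;    /* pointer previously returned by get */
    int index;      /* its index in c->entries */
} MruEntry;

typedef struct Mru {
    MruEntry slots[MRU_SIZE];
    int next;       /* round-robin replacement cursor */
} Mru;

/* Called on the get path after an entry has been located. */
static void mru_remember(Mru *m, void *table, int index)
{
    m->slots[m->next].table = table;
    m->slots[m->next].index = index;
    m->next = (m->next + 1) % MRU_SIZE;
}

/* Called on the put path. Returns the remembered index, or -1 so the
 * caller falls back to the full search of c->entries. */
static int mru_lookup(Mru *m, void *table)
{
    for (int i = 0; i < MRU_SIZE; i++) {
        if (m->slots[i].table == table) {
            return m->slots[i].index;
        }
    }
    return -1;
}
```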

Any idea?

Thanks,
Zhang Haoyu





