Re: [Qemu-devel] [PATCH 0/3] memory: an optimization


From:	Gonglei (Arei)
Subject:	Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
Date:	Sat, 20 Feb 2016 10:34:27 +0000

Hi Paolo,


> -----Original Message-----
> From: Paolo Bonzini [mailto:address@hidden On Behalf Of Paolo Bonzini
> Sent: Saturday, February 20, 2016 5:48 PM
> To: Gonglei (Arei); address@hidden
> Cc: Huangpeng (Peter)
> Subject: Re: [PATCH 0/3] memory: an optimization
> 
> 
> 
> On 20/02/2016 03:35, Gonglei wrote:
> > Perf top tells me qemu_get_ram_ptr consumes too many CPU cycles.
> >>  22.56%  qemu-kvm                 [.] address_space_translate
> >>  13.29%  qemu-kvm                 [.] qemu_get_ram_ptr
> >>   4.71%  qemu-kvm                 [.] phys_page_find
> >>   4.43%  qemu-kvm                 [.] address_space_translate_internal
> >>   3.47%  libpthread-2.19.so       [.] __pthread_mutex_unlock_usercnt
> >>   3.08%  qemu-kvm                 [.] qemu_ram_addr_from_host
> >>   2.62%  qemu-kvm                 [.] address_space_map
> >>   2.61%  libc-2.19.so             [.] _int_malloc
> >>   2.58%  libc-2.19.so             [.] _int_free
> >>   2.38%  libc-2.19.so             [.] malloc
> >>   2.06%  libpthread-2.19.so       [.] pthread_mutex_lock
> >>   1.68%  libc-2.19.so             [.] malloc_consolidate
> >>   1.35%  libc-2.19.so             [.] __memcpy_sse2_unaligned
> >>   1.23%  qemu-kvm                 [.] lduw_le_phys
> >>   1.18%  qemu-kvm                 [.] find_next_zero_bit
> >>   1.02%  qemu-kvm                 [.] object_unref
> >
> > And Paolo suggested that we can get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer into the memory region,
> > instead of the ram_addr_t value. And after applying this change,
> > I got much better performance indeed.
> 
> What's the gain like?
> 
After rebasing onto the current master branch, I found that qemu_get_ram_ptr
is no longer one of the main consumers. But I still get some benefit from
this patch set.

Before this optimization:
  1.26%  qemu-kvm                  [.] qemu_get_ram_ptr
  0.89%  qemu-kvm                  [.] qemu_get_ram_block

With the patch set applied:
  0.87%  qemu-kvm                  [.] qemu_get_ram_ptr

Now the main consumers are (very different from qemu-2.3):
  6.38%  libpthread-2.19.so       [.] __pthread_mutex_unlock_usercnt
  6.02%  qemu-kvm                 [.] vring_desc_read.isra.26
  5.27%  qemu-kvm                 [.] address_space_map
  4.45%  qemu-kvm                 [.] qemu_ram_block_from_host
  4.13%  libpthread-2.19.so       [.] pthread_mutex_lock
  3.95%  libc-2.19.so             [.] _int_free
  3.46%  qemu-kvm                 [.] address_space_translate_internal
  3.40%  qemu-kvm                 [.] address_space_translate
  3.39%  qemu-kvm                 [.] phys_page_find
  3.37%  libc-2.19.so             [.] _int_malloc
  3.21%  qemu-kvm                 [.] stw_le_phys
  2.70%  libc-2.19.so             [.] malloc
  2.18%  qemu-kvm                 [.] lduw_le_phys
  2.15%  libc-2.19.so             [.] __memcpy_sse2_unaligned
  1.58%  qemu-kvm                 [.] address_space_write
  1.48%  libc-2.19.so             [.] memset
  1.22%  qemu-kvm                 [.] virtqueue_map_desc
  1.22%  libc-2.19.so             [.] __libc_calloc
  1.21%  qemu-kvm                 [.] virtio_notify

And the speed with the master branch plus my patch series:
 Testing AES-128-CBC cipher:
        Encrypting in chunks of 256 bytes: done. 506.27 MiB in 5.01 secs: 100.97 MiB/sec (2073684 packets)
        Encrypting in chunks of 256 bytes: done. 505.89 MiB in 5.02 secs: 100.85 MiB/sec (2072106 packets)
        Encrypting in chunks of 256 bytes: done. 505.94 MiB in 5.02 secs: 100.86 MiB/sec (2072343 packets)
        Encrypting in chunks of 256 bytes: done. 505.96 MiB in 5.02 secs: 100.87 MiB/sec (2072412 packets)
        Encrypting in chunks of 256 bytes: done. 505.92 MiB in 5.02 secs: 100.86 MiB/sec (2072241 packets)
        Encrypting in chunks of 256 bytes: done. 506.36 MiB in 5.02 secs: 100.95 MiB/sec (2074057 packets)
        Encrypting in chunks of 256 bytes: done. 506.35 MiB in 5.01 secs: 101.02 MiB/sec (2073998 packets)
        Encrypting in chunks of 256 bytes: done. 505.41 MiB in 5.01 secs: 100.92 MiB/sec (2070157 packets)

> I've not reviewed the patch in depth, but what I can say is that I like
> it a lot.  It only does the bare minimum needed to provide the
> optimization, but this also makes it very simple to understand.  More
> cleanups and further optimizations are possible (including removing
> mr->ram_addr completely), but your patches really do one thing and
> do it well.  Good job!
> 
Thanks!

Regards,
-Gonglei
