qemu-devel

Re: [Qemu-devel] [PATCH 0/3] memory: an optimization


From: Gonglei (Arei)
Subject: Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
Date: Sat, 20 Feb 2016 10:34:27 +0000

Hi Paolo,


> -----Original Message-----
> From: Paolo Bonzini [mailto:address@hidden On Behalf Of Paolo Bonzini
> Sent: Saturday, February 20, 2016 5:48 PM
> To: Gonglei (Arei); address@hidden
> Cc: Huangpeng (Peter)
> Subject: Re: [PATCH 0/3] memory: an optimization
> 
> 
> 
> On 20/02/2016 03:35, Gonglei wrote:
> > Perf top tells me qemu_get_ram_ptr consumes too many CPU cycles:
> >> 22.56%  qemu-kvm                 [.] address_space_translate
> >> 13.29%  qemu-kvm                 [.] qemu_get_ram_ptr
> >>  4.71%  qemu-kvm                 [.] phys_page_find
> >>  4.43%  qemu-kvm                 [.] address_space_translate_internal
> >>  3.47%  libpthread-2.19.so       [.] __pthread_mutex_unlock_usercnt
> >>  3.08%  qemu-kvm                 [.] qemu_ram_addr_from_host
> >>  2.62%  qemu-kvm                 [.] address_space_map
> >>  2.61%  libc-2.19.so             [.] _int_malloc
> >>  2.58%  libc-2.19.so             [.] _int_free
> >>  2.38%  libc-2.19.so             [.] malloc
> >>  2.06%  libpthread-2.19.so       [.] pthread_mutex_lock
> >>  1.68%  libc-2.19.so             [.] malloc_consolidate
> >>  1.35%  libc-2.19.so             [.] __memcpy_sse2_unaligned
> >>  1.23%  qemu-kvm                 [.] lduw_le_phys
> >>  1.18%  qemu-kvm                 [.] find_next_zero_bit
> >>  1.02%  qemu-kvm                 [.] object_unref
> >
> > And Paolo suggested that we could get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer in the memory region
> > instead of the ram_addr_t value. After applying this change,
> > I did get much better performance.
> 
> What's the gain like?
> 
After rebasing onto the current master branch, I found that qemu_get_ram_ptr
is no longer one of the main consumers. But I still get some benefit from
this patch set.

Before this optimization:
  1.26%  qemu-kvm                  [.] qemu_get_ram_ptr
  0.89%  qemu-kvm                  [.] qemu_get_ram_block

After applying the patch set:
  0.87%  qemu-kvm                  [.] qemu_get_ram_ptr
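As a rough reading of these two profiles, assume qemu_get_ram_block is reached only through qemu_get_ram_ptr, so the two entries can be added as one lookup cost. Keep in mind that perf percentages from separate runs are relative shares, not absolute times. Under that assumption, the combined share fell by 1.28 percentage points, roughly a 60% reduction:

```latex
\underbrace{1.26\% + 0.89\%}_{\text{before}} = 2.15\%
\;\longrightarrow\;
\underbrace{0.87\%}_{\text{after}},
\qquad
1 - \frac{0.87}{2.15} \approx 0.595
```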

Now the main consumers are (very different from qemu-2.3):
  6.38%  libpthread-2.19.so       [.] __pthread_mutex_unlock_usercnt
  6.02%  qemu-kvm                 [.] vring_desc_read.isra.26
  5.27%  qemu-kvm                 [.] address_space_map
  4.45%  qemu-kvm                 [.] qemu_ram_block_from_host
  4.13%  libpthread-2.19.so       [.] pthread_mutex_lock
  3.95%  libc-2.19.so             [.] _int_free
  3.46%  qemu-kvm                 [.] address_space_translate_internal
  3.40%  qemu-kvm                 [.] address_space_translate
  3.39%  qemu-kvm                 [.] phys_page_find
  3.37%  libc-2.19.so             [.] _int_malloc
  3.21%  qemu-kvm                 [.] stw_le_phys
  2.70%  libc-2.19.so             [.] malloc
  2.18%  qemu-kvm                 [.] lduw_le_phys
  2.15%  libc-2.19.so             [.] __memcpy_sse2_unaligned
  1.58%  qemu-kvm                 [.] address_space_write
  1.48%  libc-2.19.so             [.] memset
  1.22%  qemu-kvm                 [.] virtqueue_map_desc
  1.22%  libc-2.19.so             [.] __libc_calloc
  1.21%  qemu-kvm                 [.] virtio_notify

And the throughput on the master branch with my patch series applied:
 Testing AES-128-CBC cipher:
        Encrypting in chunks of 256 bytes: done. 506.27 MiB in 5.01 secs: 100.97 MiB/sec (2073684 packets)
        Encrypting in chunks of 256 bytes: done. 505.89 MiB in 5.02 secs: 100.85 MiB/sec (2072106 packets)
        Encrypting in chunks of 256 bytes: done. 505.94 MiB in 5.02 secs: 100.86 MiB/sec (2072343 packets)
        Encrypting in chunks of 256 bytes: done. 505.96 MiB in 5.02 secs: 100.87 MiB/sec (2072412 packets)
        Encrypting in chunks of 256 bytes: done. 505.92 MiB in 5.02 secs: 100.86 MiB/sec (2072241 packets)
        Encrypting in chunks of 256 bytes: done. 506.36 MiB in 5.02 secs: 100.95 MiB/sec (2074057 packets)
        Encrypting in chunks of 256 bytes: done. 506.35 MiB in 5.01 secs: 101.02 MiB/sec (2073998 packets)
        Encrypting in chunks of 256 bytes: done. 505.41 MiB in 5.01 secs: 100.92 MiB/sec (2070157 packets)

> I've not reviewed the patch in depth, but what I can say is that I like
> it a lot.  It only does the bare minimum needed to provide the
> optimization, but this also makes it very simple to understand.  More
> cleanups and further optimizations are possible (including removing
> mr->ram_addr completely), but your patches really do one thing and
> does it well.  Good job!
> 
Thanks!

Regards,
-Gonglei


