Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
From: Gonglei (Arei)
Subject: Re: [Qemu-devel] [PATCH 0/3] memory: an optimization
Date: Sat, 20 Feb 2016 10:34:27 +0000
Hi Paolo,
> -----Original Message-----
> From: Paolo Bonzini [mailto:address@hidden] On Behalf Of Paolo Bonzini
> Sent: Saturday, February 20, 2016 5:48 PM
> To: Gonglei (Arei); address@hidden
> Cc: Huangpeng (Peter)
> Subject: Re: [PATCH 0/3] memory: an optimization
>
>
>
> On 20/02/2016 03:35, Gonglei wrote:
> > Perf top tells me qemu_get_ram_ptr consumes too many CPU cycles.
> >> 22.56%  qemu-kvm            [.] address_space_translate
> >> 13.29%  qemu-kvm            [.] qemu_get_ram_ptr
> >>  4.71%  qemu-kvm            [.] phys_page_find
> >>  4.43%  qemu-kvm            [.] address_space_translate_internal
> >>  3.47%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
> >>  3.08%  qemu-kvm            [.] qemu_ram_addr_from_host
> >>  2.62%  qemu-kvm            [.] address_space_map
> >>  2.61%  libc-2.19.so        [.] _int_malloc
> >>  2.58%  libc-2.19.so        [.] _int_free
> >>  2.38%  libc-2.19.so        [.] malloc
> >>  2.06%  libpthread-2.19.so  [.] pthread_mutex_lock
> >>  1.68%  libc-2.19.so        [.] malloc_consolidate
> >>  1.35%  libc-2.19.so        [.] __memcpy_sse2_unaligned
> >>  1.23%  qemu-kvm            [.] lduw_le_phys
> >>  1.18%  qemu-kvm            [.] find_next_zero_bit
> >>  1.02%  qemu-kvm            [.] object_unref
> >
> > Paolo suggested that we could get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer in the memory region
> > instead of the ram_addr_t value. After applying this change,
> > I did get much better performance.
>
> What's the gain like?
>
After rebasing onto the current master branch, I found that qemu_get_ram_ptr is no longer one of the main consumers, but this patch set still gives some benefit.
Before this optimization:
1.26% qemu-kvm [.] qemu_get_ram_ptr
0.89% qemu-kvm [.] qemu_get_ram_block
With the patch set applied:
0.87% qemu-kvm [.] qemu_get_ram_ptr
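Roughly speaking, qemu_get_ram_ptr has to find the RAMBlock containing a ram_addr_t before it can return a host pointer, which is presumably what the qemu_get_ram_block samples above reflect. A minimal C sketch of why caching the block pointer helps, under simplified assumptions: `ToyRAMBlock`, `ToyMemoryRegion`, `lookup_block_by_offset`, `get_ram_ptr_by_offset` and `get_ram_ptr_cached` are hypothetical stand-ins, not QEMU's real types or functions; the lookup is a plain linear search, any caching QEMU layers on top is left out, and the region is assumed to start at the beginning of its block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for QEMU's RAMBlock and MemoryRegion. */
typedef uint64_t toy_ram_addr_t;

typedef struct ToyRAMBlock {
    toy_ram_addr_t offset;      /* start of the block in ram_addr_t space */
    toy_ram_addr_t length;      /* size of the block in bytes */
    uint8_t *host;              /* host memory backing the block */
    struct ToyRAMBlock *next;   /* next block in the global list */
} ToyRAMBlock;

typedef struct ToyMemoryRegion {
    toy_ram_addr_t ram_addr;    /* old scheme: region knows only its offset */
    ToyRAMBlock *ram_block;     /* new scheme: region caches its block */
} ToyMemoryRegion;

static ToyRAMBlock *block_list; /* head of the global block list */

/* Old path: linear search for the block that contains addr. */
static ToyRAMBlock *lookup_block_by_offset(toy_ram_addr_t addr)
{
    for (ToyRAMBlock *b = block_list; b != NULL; b = b->next) {
        /* Unsigned subtraction wraps if addr < offset, so one compare suffices. */
        if (addr - b->offset < b->length) {
            return b;
        }
    }
    return NULL;
}

/* Old-style translation: every call pays for the search. */
static uint8_t *get_ram_ptr_by_offset(const ToyMemoryRegion *mr,
                                      toy_ram_addr_t off)
{
    toy_ram_addr_t addr = mr->ram_addr + off;
    ToyRAMBlock *b = lookup_block_by_offset(addr);
    return b ? b->host + (addr - b->offset) : NULL;
}

/* New-style translation: no search; assumes (hypothetically) that the
 * region starts at the beginning of its cached block. */
static uint8_t *get_ram_ptr_cached(const ToyMemoryRegion *mr,
                                   toy_ram_addr_t off)
{
    assert(mr->ram_block != NULL && off < mr->ram_block->length);
    return mr->ram_block->host + off;
}
```

Both paths yield the same host address; the difference is that the offset-only scheme repeats the block search on every access, while the cached pointer reduces translation to one addition, at the cost of keeping `ram_block` consistent with the region.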
The main consumers are now (quite different from qemu-2.3):
6.38%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
6.02%  qemu-kvm            [.] vring_desc_read.isra.26
5.27%  qemu-kvm            [.] address_space_map
4.45%  qemu-kvm            [.] qemu_ram_block_from_host
4.13%  libpthread-2.19.so  [.] pthread_mutex_lock
3.95%  libc-2.19.so        [.] _int_free
3.46%  qemu-kvm            [.] address_space_translate_internal
3.40%  qemu-kvm            [.] address_space_translate
3.39%  qemu-kvm            [.] phys_page_find
3.37%  libc-2.19.so        [.] _int_malloc
3.21%  qemu-kvm            [.] stw_le_phys
2.70%  libc-2.19.so        [.] malloc
2.18%  qemu-kvm            [.] lduw_le_phys
2.15%  libc-2.19.so        [.] __memcpy_sse2_unaligned
1.58%  qemu-kvm            [.] address_space_write
1.48%  libc-2.19.so        [.] memset
1.22%  qemu-kvm            [.] virtqueue_map_desc
1.22%  libc-2.19.so        [.] __libc_calloc
1.21%  qemu-kvm            [.] virtio_notify
And here is the throughput on the master branch with my patch series applied:
Testing AES-128-CBC cipher:
Encrypting in chunks of 256 bytes: done. 506.27 MiB in 5.01 secs: 100.97 MiB/sec (2073684 packets)
Encrypting in chunks of 256 bytes: done. 505.89 MiB in 5.02 secs: 100.85 MiB/sec (2072106 packets)
Encrypting in chunks of 256 bytes: done. 505.94 MiB in 5.02 secs: 100.86 MiB/sec (2072343 packets)
Encrypting in chunks of 256 bytes: done. 505.96 MiB in 5.02 secs: 100.87 MiB/sec (2072412 packets)
Encrypting in chunks of 256 bytes: done. 505.92 MiB in 5.02 secs: 100.86 MiB/sec (2072241 packets)
Encrypting in chunks of 256 bytes: done. 506.36 MiB in 5.02 secs: 100.95 MiB/sec (2074057 packets)
Encrypting in chunks of 256 bytes: done. 506.35 MiB in 5.01 secs: 101.02 MiB/sec (2073998 packets)
Encrypting in chunks of 256 bytes: done. 505.41 MiB in 5.01 secs: 100.92 MiB/sec (2070157 packets)
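The run figures are internally consistent: each MiB total equals packets × 256 bytes / 2^20 (for example, 2073684 × 256 = 530863104 bytes, about 506.27 MiB), and the eight per-run rates average about 100.91 MiB/sec, ranging from 100.85 to 101.02. A small C helper for checking such runs; `packets_to_mib`, `mean` and `run_rates` are hypothetical names, not part of the benchmark tool:

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 256.0         /* chunk size reported by the benchmark */
#define MIB (1024.0 * 1024.0)     /* bytes per MiB */

/* Total data encrypted, in MiB, for a given packet (chunk) count. */
static double packets_to_mib(uint64_t packets)
{
    return (double)packets * CHUNK_BYTES / MIB;
}

/* Arithmetic mean of n throughput samples; returns 0.0 for n == 0. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return n ? sum / (double)n : 0.0;
}

/* MiB/sec figures copied from the eight runs above. */
static const double run_rates[] = {
    100.97, 100.85, 100.86, 100.87, 100.86, 100.95, 101.02, 100.92,
};
```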
> I've not reviewed the patch in depth, but what I can say is that I like
> it a lot. It only does the bare minimum needed to provide the
> optimization, but this also makes it very simple to understand. More
> cleanups and further optimizations are possible (including removing
> mr->ram_addr completely), but your patches really do one thing and
> does it well. Good job!
>
Thanks!
Regards,
-Gonglei