
Re: [Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86
Date: Fri, 15 Apr 2016 20:12:20 +0100
User-agent: mu4e 0.9.17; emacs 25.0.92.6

Alex Bennée <address@hidden> writes:

> Hi,
>
> This series finally completes the re-build of Fred's multi_tcg_v8 tree
> by enabling MTTCG for armv7 guests on x86 hosts. This applies on top
> of the previous series:
<snip>
>
> Benchmarks
> ==========
>
> The benchmark is a simple boot and build test which builds stress-ng
> with -j ${NR_CPUS} and shuts down to facilitate easy repetition.
>
> arm-softmmu/qemu-system-arm -machine type=virt -display none -m 4096 \
>     -cpu cortex-a15 -serial telnet:127.0.0.1:4444 \
>     -monitor stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>     -device virtio-net-device,netdev=unet \
>     -drive file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none \
>     -device virtio-blk-device,drive=myblock \
>     -append "console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1" \
>     -kernel /home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img
>
>
> | -smp 1 (mttcg=off) | -smp 4 (mttcg=off) | -smp 4 (mttcg=on) |
> |--------------------+--------------------+-------------------|
> | 301.60 (5 runs)    | 312.27 (4 runs)    |  573.26 (5 runs)  |
>
> As the results show, MTTCG performance is currently worse than the
> single-threaded version. However, this tree doesn't have the lockless
> tb_find_fast, which means the lock must be taken every time execution
> moves from one page to the next. There is still work to be done on
> performance ;-)
>
> Alex Bennée (5):
>   qemu-thread: add simple test-and-set spinlock
>   atomic: introduce atomic_dec_fetch.
>   atomic: introduce cmpxchg_bool
>   cpus: pass CPUState to run_on_cpu helpers
>   cpus: default MTTCG to on for 32 bit ARM on x86
>
> KONRAD Frederic (5):
>   cpus: introduce async_safe_run_on_cpu.
>   cputlb: introduce tlb_flush_* async work.
>   translate-all: introduces tb_flush_safe.
>   arm: use tlb_flush_page_all for tlbimva[a]
>   arm: atomically check the exclusive value in a STREX
>
> Paolo Bonzini (1):
>   include: move CPU-related definitions out of qemu-common.h
>
> Sergey Fedorov (1):
>   tcg/i386: Make direct jump patching thread-safe
>
>  cpu-exec-common.c         |   1 +
>  cpu-exec.c                |  11 ++++
>  cpus.c                    | 137 +++++++++++++++++++++++++++++++++++++++++-----
>  cputlb.c                  |  61 ++++++++++++++++-----
>  hw/i386/kvm/apic.c        |   3 +-
>  hw/i386/kvmvapic.c        |   8 +--
>  hw/ppc/ppce500_spin.c     |   3 +-
>  hw/ppc/spapr.c            |   6 +-
>  hw/ppc/spapr_hcall.c      |  12 ++--
>  include/exec/exec-all.h   |   7 ++-
>  include/qemu-common.h     |  24 --------
>  include/qemu/atomic.h     |  15 +++++
>  include/qemu/processor.h  |  28 ++++++++++
>  include/qemu/thread.h     |  34 ++++++++++++
>  include/qemu/timer.h      |   1 +
>  include/qom/cpu.h         |  34 +++++++++++-
>  include/sysemu/cpus.h     |  13 +++++

As suggested by treblig, I also ran a more purely CPU-bound task (pigz
compression of a kernel tarball):

command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
'-machine', 'type=virt', '-display', 'none', '-m', '4096',
'-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22',
'-device', 'virtio-net-device,netdev=unet',
'-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock',
'-append', 'console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-pigz.service',
'-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img',
'-smp', '1', '-tcg', 'mttcg=off']
run 1: ret=0 (PASS), time=136.379699 (1/1)
run 2: ret=0 (PASS), time=135.358848 (2/2)
run 3: ret=0 (PASS), time=135.708094 (3/3)
run 4: ret=0 (PASS), time=136.076002 (4/4)
run 5: ret=0 (PASS), time=137.863306 (5/5)
command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
'-machine', 'type=virt', '-display', 'none', '-m', '4096',
'-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22',
'-device', 'virtio-net-device,netdev=unet',
'-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock',
'-append', 'console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-pigz.service',
'-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img',
'-smp', '4', '-tcg', 'mttcg=on']
run 1: ret=0 (PASS), time=142.524636 (1/1)
run 2: ret=0 (PASS), time=139.960601 (2/2)
run 3: ret=0 (PASS), time=137.956633 (3/3)
run 4: ret=0 (PASS), time=139.699225 (4/4)
run 5: ret=0 (PASS), time=143.365373 (5/5)

This is much closer to parity, but of course we'd actually want the
-smp 4 MTTCG run to be faster.
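
For anyone who wants to compare the two pigz configurations numerically, here is a minimal sketch that summarises the run times logged above. The variable and function names are illustrative only and are not part of the benchmark script, and the times are assumed to be wall-clock seconds.

```python
from statistics import mean, stdev

# Times copied verbatim from the run logs above (assumed to be seconds).
smp1_mttcg_off = [136.379699, 135.358848, 135.708094, 136.076002, 137.863306]
smp4_mttcg_on = [142.524636, 139.960601, 137.956633, 139.699225, 143.365373]


def summarise(times):
    """Return (mean, sample standard deviation) for a list of run times."""
    return mean(times), stdev(times)


off_mean, off_dev = summarise(smp1_mttcg_off)
on_mean, on_dev = summarise(smp4_mttcg_on)

# A ratio above 1.0 means the -smp 4 mttcg=on runs took longer on average.
ratio = on_mean / off_mean

print(f"-smp 1 mttcg=off: {off_mean:.2f} +/- {off_dev:.2f}")
print(f"-smp 4 mttcg=on:  {on_mean:.2f} +/- {on_dev:.2f}")
print(f"mttcg=on / mttcg=off: {ratio:.3f}")
```

With only five runs per configuration the standard deviation is a rough guide at best, so the ratio should be read as an indication rather than a precise measurement.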

--
Alex Bennée


