Re: [Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86
From: Alex Bennée
Date: Fri, 15 Apr 2016 20:12:20 +0100
User-agent: mu4e 0.9.17; emacs 25.0.92.6
Alex Bennée <address@hidden> writes:
> Hi,
>
> This series finally completes the re-build of Fred's multi_tcg_v8 tree
> by enabling MTTCG for armv7 guests on x86 hosts. This applies on top
> of the previous series:
<snip>
>
> Benchmarks
> ==========
>
> The benchmark is a simple boot and build test which builds stress-ng
> with -j ${NR_CPUS} and shuts down to facilitate easy repetition.
>
> arm-softmmu/qemu-system-arm -machine type=virt -display none -m 4096 \
>   -cpu cortex-a15 -serial telnet:127.0.0.1:4444 \
>   -monitor stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>   -device virtio-net-device,netdev=unet \
>   -drive file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none \
>   -device virtio-blk-device,drive=myblock \
>   -append "console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1" \
>   -kernel /home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img
>
>
> | -smp 1 (mttcg=off) | -smp 4 (mttcg=off) | -smp 4 (mttcg=on) |
> |--------------------+--------------------+-------------------|
> | 301.60 (5 runs) | 312.27 (4 runs) | 573.26 (5 runs) |
>
> As the results show, MTTCG currently performs worse than the
> single-threaded version. However, this tree doesn't have the
> lockless tb_find_fast, which means the lock has to be taken every
> time execution moves from one page to the next. There is still
> work to be done for performance ;-)
>
> Alex Bennée (5):
> qemu-thread: add simple test-and-set spinlock
> atomic: introduce atomic_dec_fetch.
> atomic: introduce cmpxchg_bool
> cpus: pass CPUState to run_on_cpu helpers
> cpus: default MTTCG to on for 32 bit ARM on x86
>
> KONRAD Frederic (5):
> cpus: introduce async_safe_run_on_cpu.
> cputlb: introduce tlb_flush_* async work.
> translate-all: introduces tb_flush_safe.
> arm: use tlb_flush_page_all for tlbimva[a]
> arm: atomically check the exclusive value in a STREX
>
> Paolo Bonzini (1):
> include: move CPU-related definitions out of qemu-common.h
>
> Sergey Fedorov (1):
> tcg/i386: Make direct jump patching thread-safe
>
> cpu-exec-common.c | 1 +
> cpu-exec.c | 11 ++++
> cpus.c | 137 +++++++++++++++++++++++++++++++++++++++++-----
> cputlb.c | 61 ++++++++++++++++-----
> hw/i386/kvm/apic.c | 3 +-
> hw/i386/kvmvapic.c | 8 +--
> hw/ppc/ppce500_spin.c | 3 +-
> hw/ppc/spapr.c | 6 +-
> hw/ppc/spapr_hcall.c | 12 ++--
> include/exec/exec-all.h | 7 ++-
> include/qemu-common.h | 24 --------
> include/qemu/atomic.h | 15 +++++
> include/qemu/processor.h | 28 ++++++++++
> include/qemu/thread.h | 34 ++++++++++++
> include/qemu/timer.h | 1 +
> include/qom/cpu.h | 34 +++++++++++-
> include/sysemu/cpus.h | 13 +++++
As suggested by treblig, I also ran a more purely CPU-bound task (pigz
compression of a kernel tarball):
command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
'-machine', 'type=virt', '-display', 'none', '-m', '4096', '-cpu',
'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
'virtio-net-device,netdev=unet', '-drive',
'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
root=/dev/vda1 systemd.unit=benchmark-pigz.service', '-kernel',
'/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
'1', '-tcg', 'mttcg=off']
run 1: ret=0 (PASS), time=136.379699 (1/1)
run 2: ret=0 (PASS), time=135.358848 (2/2)
run 3: ret=0 (PASS), time=135.708094 (3/3)
run 4: ret=0 (PASS), time=136.076002 (4/4)
run 5: ret=0 (PASS), time=137.863306 (5/5)
command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
'-machine', 'type=virt', '-display', 'none', '-m', '4096', '-cpu',
'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
'virtio-net-device,netdev=unet', '-drive',
'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
root=/dev/vda1 systemd.unit=benchmark-pigz.service', '-kernel',
'/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
'4', '-tcg', 'mttcg=on']
run 1: ret=0 (PASS), time=142.524636 (1/1)
run 2: ret=0 (PASS), time=139.960601 (2/2)
run 3: ret=0 (PASS), time=137.956633 (3/3)
run 4: ret=0 (PASS), time=139.699225 (4/4)
run 5: ret=0 (PASS), time=143.365373 (5/5)
That is much closer to parity, but of course we'd actually want it to be faster.
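
To put a number on the gap, the ten pigz timings above can be averaged with a short script. This is only a sketch: the variable and function names are illustrative and not part of the benchmark harness, and it assumes the reported times are wall-clock seconds, which the run output does not state.

```python
from statistics import mean, stdev

# pigz timings copied from the run output above (units assumed to be seconds)
smp1_mttcg_off = [136.379699, 135.358848, 135.708094, 136.076002, 137.863306]
smp4_mttcg_on = [142.524636, 139.960601, 137.956633, 139.699225, 143.365373]


def summarise(label, times):
    """Print and return the mean and sample standard deviation of a set of runs."""
    m, s = mean(times), stdev(times)
    print(f"{label}: mean={m:.2f} stdev={s:.2f} (n={len(times)})")
    return m, s


off_mean, _ = summarise("-smp 1, mttcg=off", smp1_mttcg_off)
on_mean, _ = summarise("-smp 4, mttcg=on", smp4_mttcg_on)

# Relative slowdown of the MTTCG run versus the single-threaded baseline
slowdown = on_mean / off_mean - 1.0
print(f"mttcg=on is {slowdown:.1%} slower on this workload")
```

On these numbers the means come out at roughly 136.3 and 140.7, so MTTCG is about 3% slower on pigz, compared with about 1.9x slower (573.26 vs 301.60) on the build test.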
--
Alex Bennée