[Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86
From: Alex Bennée
Subject: [Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86
Date: Fri, 15 Apr 2016 15:23:38 +0100
Hi,
This series finally completes the re-build of Fred's multi_tcg_v8 tree
by enabling MTTCG for armv7 guests on x86 hosts. This applies on top
of the previous series:
- [RFC v2 00/11] Base enabling patches for MTTCG
You can find the final tree at:
https://github.com/stsquad/qemu/tree/mttcg/enable-mttcg-for-armv7-v1
which builds on:
https://github.com/stsquad/qemu/tree/mttcg/base-patches-v2
which includes Sergey's:
https://github.com/stsquad/qemu/tree/mttcg/tb-and-tcg-cleanups
I've tested this with a Debian Jessie guest as well as my extensive
MTTCG focused torture tests built on kvm-unit-tests.
Series Breakdown
================
The first 3 patches have been cherry-picked from other series and can
be skipped while reviewing. Paolo's is just some simple header
house-cleaning that made the run_on_cpu changes easier. Sergey's
thread-safe patching is being reviewed elsewhere but does prevent a
crash I could trigger with heavy TB invalidation. The third is a
squash patch from Emilio's QHT tree which provides a QemuSpinLock
which is used by the atomic patch later on.
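For readers who have not looked at the QHT tree, a test-and-set
spinlock can be sketched in a few lines of C11. This is only an
illustration of the technique: the SketchSpinLock type and the
sketch_spin_* names are made up for this mail and are not the
QemuSpinLock interface from the patch.

```c
#include <stdatomic.h>

/* Hypothetical names for illustration; not the API added by the series. */
typedef struct {
    atomic_flag locked;
} SketchSpinLock;

static inline void sketch_spin_init(SketchSpinLock *l)
{
    atomic_flag_clear(&l->locked);
}

static inline void sketch_spin_lock(SketchSpinLock *l)
{
    /* test-and-set returns the previous state: spin while it was set */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        /* busy-wait; a real lock would add a CPU relax hint here */
    }
}

/* Returns 1 if the lock was taken, 0 if somebody else holds it. */
static inline int sketch_spin_trylock(SketchSpinLock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static inline void sketch_spin_unlock(SketchSpinLock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The acquire/release pairing is what makes writes done under the lock
visible to the next holder.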
The next 2 introduce a few more atomic primitives which I use later
on in the series.
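As a rough guide to what the two primitives are for, here is a sketch
using C11 atomics. The semantics are inferred from the patch titles
(decrement returning the new value, compare-and-swap returning
success as a bool), so treat both the sketch_* names and the exact
behaviour as assumptions rather than the definitions in atomic.h.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helpers; semantics assumed from the patch titles. */

/* Decrement *p and return the NEW value. */
static inline int sketch_atomic_dec_fetch(atomic_int *p)
{
    /* atomic_fetch_sub returns the old value, so adjust by one */
    return atomic_fetch_sub(p, 1) - 1;
}

/* Store new_val only if *p still equals old; report whether we won. */
static inline bool sketch_cmpxchg_bool(atomic_int *p, int old, int new_val)
{
    return atomic_compare_exchange_strong(p, &old, new_val);
}
```

A dec-and-fetch is handy for "last one out" counters, where the
caller that sees the result reach 0 does the follow-up work.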
The next 2 patches are concerned with async work. The first cleans up
the existing async work to pass CPUState, which minimises the need to
malloc structures later on. The new async_run_safe_work_on_cpu has
been changed a bit from Fred's tree: it operates from a single queue
in an effort to ensure all deferred operations are handled in a
timely manner. There has also been an attempt to minimise the amount
of dynamic allocation by using a pre-allocated array combined with
the CPUState passing introduced by the earlier patch.
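The shape of the single-queue idea can be sketched as below. All of
the names (SketchCPU, SketchSafeQueue, sketch_*) are hypothetical and
the locking and the "wait until every vCPU has left generated code"
step are deliberately left out, so this only shows the fixed-size
array and the CPU pointer being handed to the callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the structures used by the series. */
typedef struct SketchCPU {
    int index;
    int flush_count;
} SketchCPU;

typedef void (*sketch_work_fn)(SketchCPU *cpu, void *data);

typedef struct {
    sketch_work_fn fn;
    SketchCPU *cpu;
    void *data;
} SketchWorkItem;

#define SKETCH_MAX_WORK 16

/* One shared queue backed by a pre-allocated array: no malloc per item. */
typedef struct {
    SketchWorkItem items[SKETCH_MAX_WORK];
    size_t count;
} SketchSafeQueue;

/* Queue work for later; returns false if the array is full. */
static bool sketch_queue_safe_work(SketchSafeQueue *q, SketchCPU *cpu,
                                   sketch_work_fn fn, void *data)
{
    if (q->count == SKETCH_MAX_WORK) {
        return false;
    }
    q->items[q->count++] = (SketchWorkItem){ fn, cpu, data };
    return true;
}

/* Run everything queued; assumed to be called at a safe point. */
static size_t sketch_flush_safe_work(SketchSafeQueue *q)
{
    size_t i;
    /* re-read count each time so work queued by a callback also runs */
    for (i = 0; i < q->count; i++) {
        q->items[i].fn(q->items[i].cpu, q->items[i].data);
    }
    q->count = 0;
    return i;
}

/* Example work item: pretend to flush this CPU's TLB. */
static void sketch_count_flush(SketchCPU *cpu, void *data)
{
    (void)data;
    cpu->flush_count++;
}
```

Because the callback receives the CPU directly, simple work items
such as the example flush need no separately allocated argument.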
The next 2 patches take advantage of this functionality: a few cputlb
flush routines as well as the translation buffer overflow case.
The remaining patches involve architecture-specific changes to ensure the
ARM flush operations use the async'd cputlb functions. The final STREX
patches are a temporary fix for atomicity which I've put at the end of
the series so Alvise can easily drop them for his LL/SC based approach.
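To illustrate what "atomically check the exclusive value" means, here
is one way a store-exclusive could be emulated with a
compare-and-swap. This is a sketch under my own assumptions, not
necessarily how the patch does it: SketchMonitor and the sketch_*
functions are hypothetical names, and only the 32-bit case is shown.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive monitor state. */
typedef struct {
    _Atomic(uint32_t) *exclusive_addr;  /* NULL when not armed */
    uint32_t exclusive_val;             /* value seen by the load */
} SketchMonitor;

/* LDREX: remember which address was loaded and what it contained. */
static uint32_t sketch_ldrex(SketchMonitor *m, _Atomic(uint32_t) *addr)
{
    m->exclusive_addr = addr;
    m->exclusive_val = atomic_load(addr);
    return m->exclusive_val;
}

/* STREX: returns 0 on success and 1 on failure, as the instruction does. */
static int sketch_strex(SketchMonitor *m, _Atomic(uint32_t) *addr,
                        uint32_t newval)
{
    int status = 1;

    if (m->exclusive_addr == addr) {
        uint32_t expected = m->exclusive_val;
        /* check and store happen as one atomic step */
        if (atomic_compare_exchange_strong(addr, &expected, newval)) {
            status = 0;
        }
    }
    m->exclusive_addr = NULL;  /* the monitor is cleared either way */
    return status;
}
```

Note that a value comparison like this cannot notice another thread
writing the location and then restoring the original value, which is
one reason to regard this style of fix as temporary.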
The last patch makes MTTCG the default for the common case of running
ARMv7 on an x86 backend. I know there is debate about the benefit of
having a control knob for MTTCG but certainly while developing it is
handy to have. There are also cases where MTTCG will be incompatible
with other features such as record/replay.
Benchmarks
==========
The benchmark is a simple boot and build test which builds stress-ng
with -j ${NR_CPUS} and shuts down to facilitate easy repetition.
arm-softmmu/qemu-system-arm -machine type=virt -display none -m 4096 \
    -cpu cortex-a15 -serial telnet:127.0.0.1:4444 \
    -monitor stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -append "console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1" \
    -kernel /home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img
| -smp 1 (mttcg=off) | -smp 4 (mttcg=off) | -smp 4 (mttcg=on) |
|--------------------+--------------------+-------------------|
| 301.60 (5 runs) | 312.27 (4 runs) | 573.26 (5 runs) |
As the results show, the performance for mttcg is currently worse than
the single-threaded version. However, this tree doesn't have the
lockless tb_find_fast, which means the lock needs to be taken every
time there is a transition from one page to the next. There is still
work to be done for performance ;-)
Alex Bennée (5):
  qemu-thread: add simple test-and-set spinlock
  atomic: introduce atomic_dec_fetch.
  atomic: introduce cmpxchg_bool
  cpus: pass CPUState to run_on_cpu helpers
  cpus: default MTTCG to on for 32 bit ARM on x86

KONRAD Frederic (5):
  cpus: introduce async_safe_run_on_cpu.
  cputlb: introduce tlb_flush_* async work.
  translate-all: introduces tb_flush_safe.
  arm: use tlb_flush_page_all for tlbimva[a]
  arm: atomically check the exclusive value in a STREX

Paolo Bonzini (1):
  include: move CPU-related definitions out of qemu-common.h

Sergey Fedorov (1):
  tcg/i386: Make direct jump patching thread-safe

cpu-exec-common.c | 1 +
cpu-exec.c | 11 ++++
cpus.c | 137 +++++++++++++++++++++++++++++++++++++++++-----
cputlb.c | 61 ++++++++++++++++-----
hw/i386/kvm/apic.c | 3 +-
hw/i386/kvmvapic.c | 8 +--
hw/ppc/ppce500_spin.c | 3 +-
hw/ppc/spapr.c | 6 +-
hw/ppc/spapr_hcall.c | 12 ++--
include/exec/exec-all.h | 7 ++-
include/qemu-common.h | 24 --------
include/qemu/atomic.h | 15 +++++
include/qemu/processor.h | 28 ++++++++++
include/qemu/thread.h | 34 ++++++++++++
include/qemu/timer.h | 1 +
include/qom/cpu.h | 34 +++++++++++-
include/sysemu/cpus.h | 13 +++++
kvm-all.c | 20 +++----
stubs/cpu-get-icount.c | 1 +
target-arm/cpu.c | 21 +++++++
target-arm/cpu.h | 6 ++
target-arm/helper.c | 28 ++++++----
target-arm/helper.h | 6 ++
target-arm/op_helper.c | 130 ++++++++++++++++++++++++++++++++++++++++++-
target-arm/translate.c | 96 ++++++--------------------------
target-i386/helper.c | 3 +-
target-i386/kvm.c | 6 +-
target-s390x/cpu.c | 4 +-
target-s390x/cpu.h | 7 +--
tcg/i386/tcg-target.inc.c | 17 ++++++
translate-all.c | 34 +++++++++---
translate-common.c | 1 +
vl.c | 1 +
33 files changed, 582 insertions(+), 197 deletions(-)
create mode 100644 include/qemu/processor.h
--
2.7.4