
[Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86

From: Alex Bennée
Subject: [Qemu-devel] [RFC v1 00/12] Enable MTTCG for 32 bit arm on x86
Date: Fri, 15 Apr 2016 15:23:39 +0100


This finally completes the rebuild of Fred's multi_tcg_v8 tree by
enabling MTTCG for ARMv7 guests on x86 hosts. It applies on top of
the previous series:

  - [RFC v2 00/11] Base enabling patches for MTTCG

You can find the final tree at:


which builds on:


which includes Sergey's:


I've tested this with a Debian Jessie guest as well as with my
extensive MTTCG-focused torture tests built on kvm-unit-tests.

Series Breakdown

The first 3 patches have been cherry-picked from other series and can
be skipped while reviewing. Paolo's is just some simple header
house-cleaning that made the run_on_cpu changes easier. Sergey's
thread-safe jump patching is being reviewed elsewhere but does prevent
a crash I could provoke with heavy TB invalidation. The last of the
three is a squashed patch from Emilio's QHT tree which provides a
QemuSpinLock, used by the atomic patch later on.
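For readers unfamiliar with the primitive, a simple test-and-set spinlock can be sketched with GCC's `__sync`/`__atomic` builtins as below. This is a minimal illustration, not the QemuSpinLock code from Emilio's tree: the `DemoSpin` type and `demo_spin_*` functions are hypothetical names, and the x86 `PAUSE` hint stands in for whatever `cpu_relax()` the series provides.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a QemuSpinLock-style lock: 0 = free, 1 = held. */
typedef struct DemoSpin {
    int value;
} DemoSpin;

/* Busy-wait hint: emits PAUSE on x86 hosts, no-op elsewhere. */
static inline void demo_cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();
#endif
}

static inline void demo_spin_init(DemoSpin *s)
{
    __atomic_store_n(&s->value, 0, __ATOMIC_RELEASE);
}

/* Test-and-test-and-set: the atomic exchange (an acquire barrier) is only
 * retried once a relaxed load sees the lock free, so waiters spin on their
 * cached copy of the line instead of hammering it with writes. */
static inline void demo_spin_lock(DemoSpin *s)
{
    while (__sync_lock_test_and_set(&s->value, 1)) {
        while (__atomic_load_n(&s->value, __ATOMIC_RELAXED)) {
            demo_cpu_relax();
        }
    }
}

/* Single attempt; returns true if the lock was acquired. */
static inline bool demo_spin_trylock(DemoSpin *s)
{
    return __sync_lock_test_and_set(&s->value, 1) == 0;
}

static inline void demo_spin_unlock(DemoSpin *s)
{
    assert(__atomic_load_n(&s->value, __ATOMIC_RELAXED) == 1); /* must be held */
    __sync_lock_release(&s->value); /* stores 0 with release semantics */
}
```

The inner read-only loop is the usual refinement over a naive test-and-set loop; it keeps contended waiters from generating constant cache-line invalidations.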

The next 2 patches introduce a few more atomic primitives which I use
later in the series.
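As a rough illustration of what these two primitives mean, here is a sketch built directly on GCC builtins. The `demo_` names are hypothetical; the series adds its own `atomic_dec_fetch` and `cmpxchg_bool`, whose exact definitions may differ from this.

```c
#include <assert.h>
#include <stdbool.h>

/* Atomically decrement *ptr and return the NEW value
 * (a fetch-then-decrement variant would return the old one instead). */
#define demo_atomic_dec_fetch(ptr) \
    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)

/* If *ptr == old, store new_val and return true; otherwise leave *ptr
 * unchanged and return false. GCC documents this builtin as a full barrier. */
#define demo_atomic_cmpxchg_bool(ptr, old, new_val) \
    __sync_bool_compare_and_swap(ptr, old, new_val)

/* Typical use: drop a reference and learn whether it was the last one. */
static bool demo_unref(int *refcount)
{
    return demo_atomic_dec_fetch(refcount) == 0;
}
```

Returning a boolean rather than the old value suits callers that only care whether their update won the race.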

The next 2 patches are concerned with async work. The first cleans up
the existing async work to pass CPUState, which minimises the need to
malloc structures later on. The new async_safe_run_on_cpu has been
changed a bit from Fred's tree: it operates from a single queue in an
effort to ensure all deferred operations are handled in a timely
manner. There has also been an attempt to minimise the amount of
dynamic allocation by using a pre-allocated array combined with the
CPUState passing introduced earlier.
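The combination of a single queue, a pre-allocated array and CPUState passed straight to the work function can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: `DemoCPU` stands in for CPUState, all `demo_` names are hypothetical, and the "all vCPUs have stopped executing guest code" condition that makes the work *safe* is assumed rather than modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
typedef struct DemoCPU {
    int index;
    int flush_count;
} DemoCPU;

/* Work functions receive the CPU directly, so callers need not malloc a
 * wrapper struct just to carry the CPU pointer. */
typedef void (*DemoWorkFn)(DemoCPU *cpu, void *data);

typedef struct DemoWork {
    DemoWorkFn func;
    DemoCPU *cpu;
    void *data;
} DemoWork;

#define DEMO_SAFE_WORK_MAX 64

/* A single global queue backed by a pre-allocated array. */
static pthread_mutex_t demo_safe_work_lock = PTHREAD_MUTEX_INITIALIZER;
static DemoWork demo_safe_work[DEMO_SAFE_WORK_MAX];
static size_t demo_safe_work_count;

/* Defer func(cpu, data) until no vCPU is executing guest code.
 * Returns 0 on success, -1 if the fixed-size array is full. */
static int demo_async_safe_run_on_cpu(DemoCPU *cpu, DemoWorkFn func, void *data)
{
    int ret = -1;

    pthread_mutex_lock(&demo_safe_work_lock);
    if (demo_safe_work_count < DEMO_SAFE_WORK_MAX) {
        demo_safe_work[demo_safe_work_count++] = (DemoWork){ func, cpu, data };
        ret = 0;
    }
    pthread_mutex_unlock(&demo_safe_work_lock);
    return ret;
}

/* Meant to run once all vCPUs have left their execution loops (not modelled
 * here). Items are copied out first so a work function may queue more work
 * without deadlocking. Returns the number of items run, in FIFO order. */
static size_t demo_process_safe_work(void)
{
    DemoWork batch[DEMO_SAFE_WORK_MAX];
    size_t n;

    pthread_mutex_lock(&demo_safe_work_lock);
    n = demo_safe_work_count;
    for (size_t i = 0; i < n; i++) {
        batch[i] = demo_safe_work[i];
    }
    demo_safe_work_count = 0;
    pthread_mutex_unlock(&demo_safe_work_lock);

    for (size_t i = 0; i < n; i++) {
        batch[i].func(batch[i].cpu, batch[i].data);
    }
    return n;
}

/* Example work item: records that a flush was requested for this CPU. */
static void demo_count_flush(DemoCPU *cpu, void *data)
{
    (void)data;
    cpu->flush_count++;
}
```

Draining everything in one pass from one queue is what keeps deferred operations from lingering; the trade-off of a fixed array is that callers must handle the "queue full" case.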

Then there are 2 patches which take advantage of this functionality:
a few cputlb flush routines and the translation buffer overflow case.
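One way a cross-vCPU TLB flush can use deferred work is sketched below: the requesting vCPU flushes its own TLB immediately and only flags the others, which act on the flag from their own threads before running more guest code. This is a hypothetical illustration (the `demo_` names and the single-flag design are assumptions), not the series' tlb_flush_* implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TLB_ENTRIES 8

/* Hypothetical per-vCPU state: a tiny TLB plus a "please flush" flag. */
typedef struct DemoCPU {
    uint32_t tlb[DEMO_TLB_ENTRIES];   /* cached page addresses, 0 = empty */
    int flush_pending;                /* set by other vCPUs, cleared by owner */
} DemoCPU;

/* Only ever called by the thread that owns this vCPU. */
static void demo_tlb_flush_local(DemoCPU *cpu)
{
    for (int i = 0; i < DEMO_TLB_ENTRIES; i++) {
        cpu->tlb[i] = 0;
    }
}

/* Flush every vCPU's TLB: ours immediately, the others asynchronously.
 * A full flush is idempotent, so repeated requests collapse into one flag. */
static void demo_tlb_flush_all(DemoCPU *cpus, int ncpus, DemoCPU *self)
{
    for (int i = 0; i < ncpus; i++) {
        if (&cpus[i] == self) {
            demo_tlb_flush_local(self);
        } else {
            __atomic_store_n(&cpus[i].flush_pending, 1, __ATOMIC_RELEASE);
        }
    }
}

/* Polled by each vCPU before it next executes guest code.
 * Returns true if a deferred flush was performed. */
static bool demo_cpu_process_pending(DemoCPU *cpu)
{
    if (__atomic_exchange_n(&cpu->flush_pending, 0, __ATOMIC_ACQ_REL)) {
        demo_tlb_flush_local(cpu);
        return true;
    }
    return false;
}
```

The key property is that no thread ever touches another vCPU's TLB array directly; it only sets a flag that the owner consumes.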

The remaining patches involve architecture-specific changes to ensure
the ARM flush operations use the async cputlb functions. The STREX
patches are a temporary fix for atomicity which I've put at the end of
the series so Alvise can easily drop them in favour of his LL/SC based
approach.

The last patch makes MTTCG the default for the common case of running
ARMv7 on an x86 backend. I know there is debate about the benefit of
having a control knob for MTTCG, but it is certainly handy to have
while developing. There are also cases where MTTCG will be
incompatible with other features such as record/replay.
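The policy described above (on by default for ARMv7 guests on x86 hosts, with a knob to override it, and record/replay treated as incompatible) can be sketched as a small decision function. The enum, the function name and the -1 error convention are hypothetical; this is not the option handling actually added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state knob: the user may force MTTCG on or off. */
typedef enum {
    DEMO_MTTCG_AUTO,
    DEMO_MTTCG_FORCE_ON,
    DEMO_MTTCG_FORCE_OFF,
} DemoMttcgOpt;

/* Returns 1 to enable MTTCG, 0 to disable it, or -1 if the user forced it
 * on together with a feature this sketch treats as incompatible. */
static int demo_mttcg_enabled(DemoMttcgOpt opt, bool guest_is_armv7,
                              bool host_is_x86, bool record_replay)
{
    if (record_replay) {
        return opt == DEMO_MTTCG_FORCE_ON ? -1 : 0;
    }
    switch (opt) {
    case DEMO_MTTCG_FORCE_ON:
        return 1;
    case DEMO_MTTCG_FORCE_OFF:
        return 0;
    case DEMO_MTTCG_AUTO:
    default:
        /* Default on only for the combination this series enables. */
        return guest_is_armv7 && host_is_x86;
    }
}
```

Keeping "auto" distinct from "on" lets an explicit user request be reported as an error instead of being silently ignored.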


The benchmark is a simple boot-and-build test which builds stress-ng
with -j ${NR_CPUS} and then shuts down to allow easy repetition.

arm-softmmu/qemu-system-arm -machine type=virt -display none -m 4096 \
    -cpu cortex-a15 -serial telnet: \
    -monitor stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -append "console=ttyAMA0 systemd.unit=benchmark-build.service" \
    -kernel /home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img

| -smp 1 (mttcg=off) | -smp 4 (mttcg=off) | -smp 4 (mttcg=on) |
| 301.60 (5 runs)    | 312.27 (4 runs)    |  573.26 (5 runs)  |

As the results show, MTTCG performance is currently worse than the
single-threaded version: the -smp 4 MTTCG run takes roughly 1.9x as
long as -smp 1. However, this tree doesn't have the lockless
tb_find_fast, which means the lock must be taken every time execution
transitions from one page to the next. There is still work to be done
on performance ;-)
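To show why a lockless fast path matters, here is a minimal sketch of the general pattern, assuming a per-vCPU direct-mapped cache in front of a lock-protected global store of translation blocks. The names (`DemoTB`, `demo_tb_find_fast`, `demo_tb_lock`) are hypothetical and this is not QEMU's tb_find_fast; the `demo_lock_takes` counter simply makes visible that only cache misses pay for the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_JMP_CACHE_SIZE 64
#define DEMO_MAX_TBS 16

/* Hypothetical translation block: just the guest PC it starts at. */
typedef struct DemoTB {
    uint32_t pc;
} DemoTB;

/* Global TB store, protected by a single lock. */
static pthread_mutex_t demo_tb_lock = PTHREAD_MUTEX_INITIALIZER;
static DemoTB demo_tbs[DEMO_MAX_TBS];
static size_t demo_ntbs;
static unsigned demo_lock_takes;   /* how often the slow path ran */

typedef struct DemoCPU {
    DemoTB *jmp_cache[DEMO_JMP_CACHE_SIZE];   /* private to one vCPU thread */
} DemoCPU;

/* Slow path: search (or "translate") under the global lock. */
static DemoTB *demo_tb_lookup_locked(uint32_t pc)
{
    DemoTB *tb = NULL;

    pthread_mutex_lock(&demo_tb_lock);
    demo_lock_takes++;
    for (size_t i = 0; i < demo_ntbs; i++) {
        if (demo_tbs[i].pc == pc) {
            tb = &demo_tbs[i];
            break;
        }
    }
    if (!tb && demo_ntbs < DEMO_MAX_TBS) {
        tb = &demo_tbs[demo_ntbs++];
        tb->pc = pc;
    }
    pthread_mutex_unlock(&demo_tb_lock);
    return tb;   /* NULL if the store is full */
}

/* Fast path: a hit in the per-vCPU direct-mapped cache takes no lock. */
static DemoTB *demo_tb_find_fast(DemoCPU *cpu, uint32_t pc)
{
    DemoTB **slot = &cpu->jmp_cache[(pc >> 2) % DEMO_JMP_CACHE_SIZE];
    DemoTB *tb = *slot;

    if (tb && tb->pc == pc) {
        return tb;
    }
    tb = demo_tb_lookup_locked(pc);
    *slot = tb;
    return tb;
}
```

With a high hit rate in the per-vCPU cache, most lookups never contend on the global lock, which is the kind of saving the missing lockless lookup is expected to bring.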

Alex Bennée (5):
  qemu-thread: add simple test-and-set spinlock
  atomic: introduce atomic_dec_fetch.
  atomic: introduce cmpxchg_bool
  cpus: pass CPUState to run_on_cpu helpers
  cpus: default MTTCG to on for 32 bit ARM on x86

KONRAD Frederic (5):
  cpus: introduce async_safe_run_on_cpu.
  cputlb: introduce tlb_flush_* async work.
  translate-all: introduces tb_flush_safe.
  arm: use tlb_flush_page_all for tlbimva[a]
  arm: atomically check the exclusive value in a STREX

Paolo Bonzini (1):
  include: move CPU-related definitions out of qemu-common.h

Sergey Fedorov (1):
  tcg/i386: Make direct jump patching thread-safe

 cpu-exec-common.c         |   1 +
 cpu-exec.c                |  11 ++++
 cpus.c                    | 137 +++++++++++++++++++++++++++++++++++++++++-----
 cputlb.c                  |  61 ++++++++++++++++-----
 hw/i386/kvm/apic.c        |   3 +-
 hw/i386/kvmvapic.c        |   8 +--
 hw/ppc/ppce500_spin.c     |   3 +-
 hw/ppc/spapr.c            |   6 +-
 hw/ppc/spapr_hcall.c      |  12 ++--
 include/exec/exec-all.h   |   7 ++-
 include/qemu-common.h     |  24 --------
 include/qemu/atomic.h     |  15 +++++
 include/qemu/processor.h  |  28 ++++++++++
 include/qemu/thread.h     |  34 ++++++++++++
 include/qemu/timer.h      |   1 +
 include/qom/cpu.h         |  34 +++++++++++-
 include/sysemu/cpus.h     |  13 +++++
 kvm-all.c                 |  20 +++----
 stubs/cpu-get-icount.c    |   1 +
 target-arm/cpu.c          |  21 +++++++
 target-arm/cpu.h          |   6 ++
 target-arm/helper.c       |  28 ++++++----
 target-arm/helper.h       |   6 ++
 target-arm/op_helper.c    | 130 ++++++++++++++++++++++++++++++++++++++++++-
 target-arm/translate.c    |  96 ++++++--------------------------
 target-i386/helper.c      |   3 +-
 target-i386/kvm.c         |   6 +-
 target-s390x/cpu.c        |   4 +-
 target-s390x/cpu.h        |   7 +--
 tcg/i386/tcg-target.inc.c |  17 ++++++
 translate-all.c           |  34 +++++++++---
 translate-common.c        |   1 +
 vl.c                      |   1 +
 33 files changed, 582 insertions(+), 197 deletions(-)
 create mode 100644 include/qemu/processor.h

