Re: [Qemu-devel] [PATCH v6 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
Mon, 9 Jan 2017 17:04:34 +0000
On Wed, Dec 28, 2016 at 07:40:54PM +0100, Lluís Vilanova wrote:
> Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
> accesses), making it feasible to statically enable them by default on all QEMU
> Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
> (medium size - suns.pl) and the guest_mem_before event:
> * vanilla, statically disabled
>     real 0m2,259s
>     user 0m2,252s
>     sys  0m0,004s
> * vanilla, statically enabled (overhead: 2.18x)
>     real 0m4,921s
>     user 0m4,912s
>     sys  0m0,008s
> * multi-tb, statically disabled (overhead: 0.99x) [within noise range]
>     real 0m2,228s
>     user 0m2,216s
>     sys  0m0,008s
> * multi-tb, statically enabled (overhead: 0.99x) [within noise range]
>     real 0m2,229s
>     user 0m2,224s
>     sys  0m0,004s
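For readers who want to re-derive the overhead factors quoted above, here is a minimal sketch in Python. It assumes the factors are the ratio of each "real" time to the "vanilla, statically disabled" baseline; the helper names (parse_time, overheads) are made up for this illustration and are not part of the series.

```python
import re

# "real" wall-clock times copied from the figures quoted above
# (decimal commas come from the locale of the original run).
REAL_TIMES = {
    "vanilla, statically disabled": "0m2,259s",
    "vanilla, statically enabled": "0m4,921s",
    "multi-tb, statically disabled": "0m2,228s",
    "multi-tb, statically enabled": "0m2,229s",
}


def parse_time(text):
    """Convert a `time`-style value such as '0m2,259s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    minutes, whole, frac = match.groups()
    return int(minutes) * 60 + float(f"{whole}.{frac}")


def overheads(times, baseline):
    """Return each configuration's run time relative to the baseline."""
    base = parse_time(times[baseline])
    return {name: round(parse_time(value) / base, 2)
            for name, value in times.items()}


for name, factor in overheads(REAL_TIMES,
                              "vanilla, statically disabled").items():
    print(f"{name}: {factor}x")
```

Under that assumption the computed ratios agree with the 2.18x and 0.99x figures in the cover letter.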
> Right now, events with the 'tcg' property always generate TCG code to trace the
> event at guest code execution time, where the event's dynamic state is
> This series adds a performance optimization where TCG code for events with the
> 'tcg' and 'vcpu' properties is not generated if the event is dynamically
> disabled. This optimization raises two issues:
> * An event can be dynamically disabled/enabled after the corresponding TCG code
> has been generated (i.e., a new TB with the corresponding code should be
> * Each vCPU can have a different dynamic state for the same event (i.e.,
> the memory accesses of only one process pinned to a vCPU).
> To handle both issues, this series integrates the dynamic tracing event state
> into the TB hashing function, so that vCPUs tracing different events will use
> separate TBs. Note that only events with the 'vcpu' property are used for
> hashing (as stored in the bitmap of CPUState->trace_dstate).
> This makes dynamic event state changes on vCPUs very efficient, since they can
> use TBs produced by other vCPUs while on the same event state combination (or
> produced by the same vCPU, earlier).
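The effect of making the dynamic tracing state part of the TB lookup key can be modelled in a few lines. The following is only a toy model in Python, not QEMU code: the TBCache class, its dict-based storage, and the reduced key (pc, trace_dstate) are assumptions made for illustration, whereas the real lookup is done in C through the TB hashing function and keys on more state than shown here.

```python
class TBCache:
    """Toy translation-block cache keyed by guest PC and tracing state."""

    def __init__(self):
        self._tbs = {}
        self.translations = 0  # how many times "code" was generated

    def lookup(self, pc, trace_dstate):
        """Return the TB for (pc, trace_dstate), translating on a miss."""
        key = (pc, trace_dstate)
        tb = self._tbs.get(key)
        if tb is None:
            tb = self._translate(pc, trace_dstate)
            self._tbs[key] = tb
        return tb

    def _translate(self, pc, trace_dstate):
        """Stand-in for code generation: emit tracing only for set bits."""
        self.translations += 1
        traced = tuple(bit for bit in range(trace_dstate.bit_length())
                       if (trace_dstate >> bit) & 1)
        return {"pc": pc, "traced_events": traced}


cache = TBCache()
tb_traced = cache.lookup(0x1000, 0b01)  # vCPU 0: event 0 enabled
tb_plain = cache.lookup(0x1000, 0b00)   # vCPU 1: nothing enabled
tb_again = cache.lookup(0x1000, 0b01)   # vCPU 2: same state as vCPU 0
print(cache.translations, tb_again is tb_traced)
```

In this model a vCPU whose event state changes simply starts hitting a different key, so nothing has to be flushed, and vCPUs sharing an event state combination share the same translated blocks.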
> Discarded alternatives:
> * Emitting TCG code to check if an event needs tracing, where we should still
> move the tracing call code to either a cold path (making tracing performance
> worse), or leave it inlined (making non-tracing performance worse).
> * Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
> it on a single vCPU will impact the performance of all other vCPUs that are
> not tracing that event.
> Signed-off-by: Lluís Vilanova <address@hidden>
> Changes in v6
> * Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].
> Changes in v5
> * Move define into "qemu-common.h" to allow compilation of tests.
> Changes in v4
> * Incorporate trace_dstate into the TB hashing function instead of using
> multiple physical TB caches [suggested by Richard Henderson].
> Changes in v3
> * Rebase on 0737f32daf.
> * Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
> * Refactor trace_get_vcpu_event_count() to be inlinable.
> * Optimize cpu_tb_cache_set_requested() (hottest path).
> Changes in v2
> * Fix bitmap copy in cpu_tb_cache_set_apply().
> * Split generated code re-alignment into a separate patch [Daniel P.
> Lluís Vilanova (7):
> exec: [tcg] Refactor flush of per-CPU virtual TB cache
> trace: Make trace_get_vcpu_event_count() inlinable
> trace: [tcg] Delay changes to dynamic state when translating
> exec: [tcg] Use different TBs according to the vCPU's dynamic tracing
> trace: [tcg] Do not generate TCG code to trace dynamically-disabled
> trace: [tcg,trivial] Re-align generated code
> trace: [trivial] Statically enable all guest events
> cpu-exec.c | 52
> cputlb.c | 2 +
> include/exec/exec-all.h | 11 ++++++
> include/exec/tb-hash-xx.h | 8 ++++-
> include/exec/tb-hash.h | 5 ++-
> include/qemu-common.h | 3 ++
> include/qom/cpu.h | 7 ++++
> qom/cpu.c | 4 ++
> scripts/tracetool/__init__.py | 1 +
> scripts/tracetool/backend/dtrace.py | 2 +
> scripts/tracetool/backend/ftrace.py | 20 ++++++------
> scripts/tracetool/backend/log.py | 17 +++++-----
> scripts/tracetool/backend/simple.py | 2 +
> scripts/tracetool/backend/syslog.py | 6 ++-
> scripts/tracetool/backend/ust.py | 2 +
> scripts/tracetool/format/h.py | 24 ++++++++++----
> scripts/tracetool/format/tcg_h.py | 19 +++++++++--
> scripts/tracetool/format/tcg_helper_c.py | 3 +-
> tests/qht-bench.c | 2 +
> trace-events | 6 ++-
> trace/control-internal.h | 5 +++
> trace/control-target.c | 14 +++++++-
> trace/control.c | 9 +----
> trace/control.h | 5 ++-
> translate-all.c | 30 +++++++++++++----
> 25 files changed, 195 insertions(+), 64 deletions(-)
> To: address@hidden
> Cc: Stefan Hajnoczi <address@hidden>
> Cc: Eduardo Habkost <address@hidden>
> Cc: Eric Blake <address@hidden>
The tracing aspects seem fine. I have left a comment regarding
I'll merge it once Richard Henderson has had time to review it from a