Re: [PATCH v3 3/3] tcg: add perfmap and jitdump
From: Ilya Leoshkevich
Subject: Re: [PATCH v3 3/3] tcg: add perfmap and jitdump
Date: Wed, 11 Jan 2023 16:06:09 +0100
User-agent: Evolution 3.46.2 (3.46.2-1.fc37)
On Wed, 2023-01-11 at 02:47 +0100, Ilya Leoshkevich wrote:
> Add ability to dump /tmp/perf-<pid>.map and jit-<pid>.dump.
> The first one allows the perf tool to map samples to each individual
> translation block. The second one adds the ability to resolve symbol
> names, line numbers and inspect JITed code.
>
> Example of use:
>
> perf record qemu-x86_64 -perfmap ./a.out
> perf report
>
> or
>
> perf record -k 1 qemu-x86_64 -jitdump ./a.out
> DEBUGINFOD_URLS= perf inject -j -i perf.data -o perf.data.jitted
> perf report -i perf.data.jitted
>
> Co-developed-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Co-developed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
> accel/tcg/meson.build | 1 +
> accel/tcg/perf.c | 366
> ++++++++++++++++++++++++++++++++++++++
> accel/tcg/perf.h | 49 +++++
> accel/tcg/translate-all.c | 8 +
> docs/devel/tcg.rst | 23 +++
> linux-user/exit.c | 2 +
> linux-user/main.c | 15 ++
> qemu-options.hx | 20 +++
> softmmu/vl.c | 11 ++
> tcg/tcg.c | 2 +
> 10 files changed, 497 insertions(+)
> create mode 100644 accel/tcg/perf.c
> create mode 100644 accel/tcg/perf.h
...
> +void perf_report_code(unsigned long long guest_pc, size_t icount,
> + const void *start, size_t size)
> +{
> + struct debuginfo_query *q;
> + size_t insn;
> +
> + if (!perfmap && !jitdump) {
> + return;
> + }
> +
> + q = g_try_malloc0_n(icount, sizeof(*q));
> + if (!q) {
> + return;
> + }
> +
> + debuginfo_lock();
> +
> + /* Query debuginfo for each guest instruction. */
> + for (insn = 0; insn < icount; insn++) {
> + q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
> + (TARGET_TB_PCREL ? guest_pc : 0);
Currently this produces plausible-looking, but actually wrong,
addresses. This needs to match restore_state_to_opc(), so at least:
--- a/accel/tcg/perf.c
+++ b/accel/tcg/perf.c
@@ -325,8 +325,10 @@ void perf_report_code(unsigned long long guest_pc, size_t icount,
/* Query debuginfo for each guest instruction. */
for (insn = 0; insn < icount; insn++) {
- q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
- (TARGET_TB_PCREL ? guest_pc : 0);
+ q[insn].address = tcg_ctx->gen_insn_data[insn][0];
+ if (TARGET_TB_PCREL) {
+ q[insn].address |= (guest_pc & TARGET_PAGE_MASK);
+ }
q[insn].flags = DEBUGINFO_SYMBOL | (jitdump ? DEBUGINFO_LINE : 0);
}
debuginfo_query(q, icount);
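To illustrate why the addition goes wrong, here is a minimal standalone
sketch. It assumes 4 KiB guest pages and that, with TARGET_TB_PCREL, data[0]
holds the in-page offset of the instruction, as the OR in
restore_state_to_opc() suggests. DEMO_PAGE_MASK, the demo_* helpers and the
addresses are made up for the example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages; stands in for TARGET_PAGE_MASK. */
#define DEMO_PAGE_MASK (~(uint64_t)0xfff)

/* Original computation: adds the TB start PC to the stored value. */
static uint64_t demo_addr_add(uint64_t data0, uint64_t guest_pc)
{
    return data0 + guest_pc;
}

/* Proposed computation: keeps only the page bits of the TB start PC. */
static uint64_t demo_addr_or(uint64_t data0, uint64_t guest_pc)
{
    return data0 | (guest_pc & DEMO_PAGE_MASK);
}

/*
 * Hypothetical values: the TB starts at 0x401230 and the instruction of
 * interest sits at 0x401234, so its in-page offset is 0x234.
 */
static void demo_check_addresses(void)
{
    /* OR with the page base recovers the instruction address. */
    assert(demo_addr_or(0x234, 0x401230) == 0x401234);
    /* Adding the TB start counts the offset 0x230 twice. */
    assert(demo_addr_add(0x234, 0x401230) == 0x401464);
}
```

The summed address still lands close to the TB, which is probably why the
bug is easy to miss in perf output.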
Apparently even with this there are corner cases, e.g. in
x86_restore_state_to_opc() we have:
    if (TARGET_TB_PCREL) {
        env->eip = (env->eip & TARGET_PAGE_MASK) | data[0];
    } else {
        env->eip = data[0] - tb->cs_base;
    }
so if tb->cs_base != 0, the result is still going to be wrong.
I wonder if it would make sense to create a new TCGCPUOps member
purely for resolving a PC from data[0]?
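
A rough sketch of what such a hook could look like, purely as an
illustration. Everything here is hypothetical: struct demo_cpu_ops is not the
real TCGCPUOps, resolve_pc is an invented member name, and the x86-like
implementation only mirrors the two branches quoted above, with tb_pc
standing in for whatever value supplies the page bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages; stands in for TARGET_PAGE_MASK. */
#define DEMO_PAGE_MASK (~(uint64_t)0xfff)

/* Hypothetical per-target hook table, not the real TCGCPUOps. */
struct demo_cpu_ops {
    /* Turn data[0] of one instruction into a guest PC. */
    uint64_t (*resolve_pc)(uint64_t data0, uint64_t tb_pc,
                           uint64_t cs_base, bool pcrel);
};

/* x86-like resolver mirroring the quoted x86_restore_state_to_opc(). */
static uint64_t demo_x86_resolve_pc(uint64_t data0, uint64_t tb_pc,
                                    uint64_t cs_base, bool pcrel)
{
    if (pcrel) {
        /* data0 is an in-page offset; take the page bits from tb_pc. */
        return (tb_pc & DEMO_PAGE_MASK) | data0;
    }
    /* data0 includes the segment base; subtract it, as the quoted code does. */
    return data0 - cs_base;
}

static const struct demo_cpu_ops demo_x86_ops = {
    .resolve_pc = demo_x86_resolve_pc,
};

/* What a caller such as perf_report_code() might do per instruction. */
static uint64_t demo_report_address(const struct demo_cpu_ops *ops,
                                    uint64_t data0, uint64_t tb_pc,
                                    uint64_t cs_base, bool pcrel)
{
    return ops->resolve_pc(data0, tb_pc, cs_base, pcrel);
}

/* Hypothetical values covering both branches. */
static void demo_check_hook(void)
{
    assert(demo_report_address(&demo_x86_ops, 0x234, 0x401230, 0, true)
           == 0x401234);
    assert(demo_report_address(&demo_x86_ops, 0x11234, 0x1230, 0x10000, false)
           == 0x1234);
}
```

With something along these lines, perf_report_code() would not need to know
about TARGET_TB_PCREL or cs_base at all; each target would own the decoding
of its data[0].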