qemu-devel

Re: [PATCH v3 3/3] tcg: add perfmap and jitdump


From: Ilya Leoshkevich
Subject: Re: [PATCH v3 3/3] tcg: add perfmap and jitdump
Date: Wed, 11 Jan 2023 16:06:09 +0100
User-agent: Evolution 3.46.2 (3.46.2-1.fc37)

On Wed, 2023-01-11 at 02:47 +0100, Ilya Leoshkevich wrote:
> Add ability to dump /tmp/perf-<pid>.map and jit-<pid>.dump.
> The first one allows the perf tool to map samples to each individual
> translation block. The second one adds the ability to resolve symbol
> names, line numbers and inspect JITed code.
> 
> Example of use:
> 
>     perf record qemu-x86_64 -perfmap ./a.out
>     perf report
> 
> or
> 
>     perf record -k 1 qemu-x86_64 -jitdump ./a.out
>     DEBUGINFOD_URLS= perf inject -j -i perf.data -o perf.data.jitted
>     perf report -i perf.data.jitted
> 
> Co-developed-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Co-developed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  accel/tcg/meson.build     |   1 +
>  accel/tcg/perf.c          | 366 ++++++++++++++++++++++++++++++++++++++
>  accel/tcg/perf.h          |  49 +++++
>  accel/tcg/translate-all.c |   8 +
>  docs/devel/tcg.rst        |  23 +++
>  linux-user/exit.c         |   2 +
>  linux-user/main.c         |  15 ++
>  qemu-options.hx           |  20 +++
>  softmmu/vl.c              |  11 ++
>  tcg/tcg.c                 |   2 +
>  10 files changed, 497 insertions(+)
>  create mode 100644 accel/tcg/perf.c
>  create mode 100644 accel/tcg/perf.h

...

> +void perf_report_code(unsigned long long guest_pc, size_t icount,
> +                      const void *start, size_t size)
> +{
> +    struct debuginfo_query *q;
> +    size_t insn;
> +
> +    if (!perfmap && !jitdump) {
> +        return;
> +    }
> +
> +    q = g_try_malloc0_n(icount, sizeof(*q));
> +    if (!q) {
> +        return;
> +    }
> +
> +    debuginfo_lock();
> +
> +    /* Query debuginfo for each guest instruction. */
> +    for (insn = 0; insn < icount; insn++) {
> +        q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
> +                          (TARGET_TB_PCREL ? guest_pc : 0);

Currently this produces plausible-looking but actually wrong
addresses. The computation needs to match restore_state_to_opc(), so
at the very least:

--- a/accel/tcg/perf.c
+++ b/accel/tcg/perf.c
@@ -325,8 +325,10 @@ void perf_report_code(unsigned long long guest_pc, size_t icount,
 
     /* Query debuginfo for each guest instruction. */
     for (insn = 0; insn < icount; insn++) {
-        q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
-                          (TARGET_TB_PCREL ? guest_pc : 0);
+        q[insn].address = tcg_ctx->gen_insn_data[insn][0];
+        if (TARGET_TB_PCREL) {
+            q[insn].address |= (guest_pc & TARGET_PAGE_MASK);
+        }
         q[insn].flags = DEBUGINFO_SYMBOL | (jitdump ? DEBUGINFO_LINE : 0);
     }
     debuginfo_query(q, icount);

Even with this change there appear to be corner cases, e.g. in
x86_restore_state_to_opc() we have:

    if (TARGET_TB_PCREL) {
        env->eip = (env->eip & TARGET_PAGE_MASK) | data[0];
    } else {
        env->eip = data[0] - tb->cs_base;
    }

so if tb->cs_base != 0, the result is still going to be wrong.
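
To make the difference concrete, here is a minimal standalone sketch of
the three computations discussed above. It is not QEMU code: the 4 KiB
page size, the PCREL_SKETCH_* macros and the function names are
assumptions made purely for illustration, and data0 stands for
gen_insn_data[insn][0].

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The real TARGET_PAGE_MASK is per-target. */
#define PCREL_SKETCH_PAGE_BITS 12
#define PCREL_SKETCH_PAGE_MASK \
    (~((UINT64_C(1) << PCREL_SKETCH_PAGE_BITS) - 1))

/*
 * What the v3 patch does in the PCREL case: treat data0 as an offset
 * from the start of the TB and add it to the TB's guest pc.
 */
static uint64_t pc_by_addition(uint64_t guest_pc, uint64_t data0)
{
    return guest_pc + data0;
}

/*
 * What x86_restore_state_to_opc() does in the PCREL case: data0 is an
 * offset within the page, so keep only the page part of the pc and OR
 * the offset in.
 */
static uint64_t pc_by_page_mask(uint64_t guest_pc, uint64_t data0)
{
    return (guest_pc & PCREL_SKETCH_PAGE_MASK) | data0;
}

/*
 * The non-PCREL x86 case: subtract cs_base from data0 to recover eip.
 * Using data0 unmodified only gives the same value when cs_base is 0.
 */
static uint64_t eip_from_data0(uint64_t data0, uint64_t cs_base)
{
    return data0 - cs_base;
}
```

For a TB starting at 0x401234 whose second instruction sits at
0x401238, data0 would be 0x238: pc_by_addition() yields 0x40146c, while
pc_by_page_mask() yields the expected 0x401238. With cs_base = 0x10000
and data0 = 0x11238, eip_from_data0() yields 0x1238, which is not what
using data0 directly would report.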

I wonder if it would make sense to create a new TCGCPUOps member
purely for resolving a PC from data[0]?
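
Roughly what I have in mind, as a self-contained mock rather than a
patch: every name below (SketchTB, SketchCPUOps, resolve_pc, the
sketch_* functions, the 4 KiB page mask) is hypothetical and does not
exist in QEMU. The point is only that perf.c would call one
per-target hook instead of duplicating the restore_state_to_opc() logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for TARGET_PAGE_MASK. */
#define HOOK_SKETCH_PAGE_MASK (~UINT64_C(0xfff))

/* Hypothetical stand-in for the bits of a TB the hook would need. */
typedef struct SketchTB {
    uint64_t guest_pc;   /* pc the TB was translated for */
    uint64_t cs_base;    /* x86 code segment base, 0 on most targets */
} SketchTB;

/* Hypothetical ops table with a single "resolve pc from data" member. */
typedef struct SketchCPUOps {
    uint64_t (*resolve_pc)(const SketchTB *tb, const uint64_t *data,
                           bool pcrel);
} SketchCPUOps;

/* x86-like implementation mirroring the snippet quoted above. */
static uint64_t sketch_x86_resolve_pc(const SketchTB *tb,
                                      const uint64_t *data, bool pcrel)
{
    if (pcrel) {
        return (tb->guest_pc & HOOK_SKETCH_PAGE_MASK) | data[0];
    }
    return data[0] - tb->cs_base;
}

static const SketchCPUOps sketch_x86_ops = {
    .resolve_pc = sketch_x86_resolve_pc,
};

/* What the caller in perf.c might look like: use the hook if present. */
static uint64_t sketch_insn_address(const SketchCPUOps *ops,
                                    const SketchTB *tb,
                                    const uint64_t *data, bool pcrel)
{
    if (ops->resolve_pc != NULL) {
        return ops->resolve_pc(tb, data, pcrel);
    }
    /* Fallback for targets without the hook: data[0] is the pc. */
    return data[0];
}
```

Targets where data[0] already is the guest pc would simply leave the
member unset and get the fallback.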



