Re: [Qemu-devel] trace: timestamps, core IDs, and file creation


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] trace: timestamps, core IDs, and file creation
Date: Mon, 15 Feb 2016 15:29:14 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Feb 08, 2016 at 11:59:00AM -0800, Hollis Blanchard wrote:
> On 02/08/2016 07:30 AM, Stefan Hajnoczi wrote:
> >On Wed, Jan 13, 2016 at 03:13:02PM -0800, Hollis Blanchard wrote:
> >>The event timestamps are host time (get_clock()). I'm correlating qemu
> >>events with other logs (using icount), so host time is unhelpful. Could we
> >>use cpu_get_clock() instead? (Trace events are used in other tools like
> >>qemu-io, where guest time doesn't exist, and there we could continue to use
> >>get_clock().)
> >It must be possible to trace code while the guest CPUs are stopped, so
> >cpu_get_clock() on its own breaks existing users.
> >
> >If the CPU clock is more convenient for you, perhaps you can add an
> >option to use that clocksource?
> 
> "Breaks"? Is that because every trace record is assumed to have a unique
> timestamp? Who makes that assumption? Qemu's tracing infrastructure works
> fine without it...

It breaks existing users in the sense that the timestamps would no longer
be valid while vCPUs are paused: cpu_get_clock() does not advance while
the VM is stopped.  Today you can trace any part of QEMU and the
timestamps are valid even while vCPUs are paused (e.g. during migration
after vm_stop()).  Valid timestamps matter for performance analysis, so
that the time between two trace events can be calculated.

> Hmm, an option. I guess if you want to debug Qemu behavior, like "why are
> block operations taking so long", seeing host time is the most important
> thing. However, if you want debug visibility into guest behavior, like "how
> long did I run with IRQs disabled", guest time is the most important thing.
> I'm not sure when a developer should be forced to make that decision...?

When invoking QEMU (e.g. qemu -trace clocksource=vcpu).  It's not ideal,
but it should be good enough for using the simple trace backend during
development.  In that case the developer knows what information they want
from tracing, so they can decide at startup whether to use CPU time or
wall-clock time.
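
Roughly something like this (a sketch only; the clocksource option and
all of the names below are made up, not existing code):

    #include "qemu/osdep.h"
    #include "qemu/timer.h"

    typedef enum {
        TRACE_CLOCK_HOST,   /* get_clock(): host time, keeps running while vCPUs are paused */
        TRACE_CLOCK_VCPU,   /* cpu_get_clock(): guest time, stops while vCPUs are paused */
    } TraceClockSource;

    static TraceClockSource trace_clock_source = TRACE_CLOCK_HOST;

    /* Hypothetical hook for the simple trace backend to call when it
     * writes out a trace record. */
    static int64_t trace_record_timestamp(void)
    {
        if (trace_clock_source == TRACE_CLOCK_VCPU) {
            return cpu_get_clock();
        }
        return get_clock();
    }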

> >Regarding I/O emulation, we have to be careful because architecturally
> >some types of devices (e.g. PCI devices) don't have the concept of which
> >core is performing an action.  QEMU takes advantage of that with
> >ioeventfd, where a lightweight vmexit simply marks an eventfd file
> >descriptor readable in the kernel and quickly returns to vcpu
> >execution.  Another QEMU thread monitors the eventfd and processes the
> >notification and there is no concept of the current vcpu at that point.
> 
> Yeah, by "MMIO" I was thinking in particular of memory.c instrumentation,
> which almost always executes in the context of a vcpu.
> 
> I'm not sure a vcpu property could work there. cpu_physical_memory_rw() may
> be executed by a machine init function, before any VCPUs are running. To
> handle that, I have a hack like this in memory_region_read_accessor():
> 
>         int cpu_index = -1;
>         if (current_cpu) {
>             cpu_index = current_cpu->cpu_index;
>         }
>         trace_memory_region_ops_read(cpu_index, mr, addr, tmp, size);
> 
> ... and then my analysis tools, which display guest activity, know to ignore
> events with cpu_index==-1.
> 
> I'm not sure if a vcpu property could or should do the same...?

Your cpu_index = -1 trick seems fine to me.  I'd merge it.
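
If it is useful to avoid open-coding that check at every call site, it
could live in a tiny helper (hypothetical, not existing code):

    /* Index of the vCPU we are running on behalf of, or -1 when there is
     * no vCPU context (machine init, ioeventfd handlers, and so on). */
    static int trace_current_cpu_index(void)
    {
        return current_cpu ? current_cpu->cpu_index : -1;
    }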

Another corner case: KVM's coalesced MMIO ring.  An MMIO write is not
necessarily processed in the context of the vCPU that issued it.  Most of
the time it will be, but I think SMP guests could violate this.

Guest MMIO writes to coalesced MMIO physical memory regions are
accumulated in the ring buffer instead of being dispatched by memory.c
right away.  QEMU flushes the ring eventually and executes the buffered
writes.  See kvm-all.c:kvm_flush_coalesced_mmio_buffer().
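
For illustration, a trimmed-down sketch of that flush path (paraphrasing
kvm-all.c, with locking and re-entrancy checks omitted):

    /* Replay buffered guest MMIO writes.  By this point there is no
     * record of which vCPU originally issued each write. */
    void kvm_flush_coalesced_mmio_buffer(void)
    {
        struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;

        while (ring->first != ring->last) {
            struct kvm_coalesced_mmio *ent = &ring->coalesced_mmio[ring->first];

            cpu_physical_memory_write(ent->phys_addr, ent->data, ent->len);
            smp_wmb();
            ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }
    }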

> >It might be easiest to include the vcpu id in the trace event explicitly
> >for now.
> 
> Mmm hmm. That could cause a lot of churn to the current set of trace events,
> but they seem to be mostly IO operations that execute outside VCPU context.
> 
> >>Right now, the simple tracepoint backend creates a trace file even when no
> >>events are active and no -trace options are present. Compounding the
> >>situation, omitting "-trace file=foo" results in the directory being
> >>littered with trace-NNNN files... could we create the file only when tracing
> >>is activated? And maybe with a more predictable name?
> >It should be possible to create the file only if traces are actually
> >generated.  I guess this was skipped to keep things simple and because the
> >write-out thread needs to empty the ring buffer as quickly as possible
> >to avoid dropping trace data.
> >
> >The name contains the QEMU pid.  Although the pid isn't predictable it
> >is useful for correlating with system logs, host performance tools, etc.
> >
> >You can override the name with -trace file=foo.log if you want.
> >
> >Do you have an idea for another automatic file naming scheme?
> 
> "qemu.trace" or something static. If the log is important, the user will
> copy it anyway to give it a descriptive name, e.g. "out_of_memory.trace".
> If the log is not important, it should be overwritten by the next run;
> otherwise the workflow becomes:
> 
> qemu-system-... args args args
> ls trace-*
> rm trace-NNNN
> qemu-system-... args args args
> ls trace-*
> rm trace-MMMM

Then use -trace file=qemu.trace :).
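
For example (qemu.events here is just a placeholder for a file listing
the events to enable):

    qemu-system-x86_64 -trace events=qemu.events,file=qemu.trace ...

The trace file name is then stable across runs, and the pid-based default
only applies when file= is omitted.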

Stefan
