qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode
Date: Mon, 24 Aug 2015 16:16:27 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Aug 24, 2015 at 18:08:37 +0200, Artyom Tarasenko wrote:
> On Mon, Aug 24, 2015 at 2:23 AM, Emilio G. Cota <address@hidden> wrote:
> >   * tb_lock must be held every time code is generated. The rationale is
> >     that most of the time QEMU is executing code, not generating it.
> 
> While this is indeed true for an ideal case,  currently there are
> situations where it's not:
>  running a g++ process under qemu-system-sparc64 the comparable amount
> of time is spent on executing and generating the code [1].
> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?
>
> 1. https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02194.html

AFAICT we can't say that's the desired TCG behavior, right? It seems we might
be translating more often than we should for sparc64:
  https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02531.html

I'll run multi-programmed workloads (i.e. several instances running
at the same time, for instance doing a 'make -j' kernel build) on x86
to see how far up translation can go--in general I'd expect any multi-programmed
workload in full-system mode to require more translations than a
multi-threaded one, since in the latter code is the same for all threads.

But really I'd only expect self-modifying code to be slow/non-scalable--and
I don't think we should worry about it too much.

If you can think of other workloads that might trigger more translations
than usual, please let me know.

> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?

The goal so far has been to emulate each VCPU on its own thread; as you
can see in the perf results in this thread this provides huge perf
gains when emulating multi-core guests on large enough hosts.

Code generation is done by the VCPU threads as they need it, and for
that they need to hold a lock to prevent corrupting TCG data structures
--for instance there's a single hash of TB's, and a single code_gen_buffer.

So to answer your question: speeding up a single-core guest on a multi-core
host is not something we're trying to do. If you think about it,
*if* the premise that QEMU is mostly executing (and not translating) code
holds true (and I'd say it holds for most workloads), then one host thread
per VCPU is the right design.

Thanks,

                Emilio




reply via email to

[Prev in Thread] Current Thread [Next in Thread]