Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode

From:	Emilio G. Cota
Subject:	Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode
Date:	Mon, 24 Aug 2015 16:16:27 -0400
User-agent:	Mutt/1.5.21 (2010-09-15)

On Mon, Aug 24, 2015 at 18:08:37 +0200, Artyom Tarasenko wrote:
> On Mon, Aug 24, 2015 at 2:23 AM, Emilio G. Cota <address@hidden> wrote:
> >   * tb_lock must be held every time code is generated. The rationale is
> >     that most of the time QEMU is executing code, not generating it.
> 
> While this is indeed true for an ideal case,  currently there are
> situations where it's not:
>  running a g++ process under qemu-system-sparc64 the comparable amount
> of time is spent on executing and generating the code [1].
> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?
>
> 1. https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02194.html

AFAICT we can't say that's the desired TCG behavior, right? It seems we might
be translating more often than we should for sparc64:
  https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02531.html

I'll run multi-programmed workloads (i.e. several instances running
at the same time, for instance a 'make -j' kernel build) on x86 to see
how large the share of time spent translating can get--in general I'd
expect any multi-programmed workload in full-system mode to require more
translations than a multi-threaded one, since in the latter the code is
the same for all threads.

But really I'd only expect self-modifying code to be slow/non-scalable--and
I don't think we should worry about it too much.

If you can think of other workloads that might trigger more translations
than usual, please let me know.

> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?

The goal so far has been to emulate each VCPU on its own thread; as you
can see in the perf results in this thread, this provides huge perf
gains when emulating multi-core guests on large enough hosts.

Code generation is done by the VCPU threads as they need it, and for
that they need to hold a lock to avoid corrupting TCG data structures
--for instance there's a single hash table of TBs, and a single
code_gen_buffer.
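
To make the locking concrete, here is a minimal sketch, not the code in
the series. Only tb_lock and code_gen_buffer are names from this thread;
struct tb, tb_find_or_gen(), vcpu_thread(), the fixed 16-byte "translation"
and the table sizes are all hypothetical. For simplicity the sketch also
takes the lock for lookups, and returns NULL when the buffer is full
instead of flushing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All sizes below are arbitrary values chosen for the sketch. */
#define TB_HASH_SIZE   64
#define CODE_BUF_SIZE  4096
#define TB_CODE_SIZE   16   /* assumption: every TB emits 16 bytes */
#define TB_MAX         (CODE_BUF_SIZE / TB_CODE_SIZE)
#define NUM_GUEST_PCS  32

/* Hypothetical, heavily simplified translation block. */
struct tb {
    uint64_t pc;        /* guest PC this block was translated from */
    size_t code_off;    /* offset of its host code in code_gen_buffer */
    struct tb *next;    /* hash chain */
};

/* Shared by all VCPU threads; everything below is protected by tb_lock. */
static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tb *tb_hash[TB_HASH_SIZE];
static struct tb tb_pool[TB_MAX];
static size_t tb_count;
static uint8_t code_gen_buffer[CODE_BUF_SIZE];
static unsigned long translations;

/* Caller must hold tb_lock. */
static struct tb *tb_lookup_locked(uint64_t pc)
{
    struct tb *tb = tb_hash[(pc >> 2) % TB_HASH_SIZE];

    while (tb && tb->pc != pc) {
        tb = tb->next;
    }
    return tb;
}

/* Return the TB for pc, generating it under tb_lock if it is missing. */
struct tb *tb_find_or_gen(uint64_t pc)
{
    struct tb *tb;

    pthread_mutex_lock(&tb_lock);
    tb = tb_lookup_locked(pc);
    if (!tb && tb_count < TB_MAX) {
        size_t slot = (pc >> 2) % TB_HASH_SIZE;

        tb = &tb_pool[tb_count];
        tb->pc = pc;
        tb->code_off = tb_count * TB_CODE_SIZE;
        assert(tb->code_off + TB_CODE_SIZE <= CODE_BUF_SIZE);
        /* Stand-in for code generation: fill the slot with 0x90 bytes. */
        memset(code_gen_buffer + tb->code_off, 0x90, TB_CODE_SIZE);
        tb->next = tb_hash[slot];
        tb_hash[slot] = tb;
        tb_count++;
        translations++;
    }
    pthread_mutex_unlock(&tb_lock);
    return tb;
}

/* One host thread per VCPU; every VCPU walks the same guest code. */
void *vcpu_thread(void *arg)
{
    (void)arg;
    for (uint64_t i = 0; i < NUM_GUEST_PCS; i++) {
        struct tb *tb = tb_find_or_gen(i * 4);

        assert(tb && tb->pc == i * 4);
    }
    return NULL;
}
```

Starting several vcpu_thread instances with pthread_create (link with
-pthread) and joining them should leave exactly one TB per guest PC, no
matter how many threads asked for it: whichever thread gets the lock first
generates the block, and the others find it in the hash table.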

So to answer your question: speeding up a single-core guest on a multi-core
host is not something we're trying to do. If you think about it,
*if* the premise that QEMU is mostly executing (and not translating) code
holds true (and I'd say it holds for most workloads), then one host thread
per VCPU is the right design.

Thanks,

                Emilio
