[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode
From: |
Emilio G. Cota |
Subject: |
Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode |
Date: |
Mon, 24 Aug 2015 16:16:27 -0400 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Mon, Aug 24, 2015 at 18:08:37 +0200, Artyom Tarasenko wrote:
> On Mon, Aug 24, 2015 at 2:23 AM, Emilio G. Cota <address@hidden> wrote:
> > * tb_lock must be held every time code is generated. The rationale is
> > that most of the time QEMU is executing code, not generating it.
>
> While this is indeed true for an ideal case, currently there are
> situations where it's not:
> running a g++ process under qemu-system-sparc64 the comparable amount
> of time is spent on executing and generating the code [1].
> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?
>
> 1. https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02194.html
AFAICT we can't say that's the desired TCG behavior, right? It seems we might
be translating more often than we should for sparc64:
https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02531.html
I'll run multi-programmed workloads (i.e. several instances running
at the same time, for instance doing a 'make -j' kernel build) on x86
to see how far up translation can go--in general I'd expect any multi-programmed
workload in full-system mode to require more translations than a
multi-threaded one, since in the latter code is the same for all threads.
But really I'd only expect self-modifying code to be slow/non-scalable--and
I don't think we should worry about it too much.
If you can think of other workloads that might trigger more translations
than usual, please let me know.
> Does this lock imply the translation performance won't gain anything
> when emulating a single core machine on a multi-core one?
The goal so far has been to emulate each VCPU on its own thread; as you
can see in the perf results in this thread this provides huge perf
gains when emulating multi-core guests on large enough hosts.
Code generation is done by the VCPU threads as they need it, and for
that they need to hold a lock to prevent corrupting TCG data structures
--for instance there's a single hash of TB's, and a single code_gen_buffer.
So to answer your question: speeding up a single-core guest on a multi-core
host is not something we're trying to do. If you think about it,
*if* the premise that QEMU is mostly executing (and not translating) code
holds true (and I'd say it holds for most workloads), then one host thread
per VCPU is the right design.
Thanks,
Emilio
- [Qemu-devel] [RFC 33/38] cpu: introduce cpu_tcg_sched_work to run work while other CPUs sleep, (continued)
- [Qemu-devel] [RFC 33/38] cpu: introduce cpu_tcg_sched_work to run work while other CPUs sleep, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 21/38] target-i386: emulate atomic instructions + barriers using AIE, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 38/38] Revert "target-i386: yield to another VCPU on PAUSE", Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 37/38] cpus: remove async_run_safe_work_on_cpu, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 32/38] cpu list: convert to RCU QLIST, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 28/38] cpu-exec: use RCU to perform lockless TB lookups, Emilio G. Cota, 2015/08/23
- Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode, Paolo Bonzini, 2015/08/24
- Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode, Artyom Tarasenko, 2015/08/24
- Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode,
Emilio G. Cota <=