[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC 05/38] thread-posix: inline qemu_spin functions
From: |
Emilio G. Cota |
Subject: |
Re: [Qemu-devel] [RFC 05/38] thread-posix: inline qemu_spin functions |
Date: |
Mon, 24 Aug 2015 22:30:03 -0400 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Sun, Aug 23, 2015 at 18:04:46 -0700, Paolo Bonzini wrote:
> On 23/08/2015 17:23, Emilio G. Cota wrote:
> > On some parallel workloads this gives up to a 15% speed improvement.
> >
> > Signed-off-by: Emilio G. Cota <address@hidden>
> > ---
> > include/qemu/thread-posix.h | 47 ++++++++++++++++++++++++++++++++++++++++++
> > include/qemu/thread.h | 6 ------
> > util/qemu-thread-posix.c | 50
> > +++++----------------------------------------
> > 3 files changed, 52 insertions(+), 51 deletions(-)
(snip)
> Applied, but in the end the spinlock will probably simply use a simple
> test-and-test-and-set lock, or an MCS lock. There is no need to use
> pthreads for this.
Agreed.
In fact in my tests I sometimes use concurrencykit [http://concurrencykit.org/]
to
test lock alternatives (would love to be able to just add ck as a submodule
of qemu, but they do not support as many architectures as qemu does).
Note that fair locks (such as MCS) for user-space programs are not
necessarily a good idea when preemption is considered--and for usermode
we'd be forced (if we allowed MCS's to nest) to use per-lock stack variables
given that the number of threads is unbounded, which is pretty ugly.
If contention is a problem, a simple, fast spinlock combined with an exponential
backoff is already pretty good. Fairness is not a requirement (the cache
substrate of a NUMA machine isn't necessarily fair, is it?); scalability is.
If the algorithm in the guest requires fairness, the guest must use a fair lock
(e.g. MCS), and that works as intended when run natively or under qemu.
I just tested a fetch-and-swap+exp.backoff spinlock with usermode on a
program that spawns N threads and each thread performs an 2**M atomic increments
on the same variable. That is, a degenerate worst-case kind of contention.
N varies from 1 to 64, and M=15 on all runs, 5 runs per experiment:
http://imgur.com/XpYctyT
With backoff, the per-access latency grows roughly linearly with the number of
cores, i.e. this is scalable. The other two are clearly superlinear.
The fastest spinlock as per ck's documentation (for uncontended cases) is
the fetch-and-swap lock. I just re-ran the usermode experiments from yesterday
with fas and fas+exp.backoff:
http://imgur.com/OK2WZg8
There really isn't much difference among the three candidates.
In light of these results I see very little against going for a solution
with exponential backoff.
Thanks,
Emilio
- [Qemu-devel] [RFC 04/38] translate-all: remove volatile from have_tb_lock, (continued)
- [Qemu-devel] [RFC 04/38] translate-all: remove volatile from have_tb_lock, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 06/38] seqlock: add missing 'inline' to seqlock_read_retry, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 02/38] hw/i386/kvmvapic: add missing include of tcg.h, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 01/38] cpu-exec: add missing mmap_lock in tb_find_slow, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 10/38] translate-all: remove obsolete comment about l1_map, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 18/38] tcg: add fences, Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 12/38] linux-user: call rcu_(un)register_thread on pthread_(exit|create), Emilio G. Cota, 2015/08/23
- [Qemu-devel] [RFC 05/38] thread-posix: inline qemu_spin functions, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 11/38] qemu-thread: handle spurious futex_wait wakeups, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 13/38] cputlb: add physical address to CPUTLBEntry, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 14/38] softmmu: add helpers to get ld/st physical addresses, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 17/38] aie: add target helpers, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 15/38] radix-tree: add generic lockless radix tree module, Emilio G. Cota, 2015/08/23
[Qemu-devel] [RFC 16/38] aie: add module for Atomic Instruction Emulation, Emilio G. Cota, 2015/08/23