Re: [Qemu-devel] [RFC 05/38] thread-posix: inline qemu_spin functions


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [RFC 05/38] thread-posix: inline qemu_spin functions
Date: Mon, 24 Aug 2015 22:30:03 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Sun, Aug 23, 2015 at 18:04:46 -0700, Paolo Bonzini wrote:
> On 23/08/2015 17:23, Emilio G. Cota wrote:
> > On some parallel workloads this gives up to a 15% speed improvement.
> > 
> > Signed-off-by: Emilio G. Cota <address@hidden>
> > ---
> >  include/qemu/thread-posix.h | 47 ++++++++++++++++++++++++++++++++++++++++++
> >  include/qemu/thread.h       |  6 ------
> >  util/qemu-thread-posix.c    | 50 
> > +++++----------------------------------------
> >  3 files changed, 52 insertions(+), 51 deletions(-)
(snip)
> Applied, but in the end the spinlock will probably simply use a simple
> test-and-test-and-set lock, or an MCS lock.  There is no need to use
> pthreads for this.

Agreed.
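For reference, a test-and-test-and-set lock of the kind Paolo mentions can be sketched as follows. This is a minimal sketch using C11 atomics; the names are mine and this is not QEMU's actual qemu_spin code:

```c
/* Minimal test-and-test-and-set spinlock sketch (C11 atomics).
   Illustrative only; not QEMU's qemu_spin implementation. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} ttas_lock;

static inline void ttas_init(ttas_lock *l)
{
    atomic_init(&l->locked, false);
}

static inline void ttas_lock_acquire(ttas_lock *l)
{
    for (;;) {
        /* "Test": spin on a plain load first, so waiters only read the
           cache line instead of hammering it with atomic writes. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            /* busy-wait */
        }
        /* "Test-and-set": try to grab the lock with an atomic exchange;
           the previous value tells us whether we won the race. */
        if (!atomic_exchange_explicit(&l->locked, true,
                                      memory_order_acquire)) {
            return;
        }
    }
}

static inline void ttas_lock_release(ttas_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The point of the initial read-only spin is that waiters keep the line in shared state and only attempt the (exclusive) exchange when the lock looks free.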

In fact in my tests I sometimes use concurrencykit [http://concurrencykit.org/]
to test lock alternatives (I would love to be able to just add ck as a submodule
of qemu, but it does not support as many architectures as qemu does).

Note that fair locks (such as MCS) are not necessarily a good idea for
user-space programs once preemption is considered. Moreover, for usermode
we would be forced (if we allowed MCS locks to nest) to use per-lock stack
variables, given that the number of threads is unbounded, which is pretty ugly.

If contention is a problem, a simple, fast spinlock combined with exponential
backoff is already pretty good. Fairness is not a requirement (the cache
substrate of a NUMA machine isn't necessarily fair, is it?); scalability is.
If the algorithm in the guest requires fairness, the guest must use a fair lock
(e.g. MCS), and that works as intended when run natively or under qemu.
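For concreteness, the fetch-and-swap + exponential backoff combination can be sketched like this (C11 atomics; the backoff constants, the relax loop, and all names are illustrative assumptions, not tuned values from the experiments below):

```c
/* Fetch-and-swap spinlock with exponential backoff (sketch, C11 atomics).
   Constants are placeholders, not tuned values. */
#include <stdatomic.h>
#include <stdbool.h>

#define BACKOFF_MIN 4u
#define BACKOFF_MAX 1024u

typedef struct {
    atomic_bool locked;
} fas_lock;

static inline void fas_init(fas_lock *l)
{
    atomic_init(&l->locked, false);
}

/* Portable stand-in for a pause/yield delay loop. */
static inline void cpu_relax_n(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++) {
        /* spin */
    }
}

static inline void fas_lock_acquire(fas_lock *l)
{
    unsigned backoff = BACKOFF_MIN;

    /* Fetch-and-swap: atomic_exchange returns the previous value, so we
       own the lock as soon as we observe it was previously false. */
    while (atomic_exchange_explicit(&l->locked, true,
                                    memory_order_acquire)) {
        /* On failure, wait for an exponentially growing interval so that
           contending CPUs stop hammering the lock's cache line. */
        cpu_relax_n(backoff);
        if (backoff < BACKOFF_MAX) {
            backoff *= 2;
        }
    }
}

static inline void fas_lock_release(fas_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The doubling-with-a-cap is what keeps per-access latency roughly linear under heavy contention: losers progressively take themselves off the cache line rather than retrying immediately.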

I just tested a fetch-and-swap+exp.backoff spinlock with usermode on a
program that spawns N threads, where each thread performs 2**M atomic
increments on the same variable. That is, a degenerate, worst-case kind of
contention. N varies from 1 to 64, and M=15 on all runs, 5 runs per experiment:
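The harness can be sketched roughly as follows (function names and structure are my assumptions, not the original program; timing and the lock under test are omitted):

```c
/* Sketch of the worst-case contention microbenchmark described above:
   n threads, each performing 2**m atomic increments of one shared
   variable.  Names are mine, not the original harness. */
#include <pthread.h>
#include <stdatomic.h>

static atomic_long shared_counter;
static long iters_per_thread;

static void *bench_worker(void *opaque)
{
    (void)opaque;
    for (long i = 0; i < iters_per_thread; i++) {
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Spawn n threads (n <= 64), each doing 2**m increments; return the
   final value of the shared counter. */
static long run_bench(int n, int m)
{
    pthread_t threads[64];

    atomic_store(&shared_counter, 0);
    iters_per_thread = 1L << m;

    for (int i = 0; i < n; i++) {
        pthread_create(&threads[i], NULL, bench_worker, NULL);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&shared_counter);
}
```

All threads target the same cache line, so this measures pure contention overhead; the per-access latency is then (wall time) / (n * 2**m).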

  http://imgur.com/XpYctyT
  With backoff, the per-access latency grows roughly linearly with the number of
  cores, i.e. this is scalable. The other two are clearly superlinear.

The fastest spinlock as per ck's documentation (for uncontended cases) is
the fetch-and-swap lock. I just re-ran the usermode experiments from yesterday
with fas and fas+exp.backoff:

  http://imgur.com/OK2WZg8
  There really isn't much difference among the three candidates.
  
In light of these results I see very little against going for a solution
with exponential backoff.

Thanks,

                Emilio


