qemu-devel


Re: [Qemu-devel] [PATCH 0/3] rcu: add option to use upstream liburcu


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH 0/3] rcu: add option to use upstream liburcu
Date: Wed, 4 Feb 2015 16:01:56 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Feb 04, 2015 at 11:32:57 +0100, Paolo Bonzini wrote:
> On 03/02/2015 23:08, Emilio G. Cota wrote:
> > * The first two patches bring back the RCU API to exactly
> >   match that of liburcu.
> 
> Bringing over rcu_dereference/rcu_assign_pointer is unnecessary, I
> think.  The names do not conflict.

On Wed, Feb 04, 2015 at 18:31:55 +0100, Paolo Bonzini wrote:
> On 04/02/2015 18:25, Emilio G. Cota wrote:
> > They're not exactly the same, otherwise the patch would be trivial.
> 
> You're right, I was imprecise---I meant they are interoperable.  You can
> use atomic_rcu_read/write together with liburcu, you do not need to use
> the liburcu-provided rcu_dereference/rcu_assign_pointer.

It's true that they can coexist. I'm just not sold on having two ways
of doing the same thing. It would make more sense to me to diverge
from the original API only if there's a good reason to do so -- otherwise
an eventual merge (say, after option a) discussed below) would be more
painful than necessary.

On Wed, Feb 04, 2015 at 11:32:57 +0100, Paolo Bonzini wrote:
> As to call_rcu/call_rcu1, I understand there is a problem.  Maybe call
> the QEMU versions rcu_call/rcu_call1?  Then you can add simple wrappers
> if you want to use liburcu-mb.

Again, adhering to the original API if possible makes more sense to me, to
prevent future confusion.

> > * The third patch adds a configure flag to choose from either
> >   liburcu or QEMU's RCU.
> > 
> > Apart from this, I wonder what to do about other valuable bits in
> > liburcu, particularly in liburcu-cds, which I'm using currently
> > off-tree.  I see three ways of eventually doing this:
> > 
> > a) Add Windows support in liburcu, thereby eliminating the
> >    de facto fork in QEMU.
> 
> This would be certainly doable.
> 
> Note that liburcu is not widely packaged, so we would have to add it as
> a submodule.  What is the advantage of using liburcu?

Better testing, (eventually) less work, fewer bugs, no code duplication,
and the ability to just merge new features from upstream.

Essentially the usual reasons against forking a project.

> > b) Bring (fork) liburcu-cds to QEMU, just like liburcu-mb was.
> 
> To some extent this is what we're doing.  cds is very much inspired by
> the Linux kernel, but QEMU is already using both BSD (intrusive) and
> GLib (non-intrusive) lists, and I didn't like the idea of adding yet
> another API.  I like the simplicity of the Linux hlist/list APIs, but
> two kinds of lists are already one too many.

Agreed.

> So, part 2 of the RCU work has an API for RCU lists based on BSD lists
> (that QEMU is already using).  These are not the lock-free data
> structures available in CDS, just the usual RCU-based lists with
> blocking write side and wait-free read side.
> 
> QEMU has very limited support for (non-RCU-based) lock-free data
> structures in qemu/queue.h: see QSLIST_INSERT_HEAD_ATOMIC and
> QSLIST_MOVE_ATOMIC.  The call_rcu implementation in util/rcu.c is based
> on wfqueue from liburcu-cds, but it would not be hard to change it to
> use QSLIST_INSERT_HEAD_ATOMIC/QSLIST_MOVE_ATOMIC instead.  In both cases
> the data structure is multi-producer/single-consumer.
> 
> 
> QEMU hardly uses hash tables at all.

I understand there's currently not much demand for these. I think, however,
that they might become valuable in the future, provided we end up with
a multi-threaded TCG.
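
As an aside on the multi-producer/single-consumer structure mentioned in
the quoted text: the general idea of "insert at head atomically, then
move the whole list atomically" can be sketched with C11 atomics as
below. This is only an illustration under my own assumptions; the names
(mpsc_list, mpsc_insert_head, mpsc_move_all, chain_sum) are hypothetical
and this is not the code behind QSLIST_INSERT_HEAD_ATOMIC or
QSLIST_MOVE_ATOMIC, nor liburcu's wfqueue.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for illustration; not QEMU's qemu/queue.h macros. */
struct node {
    struct node *next;
    int value;
};

struct mpsc_list {
    _Atomic(struct node *) head;
};

static void mpsc_init(struct mpsc_list *l)
{
    atomic_init(&l->head, NULL);
}

/* Producer side: link the node in front of the current head, retrying
 * the compare-and-swap if another producer got there first. */
static void mpsc_insert_head(struct mpsc_list *l, struct node *n)
{
    struct node *old = atomic_load_explicit(&l->head, memory_order_relaxed);

    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&l->head, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Consumer side: detach the entire chain with a single exchange.
 * The returned chain is in LIFO order and is private to the caller. */
static struct node *mpsc_move_all(struct mpsc_list *l)
{
    return atomic_exchange_explicit(&l->head, NULL, memory_order_acquire);
}

/* Walk a detached chain; no atomics needed once it is private. */
static int chain_sum(const struct node *n)
{
    int sum = 0;

    for (; n != NULL; n = n->next) {
        sum += n->value;
    }
    return sum;
}
```

Since the single consumer only ever takes the whole chain at once, it
never has to pop individual nodes from a shared head, which is where
ABA problems usually show up in lock-free stacks.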

> Another thing to note is that I didn't envision a huge use of RCU in
> QEMU; I'm employing it in decidedly central places where it can bring
> great benefit, but I wouldn't be surprised if RCU only found a handful
> of users in the end.

If RCU's history in the Linux kernel is any guide, chances are RCU
will end up being used in more places than one could initially guess:

  http://www2.rdrop.com/~paulmck/techreports/RCUUsage.2013.02.24a.pdf

I think the same reasoning is likely to apply to the concurrent data
structures in liburcu-cds.

> Coarse locks with very low contention (such as AioContext) have great
> performance, and most data structures in QEMU fit this model: data
> structures for separate devices are more or less independent, and the
> lock is only needed for the rare cases when the main thread (for example
> the monitor) interacts with the device.

Agreed, this has worked well so far.

> > c) Add a compile-time flag (say CONFIG_LIBURCU_CDS), and then only
> >    use data structures from liburcu-cds where appropriate, falling
> >    back to traditional locked structures when !CONFIG_LIBURCU_CDS.
> > 
> > Would c) be acceptable for upstream, provided the gains (say in
> > scalability/memory footprint) are significant?
> 
> I think if there were a killer use of liburcu-cds, we would just port
> liburcu (the mb, bp and qsbr variants) to Windows, and switch to
> liburcu-mb/liburcu-cds.

Agreed.

> This brings the most important question of all: what are you doing with
> QEMU? :)

So far I've been tinkering with it as a frontend for an instruction-set
simulator. I've managed to make it scale (TCG-based "multicore on multicore")
very well up to dozens of cores by modifying qsim[1], which requires one
library file per guest core.

What I'm investigating now is how to do this in a manner that is palatable
to upstream. For this, as is well known, we need a multi-threaded TCG,
and I believe quite a few bits from liburcu(-cds) might help to
get there.

So for now I'll keep the liburcu(-cds) compile option in my tree and
see how far I can get. Of course I'll keep fallback code using locks
in there so that we can measure the performance differences.
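
To make the "compile option with a lock-based fallback" idea concrete,
here is a minimal sketch of the pattern. CONFIG_LIBURCU_CDS is the flag
name proposed in option c); everything else (stat_counter, stat_inc,
stat_read, STAT_COUNTER_INIT) is hypothetical. Note that the
non-fallback branch uses a plain C11 atomic purely as a stand-in; it is
not liburcu-cds code.

```c
#include <pthread.h>
#include <stdatomic.h>

/*
 * Both branches expose the same interface, so callers do not care
 * which one was selected at build time. All names except
 * CONFIG_LIBURCU_CDS are made up for this sketch.
 */
#ifdef CONFIG_LIBURCU_CDS

/* Stand-in for the lock-free path: a C11 atomic, NOT liburcu-cds. */
struct stat_counter {
    atomic_long v;
};
#define STAT_COUNTER_INIT { 0 }

static void stat_inc(struct stat_counter *c)
{
    atomic_fetch_add_explicit(&c->v, 1, memory_order_relaxed);
}

static long stat_read(struct stat_counter *c)
{
    return atomic_load_explicit(&c->v, memory_order_relaxed);
}

#else /* !CONFIG_LIBURCU_CDS: traditional locked fallback */

struct stat_counter {
    pthread_mutex_t lock;
    long v;
};
#define STAT_COUNTER_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

static void stat_inc(struct stat_counter *c)
{
    pthread_mutex_lock(&c->lock);
    c->v++;
    pthread_mutex_unlock(&c->lock);
}

static long stat_read(struct stat_counter *c)
{
    long v;

    pthread_mutex_lock(&c->lock);
    v = c->v;
    pthread_mutex_unlock(&c->lock);
    return v;
}

#endif /* CONFIG_LIBURCU_CDS */
```

Building the same callers once with and once without
-DCONFIG_LIBURCU_CDS is then enough to compare the two variants.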

Thanks,

                Emilio

[1] https://github.com/cdkersey/qsim



