Re: [Qemu-devel] [PATCH 08/30] rcu: add rcu library

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 08/30] rcu: add rcu library

From:	Jan Kiszka
Subject:	Re: [Qemu-devel] [PATCH 08/30] rcu: add rcu library
Date:	Mon, 01 Jul 2013 11:47:06 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2013-06-28 20:26, Paolo Bonzini wrote:
> This includes a (mangled) copy of the urcu-qsbr code from liburcu.
> The main changes are: 1) removing dependencies on many other header files
> in liburcu; 2) removing for simplicity the tentative busy waiting in
> synchronize_rcu, which has limited performance effects; 3) replacing
> futexes in synchronize_rcu with QemuEvents for Win32 portability.
> The API is the same as liburcu, so it should be possible in the future
> to require liburcu on POSIX systems for example and use our copy only
> on Windows.
> 
> Among the various versions available I chose urcu-qsbr, which has the
> fastest rcu_read_{lock,unlock} but requires the program to manually
> annotate quiescent points or intervals.  QEMU threads usually have easily
> identified quiescent periods, so this should not be a problem.
> 
> Signed-off-by: Paolo Bonzini <address@hidden>
> ---
>  docs/rcu.txt               | 303 
> +++++++++++++++++++++++++++++++++++++++++++++
>  hw/9pfs/virtio-9p-synth.c  |   1 +
>  include/qemu/queue.h       |  13 ++
>  include/qemu/rcu-pointer.h | 110 ++++++++++++++++
>  include/qemu/rcu.h         | 141 +++++++++++++++++++++
>  include/qemu/thread.h      |   3 -
>  libcacard/Makefile         |   3 +-
>  util/Makefile.objs         |   1 +
>  util/rcu.c                 | 195 +++++++++++++++++++++++++++++
>  9 files changed, 766 insertions(+), 4 deletions(-)
>  create mode 100644 docs/rcu.txt
>  create mode 100644 include/qemu/rcu-pointer.h
>  create mode 100644 include/qemu/rcu.h
>  create mode 100644 util/rcu.c
> 
> diff --git a/docs/rcu.txt b/docs/rcu.txt
> new file mode 100644
> index 0000000..4869ec7
> --- /dev/null
> +++ b/docs/rcu.txt
> @@ -0,0 +1,303 @@
> +Using RCU (Read-Copy-Update) for synchronization
> +================================================
> +
> +Read-copy update (RCU) is a synchronization mechanism that is used to
> +protect read-mostly data structures.  RCU is very efficient and scalable
> +on the read side (it is wait-free), and thus can make the read paths
> +extremely fast.
> +
> +RCU supports concurrency between a single writer and multiple readers,
> +thus it is not used alone.  Typically, the write-side will use a lock to
> +serialize multiple updates, but other approaches are possible (e.g.,
> +restricting updates to a single task).  In QEMU, when a lock is used,
> +this will often be the "iothread mutex", also known as the "big QEMU
> +lock" (BQL).  Also, restricting updates to a single task is done in
> +QEMU using the "bottom half" API.
> +
> +RCU is fundamentally a "wait-to-finish" mechanism.  The read side marks
> +sections of code with "critical sections", and the update side will wait
> +for the execution of all *currently running* critical sections before
> +proceeding, or before asynchronously executing a callback.
> +
> +The key point here is that only the currently running critical sections
> +are waited for; critical sections that are started _after_ the beginning
> +of the wait do not extend the wait, despite running concurrently with
> +the updater.  This is the reason why RCU is more scalable than,
> +for example, reader-writer locks.  It is so much more scalable that
> +the system will have a single instance of the RCU mechanism; a single
> +mechanism can be used for an arbitrary number of "things", without
> +having to worry about things such as contention or deadlocks.
> +
> +How is this possible?  The basic idea is to split updates in two phases,
> +"removal" and "reclamation".  During removal, we ensure that subsequent
> +readers will not be able to get a reference to the old data.  After
> +removal has completed, a critical section will not be able to access
> +the old data.  Therefore, critical sections that begin after removal
> +do not matter; as soon as all previous critical sections have finished,
> +there cannot be any readers who hold references to the data structure,
> +which may not be safely reclaimed (e.g., freed or unref'ed).
> +
> +Here is a picutre:
> +
> +        thread 1                  thread 2                  thread 3
> +    -------------------    ------------------------    -------------------
> +    enter RCU crit.sec.
> +           |                finish removal phase
> +           |                begin wait
> +           |                      |                    enter RCU crit.sec.
> +    exit RCU crit.sec             |                           |
> +                            complete wait                     |
> +                            begin reclamation phase           |
> +                                                       exit RCU crit.sec.
> +
> +
> +Note how thread 3 is still executing its critical section when thread 2
> +starts reclaiming data.  This is possible, because the old version of the
> +data structure was not accessible at the time thread 3 began executing
> +that critical section.
> +
> +
> +RCU API
> +=======
> +
> +The core RCU API is small:
> +
> +     void rcu_read_lock(void);
> +
> +        Used by a reader to inform the reclaimer that the reader is
> +        entering an RCU read-side critical section.
> +
> +     void rcu_read_unlock(void);
> +
> +        Used by a reader to inform the reclaimer that the reader is
> +        exiting an RCU read-side critical section.  Note that RCU
> +        read-side critical sections may be nested and/or overlapping.
> +
> +     void synchronize_rcu(void);
> +
> +        Blocks until all pre-existing RCU read-side critical sections
> +        on all threads have completed.  This marks the end of the removal
> +        phase and the beginning of reclamation phase.
> +
> +        Note that it would be valid for another update to come while
> +        synchronize_rcu is running.  Because of this, it is better that
> +        the updater releases any locks it may hold before calling
> +        synchronize_rcu.
> +
> +     typeof(*p) rcu_dereference(p);
> +     typeof(p) rcu_assign_pointer(p, typeof(p) v);
> +
> +        These macros are similar to atomic_mb_read() and atomic_mb_set()
> +        respectively.  However, they make some assumptions on the code
> +        that calls them, which allows a more optimized implementation.
> +
> +        rcu_assign_pointer assumes that the update side is not going
> +        to read from the data structure after "publishing" the new
> +        values; that is, it assumes that all assignments happen at
> +        the very end of the removal phase.
> +
> +        rcu_dereference assumes that whenever a single RCU critical
> +        section reads multiple shared data, these reads are either
> +        data-dependent or need no ordering.  This is almost always the
> +        case when using RCU.  If this were not the case, you can use
> +        atomic_mb_read() or smp_rmb().
> +
> +        If you are going to be fetching multiple fields from the
> +        RCU-protected structure, repeated rcu_dereference() calls
> +        would look ugly and incur unnecessary overhead on Alpha CPUs.
> +        You can then do this:
> +
> +        p = &rcu_dereference(head);
> +        foo = head->foo;
> +        bar = head->bar;
> +
> +
> +RCU QUIESCENT STATES
> +====================
> +
> +An efficient implementation of rcu_read_lock() and rcu_read_unlock()
> +relies on the availability of fast thread-local storage.  Unfortunately,
> +this is not possible on all the systems supported by QEMU (in particular
> +on many POSIX systems other than Linux and Solaris).
> +
> +For this reason, QEMU's RCU implementation resorts to manual annotation
> +of "quiescent states", i.e. points where no RCU read-side critical
> +section can be active.  All threads that participate in the RCU mechanism
> +need to annotate such points.

"... can be active" also implies that

rcu_read_lock();
rcu_thread_offline();

is actually an illegal ordering, right? Can we add some optional debug
code that performs runtime checking of this requirements? That should
also catch (sooner or later) leaking RCU locks.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH 08/30] rcu: add rcu library, Jan Kiszka <=

Prev by Date: [Qemu-devel] [PATCH 0/2] Fast Thread-Local Storage support
Next by Date: Re: [Qemu-devel] [PATCH 1/2] exec: do not use qemu/tls.h
Previous by thread: [Qemu-devel] [PATCH 0/2] Fast Thread-Local Storage support
Next by thread: Re: [Qemu-devel] [PATCH] semaphore: fix a hangup problem under load on NetBSD hosts.
Index(es):
- Date
- Thread