From: Kevin Wolf
Subject: Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock
Date: Tue, 24 May 2022 14:10:04 +0200

On 18.05.2022 at 14:28, Emanuele Giuseppe Esposito wrote:
> label: // read till the end to see why I wrote this here
> 
> I was hoping someone from the "No" party would answer your question,
> because as you know we reached this same conclusion together.
> 
> We thought about dropping the drain for various reasons; the main one
> (at least as far as I understood it) is that we are not sure whether
> something can still happen in between drain_begin/end, and it is a
> little bit confusing to use the same mechanism both to block I/O and
> to protect the graph.
> 
> We then thought about implementing a rwlock. A rdlock would clarify what
> we are protecting and who is using the lock. I had sent a draft rwlock
> implementation in this thread, but this also led to additional
> problems.
> The main problem was that this new lock would introduce nested event
> loops, which together with such locking would just create deadlocks.
> If readers are in coroutines and writers are not (because graph
> operations are not running in coroutines), we get a lot of deadlocks.
> If a writer has to take the lock, it must wait for all other readers to
> finish. And it does that by internally calling AIO_WAIT_WHILE(),
> creating a nested event loop. We don't know what could execute when
> polling for events; for example, another writer could be resumed.

Why is this a problem? Your AIO_WAIT_WHILE() condition would be that
there are neither readers nor writers, so you would just keep waiting
until the other writer is done.

> Ideally, we want writers in coroutines too.
>
> Additionally, many readers are running in what we call "mixed"
> functions: usually implemented automatically with the
> generated_co_wrapper tag, they let a function (usually a bdrv callback)
> always run in a coroutine, creating one if necessary. For example,
> bdrv_flush() makes sure that bs->bdrv_co_flush() is always run in a
> coroutine.
> Such mixed functions are used in other callbacks too, making it really
> difficult to understand whether we are in a coroutine or not, and, most
> importantly, making rwlock usage very difficult.

How do they make rwlock usage difficult?

*goes back to old IRC discussions*

Ah, the problem was not the AIO_WAIT_WHILE() while taking the lock, but
taking the lock first and then running an AIO_WAIT_WHILE() inside the
locked section. This is forbidden because the callbacks that run during
AIO_WAIT_WHILE() may in turn wait for the lock that you own, causing a
deadlock.
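
In code, the forbidden pattern looks roughly like this
(graph_rdlock()/graph_rdunlock() and pending_requests() are hypothetical
stand-ins, not real APIs):

    #include "qemu/osdep.h"
    #include "block/aio-wait.h"
    #include "block/block.h"

    static int graph_readers;

    /* Hypothetical helpers, for illustration only */
    static void graph_rdlock(void)   { graph_readers++; }
    static void graph_rdunlock(void) { graph_readers--; }

    static bool pending_requests(BlockDriverState *bs)
    {
        return false;   /* placeholder condition */
    }

    static void locked_section_with_poll(BlockDriverState *bs)
    {
        graph_rdlock();     /* we are now a registered reader */

        /*
         * This polls the event loop. Any BH or completion that runs here
         * and tries to take the write lock waits for graph_readers to
         * drop to zero, while we only make progress once that callback
         * returns: a deadlock.
         */
        AIO_WAIT_WHILE(bdrv_get_aio_context(bs), pending_requests(bs));

        graph_rdunlock();
    }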

This is indeed a problem that running in coroutines would avoid because
the inner waiter would just yield and the outer one could complete its
job as soon as it's its turn.

My conclusion in the IRC discussion was that maybe we need to take the
graph locks when we're entering coroutine context, i.e. the "mixed"
functions would rdlock the graph when called from non-coroutine context
and would assume that it's already locked when called from coroutine
context.
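
Roughly what I have in mind, just as a sketch (graph_rwlock is a
hypothetical CoRwlock protecting the graph, initialised elsewhere with
qemu_co_rwlock_init(), and bdrv_flush() only serves as an example):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/aio-wait.h"
    #include "block/block.h"

    static CoRwlock graph_rwlock;   /* hypothetical graph lock */

    typedef struct FlushCo {
        BlockDriverState *bs;
        int ret;
        bool done;
    } FlushCo;

    static void coroutine_fn flush_entry(void *opaque)
    {
        FlushCo *s = opaque;

        /*
         * We created this coroutine ourselves, so the caller cannot hold
         * the graph lock yet: take the rdlock on entering coroutine
         * context.
         */
        qemu_co_rwlock_rdlock(&graph_rwlock);
        s->ret = bdrv_co_flush(s->bs);
        qemu_co_rwlock_unlock(&graph_rwlock);

        s->done = true;
        aio_wait_kick();
    }

    static int example_flush(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* Coroutine context: the caller already holds the rdlock */
            return bdrv_co_flush(bs);
        } else {
            FlushCo s = { .bs = bs };
            Coroutine *co = qemu_coroutine_create(flush_entry, &s);

            bdrv_coroutine_enter(bs, co);
            BDRV_POLL_WHILE(bs, !s.done);
            return s.ret;
        }
    }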

> This led us to step back once more and try to convert all
> BlockDriverState callbacks into coroutines. This would greatly simplify
> rwlock usage, because we could make the rwlock coroutine-friendly
> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
> just yielding and queuing itself in coroutine queues).
> 
> The first step was then to convert all callbacks into coroutines, using
> generated_coroutine_wrapper (g_c_w).
> A typical g_c_w is implemented in this way:
>       if (qemu_in_coroutine()) {
>               callback();
>       } else { // much simplified
>               co = qemu_coroutine_create(callback);
>               bdrv_coroutine_enter(bs, co);
>               BDRV_POLL_WHILE(bs, coroutine_in_progress);
>       }
> Once all callbacks are implemented using g_c_w, we can start splitting
> the two sides of the if statement so that we only create a coroutine
> when we are outside of a bdrv callback.
> 
> However, we immediately found a problem while starting to convert the
> first callbacks: the AioContext lock is taken around some non-coroutine
> callbacks! For example, bs->bdrv_open() is always called with the
> AioContext lock taken. In addition, callbacks like bdrv_open are
> graph-modifying functions, which is probably why we are taking the
> AioContext lock, and they do not like to run in coroutines.
> Anyway, the real problem comes when we create a coroutine in such
> places where the AioContext lock is taken and we have a graph-modifying
> function.
> 
> bdrv_coroutine_enter() calls aio_co_enter(), which in turn first checks
> whether the coroutine is entering a context other than the current one
> (which is not the case for open) and whether we are already in a
> coroutine (for sure not). Therefore it resorts to the following calls:
>       aio_context_acquire(ctx);
>       qemu_aio_coroutine_enter(ctx, co);
>       aio_context_release(ctx);
> Which is clearly a problem, because we are taking the lock twice: once
> from the original caller of the callback, and once here due to the
> coroutine. This creates a lot of deadlock situations.

What are the deadlock situations that are created by locking twice?

The only problem I'm aware of is AIO_WAIT_WHILE(), which wants to
temporarily unlock the AioContext. It calls aio_context_release() once to
achieve this, which obviously isn't enough when the context was locked
twice.

But AIO_WAIT_WHILE() isn't allowed in coroutines anyway. So how are we
running into deadlocks here?

Note that we're probably already doing this inside the .bdrv_open
implementations: They will usually read something from the image file,
calling bdrv_preadv() which is already a generated_coroutine_wrapper
today and creates a coroutine internally with the same locking pattern
applied that you describe as problematic here.

Making .bdrv_open itself a generated_coroutine_wrapper wouldn't really
change anything fundamental; it would just pull the existing mechanism
one function higher in the call stack.
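
For illustration, a typical .bdrv_open already does something like this
today (simplified sketch of a hypothetical driver; I'm writing the
bdrv_preadv() call from memory):

    #include "qemu/osdep.h"
    #include "qemu/iov.h"
    #include "qapi/qmp/qdict.h"
    #include "block/block_int.h"

    static int example_open(BlockDriverState *bs, QDict *options,
                            int flags, Error **errp)
    {
        uint8_t header[512];
        QEMUIOVector qiov;
        int ret;

        qemu_iovec_init_buf(&qiov, header, sizeof(header));

        /*
         * The caller holds the AioContext lock here. bdrv_preadv() is a
         * generated_coroutine_wrapper: since we are not in a coroutine,
         * it creates one, enters it with bdrv_coroutine_enter() and polls
         * with BDRV_POLL_WHILE() -- i.e. the pattern described above
         * already happens today, just one level below .bdrv_open.
         */
        ret = bdrv_preadv(bs->file, 0, sizeof(header), &qiov, 0);
        if (ret < 0) {
            return ret;
        }

        /* ... parse the header ... */
        return 0;
    }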

> For example, all callers of bdrv_open() always take the AioContext lock.
> Often it is taken very high in the call stack, but it's always taken.
> 
> Getting rid of the lock around qemu_aio_coroutine_enter() is difficult
> too, because coroutines expect to have the lock taken. For example, if
> we want to drain from a coroutine, bdrv_co_yield_to_drain releases the
> lock for us.

It's not difficult at all in your case, where you know that you're
already in the right thread and the lock is taken: you can simply call
qemu_aio_coroutine_enter() directly instead of bdrv_coroutine_enter().
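
Something like this sketch (enter_coroutine_locked() is a made-up helper
name), assuming the caller runs in the home thread of the BDS's
AioContext and holds its lock exactly once:

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/block.h"

    static void enter_coroutine_locked(BlockDriverState *bs, Coroutine *co)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);

        /*
         * The caller already holds ctx's lock and runs in its home
         * thread, so enter the coroutine directly. bdrv_coroutine_enter()
         * would go through aio_co_enter(), which wraps this call in
         * aio_context_acquire()/aio_context_release() and therefore takes
         * the lock a second time.
         */
        qemu_aio_coroutine_enter(ctx, co);
    }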

But as I said, I'm not sure why we need to get rid of it at all.

Kevin



