
Re: AioContext lock removal: help needed


From: Emanuele Giuseppe Esposito
Subject: Re: AioContext lock removal: help needed
Date: Fri, 8 Jul 2022 11:42:06 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 08/07/2022 at 10:42, Emanuele Giuseppe Esposito wrote:
> Hello everyone,
> 
> As you all know, I am trying to find a way to replace the well known
> AioContext lock with something else that makes sense and provides the
> same (or even better) guarantees than using this lock.
> 
> The reasons for this change have been explained over and over, and I
> don't really want to repeat them. Please read the various series I
> posted in the past [1] for more information.
> 
> The end goal is to get rid of the AioContext, and have fine-granularity
> locks in the various components, to make the whole block layer more
> multi-thread friendly and eventually be able to assign multiple virtual
> queues to a single iothread.
> 
> The AioContext lock is used everywhere to protect a huge variety of
> data, and this greatly limits the level of multithreading that
> iothreads can achieve.
> 
> Before digging into the problem itself and possible solutions, I would
> like to add that we are having a weekly (or bi-weekly, we'll see)
> public meeting where we plan to discuss this project. Anyone
> interested is very welcome to join. The event invitation is here:
> 
> https://calendar.google.com/event?action=TEMPLATE&tmeid=NTdja2VwMDFyYm9nNjNyc25pdXU5bm8wb3FfMjAyMjA3MTRUMDgwMDAwWiBlZXNwb3NpdEByZWRoYXQuY29t&tmsrc=eesposit%40redhat.com&scp=ALL
> 
> One huge blocker we are facing is removing the AioContext lock from
> the block API (bdrv_* and friends).
> I identified two initial, main candidates that need to lose the
> AioContext protection:
> - bdrv_replace_child_noperm
> - bdrv_try_set_aio_context
> 
> Once these two functions can safely run without the AioContext lock,
> we will have gotten rid of the majority of its usage.
> The main issue is: what can we use as a replacement?
> 
> Let's analyze bdrv_replace_child_noperm (probably the tougher of the
> two): this function performs a graph modification, removing a child
> from one bs and putting it under another. It modifies the nodes'
> ->parents and ->children lists, and it definitely needs protection
> because these lists are also read in parallel from iothreads.
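
This is the shape of the operation, as a minimal standalone sketch
(not QEMU code: the real lists are QLISTs and the field names differ;
everything here is illustrative). An edge is unlinked from one node's
->parents list and relinked under another, and a reader iterating
that list in parallel can observe it half-done:

    /* Standalone sketch of the bdrv_replace_child_noperm race. */
    #include <stddef.h>

    typedef struct Child Child;
    typedef struct Node {
        Child *parents;          /* head of the ->parents edge list */
    } Node;

    struct Child {
        Node *bs;                /* node this edge currently points to */
        Child *next_parent;      /* next edge in bs->parents */
    };

    /* Core of a "replace child" graph change: nothing is locked here,
     * so an iothread walking old_bs->parents races with the
     * unlink/relink below. */
    static void replace_child(Child *c, Node *old_bs, Node *new_bs)
    {
        Child **pp = &old_bs->parents;       /* unlink from old_bs */
        while (*pp != c) {
            pp = &(*pp)->next_parent;
        }
        *pp = c->next_parent;

        c->bs = new_bs;                      /* relink under new_bs */
        c->next_parent = new_bs->parents;
        new_bs->parents = c;
    }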
> 
> Possible candidates to use as replacement:
> 
> - rwlock. With Paolo's help, I implemented a rwlock optimized for
> many fast readers and few writers (sketched after this item), ideal
> for bdrv_replace_child_noperm. However, the problem here is that when
> a writer has to wait for other readers to finish (since it needs
> exclusive access), it has to run a nested event loop to allow the
> others (readers included) to make progress.
> And this brings us into serious complications, because polling with
> the write lock taken is prone to a lot of deadlocks, including the
> fact that the AioContext lock is still needed in AIO_WAIT_WHILE. The
> solution would be to run everything, readers included, in coroutines.
> However, this is not easy either: long story short, switching
> BlockDriverState callbacks to coroutines is a big problem, as the
> AioContext lock is still taken in many of the callbacks' callers, and
> therefore switching to coroutines creates a mixture of held locks
> that simply results in deadlocks.
> Ideally we would first get rid of the AioContext lock and then switch
> to coroutines, but getting rid of the lock is the whole point of the
> rwlock.
> More on this here:
> https://patchew.org/QEMU/20220426085114.199647-1-eesposit@redhat.com/#cc5e12d1-d25f-d338-bff2-0d3f5cc0def7@redhat.com

This is also very useful (on the same thread as above):
https://patchew.org/QEMU/20220426085114.199647-1-eesposit@redhat.com/#6fc3e40e-7682-b9dc-f789-3ca95e0430db@redhat.com

> 
> But I would say this is not an ideal candidate to replace the
> AioContext lock, at least not in the immediate future.
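
For reference, the rwlock idea above has roughly this shape, as a
heavily simplified standalone sketch (illustrative names, C11 atomics
instead of QEMU's primitives; the real implementation is coroutine
based). The writer's wait for the reader count to drop is exactly the
spot that, in QEMU, has to become a nested event loop (AIO_WAIT_WHILE)
and is therefore deadlock-prone:

    /* Simplified reader/writer idea; not the actual QEMU code. */
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int reader_count;
    static atomic_bool writer_active;

    static void graph_rdlock(void)
    {
        for (;;) {
            atomic_fetch_add(&reader_count, 1);
            if (!atomic_load(&writer_active)) {
                return;               /* fast path: no writer running */
            }
            /* Writer pending: back off.  In QEMU the reader would be
             * a coroutine that queues itself and yields here. */
            atomic_fetch_sub(&reader_count, 1);
        }
    }

    static void graph_rdunlock(void)
    {
        atomic_fetch_sub(&reader_count, 1);
    }

    static void graph_wrlock(void)
    {
        atomic_store(&writer_active, true);
        /* The problematic wait: in QEMU this loop has to be something
         * like AIO_WAIT_WHILE(..., reader_count > 0) so that readers
         * (often coroutines) can make progress, i.e. a nested event
         * loop polled with the write lock logically held. */
        while (atomic_load(&reader_count) > 0) {
            /* busy-wait only for illustration */
        }
    }

    static void graph_wrunlock(void)
    {
        atomic_store(&writer_active, false);
    }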
> 
> - drains. This was the initial lead and is still the main one. Using
> bdrv_drained_begin/end we are sure that a node and all its parents
> will be paused (jobs included), no further I/O will come in since it
> is temporarily disabled, and all in-flight requests are guaranteed to
> have finished by the time bdrv_drained_begin returns.
> Even better than plain drains, I proposed using
> bdrv_subtree_drained_begin, which also stops and protects the
> children of a node (usage sketched below).
> I think the major drawback of this is that we need to be sure there
> are no cases where a drain is not enough. Together with Kevin and
> Stefan we identified that we need to prevent drain from being called
> in coroutines, regardless of which AioContext they run in. That's
> because coroutines could allow parallel drains/graph reads to happen:
> in the general case, a coroutine yielding after calling drain_begin,
> in the middle of a graph modification, could allow another coroutine
> to drain or read the graph.
> Note that draining itself also involves reading the graph.
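
To illustrate, a graph change under this proposal would be wrapped
like this (a sketch against QEMU's internal block API; the real
bdrv_replace_child_noperm is static in block.c and its exact signature
differs, so take this as pseudocode for the locking scheme, not a copy
of the implementation):

    #include "block/block_int.h"

    static void replace_child_drained(BdrvChild *child,
                                      BlockDriverState *old_bs,
                                      BlockDriverState *new_bs)
    {
        /* Quiesce both subtrees: in-flight requests finish, new I/O
         * is held off, jobs attached to these nodes are paused. */
        bdrv_subtree_drained_begin(old_bs);
        bdrv_subtree_drained_begin(new_bs);

        /* Nobody is traversing ->parents/->children of these subtrees
         * anymore, so the graph modification is safe. */
        bdrv_replace_child_noperm(child, new_bs);

        bdrv_subtree_drained_end(new_bs);
        bdrv_subtree_drained_end(old_bs);
    }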
> 
> We thought the only place where draining happens in a coroutine was
> the mirror job's run() callback. However, that is just the tip of the
> iceberg.
> Other functions, like the .bdrv_open callbacks (qcow2_open, for
> example, quoted below), create coroutines to execute part of their
> logic, for valid performance reasons (we don't want to block when we
> could simply yield and allow something else to run).
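
For example, qcow2_open itself looks roughly like this (condensed from
block/qcow2.c; error handling and most of the setup omitted):

    /* .bdrv_open runs the bulk of its work in a coroutine so that it
     * can yield instead of blocking. */
    static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
    {
        QCow2OpenCo qoc = {
            .bs = bs, .options = options, .flags = flags,
            .errp = errp, .ret = -EINPROGRESS,
        };

        /* ... option and header handling omitted ... */

        if (qemu_in_coroutine()) {
            /* From bdrv_co_create: we are already in a coroutine. */
            qcow2_open_entry(&qoc);
        } else {
            assert(qemu_get_current_aio_context() == qemu_get_aio_context());
            qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry,
                                                       &qoc));
            BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS);
        }
        return qoc.ret;
    }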
> 
> So another question is: what could we do to solve this coroutine issue?
> Ideas?
> 
> Main drain series:
> https://patchew.org/QEMU/20220118162738.1366281-1-eesposit@redhat.com/
> 
> 
> [1] = https://patchew.org/QEMU/20220301142113.163174-1-eesposit@redhat.com/
> 
> Thank you,
> Emanuele
> 
> 



