qemu-block


From: Emanuele Giuseppe Esposito
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 13 Apr 2022 17:22:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 13/04/2022 at 17:14, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 13/04/2022 at 16:51, Kevin Wolf wrote:
>> On 13.04.2022 at 15:43, Emanuele Giuseppe Esposito wrote:
>>> So this is a more concrete and up-to-date header.
>>>
>>> A few things to notice:
>>> - we have a list of AioContexts. An AioContext is registered when it is
>>> created and removed from the list when it is destroyed.
>>> This list is helpful because each AioContext only modifies its own
>>> reader counter, avoiding unnecessary cacheline bouncing
>>>
>>> - if a coroutine changes AioContext, that is fine with regard to the
>>> per-AioContext reader counter. As long as the sum is correct, there is
>>> no issue. The problem comes only once the original AioContext is
>>> deleted: at that point we need to move the count it held to a shared
>>> global variable, otherwise we risk losing track of readers (see the
>>> sketch just below).
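
As an illustration of the counting scheme described above, here is a minimal
sketch. All names in it (BdrvGraphAioContext, aio_context_list,
orphaned_readers, total_reader_count) are hypothetical and not taken from the
actual patches; it only shows how the per-AioContext counters plus a shared
global can add up to the total number of readers, even if an individual
counter goes negative.

#include "qemu/osdep.h"
#include "qemu/queue.h"
#include "qemu/thread.h"
#include "block/aio.h"

/* Hypothetical per-AioContext bookkeeping: only the owning context
 * touches its own reader_count, so there is no cacheline bouncing. */
typedef struct BdrvGraphAioContext {
    AioContext *ctx;
    int reader_count;
    QLIST_ENTRY(BdrvGraphAioContext) next;
} BdrvGraphAioContext;

static QLIST_HEAD(, BdrvGraphAioContext) aio_context_list =
    QLIST_HEAD_INITIALIZER(aio_context_list);

/* Counts inherited from AioContexts that have been unregistered. */
static int orphaned_readers;

/* Protects aio_context_list and orphaned_readers. */
static QemuMutex aio_context_list_lock;

/* Total number of readers; called with aio_context_list_lock held. */
static int total_reader_count(void)
{
    BdrvGraphAioContext *c;
    int total = orphaned_readers;

    QLIST_FOREACH(c, &aio_context_list, next) {
        total += c->reader_count;
    }
    /* An individual reader_count may be negative if a coroutine released
     * the lock in a different AioContext; only the sum matters. */
    return total;
}
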
>>
>> So the idea is that we can do bdrv_graph_co_rdlock() in one thread and
>> the corresponding bdrv_graph_co_rdunlock() in a different thread?
>>
>> Would the unlock somehow remember the original thread, or do you use the
>> "sum is correct" argument and allow negative counter values, so you can
>> end up having count +1 in A and -1 in B to represent "no active
>> readers"? If this happens, it's likely to happen many times, so do we
>> have to take integer overflows into account then?
>>
>>> - All synchronization of the fields described in this header is of
>>> course handled in the implementation. But for now it would be nice to
>>> have feedback on the idea/API.
>>>
>>> So in short we need (see the sketch after this list):
>>> - per-aiocontext counter
>>> - global list of aiocontext
>>> - global additional reader counter (in case an aiocontext is deleted)
>>> - global CoQueue
>>> - global has_writer flag
>>> - global QemuMutex to protect the list access
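
As a sketch of how that list and the extra reader counter could be
maintained, continuing the hypothetical names from the sketch further up
(BdrvGraphAioContext, aio_context_list, orphaned_readers,
aio_context_list_lock), none of which come from the actual patches:

void register_aiocontext(AioContext *ctx)
{
    BdrvGraphAioContext *c = g_new0(BdrvGraphAioContext, 1);

    c->ctx = ctx;
    qemu_mutex_lock(&aio_context_list_lock);
    QLIST_INSERT_HEAD(&aio_context_list, c, next);
    qemu_mutex_unlock(&aio_context_list_lock);
}

void unregister_aiocontext(AioContext *ctx)
{
    BdrvGraphAioContext *c;

    qemu_mutex_lock(&aio_context_list_lock);
    QLIST_FOREACH(c, &aio_context_list, next) {
        if (c->ctx == ctx) {
            /* Fold any count this context still holds into the shared
             * global counter so that no reader is lost. */
            orphaned_readers += c->reader_count;
            QLIST_REMOVE(c, next);
            g_free(c);
            break;
        }
    }
    qemu_mutex_unlock(&aio_context_list_lock);
}
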
>>>
>>> Emanuele
>>>
>>> #ifndef BLOCK_LOCK_H
>>> #define BLOCK_LOCK_H
>>>
>>> #include "qemu/osdep.h"
>>>
>>> /*
>>>  * register_aiocontext:
>>>  * Add AioContext @ctx to the list of AioContexts.
>>>  * This list is used to obtain the total number of readers
>>>  * currently reading the graph.
>>>  */
>>> void register_aiocontext(AioContext *ctx);
>>>
>>> /*
>>>  * unregister_aiocontext:
>>>  * Remove AioContext @ctx from the list of AioContexts.
>>>  */
>>> void unregister_aiocontext(AioContext *ctx);
>>>
>>> /*
>>>  * bdrv_graph_wrlock:
>>>  * Modify the graph. Nobody else is allowed to access the graph.
>>>  * Set the global has_writer flag to 1, so that any new reader will
>>>  * wait in a coroutine queue until the writer is done.
>>>  * Then compute the total number of running readers (the sum of all
>>>  * per-AioContext reader counters) and wait with AIO_WAIT_WHILE until
>>>  * they have all finished.
>>>  */
>>> void bdrv_graph_wrlock(void);
>>
>> Do we need a coroutine version that yields instead of using
>> AIO_WAIT_WHILE() or are we sure this will only ever be called from
>> non-coroutine contexts?
> 
> Writes (graph modifications) are always done under the BQL in the main
> loop. Except for a unit test, I don't think a coroutine ever does that.

Additional point (1): I am also preparing a series with all the "helpful
fixes" I gathered from the other discarded/obsolete series, like
subtree_drain and similar.

So except for the job patches, you can discard all other series.

> 
>>
>>> /*
>>>  * bdrv_graph_wrunlock:
>>>  * Write finished: reset the global has_writer flag to 0 and restart
>>>  * all readers that are waiting.
>>>  */
>>> void bdrv_graph_wrunlock(void);
>>>
>>> /*
>>>  * bdrv_graph_co_rdlock:
>>>  * Read the bs graph. Increase the reader counter of the current
>>>  * AioContext; if has_writer is set, the writer is modifying the
>>>  * graph, so wait in a coroutine queue.
>>>  * The writer will then wake this coroutine once it is done.
>>>  *
>>>  * This lock cannot be taken recursively.
>>>  */
>>> void coroutine_fn bdrv_graph_co_rdlock(void);
>>
>> What prevents it from being taken recursively when it's just a counter?
>> (I do see, however, that you can't take a reader lock while you have the
>> writer lock or vice versa because it would deadlock.)
>>
> I actually didn't add the assertion to prevent it from being recursive
> yet, but I think it simplifies everything if it's not recursive.


Additional point (2): I forgot that with counters there's no easy way to
prevent recursion, so yeah, theoretically it can be recursive. Still,
better to avoid doing it intentionally.

> 
>> Does this being a coroutine_fn mean that we would have to convert QMP
>> command handlers to coroutines so that they can take the rdlock while
>> they don't expect the graph to change? Or should we have a non-coroutine
>> version, too, that works with AIO_WAIT_WHILE()?
> 
> Why convert the QMP command handlers? coroutine_fn was just to signal
> that it can also be called from coroutines, like the ones created by the
> blk_* API.
> A reader does not have to be a coroutine. AIO_WAIT_WHILE is not
> mandatory for it to finish; it helps to ensure progress in case
> some reader is waiting for something, but other than that it is not
> necessary IMO.
> 
>> Or should this only be taken for very small pieces of code directly
>> accessing the BdrvChild objects, and high-level users like QMP commands
>> shouldn't even consider themselves readers?
>>
> 
> No, I think if we focus on small pieces of code we end up with a
> million lock/unlock pairs.
> 
>>> /*
>>>  * bdrv_graph_rdunlock:
>>>  * Read terminated: decrease the reader count of the current AioContext.
>>>  * If the writer is waiting for reads to finish (has_writer == 1), signal
>>>  * the writer that we are done via aio_wait_kick() to let it continue.
>>>  */
>>> void coroutine_fn bdrv_graph_co_rdunlock(void);
>>>
>>> #endif /* BLOCK_LOCK_H */
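
To make the intended semantics of the functions above more concrete, here is
a hypothetical implementation sketch. It reuses the invented state from the
earlier sketches (total_reader_count(), aio_context_list_lock), adds a global
has_writer flag and a CoQueue, and glosses over the memory ordering and race
handling a real implementation needs; current_aio_context_counter() is an
assumed accessor for the current AioContext's counter.

#include "qemu/osdep.h"
#include "qemu/atomic.h"
#include "qemu/coroutine.h"
#include "qemu/main-loop.h"
#include "block/aio-wait.h"

static bool has_writer;
/* Readers that arrived while has_writer was set; assumed to be
 * initialized with qemu_co_queue_init() at startup. */
static CoQueue reader_queue;

/* Take the list lock and return the sum of all reader counters. */
static int reader_count(void)
{
    int n;

    qemu_mutex_lock(&aio_context_list_lock);
    n = total_reader_count();
    qemu_mutex_unlock(&aio_context_list_lock);
    return n;
}

/* Called from the main loop with the BQL held (see the discussion above). */
void bdrv_graph_wrlock(void)
{
    /* From now on, new readers block in bdrv_graph_co_rdlock(). */
    qatomic_set(&has_writer, true);

    /* Wait until every reader currently in flight has finished. */
    AIO_WAIT_WHILE(qemu_get_aio_context(), reader_count() > 0);
}

void bdrv_graph_wrunlock(void)
{
    qatomic_set(&has_writer, false);
    /* Wake up every reader that blocked while the writer was active. */
    qemu_co_queue_restart_all(&reader_queue);
}

void coroutine_fn bdrv_graph_co_rdlock(void)
{
    current_aio_context_counter()->reader_count++;

    while (qatomic_read(&has_writer)) {
        /* Give up our claim and wait until the writer wakes us up. */
        current_aio_context_counter()->reader_count--;
        qemu_co_queue_wait(&reader_queue, NULL);
        current_aio_context_counter()->reader_count++;
    }
}

void coroutine_fn bdrv_graph_co_rdunlock(void)
{
    current_aio_context_counter()->reader_count--;

    /* If a writer is blocked in AIO_WAIT_WHILE(), let it re-check. */
    if (qatomic_read(&has_writer)) {
        aio_wait_kick();
    }
}
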
>>
>> I expect that in the final version, we might want to have some sugar
>> like a WITH_BDRV_GRAPH_RDLOCK_GUARD() macro, but obviously that doesn't
>> affect the fundamental design.
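
For illustration, such sugar could look roughly like the sketch below,
modeled on the for-loop-plus-cleanup pattern behind the existing
WITH_QEMU_LOCK_GUARD(); the macro name comes from the suggestion above,
everything else is made up:

/* Drop the read lock whenever the guarded block is left, including via
 * an early break or return. */
static inline void bdrv_graph_rdunlock_cleanup(int *unused)
{
    bdrv_graph_co_rdunlock();
}

#define WITH_BDRV_GRAPH_RDLOCK_GUARD()                                      \
    for (int graph_rdlock_guard_                                            \
             __attribute__((cleanup(bdrv_graph_rdunlock_cleanup), unused))  \
             = (bdrv_graph_co_rdlock(), 0),                                 \
         graph_rdlock_once_ = 1;                                            \
         graph_rdlock_once_;                                                \
         graph_rdlock_once_ = 0)

/*
 * Used from a coroutine_fn, for example:
 *
 *     WITH_BDRV_GRAPH_RDLOCK_GUARD() {
 *         QLIST_FOREACH(child, &bs->children, next) {
 *             ...
 *         }
 *     }
 */
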
> 
> Yeah I will ping you once I get to that point ;)
> 
> Emanuele
>>
>> Kevin
>>



