From: Paolo Bonzini
Subject: Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?
Date: Wed, 29 Nov 2017 13:24:46 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

On 29/11/2017 13:00, Stefan Hajnoczi wrote:
> We are at a point where code review isn't finding certain bugs because
> no single person knows all the assumptions.  Previously the problem was
> contained because maintainers spotted problems before patches were
> merged.
>
> This is not primarily a documentation problem though.  We cannot
> document our way out of this because no single person (patch author or
> code reviewer) can know or check everything anymore due to the scale.
> I think it's a (lack of) design problem because we have many incomplete
> abstractions like block jobs, IOThreads, block graph, image locking,
> etc.  They do not cover all possible states and interactions today.
> Extending them leads to complex bugs.

I think the main interactions are:

1) block graph modifications and drain.  This has always been carnage.
Implementing BlockBackend isolation instead of drain would probably be
a starting point to fix it, because IIRC there are extremely few cases
where we really need "drain" semantics.

2) block jobs and coroutines.  Block jobs were too clever about
coroutines.  Using a simplified API is going to fix this problem.
Ideally, if you're not in a coroutine "co", the only coroutine APIs you
should use on "co" are:

- aio_co_enter/qemu_coroutine_enter (start a coroutine, respectively on
another AioContext or this context);

- aio_co_schedule/aio_co_wake (restart a coroutine that has yielded,
respectively on a given AioContext or on its own original AioContext).
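
To make the rule concrete, here is a toy model of it in plain C.  It is
only a sketch and not QEMU code: the ToyCo type and the toy_co_*
functions are hypothetical names invented for illustration, and
AioContexts are left out entirely.  The point is that an outside caller
gets exactly two operations, "enter" for a coroutine that has never run
and "wake" for one that has yielded, and anything else is refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef enum { CO_NEW, CO_RUNNING, CO_YIELDED, CO_DONE } ToyCoState;

typedef struct {
    ToyCoState state;
    int yields_left;   /* times the body still yields before it finishes */
} ToyCo;

static void toy_co_init(ToyCo *co, int yields)
{
    co->state = CO_NEW;
    co->yields_left = yields;
}

/* Run the body until its next yield or until it terminates. */
static void toy_co_run(ToyCo *co)
{
    assert(co->state == CO_NEW || co->state == CO_YIELDED);
    co->state = CO_RUNNING;
    if (co->yields_left > 0) {
        co->yields_left--;
        co->state = CO_YIELDED;
    } else {
        co->state = CO_DONE;
    }
}

/* "enter": only valid for a coroutine that has never been started. */
static bool toy_co_enter(ToyCo *co)
{
    if (co->state != CO_NEW) {
        return false;
    }
    toy_co_run(co);
    return true;
}

/* "wake": only valid for a coroutine that has yielded. */
static bool toy_co_wake(ToyCo *co)
{
    if (co->state != CO_YIELDED) {
        return false;
    }
    toy_co_run(co);
    return true;
}
```

A caller that sticks to these two entry points cannot re-enter a running
or terminated coroutine by accident; the state check rejects it instead.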

3) block jobs and drain.  This is related to (1) because drain can
terminate jobs and in turn that can cause block graph modifications.
I'm not even sure it's a separate issue.

Regarding documentation, the include file documentation is good for
coroutines and block jobs.  But it's bad for block graph modification
APIs, and even for coroutines + block jobs the docs/devel documentation
could be improved *and* it's ugly that we're not generating anything
readable from include file documentation, to go with docs/devel.


> A little progress has been made with defining higher-level APIs for
> block drivers and block jobs.  This way they either don't deal with
> low-level details of the concurrency and event loop models (e.g.
> bdrv_coroutine_enter()) or there is an interface that prompts them to
> integrate properly like bdrv_attach/detach_aio_context().
>
> Event loops and coroutines are good but they should not be used directly
> by block drivers and block jobs.  We need safe, high-level APIs that
> implement commonly-used operations.
