[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 3/7] jobs: add exit shim

From: Max Reitz
Subject: Re: [Qemu-block] [PATCH 3/7] jobs: add exit shim
Date: Sat, 25 Aug 2018 15:05:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 2018-08-22 23:52, John Snow wrote:
> On 08/22/2018 07:43 AM, Max Reitz wrote:
>> On 2018-08-17 21:04, John Snow wrote:
>>> All jobs do the same thing when they leave their running loop:
>>> - Store the return code in a structure
>>> - wait to receive this structure in the main thread
>>> - signal job completion via job_completed
>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>> logic for a net reduction in SLOC.
>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>> a function that calls job_completed, job_finalize_single will run
>>> in a context where it has recursively taken the aio_context lock,
>>> which can cause hangs if it puts down a reference that causes a flush.
>>> You can observe this in practice by looking at mirror_exit's careful
>>> placement of job_completed and bdrv_unref calls.
>>> If we centralize job exiting, we can signal job completion from outside
>>> of the aio_context, which should allow for job cleanup code to run with
>>> only one lock, which makes cleanup callbacks less tricky to write.
>>> Signed-off-by: John Snow <address@hidden>
>>> ---
>>>  include/qemu/job.h |  7 +++++++
>>>  job.c              | 19 +++++++++++++++++++
>>>  2 files changed, 26 insertions(+)
>> Currently all jobs do this, the question of course is why.  The answer
>> is because they are block jobs that need to do some graph manipulation
>> in the main thread, right?
> Yep.
>> OK, that's reasonable enough, that sounds like even non-block jobs may
>> need this (i.e. modify some global qemu state that you can only do in
>> the main loop).  Interestingly, the create job only calls
>> job_completed() of which it says nowhere that it needs to be executed in
>> the main loop.
> Yeah, not all jobs will have anything meaningful to do in the main loop
> context. This is one of them.
>> ...on second thought, do we really want to execute job_complete() in the
>> main loop?  First of all, all of the transactional functions will run in
>> the main loop.  Which makes sense, but it isn't noted anywhere.
>> Secondly, we may end up calling JobDriver.user_resume(), which is
>> probably not something we want to call in the main loop.
> I think we need to execute job_complete in the main loop, or otherwise
> restructure the code that can run between job_completed and
> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
> main thread, which is something we want to preserve.


> It's simpler just to say that complete will run from the main thread,
> like it does presently.

Yes, but we don't say that.

> Why would we not want to call user_resume from the main loop? That's
> directly where it's called from, since it gets invoked directly from the
> qmp thread.

Hmm!  True indeed.

The reason why we might not want to do it is because the job may not run
in the main loop, so modifying the job (especially invoking a job
method) may be dangerous without taking precautions.

>> OTOH, job_finish_sync() is something that has to be run in the main loop
>> because it polls the main loop (and as far as my FUSE experiments have
>> told me, polling a foreign AioContext doesn't work).
>> So...  I suppose it would be nice if we had a real distinction which
>> functions are run in which AioContext.  It seems like we indeed want to
>> run job_completed() in the main loop, but what to do about the
>> user_resume() call in job_cancel_async()?
> I don't think we need to do anything -- at least, these functions
> *already* run from the main loop.

Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
need to know which of their functions are run in which AioContext.

> mirror_exit et al get scheduled from job_defer_to_main_loop and call
> job_completed there, so it's already always done from the main loop; I'm
> just cutting out the part where the jobs have to manually schedule this.

I'm not saying what you're doing is wrong, I'm just saying tracking
which things are running in which context is not easy because there are
no comments on how it's supposed to be run.  (Apart from your new
.exit() method which does say that it's run in the main loop.)

No, I don't find it obvious which functions are run in which context
when first I have to think about in which context those functions are
used (e.g. user_resume is usually the result of a QMP command, so it's
run in the main loop; the transactional methods are part of completion,
which is done in the main loop, so they are also called in the main
loop; and so on).

But that's not part of this series.  It just occurred to me when
tracking down which function belongs to which context when reviewing
this patch.


Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]