
Re: [Qemu-block] [PATCH 3/7] jobs: add exit shim


From: John Snow
Subject: Re: [Qemu-block] [PATCH 3/7] jobs: add exit shim
Date: Mon, 27 Aug 2018 11:54:52 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0


On 08/25/2018 09:05 AM, Max Reitz wrote:
> On 2018-08-22 23:52, John Snow wrote:
>>
>>
>> On 08/22/2018 07:43 AM, Max Reitz wrote:
>>> On 2018-08-17 21:04, John Snow wrote:
>>>> All jobs do the same thing when they leave their running loop:
>>>> - Store the return code in a structure
>>>> - Wait to receive this structure in the main thread
>>>> - Signal job completion via job_completed
>>>>
>>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>>> logic for a net reduction in SLOC.
>>>>
>>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>>> a function that calls job_completed, job_finalize_single will run
>>>> in a context where it has recursively taken the aio_context lock,
>>>> which can cause hangs if it puts down a reference that causes a flush.
>>>>
>>>> You can observe this in practice by looking at mirror_exit's careful
>>>> placement of job_completed and bdrv_unref calls.
>>>>
>>>> If we centralize job exiting, we can signal job completion from outside
>>>> of the aio_context, which should allow for job cleanup code to run with
>>>> only one lock, which makes cleanup callbacks less tricky to write.
>>>>
>>>> Signed-off-by: John Snow <address@hidden>
>>>> ---
>>>>  include/qemu/job.h |  7 +++++++
>>>>  job.c              | 19 +++++++++++++++++++
>>>>  2 files changed, 26 insertions(+)
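
For context, a minimal sketch of the kind of centralized exit shim the
commit message describes. This is purely illustrative and not the code
from this patch; the job_exit()/schedule_job_exit() names, the shape of
the .exit() callback, and the exact job_completed() signature are
assumptions here:

    #include "qemu/osdep.h"
    #include "qemu/job.h"

    /*
     * Runs as a one-shot BH in the main loop, once the job's running
     * loop has returned and stored its return code in job->ret.
     */
    static void job_exit(void *opaque)
    {
        Job *job = opaque;
        AioContext *ctx = job->aio_context;

        aio_context_acquire(ctx);
        if (job->driver->exit) {
            /* Per-job cleanup, e.g. graph manipulation for block jobs */
            job->driver->exit(job);
        }
        job_completed(job, job->ret);
        aio_context_release(ctx);
    }

    /*
     * Scheduled once when the job coroutine leaves its running loop,
     * instead of each job wrapping job_defer_to_main_loop() itself.
     */
    static void schedule_job_exit(Job *job)
    {
        aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
    }
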
>>>
>>> Currently all jobs do this; the question, of course, is why.  The answer
>>> is because they are block jobs that need to do some graph manipulation
>>> in the main thread, right?
>>>
>>
>> Yep.
>>
>>> OK, that's reasonable enough; it sounds like even non-block jobs may
>>> need this (i.e. to modify some global qemu state, which you can only do
>>> in the main loop).  Interestingly, the create job only calls
>>> job_completed(), which is nowhere documented as needing to be executed
>>> in the main loop.
>>>
>>
>> Yeah, not all jobs will have anything meaningful to do in the main loop
>> context. This is one of them.
>>
>>> ...on second thought, do we really want to execute job_complete() in the
>>> main loop?  First of all, all of the transactional functions will run in
>>> the main loop, which makes sense, but it isn't noted anywhere.
>>> Secondly, we may end up calling JobDriver.user_resume(), which is
>>> probably not something we want to call in the main loop.
>>>
>>
>> I think we need to execute job_complete in the main loop, or otherwise
>> restructure the code that can run between job_completed and
>> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
>> main thread, which is something we want to preserve.
> 
> Sure.
> 
>> It's simpler just to say that complete will run from the main thread,
>> like it does presently.
> 
> Yes, but we don't say that.
> 
>> Why would we not want to call user_resume from the main loop? That's
>> where it's called from already, since it gets invoked directly from the
>> qmp thread.
> 
> Hmm!  True indeed.
> 
> The reason we might not want to do it is that the job may not run
> in the main loop, so modifying the job (especially invoking a job
> method) may be dangerous without taking precautions.
> 
>>> OTOH, job_finish_sync() is something that has to be run in the main loop
>>> because it polls the main loop (and as far as my FUSE experiments have
>>> told me, polling a foreign AioContext doesn't work).
>>>
>>> So...  I suppose it would be nice if we had a real distinction which
>>> functions are run in which AioContext.  It seems like we indeed want to
>>> run job_completed() in the main loop, but what to do about the
>>> user_resume() call in job_cancel_async()?
>>>
>>
>> I don't think we need to do anything -- at least, these functions
>> *already* run from the main loop.
> 
> Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
> need to know which of their functions are run in which AioContext.
> 
>> mirror_exit et al get scheduled from job_defer_to_main_loop and call
>> job_completed there, so it's already always done from the main loop; I'm
>> just cutting out the part where the jobs have to manually schedule this.
> 
> I'm not saying what you're doing is wrong; I'm just saying that tracking
> which things are running in which context is not easy, because there are
> no comments on how each function is supposed to be run.  (Apart from your
> new .exit() method, which does say that it's run in the main loop.)
> 
> No, I don't find it obvious which functions are run in which context
> when I first have to think about the context in which those functions
> are used (e.g. user_resume is usually the result of a QMP command, so it's
> run in the main loop; the transactional methods are part of completion,
> which is done in the main loop, so they are also called in the main
> loop; and so on).
> 
> But that's not part of this series.  It just occurred to me when
> tracking down which function belongs to which context while reviewing
> this patch.
> 
> Max
> 

Oh, I see. I can mark up the functions I/we expect to run in the main
thread with comments above the function implementations; would that help?

A top-level document would probably also help... We're overdue for one
after all the changes recently.
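
As a concrete illustration of that markup idea, something like the
following annotation style might work. The wording and the
example_job_exit() name are made up here, just to show the shape:

    /*
     * Runs in the main loop, with the job's AioContext held exactly once
     * (not recursively).  Safe for graph manipulation and for dropping
     * references that may trigger a flush.
     */
    static void example_job_exit(Job *job)
    {
        /* per-job cleanup would go here */
    }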


