qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v5 10/21] blockjobs: add NULL state


From: John Snow
Subject: Re: [Qemu-block] [PATCH v5 10/21] blockjobs: add NULL state
Date: Mon, 12 Mar 2018 12:23:23 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0


On 03/12/2018 12:07 PM, Kevin Wolf wrote:
> Am 12.03.2018 um 16:41 hat John Snow geschrieben:
>> On 03/12/2018 11:28 AM, Kevin Wolf wrote:
>>> Am 10.03.2018 um 09:27 hat John Snow geschrieben:
>>>> Add a new state that specifically demarcates when we begin to permanently
>>>> demolish a job after it has performed all work. This makes the transition
>>>> explicit in the STM table and highlights conditions under which a job may
>>>> be demolished.
>>>>
>>>> Alongside this state, add a new helper command "block_job_decommission",
>>>> which transitions to the NULL state and puts down our implicit reference.
>>>> This separates instances in the code for "block_job_unref" which merely
>>>> undo a matching "block_job_ref" with instances intended to initiate the
>>>> full destruction of the object.
>>>>
>>>> This decommission action also sets a number of fields to make sure that
>>>> block internals or external users that are holding a reference to a job
>>>> to see when it "finishes" are convinced that the job object is "done."
>>>> This is necessary, for instance, to do a block_job_cancel_sync on a
>>>> created object which will not make any progress.
>>>>
>>>> Now, all jobs must go through block_job_decommission prior to being
>>>> freed, giving us start-to-finish state machine coverage for jobs.
>>>>
>>>>
>>>> Transitions:
>>>> Created   -> Null: Early failure event before the job is started
>>>> Concluded -> Null: Standard transition.
>>>>
>>>> Verbs:
>>>> None. This should not ever be visible to the monitor.
>>>>
>>>>              +---------+
>>>>              |UNDEFINED|
>>>>              +--+------+
>>>>                 |
>>>>              +--v----+
>>>>    +---------+CREATED+------------------+
>>>>    |         +--+----+                  |
>>>>    |            |                       |
>>>>    |         +--v----+     +------+     |
>>>>    +---------+RUNNING<----->PAUSED|     |
>>>>    |         +--+-+--+     +------+     |
>>>>    |            | |                     |
>>>>    |            | +------------------+  |
>>>>    |            |                    |  |
>>>>    |         +--v--+       +-------+ |  |
>>>>    +---------+READY<------->STANDBY| |  |
>>>>    |         +--+--+       +-------+ |  |
>>>>    |            |                    |  |
>>>> +--v-----+   +--v------+             |  |
>>>> |ABORTING+--->CONCLUDED<-------------+  |
>>>> +--------+   +--+------+                |
>>>>                 |                       |
>>>>              +--v-+                     |
>>>>              |NULL<---------------------+
>>>>              +----+
>>>>
>>>> Signed-off-by: John Snow <address@hidden>
>>>
>>>> +static void block_job_decommission(BlockJob *job)
>>>> +{
>>>> +    assert(job);
>>>> +    job->completed = true;
>>>> +    job->busy = false;
>>>> +    job->paused = false;
>>>> +    job->deferred_to_main_loop = true;
>>>
>>> Why do we set all of these fields now? I don't see the use of it, and
>>> overwriting fields here potentially makes debugging harder.
>>>
>>> Especially for deferred_to_main_loop I might expect an assert() that it
>>> already is true, but shouldn't setting it always be done while actually
>>> deferring to the main loop?
>>>
>>> Can we turn all of these assignments into asserts or are there some that
>>> actually aren't already guaranteed, but that we want anyway?
>>>
>>>> +    block_job_state_transition(job, BLOCK_JOB_STATUS_NULL);
>>>> +    block_job_unref(job);
>>>> +}
>>>
>>> Kevin
>>>
>>
>> Gonna be real honest; we probably only need to set maybe one field
>> (job->completed = true) but it was late and I started hitting things
>> with big hammers.
>>
>> The problem is that if jobs do not look "done" to functions like
>> finish_sync, they will loop forever trying to make progress on a job
>> that doesn't do anything.
>>
>> I set a bunch of fields here more as a semantic statement than a
>> necessity, to be really really honest. ("Well, the job definitely has
>> these properties if it made it here, so let's update these fields to be
>> correct and the rest of the code will hopefully Do The Right Thing.")
> 
> So essentially, we want this to be assert(), but currently that breaks
> for some reasons and we can't figure out why before the freeze?
> 

Nah, I knew exactly why it broke.

> I guess that's fair enough, but then it would be good to use the freeze
> period to find the offenders and actually turn it into assertions.
> 
> Kevin
> 

I appear to be horridly confused, and you haven't seen the intermediate
mess that caused my confusion. A veritable maelstrom of confusion. Mr
Babbage would not be able to rightly comprehend, &c.

Let's give this another shot.

I added that code at a time when my local branch was not calling
block_job_completed, because I declared in v4's STM that a pre-created
job "shall not pass go, and shall not collect $200" -- that CREATED jobs
should either go to RUNNING or NULL.

The discovery here is that directly decommissioning a created job
actually breaks finish_sync because it polls on the completed boolean,
which nothing ever sets. So, under the reasoning I gave you in my last
reply;

"I'm simply setting these booleans based on the facts of the state
machine at this point: we ARE completed, we AREN'T busy, we AREN'T
paused, and we have technically now deferred back to the main loop (we
never entered it.)"

this was enough for finish_sync to get out of the way, but there were
other problems with the approach -- a CREATED job has pre-2.12 called
the abort/commit callbacks, so I went back to the 2.11 style and added
the appropriate transition into the graph. I forgot to check if this
code was still necessary at that point.

So actually, as of right now, these lines in decommission are useless;
but they shouldn't be assertions ... they might fail in a few cases. for
instance, "deferred to main loop" won't be set when we cancel a CREATED job.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]