[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use blo
From: |
John Snow |
Subject: |
Re: [Qemu-block] [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context |
Date: |
Thu, 6 Oct 2016 16:22:33 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 |
On 10/05/2016 10:02 AM, Kevin Wolf wrote:
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
There are a few places where we're fishing it out for ourselves.
Let's not do that and instead use the helper.
Signed-off-by: John Snow <address@hidden>
That change makes a difference when the block job is running its
completion part after block_job_defer_to_main_loop(). The commit message
could be more explicit about whether this is intentional or whether this
case is expected to happen at all.
I suspect that if it can happen, this is a bug fix. Please check and
update the commit message accordingly.
Because I'm bad with being concise, I wrote a TLDR at the bottom.
Otherwise, enjoy this wall of text.
Kevin
Intentional under the premise of:
(1) Acquiring the context for which a job is not actually running under
is likely incorrect (or at the very least misleading), and
(2) If using the main thread context for any would-be callers is
incorrect, this is a problem with the job lifetime that needs to be
corrected anyway.
In general, if we are acquiring the context to secure exclusive access
to the BlockJob state itself, using the getter here is perfectly safe.
If we are acquiring context for other reasons, we need to consider more
carefully.
The callers are:
(A) bdrv_drain_all (block/io)
Obtains context for the sake of pause/resume. Pauses all jobs before
draining all BDSes. For starters, pausing a job that has deferred to
main has no effect (and neither does resuming). This usage appears
slightly erroneous, though, in that if we are not running from the main
thread, we are definitely not securing exclusive rights to the block
job. We could, in theory, race on reads/writes to the pause count field.
This would be a bugfix.
(B) find_block_job (all monitor context)
Acquires context as a courtesy for its callers:
- qmp_block_job_set_speed
- qmp_block_job_cancel
- qmp_block_job_pause
- qmp_block_job_resume
- qmp_block_job_complete
In an "already deferred to main" sense... in general, if the job has
already deferred to main we don't need to acquire the block's context to
get safe access to the job, because we're already running in the main
context. Further, none of these functions actually have any meaning for
a job in such a state.
- set_speed: Sets speed parameters, harmless either way.
- cancel: Will set the cancelled boolean, reset iostatus, then attempt
to enter the job. Since a job that enters the main loop remains busy,
the enter is a NOP. The BlockBackend AIO context here is therefore
extraneous, and the getter is safe.
- pause: Only increments a counter, and will have no effect.
- resume: Decrements a counter. Attempts to enter(), but as stated
above this is a NOP.
- complete: Calls .complete(), for which the only implementation is
mirror_complete. Uh, this actually seems messy. Looks like there's
nothing to prevent us from calling this after we've already told it to
complete once. This could be a legitimate bug that this patch does
nothing in particular to address. If complete() is shored up such that
it can be called precisely once, this becomes safe.
(C) qmp_query_block_jobs (monitor context)
Just a getter. Using get_context is safe in either state.
(D) run_block_job (qemu-img)
Never called when the context is in the main loop anyway. Effectively
no change here.
So, with the exception of .complete, I think this is a safe change as it
stands... However... Paolo wants to complicate my life and get rid of
this getter for his own fiendish purposes. He suggests pushing down
context acquisition into blockjob.c directly for any QMP callers:
- qmp_block_job_set_speed -> block_job_set_speed
- qmp_block_job_cancel -> block_job_cancel
- qmp_block_job_pause -> block_job_user_pause
- qmp_block_job_resume -> block_job_user_resume
- qmp_block_job_complete -> block_job_complete
- qmp_query_block_jobs -> block_job_query
Most of these have only one caller in the QMP layer:
block_job_set_speed
block_job_user_pause
block_job_user_resume
block_job_query
These can easily just take the context they need, removing external uses
of job->blk for purposes of acquiring the context.
block_job_cancel and block_job_complete are different.
block_job_cancel is called in many places, but we can just add a similar
block_job_user_cancel if we wanted a version which takes care to acquire
context and one that does not. (Or we could just acquire the context
regardless, but Paolo warned me ominously that recursive locks are EVIL.
He sounded serious.)
block_job_complete has no direct callers outside of QMP, but it is also
used as a callback by block_job_complete_sync, used in qemu-img for
run_block_job. I can probably rewrite qemu_img to avoid this usage.
TLDR:
- This change should be perfectly safe, but Paolo wants to get rid of
this usage anyway.
- At least 5/6 uses of external context grabbing can be internalized easily.
- qemu-img's run_block_job needs to be refactored a bit, though I don't
have an idea for that yet, but as you pointed out it needs to be done
for the public/private split anyway.
- block_job_complete needs to be touched up no matter what we do.
- The aio_context getter can probably be removed entirely as per Paolo's
wishes, but I'll have to change bdrv_drain_all a bit. A
block_job_pause_all and block_job_resume all would work, though that's a
bit special purpose. I could craft up a block_job_apply_all for the
purpose instead. (e.g. block_job_apply_all(block_job_pause))
I think that answers everyone's questions...
--js