On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote:
I have the impression that one major source of headaches is the
fact that the reopen queue contains nodes that don't need to be
reopened at all. Ideally this should be detected early on in
bdrv_reopen_queue(), so there's no chance that the queue contains
nodes used by a different block job. If we had that then op
blockers should be enough to prevent these things. Or am I missing
something?
After applying Max's patch I tried the similar approach; that is
keep BDSes referenced while they are in the reopen queue. Now I get
the stream job hanging. Somehow one blk_root_drained_begin() is not
paired with blk_root_drained_end(). So the job stays paused.
I can see this if I apply Max's patch and keep refs to BDSs in the
reopen queue:
#0 block_job_pause (...) at blockjob.c:130
#1 0x000055c143cb586d in block_job_drained_begin (...) at blockjob.c:227
#2 0x000055c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887
#3 0x000055c143cb69db in block_job_create (...) at blockjob.c:678
#4 0x000055c143d17c0c in mirror_start_job (...) at block/mirror.c:1177
There's a ops->drained_begin(opaque) call in blk_set_dev_ops() that
doesn't seem to be paired.
My understanding for now is
1. in bdrv_drain_all_begin(), BDS gets drained with
bdrv_parent_drained_begin(), all the parents get
blk_root_drained_begin(), pause their jobs.
2. one of the jobs finishes and deletes BB.
3. in bdrv_drain_all_end(), bdrv_parent_drained_end() is never
called because even though BDS still exists (referenced in the
queue), it cannot be accessed as bdrv_next() takes BDS from the
global BB list (and the corresponding BB is deleted).
Yes, I was debugging this and I got to a similar conclusion. With
test_stream_commit from iotest 030 I can see that
1. the block-stream job is created first, then stream_run begins and
starts copying the data.
2. block-commit starts and tries to reopen the top image in
read-write mode. This pauses the stream block job and drains all
BDSs.
3. The draining causes the stream job to finish, it is deferred to
the main loop, then stream_complete finishes and unrefs the block
job, deleting it. At the point of deletion the pause count was
still > 0 (because of step (2))
Max's patch v1 could have helped:
http://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg01835.html
This doesn't solve the problem.
Or, perhaps another approach, keep BlockJob referenced while it is
paused (by block_job_pause/resume_all()). That should prevent it from
deleting the BB.
Yes, I tried this and it actually solves the issue. But I still think
that the problem is that block jobs are allowed to finish when they are
paused.