There are a few problems with transactional job completion right now.
First, if jobs complete so quickly that they finish before the remaining
jobs get a chance to join the transaction, the completion mode can leave
a well-known state, the QLIST can get corrupted, and the transactional
jobs can complete in batches or phases instead of all together.
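As a rough illustration of the first failure mode, here is a deliberately simplified C model. It contrasts starting each job as soon as it is created with creating every job first and starting them afterwards, which is the ordering the block_job_start / backup_job_create patch titles suggest. Everything in it (struct txn, struct job, job_create, job_start, run_racy, run_create_then_start) is hypothetical. It is not QEMU's blockjob API, and it ignores the QLIST and any real I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a job transaction; NOT QEMU's API. */
#define TXN_MAX_JOBS 8

struct txn {
    int njobs;       /* jobs that have joined the transaction so far */
    int ncompleted;  /* jobs that have finished */
    int nbatches;    /* times "every member has finished" was observed */
};

struct job {
    struct txn *txn;
    bool started;
    bool done;
};

/* Called when a job finishes; the transaction is considered complete
 * whenever every job that has joined so far has finished. */
static void job_finished(struct job *job)
{
    struct txn *txn = job->txn;

    job->done = true;
    txn->ncompleted++;
    if (txn->ncompleted == txn->njobs) {
        txn->nbatches++;
    }
}

/* Create a job and add it to the transaction without running it. */
static void job_create(struct job *job, struct txn *txn)
{
    job->txn = txn;
    job->started = false;
    job->done = false;
    txn->njobs++;
}

/* Start a job. Every job here "completes instantly", the worst case. */
static void job_start(struct job *job)
{
    assert(!job->started);
    job->started = true;
    job_finished(job);
}

/* Racy ordering: each job starts before the next one has joined. */
static int run_racy(int n)
{
    struct txn txn = {0};
    struct job jobs[TXN_MAX_JOBS];

    assert(n > 0 && n <= TXN_MAX_JOBS);
    for (int i = 0; i < n; i++) {
        job_create(&jobs[i], &txn);
        job_start(&jobs[i]);
    }
    return txn.nbatches;
}

/* Create-then-start ordering: all jobs join before any of them runs. */
static int run_create_then_start(int n)
{
    struct txn txn = {0};
    struct job jobs[TXN_MAX_JOBS];

    assert(n > 0 && n <= TXN_MAX_JOBS);
    for (int i = 0; i < n; i++) {
        job_create(&jobs[i], &txn);
    }
    for (int i = 0; i < n; i++) {
        job_start(&jobs[i]);
    }
    return txn.nbatches;
}
```

With three instantly-completing jobs, run_racy(3) sees the transaction "complete" three separate times (one batch per job), while run_create_then_start(3) completes it exactly once.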
Second, if two or more jobs defer to the main loop at roughly the same
time, it is possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, which can deadlock the entire
transaction.
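The second failure mode is a same-thread re-entrancy problem. The sketch below models it, as an assumption made purely for illustration, with a single non-recursive POSIX mutex guarding transaction cleanup; this text does not say which lock or wait is actually involved in QEMU. The names txn_lock, txn_lock_init, job_a_cleanup and job_b_cleanup are hypothetical. An error-checking mutex is used so that the re-entrant lock attempt returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700  /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Hypothetical model: one non-recursive lock guards txn cleanup. */
static pthread_mutex_t txn_lock;

static void txn_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK makes a relock by the owning thread fail with EDEADLK;
     * a default mutex would simply hang here instead. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&txn_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Cleanup for job B; returns 0 on success or an errno value. */
static int job_b_cleanup(void)
{
    int ret = pthread_mutex_lock(&txn_lock);

    if (ret != 0) {
        return ret;  /* EDEADLK: this thread already holds txn_lock */
    }
    /* ... tear down job B ... */
    pthread_mutex_unlock(&txn_lock);
    return 0;
}

/* Cleanup for job A: while holding the lock it directly calls job B's
 * cleanup on the same thread, mirroring the failure described above. */
static int job_a_cleanup(void)
{
    int ret = pthread_mutex_lock(&txn_lock);

    if (ret != 0) {
        return ret;
    }
    ret = job_b_cleanup();
    pthread_mutex_unlock(&txn_lock);
    return ret;
}
```

Calling job_b_cleanup() on its own succeeds; reaching it from inside job_a_cleanup() fails because the same thread tries to take a lock it already holds.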
Thanks to Vladimir for pointing out these modes of failure.
===
v4:
===
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
001/6:[----] [--] 'blockjob: fix dead pointer in txn list'
002/6:[----] [--] 'blockjob: add .clean property'
003/6:[----] [--] 'blockjob: add .start field'
004/6:[0021] [FC] 'blockjob: add block_job_start'
005/6:[0010] [FC] 'blockjob: refactor backup_start as backup_job_create'
006/6:[----] [--] 'iotests: add transactional failure race test'
04: Fix command tracers. (Kevin)
    Implement the ability to 'start' a 'paused' job. (Kevin, Jeff)
05: Replace superfluous conditionals with assertions. (Kevin, Jeff)
===
v3:
===
- Rebase to origin/master, requisite patches now upstream.
===
v2:
===
- Correct Vladimir's email (Sorry!)
- Add test as a variant of an existing test [Vladimir]
________________________________________________________________________________
For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch job-fix-race-condition
https://github.com/jnsnow/qemu/tree/job-fix-race-condition
This version is tagged job-fix-race-condition-v4:
https://github.com/jnsnow/qemu/releases/tag/job-fix-race-condition-v4
John Snow (5):
  blockjob: add .clean property
  blockjob: add .start field
  blockjob: add block_job_start
  blockjob: refactor backup_start as backup_job_create
  iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
  blockjob: fix dead pointer in txn list
block/backup.c | 63 +++++++++++++++++++---------------
block/commit.c | 6 ++--
block/mirror.c | 7 ++--
block/replication.c | 12 ++++---
block/stream.c | 6 ++--
block/trace-events | 6 ++--
blockdev.c | 81 ++++++++++++++++++++++++++++----------------
blockjob.c | 58 ++++++++++++++++++++++++-------
include/block/block_int.h | 23 +++++++------
include/block/blockjob.h | 9 +++++
include/block/blockjob_int.h | 11 ++++++
tests/qemu-iotests/124 | 53 +++++++++++++++++++----------
tests/qemu-iotests/124.out | 4 +--
tests/test-blockjob-txn.c | 12 +++----
14 files changed, 228 insertions(+), 123 deletions(-)