Re: [Qemu-devel] [PATCH v13 00/17] block: incremental backup series
Mon, 23 Feb 2015 17:52:45 +0000
On Fri, Feb 20, 2015 at 12:20:53PM -0500, John Snow wrote:
Eric: I've added you on CC for a libvirt perspective on the APIs that
this series introduces. We're discussing how incremental backup
failures should be handled when there are multiple drives attached to
the guest. Please see towards the bottom of this email.
> On 02/20/2015 06:09 AM, Stefan Hajnoczi wrote:
> >On Fri, Feb 13, 2015 at 05:08:41PM -0500, John Snow wrote:
> >>This series requires: [PATCH v3] blkdebug: fix "once" rule
> >>Welcome to the "incremental backup" newsletter, where we discuss
> >>exciting developments in non-redundant backup technology.
> >>This issue is called the "Max Reitz Fixup" issue.
> >>This patchset enables the in-memory part of the incremental backup
> >>feature. There are two series on the mailing list now by Vladimir
> >>Sementsov-Ogievskiy that enable the migration and persistence of
> >>dirty bitmaps.
> >>This series and most patches were originally by Fam Zheng.
> >Please add docs/incremental-backup.txt explaining how the commands are
> >intended to be used to perform backups. The QAPI documentation is
> >sparse and QMP clients would probably need to read the code to
> >understand how these commands work.
> OK. I wrote a markdown-formatted file that explains it all pretty well;
> should I check it in as-is, or should I convert it to some other format?
Thanks, that is helpful. The multi-disk use case is the most complex;
please include it.
Please submit it to docs/ as plain text.
> >I'm not sure I understand the need for all the commands: add, remove,
> >enable, disable, clear. Why are these commands necessary?
> add/remove are self-explanatory.
> clear allows you to re-sync a bitmap to a full backup after you have already
> been running for some time. See the readme for the use case. Yes, you
> *COULD* delete and re-create the bitmap with add/remove, but why?
> Clear is nice, I stand by it.
> enable/disable: Not necessarily useful for the simple case at this exact
> moment; they can be used to track writes during a period of time if desired.
> You could use them with transactions to record activity through different
> periods of time. They were originally used for what the "frozen" case covers
> now, but I left them in.
> I could stand to part with them if you think they detract more than help.
> They seemed like useful primitives. My default action will be to leave them
> in, still.
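To make the argument for 'clear' concrete, here is a toy Python model of the add/remove/clear semantics being discussed (the class and method names are invented for illustration; this is not QEMU's implementation):

```python
# Toy model of the dirty-bitmap lifecycle commands under discussion.
class DirtyBitmap:
    def __init__(self, name):
        self.name = name
        self.enabled = True
        self.bits = set()          # indices of dirty sectors

    def mark_dirty(self, sector):
        if self.enabled:
            self.bits.add(sector)

    def clear(self):
        # Re-sync against a fresh full backup: keep the bitmap object
        # (and its name) but forget all recorded writes.
        self.bits.clear()

bm = DirtyBitmap("bitmap0")        # "add"
bm.mark_dirty(5)
bm.mark_dirty(9)
assert bm.bits == {5, 9}

bm.clear()                         # equivalent to remove + add, in one step
assert bm.bits == set()            # ready to track the next backup cycle
```

The point of 'clear' over remove+add is that the bitmap (and anything holding a reference to it by name) survives the re-sync.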
> >I think just add and remove should be enough:
> >Initial full backup
> >Use transaction to add bitmaps to all drives atomically (i.e.
> >point-in-time snapshot across all drives) and launch drive-backup (full)
> >on all drives.
> >In your patch the bitmap starts disabled. In my example bitmaps are
> >always enabled unless they have a successor.
> No they don't:
> bitmap->disabled = false;
Great, I missed that. In that case enable/disable are truly optional.
They can be added in the future without needing to change the 'add' command.
All code in qemu.git requires code review, testing, and maintenance, and in
the case of QMP, a commitment to keep the command forever. Please drop
enable/disable for now.
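For readers following along, the "successor"/"frozen" scheme mentioned above can be sketched like this in Python (names are illustrative, not QEMU internals): while a backup job runs the bitmap is frozen and new writes land in a successor; on success the successor replaces it, on failure the two merge so nothing is lost.

```python
# Sketch of the frozen-bitmap / successor scheme described in this thread.
class Bitmap:
    def __init__(self):
        self.bits = set()
        self.successor = None

    @property
    def frozen(self):
        return self.successor is not None

    def freeze(self):
        self.successor = Bitmap()          # new writes go to the successor

    def record(self, sector):
        (self.successor or self).bits.add(sector)

    def backup_success(self):
        self.bits = self.successor.bits    # only writes made during backup remain
        self.successor = None

    def backup_failure(self):
        self.bits |= self.successor.bits   # merge back; no writes are lost
        self.successor = None

b = Bitmap()
b.record(1)
b.freeze()              # backup starts; the snapshot covers sector 1
assert b.frozen
b.record(2)             # guest write while the backup runs
b.backup_failure()
assert b.bits == {1, 2} # failed backup: old + new dirty bits merged
```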
> >Incremental backup
> >Use transaction to launch drive-backup (bitmap mode) on all drives.
> >If backup completes successfully we'll be able to run the same command
> >again for the next incremental backup.
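The two transactions described above might look roughly like this over QMP. This is a sketch of the payload shape only; the exact action names and parameters (e.g. the incremental sync mode and the bitmap parameter) may differ from what this series finally merges.

```python
# Build illustrative QMP 'transaction' payloads for the two backup phases.
import json

drives = ["drive-a", "drive-b"]

# Initial full backup: atomically create bitmaps and start full backups.
full = {"execute": "transaction", "arguments": {"actions":
    [{"type": "block-dirty-bitmap-add",
      "data": {"node": d, "name": "bitmap0"}} for d in drives] +
    [{"type": "drive-backup",
      "data": {"device": d, "target": "/backup/%s-full.qcow2" % d,
               "sync": "full"}} for d in drives]}}

# Later: back up only the sectors the bitmaps recorded since then.
incr = {"execute": "transaction", "arguments": {"actions":
    [{"type": "drive-backup",
      "data": {"device": d, "target": "/backup/%s-incr.qcow2" % d,
               "sync": "incremental", "bitmap": "bitmap0"}} for d in drives]}}

assert len(full["arguments"]["actions"]) == 4
print(json.dumps(incr, indent=2))
```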
> >If backup fails I see a problem when taking consistent incremental
> >backups across multiple drives:
> >Imagine 2 drives (A and B). Transaction is used to launch drive-backup
> >on both with their respective dirty bitmaps. drive-backup A fails and
> >merges successor dirty bits so nothing is lost, but what do we do with
> >drive-backup B?
> >drive-backup B could be in-progress or it could have completed before
> >drive-backup A failed. Now we're in trouble because we need to take
> >consistent snapshots across all drives but we've thrown away the dirty
> >bitmap for drive B!
> Just re-run the transaction. If one failed but one succeeded, just run it
> again. The one that succeeded prior will now have a trivial incremental
> backup, and the one that failed will have a new valid incremental backup.
> The two new incrementals will be point-in-time consistent.
> [full_a] <-- [incremental_a_0: FAILED]
> [full_b] <-- [incremental_b_0: SUCCESS]
> then re-run:
> [full_a] <------------------------ [incremental_a_1]
> [full_b] <-- [incremental_b_0] <-- [incremental_b_1]
> If the extra image in the chain for the one that succeeded is problematic,
> you can use other tools to consolidate them later.
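The retry scheme proposed above can be simulated in a few lines of Python (all names invented for the example): after a partially failed transaction, simply re-running it yields a new pair of incrementals taken at the same instant, and the drive that already succeeded just gets a trivial extra incremental.

```python
# Simulate two drives backed up via one transaction, with A failing once.
def run_transaction(bitmaps, chains, fail=()):
    """Back up each drive; a failed drive keeps its dirty bits for the retry."""
    snapshot = {d: set(bits) for d, bits in bitmaps.items()}  # same instant
    for d in bitmaps:
        if d in fail:
            chains[d].append(("FAILED", snapshot[d]))
            # dirty bits stay in bitmaps[d]: nothing is lost
        else:
            chains[d].append(("OK", snapshot[d]))
            bitmaps[d] -= snapshot[d]      # those sectors are now backed up

bitmaps = {"A": {1, 2}, "B": {3}}
chains = {"A": [], "B": []}

run_transaction(bitmaps, chains, fail={"A"})   # A fails, B succeeds
run_transaction(bitmaps, chains)               # just re-run the transaction

assert chains["A"][-1] == ("OK", {1, 2})   # A's retry captures everything
assert chains["B"][-1] == ("OK", set())    # B's extra incremental is trivial
assert len(chains["B"]) == 2               # [incremental_b_0, incremental_b_1]
```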
Each client implementation needs to duplicate the logic for partial
backups. There can be arbitrary levels of partial backups, since failure
could occur twice, three times, etc. in a row. We're forcing all clients
to deal with failure themselves or to fall back to full backup on any
failure.
The "client" may be a backup application that wants QEMU to send data
directly to an NBD server. In order for that to work, the backup
application also needs to keep state or fall back to full backup on any
failure.
In other words, making the API require a stateful client is ugly.
> Either way, how do you propose getting a consistent snapshot across multiple
> drives after a failure? The only recovery option is to create a new
> snapshot *later*; you can never go back to what was, just like you can't for
> single drives.
This is the same problem as for a single drive. There the solution is
to merge the old and new bitmaps, and the client has to take a later
snapshot like you explained.
That policy is fine and should apply to multiple drives too. Take a new
snapshot if there was a failure.
> I am not convinced there is a problem. Since we don't want filename
> management (etc.) as part of this solution inside of QEMU, there is nothing
> left to do except within libvirt.
No filenames are involved. Just don't throw away the frozen dirty
bitmap until the entire group of backup operations completes. It's
equivalent to what we do in the single drive case.
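The group-wide policy suggested here can be sketched as follows (a simplified model, not QEMU code): only discard the frozen bitmaps once every job in the group has succeeded; if any job fails, merge every drive's successor back so a single retry transaction recovers all drives consistently.

```python
# Model: per-drive frozen bitmaps are released only on whole-group success.
def finish_group(frozen, successors, results):
    """frozen/successors: dirty-sector sets per drive; results: bool per drive."""
    recovered = {}
    if all(results.values()):
        # Whole group succeeded: only writes made during backup remain dirty.
        for d in frozen:
            recovered[d] = successors[d]
    else:
        # Any failure: restore every drive's full bitmap for the retry.
        for d in frozen:
            recovered[d] = frozen[d] | successors[d]
    return recovered

frozen = {"A": {1, 2}, "B": {3}}
successors = {"A": {4}, "B": set()}

out = finish_group(frozen, successors, {"A": False, "B": True})
assert out == {"A": {1, 2, 4}, "B": {3}}   # B's bitmap survives A's failure
```

Note that B's frozen bitmap is kept even though B's own job succeeded, so the retry captures a consistent point in time across both drives.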
> >Stopping incremental backup
> >Use transaction to remove bitmaps on all drives. This case is easy.
> >Finally, what about bdrv_truncate()? When the disk size changes the
> >dirty bitmaps keep the old size. Seems likely to cause problems. Have
> >you thought about this case?
> Only just recently, when reviewing Vladimir's bitmap persistence patches.
This patch series needs to address the issue since the QMP API to create
bitmaps is exposed here.
The drive cannot be resized during the backup operation. Luckily we're
already protected against that by the op blocker API.
The question is what happens to existing bitmaps (enabled, disabled, or
frozen) when there is no backup blockjob running and the drive is
resized. I think it makes sense to resize the bitmaps along with the
drive.
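One possible resize policy for this open question could look like the sketch below: grow or shrink the bitmap with the drive, and (conservatively) mark newly added sectors dirty so they are captured by the next incremental. Whether new sectors should start dirty is an assumption here, not something this series has decided.

```python
# Sketch of one conservative dirty-bitmap resize policy.
def resize_bitmap(bits, old_size, new_size):
    if new_size < old_size:
        # Shrink: drop bits that now fall beyond the end of the drive.
        return {s for s in bits if s < new_size}
    # Grow: new sectors have never been backed up, so treat them as dirty.
    return bits | set(range(old_size, new_size))

bits = {2, 7}
assert resize_bitmap(bits, 10, 12) == {2, 7, 10, 11}
assert resize_bitmap(bits, 10, 5) == {2}
```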