[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replicat
From: |
Stefan Hajnoczi |
Subject: |
Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication |
Date: |
Thu, 28 Jan 2016 15:15:43 +0000 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote:
> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote:
> >> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
> >> +{
> >> + Error *local_err = NULL;
> >> + int ret;
> >> +
> >> + if (!s->secondary_disk->job) {
> >> + error_setg(errp, "Backup job is cancelled unexpectedly");
> >> + return;
> >> + }
> >> +
> >> + block_job_do_checkpoint(s->secondary_disk->job, &local_err);
> >> + if (local_err) {
> >> + error_propagate(errp, local_err);
> >> + return;
> >> + }
> >> +
> >> + ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
> >
> > What happens to in-flight requests to the active and hidden disks?
>
> we MUST call do_checkpoint() when the vm is stopped.
Please document the environment under which the block replication
callback functions run.
I'm concerned that the bdrv_drain_all() in vm_stop() can take a long
time if the disk is slow/failing. bdrv_drain_all() blocks until all
in-flight I/O requests have completed. What does the Primary do if the
Secondary becomes unresponsive?
> >> + switch (s->mode) {
> >> + case REPLICATION_MODE_PRIMARY:
> >> + break;
> >> + case REPLICATION_MODE_SECONDARY:
> >> + s->active_disk = bs->file->bs;
> >> + if (!bs->file->bs->backing) {
> >> + error_setg(errp, "Active disk doesn't have backing file");
> >> + return;
> >> + }
> >> +
> >> + s->hidden_disk = s->active_disk->backing->bs;
> >> + if (!s->hidden_disk->backing) {
> >> + error_setg(errp, "Hidden disk doesn't have backing file");
> >> + return;
> >> + }
> >> +
> >> + s->secondary_disk = s->hidden_disk->backing->bs;
> >> + if (!s->secondary_disk->blk) {
> >> + error_setg(errp, "The secondary disk doesn't have block
> >> backend");
> >> + return;
> >> + }
> >
> > Kevin: Is code allowed to stash away BlockDriverState pointers for
> > convenience or should it keep the BdrvChild pointers instead? In order
> > for replication to work as expected, the graph shouldn't change but for
> > consistency maybe BdrvChild is best.
I asked Kevin about this on IRC and he agreed that BdrvChild should be
used instead of holding on to BlockDriverState * pointers. Although
these pointers will not change during replication (if the op blockers
are set up correctly), it's more consistent and certainly safer to go
through BdrvChild.
> >> + /* start backup job now */
> >> + error_setg(&s->blocker,
> >> + "block device is in use by internal backup job");
> >> + bdrv_op_block_all(s->top_bs, s->blocker);
> >> + bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
> >> + bdrv_ref(s->hidden_disk);
> >
> > Why is the explicit reference to hidden_disk (but not secondary_disk or
> > active_disk) is necessary?
>
> IIRC, we should reference the backup target before calling backup_start(),
> and we will reference the backup source in backup_start().
I'm not sure why this is necessary since they are part of the backing
chain.
If it is necessary, please add a comment so it's clear why the reference
is being taken.
Stefan
signature.asc
Description: PGP signature
[Qemu-devel] [PATCH v14 2/8] Store parent BDS in BdrvChild, Changlong Xie, 2016/01/13
[Qemu-devel] [PATCH v14 8/8] support replication driver in blockdev-add, Changlong Xie, 2016/01/13
[Qemu-devel] [PATCH v14 3/8] Backup: clear all bitmap when doing block checkpoint, Changlong Xie, 2016/01/13
[Qemu-devel] [PATCH v14 4/8] Allow creating backup jobs when opening BDS, Changlong Xie, 2016/01/13