Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description

qemu-block

From:	Dr. David Alan Gilbert
Subject:	Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description
Date:	Mon, 4 Jan 2016 15:51:26 +0000
User-agent:	Mutt/1.5.24 (2015-08-30)

* Stefan Hajnoczi (address@hidden) wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> > +== Failure Handling ==
> > +There are 6 internal errors when block replication is running:
> > +1. I/O error on primary disk
> > +2. Forwarding primary write requests failed
> > +3. Backup failed
> > +4. I/O error on secondary disk
> > +5. I/O error on active disk
> > +6. Making active disk or hidden disk empty failed
> > +In cases 1 and 5, we just report the error to the disk layer. In cases 2, 3,
> > +4 and 6, we just report block replication's error to the FT/HA manager (which
> > +decides when to do a new checkpoint and when to fail over).
> > +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.
> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.

Yes, I think that's true.
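To make that concrete, here is a minimal sketch in C of the error routing
described in the patch, with the failover flush added as one more possible
error source. All names (ReplicationError, route_error, do_failover,
flush_ok, flush_eio) are hypothetical and are not QEMU's block layer API.
The sketch also assumes that a failed flush is handed back to the caller
(the FT/HA manager) and replication is left running. The patch doesn't say
what should happen in that case.

```c
#include <errno.h>

/* Hypothetical names for illustration only; this is not QEMU's API. */

/* The six internal errors listed in the patch, numbered as in the text. */
typedef enum {
    ERR_PRIMARY_DISK_IO = 1,   /* 1. I/O error on primary disk */
    ERR_FORWARD_WRITE,         /* 2. forwarding primary write requests failed */
    ERR_BACKUP,                /* 3. backup failed */
    ERR_SECONDARY_DISK_IO,     /* 4. I/O error on secondary disk */
    ERR_ACTIVE_DISK_IO,        /* 5. I/O error on active disk */
    ERR_MAKE_EMPTY,            /* 6. making active or hidden disk empty failed */
} ReplicationError;

typedef enum {
    REPORT_TO_DISK_LAYER,
    REPORT_TO_FT_MANAGER,
} ReportTarget;

/* Cases 1 and 5 go to the disk layer; cases 2, 3, 4 and 6 go to the
 * FT/HA manager, which decides when to checkpoint or fail over. */
static ReportTarget route_error(ReplicationError err)
{
    switch (err) {
    case ERR_PRIMARY_DISK_IO:
    case ERR_ACTIVE_DISK_IO:
        return REPORT_TO_DISK_LAYER;
    default:
        return REPORT_TO_FT_MANAGER;
    }
}

/* Stand-ins (hypothetical) for flushing the Disk buffer into the
 * Secondary Disk: one that succeeds and one that hits an I/O error. */
typedef int (*FlushFn)(void);
static int flush_ok(void)  { return 0; }
static int flush_eio(void) { return -EIO; }

/* Failover: flush the Disk buffer, then stop block replication.
 * The flush is ordinary I/O, so it can fail; in that case this sketch
 * leaves the replication state untouched and returns the error. */
static int do_failover(FlushFn flush, int *replication_running)
{
    int ret = flush();
    if (ret < 0) {
        return ret;
    }
    *replication_running = 0;
    return 0;
}
```

The thing to notice is the return value of do_failover(): if the flush
fails, the caller sees -EIO, much as it would for an I/O error on the new
Primary Disk right after an otherwise successful failover.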

> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>    [X]        [X]
>   Host 1 <-> Host 2 <-> Host 3

Since COLO is just doing 1+1 redundancy, I think it's not expecting to
cope with a double host failure; it's going to take some time (seconds?) to
sync Host 3 back in when you add it after a failover, and the aim is
not to disturb the application for that long, so it should
already be running on Host 2 during that resync.

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
