Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's desc
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description
Date: Mon, 4 Jan 2016 15:51:26 +0000
User-agent: Mutt/1.5.24 (2015-08-30)
* Stefan Hajnoczi (address@hidden) wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> > +== Failure Handling ==
> > +There are 6 internal errors when block replication is running:
> > +1. I/O error on primary disk
> > +2. Forwarding primary write requests failed
> > +3. Backup failed
> > +4. I/O error on secondary disk
> > +5. I/O error on active disk
> > +6. Making active disk or hidden disk empty failed
> > +In cases 1 and 5, we just report the error to the disk layer. In cases 2,
> > +3, 4 and 6, we report block replication's error to the FT/HA manager
> > +(which decides when to take a new checkpoint and when to fail over).
> > +There is no internal error when doing failover.
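The dispatch policy quoted above (cases 1 and 5 go to the disk layer, cases 2, 3, 4 and 6 go to the FT/HA manager) could be sketched as follows; the enum labels and the `dispatch` helper are hypothetical names for illustration, not part of the QEMU code:

```python
from enum import Enum

class ReplicationError(Enum):
    # Hypothetical labels for the six internal errors listed in the doc.
    PRIMARY_DISK_IO = 1       # case 1: I/O error on primary disk
    FORWARD_WRITE_FAILED = 2  # case 2: forwarding primary write requests failed
    BACKUP_FAILED = 3         # case 3: backup failed
    SECONDARY_DISK_IO = 4     # case 4: I/O error on secondary disk
    ACTIVE_DISK_IO = 5        # case 5: I/O error on active disk
    EMPTY_DISK_FAILED = 6     # case 6: making active/hidden disk empty failed

def dispatch(err: ReplicationError) -> str:
    """Return which layer the error is reported to, per the doc's policy."""
    if err in (ReplicationError.PRIMARY_DISK_IO, ReplicationError.ACTIVE_DISK_IO):
        # Cases 1 and 5 surface as ordinary I/O errors to the disk layer.
        return "disk-layer"
    # Cases 2, 3, 4 and 6 are reported to the FT/HA manager, which decides
    # when to take a new checkpoint or fail over.
    return "ft-ha-manager"
```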
>
> Not sure this is true.
>
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication". Flushing the disk
> buffer can result in I/O errors. This means that failover operations
> are not guaranteed to succeed.
>
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk. It means that
> right after failover there is another failure and the system may not be
> able to continue.
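The point Stefan makes, that the flush step inside failover can itself hit an I/O error, can be sketched as below; `failover` and `write_to_secondary` are hypothetical names, not QEMU functions, and this is only a model of the failure mode, not the implementation:

```python
def failover(disk_buffer, write_to_secondary):
    """Hypothetical failover: flush the Disk buffer into the Secondary Disk,
    then stop block replication. The flush can raise OSError, so failover is
    not guaranteed to succeed -- equivalent to a successful failover followed
    immediately by an I/O error on the new Primary Disk."""
    try:
        for offset, data in sorted(disk_buffer.items()):
            write_to_secondary(offset, data)  # may fail with an I/O error
    except OSError:
        return False  # failover failed; the system may not be able to continue
    disk_buffer.clear()  # buffer drained; replication can be stopped
    return True
```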
Yes, I think that's true.
> So this really only matters in the case where there is a new Secondary
> ready after failover. In that case the user might expect failover to
> continue to the new Secondary (Host 3):
>
> [X] [X]
> Host 1 <-> Host 2 <-> Host 3
Since COLO is just doing 1+1 redundancy, I think it's not expected to
cope with a double host failure; it's going to take some time (seconds?) to
sync Host 3 back in when you add it after a failover, and the aim would
be not to have disturbed the application for that long, so it should
already be running on Host 2 during that resync.
Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK