Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description
Mon, 4 Jan 2016 14:03:16 +0800
On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
>> +== Failure Handling ==
>> +There are 6 internal errors when block replication is running:
>> +1. I/O error on primary disk
>> +2. Forwarding primary write requests failed
>> +3. Backup failed
>> +4. I/O error on secondary disk
>> +5. I/O error on active disk
>> +6. Making active disk or hidden disk empty failed
>> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
>> +4 and 6, we just report block replication's error to FT/HA manager (which
>> +decides when to do a new checkpoint, when to do failover).
>> +There is no internal error when doing failover.
> Not sure this is true.
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication". Flushing the disk
> buffer can result in I/O errors. This means that failover operations
> are not guaranteed to succeed.
We don't use the mirror job now. We may use it in the next version.
Is there any way to detect an I/O error while the mirror job is running?
By querying the job's status?
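
One possible way to do this, as a rough sketch and not something the patch series does: QMP emits a BLOCK_JOB_ERROR event when a block job hits an I/O error, and query-block-jobs reports an "io-status" field per job. The snippet below only filters already-decoded QMP messages; the job name "mirror0" and the sample messages are made up for illustration.

```python
def job_errors(messages, job_device):
    """Collect (operation, action) pairs from BLOCK_JOB_ERROR events of one job.

    `messages` is a list of QMP messages already decoded from JSON into dicts.
    `job_device` is the job's device name (hypothetical here).
    """
    errors = []
    for msg in messages:
        # Skip command replies and unrelated events.
        if msg.get("event") != "BLOCK_JOB_ERROR":
            continue
        data = msg.get("data", {})
        if data.get("device") == job_device:
            errors.append((data.get("operation"), data.get("action")))
    return errors


# Made-up sample stream: one unrelated event, one error on the job we watch.
sample = [
    {"event": "RESUME"},
    {"event": "BLOCK_JOB_ERROR",
     "data": {"device": "mirror0", "operation": "write", "action": "report"}},
    {"event": "BLOCK_JOB_ERROR",
     "data": {"device": "other-job", "operation": "read", "action": "ignore"}},
]

mirror_errors = job_errors(sample, "mirror0")
```

If the returned list is not empty, the caller would know that flushing the disk buffer ran into an I/O error and could report it instead of assuming the failover succeeded.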
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk. It means that
> right after failover there is another failure and the system may not be
> able to continue.
Block replication is not designed for such a case. For example, we don't do
failover on the primary disk's failure. In that case, we just report the error
to the disk layer (this is case 1 in the Failure Handling section above).
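
The routing described in the Failure Handling section can be sketched as follows. This is only an illustration of the six cases and where each one is reported; the enum and function names are hypothetical and do not come from the QEMU code.

```python
from enum import Enum


class ReplicationError(Enum):
    """The six internal errors listed in the Failure Handling section."""
    PRIMARY_DISK_IO = 1        # 1. I/O error on primary disk
    FORWARD_WRITE_FAILED = 2   # 2. Forwarding primary write requests failed
    BACKUP_FAILED = 3          # 3. Backup failed
    SECONDARY_DISK_IO = 4      # 4. I/O error on secondary disk
    ACTIVE_DISK_IO = 5         # 5. I/O error on active disk
    EMPTY_DISK_FAILED = 6      # 6. Making active or hidden disk empty failed


# Cases 1 and 5 go to the disk layer; cases 2, 3, 4 and 6 go to the
# FT/HA manager, which decides about checkpoints and failover.
DISK_LAYER_CASES = {
    ReplicationError.PRIMARY_DISK_IO,
    ReplicationError.ACTIVE_DISK_IO,
}


def report_target(error):
    """Return the component that receives the report for `error`."""
    if error in DISK_LAYER_CASES:
        return "disk layer"
    return "FT/HA manager"
```

With this mapping, a primary disk failure (case 1) never reaches the FT/HA manager, so it does not trigger a failover.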
Sorry for the late reply. Your mail was sent on 2015-12-23, but I received
it on 2016-01-04.
> So this really only matters in the case where there is a new Secondary
> ready after failover. In that case the user might expect failover to
> continue to the new Secondary (Host 3):
>  [X]        [X]
> Host 1 <-> Host 2 <-> Host 3