Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description
Tue, 26 Jan 2016 13:57:19 +0000
On Mon, Jan 04, 2016 at 02:03:16PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> >> +== Failure Handling ==
> >> +There are 6 internal errors when block replication is running:
> >> +1. I/O error on primary disk
> >> +2. Forwarding primary write requests failed
> >> +3. Backup failed
> >> +4. I/O error on secondary disk
> >> +5. I/O error on active disk
> >> +6. Making active disk or hidden disk empty failed
> >> +In cases 1 and 5, we just report the error to the disk layer. In cases 2, 3,
> >> +4 and 6, we report block replication's error to the FT/HA manager (which
> >> +decides when to do a new checkpoint and when to fail over).
> >> +There is no internal error when doing failover.
> > Not sure this is true.
> > Below it says the following for failover: "We will flush the Disk buffer
> > into Secondary Disk and stop block replication". Flushing the disk
> > buffer can result in I/O errors. This means that failover operations
> > are not guaranteed to succeed.
> We don't use the mirror job now, but we may use it in the next version.
> Is there any way to detect an I/O error while the mirror job is running?
> By getting the job's status?
Block jobs have an error status which is exposed via QMP. The block job
emits a QMP event notifying the client. If the client issues
query-block-jobs it will also see the iostatus field.
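
As a rough illustration of the two paths above, the following Python sketch
inspects already-decoded QMP messages. It assumes the "io-status" member of
each query-block-jobs entry and the BLOCK_JOB_ERROR event as described in the
QMP schema. The helper names, device names and sample messages are made up
for this example, and no real QMP connection is opened.

```python
import json


def jobs_with_io_errors(reply):
    """Return device names of block jobs whose io-status is not "ok".

    `reply` is the decoded response to {"execute": "query-block-jobs"}.
    """
    return [job["device"] for job in reply.get("return", [])
            if job.get("io-status", "ok") != "ok"]


def block_job_error(msg):
    """Return (device, operation, action) for a BLOCK_JOB_ERROR event, else None."""
    if msg.get("event") != "BLOCK_JOB_ERROR":
        return None
    data = msg.get("data", {})
    return data.get("device"), data.get("operation"), data.get("action")


# Hand-written sample messages; "drive0" and "drive1" are hypothetical.
sample_reply = json.loads(
    '{"return": ['
    '{"type": "mirror", "device": "drive0", "io-status": "failed"},'
    '{"type": "backup", "device": "drive1", "io-status": "ok"}]}'
)
sample_event = json.loads(
    '{"event": "BLOCK_JOB_ERROR",'
    ' "data": {"device": "drive0", "operation": "write", "action": "report"}}'
)

print(jobs_with_io_errors(sample_reply))
print(block_job_error(sample_event))
```

Both helpers only look at messages a QMP client has already received, which
is why an in-QEMU consumer would need something different.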
I'm not aware of an internal API to monitor QMP events. It would be
possible to add one, but first: why do you want to use mirror?
> > In practice I think this is similar to a successful failover followed by
> > immediately getting I/O errors on the new Primary Disk. It means that
> > right after failover there is another failure and the system may not be
> > able to continue.
> Block replication is not designed for such a case. For example, we don't do
> failover on the primary disk's failure. In that case, we just report the error
> to the disk layer (this is case 1 in the Failure Handling section above).
> Sorry for the late reply. Your mail was sent on 2015-12-23, but I received
> it on 2016-01-04.
What is supposed to happen when flushing the Disk Buffer into the
Secondary Disk fails?
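
To make the question concrete, here is a minimal Python sketch of failover as
"flush, then stop". Everything in it is hypothetical (FlushError, failover()
and the callback parameters are not QEMU APIs), and the choice to leave
replication running after a failed flush is only an assumption made for the
example. It shows that the operation needs a failure path that some caller
has to handle. It does not say what the right handling is.

```python
class FlushError(Exception):
    """Hypothetical error raised when writing buffered data fails."""


def failover(disk_buffer, write_to_secondary, stop_replication):
    """Flush buffered (offset, data) pairs, then stop replication.

    Returns None on success, or the FlushError that interrupted the flush.
    In this sketch replication is stopped only after a complete flush.
    """
    for offset, data in disk_buffer:
        try:
            write_to_secondary(offset, data)
        except FlushError as err:
            return err  # failover did not complete; the caller must decide
    stop_replication()
    return None
```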