From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH RESEND 1/2] Block: Block replication design for COLO
Date: Wed, 25 Mar 2015 10:06:53 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
On 12/25/2014 08:31 PM, Yang Hongyang wrote:
> This is the initial design of block replication.
> The blkcolo block driver enables disk replication for continuous
> checkpoints. It is designed for COLO that Secondary VM is running.
> It can also be applied for FT/HA scene that Secondary VM is not
> running.
>
> Signed-off-by: Wen Congyang <address@hidden>
> Signed-off-by: Lai Jiangshan <address@hidden>
> Signed-off-by: Yang Hongyang <address@hidden>
> ---
> docs/blkcolo.txt | 85
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 85 insertions(+)
> create mode 100644 docs/blkcolo.txt
Grammar review only (I'll leave the technical review to others).
>
> diff --git a/docs/blkcolo.txt b/docs/blkcolo.txt
> new file mode 100644
> index 0000000..41c2a05
> --- /dev/null
> +++ b/docs/blkcolo.txt
> @@ -0,0 +1,85 @@
> +Disk replication using blkcolo
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2014
Visually, the separator line should match the length of the line above,
and maybe have a blank line after.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The blkcolo block driver enables disk replication for continuous checkpoints.
> +It is designed for COLO that Secondary VM is running. It can also be applied
Similar comments as on docs/block-replication.txt in Wen's RFC COLO v2
series (in fact, do we need two files, or should all this information be
merged into a single file?):
s/for COLO that/for COLO (COarse-grain LOck-stepping replication), where/
> +for FT/HA scene that Secondary VM is not running.
s/for FT/HA scene that/to FT/HA (Fault-tolerance/High availability)
scenarios, where/
> +
> +This document gives an overview of blkcolo's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
s/checkpoint/checkpoints/
> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of
> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Disk Buffer ==
> +The following is the image of Disk buffer:
> +
> + +----------------------+ +------------------------+
> + |Primary Write Requests| |Secondary Write Requests|
> + +----------------------+ +------------------------+
> + | |
> + | (4)
> + | V
> + | /-------------\
> + | Copy and Forward | |
> + |---------(1)----------+ | Disk Buffer |
> + | | | |
> + | (3) \-------------/
> + | speculative ^
> + | write through (2)
> + | | |
> + V V |
> + +--------------+ +----------------+
> + | Primary Disk | | Secondary Disk |
> + +--------------+ +----------------+
> + 1) Primary write requests will be copied and forwarded to Secondary
> + QEMU.
> + 2) Before Primary write requests are written to Secondary disk, the
> + original sector content will be read from Secondary disk and
> + buffered in the Disk buffer, but it will not overwrite the existing
> + sector content in the Disk buffer.
> + 3) Primary write requests will be written to Secondary disk.
> + 4) Secondary write requests will be bufferd in the Disk buffer and it
s/bufferd/buffered/
> + will overwrite the existing sector content in the buffer.
> +
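For readers following along, steps 2) through 4) above can be modeled with
a small toy in C. This is only a sketch under stated assumptions, not the
blkcolo implementation: the names ToyDisk, ToyBuffer, primary_write() and
secondary_write() are hypothetical, the disk is a tiny in-memory array, and
the secondary_read() helper (buffer first, then disk) is my assumption
about how the Secondary VM would see its own view.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_SECTORS  8   /* toy disk size (assumption) */
#define SECTOR_SIZE 4   /* toy sector size in bytes (assumption) */

/* Hypothetical in-memory stand-in for the Secondary disk. */
typedef struct ToyDisk {
    uint8_t data[NB_SECTORS][SECTOR_SIZE];
} ToyDisk;

/* Hypothetical stand-in for the Disk buffer. */
typedef struct ToyBuffer {
    bool present[NB_SECTORS];
    uint8_t data[NB_SECTORS][SECTOR_SIZE];
} ToyBuffer;

/*
 * Steps 2 and 3: save the original Secondary sector in the buffer unless
 * that sector is already buffered, then write the Primary data through.
 */
static void primary_write(ToyDisk *sec, ToyBuffer *buf, int sector,
                          const uint8_t *src)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    if (!buf->present[sector]) {
        memcpy(buf->data[sector], sec->data[sector], SECTOR_SIZE);
        buf->present[sector] = true;
    }
    memcpy(sec->data[sector], src, SECTOR_SIZE);
}

/* Step 4: a Secondary write always overwrites the buffered sector. */
static void secondary_write(ToyBuffer *buf, int sector, const uint8_t *src)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    memcpy(buf->data[sector], src, SECTOR_SIZE);
    buf->present[sector] = true;
}

/* Assumed read path for the Secondary VM: buffer first, then disk. */
static const uint8_t *secondary_read(const ToyDisk *sec, const ToyBuffer *buf,
                                     int sector)
{
    assert(sector >= 0 && sector < NB_SECTORS);
    return buf->present[sector] ? buf->data[sector] : sec->data[sector];
}
```

The asymmetry is the point: a forwarded Primary write never replaces what
is already in the buffer, while a Secondary write always does.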
> +== Capture I/O request ==
> +The blkcolo is a new block driver protocol, so all I/O requests can be
> +captured in the driver interface bdrv_co_readv()/bdrv_co_writev().
> +
> +== Checkpoint & failover ==
> +The blkcolo buffers the write requests in Secondary QEMU. And the buffer
> +should be dropped at a checkpoint, or be flushed to Secondary disk when
s/when/on/
> +failover. We add four block driver interfaces to do this:
> +a. bdrv_prepare_checkpoint()
> + This interface may block, and return when all Primary write
s/return/returns/
> + requests are forwarded to Secondary QEMU.
> +b. bdrv_do_checkpoint()
> + This interface is called after all VM state is transfered to
s/transfered/transferred/
> + Secondary QEMU. The Disk buffer will be dropped in this interface.
> +c. bdrv_get_sent_data_size()
> + This is used on Primary node.
> + It should be called by migration/checkpoint thread in order
> + to decide whether to start a new checkpoint or not. If the data
> + amount being sent is too large, we should start a new checkpoint.
> +d. bdrv_stop_replication()
> + It is called when failover. We will flush the Disk buffer into
s/when/on/
> + Secondary Disk and stop disk replication.
> +
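To make the difference between "drop" and "flush" concrete, here is a toy
model in C of what b. and d. above do to the buffer. It is a sketch only:
ToyReplica, toy_do_checkpoint() and toy_stop_replication() are hypothetical
names, and the patch does not give the real signatures of
bdrv_do_checkpoint() or bdrv_stop_replication(), so nothing here should be
read as the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTORS     8   /* toy disk size (assumption) */
#define TOY_SECTOR_SIZE 4   /* toy sector size in bytes (assumption) */

/* Hypothetical Secondary-side state: disk, Disk buffer, and a flag. */
typedef struct ToyReplica {
    uint8_t disk[TOY_SECTORS][TOY_SECTOR_SIZE];
    uint8_t buf[TOY_SECTORS][TOY_SECTOR_SIZE];
    bool buffered[TOY_SECTORS];
    bool replicating;
} ToyReplica;

/* Checkpoint: the buffered sectors are simply forgotten. */
static void toy_do_checkpoint(ToyReplica *r)
{
    assert(r != NULL);
    memset(r->buffered, 0, sizeof(r->buffered));
}

/* Failover: buffered sectors are written to the disk, then we stop. */
static void toy_stop_replication(ToyReplica *r)
{
    assert(r != NULL);
    for (int i = 0; i < TOY_SECTORS; i++) {
        if (r->buffered[i]) {
            memcpy(r->disk[i], r->buf[i], TOY_SECTOR_SIZE);
            r->buffered[i] = false;
        }
    }
    r->replicating = false;
}
```

After a checkpoint the Secondary disk holds only what the Primary wrote;
after a failover it holds the Secondary VM's own view.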
> +== Usage ==
> +On both Primary/Secondary host, invoke QEMU with the following parameters:
> + "-drive file=blkcolo:host:port:/path/to/image"
> +a. host
> + Hostname or IP of the Secondary host.
> +b. port
> + The Secondary QEMU will listen on this port, and the Primary QEMU
> + will connect to this port.
>
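As an aside on the Usage section, the "blkcolo:host:port:/path/to/image"
form could be split along the lines of the C sketch below. This is purely
illustrative: ToyColoSpec and toy_parse_blkcolo() are hypothetical names,
not code from the patch, and the sketch assumes the host contains no ':'
(so a bare IPv6 literal would not parse).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of splitting "blkcolo:host:port:/path". */
typedef struct ToyColoSpec {
    char host[256];
    int port;
    char path[1024];
} ToyColoSpec;

/* Returns 0 on success, -1 if the string is malformed. */
static int toy_parse_blkcolo(const char *filename, ToyColoSpec *out)
{
    static const char prefix[] = "blkcolo:";
    const char *host, *port, *path;
    char *end;
    long val;
    size_t host_len;

    assert(filename != NULL && out != NULL);

    if (strncmp(filename, prefix, sizeof(prefix) - 1) != 0) {
        return -1;
    }
    host = filename + sizeof(prefix) - 1;

    /* The host runs up to the next ':' (assumes no ':' in the host). */
    port = strchr(host, ':');
    if (!port) {
        return -1;
    }
    host_len = (size_t)(port - host);
    if (host_len == 0 || host_len >= sizeof(out->host)) {
        return -1;
    }
    port++;

    /* The port must be a decimal number followed by ':'. */
    val = strtol(port, &end, 10);
    if (end == port || *end != ':' || val < 1 || val > 65535) {
        return -1;
    }

    /* Everything after that ':' is the image path. */
    path = end + 1;
    if (*path == '\0' || strlen(path) >= sizeof(out->path)) {
        return -1;
    }

    memcpy(out->host, host, host_len);
    out->host[host_len] = '\0';
    out->port = (int)val;
    strcpy(out->path, path);
    return 0;
}
```

A colon-separated string like this gets awkward for IPv6 addresses and for
image paths that themselves contain ':', which may be worth considering
when settling the syntax.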
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org