From: Eric Blake
Subject: Re: [Qemu-block] [PATCH v6 2/2] live-block-ops.txt: Rename, rewrite, and improve it
Date: Tue, 11 Jul 2017 10:03:29 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/10/2017 03:15 AM, Kashyap Chamarthy wrote:
> This patch documents (including their QMP invocations) all the four
> major kinds of live block operations:
> 
>   - `block-stream`
>   - `block-commit`
>   - `drive-mirror` (& `blockdev-mirror`)
>   - `drive-backup` (& `blockdev-backup`)
> 
> Things considered while writing this document:
> 
>   - Use reStructuredText as markup language (with the goal of generating
>     the HTML output using the Sphinx Documentation Generator).  It is
>     gentler on the eye, and can be trivially converted to different
>     formats.  (Another reason: upstream QEMU is considering to switch to
>     Sphinx, which uses reStructuredText as its markup language.)
> 
>   - Raw QMP JSON output vs. 'qmp-shell'.  I debated with myself whether
>     to only show raw QMP JSON output (as that is the canonical
>     representation), or use 'qmp-shell', which takes key-value pairs.  I
>     settled on the approach of: for the first occurence of a command,

s/occurence/occurrence/

>     use raw JSON; for subsequent occurences, use 'qmp-shell', with an

and again

>     occasional exception.
> 
>   - Usage of `-blockdev` command-line.
> 
>   - Usage of 'node-name' vs. file path to refer to disks.  While we have
>     `blockdev-{mirror, backup}` as 'node-name'-alternatives for
>     `drive-{mirror, backup}`, the `block-commit` command still operate

s/operate/operates/

>     on file names for parameters 'base' and 'top'.  So I added a caveat
>     at the beginning to that effect.
> 
>     Refer this related thread that I started (where I learnt
>     `block-stream` was recently reworked to accept 'node-name' for 'top'
>     and 'base' parameters):
>     https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
>     "[RFC] Making 'block-stream', and 'block-commit' accept node-name"
> 
> All commands showed in this document were tested while documenting.
> 
> Thanks: Eric Blake for the section: "A note on points-in-time vs file
> names".  This useful bit was originally articulated by Eric in his
> KVMForum 2015 presentation, so I included that specific bit in this
> document.
> 
> Signed-off-by: Kashyap Chamarthy <address@hidden>
> ---

> 
> diff --git a/docs/interop/live-block-operations.rst b/docs/interop/live-block-operations.rst
> new file mode 100644
> index 0000000..6580f85
> --- /dev/null
> +++ b/docs/interop/live-block-operations.rst
> @@ -0,0 +1,1088 @@
> +..
> +    Copyright (C) 2017 Red Hat Inc.
> +
> +    This work is licensed under the terms of the GNU GPL, version 2 or
> +    later.  See the COPYING file in the top-level directory.

Does this paragraph get rendered in such a way that someone reading the
generated HTML will wonder where the top-level directory lives?  I'm not
sure whether it should be a comment local to this file, or whether the
final rendered text should mention the license.  Hmm, reading further, it
looks like the '..' followed by indentation serves as a multi-line
comment that does not appear in the rendering, so I have no recommended
change.

> +Disk image backing chain notation
> +---------------------------------
> +
> +A simple disk image chain.  (This can be created live using QMP
> +``blockdev-snapshot-sync``, or offline via ``qemu-img``)::

Do we want to go into details about the command-line arguments to
qemu-img used for offline creation/manipulation of an image in a chain?
I guess it's okay to not worry about it; your focus here is QMP commands
(what can we do while qemu is running) rather than offline commands.
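
For readers who do want the offline route, here is a minimal sketch (not
part of the patch) of how a chain like the one in this document could be
assembled with qemu-img.  It only builds the argument lists and runs
nothing; the helper name overlay_commands and the file names are
hypothetical, and the base image is assumed to exist already.

```python
def overlay_commands(base, overlays, fmt="qcow2"):
    """Return one qemu-img argv list per overlay, each backed by the previous image."""
    commands = []
    backing = base
    for overlay in overlays:
        # -f: format of the new overlay, -b: backing file, -F: backing file format
        commands.append(
            ["qemu-img", "create", "-f", fmt, "-b", backing, "-F", fmt, overlay]
        )
        backing = overlay  # the next overlay is stacked on top of this one
    return commands


if __name__ == "__main__":
    for argv in overlay_commands("a.qcow2", ["b.qcow2", "c.qcow2", "d.qcow2"]):
        print(" ".join(argv))
```

Each generated command stacks one more overlay on the previous image,
which is the offline counterpart of repeated blockdev-snapshot-sync
calls.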

> +
> +Brief overview of live block QMP primitives
> +-------------------------------------------
> +
> +The following are the four different kinds of live block operations that
> +QEMU block layer supports.
> +
> +(1) ``block-stream``: Live copy of data from backing files into overlay
> +    files.
> +
> +    .. note:: Once the 'stream' operation has finished, three things to
> +              note:
> +
> +                (a) QEMU rewrites the backing chain to remove
> +                    reference to the now-streamed and redundant backing
> +                    file;
> +
> +                (b) the streamed file *itself* won't be removed by QEMU,
> +                    and must be explicitly discarded by the user;
> +
> +                (c) the streamed file remains valid -- i.e. further
> +                    overlays can be created based on it.  Refer the
> +                    ``block-stream`` section further below for more
> +                    details.
> +
> +(2) ``block-commit``: Live merge of data from overlay files into backing
> +    files (with the optional goal of removing the overlay file from the
> +    chain).  Since QEMU 2.0, this includes "active ``block-commit``"
> +    (i.e. merge the current active layer into the base image).
> +
> +    .. note:: Once the 'commit' operation has finished, there are three
> +              things to note here as well:
> +
> +                (a) QEMU rewrites the backing chain to remove reference
> +                    to now-redundant overlay images that have been
> +                    commited into a backing file;

s/commited/committed/ (several places in the document, I'll just point
it out here)

> +
> +                (b) the commited file *itself* won't be removed by QEMU
> +                    -- it ought to be manually removed;
> +
> +                (c) however, unlike in the case of ``block-stream``, the
> +                    intermediate images will be rendered invalid -- i.e.
> +                    no more further overlays can be created based on
> +                    them.  Refer the ``block-commit`` section further
> +                    below for more details.
> +
> +(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize running disk

s/running/a running/

> +    to another image.
> +
> +(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
> +    of a block device to a destination.
> +
> +
> +.. _`Interacting with a QEMU instance`:
> +
> +Interacting with a QEMU instance
> +--------------------------------
> +
> +To show some example invocations of command-line, we will use the
> +following invocation of QEMU, with a QMP server running over UNIX
> +socket::
> +
> +    $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
> +        -M q35 -nodefaults -m 512 \
> +        -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \
> +        -device virtio-blk,drive=node-A,id=virtio0 \
> +        -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait
> +
> +The ``-blockdev`` command-line option, used above, is available from
> +QEMU 2.9 onwards.  In the above invocation, notice the ``node-name``
> +parameter that is used to refer to the disk image a.qcow2 ('node-A') --
> +this is a cleaner way to refer to a disk image (as opposed to referring
> +to it by spelling out file paths).  So, we will continue to designate a
> +``node-name`` to each further disk image created (either via
> +``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
> +image chain, and continue to refer to the disks using their
> +``node-name`` (where possible, because ``block-commit`` does not yet, as
> +of QEMU 2.9, accept ``node-name`` parameter) when performing various
> +block operations.
> +
> +To interact with the QEMU instance launched above, we will use the
> +``qmp-shell`` (located at: ``qemu/scripts/qmp``, as part of the QEMU
> +source directory) utility, which takes key-value pairs for QMP commands.

s/qmp-shell (...) utility/qmp-shell utility (...)/

> +Invoke it as below (which will also print out the complete raw JSON
> +syntax for reference -- examples in the following sections)::
> +
> +    $ ./qmp-shell -v -p /tmp/qmp-sock
> +    (QEMU)
> +
> +.. note::
> +    In the event we have to repeat a certain QMP command, we will: for
> +    the first occurrence of it, show the ``qmp-shell`` invocation, *and*
> +    the corresponding raw JSON QMP syntax; but for subsequent
> +    invocations, present just the ``qmp-shell`` syntax, and omit the
> +    equivalent JSON output.
> +
> +
> +Example disk image chain
> +------------------------
> +
> +We will use the below disk image chain (and occasionally spelling it
> +out where appropriate) when discussing various primitives::
> +
> +    [A] <-- [B] <-- [C] <-- [D]
> +
> +Where [A] is the original base image; [B] and [C] are intermediate
> +overlay images; image [D] is the active layer -- i.e. live QEMU is
> +writing to it.  (The rule of thumb is: live QEMU will always be pointing
> +to the rightmost image in a disk image chain.)
> +
> +The above image chain can be created by invoking
> +``blockdev-snapshot-sync`` commands as following (which shows the
> +creation of overlay image [B]) using the ``qmp-shell`` (our invocation
> +also prints the raw JSON invocation of it)::
> +
> +    (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
> +    {
> +        "execute": "blockdev-snapshot-sync",
> +        "arguments": {
> +            "node-name": "node-A",
> +            "snapshot-file": "b.qcow2",
> +            "format": "qcow2",
> +            "snapshot-node-name": "node-B"
> +        }
> +    }
> +
> +Here, "node-A" is the name QEMU internally uses to refer to the base
> +image [A] -- it is the backing file, based on which the overlay image,
> +[B], is created.
> +
> +To create the rest of the overlay images, [C], and [D] (omitted the raw

s/omitted/omitting/

> +JSON output for brevity)::
> +
> +    (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
> +    (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
> +
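
As a rough illustration of how the key-value form corresponds to the raw
JSON, here is a simplified sketch.  It is not the actual qmp-shell
parser, which handles more than this; the function name to_qmp is
hypothetical, and every value is treated as a plain string.

```python
import json


def to_qmp(line):
    """Convert 'command key=value ...' into a QMP command dict (string values only)."""
    command, *pairs = line.split()
    arguments = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='
        key, _, value = pair.partition("=")
        arguments[key] = value
    return {"execute": command, "arguments": arguments}


if __name__ == "__main__":
    cmd = to_qmp(
        "blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 "
        "snapshot-node-name=node-C format=qcow2"
    )
    print(json.dumps(cmd, indent=4))
```

A command without arguments, such as cont, maps to an empty "arguments"
object in this sketch.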

> +QMP invocation for ``block-commit``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
> +image [B] into image [A], the invocation is as following::

s/following/follows/

> +
> +    (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
> +    {
> +        "execute": "block-commit",
> +        "arguments": {
> +            "device": "node-D",
> +            "job-id": "job0",
> +            "top": "b.qcow2",
> +            "base": "a.qcow2"
> +        }
> +    }
> +
> +Once the above ``block-commit`` operation has completed, a
> +``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
> +required.  The end result being, the backing file of image [C] is

This reads awkwardly to me, though I'm unsure of the best fix.  Perhaps:

s/The end result being,/As the end result,/

> +adjusted to point to image [A], and the original 4-image chain will end
> +up being transformed to::
> +
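
To make the chain rewrite concrete, here is a toy model (plain Python
lists, not QEMU code) of what committing a range of overlays into a base
does to the chain.  The function name commit and the list representation
are assumptions made purely for illustration.

```python
def commit(chain, top, base):
    """Return the chain after merging everything above `base`, up to `top`, into `base`.

    `chain` is ordered from the base image to the active layer.  Only the
    chain is modelled here; nothing is said about the files on disk.
    """
    base_idx = chain.index(base)
    top_idx = chain.index(top)
    if top_idx <= base_idx:
        raise ValueError("top must be above base in the chain")
    # Keep everything up to and including base, then everything above top
    return chain[: base_idx + 1] + chain[top_idx + 1 :]


if __name__ == "__main__":
    print(commit(["A", "B", "C", "D"], top="B", base="A"))
```

With top=B and base=A, image [C] ends up directly on top of [A], which
matches the description of the backing file of [C] being adjusted.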

> +
> +Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
> +----------------------------------------------------------------------
> +
> +Synchronize a running disk image chain (all or part of it) to a target
> +image.
> +
> +Again, given our familiar disk image chain::
> +
> +    [A] <-- [B] <-- [C] <-- [D]
> +
> +The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) allows
> +you to copy data from the entire chain into a single target image (which
> +can be located on a different host).
> +
> +Once a 'mirror' job has started, there are two possible actions when a

maybe s/when/while/

> +``drive-mirror`` job is active:
> +
> +(1) Issuing the command ``block-job-cancel`` after it emits the event
> +    ``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
> +    the content from the disk image chain to the target image, [E])
> +    create a point-in-time (which is at the time of *triggering* the
> +    cancel command) copy, contained in image [E], of the the entire disk
> +    image chain (or only the top-most image, depending on the ``sync``
> +    mode).
> +
> +(2) Issuing the command ``block-job-complete`` after it emits the event
> +    ``BLOCK_JOB_COMPLETED``: will, after completing synchronization of
> +    the content, adjust the guest device (i.e. live QEMU) to point to
> +    the target image, and, causing all the new writes from this point on
> +    to happen there.  One use case for this is live storage migration.
> +
> +About synchronization modes: The synchronization mode determines
> +*which* part of the disk image chain will be copied to the target.
> +Currently, there are four different kinds:
> +
> +(1) ``full`` -- Synchronize the content of entire disk image chain to
> +    the target
> +
> +(2) ``top`` -- Synchronize only the contents of the top-most disk image
> +    in the chain to the target
> +
> +(3) ``none`` -- Synchronize only the new writes from this point on.
> +
> +    .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
> +              the behavior of ``none`` sychronization mode is different.

s/sychronization/synchronization/

> +              Normally, a ``backup`` job consists of two parts: Anything
> +              that is overwritten by the guest is first copied out to
> +              the backup, and in the background the whole image is
> +              copied from start to end. With ``sync=none``, it's only
> +              the first part.
> +
> +(4) ``incremental`` -- Synchronize content that is described by the
> +    dirty bitmap
> +
> +.. note::
> +    Refer to the :doc:`bitmaps` document in the QEMU source
> +    tree to learn about the detailed workings of the ``incremental``
> +    synchronization mode.
> +
> +
> +QMP invocation for ``drive-mirror``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +

> +.. important::
> +    The destination host must already have the contents of the backing
> +    chain, involving images [A], [B], and [C], visible via other means
> +    -- whether by ``cp``, ``rsync``, or by some storage array-specific
> +    command.)
> +
> +Sometimes, this is also referred to as "shallow copy" -- because: only

s/because:/because/

> +the "active layer", and not the rest of the image chain, is copied to
> +the destination.
> +
> +.. note::
> +    In this example, for the sake of simplicity, we'll be using the same
> +    ``localhost`` as both, source and destination.

s/both,/both/

> +
> +As noted earlier, on the destination host the contents of the backing
> +chain -- from images [A] to [C] -- are already expected to exist in some
> +form (e.g. in a file called, ``Contents-of-A-B-C.qcow2``).  Now, on the
> +destination host, let's create a target overlay image (with the image
> +``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
> +of image [D] (from the source QEMU) will be mirrored to::
> +
> +    $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
> +        -F qcow2 ./target-disk.qcow2

Ah, so you DO have one example of an offline use of qemu-img for
manipulating backing chain relationships.

> +
> +And start the destination QEMU (we already have the source QEMU running
> +-- discussed in the section: `Interacting with a QEMU instance`_)
> +instance, with the following invocation.  (As noted earlier, for
> +simplicity's sake, the destination QEMU is started on the same host, but
> +it could be located elsewhere)::

libvirt doesn't allow migration to localhost - but that doesn't affect
your example...

> +(6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the
> +    QMP command `cont`::
> +
> +        (QEMU) cont
> +        {
> +            "execute": "cont",
> +            "arguments": {}
> +        }
> +
> +
> +.. note::
> +    Higher-level libraries (e.g. libvirt) automate the entire above
> +    process.

...other than this note.  Maybe:

s/process./process (although note that libvirt does not allow migrations to the same host, for other reasons)./

Overall, looking good!  Content-wise, I think we have a good document.
I found only a few spelling errors and grammar suggestions, minor
enough that I'm comfortable with you adding:
Reviewed-by: Eric Blake <address@hidden>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
