From: Denis V. Lunev
Subject: Re: [Qemu-devel] [Nbd] [PATCH v2] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Mon, 4 Apr 2016 22:54:02 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 04/04/2016 10:34 PM, Eric Blake wrote:
On 04/04/2016 12:06 PM, Alex Bligh wrote:
On 4 Apr 2016, at 17:39, Eric Blake <address@hidden> wrote:

+    This command is meant to operate in tandem with other (non-NBD)
+    channels to the server.  Generally, a "dirty" block is a block
+    that has been written to by someone, but the exact meaning of "has
+    been written" is left to the implementation.  For example, a
+    virtual machine monitor could provide a (non-NBD) command to start
+    tracking blocks written by the virtual machine.  A backup client
+    can then connect to an NBD server provided by the virtual machine
+    monitor and use `NBD_CMD_BLOCK_STATUS` with the
+    `NBD_FLAG_STATUS_DIRTY` bit set in order to read only the dirty
+    blocks that the virtual machine has changed.
+    An implementation that doesn't track the "dirtiness" state of
+    blocks MUST either fail this command with `EINVAL`, or mark all
+    blocks as dirty in the descriptor that it returns.  Upon receiving
+    an `NBD_CMD_BLOCK_STATUS` command with the flag
+    `NBD_FLAG_STATUS_DIRTY` set, the server MUST return the dirtiness
+    status of the device, where the status field of each descriptor is
+    determined by the following bit:
+      - `NBD_STATE_CLEAN` (bit 2); if set, the block represents a
+        portion of the file that is still clean because it has not
+        been written; if clear, the block represents a portion of the
+        file that is dirty, or where the server could not otherwise
+        determine its status.
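To make the bit test in the proposed text concrete, here is a minimal Python sketch of how a client might interpret the status field of each descriptor. It assumes the descriptors have already been parsed into (length, status) integer pairs. The helper `is_clean` and the sample values are hypothetical, and bits other than `NBD_STATE_CLEAN` are simply ignored.

```python
# NBD_STATE_CLEAN is bit 2 of the status field, per the proposed text.
NBD_STATE_CLEAN = 1 << 2


def is_clean(status: int) -> bool:
    """Return True if a descriptor's status marks its range as clean.

    A clear bit means the range is dirty OR that the server could not
    determine its status; a client has to treat both cases as dirty.
    """
    return bool(status & NBD_STATE_CLEAN)


# Hypothetical parsed descriptors: (length in bytes, status field).
# The third one has an unrelated low bit set, which is ignored here.
descriptors = [(4096, NBD_STATE_CLEAN), (8192, 0), (4096, NBD_STATE_CLEAN | 1)]
for length, status in descriptors:
    print(length, "clean" if is_clean(status) else "dirty")
```

Because a clear bit covers both "dirty" and "unknown", a client never treats a range as clean unless the server explicitly said so.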
A couple of questions:

1. I am not sure that the block dirtiness and the zero/allocation/hole thing
    always have the same natural blocksize. It's pretty easy to imagine
    a server whose natural blocksize is a disk sector (and can therefore
    report presence of zeroes to that resolution) but where 'dirtiness'
    was maintained independently at a less fine-grained level. Maybe
    that suggests 2 commands would be useful.
In fact, qemu does just that with qcow2 images - the user can request a
dirtiness granularity that is much larger than cluster granularity
(where clusters are the current limitation on reporting holes, but where
Kevin Wolf has an idea about a potential qcow2 extension that would even
let us report holes at a sector granularity).

Nothing requires the two uses to report at the same granularity.  The
NBD_REPLY_TYPE_BLOCK_STATUS allows the server to divide into descriptors
as it sees fit (so it could report holes at a 4k granularity, but
dirtiness only at a 64k granularity) - all that matters is that when all
the descriptors have been sent, they total up to the length of the
original client request.  So by itself, granularity does not require
another command.
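The only constraint described here is that the descriptors total the requested length; how the range is split is up to the server. A small Python sketch of that check, assuming descriptors are (length, status) pairs. The helper name `covers_request` is hypothetical, and the rule that descriptors must be non-empty is an extra assumption of this sketch, not something stated above.

```python
from typing import List, Tuple

Descriptor = Tuple[int, int]  # (length in bytes, status field)
KiB = 1024


def covers_request(descriptors: List[Descriptor], request_length: int) -> bool:
    """Check the invariant on a reply: every descriptor is non-empty
    (an assumption of this sketch) and, once all descriptors have been
    sent, their lengths add up to the length the client asked about."""
    return (all(length > 0 for length, _ in descriptors)
            and sum(length for length, _ in descriptors) == request_length)


# Two replies to the same 128 KiB request, split at different
# granularities. Status fields are left at 0 because only the lengths
# matter for this check.
holes_at_4k = [(4 * KiB, 0), (4 * KiB, 0), (120 * KiB, 0)]
dirty_at_64k = [(64 * KiB, 0), (64 * KiB, 0)]
print(covers_request(holes_at_4k, 128 * KiB))
print(covers_request(dirty_at_64k, 128 * KiB))
print(covers_request(dirty_at_64k[:1], 128 * KiB))  # reply too short
```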

2. Given the communication is out of band, how is it realistically
    possible to sync this backup? You'll ask for all the dirty blocks,
    but whilst the command is being executed (as well as immediately
    after the reply) further blocks may be dirtied. So your reply
    always overestimates what is clean (probably the wrong way around).
    Furthermore, the next time you do a 'backup', you don't know whether
    the blocks are dirty because they were dirty at the previous backup,
    or because they were dirtied for this backup.
You are correct that as a one-way operation, querying dirtiness is not
very useful if there is not a way to mark something clean, or if
something else can be dirtying things in parallel.  But that doesn't
mean the command is not useful - if the NBD server is exporting a file
as read-only, where nothing else can be dirtying it in parallel, then a
single pass over the dirty information is sufficient to learn what
portions of the file to copy out.
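Under the read-only assumption just described, the single pass could look like the following Python sketch. It assumes the reply has already been parsed into (length, status) pairs starting at the requested offset. `dirty_extents` is a hypothetical helper that merges adjacent dirty descriptors, so the client can then issue one `NBD_CMD_READ` per extent.

```python
from typing import Iterable, List, Tuple

NBD_STATE_CLEAN = 1 << 2  # bit 2, per the proposed text


def dirty_extents(start: int,
                  descriptors: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Walk the descriptors once and return (offset, length) ranges to
    copy out, merging adjacent dirty descriptors into one extent."""
    extents: List[Tuple[int, int]] = []
    offset = start
    for length, status in descriptors:
        if not status & NBD_STATE_CLEAN:
            if extents and extents[-1][0] + extents[-1][1] == offset:
                # Contiguous with the previous dirty range: extend it.
                prev_off, prev_len = extents[-1]
                extents[-1] = (prev_off, prev_len + length)
            else:
                extents.append((offset, length))
        offset += length
    return extents


reply = [(4096, 0), (4096, 0), (8192, NBD_STATE_CLEAN), (4096, 0)]
print(dirty_extents(0, reply))  # [(0, 8192), (16384, 4096)]
```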

At this point, I was just trying to rebase the proposal as originally
made by Denis and Pavel; perhaps they will have more insight on how they
envisioned using the command, or on whether we should try harder to make
this more of a two-way protocol (where the client can tell the server
when to mark something as clean, or when to start tracking whether
something is dirty).
For now, and for QEMU, we want this to expose the accumulated dirtiness
of the block device, which is collected by the server. Yes, this requires
external coordination. Maybe this COULD be part of the protocol,
but QEMU will not use that part of the protocol.

Speaking of dirtiness, we would soon run into the fact that
there can be several dirtiness states, one for each line of
incremental backups. This complexity is hidden inside QEMU,
and it would be very difficult to publish and reuse it.
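A toy Python sketch of why one "dirty" bit is not enough once there are several lines of incremental backups: each line needs its own dirty state, every write dirties all of them, and backing up one line clears only that line. The class `DirtyTracker` and the line names are hypothetical; this illustrates the concept only and is not how QEMU implements it.

```python
from typing import Dict, Set


class DirtyTracker:
    """Hypothetical server-side tracker keeping one dirty set per
    backup line (block indices rather than byte ranges, for brevity)."""

    def __init__(self) -> None:
        self.bitmaps: Dict[str, Set[int]] = {}

    def add_line(self, name: str) -> None:
        # A new backup line starts after a full backup: nothing is dirty.
        self.bitmaps[name] = set()

    def write(self, block: int) -> None:
        # Every write dirties the block for every backup line.
        for dirty in self.bitmaps.values():
            dirty.add(block)

    def take_incremental(self, name: str) -> Set[int]:
        # Backing up one line returns and clears only that line's state.
        dirty = self.bitmaps[name]
        self.bitmaps[name] = set()
        return dirty


t = DirtyTracker()
t.add_line("daily")
t.add_line("weekly")
t.write(1)
print(sorted(t.take_incremental("daily")))   # [1]
t.write(2)
print(sorted(t.take_incremental("daily")))   # [2]
print(sorted(t.take_incremental("weekly")))  # [1, 2]
```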

If I were designing a backup protocol (off the top of my head) I'd
make all commands return a monotonic 64 bit counter of the number of
writes to the disk since some arbitrary time, and provide a 'GETDIRTY'
command that returned all blocks with a monotonic counter greater than that.
That way I could precisely get the writes that were executed since
any particular read. You'd allow it to be 'slack' and include things
in that list that might not have changed (i.e. false positives) but
not false negatives.
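The counter idea can be sketched in a few lines of Python. Everything here is hypothetical: 'GETDIRTY' is only an illustrative command, not part of NBD, and `WriteCounterTracker`, `chunk_size` and `get_dirty` are names invented for this sketch. Tracking at a chunk size coarser than individual writes is what produces the permitted false positives; since each write records its chunks before its counter value is returned, nothing written after a given counter is missed.

```python
from typing import Dict, List


class WriteCounterTracker:
    """Every write bumps a monotonic counter (64-bit in a real server;
    Python ints simply don't overflow) and stamps the chunks it touches.
    get_dirty(since) returns the chunks stamped after `since`."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.counter = 0
        self.last_write: Dict[int, int] = {}

    def write(self, offset: int, length: int) -> int:
        if length <= 0:
            raise ValueError("write length must be positive")
        self.counter += 1
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for chunk in range(first, last + 1):
            self.last_write[chunk] = self.counter
        # In this design every command would return the counter.
        return self.counter

    def get_dirty(self, since: int) -> List[int]:
        # A whole chunk is reported even if only part of it changed
        # (false positive), but no chunk written after `since` is omitted.
        return sorted(c for c, n in self.last_write.items() if n > since)


t = WriteCounterTracker(chunk_size=65536)
t.write(0, 512)             # chunk 0, counter 1
mark = t.write(70000, 10)   # chunk 1, counter 2
t.write(200000, 4096)       # chunk 3, counter 3
print(t.get_dirty(mark))    # [3]
```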
Yes, that might work as an implementation - but there's the question of
whether other implementations would also work.  We want the protocol to
describe the concept, and not be too heavily tied to one particular

The documentation is also trying to be very straightforward that asking
about dirtiness requires out-of-band coordination, and that a server can
just blindly report everything as dirty if there is no better thing to
report.  So anyone actually making use of this command already has to be
aware of the out-of-band coordination needed to make it useful.

Yes, and this approach is perfect. If there is no information about
dirtiness, we should report everything as dirty. This information
could, however, be type-specific.
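A Python sketch of the server side of this fallback, assuming replies are lists of (length, status) descriptors. `block_status_reply` and its parameters are hypothetical. When the server has no dirtiness information (`dirty` is None), it answers with one descriptor whose `NBD_STATE_CLEAN` bit is clear, i.e. "all dirty". Otherwise it reports per tracking block and merges runs with equal status. The other option the proposal allows, failing the command with `EINVAL`, is not shown.

```python
from typing import List, Optional, Set, Tuple

NBD_STATE_CLEAN = 1 << 2  # bit 2, per the proposed text


def block_status_reply(offset: int, length: int, block_size: int,
                       dirty: Optional[Set[int]]) -> List[Tuple[int, int]]:
    """Build (length, status) descriptors covering [offset, offset+length).

    `dirty` holds the indices of dirty tracking blocks, or is None when
    the server does not track dirtiness at all."""
    if dirty is None:
        # No information: report the whole range as dirty.
        return [(length, 0)]
    reply: List[Tuple[int, int]] = []
    pos, end = offset, offset + length
    while pos < end:
        block = pos // block_size
        # Advance to the next block boundary or the end of the request.
        step = min((block + 1) * block_size, end) - pos
        status = 0 if block in dirty else NBD_STATE_CLEAN
        if reply and reply[-1][1] == status:
            reply[-1] = (reply[-1][0] + step, status)  # merge equal runs
        else:
            reply.append((step, status))
        pos += step
    return reply


print(block_status_reply(0, 16384, 4096, None))  # [(16384, 0)]
print(block_status_reply(0, 16384, 4096, {1}))   # [(4096, 4), (4096, 0), (8192, 4)]
```

Either way the descriptor lengths total the requested length, so the same client code handles a tracking server and a non-tracking one.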
