From: Wouter Verhelst
Subject: Re: [Qemu-devel] [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Sun, 27 Nov 2016 20:17:14 +0100
User-agent: NeoMutt/20161104 (1.7.1)

Hi Vladimir,

Quickly: the reason I haven't merged this yet is twofold:
- I wasn't thrilled with the proposal at the time. It felt a bit
  hackish, bolted onto NBD so you could use it, but without defining
  everything in the NBD protocol. "We're reading some data, but it's not
  about you". That didn't feel right.
- There were a number of questions still unanswered (you're answering a
  few below, so that's good).

For clarity, I have no objection whatsoever to adding more commands if
they're useful, but I would prefer that they're also useful with NBD on
its own, i.e., without requiring an initiation or correlation of some
state through another protocol or network connection or whatever. If
that's needed, that feels like I didn't do my job properly, if you get
my point.

On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> With the availability of sparse storage formats, it is often needed
> to query status of a particular range and read only those blocks of
> data that are actually present on the block device.
> 
> To provide such information, the patch adds the BLOCK_STATUS
> extension with one new NBD_CMD_BLOCK_STATUS command, a new
> structured reply chunk format, and a new transmission flag.
> 
> There exists a concept of data dirtiness, which is required
> during, for example, incremental block device backup. To express
> this concept via NBD protocol, this patch also adds a flag to
> NBD_CMD_BLOCK_STATUS to request dirtiness information rather than
> provisioning information; however, with the current proposal, data
> dirtiness is only useful with additional coordination outside of
> the NBD protocol (such as a way to start and stop the server from
> tracking dirty sectors).  Future NBD extensions may add commands
> to control dirtiness through NBD.
> 
> Since the NBD protocol has no notion of block size, and to mimic the
> SCSI "GET LBA STATUS" command more closely, it has been chosen to return
> a list of extents in the response of the NBD_CMD_BLOCK_STATUS command,
> instead of a bitmap.
> 
> CC: Pavel Borzenkov <address@hidden>
> CC: Denis V. Lunev <address@hidden>
> CC: Wouter Verhelst <address@hidden>
> CC: Paolo Bonzini <address@hidden>
> CC: Kevin Wolf <address@hidden>
> CC: Stefan Hajnoczi <address@hidden>
> Signed-off-by: Eric Blake <address@hidden>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> ---
> 
> v3:
> 
> Hi all. This is almost a resend of v2 (by Eric Blake). The only change is
> removing the restriction that the sum of status descriptor lengths must be
> equal to the requested length. I.e., let's permit the server to reply with
> less data than requested if it wants.

Reasonable, yes. The length that the client requests should be a maximum (i.e.
"I'm interested in this range"), not an exact request.

> Also, the bit for NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
> NBD_FLAG_CAN_MULTI_CONN in the master branch.

Right.

> And, finally, I've rebased this onto the current state of the
> extension-structured-reply branch (which itself should be rebased on
> master IMHO).

Probably a good idea, given the above.

> By this resend I just want to continue the discussion started about half
> a year ago. Here is a summary of some questions and ideas from the v2
> discussion:
> 
> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable?
>    A: This is all for read-only disks, so the data is static and unchangeable.

I think we should declare that it's up to the client to ensure no other
writes happen without its knowledge. This may be because the client and
server communicate out of band about state changes, or because the
client somehow knows that it's the only writer, or whatever.

We can easily do that by declaring that the result of that command only
talks about *current* state, and that concurrent writes by different
clients may invalidate the state. This is true for NBD in general (i.e.,
concurrent read or write commands from other clients may confuse file
systems on top of NBD), so it doesn't change expectations in any way.

> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>    A: 1: the server replies with status descriptors of any size; granularity
>          is hidden from the client
>       2: dirty/allocated requests are separate and unrelated to each
>          other, so their granularities do not intersect

Not entirely sure anymore what this is about?

> 3. Q: selecting the dirty bitmap to export
>    A: several variants:
>       1: the id of the bitmap is in the flags field of the request
>           pros: - simple
>           cons: - it's a hack. The flags field is for other uses.
>                 - we'll have to map bitmap names to these "ids"
>       2: introduce extended nbd requests with variable length and exploit this
>          feature for the BLOCK_STATUS command, specifying a bitmap identifier.
>          pros: - looks like the proper way
>          cons: - we have to create an additional extension
>                - possibly we have to create a map,
>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>       3: an external tool should select which bitmap to export. So, in the
>          case of Qemu it should be something like a qmp command
>          block-export-dirty-bitmap.
>          pros: - simple
>                - we can extend it to behave like (2) later
>          cons: - an additional qmp command to implement (possibly, the
>                  lesser evil)
>          note: Hmm, the external tool can choose between allocated/dirty
>                data too, so we can remove the 'NBD_FLAG_STATUS_DIRTY' flag
>                entirely.

Downside of 3, though, is that it moves the definition of what the
different states mean outside of the NBD protocol (i.e., the protocol
messages are not entirely defined anymore, and their meaning depends on
the clients and servers in use).

To avoid this, we should have a clear definition of what the reply means
*by default*, but then we can add a note that clients and servers can
possibly define other meanings out of band if they want to.

> 4. Q: Shouldn't get_{allocated,dirty} be separate commands?
>    cons: Two commands with almost the same semantics and similar mechanics?
>    pros: However, there is a good point in separating the clearly defined,
>          native-to-block-devices GET_BLOCK_STATUS from user-driven and
>          actually undefined data, called 'dirtiness'.

Yeah, having them separate commands might be a bad idea indeed.

> 5. The number of status descriptors sent by the server should be restricted
>    variants:
>    1: just allow the server to restrict this as it wants (which was done in v3)
>    2: (not excluding 1) the client somehow specifies the maximum number
>       of descriptors.
>       2.1: add a command flag which requests only one descriptor
>            (otherwise, no restrictions from the client)
>       2.2: again, introduce extended nbd requests, and add a field to
>            specify this maximum

I think having a flag which requests just one descriptor can be useful,
but I'm hesitant to add it unless it's actually going to be used; so in
other words, I'll leave the decision on that bit to you.
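
For what it's worth, variant 1 is trivial on the server side; a minimal
sketch (the helper name and the 64 KiB toy granularity are made up purely
for illustration):

    #include <stddef.h>
    #include <stdint.h>

    struct nbd_block_descriptor {
        uint32_t length;
        uint32_t status;
    };

    /* Toy stand-in for a real backend query: pretend everything is one big
     * allocated region, reported in 64 KiB chunks. */
    static uint32_t backend_extent_status(uint64_t offset, uint32_t *len)
    {
        (void)offset;
        *len = 64 * 1024;
        return 0;                 /* default status of 0 is always sane */
    }

    /* Variant 1: the server stops after a limit of its own choosing; the
     * reply then simply covers less than the requested length, which the
     * v3 wording allows. */
    static size_t fill_descriptors(uint64_t offset, uint32_t length,
                                   struct nbd_block_descriptor *desc,
                                   size_t max)
    {
        size_t n = 0;

        while (length > 0 && n < max) {
            uint32_t extlen;
            uint32_t status = backend_extent_status(offset, &extlen);

            if (extlen > length) {
                extlen = length;  /* never exceed the requested range */
            }
            desc[n].length = extlen;
            desc[n].status = status;
            n++;
            offset += extlen;
            length -= extlen;
        }
        return n;
    }

    int main(void)
    {
        struct nbd_block_descriptor desc[4];
        return fill_descriptors(0, 1024 * 1024, desc, 4) == 4 ? 0 : 1;
    }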

> 6. Q: What to do with unspecified flags (in request/reply)?
>    A: I think the normal variant is to make them reserved. (The server should
>       return EINVAL if it finds unknown bits; the client should consider a
>       reply with unknown bits an error.)

Right, probably best to do that, yes.
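
(And it is cheap to implement; something like this on the server side,
where the mask of course depends on which flags we end up defining:)

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical mask of the NBD_CMD_BLOCK_STATUS command flags this
     * server understands; everything outside it is reserved. */
    #define KNOWN_BLOCK_STATUS_FLAGS ((uint16_t)0x0001)

    static int check_block_status_flags(uint16_t flags)
    {
        if (flags & ~KNOWN_BLOCK_STATUS_FLAGS) {
            return EINVAL;        /* reserved bit set: error the command out */
        }
        return 0;
    }

    int main(void)
    {
        /* A request with an unknown bit set must be rejected. */
        return check_block_status_flags(0x8000) == EINVAL ? 0 : 1;
    }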

> ======
> 
> Also, an idea on 2-4:
> 
>     Since we say that dirtiness is unknown to NBD, and an external tool
>     should specify, manage and understand which data is actually
>     transmitted, why not just call it user_data and leave the status field
>     of the reply chunk unspecified in this case?
> 
>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>     NBD_FLAG_STATUS_USER. If it is clear, then the behaviour is defined by
>     Eric's 'Block provisioning status' paragraph.  If it is set, we just
>     leave the status field to some external... protocol? Who knows what
>     this user data is.

Yes, this sounds like a reasonable approach.

>     Note: I'm not sure that I like this (my own) proposal. It's just an
>     idea; maybe someone likes it.  And, I think, it represents more
>     honestly what we are trying to do.

Indeed.

>     Note2: the next step of generalization will be NBD_CMD_USER, with
>     variable request size, structured reply and no definition :)

Well, er, no please, if we can avoid it :-)

> Another idea, about backups themselves:
> 
>     Why do we need allocated/zero status for backup? IMHO we don't.

Well, I've been thinking so all along, but then I don't really know what
it is, in detail, that you want to do :-)

I can understand a "has this changed since time X" request, which the
"dirty" thing seems to want to be. Whether something is allocated is
just a special case of that.

Actually, come to think of it: what is the exact use case for this
thing? I understand you're trying to create incremental backups of
things, which would imply you don't write from the client that is
getting the block status thingies, right? If so, how about:

- NBD_OPT_GET_SNAPSHOTS (during negotiation): returns a list of
  snapshots. Not required, optional, includes a machine-readable form,
  not defined by NBD, which explains what the snapshot is about (e.g., a
  qemu json file). The "base" version of that is just "allocation
  status", and is implied (i.e., you don't need to run
  NBD_OPT_GET_SNAPSHOTS if you're not interested in anything but the
  allocation status).
- NBD_CMD_BLOCK_STATUS (during transmission), returns block descriptors
  which tell you what the status of a block of data is for each of the
  relevant snapshots that we know about.

Perhaps this is somewhat overengineered, but it does bring most of the
definition of what a snapshot is back into the NBD protocol, without
having to say "this could be anything", and without requiring
connectivity over two ports for this to be useful (e.g., you could store
the machine-readable form of the snapshot description into your backup
program and match what they mean with what you're interested in at
restore time, etc).

This wouldn't work if you're interested in new snapshots that get
created once we've already moved into transmission, but hey.
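
To make that a little more concrete, the kind of data I have in mind would
look roughly like this; every name and field here is hypothetical, nothing
of it exists in the spec today:

    #include <stdint.h>

    /* Hypothetical entry in an NBD_OPT_GET_SNAPSHOTS reply during
     * negotiation: one per "snapshot" the server knows about. The
     * description is machine-readable but deliberately not defined by NBD
     * (e.g. a qemu json blob describing a dirty bitmap). */
    struct nbd_snapshot_info {
        uint32_t id;              /* referenced later during transmission */
        uint32_t desc_length;
        char     desc[];          /* e.g. '{"type": "qemu-dirty-bitmap", ...}' */
    };

    /* Hypothetical extent descriptor for NBD_CMD_BLOCK_STATUS: the status
     * of a range with respect to one particular snapshot. Snapshot id 0
     * would be the implied "allocation status" one that needs no
     * negotiation. */
    struct nbd_snapshot_extent {
        uint32_t snapshot_id;
        uint32_t status;
        uint64_t length;
    };

    int main(void)
    {
        return 0;                 /* layout sketch only; nothing to run */
    }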

Thoughts?

>     Full backup: just do a structured read - it will show us which chunks
>     may be treated as zeroes.

Right.

>     Incremental backup: get the dirty bitmap (somehow, for example through
>     the user-defined part of the proposed command), then, for dirty blocks,
>     read them through structured read, so information about zero/unallocated
>     areas is there.
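
If I follow, the client-side loop for that stays simple either way; a
sketch only, with both helpers as placeholders for however "fetch the
dirty extents" and a structured read end up being expressed on the wire:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    struct extent { uint64_t offset; uint32_t length; };

    /* Placeholder: however the dirty map is obtained (user-defined status,
     * a separate command, ...), fake a single 64 KiB dirty extent here. */
    static int get_dirty_extents(struct extent *out, int max)
    {
        if (max < 1) {
            return 0;
        }
        out[0].offset = 0;
        out[0].length = 64 * 1024;
        return 1;
    }

    /* Placeholder for a structured read, which already reports which parts
     * of the range are zero/unallocated, so the backup needs nothing more. */
    static void structured_read(uint64_t offset, uint32_t length)
    {
        printf("read %" PRIu32 " bytes at offset %" PRIu64 "\n",
               length, offset);
    }

    int main(void)
    {
        struct extent dirty[16];
        int n = get_dirty_extents(dirty, 16);

        for (int i = 0; i < n; i++) {
            structured_read(dirty[i].offset, dirty[i].length);
        }
        return 0;
    }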
> 
> For me all the variants above are OK. Let's finally choose something.
> 
> v2:
> v1 was: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
> 
> Since then, we've added the STRUCTURED_REPLY extension, which
> necessitates a rather larger rebase; I've also changed things
> to rename the command 'NBD_CMD_BLOCK_STATUS', changed the request
> modes to be determined by boolean flags (rather than by fixed
> values of the 16-bit flags field), changed the reply status fields
> to be bitwise-or values (with a default of 0 always being sane),
> and changed the descriptor layout to drop an offset but to include
> a 32-bit status so that the descriptor is nicely 8-byte aligned
> without padding.
> 
>  doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)

[...]

I'll commit this in a minute into a separate branch called
"extension-blockstatus", under the understanding that changes are still
required, as per above (i.e., don't assume that just because there's a
branch I'm happy with the current result ;-)

Regards

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


