qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extensio


From: Wouter Verhelst
Subject: Re: [Qemu-devel] [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Fri, 2 Dec 2016 10:25:58 +0100
User-agent: NeoMutt/20161104 (1.7.1)

Hi Vladimir,

On Thu, Dec 01, 2016 at 02:26:28PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 01.12.2016 13:14, Wouter Verhelst wrote:
[...]
> > -    - `NBD_ALLOC_ADD_CONTEXT` (2): the list of allocation contexts
> > +    - `NBD_META_ADD_CONTEXT` (2): the list of metadata contexts
> >         selected by the query string is added to the list of existing
> > -      allocation contexts.
> 
> If I understand correctly, it should be not 'existing', but 'exporting'. 
> So there are several contexts, server knows about. They are definitely 
> exists. Some of them may be selected (by client) for export (to client, 
> through get_block_status).
> 
> so, what about 'list of metadata contexts to export' or something like this?

Yes, good idea. Thanks.

[...]
> > +The "BASE:allocation" metadata context is the basic "exists at all"
> > +metadata context. If an extent is marked with `NBD_STATE_HOLE` at that
> > +context, this means that the given extent is not allocated in the
> > +backend storage, and that writing to the extent MAY result in the ENOSPC
> > +error. This supports sparse file semantics on the server side. If a
> > +server has only one metadata context (the default), then writing to an
> > +extent which has `NBD_STATE_HOLE` clear MUST NOT fail with ENOSPC.
> 
> this dependence looks strange. user defined metadata, why it affects 
> allocation?

The reference nbd-server implementation (i.e., the one that accompanies
this document) has a "copy-on-write" mode in which modifications are
written to a separate file. This separate file can easily be a sparse
file.

Let's say you have an export which is fully allocated; the
BASE:allocation state would have STATE_HOLE cleared. However, when
writing to a particular block that has not yet been written to during
that session, the copy-on-write mode would have to allocate a new block
in the copy-on-write file, which may fail with ENOSPC.

Perhaps the above paragraph could be updated in the sense that it SHOULD
NOT fail in other cases, but that it may depend on the semantics of the
other metadata contexts, whether active (i.e., selected with
NBD_OPT_META_CONTEXT) or not.

Side note: that does mean that some metadata contexts may be specific to
particular exports (e.g., you may have a list of named snapshots to
select, and which ones would exist would depend on the specific export
chosen). I guess that means the metadata contexts should have the name
of the export, too.

> At least, 'only one' is not descriptive, it would be better to mention
> 'BASE:allocation' name.

Yes, that would make things clearer, indeed.

> (I hope, I can ask server to export only dirty_bitmap context, not
> exporting allocation?

That was the point of naming that context and making it selectable :-)

> In that case this 'only one' would be dirty_bitmap)
> 
> >   For all other cases, this specification requires no specific semantics
> 
> what are 'other cases'? For all other metadata contexts? Or for all 
> cases when we have more than one context?

Other metadata contexts. I'll rephrease it in that sense.

> > -of allocation contexts. Implementations could support allocation
> > +of metadata contexts. Implementations could support metadata
> >   contexts with semantics like the following:
> >   
> > -- Incremental snapshots; if a block is allocated in one allocation
> > +- Incremental snapshots; if a block is allocated in one metadata
> >     context, that implies that it is also allocated in the next level up.
> >   - Various bits of data about the backend of the storage; e.g., if the
> > -  storage is written on a RAID array, an allocation context could
> > +  storage is written on a RAID array, a metadata context could
> >     return information about the redundancy level of a given extent
> >   - If the backend implements a write-through cache of some sort, or
> > -  synchronises with other servers, an allocation context could state
> > -  that an extent is "allocated" once it has reached permanent storage
> > +  synchronises with other servers, a metadata context could state
> > +  that an extent is "active" once it has reached permanent storage
> >     and/or is synchronized with other servers.
> 
> Incremental snapshots sounds strange for me. Snapshots are just 
> snapshots.. Backup may be incremental, but it is not about snapshots.. I 
> think this example may be safely deleted from the spec.

Yeah, I'll just remove all the examples; they're not really critical,
anyway, and might indeed confuse people.

[...]
> >       Servers MUST return an `NBD_REPLY_TYPE_BLOCK_STATUS` chunk for every
> > -    allocation context ID, except if the semantics of particular
> > -    allocation contexts mean that the information for one allocation
> > -    context is implied by the information for another.
> > +    metadata context ID, except if the semantics of particular
> > +    metadata contexts mean that the information for one active metadata
> > +    context is implied by the information for another; e.g., if a
> > +    particular metadata context can only have meaning for extents where
> > +    the `NBD_STATE_HOLE` flag is cleared on the "BASE:allocation"
> > +    context, servers MAY omit the relevant chunks for that context if
> > +    they already sent an extent with the `NBD_STATE_HOLE` flag set in
> > +    reply to the same `NBD_CMD_BLOCK_STATUS` command.
> 
> Hmm stop. Are you saying that server may omit some status descriptors 
> for some context? But how? We have not field 'offset' in status 
> descriptor..

Ah. Yes. I'm an idiot :-)

> Or we can omit _context_, if _all_ descriptors of allocation context
> are holes?
> 
> Anyway, I'm still against this paragraph. These 8 lines actually say 
> "server may omit contexts if he wants". I always can explain that 
> semantics of some context means that metadata is implied (actually, I 
> can just introduce such semantics).., but it not differs with "I want to 
> omit this". In this spec there is no way to negotiate context semantics, 
> so actually client doesn't know it. We just hope, that both client and 
> server are managed by some layer, which knows what is it all about.

Yes, I suppose you're right. I'll remove it. It never hurts to send too
much information, and otherwise clients might have to become "too smart"
to understand things properly.

> Also. If it all is about "metadata", name NBD_CMD_BLOCK_STATUS becomes 
> not very descriptive... May be we should move to something like just 
> NBD_CMD_GET_METADATA )

I don't think that's critical (and I'd rather not rename the branch ;-P )

[...]
> >       The exact semantics of what it means for a block to be "clean" or
> > -    "active" at a given allocation context is not defined by this
> > +    "active" at a given metadata context is not defined by this
> >       specification, except that the default in both cases should be to
> > -    clear the bit. That is, when the allocation context does not have
> > +    clear the bit. That is, when the metadata context does not have
> >       knowledge of the relevant status for the given extent, or when the
> > -    allocation context does not assign any meaning to it, the bits
> > +    metadata context does not assign any meaning to it, the bits
> >       should be cleared.
> 
> And again. If we said that it is user-defined metadata, and it is not 
> defined in this spec, why to define not-defined flags? I propose to 
> remove from here all tries to define internals of user-defined-metadata, 
> and than, when we finish up with clean and simple spec of the feature, 
> we can add separate paragraph, which will define metadata context for 
> dirty bitmaps like this:

Good point. I'll leave the "bits should be cleared" bit in there,
though.

> ====
> Metadata contexts for NBD_CMD_.... , with names not started with 
> "BASE:", are defined by third-party tools, but to avoid conflicts and to 
> have common documentation it is recommended to publish their names and 
> short descriptions here.
> 
> QEMU_DIRTY_BITMAP:<name>  - contexts family of dirty bitmaps, defined by 
> Qemu. Dirty bitmap is ...., It is used for ..., by some-company.
>                          status extent flags:
>                                   bit 0: 0 - means dirty  [If you want, 
> but I prefer 0 - clean, 1 - dirty]
>                                            1 - means clean
>                                   bits 1-31: reserved, always zero.
> RANDOM_DATA  - just random data
>                          status extent flags:
>                                  all 32 bits: random data
> ....
> ====

I don't think semantics of third-party implementations' metadata modes
should be part of this document, but I would be happy to add a link to a
document describing such semantics.

Thanks for your review!

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



reply via email to

[Prev in Thread] Current Thread [Next in Thread]