qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specificat


From: John Snow
Subject: Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification
Date: Mon, 18 Jan 2016 11:54:38 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 01/16/2016 09:06 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 15.01.2016 02:26, John Snow wrote:
>>
>> On 01/14/2016 05:08 PM, Eric Blake wrote:
>>> On 01/11/2016 06:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> The new feature for qcow2: storing bitmaps.
>>>>
>>>> This patch adds new header extension to qcow2 - Bitmaps Extension. It
>>>> provides an ability to store virtual disk related bitmaps in a qcow2
>>>> image. For now there is only one type of such bitmaps: Dirty Tracking
>>>> Bitmap, which just tracks virtual disk changes from some moment.
>>>>
>>>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
>>>> should be stored in this qcow2 file. The size of each bitmap
>>>> (considering its granularity) is equal to virtual disk size.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>> ---
>>>>
>>>> @@ -166,6 +178,34 @@ the header extension data. Each entry look like
>>>> this:
>>>>                       terminated if it has full length)
>>>>     +== Bitmaps extension ==
>>>> +          0 -  3:  nb_bitmaps
>>>> +                   The number of bitmaps contained in the image.
>>>> Must be
>>>> +                   greater than or equal to 1.
>>>> +
>>>> +                   Note: Qemu currently only supports up to 65535
>>>> bitmaps per
>>>> +                   image.
>>>> +
>>>> +          4 -  7:  bitmap_directory_size
>>>> +                   Size of the bitmap directory in bytes. It is the
>>>> cumulative
>>>> +                   size of all (nb_bitmaps) bitmap headers.
>>> Only 4 bytes - if we ever raise our 64k entry restriction (nb_bitmaps),
>>> could we run into an image that has so many directory entries as to make
>>> the directory itself spill past 4G?  But I don't think it is likely, so
>>> I can live with your choice.
>>>
>> "We'll never need this!"
>>
>> I hope someone in 2082 is reading this right now and is quite angry.
>>
>> (But really, I can't foresee needing this many per each drive -- and if
>> we do, we have external storage mechanisms in development to handle such
>> wild cases.)
> 
> Let's change it to 8 bytes, and add 4bytes reserved field to keep 8b
> alignment.
> 
>>
>>>> +
>>>> +== Bitmaps ==
>>>> +
>>>> +As mentioned above, the bitmaps extension provides the ability to
>>>> store bitmaps
>>>> +related a virtual disk. This section describes how these bitmaps
>>>> are stored.
>>>> +
>>>> +Note: all bitmaps are related to the virtual disk stored in this
>>>> image.
>>>> +
>>>> +=== Bitmap directory ===
>>>> +
>>>> +Each bitmap saved in the image is described in a bitmap directory
>>>> entry. The
>>>> +bitmap directory is a contiguous area in the image file, whose
>>>> starting offset
>>>> +and length are given by the header extension fields
>>>> bitmap_directory_offset and
>>>> +bitmap_directory_size. The entries of the bitmap directory have
>>>> variable
>>>> +length, depending on the length of the bitmap name and extra data.
>>>> These
>>>> +entries are also called bitmap headers.
>>>> +
>>>> +Structure of a bitmap directory entry:
>>>> +
>>>> +    Byte 0 -  7:    bitmap_table_offset
>>>> +                    Offset into the image file at which the bitmap
>>>> table
>>>> +                    (described below) for the bitmap starts. Must
>>>> be aligned to
>>>> +                    a cluster boundary.
>>>> +
>>>> +         8 - 11:    bitmap_table_size
>>>> +                    Number of entries in the bitmap table of the
>>>> bitmap.
>>> Should this be the size in bytes, instead of the number of entries? But
>> For what benefit? We can calculate either from the other, and this gives
>> us a better resolution.
>>
>>> at least the entries are fixed width of 8 bytes each, so this lets you
>>> get a bitmap table up to 32G bytes rather than just 4G in size.  (Let's
>>> see here - if we have 32G bytes in the bitmap table, that means 4G
>>> clusters occupied by the bitmap itself; in the worst case of 512-byte
>>> clusters and granularity 0, that is a maximum bitmap size of 2T
>>> describing 16T of virtual guest image; with larger cluster size and/or
>>> larger granularity, we cover a lot more virtual guest space with less
>>> bitmap size; so I guess we aren't too worried about running out of
>>> space?).
>>>
>> Yes, worst case of g=0 and cluster size of 512 bytes, we can get 2T
>> bitmaps describing 16T of virtual data.
>>
>> "default case" of 64K clusters and 64K granularity: 256TiB bitmaps
>> describing ... let's see ... if my math is right, 128EiB?
>>
>> We're probably fine :)
>>
>> (Cue future space-person from 2159 wondering how I could have ever been
>> so naive. Sorry, future space-person!)
> 
> it is just like l1_size from qcow2 header.
> 
>          36 - 39:   l1_size
>                     Number of entries in the active L1 table
> 
> 
>>
>>>> +        20 - 23:    extra_data_size
>>>> +                    Size of type-specific extra data.
>>>> +
>>>> +                    For now, as no extra data is defined,
>>>> extra_data_size is
>>>> +                    reserved and must be zero.
>>>> +
>>>> +        variable:   Type-specific extra data for the bitmap.
>>> I'd write this as:
>>>             variable:   extra_data
>>>                         Type-specific extra data for the bitmap,
>>>                         occupying extra_data_size bytes.
> 
> Ok.. then, similar should be done for the name.
> 
> About extra data and our discussion of versioning it. For now, we don't
> have concrete cases and requirements for this data, we are only foresee,
> that it may be needed in future, and waste time in assumptions about
> what kind of data will it be..
> 
> Will it be really type specific, i.e. bitmap type strictly define its
> structure? Or it will contain self-defined fields, some of them will be
> optional, some of them will depend on bitmap type or something like
> this? We have no concrete cases and strategy of using this data... We
> need not versioning for now, as we have no versions).
> 
> The only question, that should be answered, what should do the software,
> if it find extra_data of unknown format. (currently any extra data is
> unknown).
> 
> variants:
> 
> 1) abort (or, more precisely: "image file must not be written") (like
> incompatible feature)
> 2) delete extra data (like autoclear feature)
> 3) ignore (like compatible feature)
> 
> We can use two bits of flags field to specify the behavior.
> 
> ===========================================
> 
> Or, the other way, we should fix extra data format like this:
> 
> Extra data contains several fields of the following format:
> 
> 4b: field magic
> 1b: flags: specifies abort/delete/ignore behavior if the field is unknown
> 1b: field name size - for error message if the field is unknown
> 2b: field size
> var: field name
> var: field data
> var: padding
> 
> - where I've seen it...
> 
> =================
> 
> And the last option:
> 
> We can select abort behavior for now for any extra data, i.e. just
> remove "variable extra_data" and make field extra_data_size 'reserved
> must be zero'.
> 
> ================
> 
> 
> So, lets finally choose something and go on.
> 
> 
> Note: for ways 1 and 3, we should in future define extra data in the way
> that gives a software clear method to understand, is this extra data
> known or unknown for it. I.e. it should have some magic number, defining
> format, or version, or structure like in way 2.
> 
>>>
>>>> +
>>>> +        variable:   The name of the bitmap (not null terminated).
>>>> Must be
>>>> +                    unique among all bitmap names within the
>>>> bitmaps extension.
>>>> +
>>>> +        variable:   Padding to round up the bitmap directory entry
>>>> size to the
>>>> +                    next multiple of 8.
>>> Should we require the padding to be all NUL bytes?  (We aren't
>>> consistent on whether we require that for other locations of padding in
>>> the spec, so that could be a followup patch).
> 
> Let's require this, I do not see any shortcomings of this, only benefits.
> 
>>>
>>>> +
>>>> +=== Bitmap table ===
>>>> +
>>>> +Bitmaps are stored using a one-level structure (as opposed to
>>>> two-level
>>>> +structure like for refcounts and guest clusters mapping) for the
>>>> mapping of
>>>> +bitmap data to host clusters. This structure is called the bitmap
>>>> table.
>>> Possible wording tweak:
>>> Bitmaps are stored using a one-level structure (as opposed to the
>>> two-level structures for refcounts and guest cluster mapping), and are
>>> used for the mapping of bitmap data to host clusters
> 
> No, this is wrong.. Who are used? Bitmaps are stored and are used? No,
> not bitmaps, structures are used.
> 
>>>
>>>> +
>>>> +Each bitmap table has a variable size (stored in the bitmap
>>>> directory Entry)
>>> Does 'Entry' still need to be capitalized?
> 
> It's a mistake, thanks.
> 
>>>
>>>> +and may use multiple clusters, however, it must be contiguous in
>>>> the image
>>>> +file.
>>>> +
>>>> +Structure of a bitmap table entry:
>>>> +
>>>> +    Bit       0:    Reserved and must be zero if bits 9 - 55 are
>>>> non-zero.
>>>> +                    If bits 9 - 55 are zero:
>>>> +                      0: Cluster should be read as all zeros.
>>>> +                      1: Cluster should be read as all ones.
>>>> +
>>>> +         1 -  8:    Reserved and must be zero.
>>>> +
>>>> +         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be
>>>> aligned to
>>>> +                    a cluster boundary. If the offset is 0, the
>>>> cluster is
>>>> +                    unallocated; in that case, bit 0 determines how
>>>> this
>>>> +                    cluster should be treated when read from.
>>> Possible wording tweak:
>>> s/when read from/during reads/.
> 
> ok
> 
>>>
>>>> +
>>>> +        56 - 63:    Reserved and must be zero.
>>>> +
>>>> +=== Bitmap data ===
>>>> +
>>>> +As noted above, bitmap data is stored in separate clusters,
>>>> described by the
>>>> +bitmap table. Given an offset (in bytes) into the bitmap data, the
>>>> offset into
>>>> +the image file can be obtained as follows:
>>>> +
>>>> +    image_offset =
>>>> +        bitmap_table[bitmap_data_offset / cluster_size] +
>>>> +            (bitmap_data_offset % cluster_size)
>>>> +
>>>> +This offset is not defined if bits 9 - 55 of bitmap table entry are
>>>> zero (see
>>>> +above).
>>>> +
>>>> +Given an offset byte_nr into the virtual disk and the bitmap's
>>>> granularity, the
>>>> +bit offset into the bitmap can be calculated like this:
>>>> +
>>>> +    bit_offset =
>>>> +        image_offset(byte_nr / granularity / 8) * 8 +
>>>> +            (byte_nr / granularity) % 8
>>>> +
>>>> +If the size of the bitmap data is not a multiply of cluster size
>>>> then the last
>>> s/multiply of cluster size/multiple of the cluster size,/
> 
> ok, thanks
> 
>>>
>>>> +cluster of the bitmap data contains some unused tail bits. These
>>>> bits must be
>>>> +zero.
>>>>
>> Thanks!
>>
>> --js
> 
> Please, let's decide finally about extra data, than I'll reroll it and,
> I hope, it will be committed, to make it possible to continue work on
> persistence series. About extra data, I'm ready to accept any variant,
> strictly defining, what software should do with unknown extra data.
> 
> 

I discussed this with Eric Blake on IRC briefly, and I mentioned I was
concerned that we didn't specify a format at all for the extra data.
Eric felt that it was not unusual to leave a space for future expansion
and that as we haven't used it yet, we don't need to solidify it.

He also felt it would be unusual to stipulate the format of data that we
don't even intend to use yet.

In short, I'm being too proactive.

A commit message mention that, should anyone wish to expand the
type-specific data in the future that adding a 2-byte version as the
first field in extra data would probably be sufficient, and we can worry
about the spec wording later. It is fine to assume for now that if
extra_data_size is 0 that the version/format of the data is "v0" and
that does not limit our future expansion.

Sound good? Sorry for the red tape.
--John Snow



reply via email to

[Prev in Thread] Current Thread [Next in Thread]