[Qemu-devel] [PATCH v4 1/5] docs: Add QED image format specification


From: Stefan Hajnoczi
Subject: [Qemu-devel] [PATCH v4 1/5] docs: Add QED image format specification
Date: Thu, 28 Oct 2010 12:01:20 +0100

Signed-off-by: Stefan Hajnoczi <address@hidden>
---
 docs/specs/qed_spec.txt |  128 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 128 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/qed_spec.txt

diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
new file mode 100644
index 0000000..e4425c8
--- /dev/null
+++ b/docs/specs/qed_spec.txt
@@ -0,0 +1,128 @@
+=Specification=
+
+The file format looks like this:
+
+ +----------+----------+----------+-----+
+ | cluster0 | cluster1 | cluster2 | ... |
+ +----------+----------+----------+-----+
+
+The first cluster begins with the '''header'''.  The header contains 
information about where regular clusters start; this allows the header to be 
extensible and store extra information about the image file.  A regular cluster 
may be a '''data cluster''', an '''L2 table''', or an '''L1 table'''.  L1 and L2 
tables are composed of one or more contiguous clusters.
+
+Normally the file size will be a multiple of the cluster size.  If the file 
size is not a multiple, extra information after the last cluster may not be 
preserved if data is written.  Legitimate extra information should use space 
between the header and the first regular cluster.
+
+All fields are little-endian.
+
+==Header==
+ Header {
+     uint32_t magic;               /* QED\0 */
+ 
+     uint32_t cluster_size;        /* in bytes */
+     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
+     uint32_t header_size;         /* in clusters */
+ 
+     uint64_t features;            /* format feature bits */
+     uint64_t compat_features;     /* compat feature bits */
+     uint64_t autoclear_features;  /* self-resetting feature bits */
+     uint64_t l1_table_offset;     /* in bytes */
+     uint64_t image_size;          /* total logical image size, in bytes */
+ 
+     /* if (features & QED_F_BACKING_FILE) */
+     uint32_t backing_filename_offset; /* in bytes from start of header */
+     uint32_t backing_filename_size;   /* in bytes */
+ }
+
+Field descriptions:
+* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
+* ''table_size'' must be a power of 2 in range [1, 16].
+* ''header_size'' is the number of clusters used by the header and any 
additional information stored before regular clusters.
+* ''features'', ''compat_features'', and ''autoclear_features'' are file 
format extension bitmaps.  They work as follows:
+** An image with unknown ''features'' bits enabled must not be opened.  File 
format changes that are not backwards-compatible must use ''features'' bits.
+** An image with unknown ''compat_features'' bits enabled can be opened 
safely.  The unknown features are simply ignored and represent 
backwards-compatible changes to the file format.
+** An image with unknown ''autoclear_features'' bits enabled can be opened 
safely after clearing the unknown bits.  This allows for backwards-compatible 
changes to the file format which degrade gracefully and can be re-enabled 
by a newer program later.
+* ''l1_table_offset'' is the offset of the first byte of the L1 table in the 
image file and must be a multiple of ''cluster_size''.
+* ''image_size'' is the block device size seen by the guest and must be a 
multiple of 512 bytes.
+* ''backing_filename'' is a string in (byte offset, byte size) form.  It is 
not NUL-terminated and has no alignment constraints.
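+
+The numeric constraints in the field descriptions above can be checked mechanically.  The following is a minimal sketch in Python, not part of the format itself; the function names ''is_power_of_two'' and ''check_header_fields'' are illustrative, and the header values are assumed to have been decoded from little-endian already.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_header_fields(cluster_size, table_size, l1_table_offset, image_size):
    """Return a list of constraint violations (empty if every check passes)."""
    errors = []
    if not (is_power_of_two(cluster_size) and 2**12 <= cluster_size <= 2**26):
        errors.append("cluster_size must be a power of 2 in [2^12, 2^26]")
    if not (is_power_of_two(table_size) and 1 <= table_size <= 16):
        errors.append("table_size must be a power of 2 in [1, 16]")
    # Guard against division by zero when cluster_size itself is invalid.
    if cluster_size <= 0 or l1_table_offset % cluster_size != 0:
        errors.append("l1_table_offset must be a multiple of cluster_size")
    if image_size % 512 != 0:
        errors.append("image_size must be a multiple of 512 bytes")
    return errors
```

+For example, ''check_header_fields(65536, 4, 65536, 1 << 30)'' reports no violations.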
+
+Feature bits:
+* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.  The backing 
filename string is given in the ''backing_filename_{offset,size}'' fields and 
may be an absolute path or relative to the image file.
+* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
+* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk image 
and no file format autodetection should be attempted.  This should be used to 
ensure that raw backing images are never detected as an image format if they 
happen to contain magic constants.
+
+There are currently no defined ''compat_features'' or ''autoclear_features'' 
bits.
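+
+The rules for the three feature bitmaps can be sketched as follows.  This is an illustration under stated assumptions rather than a reference implementation: ''check_feature_bits'' and the KNOWN_* masks are hypothetical names, and the masks describe a program that understands exactly the bits defined in this document.

```python
QED_F_BACKING_FILE = 0x01
QED_F_NEED_CHECK = 0x02
QED_F_BACKING_FORMAT_NO_PROBE = 0x04

# Bits this hypothetical program understands.
KNOWN_FEATURES = (QED_F_BACKING_FILE | QED_F_NEED_CHECK |
                  QED_F_BACKING_FORMAT_NO_PROBE)
KNOWN_AUTOCLEAR_FEATURES = 0  # no autoclear bits are defined yet


def check_feature_bits(features, compat_features, autoclear_features):
    """Return (can_open, autoclear_features_to_keep)."""
    if features & ~KNOWN_FEATURES:
        # Unknown incompatible bits: the image must not be opened.
        return False, autoclear_features
    # Unknown compat_features bits are ignored, so they need no check.
    # Unknown autoclear bits are cleared before the image is used.
    return True, autoclear_features & KNOWN_AUTOCLEAR_FEATURES
```

+With these masks, an image carrying an unknown ''compat_features'' bit and an unknown ''autoclear_features'' bit still opens, and the returned autoclear value is 0.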
+
+Fields predicated on a feature bit are only used when that feature is set.  
The fields always take up header space, regardless of whether or not the 
feature bit is set.
+
+==Tables==
+
+Tables provide the translation from logical offsets in the block device to 
cluster offsets in the file.
+
+ #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
+  
+ Table {
+     uint64_t offsets[TABLE_NOFFSETS];
+ }
+
+The tables are organized as follows:
+
+                    +----------+
+                    | L1 table |
+                    +----------+
+               ,------'  |  '------.
+          +----------+   |    +----------+
+          | L2 table |  ...   | L2 table |
+          +----------+        +----------+
+      ,------'  |  '------.
+ +----------+   |    +----------+
+ |   Data   |  ...   |   Data   |
+ +----------+        +----------+
+
+A table is made up of one or more contiguous clusters.  The table_size header 
field determines table size for an image file.  For example, cluster_size=64 KB 
and table_size=4 results in 256 KB tables.
+
+The logical image size must be less than or equal to the maximum possible size 
of clusters rooted by the L1 table:
+ header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
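+
+The arithmetic behind TABLE_NOFFSETS and the size limit above can be worked through in a few lines of Python.  This is only a sketch; the function names are illustrative and 8 is sizeof(uint64_t).

```python
def table_noffsets(cluster_size, table_size):
    """Number of 64-bit offsets that fit in one table."""
    return table_size * cluster_size // 8


def max_image_size(cluster_size, table_size):
    """Largest logical image size addressable through one L1 table."""
    n = table_noffsets(cluster_size, table_size)
    return n * n * cluster_size
```

+With cluster_size=64 KB and table_size=4, each table holds 32768 offsets and the limit is 2^46 bytes (64 TiB).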
+
+All offsets in L1 and L2 tables are cluster-aligned.  The low 
log2(''cluster_size'') bits of each offset are therefore reserved and must be 
zero.  Future format extensions may use these bits to store per-offset information.
+
+The following offsets have special meanings:
+
+===L2 table offsets===
+* 0 - unallocated.  The L2 table is not yet allocated.
+
+===Data cluster offsets===
+* 0 - unallocated.  The data cluster is not yet allocated.
+
+===Unallocated L2 tables and data clusters===
+Reads to an unallocated area of the image file access the backing file.  If 
there is no backing file, then zeroes are produced.  The backing file may be 
smaller than the image file and reads of unallocated areas beyond the end of 
the backing file produce zeroes.
+
+Writes to an unallocated area cause a new data cluster to be allocated, and a 
new L2 table if that is also unallocated.  The new data cluster is populated 
with data from the backing image (or zeroes if no backing image) and the data 
being written.
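+
+The read and write behaviour for unallocated areas can be sketched as below.  This is a simplified model, not an implementation: the backing file is assumed to be an in-memory ''bytes'' object (or None when there is no backing file), the function names are illustrative, and the written data is assumed to fit inside a single cluster.

```python
def read_unallocated(backing, logical_offset, length):
    """Return the bytes a read of an unallocated area produces."""
    if backing is None:
        return bytes(length)  # no backing file: all zeroes
    data = backing[logical_offset:logical_offset + length]
    # Zero-fill whatever lies beyond the end of the backing file.
    return data + bytes(length - len(data))


def populate_new_cluster(backing, cluster_start, cluster_size,
                         offset_in_cluster, data):
    """Build the contents of a newly allocated data cluster."""
    # Start from the backing data (or zeroes), then overlay the new data.
    buf = bytearray(read_unallocated(backing, cluster_start, cluster_size))
    buf[offset_in_cluster:offset_in_cluster + len(data)] = data
    return bytes(buf)
```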
+
+===Logical offset translation===
+Logical offsets are translated into cluster offsets as follows:
+
+  table_bits table_bits    cluster_bits
+  <--------> <--------> <--------------->
+ +----------+----------+-----------------+
+ | L1 index | L2 index |     byte offset |
+ +----------+----------+-----------------+
+ 
+       Structure of a logical offset
+
+ offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
+ 
+ def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
+   l2_offset = l1_table[l1_index]
+   l2_table = load_table(l2_offset)
+   cluster_offset = l2_table[l2_index] & offset_mask
+   return cluster_offset + byte_offset
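+
+The pseudocode above takes the three parts of a logical offset as given.  A minimal Python sketch of how they could be extracted follows; ''split_logical_offset'' is an illustrative name, and the code assumes ''cluster_size'' and ''table_size'' are powers of 2 as required by the header constraints.

```python
def split_logical_offset(pos, cluster_size, table_size):
    """Split a logical offset into (l1_index, l2_index, byte_offset)."""
    # For a power of two, bit_length() - 1 is log2.
    cluster_bits = cluster_size.bit_length() - 1
    table_noffsets = table_size * cluster_size // 8  # 8 == sizeof(uint64_t)
    table_bits = table_noffsets.bit_length() - 1

    byte_offset = pos & (cluster_size - 1)
    l2_index = (pos >> cluster_bits) & ((1 << table_bits) - 1)
    l1_index = pos >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset
```

+With cluster_size=64 KB and table_size=4, cluster_bits is 16 and table_bits is 15.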
+
+==Consistency checking==
+
+This section is informational and included to provide background on the use of 
the QED_F_NEED_CHECK ''features'' bit.
+
+The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an 
operation that could leave the image in an inconsistent state if interrupted by 
a crash or power failure.  A dirty image must be checked on open because its 
metadata may not be consistent.
+
+The consistency check verifies the following invariants:
+# Each cluster is referenced once and only once.  It is an inconsistency to 
have a cluster referenced more than once by L1 or L2 tables.  A cluster has 
been leaked if it has no references.
+# Offsets must be within the image file size and must be ''cluster_size'' 
aligned.
+# Table offsets must be at least ''table_size'' * ''cluster_size'' bytes from the 
end of the image file so that there is space for the entire table.
+
+The consistency check process starts from ''l1_table_offset'' and scans all 
L2 tables.  If the check completes with no errors other than leaks, the 
QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
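+
+The scan can be sketched in Python over an in-memory model of the tables.  This is a simplified illustration, not the actual check: ''check_tables'' is a hypothetical name, ''l2_tables'' is assumed to map each L2 table's file offset to its list of data cluster offsets, and leak detection and the clusters used by the header and L1 table are left out.

```python
def check_tables(l1_table, l2_tables, file_size, cluster_size, table_size):
    """Return a list of inconsistencies found while scanning L1 and L2."""
    errors = []
    seen = set()  # file offsets of every cluster referenced so far
    table_bytes = table_size * cluster_size

    def check_offset(offset, span, what):
        clusters = range(offset, offset + span, cluster_size)
        if offset % cluster_size != 0:
            errors.append(f"{what} {offset:#x} is not cluster-aligned")
        elif offset + span > file_size:
            errors.append(f"{what} {offset:#x} extends past end of file")
        elif any(c in seen for c in clusters):
            errors.append(f"{what} {offset:#x} is referenced more than once")
        else:
            seen.update(clusters)
            return True
        return False

    for l2_offset in l1_table:
        if l2_offset == 0:
            continue  # unallocated L2 table
        if not check_offset(l2_offset, table_bytes, "L2 table"):
            continue  # do not descend into a table that failed its checks
        for data_offset in l2_tables.get(l2_offset, []):
            if data_offset != 0:  # 0 means unallocated data cluster
                check_offset(data_offset, cluster_size, "data cluster")
    return errors
```

+An empty result corresponds to the case where QED_F_NEED_CHECK can be cleared, subject to the simplifications listed above.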
-- 
1.7.1



