===================================
QED Image File Format Specification
===================================

The file format looks like this::

 +----------+----------+----------+-----+
 | cluster0 | cluster1 | cluster2 | ... |
 +----------+----------+----------+-----+

The first cluster begins with the ``header``. The header contains information
about where regular clusters start; this allows the header to be extensible and
store extra information about the image file. A regular cluster may be
a ``data cluster``, an ``L2``, or an ``L1 table``. L1 and L2 tables are composed
of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size. If the file size
is not a multiple, extra information after the last cluster may not be preserved
if data is written. Legitimate extra information should use space between the header
and the first regular cluster.

All fields are little-endian.

Header
------

::

  Header {
      uint32_t magic;               /* QED\0 */

      uint32_t cluster_size;        /* in bytes */
      uint32_t table_size;          /* for L1 and L2 tables, in clusters */
      uint32_t header_size;         /* in clusters */

      uint64_t features;            /* format feature bits */
      uint64_t compat_features;     /* compat feature bits */
      uint64_t autoclear_features;  /* self-resetting feature bits */

      uint64_t l1_table_offset;     /* in bytes */
      uint64_t image_size;          /* total logical image size, in bytes */

      /* if (features & QED_F_BACKING_FILE) */
      uint32_t backing_filename_offset; /* in bytes from start of header */
      uint32_t backing_filename_size;   /* in bytes */
  }

Field descriptions:
~~~~~~~~~~~~~~~~~~~

- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
- ``table_size`` must be a power of 2 in range [1, 16].
- ``header_size`` is the number of clusters used by the header and any additional
  information stored before regular clusters.
- ``features``, ``compat_features``, and ``autoclear_features`` are file format
  extension bitmaps. They work as follows:

  - An image with unknown ``features`` bits enabled must not be opened. File format
    changes that are not backwards-compatible must use ``features`` bits.
  - An image with unknown ``compat_features`` bits enabled can be opened safely.
    The unknown features are simply ignored and represent backwards-compatible
    changes to the file format.
  - An image with unknown ``autoclear_features`` bits enabled can be opened safely
    after clearing the unknown bits. This allows for backwards-compatible changes
    to the file format which degrade gracefully and can be re-enabled by a
    new program later.

- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
  file and must be a multiple of ``cluster_size``.
- ``image_size`` is the block device size seen by the guest and must be a multiple
  of 512 bytes.
- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
  (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
  The string must be stored within the first ``header_size`` clusters. The backing filename
  may be an absolute path or relative to the image file.

Feature bits:
~~~~~~~~~~~~~

- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use.
- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
  and no file format autodetection should be attempted. This should be used to
  ensure that raw backing files are never detected as an image format if they happen
  to contain magic constants.

There are currently no defined ``compat_features`` or ``autoclear_features`` bits.

Fields predicated on a feature bit are only used when that feature is set.
The fields always take up header space, regardless of whether or not the feature
bit is set.

Tables
------

Tables provide the translation from logical offsets in the block device to cluster
offsets in the file.

::

  #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

  Table {
      uint64_t offsets[TABLE_NOFFSETS];
  }

The tables are organized as follows::

                      +----------+
                      | L1 table |
                      +----------+
                 ,------'  |  '------.
            +----------+   |    +----------+
            | L2 table |  ...   | L2 table |
            +----------+        +----------+
       ,------'  |  '------.
  +----------+   |    +----------+
  |   Data   |  ...   |   Data   |
  +----------+        +----------+

A table is made up of one or more contiguous clusters. The ``table_size`` header
field determines table size for an image file. For example, ``cluster_size=64 KB``
and ``table_size=4`` results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of
clusters rooted by the L1 table:

.. code::

  header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size

L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
The following offsets have special meanings:

L2 table offsets
~~~~~~~~~~~~~~~~

- 0 - unallocated. The L2 table is not yet allocated.

Data cluster offsets
~~~~~~~~~~~~~~~~~~~~

- 0 - unallocated. The data cluster is not yet allocated.
- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information. The least
significant 12 bits of an offset are reserved for this purpose and must be set
to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
which should also be zeroed.
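
As a rough illustration of the table geometry and the special offsets described
above, the following Python sketch computes ``TABLE_NOFFSETS``, the largest
``image_size`` an image can address, and classifies a data cluster offset. The
function names are hypothetical and not part of the format; only the arithmetic
is taken from this section.

```python
# Illustrative sketch only: the function names are assumptions made for this
# example, not identifiers defined by the QED format.

ENTRY_SIZE = 8  # sizeof(uint64_t)


def table_noffsets(cluster_size, table_size):
    """Number of 64-bit offsets held by one table (TABLE_NOFFSETS)."""
    return table_size * cluster_size // ENTRY_SIZE


def max_image_size(cluster_size, table_size):
    """Largest logical image size addressable through the L1 table."""
    n = table_noffsets(cluster_size, table_size)
    return n * n * cluster_size


def classify_data_offset(offset, cluster_size):
    """Interpret one L2 table entry as a data cluster offset."""
    if offset == 0:
        return "unallocated"
    if offset == 1:
        return "zero"
    if offset % cluster_size != 0:
        # Low bits are reserved and must be zero in a valid image.
        raise ValueError("offset is not aligned to cluster_size")
    return "allocated"
```

With ``cluster_size=64 KB`` and ``table_size=4``, ``table_noffsets`` gives 32768
entries per table and ``max_image_size`` gives 2^46 bytes.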

Unallocated L2 tables and data clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reads to an unallocated area of the image file access the backing file. If there
is no backing file, then zeroes are produced. The backing file may be smaller
than the image file and reads of unallocated areas beyond the end of the backing
file produce zeroes.

Writes to an unallocated area cause a new data cluster to be allocated, and a new
L2 table if that is also unallocated. The new data cluster is populated with data
from the backing file (or zeroes if no backing file) and the data being written.

Zero data clusters
~~~~~~~~~~~~~~~~~~

Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.

.. note::
    The difference between an unallocated and a zero data cluster is that zero data
    clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated. The new
data cluster is populated with zeroes and the data being written.
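
The read and write rules for unallocated and zero data clusters can be sketched
in Python as follows. This is a minimal in-memory model, assuming the image file
and the backing file are available as ``bytes`` objects; ``read_cluster`` and
``populate_new_cluster`` are hypothetical helper names, not part of the format.

```python
# Illustrative sketch only: an in-memory model with hypothetical helper names.

def read_cluster(entry, image, backing, logical_pos, cluster_size):
    """Return the contents of one cluster given its L2 table entry.

    entry       -- the L2 table entry (0 = unallocated, 1 = zero)
    image       -- contents of the image file
    backing     -- contents of the backing file, or None if there is none
    logical_pos -- logical offset of the start of the cluster
    """
    if entry == 1:
        # Zero cluster: never consult the backing file.
        return bytes(cluster_size)
    if entry == 0:
        if backing is None:
            return bytes(cluster_size)
        data = backing[logical_pos:logical_pos + cluster_size]
        # Reads beyond the end of the backing file produce zeroes.
        return data + bytes(cluster_size - len(data))
    return image[entry:entry + cluster_size]


def populate_new_cluster(old, byte_offset, data):
    """Merge the data being written into the previous cluster contents."""
    return old[:byte_offset] + data + old[byte_offset + len(data):]
```

A write to an unallocated or zero cluster would first call ``read_cluster`` to
obtain the old contents, then ``populate_new_cluster`` to build the cluster that
is stored at the newly allocated offset. An unallocated L2 table can be treated
like an L2 table whose entries are all 0.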

Logical offset translation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Logical offsets are translated into cluster offsets as follows::

  table_bits table_bits    cluster_bits
  <--------> <--------> <--------------->
 +----------+----------+-----------------+
 | L1 index | L2 index |     byte offset |
 +----------+----------+-----------------+

       Structure of a logical offset

  offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

  def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
      # The special offsets 0 (unallocated) and 1 (zero data cluster)
      # must be handled by the caller before translating.
      l2_offset = l1_table[l1_index]
      l2_table = load_table(l2_offset)
      cluster_offset = l2_table[l2_index] & offset_mask
      return cluster_offset + byte_offset

Consistency checking
--------------------

This section is informational and included to provide background on the use
of the ``QED_F_NEED_CHECK`` ``features`` bit.

The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
an operation that could leave the image in an inconsistent state if interrupted
by a crash or power failure. A dirty image must be checked on open because its
metadata may not be consistent.

The consistency check verifies the following invariants:

- Each cluster is referenced once and only once. It is an inconsistency to have
  a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
  if it has no references.
- Offsets must be within the image file size and must be ``cluster_size`` aligned.
- Table offsets must be at least ``table_size`` * ``cluster_size`` bytes from the end
  of the image file so that there is space for the entire table.

The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
After the check completes with no other errors besides leaks, the ``QED_F_NEED_CHECK``
bit can be cleared and the image can be accessed.
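
The invariants above can be sketched as a small checker. This is an illustrative
Python sketch under stated assumptions, not the procedure used by any particular
implementation: it assumes the whole image file is held in memory as ``bytes``,
and ``load_table`` and ``check_image`` are hypothetical names. It walks the L1
table and every allocated L2 table, counts references to each cluster, and
reports unreferenced clusters after the header as leaks.

```python
# Illustrative sketch only: helper names and the in-memory model are assumptions.
import struct


def load_table(image, offset, noffsets):
    """Decode one table of little-endian uint64 offsets from the image bytes."""
    return struct.unpack_from("<%dQ" % noffsets, image, offset)


def check_image(image, cluster_size, table_size, header_size, l1_table_offset):
    """Return (errors, leaks) for an image held in memory as bytes."""
    noffsets = table_size * cluster_size // 8
    table_bytes = table_size * cluster_size
    errors = []
    refs = {}  # cluster offset -> number of references

    def claim(offset, length, what):
        """Validate one offset and record a reference to each cluster it covers."""
        if offset % cluster_size != 0:
            errors.append("%s offset %#x is not cluster aligned" % (what, offset))
            return False
        if offset + length > len(image):
            errors.append("%s at %#x extends past end of file" % (what, offset))
            return False
        for cluster in range(offset, offset + length, cluster_size):
            refs[cluster] = refs.get(cluster, 0) + 1
            if refs[cluster] == 2:
                errors.append("cluster %#x referenced more than once" % cluster)
        return True

    if claim(l1_table_offset, table_bytes, "L1 table"):
        for l2_offset in load_table(image, l1_table_offset, noffsets):
            if l2_offset == 0:  # unallocated L2 table
                continue
            if not claim(l2_offset, table_bytes, "L2 table"):
                continue
            for data_offset in load_table(image, l2_offset, noffsets):
                if data_offset in (0, 1):  # unallocated or zero cluster
                    continue
                claim(data_offset, cluster_size, "data cluster")

    # Clusters after the header that nothing references have been leaked.
    end = len(image) // cluster_size * cluster_size
    leaks = [c for c in range(header_size * cluster_size, end, cluster_size)
             if c not in refs]
    return errors, leaks
```

If ``errors`` is empty, a caller could clear ``QED_F_NEED_CHECK`` even when
``leaks`` is non-empty, since leaks alone do not prevent the image from being
accessed.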