===================================
QED Image File Format Specification
===================================

The file format looks like this::

 +----------+----------+----------+-----+
 | cluster0 | cluster1 | cluster2 | ... |
 +----------+----------+----------+-----+

The first cluster begins with the ``header``. The header contains information
about where regular clusters start; this allows the header to be extensible and
store extra information about the image file. A regular cluster may be
a ``data cluster``, an ``L2 table``, or an ``L1 table``. L1 and L2 tables are composed
of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size. If the file size
is not a multiple, extra information after the last cluster may not be preserved
if data is written. Legitimate extra information should use space between the header
and the first regular cluster.

All fields are little-endian.

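
Because all fields are little-endian and the ``Header`` structure defined in the
next section has a fixed layout, the header can be decoded with a single format
string. The following is a minimal Python sketch, not taken from any existing
implementation. ``QED_MAGIC``, ``HEADER_FORMAT``, ``HEADER_FIELDS``, and
``parse_header`` are hypothetical names, and the sketch assumes that the
``QED\0`` comment gives the magic bytes in on-disk order.

```python
import struct

# Assumption: the four magic bytes "QED\0" appear in this order on disk.
QED_MAGIC = int.from_bytes(b"QED\0", "little")

# "<" selects little-endian with no padding:
# 4 x uint32, 5 x uint64, 2 x uint32, in the order of the Header struct.
HEADER_FORMAT = "<4I5Q2I"
HEADER_FIELDS = (
    "magic", "cluster_size", "table_size", "header_size",
    "features", "compat_features", "autoclear_features",
    "l1_table_offset", "image_size",
    "backing_filename_offset", "backing_filename_size",
)


def parse_header(buf):
    """Decode the fixed part of a QED header from a bytes-like object."""
    values = struct.unpack_from(HEADER_FORMAT, buf, 0)
    header = dict(zip(HEADER_FIELDS, values))
    if header["magic"] != QED_MAGIC:
        raise ValueError("not a QED image: bad magic")
    return header
```

A caller would read at least ``struct.calcsize(HEADER_FORMAT)`` bytes from the
start of the image file and pass them to ``parse_header``.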

Header
------

::

  Header {
     uint32_t magic;               /* QED\0 */

     uint32_t cluster_size;        /* in bytes */
     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
     uint32_t header_size;         /* in clusters */

     uint64_t features;            /* format feature bits */
     uint64_t compat_features;     /* compat feature bits */
     uint64_t autoclear_features;  /* self-resetting feature bits */

     uint64_t l1_table_offset;     /* in bytes */
     uint64_t image_size;          /* total logical image size, in bytes */

     /* if (features & QED_F_BACKING_FILE) */
     uint32_t backing_filename_offset; /* in bytes from start of header */
     uint32_t backing_filename_size;   /* in bytes */
  }

Field descriptions:
~~~~~~~~~~~~~~~~~~~

- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
- ``table_size`` must be a power of 2 in range [1, 16].
- ``header_size`` is the number of clusters used by the header and any additional
  information stored before regular clusters.
- ``features``, ``compat_features``, and ``autoclear_features`` are file format
  extension bitmaps.
  They work as follows:

  - An image with unknown ``features`` bits enabled must not be opened. File format
    changes that are not backwards-compatible must use ``features`` bits.
  - An image with unknown ``compat_features`` bits enabled can be opened safely.
    The unknown features are simply ignored and represent backwards-compatible
    changes to the file format.
  - An image with unknown ``autoclear_features`` bits enabled can be opened safely
    after clearing the unknown bits. This allows for backwards-compatible changes
    to the file format which degrade gracefully and can be re-enabled again by a
    new program later.
- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
  file and must be a multiple of ``cluster_size``.
- ``image_size`` is the block device size seen by the guest and must be a multiple
  of 512 bytes.
- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
  (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
  The string must be stored within the first ``header_size`` clusters. The backing filename
  may be an absolute path or relative to the image file.

Feature bits:
~~~~~~~~~~~~~

- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
- ``QED_F_NEED_CHECK = 0x02``.
  The image needs a consistency check before use.
- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
  and no file format autodetection should be attempted. This should be used to
  ensure that raw backing files are never detected as an image format if they happen
  to contain magic constants.

There are currently no defined ``compat_features`` or ``autoclear_features`` bits.

Fields predicated on a feature bit are only used when that feature is set.
The fields always take up header space, regardless of whether or not the feature
bit is set.

Tables
------

Tables provide the translation from logical offsets in the block device to cluster
offsets in the file.

::

  #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

  Table {
      uint64_t offsets[TABLE_NOFFSETS];
  }

The tables are organized as follows::

                    +----------+
                    | L1 table |
                    +----------+
               ,------'  |  '------.
          +----------+   |    +----------+
          | L2 table |  ...   | L2 table |
          +----------+        +----------+
      ,------'  |  '------.
 +----------+   |    +----------+
 |   Data   |  ...   |   Data   |
 +----------+        +----------+

A table is made up of one or more contiguous clusters. The ``table_size`` header
field determines table size for an image file. For example, ``cluster_size=64 KB``
and ``table_size=4`` results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of
clusters rooted by the L1 table:

.. code::

  header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size

L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
The following offsets have special meanings:

L2 table offsets
~~~~~~~~~~~~~~~~

- 0 - unallocated. The L2 table is not yet allocated.

Data cluster offsets
~~~~~~~~~~~~~~~~~~~~

- 0 - unallocated. The data cluster is not yet allocated.
- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information.
The least significant 12 bits of an offset are reserved for this purpose and must be set
to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
which should also be zeroed.

Unallocated L2 tables and data clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reads to an unallocated area of the image file access the backing file. If there
is no backing file, then zeroes are produced. The backing file may be smaller
than the image file and reads of unallocated areas beyond the end of the backing
file produce zeroes.

Writes to an unallocated area cause a new data cluster to be allocated, and a new
L2 table if that is also unallocated. The new data cluster is populated with data
from the backing file (or zeroes if no backing file) and the data being written.

Zero data clusters
~~~~~~~~~~~~~~~~~~

Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.

.. note::
    The difference between an unallocated and a zero data cluster is that zero data
    clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated.
The new data cluster is populated with zeroes and the data being written.

Logical offset translation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Logical offsets are translated into cluster offsets as follows::

  table_bits table_bits    cluster_bits
  <--------> <--------> <--------------->
 +----------+----------+-----------------+
 | L1 index | L2 index |     byte offset |
 +----------+----------+-----------------+

       Structure of a logical offset

 offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

 def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
   l2_offset = l1_table[l1_index]
   l2_table = load_table(l2_offset)
   cluster_offset = l2_table[l2_index] & offset_mask
   return cluster_offset + byte_offset

Consistency checking
--------------------

This section is informational and included to provide background on the use
of the ``QED_F_NEED_CHECK`` ``features`` bit.

The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
an operation that could leave the image in an inconsistent state if interrupted
by a crash or power failure. A dirty image must be checked on open because its
metadata may not be consistent.

The consistency check verifies the following invariants:

- Each cluster is referenced once and only once. It is an inconsistency to have
  a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
  if it has no references.
- Offsets must be within the image file size and must be ``cluster_size`` aligned.
- Table offsets must be at least ``table_size`` * ``cluster_size`` bytes from the end
  of the image file so that there is space for the entire table.

The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
If the check completes with no errors other than leaks, the ``QED_F_NEED_CHECK``
bit can be cleared and the image can be accessed.
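
The invariants above can be illustrated with a short Python sketch. It is
written for this document under simplifying assumptions and does not describe
any particular implementation: ``check_tables`` and ``claim`` are hypothetical
names, the tables are assumed to be already loaded into memory (``l1_table`` is
a list of L2 table offsets, ``l2_tables`` maps each L2 table offset to its list
of data cluster offsets), and clusters before ``header_size`` are treated as
header space rather than as leaks.

```python
def check_tables(file_size, cluster_size, table_size, header_size,
                 l1_table_offset, l1_table, l2_tables):
    """Return (errors, leaks) for tables already loaded into memory."""
    errors = []
    refs = {}  # cluster offset -> number of references seen so far

    def claim(offset, nclusters, what):
        # Offsets must be cluster aligned.
        if offset % cluster_size:
            errors.append(f"{what} offset {offset:#x} is not cluster aligned")
            return False
        # The whole object (table or data cluster) must fit in the file.
        if offset + nclusters * cluster_size > file_size:
            errors.append(f"{what} at {offset:#x} extends past end of file")
            return False
        # Each cluster may be referenced only once.
        for i in range(nclusters):
            cluster = offset + i * cluster_size
            refs[cluster] = refs.get(cluster, 0) + 1
            if refs[cluster] == 2:
                errors.append(f"cluster {cluster:#x} is referenced more than once")
        return True

    claim(l1_table_offset, table_size, "L1 table")
    for l2_offset in l1_table:
        if l2_offset == 0:  # unallocated L2 table
            continue
        if not claim(l2_offset, table_size, "L2 table"):
            continue
        for data_offset in l2_tables[l2_offset]:
            if data_offset in (0, 1):  # unallocated or zero data cluster
                continue
            claim(data_offset, 1, "data cluster")

    # Regular clusters that nothing refers to have been leaked.
    first_regular = header_size * cluster_size
    leaks = [cluster
             for cluster in range(first_regular,
                                  file_size - cluster_size + 1, cluster_size)
             if cluster not in refs]
    return errors, leaks
```

In this sketch an image whose ``errors`` list is empty could have its
``QED_F_NEED_CHECK`` bit cleared even when ``leaks`` is not empty, which matches
the rule that leaks alone do not prevent the image from being accessed.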