===================================
QED Image File Format Specification
===================================

The file format looks like this::

 +----------+----------+----------+-----+
 | cluster0 | cluster1 | cluster2 | ... |
 +----------+----------+----------+-----+

The first cluster begins with the ``header``. The header contains information
about where regular clusters start; this allows the header to be extensible and
store extra information about the image file. A regular cluster may be
a ``data cluster``, an ``L2 table``, or an ``L1 table``. L1 and L2 tables are composed
of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size.  If the file size
is not a multiple, extra information after the last cluster may not be preserved
if data is written. Legitimate extra information should use space between the header
and the first regular cluster.

All fields are little-endian.

Header
------

::

  Header {
     uint32_t magic;               /* QED\0 */

     uint32_t cluster_size;        /* in bytes */
     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
     uint32_t header_size;         /* in clusters */

     uint64_t features;            /* format feature bits */
     uint64_t compat_features;     /* compat feature bits */
     uint64_t autoclear_features;  /* self-resetting feature bits */

     uint64_t l1_table_offset;     /* in bytes */
     uint64_t image_size;          /* total logical image size, in bytes */

     /* if (features & QED_F_BACKING_FILE) */
     uint32_t backing_filename_offset; /* in bytes from start of header */
     uint32_t backing_filename_size;   /* in bytes */
  }

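As an illustration, the header layout above could be decoded along the following
lines. This is a minimal sketch, not the reference implementation: it assumes the
fields are packed back to back with no padding (64 bytes in total) and that the
magic appears in the file as the bytes ``Q``, ``E``, ``D``, NUL. The names
``HEADER_STRUCT``, ``QED_MAGIC``, ``QedHeader`` and ``parse_header`` are
hypothetical.

.. code:: python

    import struct
    from typing import NamedTuple

    # Assumed packed little-endian layout: four uint32, five uint64, two uint32.
    HEADER_STRUCT = struct.Struct("<4I5Q2I")
    # Assumption: the magic is stored as the byte sequence b"QED\0".
    QED_MAGIC = int.from_bytes(b"QED\0", "little")


    class QedHeader(NamedTuple):
        magic: int
        cluster_size: int
        table_size: int
        header_size: int
        features: int
        compat_features: int
        autoclear_features: int
        l1_table_offset: int
        image_size: int
        backing_filename_offset: int
        backing_filename_size: int


    def parse_header(buf: bytes) -> QedHeader:
        """Decode the fixed header fields from the start of an image file."""
        if len(buf) < HEADER_STRUCT.size:
            raise ValueError("buffer too short for a QED header")
        header = QedHeader(*HEADER_STRUCT.unpack_from(buf))
        if header.magic != QED_MAGIC:
            raise ValueError("bad magic")
        return header

The backing filename fields are decoded unconditionally here because they always
occupy header space; a caller would only interpret them when
``QED_F_BACKING_FILE`` is set.
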
Field descriptions:
~~~~~~~~~~~~~~~~~~~

- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
- ``table_size`` must be a power of 2 in range [1, 16].
- ``header_size`` is the number of clusters used by the header and any additional
  information stored before regular clusters.
- ``features``, ``compat_features``, and ``autoclear_features`` are file format
  extension bitmaps. They work as follows:

  - An image with unknown ``features`` bits enabled must not be opened. File format
    changes that are not backwards-compatible must use ``features`` bits.
  - An image with unknown ``compat_features`` bits enabled can be opened safely.
    The unknown features are simply ignored and represent backwards-compatible
    changes to the file format.
  - An image with unknown ``autoclear_features`` bits enabled can be opened safely
    after clearing the unknown bits. This allows for backwards-compatible changes
    to the file format which degrade gracefully and can be re-enabled again by a
    new program later.
- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
  file and must be a multiple of ``cluster_size``.
- ``image_size`` is the block device size seen by the guest and must be a multiple
  of 512 bytes.
- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
  (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
  The string must be stored within the first ``header_size`` clusters. The backing filename
  may be an absolute path or relative to the image file.

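The constraints listed above can be checked mechanically. The following is a
sketch only; ``is_power_of_two`` and ``header_field_errors`` are hypothetical
helpers, and the error labels are arbitrary strings chosen for this example.

.. code:: python

    QED_F_BACKING_FILE = 0x01


    def is_power_of_two(n: int) -> bool:
        return n > 0 and n & (n - 1) == 0


    def header_field_errors(cluster_size, table_size, header_size,
                            l1_table_offset, image_size, features,
                            backing_filename_offset, backing_filename_size):
        """Return the names of fields that violate a constraint (empty if none)."""
        errors = []
        cluster_ok = (is_power_of_two(cluster_size)
                      and 2**12 <= cluster_size <= 2**26)
        if not cluster_ok:
            errors.append("cluster_size")
        if not (is_power_of_two(table_size) and 1 <= table_size <= 16):
            errors.append("table_size")
        if cluster_ok:
            # These checks depend on a usable cluster_size.
            if l1_table_offset % cluster_size != 0:
                errors.append("l1_table_offset")
            if features & QED_F_BACKING_FILE:
                end = backing_filename_offset + backing_filename_size
                if end > header_size * cluster_size:
                    errors.append("backing_filename")
        if image_size % 512 != 0:
            errors.append("image_size")
        return errors
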
Feature bits:
~~~~~~~~~~~~~

- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use.
- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
  and no file format autodetection should be attempted.  This should be used to
  ensure that raw backing files are never detected as an image format if they happen
  to contain magic constants.

There are currently no defined ``compat_features`` or ``autoclear_features`` bits.

Fields predicated on a feature bit are only used when that feature is set.
The fields always take up header space, regardless of whether or not the feature
bit is set.

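The open-time handling of the three bitmaps might look like the sketch below.
The ``KNOWN_*`` masks and ``check_feature_bits`` are hypothetical names; the
masks reflect only the bits defined in this document.

.. code:: python

    QED_F_BACKING_FILE = 0x01
    QED_F_NEED_CHECK = 0x02
    QED_F_BACKING_FORMAT_NO_PROBE = 0x04

    KNOWN_FEATURES = (QED_F_BACKING_FILE | QED_F_NEED_CHECK
                      | QED_F_BACKING_FORMAT_NO_PROBE)
    KNOWN_AUTOCLEAR_FEATURES = 0   # no autoclear bits are currently defined


    def check_feature_bits(features, compat_features, autoclear_features):
        """Apply the open policy and return the autoclear bits to keep."""
        unknown = features & ~KNOWN_FEATURES
        if unknown:
            # Unknown incompatible features: the image must not be opened.
            raise ValueError(f"unsupported feature bits: {unknown:#x}")
        # Unknown compat_features bits are deliberately ignored.
        # Unknown autoclear bits are cleared before the image is used.
        return autoclear_features & KNOWN_AUTOCLEAR_FEATURES
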
Tables
------

Tables provide the translation from logical offsets in the block device to cluster
offsets in the file.

::

 #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

 Table {
     uint64_t offsets[TABLE_NOFFSETS];
 }

The tables are organized as follows::

                    +----------+
                    | L1 table |
                    +----------+
               ,------'  |  '------.
          +----------+   |    +----------+
          | L2 table |  ...   | L2 table |
          +----------+        +----------+
      ,------'  |  '------.
 +----------+   |    +----------+
 |   Data   |  ...   |   Data   |
 +----------+        +----------+

A table is made up of one or more contiguous clusters.  The ``table_size`` header
field determines table size for an image file. For example, ``cluster_size=64 KB``
and ``table_size=4`` results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of
clusters rooted by the L1 table:

.. code::

 header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size

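To make the arithmetic concrete, the table geometry can be computed as in this
small sketch. ``table_noffsets`` and ``max_image_size`` are hypothetical helper
names that mirror the ``TABLE_NOFFSETS`` macro and the inequality above.

.. code:: python

    def table_noffsets(cluster_size: int, table_size: int) -> int:
        """Number of 64-bit offsets held by one L1 or L2 table."""
        return table_size * cluster_size // 8   # sizeof(uint64_t) == 8


    def max_image_size(cluster_size: int, table_size: int) -> int:
        """Largest logical image size addressable through the L1 table."""
        n = table_noffsets(cluster_size, table_size)
        return n * n * cluster_size

With ``cluster_size=64 KB`` and ``table_size=4``, each table holds 32768 offsets
and the logical image size is limited to 2^46 bytes (64 TiB).
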
L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
The following offsets have special meanings:

L2 table offsets
~~~~~~~~~~~~~~~~

- 0 - unallocated. The L2 table is not yet allocated.

Data cluster offsets
~~~~~~~~~~~~~~~~~~~~

- 0 - unallocated.  The data cluster is not yet allocated.
- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information. The least
significant 12 bits of an offset are reserved for this purpose and must be set
to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
which should also be zeroed.

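A reader might classify an L2 table entry as sketched below.
``classify_data_offset`` and its string results are hypothetical; treating any
other entry with low bits set as invalid is an assumption of this sketch, based
on the alignment requirement above.

.. code:: python

    def classify_data_offset(entry: int, cluster_size: int) -> str:
        """Interpret one data cluster offset taken from an L2 table."""
        if entry == 0:
            return "unallocated"
        if entry == 1:
            return "zero"
        if entry & (cluster_size - 1):
            # Bits below the cluster boundary must be zero.
            return "invalid"
        return "allocated"
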
Unallocated L2 tables and data clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reads to an unallocated area of the image file access the backing file. If there
is no backing file, then zeroes are produced. The backing file may be smaller
than the image file and reads of unallocated areas beyond the end of the backing
file produce zeroes.

Writes to an unallocated area cause a new data cluster to be allocated, and a new
L2 table if that is also unallocated. The new data cluster is populated with data
from the backing file (or zeroes if no backing file) and the data being written.

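The way a newly allocated data cluster is filled can be sketched as follows.
``populate_new_cluster`` is a hypothetical helper; it assumes the caller has
already read whatever the backing file holds for this cluster, which may be
short or empty.

.. code:: python

    def populate_new_cluster(cluster_size: int, backing_data: bytes,
                             write_offset: int, write_data: bytes) -> bytes:
        """Build the contents of a freshly allocated data cluster."""
        if write_offset + len(write_data) > cluster_size:
            raise ValueError("write crosses the cluster boundary")
        cluster = bytearray(cluster_size)           # starts as all zeroes
        backing = backing_data[:cluster_size]
        cluster[:len(backing)] = backing            # data from the backing file
        end = write_offset + len(write_data)
        cluster[write_offset:end] = write_data      # then the data being written
        return bytes(cluster)
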
Zero data clusters
~~~~~~~~~~~~~~~~~~

Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.

.. note::
    The difference between an unallocated and a zero data cluster is that zero data
    clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated.  The new
data cluster is populated with zeroes and the data being written.

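The read behaviour for unallocated, zero and allocated data clusters can be
summarised in a short sketch. ``read_cluster`` is a hypothetical function; it
assumes ``image`` and ``backing`` are seekable binary file objects and that
``logical_offset`` is the cluster-aligned guest offset of the cluster.

.. code:: python

    import io


    def read_cluster(entry, cluster_size, logical_offset, image, backing=None):
        """Return the contents of one data cluster given its L2 table entry."""
        if entry == 1:
            # Zero cluster: the backing file is never consulted.
            return bytes(cluster_size)
        if entry == 0:
            # Unallocated: fall through to the backing file, if any.
            data = b""
            if backing is not None:
                backing.seek(logical_offset)
                data = backing.read(cluster_size)
            # Short or missing backing data reads as zeroes.
            return data.ljust(cluster_size, b"\0")
        image.seek(entry)
        return image.read(cluster_size)

For example, with ``backing = io.BytesIO(b"B" * 100)`` an unallocated cluster at
logical offset 0 reads as 100 ``B`` bytes followed by zeroes, while a zero
cluster at the same offset reads as all zeroes.
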
Logical offset translation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Logical offsets are translated into cluster offsets as follows::

  table_bits table_bits    cluster_bits
  <--------> <--------> <--------------->
 +----------+----------+-----------------+
 | L1 index | L2 index |     byte offset |
 +----------+----------+-----------------+

       Structure of a logical offset

 offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

 def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
   l2_offset = l1_table[l1_index]
   l2_table = load_table(l2_offset)
   cluster_offset = l2_table[l2_index] & offset_mask
   return cluster_offset + byte_offset

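The three components of a logical offset can be extracted with shifts and masks,
as in the following sketch. ``split_logical_offset`` is a hypothetical helper;
it derives ``cluster_bits`` and ``table_bits`` from the header fields, relying on
``cluster_size`` and ``table_size`` both being powers of 2.

.. code:: python

    def split_logical_offset(pos: int, cluster_size: int, table_size: int):
        """Split a logical offset into (l1_index, l2_index, byte_offset)."""
        cluster_bits = cluster_size.bit_length() - 1
        noffsets = table_size * cluster_size // 8    # TABLE_NOFFSETS
        table_bits = noffsets.bit_length() - 1
        byte_offset = pos & (cluster_size - 1)
        l2_index = (pos >> cluster_bits) & (noffsets - 1)
        l1_index = pos >> (cluster_bits + table_bits)
        return l1_index, l2_index, byte_offset

With ``cluster_size=64 KB`` and ``table_size=4``, ``cluster_bits`` is 16 and
``table_bits`` is 15, so the L1 index starts at bit 31 of the logical offset.
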
Consistency checking
--------------------

This section is informational and included to provide background on the use
of the ``QED_F_NEED_CHECK`` ``features`` bit.

The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
an operation that could leave the image in an inconsistent state if interrupted
by a crash or power failure.  A dirty image must be checked on open because its
metadata may not be consistent.

The consistency check verifies the following invariants:

- Each cluster is referenced once and only once. It is an inconsistency to have
  a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
  if it has no references.
- Offsets must be within the image file size and must be ``cluster_size`` aligned.
- Table offsets must be at least ``table_size`` * ``cluster_size`` bytes from the end
  of the image file so that there is space for the entire table.

The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
After the check completes with no other errors besides leaks, the ``QED_F_NEED_CHECK``
bit can be cleared and the image can be accessed.

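The invariants above could be checked by a scan along these lines. This is an
illustrative sketch rather than the algorithm used by any particular
implementation: ``check_tables`` is a hypothetical function, the tables are
assumed to be already loaded into memory (``l2_tables`` maps an L2 table offset
to its list of entries), and leaks are reported separately from errors.

.. code:: python

    def check_tables(l1_table, l2_tables, l1_table_offset, file_size,
                     cluster_size, table_size, header_size):
        """Return (errors, leaked_cluster_offsets) for in-memory tables."""
        table_bytes = table_size * cluster_size
        seen = set()
        errors = []

        def claim(offset, length, what):
            # Record the clusters of one object, checking each invariant.
            if offset % cluster_size:
                errors.append(f"{what} at {offset:#x} is misaligned")
                return False
            if offset + length > file_size:
                errors.append(f"{what} at {offset:#x} extends past end of file")
                return False
            clusters = range(offset, offset + length, cluster_size)
            if any(c in seen for c in clusters):
                errors.append(f"{what} at {offset:#x} referenced more than once")
                return False
            seen.update(clusters)
            return True

        claim(l1_table_offset, table_bytes, "L1 table")
        for l2_offset in l1_table:
            if l2_offset == 0:                  # unallocated L2 table
                continue
            if not claim(l2_offset, table_bytes, "L2 table"):
                continue
            for entry in l2_tables[l2_offset]:
                if entry in (0, 1):             # unallocated or zero cluster
                    continue
                claim(entry, cluster_size, "data cluster")

        first_regular = header_size * cluster_size
        leaks = [c for c in range(first_regular, file_size, cluster_size)
                 if c not in seen]
        return errors, leaks

If ``errors`` is empty, the caller could clear ``QED_F_NEED_CHECK`` even when
``leaks`` is not.
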