===================================
QED Image File Format Specification
===================================

The file format looks like this::

 +----------+----------+----------+-----+
 | cluster0 | cluster1 | cluster2 | ... |
 +----------+----------+----------+-----+

The first cluster begins with the ``header``. The header contains information
about where regular clusters start; this allows the header to be extensible and
store extra information about the image file. A regular cluster may be
a ``data cluster``, an ``L2 table``, or an ``L1 table``. L1 and L2 tables are composed
of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size.  If the file size
is not a multiple, extra information after the last cluster may not be preserved
if data is written. Legitimate extra information should use space between the header
and the first regular cluster.

All fields are little-endian.

Header
------

::

  Header {
     uint32_t magic;               /* QED\0 */

     uint32_t cluster_size;        /* in bytes */
     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
     uint32_t header_size;         /* in clusters */

     uint64_t features;            /* format feature bits */
     uint64_t compat_features;     /* compat feature bits */
     uint64_t autoclear_features;  /* self-resetting feature bits */

     uint64_t l1_table_offset;     /* in bytes */
     uint64_t image_size;          /* total logical image size, in bytes */

     /* if (features & QED_F_BACKING_FILE) */
     uint32_t backing_filename_offset; /* in bytes from start of header */
     uint32_t backing_filename_size;   /* in bytes */
  }
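
The fixed header layout above can be decoded with Python's standard ``struct``
module. This is a minimal sketch, not QEMU's implementation; ``QED_HEADER_FMT``,
``FIELDS`` and ``parse_header`` are hypothetical names. It always reads the two
backing filename fields because they occupy header space whether or not
``QED_F_BACKING_FILE`` is set.

.. code:: python

    import struct

    # "<" = little-endian, no padding.  Four uint32 (magic, cluster_size,
    # table_size, header_size), five uint64 (the three feature bitmaps,
    # l1_table_offset, image_size), then the two uint32 backing filename fields.
    QED_HEADER_FMT = "<4I5Q2I"
    QED_HEADER_SIZE = struct.calcsize(QED_HEADER_FMT)   # 64 bytes
    QED_MAGIC = int.from_bytes(b"QED\0", "little")

    FIELDS = (
        "magic", "cluster_size", "table_size", "header_size",
        "features", "compat_features", "autoclear_features",
        "l1_table_offset", "image_size",
        "backing_filename_offset", "backing_filename_size",
    )


    def parse_header(buf: bytes) -> dict:
        """Decode the header at the start of buf into a dict keyed by field name."""
        if len(buf) < QED_HEADER_SIZE:
            raise ValueError("buffer too short for a QED header")
        header = dict(zip(FIELDS, struct.unpack_from(QED_HEADER_FMT, buf)))
        if header["magic"] != QED_MAGIC:
            raise ValueError("bad magic")
        return header

A header produced with ``struct.pack(QED_HEADER_FMT, ...)`` and the same field
order round-trips through ``parse_header`` unchanged.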

Field descriptions:
~~~~~~~~~~~~~~~~~~~

- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
- ``table_size`` must be a power of 2 in range [1, 16].
- ``header_size`` is the number of clusters used by the header and any additional
  information stored before regular clusters.
- ``features``, ``compat_features``, and ``autoclear_features`` are file format
  extension bitmaps. They work as follows:

  - An image with unknown ``features`` bits enabled must not be opened. File format
    changes that are not backwards-compatible must use ``features`` bits.
  - An image with unknown ``compat_features`` bits enabled can be opened safely.
    The unknown features are simply ignored and represent backwards-compatible
    changes to the file format.
  - An image with unknown ``autoclear_features`` bits enabled can be opened safely
    after clearing the unknown bits. This allows for backwards-compatible changes
    to the file format that degrade gracefully and can be re-enabled later by a
    newer program.
- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
  file and must be a multiple of ``cluster_size``.
- ``image_size`` is the block device size seen by the guest and must be a multiple
  of 512 bytes.
- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
  (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
  The string must be stored within the first ``header_size`` clusters. The backing filename
  may be an absolute path or a path relative to the image file.
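
The numeric constraints above translate into straightforward checks. The sketch
below assumes the header has already been decoded into a dict keyed by the field
names of the ``Header`` structure; ``validate_header`` and ``is_power_of_2`` are
illustrative helpers, not part of any existing API. The feature bitmaps are only
consulted to decide whether the backing filename fields are meaningful.

.. code:: python

    QED_F_BACKING_FILE = 0x01


    def is_power_of_2(n: int) -> bool:
        """True if n is a positive power of two."""
        return n > 0 and (n & (n - 1)) == 0


    def validate_header(h: dict) -> list:
        """Return a list of violated constraints (empty if none were found)."""
        cs = h["cluster_size"]
        if not (is_power_of_2(cs) and 2**12 <= cs <= 2**26):
            # The remaining checks depend on a sane cluster size.
            return ["cluster_size must be a power of 2 in [2^12, 2^26]"]
        errors = []
        ts = h["table_size"]
        if not (is_power_of_2(ts) and 1 <= ts <= 16):
            errors.append("table_size must be a power of 2 in [1, 16]")
        if h["l1_table_offset"] % cs != 0:
            errors.append("l1_table_offset must be a multiple of cluster_size")
        if h["image_size"] % 512 != 0:
            errors.append("image_size must be a multiple of 512 bytes")
        if h["features"] & QED_F_BACKING_FILE:
            # The string must fit inside the first header_size clusters.
            end = h["backing_filename_offset"] + h["backing_filename_size"]
            if end > h["header_size"] * cs:
                errors.append("backing filename lies outside the header clusters")
        return errors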

Feature bits:
~~~~~~~~~~~~~

- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use.
- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
  and no file format autodetection should be attempted.  This should be used to
  ensure that raw backing files are never detected as an image format if they happen
  to contain magic constants.

There are currently no defined ``compat_features`` or ``autoclear_features`` bits.

Fields predicated on a feature bit are only used when that feature is set.
The fields always take up header space, regardless of whether or not the feature
bit is set.
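
The open rules for the three feature bitmaps can be expressed as a small policy
function. This sketch assumes a reader that knows exactly the three ``features``
bits listed above and no ``compat_features`` or ``autoclear_features`` bits;
``check_feature_bits`` is a hypothetical name. It clears unknown autoclear bits
only in memory; writing the cleared value back to the image is left to the caller.

.. code:: python

    QED_F_BACKING_FILE = 0x01
    QED_F_NEED_CHECK = 0x02
    QED_F_BACKING_FORMAT_NO_PROBE = 0x04

    # Bits this reader understands; anything else is "unknown".
    KNOWN_FEATURES = QED_F_BACKING_FILE | QED_F_NEED_CHECK | QED_F_BACKING_FORMAT_NO_PROBE
    KNOWN_AUTOCLEAR_FEATURES = 0   # none defined yet


    def check_feature_bits(h: dict) -> dict:
        """Apply the open rules; return a copy of the header safe to use.

        Raises ValueError if unknown incompatible features are set.  Unknown
        compat bits are left untouched and simply ignored.
        """
        unknown = h["features"] & ~KNOWN_FEATURES
        if unknown:
            raise ValueError(f"unsupported feature bits: {unknown:#x}")
        opened = dict(h)
        opened["autoclear_features"] &= KNOWN_AUTOCLEAR_FEATURES
        return opened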

Tables
------

Tables provide the translation from logical offsets in the block device to cluster
offsets in the file.

::

 #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

 Table {
     uint64_t offsets[TABLE_NOFFSETS];
 }
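
Because all fields are little-endian, a table read from the image file can be
decoded as an array of ``uint64_t`` values. A minimal sketch, with
``table_noffsets`` and ``decode_table`` as hypothetical helper names:

.. code:: python

    import struct


    def table_noffsets(cluster_size: int, table_size: int) -> int:
        """TABLE_NOFFSETS: number of 64-bit entries in one L1 or L2 table."""
        return table_size * cluster_size // 8   # sizeof(uint64_t) == 8


    def decode_table(raw: bytes, cluster_size: int, table_size: int) -> list:
        """Decode the raw bytes of one table into a list of offsets."""
        n = table_noffsets(cluster_size, table_size)
        if len(raw) != n * 8:
            raise ValueError("a table must span exactly table_size clusters")
        return list(struct.unpack(f"<{n}Q", raw))

With ``cluster_size`` = 64 KB and ``table_size`` = 4, ``table_noffsets`` gives
32768 entries per table.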

The tables are organized as follows::

                    +----------+
                    | L1 table |
                    +----------+
               ,------'  |  '------.
          +----------+   |    +----------+
          | L2 table |  ...   | L2 table |
          +----------+        +----------+
      ,------'  |  '------.
 +----------+   |    +----------+
 |   Data   |  ...   |   Data   |
 +----------+        +----------+

A table is made up of one or more contiguous clusters.  The ``table_size`` header
field determines the table size for an image file. For example, ``cluster_size=64 KB``
and ``table_size=4`` result in 256 KB tables.

The logical image size must be less than or equal to the maximum total size of the
data clusters that can be reached from the L1 table:

.. code::

 header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
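
As a worked example, the 64 KB / ``table_size=4`` configuration gives
``TABLE_NOFFSETS`` = 4 * 65536 / 8 = 32768 = 2^15, so the bound is
2^15 * 2^15 * 2^16 = 2^46 bytes (64 TB). The following sketch (function names
are hypothetical) computes the same figures:

.. code:: python

    KiB = 1024
    TiB = 1024 ** 4


    def table_bytes(cluster_size: int, table_size: int) -> int:
        """Size of one L1 or L2 table in bytes."""
        return table_size * cluster_size


    def max_image_size(cluster_size: int, table_size: int) -> int:
        """Upper bound on header.image_size: each L1 entry points to an L2
        table, and each L2 entry points to one data cluster."""
        noffsets = table_size * cluster_size // 8   # TABLE_NOFFSETS
        return noffsets * noffsets * cluster_size


    print(table_bytes(64 * KiB, 4) // KiB)      # 256 (KB per table)
    print(max_image_size(64 * KiB, 4) // TiB)   # 64 (TB addressable)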

L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
The following offsets have special meanings:

L2 table offsets
~~~~~~~~~~~~~~~~

- 0 - unallocated. The L2 table is not yet allocated.

Data cluster offsets
~~~~~~~~~~~~~~~~~~~~

- 0 - unallocated.  The data cluster is not yet allocated.
- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information. The least
significant 12 bits of an offset are reserved for this purpose and must be set
to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
which should also be zeroed.
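
The special values and the alignment rule can be folded into one classification
step per table entry. This sketch (hypothetical helper names) treats any other
value that is not a multiple of ``cluster_size`` as invalid; since ``cluster_size``
is at least 2^12, that also rejects entries whose reserved low bits are nonzero.
The value 1 is special only for data cluster offsets, not for L2 table offsets.

.. code:: python

    QED_CLUSTER_UNALLOCATED = 0
    QED_CLUSTER_ZERO = 1   # meaningful only for data cluster offsets


    def classify_l1_entry(entry: int, cluster_size: int) -> str:
        """Classify an L2 table offset read from the L1 table."""
        if entry == QED_CLUSTER_UNALLOCATED:
            return "unallocated"
        if entry % cluster_size != 0:
            raise ValueError(f"misaligned L2 table offset {entry:#x}")
        return "allocated"


    def classify_l2_entry(entry: int, cluster_size: int) -> str:
        """Classify a data cluster offset read from an L2 table."""
        if entry == QED_CLUSTER_UNALLOCATED:
            return "unallocated"
        if entry == QED_CLUSTER_ZERO:
            return "zero"
        if entry % cluster_size != 0:
            raise ValueError(f"misaligned data cluster offset {entry:#x}")
        return "allocated"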

Unallocated L2 tables and data clusters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reads to an unallocated area of the image file access the backing file. If there
is no backing file, then zeroes are produced. The backing file may be smaller
than the image file and reads of unallocated areas beyond the end of the backing
file produce zeroes.

Writes to an unallocated area cause a new data cluster to be allocated, and a new
L2 table if that is also unallocated. The new data cluster is populated with data
from the backing file (or zeroes if there is no backing file) and the data being written.
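
The read and write rules for unallocated areas can be modelled with the backing
file held in memory. In this sketch ``backing`` is a ``bytes`` object standing in
for the backing file (``None`` if there is none), and ``read_unallocated`` and
``allocate_from_backing`` are hypothetical names. For simplicity the write helper
handles only a write that lies within a single cluster.

.. code:: python

    from typing import Optional


    def read_unallocated(backing: Optional[bytes], offset: int, length: int) -> bytes:
        """Bytes returned for a read of an unallocated area.

        Reads past the end of the backing file (or with no backing file at
        all) produce zeroes.
        """
        if backing is None:
            return bytes(length)
        chunk = backing[offset:offset + length]
        return chunk + bytes(length - len(chunk))


    def allocate_from_backing(backing: Optional[bytes], cluster_start: int,
                              cluster_size: int, offset_in_cluster: int,
                              data: bytes) -> bytes:
        """Contents of the new data cluster allocated by a write to an
        unallocated area: backing file data overlaid with the written bytes."""
        if offset_in_cluster + len(data) > cluster_size:
            raise ValueError("write crosses the cluster boundary")
        buf = bytearray(read_unallocated(backing, cluster_start, cluster_size))
        buf[offset_in_cluster:offset_in_cluster + len(data)] = data
        return bytes(buf)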

Zero data clusters
~~~~~~~~~~~~~~~~~~

Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.

.. note::
    The difference between an unallocated and a zero data cluster is that zero data
    clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated.  The new
data cluster is populated with zeroes and the data being written.
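
A sketch of the zero-cluster rules (hypothetical helper names): reads yield
zeroes without consulting any backing file, and a write allocates a cluster whose
initial contents are zeroes rather than backing file data, even when the image has
a backing file.

.. code:: python

    def read_zero_cluster(length: int) -> bytes:
        """A zero data cluster reads as zeroes; the backing file is not consulted."""
        return bytes(length)


    def allocate_from_zero(cluster_size: int, offset_in_cluster: int,
                           data: bytes) -> bytes:
        """Contents of the new data cluster allocated by a write to a zero cluster."""
        if offset_in_cluster + len(data) > cluster_size:
            raise ValueError("write crosses the cluster boundary")
        buf = bytearray(cluster_size)   # start from zeroes, never backing data
        buf[offset_in_cluster:offset_in_cluster + len(data)] = data
        return bytes(buf)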

Logical offset translation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Logical offsets are translated into cluster offsets as follows::

  table_bits table_bits    cluster_bits
  <--------> <--------> <--------------->
 +----------+----------+-----------------+
 | L1 index | L2 index |     byte offset |
 +----------+----------+-----------------+

       Structure of a logical offset

 offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

 def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
   l2_offset = l1_table[l1_index]
   l2_table = load_table(l2_offset)
   cluster_offset = l2_table[l2_index] & offset_mask
   return cluster_offset + byte_offset
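
The pseudocode above does not handle the special offset values. The following
runnable sketch adds them and derives the index fields from ``cluster_size`` and
``table_size``; ``split_logical_offset`` and ``translate`` are hypothetical names,
and ``load_table`` is assumed to be a caller-supplied function that returns the
decoded L2 table stored at a given file offset.

.. code:: python

    def split_logical_offset(offset: int, cluster_size: int, table_size: int):
        """Split a logical offset into (l1_index, l2_index, byte_offset)."""
        noffsets = table_size * cluster_size // 8     # TABLE_NOFFSETS
        cluster_bits = cluster_size.bit_length() - 1  # log2(cluster_size)
        table_bits = noffsets.bit_length() - 1        # log2(TABLE_NOFFSETS)
        byte_offset = offset & (cluster_size - 1)
        l2_index = (offset >> cluster_bits) & (noffsets - 1)
        l1_index = offset >> (cluster_bits + table_bits)
        return l1_index, l2_index, byte_offset


    def translate(offset, cluster_size, table_size, l1_table, load_table):
        """Return (kind, file_offset); file_offset is None unless allocated."""
        offset_mask = ~(cluster_size - 1)   # drops the reserved low bits
        l1_index, l2_index, byte_offset = split_logical_offset(
            offset, cluster_size, table_size)
        l2_offset = l1_table[l1_index]
        if l2_offset == 0:                  # L2 table not yet allocated
            return "unallocated", None
        entry = load_table(l2_offset)[l2_index]
        if entry == 0:
            return "unallocated", None
        if entry == 1:
            return "zero", None
        return "allocated", (entry & offset_mask) + byte_offset

With ``cluster_size`` = 4096 and ``table_size`` = 1, ``cluster_bits`` is 12 and
``table_bits`` is 9, so the L1 index starts at bit 21 of the logical offset.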

Consistency checking
--------------------

This section is informational and included to provide background on the use
of the ``QED_F_NEED_CHECK`` feature bit.

The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
an operation that could leave the image in an inconsistent state if interrupted
by a crash or power failure.  A dirty image must be checked on open because its
metadata may not be consistent.
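
A simplified illustration of the dirty-bit protocol, written as a Python context
manager. This is a sketch under stated assumptions, not QEMU's strategy:
``need_check`` is a hypothetical name, and ``write_header`` is assumed to be a
callback that durably writes the header before returning. If the protected
operation raises, the bit is deliberately left set so that the next open performs
a consistency check.

.. code:: python

    from contextlib import contextmanager

    QED_F_NEED_CHECK = 0x02


    @contextmanager
    def need_check(header: dict, write_header):
        """Keep QED_F_NEED_CHECK set on disk while the body runs."""
        header["features"] |= QED_F_NEED_CHECK
        write_header(header)        # image is marked dirty before any update
        yield
        # Reached only if the body completed without raising.
        header["features"] &= ~QED_F_NEED_CHECK
        write_header(header)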

The consistency check verifies the following invariants:

- Each cluster is referenced once and only once. It is an inconsistency to have
  a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
  if it has no references.
- Offsets must be within the image file size and must be ``cluster_size`` aligned.
- Table offsets must be at least ``table_size`` * ``cluster_size`` bytes before the end
  of the image file so that there is space for the entire table.

The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
After the check completes with no errors other than leaks, the ``QED_F_NEED_CHECK``
bit can be cleared and the image can be accessed.
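
The invariants and scan order above can be sketched as a reference-counting pass
over already-decoded tables. ``check_image`` is a hypothetical name,
``load_table`` is an assumed caller-supplied function returning the L2 table at a
file offset, and clusters before ``header_size`` are treated as header space that
needs no reference. Errors and leaked clusters are reported separately, because
leaks alone do not prevent clearing ``QED_F_NEED_CHECK``.

.. code:: python

    from collections import Counter


    def check_image(file_size, cluster_size, table_size, header_size,
                    l1_table_offset, l1_table, load_table):
        """Return (errors, leaked) where leaked lists unreferenced cluster indices."""
        errors = []
        refs = Counter()                       # cluster index -> reference count
        table_bytes = table_size * cluster_size

        def reference(offset, is_table):
            """Validate one offset and count every cluster it covers."""
            if offset % cluster_size:
                errors.append(f"offset {offset:#x} is not cluster aligned")
                return False
            fits = (offset + table_bytes <= file_size) if is_table else (offset < file_size)
            if not fits:
                errors.append(f"offset {offset:#x} is beyond the end of the file")
                return False
            first = offset // cluster_size
            for cluster in range(first, first + (table_size if is_table else 1)):
                refs[cluster] += 1
            return True

        reference(l1_table_offset, is_table=True)
        for l2_offset in l1_table:
            if l2_offset == 0 or not reference(l2_offset, is_table=True):
                continue                       # unallocated or invalid L2 table
            for entry in load_table(l2_offset):
                if entry not in (0, 1):        # skip unallocated and zero clusters
                    reference(entry, is_table=False)

        errors += [f"cluster {c} referenced {n} times" for c, n in refs.items() if n > 1]
        leaked = [c for c in range(header_size, file_size // cluster_size) if refs[c] == 0]
        return errors, leaked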