qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
its value 512 bytes or higher. QEMU currently defaults to 64 KB
clusters, and it does not support sizes larger than 2MB.
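
The cluster size constraints can be expressed as a small check. The
following is only an illustrative Python sketch: the helper name
is_valid_cluster_size is made up for this document and is not part of
QEMU, and "2MB" is assumed to mean 2 * 1024 * 1024 bytes.

```python
MIN_CLUSTER_SIZE = 512              # bytes, lower limit stated in the text
MAX_CLUSTER_SIZE = 2 * 1024 * 1024  # bytes, "2MB" read as 2 MiB (assumption)


def is_valid_cluster_size(size):
    """Return True if size (in bytes) meets the documented constraints."""
    # A positive integer is a power of two when it has a single bit set.
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and MIN_CLUSTER_SIZE <= size <= MAX_CLUSTER_SIZE


if __name__ == "__main__":
    for size in (512, 65536, 100000, 4 * 1024 * 1024):
        print(size, is_valid_cluster_size(size))
```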

The 'qemu-img create' command supports specifying the size using the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G


The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is one single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots.
The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Each block is
also one cluster in size, and the number of blocks also varies with
the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

With the default values for cluster_size (64KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover n GB of disk space with the default values we
need:

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

For example, 1MB of L2 cache is needed to cover every 8 GB of the virtual
image size (given that the default cluster size is used):

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default. With the default
cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
8 GB of image size:

   262144 * 32768 = 8 GB


How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option in the
command-line, or the 'blockdev-add' QMP command.
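
To obtain byte values for the cache size options, the formulas from
the "Choosing the right cache sizes" section can be inverted. The
following is a minimal Python sketch under stated assumptions: the
function names are hypothetical helpers (not QEMU APIs), 1 GB is taken
as 2^30 bytes to match the figures in that section, and results are
rounded up rather than truncated.

```python
GIB = 1024 ** 3  # assumption: "GB" in this document means 2^30 bytes


def _ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def l2_cache_size_for(disk_size, cluster_size=65536):
    """Bytes of L2 cache needed to map disk_size bytes of virtual disk.

    Inverse of: disk_size = l2_cache_size * cluster_size / 8
    """
    return _ceil_div(disk_size * 8, cluster_size)


def refcount_cache_size_for(disk_size, cluster_size=65536, refcount_bits=16):
    """Bytes of refcount cache needed to cover disk_size bytes.

    Inverse of:
    disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
    """
    return _ceil_div(disk_size * refcount_bits, cluster_size * 8)


if __name__ == "__main__":
    print(l2_cache_size_for(8 * GIB))        # 1048576 (1 MB)
    print(refcount_cache_size_for(8 * GIB))  # 262144 (256 KB)
```

Both results match the worked examples in the previous section: 1 MB
of L2 cache and 256 KB of refcount cache for 8 GB of image size.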

There are three options available, and all of them take bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The default L2 cache size is 8 clusters or 1MB (whichever is more),
   and the minimum is 2 clusters (or 2 cache entries, see below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots.
In most cases they are accessed sequentially (even during random guest
I/O) so increasing the refcount cache size won't have any measurable
effect on performance (this can change if you are using internal
snapshots, so you may want to think about increasing the cache size if
you use them heavily).

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.


Using smaller cache entries
---------------------------
The qcow2 L2 cache stores complete tables by default. This means that
if QEMU needs an entry from an L2 table then the whole table is read
from disk and is kept in the cache. If the cache is full then a
complete table needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size.
This can be configured using the "l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one gives faster performance
   in your case. The block size of the host filesystem is generally a
   good default (usually 4096 bytes in the case of ext4).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" section
   earlier in this document) then none of this is necessary and you
   can omit the "l2-cache-entry-size" parameter altogether.
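
To get a feel for what a smaller entry size means, the following
Python sketch computes how many entries fit in the L2 cache and how
much of the virtual disk each entry maps. It assumes 8 bytes of L2
table per cluster, as implied by the formula in "Choosing the right
cache sizes"; the function names are illustrative only and do not
correspond to anything in QEMU.

```python
L2_TABLE_ENTRY_BYTES = 8  # bytes of L2 table per cluster, per the sizing formula


def l2_cache_entries(l2_cache_size, entry_size):
    """Number of cache entries that fit in an L2 cache of the given size."""
    return l2_cache_size // entry_size


def l2_entry_coverage(entry_size, cluster_size=65536):
    """Bytes of virtual disk mapped by one cache entry of entry_size bytes."""
    return (entry_size // L2_TABLE_ENTRY_BYTES) * cluster_size


if __name__ == "__main__":
    cache = 2097152  # l2-cache-size from the example above
    for entry in (65536, 4096):
        count = l2_cache_entries(cache, entry)
        covered = l2_entry_coverage(entry)
        print(entry, count, covered, count * covered)
```

With the values from the example above and 64 KB clusters, a 4096-byte
entry maps 32 MB of virtual disk and the cache holds 512 such entries,
whereas full-cluster entries map 512 MB each and only 32 fit. The
total mapped area is the same in both cases; only the granularity of
loading and eviction changes.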


Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds).
All cache entries that haven't been accessed during that interval are
removed from memory.

This example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, this parameter defaults to 0, which disables the feature.

Note that this functionality currently relies on the MADV_DONTNEED
argument for madvise() to actually free the memory. This is a
Linux-specific feature, so cache-clean-interval is not supported on
other systems.
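
The options described in this document can be combined in a single
-drive argument. The following Python sketch assembles such an
argument and enforces the rule that at most two of the three cache
size options may be set. The function qcow2_drive_options is a
hypothetical helper written for this document, not a QEMU interface,
and it performs no escaping of special characters in the file name.

```python
def qcow2_drive_options(filename, l2_cache_size=None,
                        refcount_cache_size=None, cache_size=None,
                        l2_cache_entry_size=None,
                        cache_clean_interval=None):
    """Build the value of a -drive argument from the given options."""
    sizes = (l2_cache_size, refcount_cache_size, cache_size)
    if sum(value is not None for value in sizes) > 2:
        raise ValueError("at most two of l2-cache-size, "
                         "refcount-cache-size and cache-size can be set")

    options = [
        ("file", filename),
        ("l2-cache-size", l2_cache_size),
        ("refcount-cache-size", refcount_cache_size),
        ("cache-size", cache_size),
        ("l2-cache-entry-size", l2_cache_entry_size),
        ("cache-clean-interval", cache_clean_interval),
    ]
    # Options left as None are omitted so that QEMU applies its defaults.
    return ",".join(f"{key}={value}"
                    for key, value in options if value is not None)


if __name__ == "__main__":
    print("-drive", qcow2_drive_options("hd.qcow2", l2_cache_size=2097152,
                                        l2_cache_entry_size=4096,
                                        cache_clean_interval=900))
```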