# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig`, known as `--memory` from the CLI perspective, is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory> Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. If
this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `shared`

Specifies if the memory must be mapped by `mmap(2)` with the `MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host.
One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped by `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour; for example, an error with `ReadKernelImage` is common. If a strange
error occurs with `hugepages` enabled, check whether there are enough huge
pages, or disable the option.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot.
This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

## Advanced Parameters

`MemoryZoneConfig`, known as `--memory-zone` from the CLI perspective,
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
}
```

```
--memory-zone <memory-zone> User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created.
In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies if the memory zone must be mapped by `mmap(2)` with the `MAP_SHARED`
flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages`

Specifies if the memory zone must be mapped by `mmap(2)` with the `MAP_HUGETLB`
and `MAP_HUGE_2MB` flags. This performs a memory zone mapping relying on 2MiB
pages instead of the default 4KiB pages.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on
```

### `host_numa_node`

Node identifier of a node present on the host. This option lets the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone is mapped by `mmap(2)`, the NUMA policy for this memory mapping
will be applied through `mbind(2)`, relying on the provided node identifier.
If the node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM's memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses, as one could define a first memory zone backed by fast memory and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

## NUMA settings

`NumaConfig`, known as `--numa` from the CLI perspective, has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally, it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa> Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=0:1:2:3:4`.

A combination of both `-` and `:` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=0-99:255`.
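
To make the `-` and `:` syntax concrete, here is a minimal sketch of how such a
list could be expanded into individual CPU identifiers. It is not Cloud
Hypervisor's actual parser: the function name `parse_cpu_list` is hypothetical,
and the choices to sort the result, merge duplicates, and reject a range whose
start exceeds its end are assumptions of this sketch.

```rust
use std::collections::BTreeSet;

/// Expands a CPU list such as "0-99:255" into sorted, unique CPU ids.
/// Hypothetical helper for illustration only; the real parser may differ.
fn parse_cpu_list(spec: &str) -> Result<Vec<u8>, String> {
    // Parses one CPU id, which must fit in 8 bits as the documentation states.
    fn parse_id(text: &str) -> Result<u8, String> {
        text.parse::<u8>()
            .map_err(|e| format!("invalid CPU id '{}': {}", text, e))
    }

    let mut cpus = BTreeSet::new();
    // Items are separated by ':'; each item is a single id or a 'lo-hi' range.
    for item in spec.split(':') {
        match item.split_once('-') {
            Some((lo_text, hi_text)) => {
                let lo = parse_id(lo_text)?;
                let hi = parse_id(hi_text)?;
                if lo > hi {
                    // Assumption of this sketch: descending ranges are errors.
                    return Err(format!("invalid range '{}'", item));
                }
                // Inclusive range, so "0-4" yields 0, 1, 2, 3 and 4.
                cpus.extend(lo..=hi);
            }
            None => {
                cpus.insert(parse_id(item)?);
            }
        }
    }
    Ok(cpus.into_iter().collect())
}
```

With this sketch, `parse_cpu_list("0-4")` and `parse_cpu_list("0:1:2:3:4")`
produce the same list, and `parse_cpu_list("0-99:255")` yields 101 identifiers
ending with 255. A value such as `256` is rejected because it does not fit in
an unsigned 8-bit integer.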

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=1-3:7 guest_numa_id=1,cpus=0:4-6
```

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits as it represents the destination
NUMA node. The second value is an unsigned integer of 8 bits as it represents
the distance between the current NUMA node and the destination NUMA node. The
two values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `:` separator.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=1@15:2@25 guest_numa_id=1,distances=0@15:2@20 guest_numa_id=2,distances=0@25:1@20
```

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list.
Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `:` separator.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, and therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=mem0:mem2 guest_numa_id=1,memory_zones=mem1
```

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `:` separator.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=epc0:epc2
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes correctly so that vCPUs and guest RAM which should be located on the same
NUMA node as the PCI bus end up on NUMA node 0.