# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig` or what is known as `--memory` from the CLI perspective is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory> Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. In
case this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

### `shared`

Specifies if the memory must be mapped by `mmap(2)` with the `MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped by `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied the system's default
huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour, e.g. an error with `ReadKernelImage` is common. If there is a strange
error with `hugepages` enabled, check whether there are enough huge pages or
disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies if the memory must be mapped by `mmap(2)` with the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory already allocated,
the number of page faults at runtime will decrease, and performance will
improve.

Note that VM boot will be slower with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed immediately.

This option only takes effect at VM boot. There is also a `prefault` option in
restore, and its value overrides `prefault` in the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```

### `thp`

Specifies if private anonymous memory for the guest (i.e. `shared=off` and no
backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)` indicating
to the kernel that this memory may be backed with huge pages transparently.

The use of transparent huge pages can improve the performance of the guest as
there will be fewer virtualisation-related page faults. Unlike with
`hugepages=on`, a specific number of huge pages does not need to be allocated
by the kernel.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```

## Advanced Parameters

`MemoryZoneConfig` or what is known as `--memory-zone` from the CLI perspective
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone> User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies if the memory zone must be mapped by `mmap(2)` with the `MAP_SHARED`
flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped by `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied the system's default
huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour, e.g. an error with `ReadKernelImage` is common. If there is a strange
error with `hugepages` enabled, check whether there are enough huge pages or
disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option will let the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone is mapped by `mmap(2)`, the NUMA policy for this memory mapping
will be applied through `mbind(2)`, relying on the provided node identifier. If
the node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host.
Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies if the memory must be mapped by `mmap(2)` with the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory already allocated,
the number of page faults at runtime will decrease, and performance will
improve.

Note that VM boot will be slower with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed immediately.

This option only takes effect at VM boot. There is also a `prefault` option in
restore, and its value overrides `prefault` in the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig` or what is known as `--numa` from the CLI perspective has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa> Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The first
value is an unsigned integer of 32 bits as it represents the destination NUMA
node. The second value is an unsigned integer of 8 bits as it represents the
distance between the current NUMA node and the destination NUMA node.
The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter as it allows for
creating a VM with non-uniform memory accesses, and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node.
The following
configuration is incorrect and therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
the NUMA node 0 by default. It is the user's responsibility to organize the
NUMA nodes correctly so that vCPUs and guest RAM which should be located on the
same NUMA node as the PCI bus end up on the NUMA node 0.
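
As a sketch of one such layout, the options below put two vCPUs and one memory
zone on guest NUMA node 0, next to the PCI bus, and the remaining vCPUs and
memory on node 1. The zone identifiers, sizes, and CPU split are illustrative
assumptions. Only option syntax described in the sections above is used, and
other options a VM needs are omitted.

_Example_

```
--cpus boot=4
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G
--numa guest_numa_id=0,cpus=[0,1],memory_zones=mem0 guest_numa_id=1,cpus=[2,3],memory_zones=mem1
```

With this layout, vCPUs 0 and 1 and the zone `mem0` are the resources the guest
sees on the same node as the PCI bus.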