# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig` or what is known as `--memory` from the CLI perspective is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory> Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off" [default: size=512M]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. In
case this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

### `shared`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_SHARED`
flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped with `mmap(2)` using the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well.
Another interesting use case is VFIO, as it speeds up the VM's boot time since
the number of IOMMU mappings is reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; an error with `ReadKernelImage` is a common symptom. If an
unexplained error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory already allocated,
fewer page faults occur at runtime, which improves performance.

Note that the VM will boot more slowly with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed right away.

This option only takes effect when the VM boots. There is also a `prefault`
option for restore, and its value overrides the `prefault` value from the
memory configuration.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```

## Advanced Parameters

`MemoryZoneConfig` or what is known as `--memory-zone` from the CLI perspective
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone> User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation.
In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies if the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped with `mmap(2)` using the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO,
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use.
Failure to do so may result in unexpected VMM
behaviour; an error with `ReadKernelImage` is a common symptom. If an
unexplained error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option will let the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone is mapped with `mmap(2)`, the NUMA policy for this memory mapping
will be applied through `mbind(2)`, relying on the provided node identifier. If
the node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory already allocated,
fewer page faults occur at runtime, which improves performance.

Note that the VM will boot more slowly with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed right away.

This option only takes effect when the VM boots. There is also a `prefault`
option for restore, and its value overrides the `prefault` value from the
memory configuration.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig` or what is known as `--numa` from the CLI perspective has been
introduced to define a guest NUMA topology.
It allows for a fine-grained description
of the CPUs and memory ranges associated with each NUMA node. Additionally,
it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa> Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=[0-99,255]`.
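
To make the range and list syntax concrete, here is a minimal Rust sketch of
how such a value could be expanded into individual CPU identifiers. The
`parse_cpu_list` and `parse_cpu_id` helpers are hypothetical and written for
illustration only; they are not Cloud Hypervisor's actual parser. The sketch
assumes that a single value or a single range may be written without brackets,
while anything using `,` must be enclosed in `[` and `]`.

```rust
/// Parse one CPU identifier as an unsigned 8-bit integer.
/// Hypothetical helper, for illustration only.
fn parse_cpu_id(text: &str) -> Result<u8, String> {
    text.trim()
        .parse::<u8>()
        .map_err(|e| format!("invalid CPU id '{text}': {e}"))
}

/// Expand a value such as "0-4", "[0,1,2,3,4]" or "[0-99,255]" into CPU ids.
/// Assumption: comma-separated values are only accepted inside brackets.
fn parse_cpu_list(input: &str) -> Result<Vec<u8>, String> {
    let trimmed = input.trim();
    let bracketed = trimmed.starts_with('[') && trimmed.ends_with(']');
    let inner = if bracketed {
        &trimmed[1..trimmed.len() - 1]
    } else {
        trimmed
    };
    if !bracketed && inner.contains(',') {
        return Err(format!("list '{trimmed}' must be enclosed in [ and ]"));
    }

    let mut cpus = Vec::new();
    for item in inner.split(',') {
        match item.split_once('-') {
            // "start-end" describes a contiguous, inclusive range.
            Some((start, end)) => {
                let start = parse_cpu_id(start)?;
                let end = parse_cpu_id(end)?;
                if start > end {
                    return Err(format!("invalid range '{}'", item.trim()));
                }
                cpus.extend(start..=end);
            }
            // Anything else is a single CPU identifier.
            None => cpus.push(parse_cpu_id(item)?),
        }
    }
    Ok(cpus)
}
```

With this sketch, `parse_cpu_list("[1-3,7]")` yields CPUs 1, 2, 3 and 7, while
`parse_cpu_list("0,1")` is rejected because the list is not bracketed.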

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The first
value is an unsigned integer of 32 bits as it represents the destination NUMA
node. The second value is an unsigned integer of 8 bits as it represents the
distance between the current NUMA node and the destination NUMA node. The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.
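
Each memory zone may be attached to one guest NUMA node only. As a rough
illustration of that rule, the Rust sketch below checks a set of NUMA entries
for a zone claimed twice. The `NumaNode` type and the `check_zone_ownership`
function are hypothetical, simplified stand-ins for this example and are not
taken from the Cloud Hypervisor code base.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one `--numa` entry (illustration only).
struct NumaNode {
    guest_numa_id: u32,
    memory_zones: Vec<String>,
}

/// Return an error naming the first memory zone that is claimed more than once.
fn check_zone_ownership(nodes: &[NumaNode]) -> Result<(), String> {
    // Maps each zone identifier to the guest NUMA node that first claimed it.
    let mut owners: HashMap<&str, u32> = HashMap::new();
    for node in nodes {
        let current = node.guest_numa_id;
        for zone in &node.memory_zones {
            // `insert` returns the previous owner if the zone was already claimed.
            if let Some(previous) = owners.insert(zone.as_str(), current) {
                return Err(format!(
                    "memory zone '{zone}' is claimed by NUMA node {previous} and NUMA node {current}"
                ));
            }
        }
    }
    Ok(())
}
```

A configuration that lists `mem0` under both `guest_numa_id=0` and
`guest_numa_id=1` makes this check return an error, while distinct zones per
node pass.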

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
the NUMA node 0 by default.
It is the user's responsibility to organize the NUMA
nodes correctly so that vCPUs and guest RAM which should be located on the same
NUMA node as the PCI bus end up on the NUMA node 0.