# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig` or what is known as `--memory` from the CLI perspective is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory> Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. If
this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.
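
The two accepted values and the documented default can be modelled as a small enum. The following is a minimal sketch, not Cloud Hypervisor's own definition: the variant names, the `FromStr` implementation and the error message are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the `hotplug_method` option
/// (illustration only, not Cloud Hypervisor's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

impl Default for HotplugMethod {
    /// `acpi` is the documented default.
    fn default() -> Self {
        HotplugMethod::Acpi
    }
}

impl FromStr for HotplugMethod {
    type Err = String;

    /// Accepts exactly the two spellings used on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "acpi" => Ok(HotplugMethod::Acpi),
            "virtio-mem" => Ok(HotplugMethod::VirtioMem),
            other => Err(format!("unknown hotplug method: {}", other)),
        }
    }
}
```

With this sketch, `"virtio-mem".parse::<HotplugMethod>()` yields the `VirtioMem` variant, and any other spelling is rejected with an error.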

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

### `shared`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_SHARED`
flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped with `mmap(2)` using the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied the system's default
huge page size is used.
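
On a Linux host, the default huge page size is reported in `/proc/meminfo` on a line such as `Hugepagesize:    2048 kB`. The sketch below shows one way a user could read that value before choosing `hugepage_size`. It is not necessarily how Cloud Hypervisor determines the default, and the function name is hypothetical.

```rust
/// Extracts the default huge page size, in bytes, from text in the format of
/// Linux's `/proc/meminfo`. Hypothetical helper, for illustration only.
fn default_hugepage_size(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        // Only the `Hugepagesize:` line carries the default huge page size.
        let rest = line.strip_prefix("Hugepagesize:")?;
        let mut fields = rest.split_whitespace();
        let value: u64 = fields.next()?.parse().ok()?;
        // The kernel reports this value in kB; anything else is rejected.
        match fields.next()? {
            "kB" => value.checked_mul(1024),
            _ => None,
        }
    })
}
```

The text to parse can be obtained with `std::fs::read_to_string("/proc/meminfo")`; the function takes a string so that it can be exercised without access to the host.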

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; an error with `ReadKernelImage` is a common symptom. If an
unexplained error occurs with `hugepages` enabled, check whether there are
enough huge pages, or disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
fewer page faults occur at runtime, which improves performance.

Note that the VM will boot more slowly with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size is consumed immediately.

This option only takes effect at boot of the VM. There is also a `prefault`
option for restore, and its value overrides the `prefault` set for memory.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```

### `thp`

Specifies if private anonymous memory for the guest (i.e. `shared=off` and no
backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)` indicating
to the kernel that this memory may be backed with huge pages transparently.

The use of transparent huge pages can improve the performance of the guest as
there will be fewer virtualisation related page faults. Unlike with
`hugepages=on`, a specific number of huge pages does not need to be allocated
by the kernel.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```

## Advanced Parameters

`MemoryZoneConfig` or what is known as `--memory-zone` from the CLI perspective
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone> User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.
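
The two rules above (identifiers are unique, and other parameters refer to zones by identifier) can be checked before a configuration is passed to the VMM. The following is a minimal sketch under that assumption. Both function names are hypothetical and do not correspond to Cloud Hypervisor's internal validation code.

```rust
use std::collections::HashSet;

/// Returns the first identifier that appears more than once, if any.
/// Hypothetical helper, for illustration only.
fn first_duplicate_id<'a>(ids: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    ids.iter().copied().find(|id| !seen.insert(*id))
}

/// Returns the first referenced identifier that matches no defined zone.
/// Hypothetical helper, for illustration only.
fn unknown_zone<'a>(zone_ids: &[&str], referenced: &[&'a str]) -> Option<&'a str> {
    referenced.iter().copied().find(|r| !zone_ids.contains(r))
}
```

For zones `mem0` and `mem1`, a reference to `mem3` would be reported by `unknown_zone`, and declaring `mem0` twice would be reported by `first_duplicate_id`.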

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. The file will be opened and used as
the backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies if the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped with `mmap(2)` using the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied the system's default
huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; an error with `ReadKernelImage` is a common symptom. If an
unexplained error occurs with `hugepages` enabled, check whether there are
enough huge pages, or disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option will let the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone is mapped with `mmap(2)`, the NUMA policy for this memory mapping
will be applied through `mbind(2)`, relying on the provided node identifier. If
the node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host.
Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non uniform memory
accesses as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
fewer page faults occur at runtime, which improves performance.

Note that the VM will boot more slowly with `prefault` enabled because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size is consumed immediately.

This option only takes effect at boot of the VM. There is also a `prefault`
option for restore, and its value overrides the `prefault` set for memory.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig` or what is known as `--numa` from the CLI perspective has been
introduced to define a guest NUMA topology. It allows for a fine description
of the CPUs and memory ranges associated with each NUMA node. Additionally
it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa> Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The first
value is an unsigned integer of 32 bits as it represents the destination NUMA
node. The second value is an unsigned integer of 8 bits as it represents the
distance between the current NUMA node and the destination NUMA node.
The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter as it allows for
creating a VM with non uniform memory accesses, and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node.
The following
configuration is incorrect, therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports guests with one or more PCI segments. The default PCI segment always
has affinity to NUMA node 0. By default, all other PCI segments have affinity to NUMA node 0.
The user may configure the NUMA affinity for any additional PCI segments.

_Example_

```
--platform num_pci_segments=2
--memory-zone size=16G,host_numa_node=0,id=mem0
--memory-zone size=16G,host_numa_node=1,id=mem1
--numa guest_numa_id=0,memory_zones=mem0,pci_segments=[0]
--numa guest_numa_id=1,memory_zones=mem1,pci_segments=[1]
```
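
The affinity rules for PCI segments can be sketched as a small resolution step: every segment starts on NUMA node 0, and `pci_segments` entries move additional segments to other nodes. The function below is an illustration only. Its name, its signature, and the choice to reject an unknown segment or a default segment placed on a non-zero node are assumptions, not Cloud Hypervisor's actual behaviour.

```rust
/// Resolves the guest NUMA node of each PCI segment.
/// `assignments` pairs a guest NUMA node id with the segments listed for it.
/// Hypothetical helper, for illustration only.
fn pci_segment_affinity(
    num_segments: u16,
    assignments: &[(u32, Vec<u16>)],
) -> Result<Vec<u32>, String> {
    // Every segment has affinity to node 0 unless configured otherwise.
    let mut affinity = vec![0u32; usize::from(num_segments)];
    for (node, segments) in assignments {
        for &segment in segments {
            let index = usize::from(segment);
            if index >= affinity.len() {
                // Assumption: referring to a missing segment is an error.
                return Err(format!("PCI segment {} does not exist", segment));
            }
            if segment == 0 && *node != 0 {
                // Assumption: moving the default segment off node 0 is an error.
                return Err("default PCI segment must stay on node 0".to_string());
            }
            affinity[index] = *node;
        }
    }
    Ok(affinity)
}
```

Applied to the example above, `pci_segment_affinity(2, &[(0, vec![0]), (1, vec![1])])` places segment 0 on node 0 and segment 1 on node 1.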