# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig`, or what is known as `--memory` from the CLI perspective, is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory> Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

### `mergeable`

Specifies whether the pages from the guest RAM must be marked as _mergeable_.
When this option is `true` or `on`, the pages are marked with `madvise(2)` to
let the host kernel know which pages are eligible for being merged by the KSM
daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as merging identical pages reduces the amount of memory
consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. The default value is `acpi`.
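
To make the two accepted values concrete, here is a minimal Rust sketch of how a `hotplug_method` value could be parsed, with `acpi` as the fallback. The enum only mirrors the `HotplugMethod` field type shown in `MemoryConfig`. Its definition and the `hotplug_method_from_memory_arg` helper are hypothetical and are not taken from Cloud Hypervisor's source.

```rust
use std::str::FromStr;

/// Illustrative mirror of the `HotplugMethod` field type; not Cloud
/// Hypervisor's own definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum HotplugMethod {
    /// `acpi`, the documented default.
    #[default]
    Acpi,
    /// `virtio-mem`.
    VirtioMem,
}

impl FromStr for HotplugMethod {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "acpi" => Ok(HotplugMethod::Acpi),
            "virtio-mem" => Ok(HotplugMethod::VirtioMem),
            other => Err(format!("invalid hotplug_method: {other}")),
        }
    }
}

/// Hypothetical helper: extracts `hotplug_method` from a `--memory` value
/// such as "size=1G,hotplug_method=virtio-mem", falling back to the
/// default when the key is absent.
fn hotplug_method_from_memory_arg(arg: &str) -> Result<HotplugMethod, String> {
    for pair in arg.split(',') {
        if let Some(value) = pair.strip_prefix("hotplug_method=") {
            return value.parse();
        }
    }
    Ok(HotplugMethod::default())
}
```

With this sketch, `"size=1G"` yields `HotplugMethod::Acpi`, and any value other than the two documented ones is rejected.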

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when `hotplug_method` is `virtio-mem`, as it does not make
sense for the `acpi` use case: when using ACPI, the memory can't be resized
after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

### `shared`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they are driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in `mmap(2)` being called
with the `MAP_PRIVATE` flag.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped with `mmap(2)` using
the `MAP_HUGETLB` flag and the flag encoding the huge page size. This performs
a memory mapping backed by huge pages of the specified size. If no huge page
size is supplied, the system's default huge page size is used.
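
On Linux, the huge page size is passed to the kernel inside the `mmap(2)` flags. `MAP_HUGETLB` selects hugetlb backing, and the base-2 logarithm of the page size is stored in six bits starting at `MAP_HUGE_SHIFT` (26). The sketch below shows only that kernel encoding. The `hugetlb_flags` helper is hypothetical, the `MAP_HUGETLB` value assumes an x86_64 host, and Cloud Hypervisor's own code may build the mapping differently.

```rust
/// `MAP_HUGETLB` as defined for Linux on x86_64 (other architectures may
/// differ; the `libc` crate provides the per-target value).
const MAP_HUGETLB: i32 = 0x40000;
/// `MAP_HUGE_SHIFT`: the page size's log2 is stored from this bit upwards.
const MAP_HUGE_SHIFT: u32 = 26;

/// Hypothetical helper building the hugetlb part of the `mmap(2)` flags.
/// `None` keeps the kernel's default huge page size.
fn hugetlb_flags(hugepage_size: Option<u64>) -> Result<i32, String> {
    match hugepage_size {
        None => Ok(MAP_HUGETLB),
        Some(size) if size.is_power_of_two() => {
            // e.g. 2 MiB = 2^21, so 21 is encoded (MAP_HUGE_2MB).
            let log2 = size.trailing_zeros() as i32;
            Ok(MAP_HUGETLB | (log2 << MAP_HUGE_SHIFT))
        }
        Some(size) => Err(format!("hugepage_size {size} is not a power of two")),
    }
}
```

For `hugepage_size=2M`, `hugetlb_flags(Some(2 << 20))` yields `MAP_HUGETLB | (21 << 26)`, which is the kernel's `MAP_HUGE_2MB` encoding.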

By using hugepages, one can improve the overall performance of the VM, assuming
the guest allocates hugepages as well. Another interesting use case is VFIO, as
hugepages speed up the VM's boot time by reducing the number of IOMMU mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; for example, a `ReadKernelImage` error is common. If an
unexpected error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

With prefault enabled, all the required physical memory is allocated and its
page tables are created when `mmap` is called. Because the physical memory is
already allocated, fewer page faults occur while the VM runs, which improves
performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance. The full amount of
physical memory is also consumed up front.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, which overrides the `prefault` setting of `--memory`.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```

### `thp`

Specifies whether private anonymous memory for the guest (i.e. `shared=off`
and no backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)`,
indicating to the kernel that this memory may be transparently backed by huge
pages.
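
The conditions described for `mergeable` and `thp` can be summarised as a small decision function. This Rust sketch is hypothetical: the `MemoryOptions` struct, its `file_backed` field and `advice_for` are not Cloud Hypervisor APIs. It only records which `madvise(2)` advice the text describes for a given combination of options. It treats `hugepages=on` as shared memory, because huge pages always require `MAP_SHARED`.

```rust
/// Hypothetical subset of the `--memory` options relevant to `madvise(2)`.
struct MemoryOptions {
    shared: bool,
    hugepages: bool,
    mergeable: bool,
    thp: bool,
    /// Hypothetical: whether the guest RAM is backed by a file.
    file_backed: bool,
}

/// Advice values named after the corresponding `madvise(2)` constants.
#[derive(Debug, PartialEq, Eq)]
enum Advice {
    Mergeable, // MADV_MERGEABLE
    HugePage,  // MADV_HUGEPAGE
}

fn advice_for(opts: &MemoryOptions) -> Vec<Advice> {
    let mut advice = Vec::new();
    if opts.mergeable {
        advice.push(Advice::Mergeable);
    }
    // THP only applies to private anonymous memory; `hugepages=on` always
    // maps with MAP_SHARED, so it never qualifies.
    let private_anonymous = !opts.shared && !opts.hugepages && !opts.file_backed;
    if opts.thp && private_anonymous {
        advice.push(Advice::HugePage);
    }
    advice
}
```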

The use of transparent huge pages can improve the performance of the guest, as
there will be fewer virtualisation-related page faults. Unlike with
`hugepages=on`, a specific number of huge pages does not need to be allocated
by the kernel in advance.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```

## Advanced Parameters

`MemoryZoneConfig`, or what is known as `--memory-zone` from the CLI
perspective, is a power user parameter. It allows for a full description of
the guest RAM, describing how every memory region is backed and exposed to the
guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone> User defined memory zone parameters "size=<guest_memory_region_size>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options, detailed in the following sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a previously created memory zone. In
particular, the `--numa` parameter can associate a memory zone with a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.
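
The rules stated so far for `--memory-zone` (at least one zone, `--memory size=0`, unique `id` values) can be checked as in this sketch. The `Zone` struct and the `validate_zones` function are hypothetical. Summing the zone sizes is only meant to illustrate how the guest RAM is built up from the zones.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down memory zone used only in this sketch.
struct Zone<'a> {
    id: &'a str,
    size: u64,
}

/// Checks the documented `--memory-zone` rules and returns the total
/// guest RAM described by the zones.
fn validate_zones(memory_size: u64, zones: &[Zone<'_>]) -> Result<u64, String> {
    if memory_size != 0 {
        return Err("--memory-zone must be used with --memory size=0".to_string());
    }
    if zones.is_empty() {
        return Err("at least one --memory-zone is expected".to_string());
    }
    let mut seen = HashSet::new();
    let mut total: u64 = 0;
    for zone in zones {
        // `insert` returns false when the id was already present.
        if !seen.insert(zone.id) {
            return Err(format!("duplicate memory zone id: {}", zone.id));
        }
        total = total
            .checked_add(zone.size)
            .ok_or("total zone size overflows u64")?;
    }
    Ok(total)
}
```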

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `shared`

Specifies whether the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they are driven by
standalone daemons needing access to the guest RAM content.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in `mmap(2)` being called
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped with `mmap(2)` using
the `MAP_HUGETLB` flag and the flag encoding the huge page size. This performs
a memory mapping backed by huge pages of the specified size. If no huge page
size is supplied, the system's default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest allocates hugepages as well. Another interesting use case is VFIO, as
hugepages speed up the VM's boot time by reducing the number of IOMMU mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; for example, a `ReadKernelImage` error is common. If an
unexpected error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.
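
One way to perform that check on a Linux host is to compare the number of huge pages a zone needs against the per-size counters in sysfs. This sketch assumes the standard `/sys/kernel/mm/hugepages/hugepages-<size>kB/` layout. In that layout, `free_hugepages` also counts pages already reserved by other mappings (`resv_hugepages`), so the sketch subtracts them. The helper names are hypothetical, and the check is conservative because it ignores overcommitted (surplus) huge pages.

```rust
use std::fs;
use std::io;

/// Huge pages of `page_size` bytes needed to back `zone_size` bytes,
/// rounded up to a whole page.
fn hugepages_needed(zone_size: u64, page_size: u64) -> u64 {
    zone_size / page_size + u64::from(zone_size % page_size != 0)
}

/// Path of a per-size hugetlb counter in sysfs, e.g.
/// /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages.
fn hugepage_sysfs_path(page_size: u64, counter: &str) -> String {
    format!(
        "/sys/kernel/mm/hugepages/hugepages-{}kB/{}",
        page_size / 1024,
        counter
    )
}

fn read_counter(page_size: u64, counter: &str) -> io::Result<u64> {
    let text = fs::read_to_string(hugepage_sysfs_path(page_size, counter))?;
    text.trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Returns true when the host currently has enough unreserved huge pages
/// of `page_size` bytes for a zone of `zone_size` bytes (Linux only).
fn enough_hugepages(zone_size: u64, page_size: u64) -> io::Result<bool> {
    // free_hugepages includes pages already promised to other mappings.
    let free = read_counter(page_size, "free_hugepages")?;
    let reserved = read_counter(page_size, "resv_hugepages")?;
    Ok(free.saturating_sub(reserved) >= hugepages_needed(zone_size, page_size))
}
```

For the `size=1G,hugepage_size=2M` example, `hugepages_needed(1 << 30, 2 << 20)` is 512 pages.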

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option lets the user pick a
specific NUMA node from which the memory must be allocated. After the memory
zone is mapped with `mmap(2)`, the NUMA policy for this mapping is applied
through `mbind(2)`, relying on the provided node identifier. If the node does
not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM's memory with a specific type of
memory from the host. Assuming a host has two types of memory, one slower than
the other, each attached to a distinct NUMA node, one could create a VM with
slower memory accesses by backing the entire guest RAM with memory from the
furthest NUMA node on the host.

This option also makes it possible to create a VM with non-uniform memory
access, as one could define a first memory zone backed by fast memory and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone when the VM
boots.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when `hotplug_method` is `virtio-mem`, as it does not make
sense for the `acpi` use case: when using ACPI, the memory can't be resized
after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies whether the memory zone must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

With prefault enabled, all the required physical memory is allocated and its
page tables are created when `mmap` is called. Because the physical memory is
already allocated, fewer page faults occur while the VM runs, which improves
performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance. The full amount of
physical memory is also consumed up front.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, which overrides the `prefault` setting of the memory
zone.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig`, or what is known as `--numa` from the CLI perspective, has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally, it allows for specifying the distances between NUMA nodes.
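
The `distances` option, described later in this section, takes a list of `destination@distance` tuples for each node. The following hypothetical sketch shows how such lists could be gathered into one lookup table keyed by `(source, destination)`. `parse_distances` and `distance_table` are illustrative names only. The sketch deliberately does not invent a node's distance to itself.

```rust
use std::collections::BTreeMap;

/// Parses a `distances` value such as "[1@15,2@25]" into
/// (destination node, distance) pairs.
fn parse_distances(value: &str) -> Result<Vec<(u32, u8)>, String> {
    let inner = value
        .strip_prefix('[')
        .and_then(|v| v.strip_suffix(']'))
        .unwrap_or(value);
    inner
        .split(',')
        .map(|tuple| -> Result<(u32, u8), String> {
            let (dest, dist) = tuple
                .split_once('@')
                .ok_or_else(|| format!("expected destination@distance, got {tuple:?}"))?;
            let dest = dest.trim().parse::<u32>().map_err(|e| e.to_string())?;
            let dist = dist.trim().parse::<u8>().map_err(|e| e.to_string())?;
            Ok((dest, dist))
        })
        .collect()
}

/// Gathers every node's list into one table keyed by (source, destination).
fn distance_table(nodes: &[(u32, &str)]) -> Result<BTreeMap<(u32, u32), u8>, String> {
    let mut table = BTreeMap::new();
    for &(source, distances) in nodes {
        for (dest, dist) in parse_distances(distances)? {
            table.insert((source, dest), dist);
        }
    }
    Ok(table)
}
```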

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa> Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which must be
seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the `-` syntax defines a contiguous range with `cpus=0-4`. The same example
could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one needs to
describe a list containing all CPUs from 0 to 99 and CPU 255, as it can simply
be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.
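
The range and list syntax above can be parsed as in the following sketch. `parse_cpus` is a hypothetical helper. It receives the `cpus` value after that value has been separated from the other `--numa` options, and it returns vCPU ids as `u8`, matching the 8-bit values described above.

```rust
/// Parses a `cpus` value such as "0-4", "[0,1,2,3,4]" or "[0-99,255]"
/// into vCPU ids.
fn parse_cpus(value: &str) -> Result<Vec<u8>, String> {
    // Strip the optional `[` `]` list delimiters.
    let inner = value
        .strip_prefix('[')
        .and_then(|v| v.strip_suffix(']'))
        .unwrap_or(value);
    let mut cpus = Vec::new();
    for item in inner.split(',') {
        let item = item.trim();
        if let Some((start, end)) = item.split_once('-') {
            // `a-b` is an inclusive range.
            let start = start.parse::<u8>().map_err(|e| format!("{item:?}: {e}"))?;
            let end = end.parse::<u8>().map_err(|e| format!("{item:?}: {e}"))?;
            if start > end {
                return Err(format!("descending range {item:?}"));
            }
            cpus.extend(start..=end);
        } else {
            cpus.push(item.parse::<u8>().map_err(|e| format!("{item:?}: {e}"))?);
        }
    }
    Ok(cpus)
}
```

With this sketch, `cpus=[1-3,7]` from the example below expands to vCPUs 1, 2, 3 and 7. A value such as `256` is rejected because it does not fit in 8 bits.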

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```

### `distances`

List of distances between the current NUMA node, referred to by
`guest_numa_id`, and the destination NUMA nodes listed along with the
distances. This option lets the user choose the distances between guest NUMA
nodes. This is important to provide an accurate description of the way
non-uniform memory accesses will perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits, as it represents the destination
NUMA node. The second value is an unsigned integer of 8 bits, as it represents
the distance between the current NUMA node and the destination NUMA node. The
two values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at a
different distance from the others, it can be described with the following
example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option of the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory access and letting the guest know about
it.
It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect and therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes so that the vCPUs and guest RAM that should be located on the same NUMA
node as the PCI bus end up on NUMA node 0.
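
Two constraints from this section can be checked together: each memory zone belongs to a single NUMA node, and the PCI bus sits on NUMA node 0. In this hypothetical sketch, `NumaNode` is a trimmed-down stand-in for `NumaConfig`, and `zone_owners` and `zone_near_pci_bus` are illustrative helpers rather than Cloud Hypervisor APIs. vCPU placement could be checked the same way.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down view of one `--numa` occurrence.
struct NumaNode<'a> {
    guest_numa_id: u32,
    memory_zones: Vec<&'a str>,
}

/// Guest NUMA node hosting the single PCI bus.
const PCI_BUS_NUMA_NODE: u32 = 0;

/// Maps each memory zone id to its guest NUMA node, rejecting any zone
/// listed more than once (in particular under two different nodes).
fn zone_owners<'a>(nodes: &[NumaNode<'a>]) -> Result<HashMap<&'a str, u32>, String> {
    let mut owners = HashMap::new();
    for node in nodes {
        for &zone in &node.memory_zones {
            if let Some(prev) = owners.insert(zone, node.guest_numa_id) {
                return Err(format!(
                    "memory zone {zone} assigned to nodes {prev} and {}",
                    node.guest_numa_id
                ));
            }
        }
    }
    Ok(owners)
}

/// True when `zone` sits on the same guest NUMA node as the PCI bus.
fn zone_near_pci_bus(owners: &HashMap<&str, u32>, zone: &str) -> bool {
    owners.get(zone) == Some(&PCI_BUS_NUMA_NODE)
}
```

Applied to the `memory_zones` example above, `mem0` and `mem2` end up on the PCI bus node, while `mem1` does not.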