# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig` or what is known as `--memory` from the CLI perspective is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```

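To make the shape of the `--memory` string concrete, here is a minimal sketch
of how a comma-separated `key=value` list with `on|off` toggles could be
split. The helpers `parse_options` and `parse_toggle` are hypothetical names
used only for this illustration; they are not Cloud Hypervisor's own parser.

```rust
use std::collections::HashMap;

/// Split a `--memory`-style string into key/value pairs.
/// Hypothetical helper for illustration only.
fn parse_options(input: &str) -> Result<HashMap<String, String>, String> {
    let mut options = HashMap::new();
    for pair in input.split(',').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in option '{}'", pair))?;
        // Reject a key that appears twice instead of silently overwriting it.
        if options.insert(key.to_string(), value.to_string()).is_some() {
            return Err(format!("option '{}' given more than once", key));
        }
    }
    Ok(options)
}

/// Interpret an `on|off` toggle, falling back to `default` when absent.
fn parse_toggle(
    options: &HashMap<String, String>,
    key: &str,
    default: bool,
) -> Result<bool, String> {
    match options.get(key).map(String::as_str) {
        None => Ok(default),
        Some("on") => Ok(true),
        Some("off") => Ok(false),
        Some(other) => Err(format!("invalid value '{}' for '{}'", other, key)),
    }
}
```

With this sketch, `parse_options("size=1G,mergeable=on")` yields a map from
which `parse_toggle(&opts, "mergeable", false)` returns `Ok(true)`, while an
absent `thp` key falls back to whatever default the caller passes.
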
### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```

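The examples in this document write sizes with a suffix (`512M`, `1G`) while
the underlying value is a 64-bit unsigned integer. The sketch below shows one
way such a string could be turned into a byte count. It assumes that `K`, `M`
and `G` are binary multiples (2^10, 2^20, 2^30) and that a bare number is a
byte count; `parse_size` is a hypothetical helper, not the project's parser.

```rust
/// Convert a size such as `512M` or `1G` into bytes.
/// Assumption: K, M and G are binary multiples (2^10, 2^20, 2^30).
fn parse_size(input: &str) -> Result<u64, String> {
    let input = input.trim();
    // Pick the shift that matches the suffix, if there is one.
    let (digits, shift) = if let Some(d) = input.strip_suffix('K') {
        (d, 10)
    } else if let Some(d) = input.strip_suffix('M') {
        (d, 20)
    } else if let Some(d) = input.strip_suffix('G') {
        (d, 30)
    } else {
        (input, 0)
    };
    let value = digits
        .parse::<u64>()
        .map_err(|_| format!("invalid size '{}'", input))?;
    // Refuse values that do not fit in 64 bits rather than wrapping around.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size '{}' overflows 64 bits", input))
}
```

Under these assumptions `parse_size("1G")` returns `Ok(1073741824)` and
`parse_size("512M")` returns `Ok(536870912)`.
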
### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. In
case this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects how memory is added to or removed from a running VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```

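The constraints on `hotplug_size` and `hotplugged_size` described above can be
summarised as a small validation routine. This is only a sketch of the rules
as stated in this document; `validate_hotplug` and its error strings are
hypothetical and do not reflect the actual error types of the VMM.

```rust
/// Mirrors the two values accepted by `hotplug_method`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Check the hotplug options against the rules stated in this document.
/// Hypothetical helper; the error strings are made up for this example.
fn validate_hotplug(
    method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), String> {
    // A value of 0 is invalid for both options.
    if hotplug_size == Some(0) {
        return Err("hotplug_size must not be 0".to_string());
    }
    let hotplugged = match hotplugged_size {
        None => return Ok(()),
        Some(0) => return Err("hotplugged_size must not be 0".to_string()),
        Some(size) => size,
    };
    // hotplugged_size only makes sense with virtio-mem.
    if method != HotplugMethod::VirtioMem {
        return Err("hotplugged_size requires hotplug_method=virtio-mem".to_string());
    }
    // hotplugged_size needs hotplug_size and must not exceed it.
    match hotplug_size {
        None => Err("hotplugged_size requires hotplug_size".to_string()),
        Some(max) if hotplugged > max => Err(format!(
            "hotplugged_size {} exceeds hotplug_size {}",
            hotplugged, max
        )),
        Some(_) => Ok(()),
    }
}
```

For the example above, `validate_hotplug(HotplugMethod::VirtioMem, Some(1 << 30), Some(512 << 20))`
passes, while the same sizes with `HotplugMethod::Acpi` are rejected.
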
### `shared`

Specifies if the memory must be mapped via `mmap(2)` with the `MAP_SHARED`
flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped via `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using huge pages, one can improve the overall performance of the VM,
assuming the guest allocates huge pages as well. Another interesting use case
is VFIO, as huge pages speed up the VM's boot time by reducing the number of
IOMMU mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour; for example, an error with `ReadKernelImage` is common. If there is
a strange error with `hugepages` enabled, check whether there are enough huge
pages, or disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies if the memory must be mapped via `mmap(2)` with the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
fewer page faults occur at runtime, which improves performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size is consumed up front.

This option only takes effect when the VM boots. Restore has its own `prefault`
option, whose value overrides the `prefault` set in the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```

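The `shared`, `hugepages`, `hugepage_size` and `prefault` descriptions above
all refer to `mmap(2)` flags. As a rough sketch of how these options could
combine, the function below derives a flag word from them. The constants are
the Linux x86_64 values and are hard-coded here as an assumption; `mmap_flags`
is a hypothetical helper and the actual VMM may build its mappings
differently.

```rust
// Linux x86_64 values of the mmap(2) flags named in this document
// (assumption: other architectures may use different values).
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;
const MAP_POPULATE: i32 = 0x8000;
const MAP_HUGETLB: i32 = 0x40000;
const MAP_HUGE_SHIFT: u32 = 26;

/// Derive mmap(2) flags from the documented options. Hypothetical helper.
fn mmap_flags(
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
) -> Result<i32, String> {
    // Huge pages always require MAP_SHARED, so `shared` is ignored then.
    let mut flags = if shared || hugepages { MAP_SHARED } else { MAP_PRIVATE };
    if hugepages {
        flags |= MAP_HUGETLB;
        // Without an explicit size the system default huge page size applies.
        if let Some(size) = hugepage_size {
            if !size.is_power_of_two() {
                return Err(format!("huge page size {} is not a power of two", size));
            }
            // The size is encoded as log2(size) above MAP_HUGE_SHIFT.
            flags |= (size.trailing_zeros() as i32) << MAP_HUGE_SHIFT;
        }
    }
    if prefault {
        flags |= MAP_POPULATE;
    }
    Ok(flags)
}
```

For instance, `mmap_flags(false, true, Some(2 << 20), false)` selects
`MAP_SHARED | MAP_HUGETLB` plus the encoding for 2 MiB pages, even though
`shared` was passed as `false`.
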
### `thp`

Specifies if private anonymous memory for the guest (i.e. `shared=off` and no
backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)`, indicating
to the kernel that this memory may be backed with huge pages transparently.

The use of transparent huge pages can improve the performance of the guest as
there will be fewer virtualisation-related page faults. Unlike with
`hugepages=on`, a specific number of huge pages does not need to be allocated
by the kernel.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```

## Advanced Parameters

`MemoryZoneConfig` or what is known as `--memory-zone` from the CLI perspective
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

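The two rules stated so far for memory zones (unique `id`, and
`--memory size=0` whenever zones are used) can be checked together. The sketch
below does so and returns the sum of the zone sizes. The `Zone` struct and
`total_zone_memory` are hypothetical, and treating that sum as the total guest
RAM is an assumption drawn from zones being a full description of guest RAM.

```rust
use std::collections::HashSet;

/// Reduced stand-in for `MemoryZoneConfig`, keeping only what is checked here.
struct Zone {
    id: String,
    size: u64,
}

/// Validate the zone list and return the sum of zone sizes.
/// Hypothetical helper; the total-RAM interpretation is an assumption.
fn total_zone_memory(memory_size: u64, zones: &[Zone]) -> Result<u64, String> {
    if zones.is_empty() {
        // No zones: guest RAM is described by `--memory size=...` alone.
        return Ok(memory_size);
    }
    if memory_size != 0 {
        return Err("--memory size must be 0 when --memory-zone is used".to_string());
    }
    let mut seen = HashSet::new();
    let mut total: u64 = 0;
    for zone in zones {
        // `insert` returns false when the identifier was already present.
        if !seen.insert(zone.id.as_str()) {
            return Err(format!("duplicate memory zone id '{}'", zone.id));
        }
        total = total
            .checked_add(zone.size)
            .ok_or_else(|| "total zone size overflows 64 bits".to_string())?;
    }
    Ok(total)
}
```

Two 1 GiB zones named `mem0` and `mem1` with `--memory size=0` give a total of
2 GiB, whereas reusing `mem0` for both zones produces an error.
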
### `shared`

Specifies if the memory zone must be mapped via `mmap(2)` with the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

If `hugepages=on` then the value of this field is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped via `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using huge pages, one can improve the overall performance of the VM,
assuming the guest allocates huge pages as well. Another interesting use case
is VFIO, as huge pages speed up the VM's boot time by reducing the number of
IOMMU mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour; for example, an error with `ReadKernelImage` is common. If there is
a strange error with `hugepages` enabled, check whether there are enough huge
pages, or disable the option.

If `hugepages=on` then the value of `shared` is ignored as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option lets the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone is mapped with `mmap(2)`, the NUMA policy for this memory mapping
is applied through `mbind(2)`, relying on the provided node identifier. If the
node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM's memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses, as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies if the memory must be mapped via `mmap(2)` with the `MAP_POPULATE`
flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
fewer page faults occur at runtime, which improves performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size is consumed up front.

This option only takes effect when the VM boots. Restore has its own `prefault`
option, whose value overrides the `prefault` set in the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig` or what is known as `--numa` from the CLI perspective has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally, it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```

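The `cpus` syntax described above (single values, `-` ranges, and `[`...`]`
around comma-separated lists) can be expanded into a plain list of 8-bit CPU
identifiers. The following is a minimal sketch of that expansion; `parse_cpus`
is a hypothetical helper written for this document, not the VMM's parser.

```rust
/// Expand a `cpus=` value such as `0-4` or `[0-99,255]` into CPU ids.
/// Hypothetical helper for illustration only.
fn parse_cpus(input: &str) -> Result<Vec<u8>, String> {
    // A comma-separated list must be enclosed in '[' and ']'.
    let inner = match input.strip_prefix('[') {
        Some(rest) => rest
            .strip_suffix(']')
            .ok_or_else(|| "missing closing ']'".to_string())?,
        None if input.contains(',') => {
            return Err("a list must be enclosed in '[' and ']'".to_string())
        }
        None => input,
    };
    let mut cpus = Vec::new();
    for item in inner.split(',') {
        match item.split_once('-') {
            // `a-b` denotes the contiguous, inclusive range from a to b.
            Some((start, end)) => {
                let start = start
                    .parse::<u8>()
                    .map_err(|_| format!("invalid CPU id '{}'", start))?;
                let end = end
                    .parse::<u8>()
                    .map_err(|_| format!("invalid CPU id '{}'", end))?;
                if start > end {
                    return Err(format!("empty range '{}'", item));
                }
                cpus.extend(start..=end);
            }
            None => cpus.push(
                item.parse::<u8>()
                    .map_err(|_| format!("invalid CPU id '{}'", item))?,
            ),
        }
    }
    Ok(cpus)
}
```

In this sketch `parse_cpus("0-4")` and `parse_cpus("[0,1,2,3,4]")` produce the
same five identifiers, and `parse_cpus("[0-99,255]")` produces 101 of them.
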
### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits as it represents the destination
NUMA node. The second value is an unsigned integer of 8 bits as it represents
the distance between the current NUMA node and the destination NUMA node. The
two values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```

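To illustrate the `destination@distance` tuple format described above, the
sketch below turns a `distances=` value into `(node, distance)` pairs with the
documented widths (32-bit node identifier, 8-bit distance). `parse_distances`
is a hypothetical helper, not the VMM's parser.

```rust
/// Parse a `distances=` value such as `[1@15,2@25]` into (node, distance)
/// pairs. Hypothetical helper for illustration only.
fn parse_distances(input: &str) -> Result<Vec<(u32, u8)>, String> {
    // A comma-separated list must be enclosed in '[' and ']'.
    let inner = match input.strip_prefix('[') {
        Some(rest) => rest
            .strip_suffix(']')
            .ok_or_else(|| "missing closing ']'".to_string())?,
        None if input.contains(',') => {
            return Err("a list must be enclosed in '[' and ']'".to_string())
        }
        None => input,
    };
    let mut distances = Vec::new();
    for tuple in inner.split(',') {
        // `value1@value2`: destination node value1 is at distance value2.
        let (node, distance) = tuple
            .split_once('@')
            .ok_or_else(|| format!("missing '@' in '{}'", tuple))?;
        let node = node
            .parse::<u32>()
            .map_err(|_| format!("invalid node id '{}'", node))?;
        let distance = distance
            .parse::<u8>()
            .map_err(|_| format!("invalid distance '{}'", distance))?;
        distances.push((node, distance));
    }
    Ok(distances)
}
```

Applied to node 0 of the example above, `parse_distances("[1@15,2@25]")`
yields the pairs `(1, 15)` and `(2, 25)`.
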
### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```

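The rule that a memory zone must belong to a single NUMA node can be checked
by recording which node first claims each zone. The sketch below does this
over already-parsed `(guest_numa_id, memory_zones)` pairs; `assign_zones` is a
hypothetical helper and its error text is made up for this example.

```rust
use std::collections::HashMap;

/// Map each memory zone to its guest NUMA node, rejecting any zone that is
/// listed more than once. Hypothetical helper for illustration only.
fn assign_zones(nodes: &[(u32, Vec<&str>)]) -> Result<HashMap<String, u32>, String> {
    let mut owner: HashMap<String, u32> = HashMap::new();
    for (node_id, zones) in nodes {
        for zone in zones {
            // `insert` returns the previous owner when the zone was already claimed.
            if let Some(previous) = owner.insert(zone.to_string(), *node_id) {
                return Err(format!(
                    "memory zone '{}' is attached to node {} and node {}",
                    zone, previous, node_id
                ));
            }
        }
    }
    Ok(owner)
}
```

The example configuration above maps `mem0` and `mem2` to node 0 and `mem1` to
node 1, while the incorrect configuration that lists `mem0` under both nodes
returns an error.
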
### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes correctly so that vCPUs and guest RAM which should be located on the same
NUMA node as the PCI bus end up on NUMA node 0.
557