# Memory

Cloud Hypervisor offers many ways to expose memory to the guest VM. This
document explains what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig`, known as `--memory` from the CLI perspective, is the easiest
way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```
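
The option string above is a comma-separated list of `key=value` pairs. The sketch below shows one way it could be split. `parse_options` and `parse_switch` are hypothetical helpers written for this document. They are not Cloud Hypervisor's own option parser, and they ignore the `[...]` list syntax used by other parameters.

```rust
use std::collections::HashMap;

/// Splits an option string such as "size=1G,mergeable=on" into key/value pairs.
/// Hypothetical helper: no support for bracketed lists.
fn parse_options(input: &str) -> Result<HashMap<String, String>, String> {
    let mut options = HashMap::new();
    for pair in input.split(',').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {pair:?}"))?;
        if options.insert(key.to_string(), value.to_string()).is_some() {
            return Err(format!("option {key:?} given more than once"));
        }
    }
    Ok(options)
}

/// Interprets an on/off switch, falling back to `default` when the key is absent.
/// Accepting `true`/`false` as well follows the wording of the `mergeable` section.
fn parse_switch(options: &HashMap<String, String>, key: &str, default: bool) -> Result<bool, String> {
    match options.get(key).map(String::as_str) {
        None => Ok(default),
        Some("on") | Some("true") => Ok(true),
        Some("off") | Some("false") => Ok(false),
        Some(other) => Err(format!("invalid value {other:?} for {key}")),
    }
}
```

Missing switches fall back to a caller-supplied default, mirroring the `[default: size=512M,thp=on]` behaviour shown in the help text.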

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```
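
Sizes such as `1G` or `512M` end up as a 64-bit byte count. The sketch below assumes `K`, `M` and `G` are binary (1024-based) suffixes. `parse_size` is a hypothetical helper and may not accept exactly the same inputs as the real CLI.

```rust
/// Converts a size such as "512M" or "1G" into bytes.
/// Assumption: K, M and G are 1024-based multiples; a bare number is bytes.
fn parse_size(value: &str) -> Result<u64, String> {
    let (digits, shift) = match value.chars().last() {
        Some('K') => (&value[..value.len() - 1], 10),
        Some('M') => (&value[..value.len() - 1], 20),
        Some('G') => (&value[..value.len() - 1], 30),
        _ => (value, 0),
    };
    let number: u64 = digits
        .parse()
        .map_err(|e| format!("invalid size {value:?}: {e}"))?;
    // Reject values that would not fit in the 64-bit field.
    number
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size {value:?} overflows u64"))
}
```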

### `mergeable`

Specifies whether the pages from the guest RAM must be marked as _mergeable_.
When this option is `true` or `on`, the pages are marked with `madvise(2)`
(`MADV_MERGEABLE`) to let the host kernel know which pages are eligible for
being merged by the KSM (Kernel Samepage Merging) daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it can reduce the amount of memory consumed by each VM
when identical pages are found.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```
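
The following sketch shows the kind of `madvise(2)` call this option implies, applied to a private anonymous mapping. To stay dependency-free, it declares `mmap`, `madvise` and `munmap` from the C library directly. The constant values are the Linux ones for x86_64 and aarch64, `off_t` is assumed to be 64 bits, and `map_mergeable` is a hypothetical name.

```rust
use std::ffi::c_void;
use std::io;

// Linux values on x86_64 and aarch64 (assumption of this sketch).
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;
const MADV_MERGEABLE: i32 = 12;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, off: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
    fn madvise(addr: *mut c_void, len: usize, advice: i32) -> i32;
}

/// Maps `size` bytes of private anonymous memory and marks them as mergeable.
/// Returns Ok(true) if madvise succeeded and Ok(false) if the kernel rejected
/// the advice (for example, EINVAL on a kernel built without KSM support).
fn map_mergeable(size: usize) -> io::Result<bool> {
    // SAFETY: anonymous mapping without an address hint; result checked below.
    let addr = unsafe {
        mmap(std::ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
    };
    if addr as isize == -1 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: addr and size describe the mapping created above.
    let merged = unsafe { madvise(addr, size, MADV_MERGEABLE) } == 0;
    unsafe { munmap(addr, size) };
    Ok(merged)
}
```

Marking pages is only half of the story. The host's KSM daemon also has to be running (`/sys/kernel/mm/ksm/run`) for advised pages to be scanned and merged.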

### `hotplug_method`

Selects the way of adding memory to, and/or removing memory from, a booted VM.

Possible values are `acpi` and `virtio-mem`. The default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when `hotplug_method` is `virtio-mem`, as it does not make
sense for the `acpi` use case: when using ACPI, the memory can't be reduced
after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```
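
The constraints on `hotplug_method`, `hotplug_size` and `hotplugged_size` can be summarised as a validation function. This sketch covers only the rules stated in these sections. `HotplugMethod` mirrors the field type of `MemoryConfig`. `parse_hotplug_method`, `validate_hotplug` and the error strings are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Parses `hotplug_method`, defaulting to ACPI when the option is absent.
fn parse_hotplug_method(value: Option<&str>) -> Result<HotplugMethod, String> {
    match value {
        None | Some("acpi") => Ok(HotplugMethod::Acpi),
        Some("virtio-mem") => Ok(HotplugMethod::VirtioMem),
        Some(other) => Err(format!("unknown hotplug_method {other:?}")),
    }
}

/// Checks the hotplug rules described above (hypothetical error messages).
fn validate_hotplug(
    method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), String> {
    if hotplug_size == Some(0) {
        return Err("hotplug_size must not be 0".into());
    }
    if let Some(plugged) = hotplugged_size {
        if plugged == 0 {
            return Err("hotplugged_size must not be 0".into());
        }
        // ACPI hotplug cannot shrink memory, so only virtio-mem is accepted.
        if method != HotplugMethod::VirtioMem {
            return Err("hotplugged_size requires hotplug_method=virtio-mem".into());
        }
        match hotplug_size {
            None => return Err("hotplugged_size requires hotplug_size".into()),
            Some(max) if plugged > max => {
                return Err("hotplugged_size exceeds hotplug_size".into())
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```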

### `shared`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they are driven by standalone daemons that
need access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```
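
The sketch below illustrates why `MAP_SHARED` matters for processes such as vhost-user daemons. It maps a small temporary file both ways, writes a byte through the mapping, and then reads the file back with ordinary I/O, which stands in for another host process. It assumes 64-bit Linux (`off_t` as `i64`) and declares `mmap`/`munmap` directly. `write_is_visible` is a hypothetical name.

```rust
use std::ffi::c_void;
use std::fs::{self, OpenOptions};
use std::io;
use std::os::unix::io::AsRawFd;

// Linux values (assumption of this sketch).
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, off: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}

/// Writes one byte through a file mapping and reports whether the write is
/// visible when the file is read back with regular I/O.
fn write_is_visible(shared: bool) -> io::Result<bool> {
    const LEN: usize = 4096;
    let path = std::env::temp_dir().join(format!("mem-demo-{}-{}", std::process::id(), shared));
    let file = OpenOptions::new().read(true).write(true).create(true).truncate(true).open(&path)?;
    file.set_len(LEN as u64)?;
    let flags = if shared { MAP_SHARED } else { MAP_PRIVATE };
    // SAFETY: the descriptor is valid and the file is LEN bytes long.
    let addr = unsafe { mmap(std::ptr::null_mut(), LEN, PROT_READ | PROT_WRITE, flags, file.as_raw_fd(), 0) };
    if addr as isize == -1 {
        let err = io::Error::last_os_error();
        let _ = fs::remove_file(&path);
        return Err(err);
    }
    // SAFETY: addr points to at least LEN writable bytes.
    unsafe {
        *(addr as *mut u8) = 0xAB;
        munmap(addr, LEN);
    }
    // MAP_SHARED writes land in the page cache; MAP_PRIVATE writes go to a
    // private copy-on-write page and never reach the file.
    let contents = fs::read(&path)?;
    fs::remove_file(&path)?;
    Ok(contents[0] == 0xAB)
}
```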

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped with `mmap(2)` using
the `MAP_HUGETLB` flag and the flags encoding the huge page size. This
performs a memory mapping relying on the specified huge page size. If no huge
page size is supplied, the system's default huge page size is used.

By using huge pages, one can improve the overall performance of the VM,
assuming the guest will allocate huge pages as well. Another interesting use
case is VFIO, as huge pages speed up the VM's boot time by reducing the number
of IOMMU mappings.

The user is responsible for ensuring there are enough huge pages of the
specified size available for the VMM to use. Failing to do so may result in
unexpected VMM errors that don't obviously point to huge pages; for example,
an error with `ReadKernelImage` is common. If an unexplained error occurs with
`hugepages` enabled, check whether enough huge pages are available, or disable
the option.

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```
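
The sketch below follows the `mmap(2)` man page convention for requesting a specific huge page size: log2 of the page size is stored in the bits starting at `MAP_HUGE_SHIFT` (26). It also estimates whether enough huge pages are free by reading sysfs. It is an illustration under those assumptions, not Cloud Hypervisor's code path, which may set the page size differently. The helper names are hypothetical.

```rust
use std::fs;
use std::io;

// Linux values from mmap(2) (assumption of this sketch).
const MAP_HUGETLB: i32 = 0x40000;
const MAP_HUGE_SHIFT: u32 = 26;

/// Builds MAP_HUGETLB flags; with no explicit size, no size bits are set and
/// the kernel's default huge page size applies.
fn hugetlb_flags(hugepage_size: Option<u64>) -> Result<i32, String> {
    match hugepage_size {
        None => Ok(MAP_HUGETLB),
        Some(size) if size.is_power_of_two() => {
            // e.g. 2M = 2^21, so 21 is stored starting at bit 26.
            let log2 = size.trailing_zeros() as i32;
            Ok(MAP_HUGETLB | (log2 << MAP_HUGE_SHIFT))
        }
        Some(size) => Err(format!("huge page size {size} is not a power of two")),
    }
}

/// Number of huge pages needed to back `ram_size` bytes, rounded up.
fn pages_needed(ram_size: u64, hugepage_size: u64) -> u64 {
    (ram_size + hugepage_size - 1) / hugepage_size
}

/// Reads the free huge page count for one size, e.g. from
/// /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages.
fn free_hugepages(hugepage_size: u64) -> io::Result<u64> {
    let path = format!(
        "/sys/kernel/mm/hugepages/hugepages-{}kB/free_hugepages",
        hugepage_size / 1024
    );
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

For example, a 1G guest with 2M pages needs `pages_needed(1 << 30, 2 << 20)` = 512 pages. That number can be compared with `free_hugepages(2 << 20)` before booting the VM.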

### `prefault`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

By enabling prefault, all the required physical memory is allocated and its
page tables are created during the `mmap(2)` call. With physical memory already
allocated, fewer page faults occur while the VM is running, which can improve
performance.

Note that the VM will boot more slowly with `prefault` enabled, because
physical memory is allocated and page tables are created in advance, and the
full amount of physical memory is consumed right away.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, which takes precedence over the `prefault` value set in
the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```
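
The effect of `MAP_POPULATE` can be observed with `mincore(2)`, which reports which pages of a mapping are resident. This sketch assumes 64-bit Linux with glibc (`_SC_PAGESIZE` = 30) and declares the C functions directly. `resident_after_mmap` is a hypothetical helper.

```rust
use std::ffi::c_void;
use std::io;

// Linux/glibc values on x86_64 and aarch64 (assumption of this sketch).
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;
const MAP_POPULATE: i32 = 0x8000;
const SC_PAGESIZE: i32 = 30;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, off: i64) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
    fn mincore(addr: *mut c_void, len: usize, vec: *mut u8) -> i32;
    fn sysconf(name: i32) -> i64;
}

/// Maps `pages` anonymous pages, optionally with MAP_POPULATE, and returns how
/// many of them mincore(2) reports as resident right after mmap(2) returns.
fn resident_after_mmap(pages: usize, prefault: bool) -> io::Result<usize> {
    let page_size = unsafe { sysconf(SC_PAGESIZE) } as usize;
    let len = pages * page_size;
    let mut flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if prefault {
        flags |= MAP_POPULATE;
    }
    // SAFETY: anonymous mapping without an address hint; result checked below.
    let addr = unsafe { mmap(std::ptr::null_mut(), len, PROT_READ | PROT_WRITE, flags, -1, 0) };
    if addr as isize == -1 {
        return Err(io::Error::last_os_error());
    }
    let mut residency = vec![0u8; pages];
    // SAFETY: addr is page aligned and residency holds one byte per page.
    let rc = unsafe { mincore(addr, len, residency.as_mut_ptr()) };
    let err = io::Error::last_os_error();
    unsafe { munmap(addr, len) };
    if rc != 0 {
        return Err(err);
    }
    // Bit 0 of each byte is set when the corresponding page is resident.
    Ok(residency.iter().filter(|b| (**b & 1) == 1).count())
}
```

On a typical Linux host, calling it with `prefault = false` reports few or no resident pages. Untouched anonymous memory is only allocated on first access, and those deferred page faults are exactly what `prefault` moves to boot time.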

### `thp`

Specifies whether private anonymous memory for the guest (i.e. `shared=off`
and no backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)`,
indicating to the kernel that this memory may be backed with huge pages
transparently.

The use of transparent huge pages can improve the performance of the guest, as
there will be fewer virtualisation-related page faults. Unlike with
`hugepages=on`, a specific number of huge pages does not need to be reserved
in the kernel beforehand.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```
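
Whether the advice has any effect also depends on the host's transparent huge page mode. The mode is exposed in `/sys/kernel/mm/transparent_hugepage/enabled`, with the active value in brackets. The sketch below encodes the rule from this section together with that host check. The function names are hypothetical.

```rust
/// Extracts the active mode from the contents of
/// /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never".
fn active_thp_mode(sysfs_contents: &str) -> Option<&str> {
    sysfs_contents
        .split_whitespace()
        .find(|word| word.starts_with('[') && word.ends_with(']'))
        .map(|word| &word[1..word.len() - 1])
}

/// The rule described above: only private anonymous guest memory
/// (shared=off and no backing file) is labelled MADV_HUGEPAGE when thp=on.
fn should_advise_hugepage(thp: bool, shared: bool, has_backing_file: bool) -> bool {
    thp && !shared && !has_backing_file
}

/// In "madvise" mode only advised ranges may get transparent huge pages, in
/// "always" mode any anonymous range may, and in "never" mode none will.
fn thp_can_back_advised_memory(mode: &str) -> bool {
    mode == "madvise" || mode == "always"
}
```

On a real host, the mode string would come from `std::fs::read_to_string` on the sysfs path above.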

## Advanced Parameters

`MemoryZoneConfig`, known as `--memory-zone` from the CLI perspective, is a
power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.
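
The following sketch illustrates the `--memory size=0` requirement. It rejects a non-zero `--memory` size when zones are defined. Otherwise it derives the boot-time guest RAM from the zone sizes, ignoring hotplug. `Zone` is a hypothetical stand-in holding only the fields needed here.

```rust
/// Hypothetical stand-in for a memory zone with only the fields used below.
struct Zone {
    id: String,
    size: u64,
}

/// Rejects a non-zero `--memory` size when zones are present, and otherwise
/// returns the boot-time guest RAM as the sum of the zone sizes.
fn total_guest_ram(memory_size: u64, zones: &[Zone]) -> Result<u64, String> {
    if zones.is_empty() {
        return Ok(memory_size);
    }
    if memory_size != 0 {
        return Err("--memory-zone requires --memory size=0".to_string());
    }
    zones.iter().try_fold(0u64, |total, zone| {
        total
            .checked_add(zone.size)
            .ok_or_else(|| format!("zone {} overflows the total size", zone.id))
    })
}
```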

Each zone is given a list of options, which we detail in the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone with a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```
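
Uniqueness of identifiers can be checked with a set, as in this sketch. `check_unique_ids` is a hypothetical helper. The same idea applies to the `guest_numa_id` values of `--numa`.

```rust
use std::collections::HashSet;

/// Returns an error naming the first identifier that appears more than once.
fn check_unique_ids(ids: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for id in ids {
        // HashSet::insert returns false when the value was already present.
        if !seen.insert(*id) {
            return Err(format!("duplicate identifier {id:?}"));
        }
    }
    Ok(())
}
```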

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created and used as the backing file for
the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well-known file. In the context of the snapshot/restore feature, if
the provided path is a file, the snapshot operation will not copy the guest
RAM content for this specific memory zone. The user already has access to that
file, so copying it would duplicate data already stored on the filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```
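
The file/directory distinction can be sketched with the standard library alone. `std` offers no `O_TMPFILE` flag, so this sketch creates a file in the directory and unlinks it right away. That likewise leaves a file with no hard link, which lives only as long as its handle. The file naming scheme and `open_zone_backing` are hypothetical, and an existing file is used as-is.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// Opens the backing for a memory zone of `size` bytes.
/// Directory: create a file, drop its only hard link, and size it.
/// File: open it read/write and use it as-is.
fn open_zone_backing(path: &Path, size: u64) -> io::Result<File> {
    if path.is_dir() {
        // Hypothetical naming scheme for the short-lived directory entry.
        let tmp = path.join(format!("zone-backing-{}", std::process::id()));
        let file = OpenOptions::new().read(true).write(true).create_new(true).open(&tmp)?;
        fs::remove_file(&tmp)?; // the open handle keeps the data alive
        file.set_len(size)?;
        Ok(file)
    } else {
        OpenOptions::new().read(true).write(true).open(path)
    }
}
```

The returned handle's descriptor would then be passed to `mmap(2)`.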

### `shared`

Specifies whether the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they are driven by
standalone daemons that need access to the guest RAM content.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory zone must be created and mapped with `mmap(2)`
using the `MAP_HUGETLB` flag and the flags encoding the huge page size. This
performs a memory mapping relying on the specified huge page size. If no huge
page size is supplied, the system's default huge page size is used.

By using huge pages, one can improve the overall performance of the VM,
assuming the guest will allocate huge pages as well. Another interesting use
case is VFIO, as huge pages speed up the VM's boot time by reducing the number
of IOMMU mappings.

The user is responsible for ensuring there are enough huge pages of the
specified size available for the VMM to use. Failing to do so may result in
unexpected VMM errors that don't obviously point to huge pages; for example,
an error with `ReadKernelImage` is common. If an unexplained error occurs with
`hugepages` enabled, check whether enough huge pages are available, or disable
the option.

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Identifier of a NUMA node present on the host. This option lets the user pick
a specific NUMA node from which the memory must be allocated. After the memory
zone is mapped with `mmap(2)`, the NUMA policy for this memory mapping is
applied through `mbind(2)`, relying on the provided node identifier. If the
node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back VM memory with a specific type of
memory from the host. Assume a host has two types of memory, one slower than
the other, each related to a distinct NUMA node. One could then create a VM
with slower memory accesses by backing the entire guest RAM from the NUMA node
holding the slower memory.

This option also makes it possible to create a VM with non-uniform memory
accesses, as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```
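
Before `mbind(2)` is called, the node must exist on the host, and the policy is expressed as a node bit mask. The sketch below parses the node list format used by `/sys/devices/system/node/online` (for example `0-1`). It then builds the mask of `unsigned long` words, assumed to be 64 bits here, that `mbind(2)` takes together with a policy such as `MPOL_BIND`. The helper names are hypothetical, and the `mbind` call itself is left out.

```rust
use std::fs;
use std::io;

/// Parses a Linux node list such as "0-1,3" into node identifiers.
fn parse_node_list(list: &str) -> Result<Vec<u32>, String> {
    let mut nodes = Vec::new();
    for part in list.trim().split(',').filter(|p| !p.is_empty()) {
        let (start, end) = part.split_once('-').unwrap_or((part, part));
        let start: u32 = start.parse().map_err(|_| format!("bad node range {part:?}"))?;
        let end: u32 = end.parse().map_err(|_| format!("bad node range {part:?}"))?;
        if start > end {
            return Err(format!("reversed node range {part:?}"));
        }
        nodes.extend(start..=end);
    }
    Ok(nodes)
}

/// Builds a nodemask with only bit `node` set, stored as 64-bit words.
fn single_node_mask(node: u32) -> Vec<u64> {
    let mut mask = vec![0u64; node as usize / 64 + 1];
    mask[node as usize / 64] |= 1u64 << (node % 64);
    mask
}

/// Checks the requested host node against the list of online nodes.
fn host_node_exists(node: u32) -> io::Result<bool> {
    let online = fs::read_to_string("/sys/devices/system/node/online")?;
    Ok(parse_node_list(&online).map(|n| n.contains(&node)).unwrap_or(false))
}
```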

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must specify
`hotplug_method=virtio-mem` in the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at the VM's
boot. This option allows for starting a VM with a certain amount of memory that
can be reduced during runtime.

This is only valid when `hotplug_method` is `virtio-mem`, as it does not make
sense for the `acpi` use case: when using ACPI, the memory can't be reduced
after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```
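
At the memory zone level, the extra rule is that any resizing requires `hotplug_method=virtio-mem` on `--memory`. The following sketch implements these checks. The type and function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Hypothetical subset of a memory zone's hotplug settings.
struct ZoneHotplug {
    id: String,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
}

/// Validates a zone against the global `--memory` hotplug method.
fn validate_zone_hotplug(method: HotplugMethod, zone: &ZoneHotplug) -> Result<(), String> {
    let resizable = zone.hotplug_size.is_some() || zone.hotplugged_size.is_some();
    if resizable && method != HotplugMethod::VirtioMem {
        return Err(format!("zone {}: resizing requires hotplug_method=virtio-mem", zone.id));
    }
    if zone.hotplug_size == Some(0) || zone.hotplugged_size == Some(0) {
        return Err(format!("zone {}: a size of 0 is invalid", zone.id));
    }
    if let Some(plugged) = zone.hotplugged_size {
        match zone.hotplug_size {
            None => return Err(format!("zone {}: hotplugged_size requires hotplug_size", zone.id)),
            Some(max) if plugged > max => {
                return Err(format!("zone {}: hotplugged_size exceeds hotplug_size", zone.id))
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```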

### `prefault`

Specifies whether the memory zone must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

By enabling prefault, all the required physical memory is allocated and its
page tables are created during the `mmap(2)` call. With physical memory already
allocated, fewer page faults occur while the VM is running, which can improve
performance.

Note that the VM will boot more slowly with `prefault` enabled, because
physical memory is allocated and page tables are created in advance, and the
full amount of physical memory is consumed right away.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, which takes precedence over the `prefault` value set in
the memory configuration.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig`, known as `--numa` from the CLI perspective, has been introduced
to define a guest NUMA topology. It allows for a fine-grained description of
the CPUs and memory ranges associated with each NUMA node. Additionally, it
allows for specifying the distance between NUMA nodes.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the `-` syntax defines a contiguous range with `cpus=0-4`. The same example
could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one needs to
describe a list containing all CPUs from 0 to 99 and CPU 255, as it can simply
be described with `cpus=[0-99,255]`.

As soon as one describes a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```
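
The `cpus` syntax above, with ranges written using `-` and lists enclosed in `[` and `]`, can be parsed as in this sketch. `parse_cpus` is a hypothetical helper written from the rules in this section, not Cloud Hypervisor's parser.

```rust
/// Parses a `cpus` value such as "0-4" or "[1-3,7]" into vCPU identifiers.
fn parse_cpus(value: &str) -> Result<Vec<u8>, String> {
    let has_open = value.starts_with('[');
    let has_close = value.ends_with(']');
    let inner = match (has_open, has_close) {
        (true, true) => &value[1..value.len() - 1],
        (false, false) => value,
        _ => return Err(format!("unbalanced brackets in {value:?}")),
    };
    // A list of several values must be demarcated with [ and ].
    if !has_open && inner.contains(',') {
        return Err("a list of values must be enclosed in [ and ]".to_string());
    }
    let mut cpus = Vec::new();
    for item in inner.split(',') {
        let (start, end) = item.split_once('-').unwrap_or((item, item));
        let start: u8 = start.parse().map_err(|_| format!("invalid cpu {item:?}"))?;
        let end: u8 = end.parse().map_err(|_| format!("invalid cpu {item:?}"))?;
        if start > end {
            return Err(format!("invalid range {item:?}"));
        }
        cpus.extend(start..=end);
    }
    Ok(cpus)
}
```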

### `distances`

List of distances between the current NUMA node, referred to by
`guest_numa_id`, and the destination NUMA nodes listed along with the
distances. This option lets the user choose the distances between guest NUMA
nodes. This is important to provide an accurate description of the way
non-uniform memory accesses will perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits, representing the destination
NUMA node. The second value is an unsigned integer of 8 bits, representing the
distance between the current NUMA node and the destination NUMA node. The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Tuples are separated from
each other with the `,` separator.

As soon as one describes a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
a different distance from the others, it can be described with the following
example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```
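
Each `destination@distance` tuple maps onto a pair of integers with the widths given above (`u32` and `u8`). The following sketch is a hypothetical parser for the `distances` value.

```rust
/// Parses a `distances` value such as "[1@15,2@25]" into
/// (destination NUMA node, distance) pairs.
fn parse_distances(value: &str) -> Result<Vec<(u32, u8)>, String> {
    let inner = value
        .strip_prefix('[')
        .and_then(|v| v.strip_suffix(']'))
        .unwrap_or(value);
    inner
        .split(',')
        .map(|tuple| -> Result<(u32, u8), String> {
            let (dest, dist) = tuple
                .split_once('@')
                .ok_or_else(|| format!("expected destination@distance, got {tuple:?}"))?;
            let dest = dest.parse::<u32>().map_err(|_| format!("invalid node in {tuple:?}"))?;
            // Distances must fit in 8 bits.
            let dist = dist.parse::<u8>().map_err(|_| format!("invalid distance in {tuple:?}"))?;
            Ok((dest, dist))
        })
        .collect()
}
```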

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option is particularly powerful when combined with the `host_numa_node`
option of the `--memory-zone` parameter. Together they allow for creating a VM
with non-uniform memory accesses and letting the guest know about it. Memory
zones can then be exposed through different NUMA nodes, which can help the
guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one describes a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, and therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```
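
The single-node rule for memory zones can be checked by recording which guest NUMA node claims each zone. The sketch below also rejects references to unknown zones. The input shape and `check_zone_assignment` are hypothetical.

```rust
use std::collections::HashMap;

/// Input: one (guest_numa_id, memory zone ids) entry per `--numa` occurrence.
/// Returns the owning node of each zone, or an error if a zone is unknown or
/// claimed by more than one node.
fn check_zone_assignment(
    nodes: &[(u32, Vec<&str>)],
    zone_ids: &[&str],
) -> Result<HashMap<String, u32>, String> {
    let mut owner: HashMap<String, u32> = HashMap::new();
    for (node, zones) in nodes {
        for zone in zones {
            if !zone_ids.contains(zone) {
                return Err(format!("unknown memory zone {zone:?}"));
            }
            if let Some(prev) = owner.insert(zone.to_string(), *node) {
                return Err(format!("memory zone {zone:?} assigned to nodes {prev} and {node}"));
            }
        }
    }
    Ok(owner)
}
```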

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one describes a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it is tied to NUMA
node 0 by default. It is the user's responsibility to organize the NUMA nodes
correctly, so that the vCPUs and guest RAM that should be located on the same
NUMA node as the PCI bus end up on NUMA node 0.
