# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig`, known as `--memory` from the CLI perspective, is the easiest
way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    thp: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off,thp=on|off" [default: size=512M,thp=on]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```
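
Sizes such as `1G` are given with a unit suffix on the command line. The following Rust sketch shows one way such a string could be turned into the 64-bit byte count. The `parse_size` helper is hypothetical, and it assumes that the `K`, `M` and `G` suffixes are binary (1024-based) multiples.

```rust
/// Hypothetical helper: converts a size such as "512M" or "1G" into bytes.
/// Assumes binary suffixes: K = 2^10, M = 2^20, G = 2^30; no suffix means bytes.
fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    // Split off an optional one-character unit suffix.
    let (digits, shift) = match s.chars().last() {
        Some('K') => (&s[..s.len() - 1], 10),
        Some('M') => (&s[..s.len() - 1], 20),
        Some('G') => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let value: u64 = digits
        .parse()
        .map_err(|e| format!("invalid size {s:?}: {e}"))?;
    // Reject values that would not fit in 64 bits once scaled.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size {s:?} overflows u64"))
}
```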

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. When
this option is `true` or `on`, the pages are marked with `MADV_MERGEABLE`
through `madvise(2)` to let the host kernel know which pages are eligible for
being merged by the KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as merging identical pages reduces the amount of memory
consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```
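
Marking pages as mergeable only has an effect if the KSM daemon is actually running on the host. The sketch below, which is not part of Cloud Hypervisor, checks this through the standard Linux file `/sys/kernel/mm/ksm/run` (`0` stops the daemon, `1` runs it, `2` stops it and unmerges all pages). Both function names are hypothetical.

```rust
use std::fs;
use std::io;

/// Interprets the content of /sys/kernel/mm/ksm/run:
/// "0" = ksmd stopped, "1" = ksmd running, "2" = stopped and all pages unmerged.
fn ksm_is_merging(run_file_content: &str) -> bool {
    run_file_content.trim() == "1"
}

/// Hypothetical helper reading the KSM state of the current Linux host.
/// Returns an error if the file does not exist (e.g. KSM not built in).
fn host_ksm_is_merging() -> io::Result<bool> {
    let content = fs::read_to_string("/sys/kernel/mm/ksm/run")?;
    Ok(ksm_is_merging(&content))
}
```

On a host where the daemon is stopped, root can start it by writing `1` to the same file.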

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```
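
A minimal sketch of how the two accepted values of `hotplug_method` could be modelled in Rust, with `acpi` as the default. The enum is named after the `HotplugMethod` type referenced by `MemoryConfig`, but its definition here is an assumption, not the actual Cloud Hypervisor type.

```rust
use std::str::FromStr;

/// Mirrors the two documented values of `hotplug_method` (assumed definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum HotplugMethod {
    #[default]
    Acpi,
    VirtioMem,
}

impl FromStr for HotplugMethod {
    type Err = String;

    // Accepts exactly the strings used on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "acpi" => Ok(HotplugMethod::Acpi),
            "virtio-mem" => Ok(HotplugMethod::VirtioMem),
            other => Err(format!("unknown hotplug_method: {other}")),
        }
    }
}
```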

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem`, as it does not
make sense for the `acpi` use case: when using ACPI, the memory can't be
reduced after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```
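
The constraints described for `hotplug_size` and `hotplugged_size` can be summarised as a validation function. This is a sketch of the rules stated above, not Cloud Hypervisor's actual validation code; `validate_hotplug` and its error messages are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Checks the documented constraints between `hotplug_method`,
/// `hotplug_size` and `hotplugged_size`.
fn validate_hotplug(
    method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), String> {
    if hotplug_size == Some(0) {
        return Err("hotplug_size must not be 0".to_string());
    }
    if let Some(hotplugged) = hotplugged_size {
        if hotplugged == 0 {
            return Err("hotplugged_size must not be 0".to_string());
        }
        // ACPI hotplug cannot give memory back, so a pre-plugged amount is meaningless.
        if method != HotplugMethod::VirtioMem {
            return Err("hotplugged_size requires hotplug_method=virtio-mem".to_string());
        }
        match hotplug_size {
            None => return Err("hotplugged_size requires hotplug_size".to_string()),
            Some(max) if hotplugged > max => {
                return Err("hotplugged_size exceeds hotplug_size".to_string())
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```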

### `shared`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_SHARED`
flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

_Example_

```
--memory size=1G,shared=on
```
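
The choice between `MAP_SHARED` and `MAP_PRIVATE`, including the override when `hugepages=on`, can be sketched as follows. The flag values are the Linux ones from the kernel UAPI headers, and `sharing_flag` is a hypothetical helper rather than a Cloud Hypervisor function.

```rust
// mmap(2) flag values on Linux (see the kernel UAPI headers).
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;

/// Hypothetical helper choosing the sharing flag for guest RAM.
/// `hugepages=on` forces MAP_SHARED regardless of the `shared` option.
fn sharing_flag(shared: bool, hugepages: bool) -> i32 {
    if shared || hugepages {
        MAP_SHARED
    } else {
        MAP_PRIVATE
    }
}
```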

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped with `mmap(2)` using the
`MAP_HUGETLB` flag and the flag encoding the huge page size. This performs a
memory mapping relying on the specified huge page size. If no huge page size
is supplied, the system's default huge page size is used.

By using huge pages, one can improve the overall performance of the VM, assuming
the guest will allocate huge pages as well. Another interesting use case is
VFIO, as huge pages speed up the VM's boot time by reducing the number of IOMMU
mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; for example, a `ReadKernelImage` error is common. If an
unexpected error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```
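
Two calculations help when preparing a host for `hugepages=on`: encoding the requested page size into the `mmap(2)` flags (Linux stores log2 of the size shifted by `MAP_HUGE_SHIFT`, so `2M` gives the equivalent of `MAP_HUGE_2MB`), and counting how many huge pages the guest RAM needs. The sketch below assumes the Linux x86_64 flag values; both function names are hypothetical.

```rust
// Linux x86_64 values. The kernel's MAP_HUGE_* macros are unsigned, so u32 is
// used here to keep large sizes (e.g. 16G -> 34 << 26) from overflowing.
const MAP_HUGETLB: u32 = 0x40000;
const MAP_HUGE_SHIFT: u32 = 26;

/// Flags selecting a given huge page size, e.g. 2 MiB -> MAP_HUGETLB | MAP_HUGE_2MB.
/// Returns None when the size is not a power of two.
fn hugetlb_flags(hugepage_size: u64) -> Option<u32> {
    if !hugepage_size.is_power_of_two() {
        return None;
    }
    let log2 = hugepage_size.trailing_zeros();
    Some(MAP_HUGETLB | (log2 << MAP_HUGE_SHIFT))
}

/// Number of huge pages the host must provide to back `ram_size` bytes,
/// rounded up. `hugepage_size` must be non-zero.
fn hugepages_needed(ram_size: u64, hugepage_size: u64) -> u64 {
    (ram_size + hugepage_size - 1) / hugepage_size
}
```

For 2M pages, the result of `hugepages_needed` can be compared with `/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages` on the host before booting the VM.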

### `prefault`

Specifies if the memory must be mapped with `mmap(2)` using the `MAP_POPULATE`
flag.

By enabling prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated upfront,
fewer page faults occur at runtime, which improves performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance. The full specified
amount of physical memory is also consumed immediately.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, whose value overrides the `prefault` value of the memory
configuration.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```
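
A sketch of the two rules described above: the restore operation's `prefault` value takes precedence over the memory configuration, and prefaulting translates to adding `MAP_POPULATE` to the mapping flags. Modelling a regular boot as `restore_prefault == None` is an assumption of this example, as is the use of the Linux x86_64 value of `MAP_POPULATE`.

```rust
// MAP_POPULATE on Linux x86_64 (see the kernel UAPI headers).
const MAP_POPULATE: i32 = 0x8000;

/// Effective prefault setting. `restore_prefault` is `None` for a regular boot
/// and `Some(value)` when restoring, in which case it overrides the memory setting.
fn effective_prefault(memory_prefault: bool, restore_prefault: Option<bool>) -> bool {
    restore_prefault.unwrap_or(memory_prefault)
}

/// Adds MAP_POPULATE to the mmap(2) flags when prefaulting is requested.
fn with_prefault(flags: i32, prefault: bool) -> i32 {
    if prefault {
        flags | MAP_POPULATE
    } else {
        flags
    }
}
```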

### `thp`

Specifies if private anonymous memory for the guest (i.e. `shared=off` and no
backing file) should be labelled `MADV_HUGEPAGE` with `madvise(2)`, indicating
to the kernel that this memory may be backed with huge pages transparently.

The use of transparent huge pages can improve the performance of the guest, as
there will be fewer virtualisation-related page faults. Unlike with
`hugepages=on`, no specific number of huge pages needs to be allocated by the
kernel in advance.

By default this option is turned on.

_Example_

```
--memory size=1G,thp=on
```
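
The conditions under which guest RAM receives `madvise(2)` advice can be summarised in a small function. This sketch assumes that `hugepages=on` rules out the THP advice, since huge pages force a shared mapping and the memory is then no longer private anonymous. `advice_for_region` and the `Advice` enum are hypothetical.

```rust
/// madvise(2) advice relevant to guest RAM (Linux values in comments).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Advice {
    Mergeable, // MADV_MERGEABLE (12)
    HugePage,  // MADV_HUGEPAGE (14)
}

/// Hypothetical helper: which advice a guest RAM region would receive.
fn advice_for_region(
    mergeable: bool,
    thp: bool,
    shared: bool,
    hugepages: bool,
    has_backing_file: bool,
) -> Vec<Advice> {
    let mut advice = Vec::new();
    if mergeable {
        advice.push(Advice::Mergeable);
    }
    // THP advice only applies to private anonymous memory; hugepages=on
    // implies MAP_SHARED, so it is treated like shared=on here.
    let private_anonymous = !(shared || hugepages) && !has_backing_file;
    if thp && private_anonymous {
        advice.push(Advice::HugePage);
    }
    advice
}
```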

## Advanced Parameters

`MemoryZoneConfig`, known as `--memory-zone` from the CLI perspective, is a
power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.
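
Because the zones describe the whole guest RAM, the total amount of memory can be derived from them. The sketch below enforces the `--memory size=0` requirement and sums the zone sizes, ignoring hotplug. The `Zone` struct and `total_guest_ram` function are hypothetical simplifications of `MemoryZoneConfig`.

```rust
/// Hypothetical, simplified view of a memory zone.
struct Zone {
    id: String,
    size: u64,
}

/// Total guest RAM (ignoring hotplug). When zones are defined, `--memory`
/// must use size=0 and the RAM is assumed to be the sum of the zone sizes.
fn total_guest_ram(memory_size: u64, zones: &[Zone]) -> Result<u64, String> {
    if zones.is_empty() {
        return Ok(memory_size);
    }
    if memory_size != 0 {
        return Err("--memory-zone requires --memory size=0".to_string());
    }
    zones.iter().try_fold(0u64, |total, zone| {
        total
            .checked_add(zone.size)
            .ok_or_else(|| format!("zone {} overflows the total RAM size", zone.id))
    })
}
```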

Each zone is given a list of options, which are detailed in the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone with a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```
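
Checking that zone identifiers are unique only takes a set. A minimal sketch, with a hypothetical `find_duplicate_id` helper that reports the first identifier seen twice:

```rust
use std::collections::HashSet;

/// Hypothetical helper returning the first identifier that appears twice.
fn find_duplicate_id<'a>(ids: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    ids.iter().copied().find(|id| !seen.insert(*id))
}
```

The same idea applies to the `guest_numa_id` values given to `--numa`, which must also be unique.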

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. The file will be opened and used as
the backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well-known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone, since the user
already has access to it and copying it would duplicate data already stored on
the current filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```
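
The snapshot rule above can be expressed as a small predicate. This sketch assumes that "is a file" means a regular file as reported by `Path::is_file`; `snapshot_needs_copy` is hypothetical and not part of Cloud Hypervisor's API.

```rust
use std::path::Path;

/// Hypothetical helper: guest RAM content of a zone is copied into the
/// snapshot unless the zone is backed by a regular file the user already has.
fn snapshot_needs_copy(backing_file: Option<&Path>) -> bool {
    match backing_file {
        // A directory or a missing path is not a regular file: copy the content.
        Some(path) => !path.is_file(),
        // Anonymous memory only exists in the VMM: copy the content.
        None => true,
    }
}
```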

### `shared`

Specifies if the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

If `hugepages=on`, the value of this field is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory zone must be created and mapped with `mmap(2)` using
the `MAP_HUGETLB` flag and the flag encoding the huge page size. This performs
a memory mapping relying on the specified huge page size. If no huge page size
is supplied, the system's default huge page size is used.

By using huge pages, one can improve the overall performance of the VM, assuming
the guest will allocate huge pages as well. Another interesting use case is
VFIO, as huge pages speed up the VM's boot time by reducing the number of IOMMU
mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in unexpected
VMM behaviour; for example, a `ReadKernelImage` error is common. If an
unexpected error occurs with `hugepages` enabled, disable it or check whether
enough huge pages are available.

If `hugepages=on`, the value of `shared` is ignored, as huge pages always
require `MAP_SHARED`.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option lets the user pick
a specific NUMA node from which the memory must be allocated. After the memory
zone is mapped with `mmap(2)`, the NUMA policy for this memory mapping is
applied through `mbind(2)`, relying on the provided node identifier. If the
node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back VM memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses, as one could define a first memory zone backed by fast memory and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```
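
`mbind(2)` receives the target nodes as a bitmap (`nodemask`) plus the number of bits to consider (`maxnode`). The following sketch builds these arguments for a single host node; the system call itself (with a mode such as `MPOL_BIND`) is omitted. It assumes 64-bit mask words, and it adds one to `maxnode` because the Linux kernel decrements `maxnode` internally and would otherwise ignore the highest bit. `single_node_mask` is hypothetical.

```rust
/// Hypothetical helper building the `nodemask` and `maxnode` arguments of
/// mbind(2) for a single host NUMA node. Each u64 word holds 64 node bits,
/// with node 0 in the lowest bit of the first word.
fn single_node_mask(node: u32) -> (Vec<u64>, u64) {
    let word = (node / 64) as usize;
    let mut mask = vec![0u64; word + 1];
    mask[word] |= 1u64 << (node % 64);
    // The kernel decrements maxnode before reading the mask, so one extra
    // bit is added to make sure the last bit of the mask is taken into account.
    let maxnode = mask.len() as u64 * 64 + 1;
    (mask, maxnode)
}
```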

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```
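
Since zone hotplug relies on `virtio-mem`, a configuration can be rejected early when a zone sets `hotplug_size` while `--memory` uses another method. A minimal sketch under that rule, where zones are simplified to hypothetical `(id, hotplug_size)` pairs and `validate_zone_hotplug` is not a Cloud Hypervisor function:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Rejects zones whose `hotplug_size` is 0 or set without virtio-mem.
fn validate_zone_hotplug(
    method: HotplugMethod,
    zones: &[(&str, Option<u64>)],
) -> Result<(), String> {
    for (id, hotplug_size) in zones {
        match hotplug_size {
            None => {}
            Some(0) => return Err(format!("zone {id}: hotplug_size must not be 0")),
            Some(_) if method != HotplugMethod::VirtioMem => {
                return Err(format!(
                    "zone {id}: hotplug_size requires hotplug_method=virtio-mem"
                ))
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```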

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone when the VM
boots. This option allows for starting a VM with a certain amount of memory
that can be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem`, as it does not
make sense for the `acpi` use case: when using ACPI, the memory can't be
reduced after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies if the memory zone must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

By enabling prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated upfront,
fewer page faults occur at runtime, which improves performance.

Note that the VM boots more slowly with `prefault` enabled, because physical
memory is allocated and page tables are created in advance. The full specified
amount of physical memory is also consumed immediately.

This option only takes effect when the VM boots. The restore operation has its
own `prefault` option, whose value overrides the `prefault` value of the memory
zone configuration.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig`, known as `--numa` from the CLI perspective, has been introduced
to define a guest NUMA topology. It allows for a fine-grained description of
the CPUs and memory ranges associated with each NUMA node. Additionally, it
allows for specifying the distance between each pair of NUMA nodes.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one needs to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it can
simply be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```
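
The `cpus` syntax can be parsed by splitting on `,` and expanding each `start-end` range. The `parse_cpus` function below is a hypothetical sketch of that grammar, not Cloud Hypervisor's parser; it also accepts a single value or range without brackets, as in `cpus=0-4`.

```rust
/// Hypothetical parser for "0-4", "[0,1,2,3,4]" or "[0-99,255]".
/// Returns the vCPU ids in the order they are listed.
fn parse_cpus(value: &str) -> Result<Vec<u8>, String> {
    // Strip the optional '[' ... ']' list delimiters.
    let inner = value
        .strip_prefix('[')
        .and_then(|v| v.strip_suffix(']'))
        .unwrap_or(value);
    // Each vCPU id is an unsigned 8-bit integer.
    let parse = |s: &str| {
        s.trim()
            .parse::<u8>()
            .map_err(|e| format!("invalid vCPU id {s:?}: {e}"))
    };
    let mut cpus = Vec::new();
    for item in inner.split(',') {
        match item.split_once('-') {
            // "start-end" expands to every id in the inclusive range.
            Some((start, end)) => {
                let (start, end) = (parse(start)?, parse(end)?);
                if start > end {
                    return Err(format!("invalid range {item:?}"));
                }
                cpus.extend(start..=end);
            }
            None => cpus.push(parse(item)?),
        }
    }
    Ok(cpus)
}
```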

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with the distances. This option
lets the user choose the distances between guest NUMA nodes. This is important
to provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits, as it represents the destination
NUMA node. The second value is an unsigned integer of 8 bits, as it represents
the distance between the current NUMA node and the destination NUMA node. The
two values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
a different distance, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] --numa guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```
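
Each `distances` entry is a `destination@distance` tuple. A hypothetical `parse_distances` sketch that turns the bracketed list into `(u32, u8)` pairs, matching the value widths described above:

```rust
/// Hypothetical parser for "[1@15,2@25]": returns
/// (destination guest NUMA node, distance) pairs.
fn parse_distances(value: &str) -> Result<Vec<(u32, u8)>, String> {
    // Strip the optional '[' ... ']' list delimiters.
    let inner = value
        .strip_prefix('[')
        .and_then(|v| v.strip_suffix(']'))
        .unwrap_or(value);
    inner
        .split(',')
        .map(|tuple| -> Result<(u32, u8), String> {
            // "1@15" means: destination node 1 is at distance 15.
            let (dest_str, dist_str) = tuple
                .split_once('@')
                .ok_or_else(|| format!("expected <node>@<distance>, got {tuple:?}"))?;
            let dest = dest_str
                .trim()
                .parse::<u32>()
                .map_err(|e| format!("invalid node {dest_str:?}: {e}"))?;
            let dist = dist_str
                .trim()
                .parse::<u8>()
                .map_err(|e| format!("invalid distance {dist_str:?}: {e}"))?;
            Ok((dest, dist))
        })
        // Collecting into Result stops at the first malformed tuple.
        .collect()
}
```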

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and letting the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, and therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 --numa guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G --memory-zone id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] --numa guest_numa_id=1,memory_zones=mem1
```
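
The rule that a memory zone belongs to a single NUMA node can be checked by recording which node claims each zone. A minimal sketch, with a hypothetical `check_zone_ownership` function taking `(guest_numa_id, memory_zones)` pairs:

```rust
use std::collections::HashMap;

/// Fails if a memory zone is attached to more than one guest NUMA node.
fn check_zone_ownership(nodes: &[(u32, Vec<&str>)]) -> Result<(), String> {
    // Maps each memory zone identifier to the node that claimed it first.
    let mut owner: HashMap<&str, u32> = HashMap::new();
    for (node, zones) in nodes {
        for zone in zones {
            if let Some(prev) = owner.insert(*zone, *node) {
                return Err(format!(
                    "memory zone {zone} assigned to NUMA nodes {prev} and {node}"
                ));
            }
        }
    }
    Ok(())
}
```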

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 --numa guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes correctly, so that vCPUs and guest RAM that should be located on the same
NUMA node as the PCI bus end up on NUMA node 0.
579