# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig` or what is known as `--memory` from the CLI perspective is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    prefault: bool,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off" [default: size=512M]
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```
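
The examples in this document write sizes with a suffix (`512M`, `1G`) while the value is stored as a `u64`. The following is a minimal sketch of that conversion, assuming the `K`, `M` and `G` suffixes denote binary multiples. `parse_size` is a hypothetical helper written for illustration, not the parser Cloud Hypervisor uses.

```rust
/// Hypothetical helper: convert a size such as "512M" or "1G" into bytes.
/// Assumption: K, M and G are binary multiples (2^10, 2^20, 2^30).
fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    // Split off a trailing one-byte ASCII suffix, if any.
    let (digits, shift) = match s.chars().last() {
        Some('K') => (&s[..s.len() - 1], 10),
        Some('M') => (&s[..s.len() - 1], 20),
        Some('G') => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let value: u64 = digits
        .parse()
        .map_err(|e| format!("invalid size '{}': {}", s, e))?;
    // Reject values that do not fit in 64 bits after scaling.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size '{}' overflows u64", s))
}
```

With this sketch, `parse_size("1G")` yields `1 << 30` bytes, and a string without digits is rejected.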

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. In
case this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```
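
The constraints on `hotplug_size` and `hotplugged_size` described above can be collected into a single check. The sketch below only encodes the rules stated in this document. The `validate_hotplug` function and the `HotplugError` type are hypothetical names, and the `HotplugMethod` enum is redefined locally so the example is self-contained.

```rust
#[derive(Debug, PartialEq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Hypothetical error type, one variant per rule stated in the text.
#[derive(Debug, PartialEq)]
enum HotplugError {
    ZeroSize,
    HotpluggedRequiresVirtioMem,
    HotpluggedRequiresHotplugSize,
    HotpluggedExceedsHotplugSize,
}

fn validate_hotplug(
    method: &HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), HotplugError> {
    // A value of 0 is invalid for both options.
    if hotplug_size == Some(0) || hotplugged_size == Some(0) {
        return Err(HotplugError::ZeroSize);
    }
    if let Some(hotplugged) = hotplugged_size {
        // hotplugged_size is only meaningful with virtio-mem.
        if *method != HotplugMethod::VirtioMem {
            return Err(HotplugError::HotpluggedRequiresVirtioMem);
        }
        match hotplug_size {
            // hotplugged_size needs hotplug_size to be specified...
            None => return Err(HotplugError::HotpluggedRequiresHotplugSize),
            // ...and cannot exceed it.
            Some(max) if hotplugged > max => {
                return Err(HotplugError::HotpluggedExceedsHotplugSize)
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```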

### `shared`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped with `mmap(2)` using
`MAP_HUGETLB` and the size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO,
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour, e.g. an error with `ReadKernelImage` is common. If a strange error
occurs with `hugepages` enabled, disable the option or check whether there are
enough huge pages.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```

### `prefault`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
the number of page faults at runtime decreases, which improves performance.

Note that VM boot will be slower with `prefault` enabled, because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed quickly.

This option only takes effect at VM boot. There is also a `prefault` option for
restore, and its value overrides the `prefault` setting from `--memory`.

By default this option is turned off.

_Example_

```
--memory size=1G,prefault=on
```
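
The `shared`, `hugepages`, `hugepage_size` and `prefault` options each map to an `mmap(2)` flag, so their combination can be pictured as one flags value. The sketch below follows the description in this document rather than Cloud Hypervisor's actual code. `mmap_flags` is a hypothetical function, and the constants are the Linux values for x86_64 and other architectures using the generic ABI, which is an assumption since some architectures define them differently.

```rust
// Assumed values: Linux, x86_64 / generic ABI. Other architectures may differ.
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;
const MAP_POPULATE: i32 = 0x8000;
const MAP_HUGETLB: i32 = 0x40000;
const MAP_HUGE_SHIFT: u32 = 26;

/// Hypothetical helper: derive the mmap(2) flags from the memory options.
fn mmap_flags(shared: bool, hugepages: bool, hugepage_size: Option<u64>, prefault: bool) -> i32 {
    // shared=on selects MAP_SHARED, otherwise the mapping is private.
    let mut flags = if shared { MAP_SHARED } else { MAP_PRIVATE };
    if hugepages {
        flags |= MAP_HUGETLB;
        if let Some(size) = hugepage_size {
            // The page size is encoded as log2(size) above MAP_HUGE_SHIFT.
            // For a power of two, trailing_zeros() is exactly log2.
            flags |= (size.trailing_zeros() as i32) << MAP_HUGE_SHIFT;
        }
        // Without an explicit size, the system default huge page size applies.
    }
    if prefault {
        flags |= MAP_POPULATE;
    }
    flags
}
```

For `hugepage_size=2M` the encoded size field is `21 << 26`, since 2 MiB is 2^21 bytes.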

## Advanced Parameters

`MemoryZoneConfig` or what is known as `--memory-zone` from the CLI perspective
is a power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    prefault: bool,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>,prefault=on|off"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```
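
Two rules stated so far apply to the list of zones as a whole: zones require `--memory size=0`, and every `id` must be unique. A minimal sketch of such a check follows. `validate_zones` is a hypothetical function operating on bare identifiers, and the error messages are illustrative, not Cloud Hypervisor's.

```rust
use std::collections::HashSet;

/// Hypothetical check: `memory_size` is the value given to `--memory size=`,
/// `zone_ids` holds the `id` of every `--memory-zone` occurrence.
fn validate_zones(memory_size: u64, zone_ids: &[&str]) -> Result<(), String> {
    // Memory zones must be combined with `--memory size=0`.
    if !zone_ids.is_empty() && memory_size != 0 {
        return Err("memory zones require --memory size=0".to_string());
    }
    let mut seen = HashSet::new();
    for id in zone_ids {
        // HashSet::insert returns false when the value was already present.
        if !seen.insert(*id) {
            return Err(format!("duplicate memory zone id '{}'", id));
        }
    }
    Ok(())
}
```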

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies whether the memory zone must be mapped with `mmap(2)` using the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped with `mmap(2)` using
`MAP_HUGETLB` and the size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO,
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour, e.g. an error with `ReadKernelImage` is common. If a strange error
occurs with `hugepages` enabled, disable the option or check whether there are
enough huge pages.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on,hugepage_size=2M
```

### `host_numa_node`

Node identifier of a node present on the host. This option will let the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone has been mapped with `mmap(2)`, the NUMA policy for this memory
mapping will be applied through `mbind(2)`, relying on the provided node
identifier. If the node does not exist on the host, the call to `mbind(2)`
will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```
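
`mbind(2)` receives the target nodes as a bitmask in which bit N selects node N. As a rough illustration of how a single `host_numa_node` value could be turned into such a mask, here is a sketch assuming a 64-bit host where the mask is an array of 64-bit words. `node_mask` is a hypothetical helper, and the call to `mbind(2)` itself is left out.

```rust
/// Hypothetical helper: build a node bitmask with only bit `node` set.
/// Assumption: 64-bit words, matching `unsigned long` on a 64-bit Linux host.
fn node_mask(node: u32) -> Vec<u64> {
    // Index of the 64-bit word that holds the bit for this node.
    let word = (node / 64) as usize;
    // Allocate just enough words to reach that bit.
    let mut mask = vec![0u64; word + 1];
    mask[word] |= 1u64 << (node % 64);
    mask
}
```

For `host_numa_node=0` the mask is a single word with value `1`.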

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must specify
the `hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

### `prefault`

Specifies whether the memory must be mapped with `mmap(2)` using the
`MAP_POPULATE` flag.

By triggering prefault, one can allocate all required physical memory and create
its page tables while calling `mmap`. With physical memory allocated up front,
the number of page faults at runtime decreases, which improves performance.

Note that VM boot will be slower with `prefault` enabled, because physical
memory is allocated and page tables are created in advance, and physical memory
of the specified size will be consumed quickly.

This option only takes effect at VM boot. There is also a `prefault` option for
restore, and its value overrides the `prefault` setting of the memory zone.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,prefault=on
```

## NUMA settings

`NumaConfig` or what is known as `--numa` from the CLI perspective has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally, it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=[0,1,2,3,4]`.

A combination of both `-` and `,` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=[0-99,255]`.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
```
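
The `cpus` syntax (single values, `-` ranges, and `[...]` lists) can be expanded into the `Vec<u8>` held by `NumaConfig`. The sketch below is one possible reading of the rules above, not Cloud Hypervisor's parser. `parse_cpu_list` and `parse_cpu` are hypothetical helpers, and rejecting a comma-separated list without brackets is an interpretation of the "must be used" rule.

```rust
/// Hypothetical helper: expand "0-4", "[1-3,7]" or "[0-99,255]" into CPU ids.
fn parse_cpu_list(s: &str) -> Result<Vec<u8>, String> {
    let s = s.trim();
    // Brackets are required as soon as several values are listed.
    let inner = match s.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        Some(inner) => inner,
        None if s.contains(',') => {
            return Err(format!("list '{}' must be enclosed in [ ]", s))
        }
        None => s,
    };
    let mut cpus = Vec::new();
    for item in inner.split(',') {
        // An item is either a single id or an inclusive "start-end" range.
        let (start, end) = match item.split_once('-') {
            Some((a, b)) => (parse_cpu(a)?, parse_cpu(b)?),
            None => {
                let v = parse_cpu(item)?;
                (v, v)
            }
        };
        if start > end {
            return Err(format!("invalid range '{}'", item));
        }
        cpus.extend(start..=end);
    }
    Ok(cpus)
}

/// Each CPU id must fit in 8 bits, as stated above.
fn parse_cpu(s: &str) -> Result<u8, String> {
    s.trim()
        .parse::<u8>()
        .map_err(|e| format!("invalid CPU id '{}': {}", s, e))
}
```

Under this sketch, `cpus=0-4` and `cpus=[0,1,2,3,4]` expand to the same five identifiers.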

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The first
value is an unsigned integer of 32 bits as it represents the destination NUMA
node. The second value is an unsigned integer of 8 bits as it represents the
distance between the current NUMA node and the destination NUMA node. The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=[1@15,2@25] guest_numa_id=1,distances=[0@15,2@20] guest_numa_id=2,distances=[0@25,1@20]
```
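
The `destination@distance` tuples can be parsed into pairs of a 32-bit node identifier and an 8-bit distance. The following sketch assumes the same bracket rule as for other lists. `parse_distances` is a hypothetical helper and returns plain tuples instead of the `NumaDistance` type, whose definition is not shown in this document.

```rust
/// Hypothetical helper: parse "[1@15,2@25]" into (destination, distance) pairs.
fn parse_distances(s: &str) -> Result<Vec<(u32, u8)>, String> {
    let s = s.trim();
    // Brackets are required as soon as several tuples are listed.
    let inner = match s.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        Some(inner) => inner,
        None if s.contains(',') => {
            return Err(format!("list '{}' must be enclosed in [ ]", s))
        }
        None => s,
    };
    inner
        .split(',')
        .map(|item| -> Result<(u32, u8), String> {
            // Each tuple is written <destination>@<distance>.
            let (dest, dist) = item
                .split_once('@')
                .ok_or_else(|| format!("expected <node>@<distance>, got '{}'", item))?;
            let dest = dest
                .trim()
                .parse::<u32>()
                .map_err(|e| format!("invalid node '{}': {}", dest, e))?;
            let dist = dist
                .trim()
                .parse::<u8>()
                .map_err(|e| format!("invalid distance '{}': {}", dist, e))?;
            Ok((dest, dist))
        })
        .collect()
}
```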

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=[mem0,mem2] guest_numa_id=1,memory_zones=mem1
```
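
The rule that a memory zone must belong to a single NUMA node can be checked by recording an owner for each zone. This is a minimal sketch with a hypothetical `check_zone_ownership` function. Each entry pairs a `guest_numa_id` with its `memory_zones` list, and the error text is illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical check: map every zone to its NUMA node, failing if a zone
/// is listed more than once.
fn check_zone_ownership(nodes: &[(u32, Vec<&str>)]) -> Result<HashMap<String, u32>, String> {
    let mut owner: HashMap<String, u32> = HashMap::new();
    for (node_id, zones) in nodes {
        for zone in zones {
            // HashMap::insert returns the previous owner when the zone was
            // already recorded.
            if let Some(prev) = owner.insert(zone.to_string(), *node_id) {
                return Err(format!(
                    "memory zone '{}' is assigned to NUMA nodes {} and {}",
                    zone, prev, node_id
                ));
            }
        }
    }
    Ok(owner)
}
```

Applied to the example above, `mem0` and `mem2` map to node 0 and `mem1` to node 1, while the incorrect configuration that lists `mem0` under both nodes is rejected.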

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `,` separator.

As soon as one tries to describe a list of values, `[` and `]` must be used to
demarcate the list.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
the NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes correctly so that vCPUs and guest RAM which should be located on the same
NUMA node as the PCI bus end up on the NUMA node 0.