# Memory

Cloud-Hypervisor has many ways to expose memory to the guest VM. This document
explains what Cloud-Hypervisor is capable of and how it can be used to meet the
needs of very different use cases.

## Basic Parameters

`MemoryConfig`, known as `--memory` from the CLI perspective, is the easiest
way to get started with Cloud-Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```
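
To make the shape of the `--memory` argument concrete, here is a minimal sketch of how a `key=value,key=value` string could be split and how an `on|off` toggle could be read. The helpers `split_options` and `toggle` are hypothetical names used for illustration only; they are not Cloud-Hypervisor's actual parser.

```rust
use std::collections::HashMap;

/// Splits a `key=value,key=value` option string into a map.
/// Hypothetical helper for illustration, not the VMM's parser.
fn split_options(input: &str) -> Result<HashMap<String, String>, String> {
    let mut options = HashMap::new();
    for pair in input.split(',').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got `{pair}`"))?;
        // Reject an option that appears twice in the same string.
        if options.insert(key.to_string(), value.to_string()).is_some() {
            return Err(format!("option `{key}` given more than once"));
        }
    }
    Ok(options)
}

/// Reads an `on|off` toggle; a missing option is treated as `off`.
fn toggle(options: &HashMap<String, String>, key: &str) -> Result<bool, String> {
    match options.get(key).map(String::as_str) {
        None | Some("off") => Ok(false),
        Some("on") => Ok(true),
        Some(other) => Err(format!("`{key}` must be on or off, got `{other}`")),
    }
}
```

With this sketch, `split_options("size=1G,mergeable=on")` yields a map in which `toggle(&map, "mergeable")` is `Ok(true)` and `toggle(&map, "shared")` falls back to `Ok(false)`.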

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```
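
The example uses a suffixed value (`1G`) for a 64-bit byte count. The sketch below shows one way such a value could be turned into bytes. It assumes that `K`, `M` and `G` are binary multiples (1024-based) and that a bare number is a byte count; `parse_size` is a hypothetical helper, not the parser used by the VMM.

```rust
/// Parses a size such as `512M` or `1G` into a number of bytes.
/// Assumption: `K`, `M` and `G` are 1024-based multiples.
fn parse_size(text: &str) -> Result<u64, String> {
    let text = text.trim();
    // Separate the digits from an optional one-letter suffix.
    let (digits, shift) = match text.chars().last() {
        Some('K') => (&text[..text.len() - 1], 10),
        Some('M') => (&text[..text.len() - 1], 20),
        Some('G') => (&text[..text.len() - 1], 30),
        _ => (text, 0),
    };
    let value = digits
        .parse::<u64>()
        .map_err(|e| format!("invalid size `{text}`: {e}"))?;
    // Refuse values that would not fit in the 64-bit field.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size `{text}` does not fit in 64 bits"))
}
```

Under these assumptions `parse_size("1G")` returns `Ok(1073741824)`.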

### `mergeable`

Specifies whether the pages of the guest RAM must be marked as _mergeable_.
When this option is `true` or `on`, the pages are marked with `madvise(2)`
to let the host kernel know which pages are eligible for merging by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it can reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `shared`

Specifies whether the memory must be mapped through `mmap(2)` with the
`MAP_SHARED` flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies whether the memory must be created and mapped through `mmap(2)` with
the `MAP_HUGETLB` flag and a flag encoding the huge page size. The memory
mapping then relies on the specified huge page size. If no huge page size is
supplied, the system's default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO,
as hugepages speed up the VM's boot time by reducing the number of IOMMU
mappings.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour; an error with `ReadKernelImage` is common. If a strange error occurs
with `hugepages` enabled, check whether there are enough huge pages or disable
the option.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```
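
Since the user has to make sure enough huge pages are available, a quick calculation can help before booting the VM. The sketch below counts the pages needed for a given memory size and reads the free page count from text in the format of `/proc/meminfo`. Both function names are hypothetical, and `HugePages_Free` only describes the pool of the host's default huge page size.

```rust
/// Number of huge pages needed to back `memory_size` bytes, rounding up.
/// Returns `None` if `hugepage_size` is not a power of two.
fn required_huge_pages(memory_size: u64, hugepage_size: u64) -> Option<u64> {
    if !hugepage_size.is_power_of_two() {
        return None;
    }
    let full_pages = memory_size / hugepage_size;
    let partial_page = u64::from(memory_size % hugepage_size != 0);
    Some(full_pages + partial_page)
}

/// Extracts `HugePages_Free` from text in the format of `/proc/meminfo`.
fn free_huge_pages(meminfo: &str) -> Option<u64> {
    meminfo
        .lines()
        .find_map(|line| line.strip_prefix("HugePages_Free:"))
        .and_then(|rest| rest.trim().parse().ok())
}
```

For the example above, 1GiB of guest RAM backed by 2MiB pages needs 512 huge pages.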

### `hotplug_method`

Selects the mechanism used to add memory to, or remove memory from, a booted
VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```
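
The constraints on `hotplug_size` and `hotplugged_size` can be summarized as a small validation routine. This is only a sketch of the rules stated in this document; `validate_hotplug` is a hypothetical helper and the enum is a stand-in for the real `HotplugMethod` type.

```rust
/// Stand-in for the VMM's `HotplugMethod` type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Checks the hotplug rules described in this document.
fn validate_hotplug(
    method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), String> {
    if hotplug_size == Some(0) {
        return Err("hotplug_size must not be 0".into());
    }
    if let Some(hotplugged) = hotplugged_size {
        if hotplugged == 0 {
            return Err("hotplugged_size must not be 0".into());
        }
        if method != HotplugMethod::VirtioMem {
            return Err("hotplugged_size requires hotplug_method=virtio-mem".into());
        }
        match hotplug_size {
            None => return Err("hotplugged_size requires hotplug_size".into()),
            Some(max) if hotplugged > max => {
                return Err("hotplugged_size exceeds hotplug_size".into());
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```

The example above (`hotplug_size=1G,hotplugged_size=512M` with `virtio-mem`) passes these checks, while the same sizes with `hotplug_method=acpi` are rejected.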

## Advanced Parameters

`MemoryZoneConfig`, known as `--memory-zone` from the CLI perspective, is a
power user parameter. It allows for a full description of the guest RAM,
describing how every memory region is backed and exposed to the guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options, which are detailed in the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```
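
The uniqueness rule for zone identifiers can be illustrated with a short check. The sketch below uses a reduced, hypothetical `Zone` type holding only `id` and `size`, and assumes that with `--memory size=0` the guest RAM is the sum of all zone sizes.

```rust
use std::collections::HashSet;

/// Reduced stand-in for a memory zone: only the fields used here.
struct Zone {
    id: String,
    size: u64,
}

/// Rejects duplicate zone identifiers and returns the summed zone size.
fn total_zone_size(zones: &[Zone]) -> Result<u64, String> {
    let mut seen = HashSet::new();
    let mut total: u64 = 0;
    for zone in zones {
        // `insert` returns false when the identifier was already present.
        if !seen.insert(zone.id.as_str()) {
            return Err(format!("duplicate memory zone id `{}`", zone.id));
        }
        total = total
            .checked_add(zone.size)
            .ok_or_else(|| "total memory size overflows u64".to_string())?;
    }
    Ok(total)
}
```

Two zones `mem0` and `mem1` of 1GiB each add up to 2GiB, whereas declaring `mem0` twice is reported as an error.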

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well-known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not copy the guest
RAM content for this specific memory zone, since the user already has access to
it and a copy would duplicate data already stored on the current filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies whether the memory zone must be mapped through `mmap(2)` with the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages`

Specifies whether the memory zone must be mapped through `mmap(2)` with the
`MAP_HUGETLB` and `MAP_HUGE_2MB` flags. This performs a memory zone mapping
relying on 2MiB pages instead of the default 4KiB pages.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO,
as hugepages speed up the VM's boot time by reducing the number of IOMMU
mappings.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on
```
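
As a rough illustration of how the `shared` and `hugepages` options of a zone translate into `mmap(2)` flags, the sketch below combines the flags named in this document. The constants are the values Linux defines on x86_64 and are spelled out here as an assumption; other flags a real mapping would need are left out, and `zone_mmap_flags` is a hypothetical helper.

```rust
// Flag values as defined by Linux on x86_64 (assumption: other targets
// may use different values).
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;
const MAP_HUGETLB: i32 = 0x40000;
const MAP_HUGE_SHIFT: i32 = 26;
// The page size is encoded as log2(2MiB) = 21 in the upper flag bits.
const MAP_HUGE_2MB: i32 = 21 << MAP_HUGE_SHIFT;

/// Combines the sharing and huge page flags for one memory zone.
fn zone_mmap_flags(shared: bool, hugepages: bool) -> i32 {
    let mut flags = if shared { MAP_SHARED } else { MAP_PRIVATE };
    if hugepages {
        flags |= MAP_HUGETLB | MAP_HUGE_2MB;
    }
    flags
}
```

With both options left at their defaults, the result is just `MAP_PRIVATE`.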

### `host_numa_node`

Node identifier of a node present on the host. This option lets the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone has been mapped with `mmap(2)`, the NUMA policy for this memory
mapping is applied through `mbind(2)`, relying on the provided node identifier.
If the node does not exist on the host, the call to `mbind(2)` will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses, as one could define a first memory zone backed by fast memory and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```
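
`mbind(2)` receives the target nodes as a bit mask. The sketch below builds such a mask for a single host node, assuming that bit `n` stands for node `n` and that the mask is stored in 64-bit words, as on 64-bit Linux. `single_node_mask` is a hypothetical helper and the system call itself is not shown.

```rust
/// Builds a node mask with only `node` set, as a list of 64-bit words.
/// Assumption: bit `n` of the mask stands for host NUMA node `n`.
fn single_node_mask(node: u32) -> Vec<u64> {
    let word = (node / 64) as usize;
    let bit = node % 64;
    // Allocate just enough words to hold the requested bit.
    let mut mask = vec![0u64; word + 1];
    mask[word] = 1u64 << bit;
    mask
}
```

For `host_numa_node=0` the mask is a single word with its lowest bit set.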

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

## NUMA settings

`NumaConfig`, known as `--numa` from the CLI perspective, defines a guest NUMA
topology. It allows for a fine-grained description of the CPUs and memory
ranges associated with each NUMA node. Additionally, it allows for specifying
the distance between NUMA nodes.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=0:1:2:3:4`.

A combination of both `-` and `:` separators is useful when one needs to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it can
simply be described with `cpus=0-99:255`.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=1-3:7 guest_numa_id=1,cpus=0:4-6
```
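
The `-` and `:` syntax for CPU lists can be expanded with a few lines of code. The following is a sketch of that expansion, not the VMM's own parser; `parse_cpu_list` is a hypothetical helper that treats both ends of a range as inclusive, matching the `cpus=0-4` example.

```rust
/// Expands a CPU list such as `0-99:255` into individual CPU identifiers.
fn parse_cpu_list(text: &str) -> Result<Vec<u8>, String> {
    let mut cpus = Vec::new();
    for item in text.split(':') {
        // A single value is handled as a range of length one.
        let (start, end) = match item.split_once('-') {
            Some((start, end)) => (start, end),
            None => (item, item),
        };
        let first = start
            .parse::<u8>()
            .map_err(|e| format!("invalid CPU `{start}`: {e}"))?;
        let last = end
            .parse::<u8>()
            .map_err(|e| format!("invalid CPU `{end}`: {e}"))?;
        if first > last {
            return Err(format!("empty range `{item}`"));
        }
        cpus.extend(first..=last);
    }
    Ok(cpus)
}
```

With this sketch, `cpus=0-4` and `cpus=0:1:2:3:4` expand to the same list, and `cpus=0-99:255` expands to 101 entries.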

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with their distances. This option
lets the user choose the distances between guest NUMA nodes. This is important
to provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits representing the destination NUMA
node. The second value is an unsigned integer of 8 bits representing the
distance between the current NUMA node and the destination NUMA node. The two
values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Tuples are separated
from each other with the `:` separator.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=1@15:2@25 guest_numa_id=1,distances=0@15:2@20 guest_numa_id=2,distances=0@25:1@20
```
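
The `destination@distance` tuples can be decoded as sketched below. `parse_distances` is a hypothetical helper that follows the types given in this section (32-bit destination, 8-bit distance); it is not the VMM's actual parser.

```rust
/// Parses `destination@distance` tuples separated by `:`.
fn parse_distances(text: &str) -> Result<Vec<(u32, u8)>, String> {
    let mut distances = Vec::new();
    for tuple in text.split(':') {
        let (node, distance) = tuple
            .split_once('@')
            .ok_or_else(|| format!("expected destination@distance, got `{tuple}`"))?;
        let node = node
            .parse::<u32>()
            .map_err(|e| format!("invalid node in `{tuple}`: {e}"))?;
        let distance = distance
            .parse::<u8>()
            .map_err(|e| format!("invalid distance in `{tuple}`: {e}"))?;
        distances.push((node, distance));
    }
    Ok(distances)
}
```

For node 0 in the example, `distances=1@15:2@25` decodes to node 1 at distance 15 and node 2 at distance 25.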

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter, as it allows for
creating a VM with non-uniform memory accesses and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which can
help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `:` separator.

Note that a memory zone must belong to a single NUMA node. The following
configuration is incorrect, therefore not allowed:
`--numa guest_numa_id=0,memory_zones=mem0 guest_numa_id=1,memory_zones=mem0`

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=mem0:mem2 guest_numa_id=1,memory_zones=mem1
```
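
The rule that a memory zone belongs to a single NUMA node can be checked while building a zone-to-node map. The sketch below is an illustration only: `zone_to_node` is a hypothetical helper, and each input pair stands for one `guest_numa_id` together with its `memory_zones` list.

```rust
use std::collections::HashMap;

/// Maps each memory zone to its guest NUMA node and rejects any zone
/// that is listed more than once.
fn zone_to_node(nodes: &[(u32, Vec<&str>)]) -> Result<HashMap<String, u32>, String> {
    let mut owner = HashMap::new();
    for (node_id, zones) in nodes {
        for zone in zones {
            // `insert` returns the previous owner if the zone was already attached.
            if let Some(previous) = owner.insert(zone.to_string(), *node_id) {
                return Err(format!(
                    "memory zone `{zone}` is attached to nodes {previous} and {node_id}"
                ));
            }
        }
    }
    Ok(owner)
}
```

The valid example above maps `mem0` and `mem2` to node 0 and `mem1` to node 1, while attaching `mem0` to both nodes yields an error.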

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `:` separator.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=epc0:epc2
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
NUMA node 0 by default. It is the user's responsibility to organize the NUMA
nodes so that the vCPUs and guest RAM which should be located on the same
NUMA node as the PCI bus end up on NUMA node 0.
479