# Memory

Cloud Hypervisor has many ways to expose memory to the guest VM. This document
aims to explain what Cloud Hypervisor is capable of and how it can be used to
meet the needs of very different use cases.

## Basic Parameters

`MemoryConfig`, or what is known as `--memory` from the CLI perspective, is the
easiest way to get started with Cloud Hypervisor.

```rust
struct MemoryConfig {
    size: u64,
    mergeable: bool,
    shared: bool,
    hugepages: bool,
    hugepage_size: Option<u64>,
    hotplug_method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
    zones: Option<Vec<MemoryZoneConfig>>,
}
```

```
--memory <memory>	Memory parameters "size=<guest_memory_size>,mergeable=on|off,shared=on|off,hugepages=on|off,hotplug_method=acpi|virtio-mem,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```

### `size`

Size of the RAM in the guest VM.

This option is mandatory when using the `--memory` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=1G
```
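
To illustrate how a value such as `1G` could map onto the 64-bit integer
stored in `size`, here is a minimal sketch. The `parse_size` helper is
hypothetical, and it assumes that the `K`, `M` and `G` suffixes are binary
(1024-based) multiples; the actual parser may accept other forms.

```rust
/// Hypothetical helper: converts a size string such as "1G" or "512M" into
/// a number of bytes.
/// Assumption: K, M and G are 1024-based multiples, and a bare number is a
/// byte count.
fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    // Split off an optional one-letter suffix and pick the matching shift.
    let (digits, shift) = match s.chars().last() {
        Some('K') => (&s[..s.len() - 1], 10),
        Some('M') => (&s[..s.len() - 1], 20),
        Some('G') => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid size: {s}"))?;
    // Reject values that do not fit in 64 bits once scaled.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size overflows u64: {s}"))
}
```

Under these assumptions, `parse_size("1G")` yields `Ok(1073741824)`, while a
string without any digits is rejected.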

### `mergeable`

Specifies if the pages from the guest RAM must be marked as _mergeable_. In
case this option is `true` or `on`, the pages will be marked with `madvise(2)`
to let the host kernel know which pages are eligible for being merged by the
KSM daemon.

This option can be used when trying to reach a higher density of VMs running
on a single host, as it will reduce the amount of memory consumed by each VM.

By default this option is turned off.

_Example_

```
--memory size=1G,mergeable=on
```

### `shared`

Specifies if the memory must be mapped through `mmap(2)` with the `MAP_SHARED`
flag.

By sharing a memory mapping, one can share the guest RAM with other processes
running on the host. One can use this option when running vhost-user devices
as part of the VM device model, as they will be driven by standalone daemons
needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=1G,shared=on
```

### `hugepages` and `hugepage_size`

Specifies if the memory must be created and mapped through `mmap(2)` with the
`MAP_HUGETLB` and size flags. This performs a memory mapping relying on the
specified huge page size. If no huge page size is supplied, the system's
default huge page size is used.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

The user is responsible for ensuring there are sufficient huge pages of the
specified size for the VMM to use. Failure to do so may result in strange VMM
behaviour.

By default this option is turned off.

_Example_

```
--memory size=1G,hugepages=on,hugepage_size=2M
```
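
The way `shared`, `hugepages` and `hugepage_size` translate into `mmap(2)`
flags can be sketched as follows. This is only an illustration of the
description above, not the VMM's actual code: the `mmap_flags` helper is
hypothetical, and the constants are the values assumed for Linux on x86_64.

```rust
// Flag values assumed from the Linux x86_64 headers; other architectures
// may use different values.
const MAP_SHARED: i32 = 0x01;
const MAP_PRIVATE: i32 = 0x02;
const MAP_HUGETLB: i32 = 0x40000;
const MAP_HUGE_SHIFT: u32 = 26;

/// Hypothetical helper computing the mmap(2) flags for a given configuration.
fn mmap_flags(shared: bool, hugepages: bool, hugepage_size: Option<u64>) -> Result<i32, String> {
    // shared=on selects MAP_SHARED, the default is MAP_PRIVATE.
    let mut flags = if shared { MAP_SHARED } else { MAP_PRIVATE };
    if hugepages {
        flags |= MAP_HUGETLB;
        // Without an explicit size, the system's default huge page size applies.
        if let Some(size) = hugepage_size {
            if !size.is_power_of_two() {
                return Err(format!("huge page size {size} is not a power of two"));
            }
            // The kernel expects log2(size) encoded above MAP_HUGE_SHIFT,
            // e.g. 21 for 2MiB pages (this is what MAP_HUGE_2MB expands to).
            flags |= (size.trailing_zeros() as i32) << MAP_HUGE_SHIFT;
        }
    }
    Ok(flags)
}
```

With `hugepages=on,hugepage_size=2M`, the sketch returns
`MAP_PRIVATE | MAP_HUGETLB | (21 << 26)`, which matches a private mapping
using `MAP_HUGE_2MB`.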

### `hotplug_method`

Selects the way of adding and/or removing memory to/from a booted VM.

Possible values are `acpi` and `virtio-mem`. Default value is `acpi`.

_Example_

```
--memory size=1G,hotplug_method=acpi
```

### `hotplug_size`

Amount of memory that can be dynamically added to the VM.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to the VM at boot. This option
allows for starting a VM with a certain amount of memory that can be reduced
during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
```
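
The constraints tying `hotplug_method`, `hotplug_size` and `hotplugged_size`
together can be summarized with a small validation sketch. The enum below is a
stand-in for the real `HotplugMethod` type, and `validate_hotplug` is a
hypothetical helper; the checks performed by the VMM may differ in order and
wording.

```rust
/// Stand-in for the hotplug method selected through `hotplug_method`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Hypothetical helper applying the rules described in this document.
fn validate_hotplug(
    method: HotplugMethod,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
) -> Result<(), String> {
    // A value of 0 is invalid for hotplug_size.
    if hotplug_size == Some(0) {
        return Err("hotplug_size of 0 is invalid".to_string());
    }
    if let Some(hotplugged) = hotplugged_size {
        // A value of 0 is invalid for hotplugged_size as well.
        if hotplugged == 0 {
            return Err("hotplugged_size of 0 is invalid".to_string());
        }
        // Only virtio-mem can shrink memory after it has been extended.
        if method != HotplugMethod::VirtioMem {
            return Err("hotplugged_size requires hotplug_method=virtio-mem".to_string());
        }
        // hotplugged_size needs hotplug_size and cannot exceed it.
        match hotplug_size {
            None => return Err("hotplugged_size requires hotplug_size".to_string()),
            Some(max) if hotplugged > max => {
                return Err("hotplugged_size exceeds hotplug_size".to_string())
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```

The example above (`hotplug_size=1G,hotplugged_size=512M` with `virtio-mem`)
passes these checks, whereas the same sizes with `HotplugMethod::Acpi` are
rejected.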

## Advanced Parameters

`MemoryZoneConfig`, or what is known as `--memory-zone` from the CLI
perspective, is a power user parameter. It allows for a full description of
the guest RAM, describing how every memory region is backed and exposed to the
guest.

```rust
struct MemoryZoneConfig {
    id: String,
    size: u64,
    file: Option<PathBuf>,
    shared: bool,
    hugepages: bool,
    host_numa_node: Option<u32>,
    hotplug_size: Option<u64>,
    hotplugged_size: Option<u64>,
}
```

```
--memory-zone <memory-zone>	User defined memory zone parameters "size=<guest_memory_region_size>,file=<backing_file>,shared=on|off,hugepages=on|off,host_numa_node=<node_id>,id=<zone_identifier>,hotplug_size=<hotpluggable_memory_size>,hotplugged_size=<hotplugged_memory_size>"
```

This parameter expects one or more occurrences, allowing for a list of memory
zones to be defined. It must be used with `--memory size=0`, clearly indicating
that the memory will be described through advanced parameters.

Each zone is given a list of options which we detail through the following
sections.

### `id`

Memory zone identifier. This identifier must be unique, otherwise an error will
be returned.

This option is useful when referring to a memory zone previously created. In
particular, the `--numa` parameter can associate a memory zone to a specific
NUMA node based on the memory zone identifier.

This option is mandatory when using the `--memory-zone` parameter.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```
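
Two rules stated so far apply to any list of memory zones: `--memory size=0`
is required, and every zone identifier must be unique. A minimal sketch of
such a check, using a hypothetical `validate_zones` helper that only looks at
the identifiers:

```rust
use std::collections::HashSet;

/// Hypothetical helper: `memory_size` is the value given to `--memory size=`,
/// `zone_ids` are the `id` values of every `--memory-zone` occurrence.
fn validate_zones(memory_size: u64, zone_ids: &[&str]) -> Result<(), String> {
    // Memory zones must be combined with `--memory size=0`.
    if !zone_ids.is_empty() && memory_size != 0 {
        return Err("--memory size must be 0 when memory zones are defined".to_string());
    }
    // Each identifier may appear only once.
    let mut seen = HashSet::new();
    for id in zone_ids {
        if !seen.insert(*id) {
            return Err(format!("duplicate memory zone identifier: {id}"));
        }
    }
    Ok(())
}
```

Calling `validate_zones(0, &["mem0", "mem1"])` succeeds, while repeating
`mem0` or passing a non-zero memory size returns an error.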

### `size`

Size of the memory zone.

This option is mandatory when using the `--memory-zone` parameter.

Value is an unsigned integer of 64 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G
```

### `file`

Path to the file backing the memory zone. This can be either a file or a
directory. In case of a file, it will be opened and used as the backing file
for the `mmap(2)` operation. In case of a directory, a temporary file with no
hard link on the filesystem will be created. This file will be used as the
backing file for the `mmap(2)` operation.

This option can be particularly useful when trying to back a part of the guest
RAM with a well known file. In the context of the snapshot/restore feature, and
if the provided path is a file, the snapshot operation will not perform any
copy of the guest RAM content for this specific memory zone since the user has
access to it and it would duplicate data already stored on the current
filesystem.

Value is a string.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,file=/foo/bar
```

### `shared`

Specifies if the memory zone must be mapped through `mmap(2)` with the
`MAP_SHARED` flag.

By sharing a memory zone mapping, one can share part of the guest RAM with
other processes running on the host. One can use this option when running
vhost-user devices as part of the VM device model, as they will be driven
by standalone daemons needing access to the guest RAM content.

By default this option is turned off, which results in performing `mmap(2)`
with the `MAP_PRIVATE` flag.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,shared=on
```

### `hugepages`

Specifies if the memory zone must be mapped through `mmap(2)` with the
`MAP_HUGETLB` and `MAP_HUGE_2MB` flags. This performs a memory zone mapping
relying on 2MiB pages instead of the default 4KiB pages.

By using hugepages, one can improve the overall performance of the VM, assuming
the guest will allocate hugepages as well. Another interesting use case is VFIO
as it speeds up the VM's boot time since the number of IOMMU mappings is
reduced.

By default this option is turned off.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,hugepages=on
```

### `host_numa_node`

Node identifier of a node present on the host. This option will let the user
pick a specific NUMA node from which the memory must be allocated. After the
memory zone has been mapped with `mmap(2)`, the NUMA policy for this memory
mapping will be applied through `mbind(2)`, relying on the provided node
identifier. If the node does not exist on the host, the call to `mbind(2)`
will fail.

This option is useful when trying to back a VM memory with a specific type of
memory from the host. Assuming a host has two types of memory, with one slower
than the other, each related to a distinct NUMA node, one could create a VM
with slower memory accesses by backing the entire guest RAM from the furthest
NUMA node on the host.

This option also gives the opportunity to create a VM with non-uniform memory
accesses as one could define a first memory zone backed by fast memory, and a
second memory zone backed by slow memory.

Value is an unsigned integer of 32 bits.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G,host_numa_node=0
```

### `hotplug_size`

Amount of memory that can be dynamically added to the memory zone. Since
`virtio-mem` is the only way of resizing a memory zone, one must pass
`hotplug_method=virtio-mem` to the `--memory` parameter.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G
```

### `hotplugged_size`

Amount of memory that will be dynamically added to a memory zone at VM's boot.
This option allows for starting a VM with a certain amount of memory that can
be reduced during runtime.

This is only valid when the `hotplug_method` is `virtio-mem` as it does not
make sense for the `acpi` use case. When using ACPI, the memory can't be
resized after it has been extended.

This option is only valid when `hotplug_size` is specified, and its value can't
exceed the value of `hotplug_size`.

Value is an unsigned integer of 64 bits. A value of 0 is invalid.

_Example_

```
--memory size=0,hotplug_method=virtio-mem
--memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
```

## NUMA settings

`NumaConfig`, or what is known as `--numa` from the CLI perspective, has been
introduced to define a guest NUMA topology. It allows for a fine-grained
description of the CPUs and memory ranges associated with each NUMA node.
Additionally it allows for specifying the distance between each NUMA node.

```rust
struct NumaConfig {
    guest_numa_id: u32,
    cpus: Option<Vec<u8>>,
    distances: Option<Vec<NumaDistance>>,
    memory_zones: Option<Vec<String>>,
    sgx_epc_sections: Option<Vec<String>>,
}
```

```
--numa <numa>	Settings related to a given NUMA node "guest_numa_id=<node_id>,cpus=<cpus_id>,distances=<list_of_distances_to_destination_nodes>,memory_zones=<list_of_memory_zones>,sgx_epc_sections=<list_of_sgx_epc_sections>"
```

### `guest_numa_id`

Node identifier of a guest NUMA node. This identifier must be unique, otherwise
an error will be returned.

This option is mandatory when using the `--numa` parameter.

Value is an unsigned integer of 32 bits.

_Example_

```
--numa guest_numa_id=0
```

### `cpus`

List of virtual CPUs attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of CPUs which
must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

One can use this option for a fine-grained description of the NUMA topology
regarding the CPUs associated with it, which might help the guest run more
efficiently.

Multiple values can be provided to define the list. Each value is an unsigned
integer of 8 bits.

For instance, if one needs to attach all CPUs from 0 to 4 to a specific node,
the syntax using `-` will help define a contiguous range with `cpus=0-4`. The
same example could also be described with `cpus=0:1:2:3:4`.

A combination of both `-` and `:` separators is useful when one might need to
describe a list containing all CPUs from 0 to 99 and the CPU 255, as it could
simply be described with `cpus=0-99:255`.

_Example_

```
--cpus boot=8
--numa guest_numa_id=0,cpus=1-3:7 guest_numa_id=1,cpus=0:4-6
```
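
The `-` and `:` syntax described above can be expanded into a plain list of
CPU identifiers. The following sketch uses a hypothetical `parse_cpus` helper
and only illustrates the syntax as documented here; the parser used by the VMM
may behave differently on malformed input.

```rust
/// Hypothetical helper expanding a list such as "0-99:255" into CPU ids.
fn parse_cpus(list: &str) -> Result<Vec<u8>, String> {
    let mut cpus = Vec::new();
    // Items are separated by ':' and are either a single id or a "lo-hi" range.
    for item in list.split(':') {
        match item.split_once('-') {
            Some((lo, hi)) => {
                let lo: u8 = lo.parse().map_err(|_| format!("invalid CPU id in: {item}"))?;
                let hi: u8 = hi.parse().map_err(|_| format!("invalid CPU id in: {item}"))?;
                if lo > hi {
                    return Err(format!("empty CPU range: {item}"));
                }
                // Ranges are inclusive on both ends.
                cpus.extend(lo..=hi);
            }
            None => {
                let cpu: u8 = item.parse().map_err(|_| format!("invalid CPU id: {item}"))?;
                cpus.push(cpu);
            }
        }
    }
    Ok(cpus)
}
```

With this sketch, `cpus=0-4` and `cpus=0:1:2:3:4` expand to the same list, and
an identifier that does not fit in 8 bits (such as 256) is rejected.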

### `distances`

List of distances between the current NUMA node referred to by `guest_numa_id`
and the destination NUMA nodes listed along with distances. This option lets
the user choose the distances between guest NUMA nodes. This is important to
provide an accurate description of the way non-uniform memory accesses will
perform in the guest.

One or more tuples of two values must be provided through this option. The
first value is an unsigned integer of 32 bits as it represents the destination
NUMA node. The second value is an unsigned integer of 8 bits as it represents
the distance between the current NUMA node and the destination NUMA node. The
two values are separated by `@` (`value1@value2`), meaning the destination NUMA
node `value1` is located at a distance of `value2`. Each tuple is separated
from the others with the `:` separator.

For instance, if one wants to define 3 NUMA nodes, with each node located at
different distances, it can be described with the following example.

_Example_

```
--numa guest_numa_id=0,distances=1@15:2@25 guest_numa_id=1,distances=0@15:2@20 guest_numa_id=2,distances=0@25:1@20
```
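
The `value1@value2` tuples can be decoded into pairs of destination node and
distance. The sketch below relies on a hypothetical `parse_distances` helper
and simply follows the syntax documented above, with a 32-bit destination and
an 8-bit distance.

```rust
/// Hypothetical helper decoding a list such as "1@15:2@25" into
/// (destination node, distance) pairs.
fn parse_distances(list: &str) -> Result<Vec<(u32, u8)>, String> {
    list.split(':')
        .map(|tuple| -> Result<(u32, u8), String> {
            // Each tuple is "<destination>@<distance>".
            let (dest, dist) = tuple
                .split_once('@')
                .ok_or_else(|| format!("missing '@' in: {tuple}"))?;
            let dest: u32 = dest
                .parse()
                .map_err(|_| format!("invalid destination node in: {tuple}"))?;
            let dist: u8 = dist
                .parse()
                .map_err(|_| format!("invalid distance in: {tuple}"))?;
            Ok((dest, dist))
        })
        .collect()
}
```

For the first node of the example, `parse_distances("1@15:2@25")` returns the
pairs `(1, 15)` and `(2, 25)`.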

### `memory_zones`

List of memory zones attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of memory ranges
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

This option can be very useful and powerful when combined with the
`host_numa_node` option from the `--memory-zone` parameter as it allows for
creating a VM with non-uniform memory accesses, and lets the guest know about
it. It allows for exposing memory zones through different NUMA nodes, which
can help the guest workload run more efficiently.

Multiple values can be provided to define the list. Each value is a string
referring to an existing memory zone identifier. Values are separated from
each other with the `:` separator.

_Example_

```
--memory size=0
--memory-zone id=mem0,size=1G id=mem1,size=1G id=mem2,size=1G
--numa guest_numa_id=0,memory_zones=mem0:mem2 guest_numa_id=1,memory_zones=mem1
```
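
Since every value must refer to an existing memory zone, and every
`guest_numa_id` must be unique, a configuration can be cross-checked before
use. The sketch below is a hypothetical `check_numa_nodes` helper operating on
plain identifiers; it is not the validation code of the VMM.

```rust
use std::collections::HashSet;

/// Hypothetical helper: `zone_ids` lists the ids given to `--memory-zone`,
/// `numa_nodes` pairs each `guest_numa_id` with its `memory_zones` list.
fn check_numa_nodes(zone_ids: &[&str], numa_nodes: &[(u32, Vec<&str>)]) -> Result<(), String> {
    let known: HashSet<&str> = zone_ids.iter().copied().collect();
    let mut seen_nodes = HashSet::new();
    for (node_id, zones) in numa_nodes {
        // Guest NUMA node identifiers must be unique.
        if !seen_nodes.insert(*node_id) {
            return Err(format!("duplicate guest_numa_id: {node_id}"));
        }
        // Every referenced zone must have been defined.
        for zone in zones {
            if !known.contains(zone) {
                return Err(format!("unknown memory zone '{zone}' on node {node_id}"));
            }
        }
    }
    Ok(())
}
```

The example above passes this check, while a reference to an undefined zone
such as `mem3` would be reported as an error.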

### `sgx_epc_sections`

List of SGX EPC sections attached to the guest NUMA node identified by the
`guest_numa_id` option. This allows for describing a list of SGX EPC sections
which must be seen by the guest as belonging to the NUMA node `guest_numa_id`.

Multiple values can be provided to define the list. Each value is a string
referring to an existing SGX EPC section identifier. Values are separated from
each other with the `:` separator.

_Example_

```
--sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
--numa guest_numa_id=0,sgx_epc_sections=epc1 guest_numa_id=1,sgx_epc_sections=epc0:epc2
```

### PCI bus

Cloud Hypervisor supports only one PCI bus, which is why it has been tied to
the NUMA node 0 by default. It is the user's responsibility to organize the
NUMA nodes correctly so that vCPUs and guest RAM which should be located on the
same NUMA node as the PCI bus end up on the NUMA node 0.