QEMU Virtual NVDIMM
===================

This document explains the usage of the virtual NVDIMM (vNVDIMM) feature,
which has been available since QEMU v2.6.0.

QEMU currently implements only the persistent memory mode of the vNVDIMM
device, not the block window mode.

Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup time is to use the
following command line options:

 -machine pc,nvdimm
 -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
 -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
 -device nvdimm,id=nvdimm1,memdev=mem1

Where,

 - the "nvdimm" machine option enables the vNVDIMM feature.

 - "slots=$N" should be equal to or larger than the total number of
   normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.

 - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
   of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
   >= $RAM_SIZE + $NVDIMM_SIZE here.

 - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
   creates a backend storage of size $NVDIMM_SIZE on the file $PATH. All
   accesses to the virtual NVDIMM device go to the file $PATH.

   "share=on/off" controls the visibility of guest writes. If
   "share=on", then guest writes will be applied to the backend
   file. If another guest uses the same backend file with option
   "share=on", then the above writes will be visible to it as well. If
   "share=off", then guest writes won't be applied to the backend
   file and thus will be invisible to other guests.

 - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
   device whose storage is provided by the above memory backend device.

Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" are provided.

With the above command line options, if the guest OS has the proper NVDIMM
driver, it should be able to detect an NVDIMM device which is in the
persistent memory mode and whose size is $NVDIMM_SIZE.
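
As an illustration of how the options above combine when several
"-object"/"-device" pairs are needed, the following Python sketch assembles
the option list. The helper name nvdimm_args and the mem<i>/nvdimm<i> id
scheme are assumptions made for this example; they are not part of QEMU.

```python
def nvdimm_args(ram_size, slots, maxmem, backends):
    """Assemble QEMU options for one or more file-backed vNVDIMM devices.

    backends is a list of (path, size) pairs. The ids mem1/nvdimm1,
    mem2/nvdimm2, ... are a naming convention chosen for this sketch.
    """
    args = [
        "-machine", "pc,nvdimm",
        "-m", f"{ram_size},slots={slots},maxmem={maxmem}",
    ]
    for i, (path, size) in enumerate(backends, start=1):
        # One "-object" (the backend) and one "-device" (the vNVDIMM) per pair.
        args += [
            "-object",
            f"memory-backend-file,id=mem{i},share=on,mem-path={path},size={size}",
            "-device",
            f"nvdimm,id=nvdimm{i},memdev=mem{i}",
        ]
    return args
```

The returned list can be appended to whatever QEMU invocation is already in
use; the caller remains responsible for choosing slots and maxmem values that
satisfy the constraints described above.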

Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size" option,
   QEMU will truncate the backend file by ftruncate(2), which will
   corrupt the existing data in the backend file, especially in the
   shrink case.

   QEMU v2.8.0 and later check the backend file size against the "size"
   option. If they do not match, QEMU will report an error and abort in
   order to avoid data corruption.

2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
   option of memory-backend-file, e.g. 4KB alignment on x86.  However,
   QEMU v2.7.0 puts an additional alignment requirement, which may
   require a larger value than the basic one, e.g. 2MB on x86. This
   change breaks the usage of memory-backend-file that only satisfies
   the basic alignment.

   QEMU v2.8.0 and later remove the additional alignment on non-s390x
   architectures, so the broken memory-backend-file can work again.
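
Given note 1, it can be useful to compare the backend file size with the
intended "size" value before starting QEMU. The following Python sketch does
so for a regular file. The helper names are hypothetical, and parse_size
handles only the K, M and G suffixes (as binary multiples), which is a
simplification of what QEMU accepts.

```python
import os

# Binary multiples for the size suffixes handled by this sketch.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as "4G" or "128K" to a number of bytes."""
    if text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)  # no suffix: treated as bytes in this sketch


def backend_size_matches(path, size_bytes):
    """Return True if the regular file at path is exactly size_bytes long."""
    return os.path.getsize(path) == size_bytes
```

This check is only meaningful for regular files. It does not apply to
character devices such as /dev/dax0.0, for which os.path.getsize() does not
report the device capacity.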

Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices.
To enable labels on a vNVDIMM device, users can simply add the
"label-size=$SZ" option to "-device nvdimm", e.g.

 -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimal label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of backend storage.
   If a memory backend file, which was previously used as the backend
   of a vNVDIMM device without labels, is now used for a vNVDIMM
   device with labels, the data in the label area at the end of the file
   will be inaccessible to the guest. If any useful data (e.g. the
   metadata of the file system) was stored there, the latter usage
   may result in guest data corruption (e.g. breakage of the guest file
   system).
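
To see which part of an existing backend file would become the label area,
the byte range can be computed from the two sizes. The following Python
sketch follows the layout described in note 2 (labels at the end of the
backend storage). The function name is hypothetical, and the check that the
label area must be smaller than the backend is an assumption of this sketch.

```python
KIB = 1024
MIN_LABEL_SIZE = 128 * KIB  # minimal label size stated in note 1


def label_area(backend_size, label_size):
    """Return (start, end) byte offsets of the label area, end exclusive."""
    if label_size < MIN_LABEL_SIZE:
        raise ValueError("label size must be at least 128KB")
    if label_size >= backend_size:
        # Assumption of this sketch: some space must remain for guest data.
        raise ValueError("label size must be smaller than the backend size")
    start = backend_size - label_size
    return start, backend_size
```

Any data stored at offsets from start up to the end of the file before labels
were enabled falls into the region that the guest can no longer access as
ordinary storage.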

Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM
devices. Similarly to RAM hotplug, vNVDIMM hotplug is
accomplished by two monitor commands, "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

 (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
 (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure the memory option "-m ...,slots=N" specifies
   enough slots, i.e.
     N >= number of RAM devices +
          number of statically plugged vNVDIMM devices +
          number of hotplugged vNVDIMM devices

2. A similar constraint applies to the memory option "-m ...,maxmem=M", i.e.
     M >= size of RAM devices +
          size of statically plugged vNVDIMM devices +
          size of hotplugged vNVDIMM devices
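
The two inequalities above can be evaluated ahead of time for a planned
configuration. The following Python sketch computes the smallest N and M that
satisfy them. The function name and the use of byte counts for all sizes are
assumptions made for the example.

```python
GIB = 1024 ** 3


def required_slots_and_maxmem(ram_sizes, static_nvdimm_sizes, hotplug_nvdimm_sizes):
    """Return the minimal (N, M) for "-m ...,slots=N,maxmem=M".

    Each argument is a list of device sizes in bytes. N counts the devices
    and M sums their sizes, following the two notes above.
    """
    all_sizes = (
        list(ram_sizes) + list(static_nvdimm_sizes) + list(hotplug_nvdimm_sizes)
    )
    return len(all_sizes), sum(all_sizes)
```

For instance, with one 4GB RAM device, one statically plugged 4GB vNVDIMM
device and one 4GB vNVDIMM device to be hotplugged later,
required_slots_and_maxmem([4 * GIB], [4 * GIB], [4 * GIB]) yields 3 slots and
12GB, so "slots" must be at least 3 and "maxmem" at least 12G.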

Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. In that case, QEMU v2.12.0 and later provide the 'align' option of
memory-backend-file to allow users to specify the proper alignment.

For example, device DAX requires 2 MB alignment, so the following
QEMU command line options can be used to make it (/dev/dax0.0) the
backend of a vNVDIMM:

 -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
 -device nvdimm,id=nvdimm1,memdev=mem1

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
currently the only one that can guarantee guest write persistence
is device DAX on a real NVDIMM device (e.g., /dev/dax0.0), to
which guest accesses do not involve any host-side kernel cache.

When using other types of backends, it is suggested to set the 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
guest NVDIMM region mapping structure.  This unarmed flag indicates to
guest software that this vNVDIMM device contains a region that cannot
accept persistent writes. As a result, for example, the guest Linux
NVDIMM driver marks such a vNVDIMM device as read-only.

Platform Capabilities
---------------------

ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
which allows the platform to communicate what features it supports related to
NVDIMM data durability.  Users can provide a capabilities value to a guest via
the optional "nvdimm-cap" machine command line option:

    -machine pc,accel=kvm,nvdimm,nvdimm-cap=2

This "nvdimm-cap" field is an integer, and is the combined value of the
various capability bits defined in table 5-137 of the ACPI 6.2 Errata A spec.

Here is a quick summary of the three bits that are defined as of that spec:

Bit[0] - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
Bit[1] - Memory Controller Flush to NVDIMM Durability on Power Loss Capable.
         Note: If bit 0 is set to 1 then this bit shall be set to 1 as well.
Bit[2] - Byte Addressable Persistent Memory Hardware Mirroring Capable.

So, a "nvdimm-cap" value of 2 would mean that the platform supports Memory
Controller Flush on Power Loss, a value of 3 would mean that the platform
supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc.
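
The mapping between an "nvdimm-cap" integer and the three bits listed above
can be sketched in Python as follows. The function names are hypothetical,
and the consistency check only encodes the note on Bit[1] (bit 0 set implies
bit 1 set); it is an illustration, not something QEMU is known to enforce.

```python
# Bit positions and names as summarized in the list above.
CAPABILITIES = {
    0: "CPU Cache Flush to NVDIMM Durability on Power Loss Capable",
    1: "Memory Controller Flush to NVDIMM Durability on Power Loss Capable",
    2: "Byte Addressable Persistent Memory Hardware Mirroring Capable",
}


def decode_nvdimm_cap(value):
    """Return the names of the capability bits set in value, lowest bit first."""
    return [name for bit, name in CAPABILITIES.items() if value & (1 << bit)]


def is_consistent(value):
    """Return True unless bit 0 is set while bit 1 is clear."""
    return not (value & 0b01) or bool(value & 0b10)
```

With these helpers, decode_nvdimm_cap(3) lists the CPU Cache Flush and Memory
Controller Flush capabilities, while is_consistent(1) is False because bit 0
is set without bit 1.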

For a complete list of the flags available and for more detailed descriptions,
please consult the ACPI spec.