CheckPoint and Restart (CPR)
============================

CPR is the umbrella name for a set of migration modes in which the
VM is migrated to a new QEMU instance on the same host.  It is
intended for use when the goal is to update host software components
that run the VM, such as QEMU or even the host kernel.  At this time,
the cpr-reboot and cpr-transfer modes are available.

Because QEMU is restarted on the same host, with access to the same
local devices, CPR is allowed in certain cases where normal migration
would be blocked.  However, the user must not modify the contents of
guest block devices between quitting old QEMU and starting new QEMU.

CPR unconditionally stops VM execution before memory is saved, and
thus does not depend on any form of dirty page tracking.

cpr-reboot mode
---------------

In this mode, QEMU stops the VM and writes VM state to the migration
URI, which will typically be a file.  After quitting QEMU, the user
resumes by running QEMU with the ``-incoming`` option.  Because the
old and new QEMU instances are not active concurrently, the URI cannot
be a type that streams data from one instance to the other.

Guest RAM can be saved in place if backed by shared memory, or can be
copied to a file.  The former is more efficient and is therefore
preferred.

After state and memory are saved, the user may update userland host
software before restarting QEMU and resuming the VM.  Further, if
the RAM is backed by persistent shared memory, such as a DAX device,
then the user may reboot to a new host kernel before restarting QEMU.

This mode supports VFIO devices provided the user first puts the
guest in the suspended runstate, for example by issuing the
``guest-suspend-ram`` command to the QEMU guest agent.  The agent
must be pre-installed in the guest, and the guest must support
suspend to RAM.  Beware that suspension can take a few seconds, so
the user should poll until the VM reaches the suspended state before
proceeding with the CPR operation.
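
The polling step can be automated from a management script.  The sketch
below is illustrative only and is not part of QEMU: it assumes the caller
supplies a ``query_status`` callable that sends the QMP ``query-status``
command and returns the ``status`` field of the reply (for example
``"running"`` or ``"suspended"``).  The helper name ``wait_for_status`` and
its timeout and interval defaults are hypothetical.

.. code-block:: python

  import time
  from typing import Callable


  def wait_for_status(query_status: Callable[[], str], want: str = "suspended",
                      timeout: float = 30.0, interval: float = 0.5) -> bool:
      """Poll query_status() until it returns `want` or `timeout` expires.

      query_status is a hypothetical caller-supplied callable, e.g. one that
      sends {"execute": "query-status"} over QMP and returns
      reply["return"]["status"].
      """
      deadline = time.monotonic() + timeout
      while True:
          if query_status() == want:
              return True
          if time.monotonic() >= deadline:
              return False
          time.sleep(interval)


  if __name__ == "__main__":
      # Stand-in for a real QMP connection: suspended on the third poll.
      replies = iter(["running", "running", "suspended"])
      print(wait_for_status(lambda: next(replies), interval=0.01))  # prints True

Issue the ``migrate`` command only after the helper returns ``True``; a
``False`` result means the guest did not reach the suspended state within
the timeout.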

Usage
^^^^^

It is recommended that guest RAM be backed with some type of shared
memory, such as ``memory-backend-file,share=on``, and that the
``x-ignore-shared`` capability be set.  This combination allows memory
to be saved in place.  Otherwise, after QEMU stops the VM, all guest
RAM is copied to the migration URI.

Outgoing:
  * Set the migration mode parameter to ``cpr-reboot``.
  * Set the ``x-ignore-shared`` capability if desired.
  * Issue the ``migrate`` command.  It is recommended that the URI be
    of ``file`` type, but one can use other types such as ``exec``,
    provided the command captures all the data from the outgoing side
    and provides all the data to the incoming side.
  * Quit when QEMU reaches the postmigrate state.

Incoming:
  * Start QEMU with the ``-incoming defer`` option.
  * Set the migration mode parameter to ``cpr-reboot``.
  * Set the ``x-ignore-shared`` capability if desired.
  * Issue the ``migrate-incoming`` command.
  * If the VM was running when the outgoing ``migrate`` command was
    issued, then QEMU automatically resumes VM execution.
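
The same steps can be driven over QMP instead of HMP.  As a sketch,
assuming the QMP commands ``migrate-set-parameters``,
``migrate-set-capabilities``, ``migrate`` and ``migrate-incoming`` with a
``uri`` argument, the hypothetical helper below builds the messages for
either side.  It does not open a QMP connection, and the
``file:vm.state`` default is only a placeholder.

.. code-block:: python

  import json


  def cpr_reboot_commands(uri: str = "file:vm.state",
                          ignore_shared: bool = True,
                          incoming: bool = False) -> list:
      """Return the QMP messages for one side of a cpr-reboot operation.

      The outgoing side ends with 'migrate'; the incoming side, started
      with '-incoming defer', ends with 'migrate-incoming' on the same URI.
      """
      cmds = [{"execute": "migrate-set-parameters",
               "arguments": {"mode": "cpr-reboot"}}]
      if ignore_shared:
          cmds.append({"execute": "migrate-set-capabilities",
                       "arguments": {"capabilities": [
                           {"capability": "x-ignore-shared", "state": True}]}})
      verb = "migrate-incoming" if incoming else "migrate"
      cmds.append({"execute": verb, "arguments": {"uri": uri}})
      return cmds


  if __name__ == "__main__":
      # Print one JSON message per line, in the order they would be sent.
      for cmd in cpr_reboot_commands():
          print(json.dumps(cmd))

Both sides set the same mode and capability; only the final command
differs, which mirrors the Outgoing and Incoming lists above.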

Example 1
^^^^^^^^^
::

  # qemu-kvm -monitor stdio
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
  ...

  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate -d file:vm.state
  (qemu) info status
  VM status: paused (postmigrate)
  (qemu) quit

  ### optionally update kernel and reboot
  # systemctl kexec
  kexec_core: Starting new kernel
  ...

  # qemu-kvm ... -incoming defer
  (qemu) info status
  VM status: paused (inmigrate)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate_incoming file:vm.state
  (qemu) info status
  VM status: running

Example 2: VFIO
^^^^^^^^^^^^^^^
::

  # qemu-kvm -monitor stdio
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
  -device vfio-pci, ...
  -chardev socket,id=qga0,path=qga.sock,server=on,wait=off
  -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
  ...

  (qemu) info status
  VM status: running

  # echo '{"execute":"guest-suspend-ram"}' | ncat --send-only -U qga.sock

  (qemu) info status
  VM status: paused (suspended)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate -d file:vm.state
  (qemu) info status
  VM status: paused (postmigrate)
  (qemu) quit

  ### optionally update kernel and reboot
  # systemctl kexec
  kexec_core: Starting new kernel
  ...

  # qemu-kvm ... -incoming defer
  (qemu) info status
  VM status: paused (inmigrate)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate_incoming file:vm.state
  (qemu) info status
  VM status: paused (suspended)
  (qemu) system_wakeup
  (qemu) info status
  VM status: running

Caveats
^^^^^^^

cpr-reboot mode may not be used with postcopy, background-snapshot,
or COLO.

cpr-transfer mode
-----------------

This mode allows the user to transfer a guest to a new QEMU instance
on the same host with minimal guest pause time, by preserving guest
RAM in place, albeit with new virtual addresses in new QEMU.  Devices
and their pinned memory pages will also be preserved in a future QEMU
release.

The user starts new QEMU on the same host as old QEMU, with command-line
arguments that create the same machine, plus the ``-incoming`` option
for the main migration channel, as in normal live migration.  In
addition, the user adds a second ``-incoming`` option with channel type
``cpr``.  This CPR channel must support file descriptor transfer with
SCM_RIGHTS, i.e. it must be a UNIX domain socket.
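
The following Linux-only sketch, which is not QEMU code, illustrates the
SCM_RIGHTS mechanism that the CPR channel relies on.  It passes a memfd
descriptor over a UNIX socket pair using ``socket.send_fds`` and
``socket.recv_fds`` (Python 3.9 or later).  The receiving side maps the
same pages, typically at a different virtual address, and sees data
written by the sending side, which is the property that lets guest RAM be
preserved in place.  The function name and the 4 KiB size are
illustrative assumptions.

.. code-block:: python

  import mmap
  import os
  import socket


  def share_ram_over_unix_socket(size: int = 4096) -> bytes:
      """Pass a memfd over SCM_RIGHTS and read it back on the other side."""
      sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
      fd = os.memfd_create("guest-ram")      # anonymous shared memory (Linux)
      os.ftruncate(fd, size)
      old_map = mmap.mmap(fd, size)          # MAP_SHARED by default on Unix
      old_map[:5] = b"hello"                 # "old QEMU" writes to guest RAM

      # SCM_RIGHTS: the kernel installs a duplicate descriptor in the receiver.
      socket.send_fds(sender, [b"ram"], [fd])
      _data, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
      new_fd = fds[0]

      # "New QEMU" maps the same pages, typically at a different address.
      new_map = mmap.mmap(new_fd, size)
      contents = bytes(new_map[:5])

      for m in (old_map, new_map):
          m.close()
      for f in (fd, new_fd):
          os.close(f)
      sender.close()
      receiver.close()
      return contents


  if __name__ == "__main__":
      print(share_ram_over_unix_socket())    # prints b'hello'

A TCP or file channel cannot carry descriptors this way, which is why the
``cpr`` channel is restricted to UNIX domain sockets.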

To initiate CPR, the user issues a ``migrate`` command to old QEMU,
adding a second migration channel of type ``cpr`` in the ``channels``
argument.  Old QEMU stops the VM, saves state to the migration
channels, and enters the postmigrate state.  Execution resumes in
new QEMU.
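
For scripted use, the ``channels`` argument can be assembled
programmatically.  This sketch uses hypothetical helper names and builds
the same QMP ``migrate`` message that appears in the examples later in
this section: an inet ``main`` channel plus a UNIX socket ``cpr``
channel.  The host, port, and socket path are placeholders.

.. code-block:: python

  import json


  def socket_addr(kind: str, **fields: str) -> dict:
      """An address with transport "socket"; kind is "inet" or "unix"."""
      return {"transport": "socket", "type": kind, **fields}


  def cpr_transfer_migrate(host: str, port: int, cpr_path: str) -> dict:
      """QMP 'migrate' command carrying a main channel and a cpr channel."""
      return {"execute": "migrate", "arguments": {"channels": [
          {"channel-type": "main",
           # The inet port is passed as a string, as in the examples.
           "addr": socket_addr("inet", host=host, port=str(port))},
          {"channel-type": "cpr",
           "addr": socket_addr("unix", path=cpr_path)},
      ]}}


  if __name__ == "__main__":
      print(json.dumps(cpr_transfer_migrate("0", 44444, "cpr.sock")))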

New QEMU reads the CPR channel before starting its monitor, so the CPR
channel cannot be specified in the list of channels for a
``migrate-incoming`` command.  It may only be specified on the command
line.

Usage
^^^^^

Memory backend objects must have the ``share=on`` attribute.

The VM must be started with the ``-machine aux-ram-share=on``
option.  This causes implicit RAM blocks (those not described by
a memory-backend object) to be allocated by mmap'ing a memfd.
Examples of such implicit RAM blocks include VGA and ROM.

Outgoing:
  * Set the migration mode parameter to ``cpr-transfer``.
  * Issue the ``migrate`` command, containing a main channel and
    a ``cpr`` channel.

Incoming:
  * Start new QEMU with two ``-incoming`` options.
  * If the VM was running when the outgoing ``migrate`` command was
    issued, then QEMU automatically resumes VM execution.
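
The two ``-incoming`` options for new QEMU can likewise be generated.
The hypothetical helper below returns the extra arguments to append to a
command line that recreates old QEMU's machine; the main channel is given
as a URI (or ``defer``), and the ``cpr`` channel uses the JSON channel
syntax shown in the examples later in this section.

.. code-block:: python

  import json
  import shlex


  def cpr_transfer_incoming_args(main: str, cpr_path: str) -> list:
      """Extra argv for new QEMU: the main channel plus the cpr channel.

      `main` is either a URI such as 'tcp:0:44444' or 'defer'.
      """
      cpr_channel = {"channel-type": "cpr",
                     "addr": {"transport": "socket", "type": "unix",
                              "path": cpr_path}}
      return ["-incoming", main, "-incoming", json.dumps(cpr_channel)]


  if __name__ == "__main__":
      # Hypothetical base command; a real one recreates old QEMU's machine.
      argv = ["qemu-kvm", "-monitor", "stdio"]
      argv += cpr_transfer_incoming_args("defer", "cpr.sock")
      print(shlex.join(argv))

``shlex.join`` quotes the JSON argument so that the printed line can be
pasted into a shell unchanged.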

Caveats
^^^^^^^

cpr-transfer mode may not be used with postcopy, background-snapshot,
or COLO.

``memory-backend-epc`` is not supported.

The main incoming migration channel address cannot be of ``file`` type.

If the main incoming channel address is an inet socket, then the port
cannot be 0 (meaning dynamically choose a port).
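
A management tool might reject addresses that violate the two preceding
caveats before starting new QEMU.  The hypothetical pre-flight check
below inspects only the ``file:...`` and ``tcp:HOST:PORT`` URI forms; it
is a sketch and is not how QEMU itself validates the address.

.. code-block:: python

  def check_main_incoming(uri: str) -> None:
      """Raise ValueError if `uri` breaks a cpr-transfer address caveat.

      Only the 'file:' and 'tcp:HOST:PORT' URI forms are inspected; other
      forms such as 'defer' or 'unix:PATH' pass unchanged.
      """
      if uri.startswith("file:"):
          raise ValueError(f"{uri}: main incoming channel cannot be a file")
      if uri.startswith("tcp:"):
          port = uri.rsplit(":", 1)[1]
          if port.isdigit() and int(port) == 0:
              raise ValueError(f"{uri}: port 0 (dynamic port) is not allowed")


  if __name__ == "__main__":
      for uri in ("tcp:0:44444", "tcp:localhost:0", "file:vm.state", "defer"):
          try:
              check_main_incoming(uri)
              print(uri, "ok")
          except ValueError as err:
              print("rejected:", err)

Host ``0`` in ``tcp:0:44444``, as used in the examples, is acceptable;
only the port must be nonzero.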

When using ``-incoming defer``, you must issue the ``migrate`` command
to old QEMU before issuing any monitor commands to new QEMU.  New QEMU
blocks waiting to read from the cpr channel before starting its
monitor, and old QEMU does not write to that channel until the
``migrate`` command is issued.  However, new QEMU does not open and
read the main migration channel until you issue the
``migrate-incoming`` command.

Example 1: incoming channel
^^^^^^^^^^^^^^^^^^^^^^^^^^^

In these examples, we simply restart the same version of QEMU, but
in a real scenario one would start a new version of QEMU on the
incoming side.  Note that new QEMU does not print the monitor prompt
until the ``migrate`` command has been issued to old QEMU.  The
outgoing side uses QMP because HMP cannot specify a CPR channel.
Some QMP responses are omitted for brevity.

::

  Outgoing:                             Incoming:

  # qemu-kvm -qmp stdio
  -object memory-backend-file,id=ram0,size=4G,
  mem-path=/dev/shm/ram0,share=on -m 4G
  -machine memory-backend=ram0
  -machine aux-ram-share=on
  ...
                                        # qemu-kvm -monitor stdio
                                        -incoming tcp:0:44444
                                        -incoming '{"channel-type": "cpr",
                                          "addr": { "transport": "socket",
                                          "type": "unix", "path": "cpr.sock"}}'
                                        ...
  {"execute":"qmp_capabilities"}

  {"execute": "query-status"}
  {"return": {"status": "running",
              "running": true}}

  {"execute":"migrate-set-parameters",
   "arguments":{"mode":"cpr-transfer"}}

  {"execute": "migrate", "arguments": { "channels": [
    {"channel-type": "main",
     "addr": { "transport": "socket", "type": "inet",
               "host": "0", "port": "44444" }},
    {"channel-type": "cpr",
     "addr": { "transport": "socket", "type": "unix",
               "path": "cpr.sock" }}]}}

                                        QEMU 10.0.50 monitor
                                        (qemu) info status
                                        VM status: running

  {"execute": "query-status"}
  {"return": {"status": "postmigrate",
              "running": false}}

Example 2: incoming defer
^^^^^^^^^^^^^^^^^^^^^^^^^

This example uses ``-incoming defer`` to hot plug a device before
accepting the main migration channel.  Again, note that you must issue
the ``migrate`` command to old QEMU before you can issue any monitor
commands to new QEMU.

::

  Outgoing:                             Incoming:

  # qemu-kvm -qmp stdio
  -object memory-backend-file,id=ram0,size=4G,
  mem-path=/dev/shm/ram0,share=on -m 4G
  -machine memory-backend=ram0
  -machine aux-ram-share=on
  ...
                                        # qemu-kvm -monitor stdio
                                        -incoming defer
                                        -incoming '{"channel-type": "cpr",
                                          "addr": { "transport": "socket",
                                          "type": "unix", "path": "cpr.sock"}}'
                                        ...
  {"execute":"qmp_capabilities"}

  {"execute": "device_add",
   "arguments": {"driver": "pcie-root-port"}}

  {"execute":"migrate-set-parameters",
   "arguments":{"mode":"cpr-transfer"}}

  {"execute": "migrate", "arguments": { "channels": [
    {"channel-type": "main",
     "addr": { "transport": "socket", "type": "inet",
               "host": "0", "port": "44444" }},
    {"channel-type": "cpr",
     "addr": { "transport": "socket", "type": "unix",
               "path": "cpr.sock" }}]}}

                                        QEMU 10.0.50 monitor
                                        (qemu) info status
                                        VM status: paused (inmigrate)
                                        (qemu) device_add pcie-root-port
                                        (qemu) migrate_incoming tcp:0:44444
                                        (qemu) info status
                                        VM status: running

  {"execute": "query-status"}
  {"return": {"status": "postmigrate",
              "running": false}}

Futures
^^^^^^^

cpr-transfer mode is based on a capability to transfer open file
descriptors from old to new QEMU.  In the future, descriptors for
vfio, iommufd, vhost, and char devices could be transferred,
preserving those devices and their kernel state without interruption,
even if they do not explicitly support live migration.