CheckPoint and Restart (CPR)
============================

CPR is the umbrella name for a set of migration modes in which the
VM is migrated to a new QEMU instance on the same host. It is
intended for use when the goal is to update host software components
that run the VM, such as QEMU or even the host kernel. At this time,
the cpr-reboot and cpr-transfer modes are available.

Because QEMU is restarted on the same host, with access to the same
local devices, CPR is allowed in certain cases where normal migration
would be blocked. However, the user must not modify the contents of
guest block devices between quitting old QEMU and starting new QEMU.

CPR unconditionally stops VM execution before memory is saved, and
thus does not depend on any form of dirty page tracking.

cpr-reboot mode
---------------

In this mode, QEMU stops the VM, and writes VM state to the migration
URI, which will typically be a file. After quitting QEMU, the user
resumes by running QEMU with the ``-incoming`` option. Because the
old and new QEMU instances are not active concurrently, the URI cannot
be a type that streams data from one instance to the other.

Guest RAM can be saved in place if backed by shared memory, or can be
copied to a file. The former is more efficient and is therefore
preferred.
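
The idea of saving memory in place can be illustrated outside of QEMU.
The following Python sketch is not QEMU code: it assumes a temporary
file stands in for the shared memory backend, and ``write_then_remap``
is a hypothetical helper name. It shows that data written through a
shared mapping is already present when the same backing file is mapped
again later, so no separate copy step is needed.

.. code-block:: python

  import mmap
  import tempfile

  def write_then_remap(payload: bytes) -> bytes:
      """Write through one shared mapping, drop it, then map the file again."""
      with tempfile.NamedTemporaryFile() as backing:
          backing.truncate(mmap.PAGESIZE)
          # "Old" instance: a shared mapping, comparable to share=on.
          with mmap.mmap(backing.fileno(), mmap.PAGESIZE,
                         flags=mmap.MAP_SHARED) as old:
              old[:len(payload)] = payload
          # "New" instance: open the same backing file and map it again.
          with open(backing.name, "r+b") as reopened:
              with mmap.mmap(reopened.fileno(), mmap.PAGESIZE,
                             flags=mmap.MAP_SHARED) as new:
                  return bytes(new[:len(payload)])

  print(write_then_remap(b"guest RAM"))

The payload must fit in one page in this sketch. A private (non-shared)
mapping would not write the data back to the backing file, which is why
the shared attribute matters.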

After state and memory are saved, the user may update userland host
software before restarting QEMU and resuming the VM. Further, if
the RAM is backed by persistent shared memory, such as a DAX device,
then the user may reboot to a new host kernel before restarting QEMU.

This mode supports VFIO devices provided the user first puts the
guest in the suspended runstate, such as by issuing the
``guest-suspend-ram`` command to the QEMU guest agent. The agent
must be pre-installed in the guest, and the guest must support
suspend to RAM. Beware that suspension can take a few seconds, so
the user should poll to see the suspended state before proceeding
with the CPR operation.

Usage
^^^^^

It is recommended that guest RAM be backed with some type of shared
memory, such as ``memory-backend-file,share=on``, and that the
``x-ignore-shared`` capability be set. This combination allows memory
to be saved in place. Otherwise, after QEMU stops the VM, all guest
RAM is copied to the migration URI.

Outgoing:
  * Set the migration mode parameter to ``cpr-reboot``.
  * Set the ``x-ignore-shared`` capability if desired.
  * Issue the ``migrate`` command.
    It is recommended the URI be a
    ``file`` type, but one can use other types such as ``exec``,
    provided the command captures all the data from the outgoing side,
    and provides all the data to the incoming side.
  * Quit when QEMU reaches the postmigrate state.

Incoming:
  * Start QEMU with the ``-incoming defer`` option.
  * Set the migration mode parameter to ``cpr-reboot``.
  * Set the ``x-ignore-shared`` capability if desired.
  * Issue the ``migrate-incoming`` command.
  * If the VM was running when the outgoing ``migrate`` command was
    issued, then QEMU automatically resumes VM execution.

Example 1
^^^^^^^^^
::

  # qemu-kvm -monitor stdio
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
  ...

  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate -d file:vm.state
  (qemu) info status
  VM status: paused (postmigrate)
  (qemu) quit

  ### optionally update kernel and reboot
  # systemctl kexec
  kexec_core: Starting new kernel
  ...

  # qemu-kvm ...
  -incoming defer
  (qemu) info status
  VM status: paused (inmigrate)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate_incoming file:vm.state
  (qemu) info status
  VM status: running

Example 2: VFIO
^^^^^^^^^^^^^^^
::

  # qemu-kvm -monitor stdio
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/dax0.0,align=2M,share=on -m 4G
  -device vfio-pci, ...
  -chardev socket,id=qga0,path=qga.sock,server=on,wait=off
  -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
  ...

  (qemu) info status
  VM status: running

  # echo '{"execute":"guest-suspend-ram"}' | ncat --send-only -U qga.sock

  (qemu) info status
  VM status: paused (suspended)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate -d file:vm.state
  (qemu) info status
  VM status: paused (postmigrate)
  (qemu) quit

  ### optionally update kernel and reboot
  # systemctl kexec
  kexec_core: Starting new kernel
  ...

  # qemu-kvm ...
  -incoming defer
  (qemu) info status
  VM status: paused (inmigrate)
  (qemu) migrate_set_parameter mode cpr-reboot
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate_incoming file:vm.state
  (qemu) info status
  VM status: paused (suspended)
  (qemu) system_wakeup
  (qemu) info status
  VM status: running

Caveats
^^^^^^^

cpr-reboot mode may not be used with postcopy, background-snapshot,
or COLO.

cpr-transfer mode
-----------------

This mode allows the user to transfer a guest to a new QEMU instance
on the same host with minimal guest pause time, by preserving guest
RAM in place, albeit with new virtual addresses in new QEMU. Devices
and their pinned memory pages will also be preserved in a future QEMU
release.

The user starts new QEMU on the same host as old QEMU, with
command-line arguments to create the same machine, plus the
``-incoming`` option for the main migration channel, like normal live
migration. In addition, the user adds a second ``-incoming`` option
with channel type ``cpr``. This CPR channel must support file
descriptor transfer with SCM_RIGHTS, i.e. it must be a UNIX domain
socket.
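
The SCM_RIGHTS requirement can be illustrated with a short Python
sketch. This is not QEMU code: ``pass_fd_demo`` is a hypothetical
helper, a socket pair stands in for the CPR channel, and a temporary
file stands in for a descriptor that old QEMU would hand over. It uses
``socket.send_fds`` and ``socket.recv_fds`` (Python 3.9 or later), which
wrap the SCM_RIGHTS ancillary message of UNIX domain sockets.

.. code-block:: python

  import os
  import socket
  import tempfile

  def pass_fd_demo(payload: bytes) -> bytes:
      """Send an open descriptor over a UNIX socket pair and read through it."""
      old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
      with old_side, new_side, tempfile.TemporaryFile() as backing:
          backing.write(payload)
          backing.flush()
          # The descriptor travels as SCM_RIGHTS ancillary data next to
          # a one-byte regular message.
          socket.send_fds(old_side, [b"x"], [backing.fileno()])
          _msg, fds, _flags, _addr = socket.recv_fds(new_side, 1, 1)
          # The receiver gets a new descriptor number that refers to the
          # same open file, so the file offset is shared; rewind first.
          with os.fdopen(fds[0], "rb") as received:
              received.seek(0)
              return received.read()

  print(pass_fd_demo(b"guest state"))

Other socket types, such as TCP, cannot carry this ancillary data,
which is why the CPR channel is restricted to a UNIX domain socket.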

To initiate CPR, the user issues a migrate command to old QEMU,
adding a second migration channel of type ``cpr`` in the channels
argument. Old QEMU stops the VM, saves state to the migration
channels, and enters the postmigrate state. Execution resumes in
new QEMU.

New QEMU reads the CPR channel before opening a monitor, hence
the CPR channel cannot be specified in the list of channels for a
migrate-incoming command. It may only be specified on the command
line.

Usage
^^^^^

Memory backend objects must have the ``share=on`` attribute.

The VM must be started with the ``-machine aux-ram-share=on``
option. This causes implicit RAM blocks (those not described by
a memory-backend object) to be allocated by mmap'ing a memfd.
Examples include VGA and ROM.

Outgoing:
  * Set the migration mode parameter to ``cpr-transfer``.
  * Issue the ``migrate`` command, containing a main channel and
    a cpr channel.

Incoming:
  * Start new QEMU with two ``-incoming`` options.
  * If the VM was running when the outgoing ``migrate`` command was
    issued, then QEMU automatically resumes VM execution.
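
The outgoing steps can be sketched as a small Python helper that
builds the two QMP commands. The helper name ``cpr_transfer_commands``
and its parameters are hypothetical; the command and field names are
those used in the examples in this document. The sketch only constructs
the JSON text and does not connect to QEMU.

.. code-block:: python

  import json

  def cpr_transfer_commands(host, port, cpr_path):
      """Return the outgoing-side QMP commands in the order they are issued."""
      set_mode = {
          "execute": "migrate-set-parameters",
          "arguments": {"mode": "cpr-transfer"},
      }
      migrate = {
          "execute": "migrate",
          "arguments": {"channels": [
              # Main channel: carries the regular migration stream.
              {"channel-type": "main",
               "addr": {"transport": "socket", "type": "inet",
                        "host": host, "port": port}},
              # CPR channel: must be a UNIX domain socket.
              {"channel-type": "cpr",
               "addr": {"transport": "socket", "type": "unix",
                        "path": cpr_path}},
          ]},
      }
      return [set_mode, migrate]

  for command in cpr_transfer_commands("0", "44444", "cpr.sock"):
      print(json.dumps(command))

Each printed line could then be sent to the QMP monitor of old QEMU
after the usual ``qmp_capabilities`` handshake.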

Caveats
^^^^^^^

cpr-transfer mode may not be used with postcopy, background-snapshot,
or COLO.

memory-backend-epc is not supported.

The main incoming migration channel address cannot be a file type.

If the main incoming channel address is an inet socket, then the port
cannot be 0 (meaning dynamically choose a port).

When using ``-incoming defer``, you must issue the migrate command to
old QEMU before issuing any monitor commands to new QEMU, because new
QEMU blocks waiting to read from the cpr channel before starting its
monitor, and old QEMU does not write to the channel until the migrate
command is issued. However, new QEMU does not open and read the
main migration channel until you issue the ``migrate-incoming``
command.

Example 1: incoming channel
^^^^^^^^^^^^^^^^^^^^^^^^^^^

In these examples, we simply restart the same version of QEMU, but
in a real scenario one would start new QEMU on the incoming side.
Note that new QEMU does not print the monitor prompt until old QEMU
has issued the migrate command. The outgoing side uses QMP because
HMP cannot specify a CPR channel. Some QMP responses are omitted for
brevity.

::

  Outgoing:                             Incoming:

  # qemu-kvm -qmp stdio
  -object memory-backend-file,id=ram0,size=4G,
  mem-path=/dev/shm/ram0,share=on -m 4G
  -machine memory-backend=ram0
  -machine aux-ram-share=on
  ...
                                        # qemu-kvm -monitor stdio
                                        -incoming tcp:0:44444
                                        -incoming '{"channel-type": "cpr",
                                          "addr": { "transport": "socket",
                                          "type": "unix", "path": "cpr.sock"}}'
                                        ...
  {"execute":"qmp_capabilities"}

  {"execute": "query-status"}
  {"return": {"status": "running",
              "running": true}}

  {"execute":"migrate-set-parameters",
   "arguments":{"mode":"cpr-transfer"}}

  {"execute": "migrate", "arguments": { "channels": [
    {"channel-type": "main",
     "addr": { "transport": "socket", "type": "inet",
               "host": "0", "port": "44444" }},
    {"channel-type": "cpr",
     "addr": { "transport": "socket", "type": "unix",
               "path": "cpr.sock" }}]}}

                                        QEMU 10.0.50 monitor
                                        (qemu) info status
                                        VM status: running

  {"execute": "query-status"}
  {"return": {"status": "postmigrate",
              "running": false}}

Example 2: incoming defer
^^^^^^^^^^^^^^^^^^^^^^^^^

This example uses ``-incoming defer`` to hot plug a device before
accepting the main migration channel. Again note you must issue the
migrate command to old QEMU before you can issue any monitor
commands to new QEMU.


::

  Outgoing:                             Incoming:

  # qemu-kvm -qmp stdio
  -object memory-backend-file,id=ram0,size=4G,
  mem-path=/dev/shm/ram0,share=on -m 4G
  -machine memory-backend=ram0
  -machine aux-ram-share=on
  ...
                                        # qemu-kvm -monitor stdio
                                        -incoming defer
                                        -incoming '{"channel-type": "cpr",
                                          "addr": { "transport": "socket",
                                          "type": "unix", "path": "cpr.sock"}}'
                                        ...
  {"execute":"qmp_capabilities"}

  {"execute": "device_add",
   "arguments": {"driver": "pcie-root-port"}}

  {"execute":"migrate-set-parameters",
   "arguments":{"mode":"cpr-transfer"}}

  {"execute": "migrate", "arguments": { "channels": [
    {"channel-type": "main",
     "addr": { "transport": "socket", "type": "inet",
               "host": "0", "port": "44444" }},
    {"channel-type": "cpr",
     "addr": { "transport": "socket", "type": "unix",
               "path": "cpr.sock" }}]}}

                                        QEMU 10.0.50 monitor
                                        (qemu) info status
                                        VM status: paused (inmigrate)
                                        (qemu) device_add pcie-root-port
                                        (qemu) migrate_incoming tcp:0:44444
                                        (qemu) info status
                                        VM status: running

  {"execute": "query-status"}
  {"return": {"status": "postmigrate",
              "running": false}}

Futures
^^^^^^^

cpr-transfer mode is based on a capability to transfer open file
descriptors from old to new QEMU. In the future, descriptors for
vfio, iommufd, vhost, and char devices could be transferred,
preserving those devices and their kernel state without interruption,
even if they do not explicitly support live migration.