.. _migration:

===================
Migration framework
===================

QEMU has code to load/save the state of the guest that it is running.
These are two complementary operations. Saving the state does just
that: it saves the state of each device that the guest is running.
Restoring a guest is the opposite operation: we need to load the
state of each device.

For this to work, QEMU has to be launched with the same arguments both
times, i.e. it can only restore the state into a guest that has
the same devices as the one it was saved from (this last requirement can
be relaxed a bit, but for now we can consider that the configuration has
to be exactly the same).

Once we are able to save/restore a guest, a new piece of functionality is
requested: migration. This means that QEMU is able to start on one
machine and be "migrated", i.e. moved, to another machine.

Next came the "live migration" functionality. This is important
because some guests run with a lot of state (especially RAM), and it
can take a while to move all of that state from one machine to another. Live
migration allows the guest to continue running while the state is
transferred. The guest has to be stopped only while the last part of the
state is transferred. Typically the time that the guest is
unresponsive during live migration is in the low hundreds of milliseconds
(note that this depends on a lot of things).

.. contents::

Transports
==========

The migration stream is normally just a byte stream that can be passed
over any transport.

- tcp migration: do the migration using tcp sockets
- unix migration: do the migration using unix sockets
- exec migration: do the migration using the stdin/stdout through a process.
- fd migration: do the migration using a file descriptor that is
  passed to QEMU. QEMU doesn't care how this file descriptor is opened.
- file migration: do the migration using a file that is passed to QEMU
  by path. A file offset option is supported to allow a management
  application to add its own metadata to the start of the file without
  QEMU interference. Note that QEMU does not flush cached file
  data/metadata at the end of migration.

  The file migration also supports using a file that has already been
  opened. A set of file descriptors is passed to QEMU via an "fdset"
  (see add-fd QMP command documentation). This method allows a
  management application to have control over the migration file
  opening operation. There are, however, strict requirements for this
  interface if the multifd capability is enabled:

    - the fdset must contain two file descriptors that are not
      duplicates of each other;
    - if the direct-io capability is to be used, exactly one of the
      file descriptors must have the O_DIRECT flag set;
    - the file must be opened with WRONLY on the migration source side
      and RDONLY on the migration destination side.

- rdma migration: support is included for migration using RDMA, which
  transports the page data using ``RDMA``, where the hardware takes
  care of transporting the pages, and the load on the CPU is much
  lower. While the internals of RDMA migration are a bit different,
  this isn't really visible outside the RAM migration code.

All these migration protocols use the same infrastructure to
save/restore device state. This infrastructure is shared with the
savevm/loadvm functionality.

Common infrastructure
=====================

The files, sockets or fd's that carry the migration stream are abstracted by
the ``QEMUFile`` type (see ``migration/qemu-file.h``). In most cases this
is connected to a subtype of ``QIOChannel`` (see ``io/``).


Saving the state of one device
==============================

For most devices, the state is saved in a single call to the migration
infrastructure; these are *non-iterative* devices. The data for these
devices is sent at the end of precopy migration, when the CPUs are paused.
There are also *iterative* devices, which contain a very large amount of
data (e.g. RAM or large tables). See the iterative device section below.

General advice for device developers
------------------------------------

- The migration state saved should reflect the device being modelled rather
  than the way your implementation works. That way if you change the implementation
  later the migration stream will stay compatible. That model may include
  internal state that's not directly visible in a register.

- When saving a migration stream the device code may walk and check
  the state of the device. These checks might fail in various ways (e.g.
  discovering internal state is corrupt or that the guest has done something bad).
  Consider carefully before asserting/aborting at this point, since the
  normal response from users is that *migration broke their VM* since it had
  apparently been running fine until then. In these error cases, the device
  should log a message indicating the cause of error, and should consider
  putting the device into an error state, allowing the rest of the VM to
  continue execution.

- The migration might happen at an inconvenient point,
  e.g. right in the middle of the guest reprogramming the device, during
  guest reboot or shutdown or while the device is waiting for external IO.
  It's strongly preferred that migrations do not fail in this situation,
  since in the cloud environment migrations might happen automatically to
  VMs that the administrator doesn't directly control.

- If you do need to fail a migration, ensure that sufficient information
  is logged to identify what went wrong.

- The destination should treat an incoming migration stream as hostile
  (which we do to varying degrees in the existing code). Check that offsets
  into buffers and the like can't cause overruns. Fail the incoming migration
  in the case of a corrupted stream like this.

- Take care with internal device state or behaviour that might become
  migration version dependent. For example, the order of PCI capabilities
  is required to stay constant across migration. Another example would
  be that a special case handled by subsections (see below) might become
  much more common if a default behaviour is changed.

- The state of the source should not be changed or destroyed by the
  outgoing migration. Migrations timing out or being failed by
  higher levels of management, or failures of the destination host are
  not unusual, and in that case the VM is restarted on the source.
  Note that the management layer can validly revert the migration
  even though the QEMU level of migration has succeeded as long as it
  does it before starting execution on the destination.

- Buses and devices should be able to explicitly specify addresses when
  instantiated, and management tools should use those. For example,
  when hot adding USB devices it's important to specify the ports
  and addresses, since implicit ordering based on the command line order
  may be different on the destination. This can result in the
  device state being loaded into the wrong device.

VMState
-------

Most device data can be described using the ``VMSTATE`` macros (mostly defined
in ``include/migration/vmstate.h``).

An example (from ``hw/input/pckbd.c``):

.. code:: c

  static const VMStateDescription vmstate_kbd = {
      .name = "pckbd",
      .version_id = 3,
      .minimum_version_id = 3,
      .fields = (const VMStateField[]) {
          VMSTATE_UINT8(write_cmd, KBDState),
          VMSTATE_UINT8(status, KBDState),
          VMSTATE_UINT8(mode, KBDState),
          VMSTATE_UINT8(pending, KBDState),
          VMSTATE_END_OF_LIST()
      }
  };

We are declaring the state with name "pckbd". The ``version_id`` is
3, and there are 4 uint8_t fields in the KBDState structure. We
register this ``VMStateDescription`` with one of the following
functions. The first one will generate a device ``instance_id``
different for each registration. Use the second one if you already
have an id that is different for each instance of the device:

.. code:: c

    vmstate_register_any(NULL, &vmstate_kbd, s);
    vmstate_register(NULL, instance_id, &vmstate_kbd, s);

For devices that are ``qdev`` based, we can register the device in the class
init function:

.. code:: c

    dc->vmsd = &vmstate_kbd_isa;

The VMState macros take care of ensuring that the device data section
is formatted portably (normally big endian) and make some compile time checks
against the types of the fields in the structures.

VMState macros can include other VMStateDescriptions to store substructures
(see ``VMSTATE_STRUCT_``), arrays (``VMSTATE_ARRAY_``) and variable length
arrays (``VMSTATE_VARRAY_``). Various other macros exist for special
cases.

Note that the format on the wire is still very raw; i.e. a VMSTATE_UINT32
ends up with a 4-byte big-endian representation on the wire; in the future
it might be possible to use a more structured format.

Legacy way
----------

This way is going to disappear as soon as all current users are ported to VMState,
although converting existing code can be tricky, and thus 'soon' is relative.
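
To make the legacy style concrete, the following is a minimal, self-contained
sketch of a hand-written save/load pair. It does not use QEMU's real API:
``ToyStream``, ``toy_put_be32()``, ``toy_get_be32()``, ``ToyDev`` and the two
handlers are hypothetical names invented for illustration. It only aims to show
two ideas: values go on the wire in a fixed big-endian layout, and the load
side is told which version it is reading while the save side always writes
the latest format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stream; stands in for the migration stream. */
typedef struct ToyStream {
    uint8_t buf[64];
    size_t wpos;   /* next byte to write */
    size_t rpos;   /* next byte to read */
} ToyStream;

/* Write a 32-bit value as 4 bytes, most significant byte first. */
static void toy_put_be32(ToyStream *s, uint32_t v)
{
    assert(s->wpos + 4 <= sizeof(s->buf));
    s->buf[s->wpos++] = (uint8_t)(v >> 24);
    s->buf[s->wpos++] = (uint8_t)(v >> 16);
    s->buf[s->wpos++] = (uint8_t)(v >> 8);
    s->buf[s->wpos++] = (uint8_t)v;
}

/* Read 4 big-endian bytes; returns -1 if the stream is too short. */
static int toy_get_be32(ToyStream *s, uint32_t *out)
{
    if (s->wpos - s->rpos < 4) {
        return -1;
    }
    *out = ((uint32_t)s->buf[s->rpos] << 24) |
           ((uint32_t)s->buf[s->rpos + 1] << 16) |
           ((uint32_t)s->buf[s->rpos + 2] << 8) |
           (uint32_t)s->buf[s->rpos + 3];
    s->rpos += 4;
    return 0;
}

/* Hypothetical device: "mode" was added in format version 2. */
typedef struct ToyDev {
    uint32_t status;
    uint32_t mode;
} ToyDev;

#define TOY_DEV_VERSION 2

/* Save always writes the newest format, so it takes no version. */
static void toy_dev_save(ToyStream *s, const ToyDev *d)
{
    toy_put_be32(s, d->status);
    toy_put_be32(s, d->mode);
}

/* Load is told which version the stream carries. */
static int toy_dev_load(ToyStream *s, ToyDev *d, int version_id)
{
    if (version_id < 1 || version_id > TOY_DEV_VERSION) {
        return -1;                 /* unknown format: fail the load */
    }
    if (toy_get_be32(s, &d->status) < 0) {
        return -1;
    }
    if (version_id >= 2) {
        return toy_get_be32(s, &d->mode);
    }
    d->mode = 0;                   /* default for version 1 streams */
    return 0;
}
```

Because the byte layout is fixed and simple, a layout like this one is the
kind that can often be matched later by a carefully constructed VMState
description.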

Each device has to register two functions, one to save the state and
another to load the state back.

.. code:: c

  int register_savevm_live(const char *idstr,
                           int instance_id,
                           int version_id,
                           SaveVMHandlers *ops,
                           void *opaque);

Two functions in the ``ops`` structure are the ``save_state``
and ``load_state`` functions. Notice that ``load_state`` receives a version_id
parameter to know which state format it is receiving. ``save_state`` doesn't
have a version_id parameter because it always uses the latest version.

Note that because the VMState macros still save the data in a raw
format, in many cases it's possible to replace legacy code
with a carefully constructed VMState description that matches the
byte layout of the existing code.

Changing migration data structures
----------------------------------

When we migrate a device, we save/load the state as a series
of fields. Sometimes, due to bugs or new functionality, we need to
change the state to store more/different information. Changing the migration
state saved for a device can break migration compatibility unless
care is taken to use the appropriate techniques. In general QEMU tries
to maintain forward migration compatibility (i.e. migrating from
QEMU n->n+1) and there are users who benefit from backward compatibility
as well.

Subsections
-----------

The most common structure change is adding new data, e.g. when adding
a newer form of device, or adding state that you previously
forgot to migrate. This is best solved using a subsection.

A subsection is "like" a device vmstate, but with one particularity: it
has a Boolean function that tells whether its values need to be sent
or not. If this function returns false, the subsection is not sent.
Subsections have a unique name, which is looked for on the receiving
side.

On the receiving side, if we find a subsection for a device that we
don't understand, we just fail the migration. If we understand all
the subsections, then we load the state with success. There's no check
that a subsection is loaded, so a newer QEMU that knows about a subsection
can (with care) load a stream from an older QEMU that didn't send
the subsection.
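
The sending and receiving rules above can be sketched with a small,
self-contained model. Nothing here is QEMU code: ``ToySubsection``,
``toy_send()``, ``toy_receive()`` and the "stream" (a plain list of names)
are hypothetical simplifications. The sketch only models which subsection
names go on the wire and how the destination reacts to them, not the field
data itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SUBSECTIONS 8

/* Hypothetical subsection: a unique name plus a "needed" predicate. */
typedef struct ToySubsection {
    const char *name;
    bool (*needed)(const void *opaque);
} ToySubsection;

/* Hypothetical "stream": just the names of the subsections sent. */
typedef struct ToyStream {
    const char *names[TOY_MAX_SUBSECTIONS];
    size_t count;
} ToyStream;

/* Source side: a subsection is sent only if its predicate is true. */
static void toy_send(ToyStream *out, const ToySubsection *subs,
                     size_t nsubs, const void *opaque)
{
    out->count = 0;
    for (size_t i = 0; i < nsubs && out->count < TOY_MAX_SUBSECTIONS; i++) {
        if (subs[i].needed(opaque)) {
            out->names[out->count++] = subs[i].name;
        }
    }
}

/*
 * Destination side: every subsection in the stream must be known,
 * otherwise the load fails. Known subsections that are absent from
 * the stream are not an error.
 */
static int toy_receive(const ToyStream *in, const ToySubsection *known,
                       size_t nknown)
{
    for (size_t i = 0; i < in->count; i++) {
        bool found = false;
        for (size_t j = 0; j < nknown; j++) {
            if (strcmp(in->names[i], known[j].name) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return -1;   /* unknown subsection: fail the migration */
        }
    }
    return 0;
}

/* Hypothetical device state and predicate for trying the model out. */
typedef struct ToyDev {
    bool pio_in_progress;
} ToyDev;

static bool toy_pio_needed(const void *opaque)
{
    const ToyDev *d = opaque;
    return d->pio_in_progress;
}
```

In this model an "older QEMU" is simply a receiver called with an empty
``known`` table: it accepts a stream produced while ``pio_in_progress`` is
false and rejects one produced while it is true. In QEMU itself the predicate
corresponds to the ``.needed`` member of a ``VMStateDescription``.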

If the new data is only needed in a rare case, then the subsection
can be made conditional on that case and the migration will still
succeed to older QEMUs in most cases. This is OK for data that's
critical, but in some use cases it's preferred that the migration
should succeed even with the data missing. To support this the
subsection can be connected to a device property and from there
to a versioned machine type.

The 'pre_load' and 'post_load' functions on subsections are only
called if the subsection is loaded.

One important note is that the outer post_load() function is called "after"
loading all subsections, because a newer subsection could change the same
value that it uses. A flag, and the combination of outer pre_load and
post_load can be used to detect whether a subsection was loaded, and to
fall back on default behaviour when the subsection isn't present.

Example:

.. code:: c

  static bool ide_drive_pio_state_needed(void *opaque)
  {
      IDEState *s = opaque;

      return ((s->status & DRQ_STAT) != 0)
          || (s->bus->error_status & BM_STATUS_PIO_RETRY);
  }

  const VMStateDescription vmstate_ide_drive_pio_state = {
      .name = "ide_drive/pio_state",
      .version_id = 1,
      .minimum_version_id = 1,
      .pre_save = ide_drive_pio_pre_save,
      .post_load = ide_drive_pio_post_load,
      .needed = ide_drive_pio_state_needed,
      .fields = (const VMStateField[]) {
          VMSTATE_INT32(req_nb_sectors, IDEState),
          VMSTATE_VARRAY_INT32(io_buffer, IDEState, io_buffer_total_len, 1,
                               vmstate_info_uint8, uint8_t),
          VMSTATE_INT32(cur_io_buffer_offset, IDEState),
          VMSTATE_INT32(cur_io_buffer_len, IDEState),
          VMSTATE_UINT8(end_transfer_fn_idx, IDEState),
          VMSTATE_INT32(elementary_transfer_size, IDEState),
          VMSTATE_INT32(packet_transfer_size, IDEState),
          VMSTATE_END_OF_LIST()
      }
  };

  const VMStateDescription vmstate_ide_drive = {
      .name = "ide_drive",
      .version_id = 3,
      .minimum_version_id = 0,
      .post_load = ide_drive_post_load,
      .fields = (const VMStateField[]) {
          .... several fields ....
          VMSTATE_END_OF_LIST()
      },
      .subsections = (const VMStateDescription * const []) {
          &vmstate_ide_drive_pio_state,
          NULL
      }
  };

Here we have a subsection for the pio state. We only need to
save/send this state when we are in the middle of a pio operation
(that is what ``ide_drive_pio_state_needed()`` checks). If DRQ_STAT is
not enabled, the values in those fields are garbage and don't need to
be sent.

Connecting subsections to properties
------------------------------------

Using a condition function that checks a 'property' to determine whether
to send a subsection allows backward migration compatibility when
new subsections are added, especially when combined with versioned
machine types.

For example:

 a) Add a new property using ``DEFINE_PROP_BOOL`` - e.g. support-foo and
    default it to true.
 b) Add an entry to the ``hw_compat_`` for the previous version that sets
    the property to false.
 c) Add a static bool support_foo function that tests the property.
 d) Add a subsection with a .needed set to the support_foo function
 e) (potentially) Add an outer pre_load that sets up a default value
    for 'foo' to be used if the subsection isn't loaded.

Now that subsection will not be generated when using an older
machine type and the migration stream will be accepted by older
QEMU versions.

Not sending existing elements
-----------------------------

Sometimes members of the VMState are no longer needed:

  - removing them will break migration compatibility

  - making them version dependent and bumping the version will break backward migration
    compatibility.

Adding a dummy field into the migration stream is normally the best way to preserve
compatibility.

If the field really does need to be removed then:

 a) Add a new property/compatibility/function in the same way as for subsections above.
 b) Replace the VMSTATE macro with the _TEST version of the macro, e.g.:

    ``VMSTATE_UINT32(foo, barstruct)``

    becomes

    ``VMSTATE_UINT32_TEST(foo, barstruct, pre_version_baz)``

    Sometime in the future when we no longer care about the ancient versions these can be killed off.
    Note that for backward compatibility it's important to fill in the structure with
    data that the destination will understand.

Any difference in the predicates on the source and destination will end up
with different fields being enabled and data being loaded into the wrong
fields; for this reason conditional fields like this are very fragile.

Versions
--------

Version numbers are intended for major incompatible changes to the
migration of a device, and using them breaks backward-migration
compatibility; in general most changes can be made by adding Subsections
(see above) or _TEST macros (see above) which won't break compatibility.

Each version is associated with a series of fields saved. The ``save_state`` always saves
the state as the newest version. But ``load_state`` is sometimes able to
load state from an older version.

You can see that there are two version fields:

- ``version_id``: the maximum version_id supported by VMState for that device.
- ``minimum_version_id``: the minimum version_id that VMState is able to understand
  for that device.

VMState is able to read versions from minimum_version_id to version_id.

There are *_V* forms of many ``VMSTATE_`` macros to handle version-dependent fields,
e.g.

.. code:: c

   VMSTATE_UINT16_V(ip_id, Slirp, 2),

only loads that field for versions 2 and newer.

Saving state will always create a section with the 'version_id' value
and thus can't be loaded by any older QEMU.

Massaging functions
-------------------

Sometimes it is not enough to be able to save the state directly
from one structure; we need to fill in the correct values there first. One
example is when we are using kvm. Before saving the cpu state, we
need to ask kvm to copy to QEMU the state that it is using. And the
opposite when we are loading the state: we need a way to tell kvm to
load the state for the cpu that we have just loaded from the QEMUFile.

The functions to do that are inside a vmstate definition, and are called:

- ``int (*pre_load)(void *opaque);``

  This function is called before we load the state of one device.

- ``int (*post_load)(void *opaque, int version_id);``

  This function is called after we load the state of one device.

- ``int (*pre_save)(void *opaque);``

  This function is called before we save the state of one device.

- ``int (*post_save)(void *opaque);``

  This function is called after we save the state of one device
  (even upon failure, unless the call to pre_save returned an error).

Example: You can look at hpet.c, which uses the first three functions
to massage the state that is transferred.

The ``VMSTATE_WITH_TMP`` macro may be useful when the migration
data doesn't match the stored device data well; it allows an
intermediate temporary structure to be populated with migration
data and then transferred to the main structure.

If you use memory or portio_list API functions that update memory layout outside
initialization (i.e., in response to a guest action), this is a strong
indication that you need to call these functions in a ``post_load`` callback.
Examples of memory and portio_list API functions that update the memory layout are:

  - memory_region_add_subregion()
  - memory_region_del_subregion()
  - memory_region_set_readonly()
  - memory_region_set_nonvolatile()
  - memory_region_set_enabled()
  - memory_region_set_address()
  - memory_region_set_alias_offset()
  - portio_list_set_address()
  - portio_list_set_enabled()

Since the order of device save/restore is not defined, you must
avoid accessing or changing any other device's state in one of these
callbacks. (For instance, don't do anything that calls ``update_irq()``
in a ``post_load`` hook.) Otherwise, restore will not be deterministic,
and this will break execution record/replay.

Iterative device migration
--------------------------

Some devices, such as RAM or certain platform devices,
have so much data that the CPUs would be
paused for too long if it were all sent in one section. For these
devices an *iterative* approach is taken.

The iterative devices generally don't use VMState macros
(although it may be possible in some cases) and instead use
``qemu_put_*``/``qemu_get_*`` functions to read/write data to the stream. Specialist
versions exist for high bandwidth IO.

An iterative device must provide:

  - A ``save_setup`` function that initialises the data structures and
    transmits a first section containing information on the device. In the
    case of RAM this transmits a list of RAMBlocks and sizes.

  - A ``load_setup`` function that initialises the data structures on the
    destination.

  - A ``state_pending_exact`` function that indicates how much more
    data we must save. The core migration code will use this to
    determine when to pause the CPUs and complete the migration.

  - A ``state_pending_estimate`` function that estimates how much more
    data we must save. When the estimated amount is smaller than the
    threshold, we call ``state_pending_exact``.

  - A ``save_live_iterate`` function that sends a chunk of data until
    stream bandwidth limits tell it to stop. Each call
    generates one section.

  - A ``save_live_complete_precopy`` function that must transmit the
    last section for the device containing any remaining data.

  - A ``load_state`` function used to load sections generated by
    any of the save functions that generate sections.

  - ``cleanup`` functions for both save and load that are called
    at the end of migration.

Note that the contents of the sections for iterative migration tend
to be open-coded by the devices; care should be taken in parsing
the results and structuring the stream to make them easy to validate.

Device ordering
---------------

There are cases in which the ordering of device loading matters. For
example, on some systems a device may assert an interrupt during loading;
if the interrupt controller is loaded later, it might lose that state.

Some ordering is implicitly provided by the order in which the machine
definition creates devices; however, this is somewhat fragile.

The ``MigrationPriority`` enum provides a means of explicitly enforcing
ordering. Numerically higher priorities are loaded earlier.
The priority is set by setting the ``priority`` field of the top level
``VMStateDescription`` for the device.

Stream structure
================

The stream tries to be word-size and endianness agnostic, allowing migration between hosts
of different characteristics running the same VM.

  - Header

    - Magic
    - Version
    - VM configuration section

      - Machine type
      - Target page bits

  - List of sections.
    Each section contains a device, or one iteration of a device save.

    - section type
    - section id
    - ID string (First section of each device)
    - instance id (First section of each device)
    - version id (First section of each device)
    - <device data>
    - Footer mark

  - EOF mark
  - VM Description structure,
    consisting of a JSON description of the contents for analysis only

The ``device data`` in each section consists of the data produced
by the code described above. Non-iterative devices have a single
section; iterative devices have an initial and a last section, with a set
of parts in between.
Note that there is very little checking by the common code of the integrity
of the ``device data`` contents; that's up to the devices themselves.
The ``footer mark`` provides a little bit of protection for the case where
the receiving side reads more or less data than expected.

The ``ID string`` is normally unique, having been formed from a bus name
and device address; PCI devices and storage devices hung off PCI controllers
fit this pattern well. Some devices are fixed single instances (e.g. "pc-ram").
Others (especially older devices, or system devices which for
some reason don't have a bus concept) make use of the ``instance id``
to distinguish otherwise identically named devices.

Return path
-----------

Only a unidirectional stream is required for normal migration; however, a
``return path`` can be created when bidirectional communication is desired.
This is primarily used by postcopy, but is also used to return a success
flag to the source at the end of migration.

``qemu_file_get_return_path(QEMUFile* fwdpath)`` gives the ``QEMUFile*`` for the return
path.

  Source side

     Forward path - written by migration thread
     Return path  - opened by main thread, read by return-path thread

  Destination side

     Forward path - read by main thread
     Return path  - opened by main thread, written by main thread AND postcopy
     thread (protected by rp_mutex)

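
The destination-side rule above (two writers, one lock) can be sketched with
plain pthreads. This is not QEMU code: ``ReturnPath``, ``rp_send_message()``,
the two-byte message header and ``rp_demo()`` are hypothetical, and the only
element taken from the text is the idea of a mutex playing the role of
``rp_mutex``.

.. code:: c

   #include <assert.h>
   #include <pthread.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define DEMO_MSGS_PER_WRITER 500

   /* Hypothetical stand-in for the return-path stream; not a QEMU type. */
   typedef struct {
       pthread_mutex_t lock;   /* plays the role of rp_mutex */
       uint8_t buf[8192];
       size_t used;
   } ReturnPath;

   /* Append one message (type byte, length byte, payload) under the lock. */
   int rp_send_message(ReturnPath *rp, uint8_t type,
                       const void *payload, uint8_t len)
   {
       int ret = -1;

       pthread_mutex_lock(&rp->lock);
       if (rp->used + 2 + (size_t)len <= sizeof(rp->buf)) {
           rp->buf[rp->used++] = type;
           rp->buf[rp->used++] = len;
           memcpy(rp->buf + rp->used, payload, len);
           rp->used += len;
           ret = 0;
       }
       pthread_mutex_unlock(&rp->lock);
       return ret;
   }

   typedef struct {
       ReturnPath *rp;
       uint8_t type;
   } Writer;

   /* Each writer sends messages whose payload bytes repeat its type. */
   static void *writer_thread(void *opaque)
   {
       Writer *w = opaque;
       uint8_t payload[4];

       memset(payload, w->type, sizeof(payload));
       for (int i = 0; i < DEMO_MSGS_PER_WRITER; i++) {
           int rc = rp_send_message(w->rp, w->type, payload, sizeof(payload));
           assert(rc == 0);
           (void)rc;
       }
       return NULL;
   }

   /* Walk the buffer; return the message count, or -1 if a message is torn. */
   int rp_count_messages(const ReturnPath *rp)
   {
       size_t pos = 0;
       int count = 0;

       while (pos < rp->used) {
           if (rp->used - pos < 2) {
               return -1;
           }
           uint8_t type = rp->buf[pos];
           size_t len = rp->buf[pos + 1];
           if (len > rp->used - pos - 2) {
               return -1;
           }
           for (size_t i = 0; i < len; i++) {
               if (rp->buf[pos + 2 + i] != type) {
                   return -1;
               }
           }
           pos += 2 + len;
           count++;
       }
       return count;
   }

   /* Run a "main" writer and a "postcopy" writer against one stream. */
   int rp_demo(void)
   {
       ReturnPath rp;
       Writer main_writer = { &rp, 1 };
       Writer postcopy_writer = { &rp, 2 };
       pthread_t a, b;
       int count;

       rp.used = 0;
       pthread_mutex_init(&rp.lock, NULL);
       if (pthread_create(&a, NULL, writer_thread, &main_writer) != 0) {
           pthread_mutex_destroy(&rp.lock);
           return -1;
       }
       if (pthread_create(&b, NULL, writer_thread, &postcopy_writer) != 0) {
           pthread_join(a, NULL);
           pthread_mutex_destroy(&rp.lock);
           return -1;
       }
       pthread_join(a, NULL);
       pthread_join(b, NULL);
       count = rp_count_messages(&rp);
       pthread_mutex_destroy(&rp.lock);
       return count;
   }

``rp_demo()`` returns the number of intact messages found in the buffer (1000
when both writers finish), or -1 if a thread could not be started or a message
was torn by interleaved writes. Holding the lock for the whole message, rather
than per byte, is what keeps each message contiguous.
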