Backwards compatibility
=======================

How backwards compatibility works
---------------------------------

When we do a migration, we have two QEMU processes: the source and the
target.  There are two cases: either they are the same version or they
are different versions.  The easy case is when they are the same
version; the difficult one is when they are different versions.

There are two things that are different, but they have very similar
names and sometimes get confused:

- QEMU version
- machine type version

Let's start with a practical example.  We have:

- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
- qemu-system-x86_64 (v5.1), from now on qemu-5.1.

Related to these are the "latest" machine types defined in each of
them:

- pc-q35-5.2 (the newest one in qemu-5.2), from now on pc-5.2
- pc-q35-5.1 (the newest one in qemu-5.1), from now on pc-5.1

First of all, migration is only supposed to work if you use the same
machine type on both source and destination.  The QEMU hardware
configuration also needs to be the same on source and destination.
Most aspects of the backend configuration can be changed at will,
except for a few cases where backend features influence which features
the frontend device exposes.  But that is not relevant for this section.

I am going to list the possible combinations.  Let's start with the
trivial ones, where QEMU is the same on source and destination:

1 - qemu-5.2 -M pc-5.2  -> migrates to -> qemu-5.2 -M pc-5.2

  This is the latest QEMU with the latest machine type.
  This has to work, and if it doesn't, it is a bug.

2 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

  Exactly the same case as the previous one, but for 5.1.
  Nothing to see here either.

These are the easiest ones; we will not talk more about them in this
section.

Now we start with the more interesting cases.  Consider the case where
we have the same QEMU version on both sides (qemu-5.2), but we are not
using the latest machine type for that version (pc-5.2); instead we
use one from an older QEMU version, in this case pc-5.1.

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

  It needs to use the definition of pc-5.1 and the devices as they
  were configured in 5.1, but this should be easy in the sense that
  both sides are the same QEMU and both sides have exactly the same
  idea of what the pc-5.1 machine is.

4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2

  This combination is not possible, as qemu-5.1 doesn't understand
  the pc-5.2 machine type.  So nothing to worry about here.

Now come the interesting ones, where the two QEMU processes are
different.  Notice also that the machine type needs to be pc-5.1,
because of the limitation that qemu-5.1 doesn't know pc-5.2.  So
the possible cases are:

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

  This migration is known as newer to older.  When we are developing
  5.2, we need to take care not to break migration to qemu-5.1.
  Notice that we can't update qemu-5.1 to understand whatever
  qemu-5.2 decides to change, so it is up to the qemu-5.2 side to
  make the relevant changes.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

  This migration is known as older to newer.  We need to make sure
  that we are able to receive migrations from qemu-5.1.  The problem
  is similar to the previous one.

If qemu-5.1 and qemu-5.2 were the same, there would not be any
compatibility problems.  But the reason that we create qemu-5.2 is to
get new features, devices, defaults, etc.

If a device gets a new feature, or we change a default value, we have
a problem when we try to migrate between different QEMU versions.

So we need a way to tell qemu-5.2 that, when we are using machine type
pc-5.1, it must **not** use the feature, so that it can migrate to a
real qemu-5.1.

The equivalent applies when migrating from qemu-5.1 to qemu-5.2:
qemu-5.2 has to expect that it is not going to get data for the new
feature, because qemu-5.1 doesn't know about it.

How do we tell QEMU about these device feature changes?  Through the
hw_compat_X_Y arrays in hw/core/machine.c.

If we change a default value, we need to put the old value back in
that array.  The entries of that array are property values: when a
device is created with an old machine type, the entries that apply to
that machine type override the property defaults, so the device ends
up with the old value for that feature.

To create a property for a device, we need to use one of the
DEFINE_PROP_*() macros.  See include/hw/qdev-properties.h to find the
macros that exist.  With it, we set the default value for that
property, and that is the value it gets with the latest machine type.
But if we want a different value for an older machine type, we can
set it in the corresponding hw_compat_X_Y array.

hw_compat_X_Y is an array of GlobalProperty entries, each of which
has the format:

- name_device
- name_property
- value

Let's see a practical example.

In qemu-5.2, virtio-blk-device changed its default number of queues.
This is a change that is not backward compatible.  In qemu-5.1 it has
one queue by default.  In qemu-5.2 it has, by default, as many queues
as there are vCPUs in the system.

When we are doing migration, if we migrate from a device that has 4
queues to a device that has only one queue, we don't know where to
put the extra information for the other 3 queues, and migration
fails.

There is a similar problem when we migrate from qemu-5.1, which has
only one queue, to qemu-5.2: we only send information for one queue,
but the destination has 4, so 3 queues are not properly initialized
and anything can happen.

So, how can we address this problem?  Easy: just convince qemu-5.2
that when it is running pc-5.1, it needs to set the number of queues
for virtio-blk-device to 1.

That way we fix cases 5 and 6.

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

    qemu-5.2 -M pc-5.1 sets the number of queues to 1.
    qemu-5.1 -M pc-5.1 expects the number of queues to be 1.

    Correct.  Migration works.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    qemu-5.1 -M pc-5.1 sets the number of queues to 1.
    qemu-5.2 -M pc-5.1 expects the number of queues to be 1.

    Correct.  Migration works.

And now the other interesting case, case 3.  In this case we have:

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    Here we have the same QEMU on both sides.  So it doesn't matter
    much whether we set the number of queues to 1 or not, because
    both sides are the same.

    WRONG!

    Think about what happens if we do one of these double migrations:

    A -> migrates -> B -> migrates -> C

    where:

    - A: qemu-5.1 -M pc-5.1
    - B: qemu-5.2 -M pc-5.1
    - C: qemu-5.2 -M pc-5.1

    Migration A -> B is case 6, so the number of queues needs to be 1.

    Migration B -> C is case 3, so supposedly we don't care.  But we
    actually do care, because the guest was not started in qemu-5.2:
    it was migrated from qemu-5.1 and still has one queue.  So to be
    on the safe side, we need to always use 1 queue when we are using
    pc-5.1.

Now, how was this done in reality?  The following commit shows how it
was done::

  commit 9445e1e15e66c19e42bea942ba810db28052cd05
  Author: Stefan Hajnoczi <stefanha@redhat.com>
  Date:   Tue Aug 18 15:33:47 2020 +0100

  virtio-blk-pci: default num_queues to -smp N

The relevant parts for migration are::

    @@ -1281,7 +1284,8 @@ static const Property virtio_blk_properties[] = {
     #endif
         DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                         true),
    -    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
    +    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
    +                       VIRTIO_BLK_AUTO_NUM_QUEUES),
         DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),

It changes the default value of num_queues, but it restores the old
value for old machine types::

    @@ -31,6 +31,7 @@
     GlobalProperty hw_compat_5_1[] = {
         ...
    +    { "virtio-blk-device", "num-queues", "1"},
         ...
     };

A device with different features on both sides
----------------------------------------------

Let's assume that we are using the same QEMU binary on both sides,
just to keep things simple.  But we have a device that has different
features on each side of the migration.  That can be because the
devices are different, because the kernel drivers for the two devices
support different features, or for any other reason.

How can we make this work with migration?  In theory it is easy.  You
take the features that the device has on the source of the migration
and the features that the device has on the destination, compute the
intersection of both sets, and launch QEMU on both sides with only
those features enabled.

Notice that this is not entirely a QEMU problem.  The most important
point here is that this should be handled by the management
application that launches QEMU.  If QEMU is configured correctly, the
migration will succeed.

That said, actually doing it is complicated.  Almost all devices are
bad at being launched with only a subset of their features enabled,
with one big exception: CPUs.

You can read the documentation for QEMU x86 CPU models here:

https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html

Note that, when it talks about migration, it recommends choosing the
newest CPU model that is supported by all the hosts involved.

Let's say that we have:

Host A:

Device X has feature Y.

Host B:

Device X does not have feature Y.

If we naively try to migrate from host A to host B, it will fail,
because when migration tries to load feature Y on the destination, it
will find that the hardware is not there.

With CPUs, doing this would be the equivalent of:

Host A::

  $ qemu-system-x86_64 -cpu host

Host B::

  $ qemu-system-x86_64 -cpu host

When the two hosts have different CPU features, this is expected to
fail, especially if host B has fewer features than host A.  If host A
has fewer features than host B, it sometimes works.  The important
word in the last sentence is "sometimes".

So, setting CPU models aside and continuing with the -cpu host
example, let's say that the CPUs of host A and host B differ in the
following features:

======== ====== ======= ========
Host     pcid   stibp   taa-no
======== ====== ======= ========
Host A   X      X
Host B                  X
======== ====== ======= ========

If we want to migrate between them, the way to configure the CPU in
both QEMUs is:

Host A::

  $ qemu-system-x86_64 -cpu host,pcid=off,stibp=off

Host B::

  $ qemu-system-x86_64 -cpu host,taa-no=off

And you would be able to migrate between them.  It is the
responsibility of the management application or of the user to make
sure that the configuration is correct.  In general, QEMU has no way
to check this kind of feature.

Notice that we don't recommend using -cpu host for migration.  It is
used here only because it makes the example simpler.

Other devices offer less control over individual features.  If they
want to be able to migrate between hosts that expose different
features, the device needs a way to configure which ones it is going
to use.

In this section we have assumed that we are using the same QEMU binary
on both sides of the migration.  If we use different QEMU versions,
then we also need to take into account all the other differences
between them, and the examples become even more complicated.

How to mitigate when we have a backward compatibility error
-----------------------------------------------------------

We break migration for old machine types all the time during
development, but as soon as we find that there is a problem, we fix
it.  The real problem is what to do when we detect, after a release
has been made, that something has gone wrong.

Let's see how this worked in one example.

After the release of qemu-8.0, we found a problem when migrating the
machine type pc-7.2.

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration works.

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration works.

- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration fails.

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration fails.

So clearly something fails when migrating between qemu-7.2 and
qemu-8.0 with machine type pc-7.2.  The error messages and git bisect
pointed to the following commit.

In qemu-8.0 we got this commit::

    commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2
    Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Date:   Thu Mar 2 13:37:02 2023 +0000

    hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register

The relevant bits of the commit for our example are these::

    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_MASK_DEFAULT);
    +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_SUPPORTED);

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

The patch changes how the PCI configuration space for AER is set up:
the PCI_ERR_UNCOR_MASK register now gets a default value
(PCI_ERR_UNC_MASK_DEFAULT) and a writable mask (PCI_ERR_UNC_SUPPORTED).
But migration fails when the PCI configuration space is different
between source and destination.

The following commit shows how this got fixed::

    commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f
    Author: Leonardo Bras <leobras@redhat.com>
    Date:   Tue May 2 21:27:02 2023 -0300

    hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0

    [...]

The relevant parts of the fix in QEMU are as follows.

First, we create a new property for the device to be able to configure
the old behaviour or the new behaviour::

    diff --git a/hw/pci/pci.c b/hw/pci/pci.c
    index 8a87ccc8b0..5153ad63d6 100644
    --- a/hw/pci/pci.c
    +++ b/hw/pci/pci.c
    @@ -79,6 +79,8 @@ static const Property pci_props[] = {
         DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
                            failover_pair_id),
         DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
    +    DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
    +                    QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
     };

Notice that the property defaults to true, so the feature stays
enabled for new machine types.

Now let's see how the fix is done.  This is going to depend on what
kind of breakage happens, but in this case it is quite simple::

    diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
    index 103667c368..374d593ead 100644
    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    -    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_MASK_DEFAULT);
    -    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_SUPPORTED);
    +
    +    if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
    +        pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_MASK_DEFAULT);
    +        pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_SUPPORTED);
    +    }

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

That is, if the property bit is set, we configure the register as
qemu-8.0 did; if it is not set, we configure it as qemu-7.2 did.

Now, all that is missing is disabling the feature for old machine
types::

    diff --git a/hw/core/machine.c b/hw/core/machine.c
    index 47a34841a5..07f763eb2e 100644
    --- a/hw/core/machine.c
    +++ b/hw/core/machine.c
    @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = {
         { "e1000e", "migrate-timadj", "off" },
         { "virtio-mem", "x-early-migration", "false" },
         { "migration", "x-preempt-pre-7-2", "true" },
    +    { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
     };
     const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);

Now, when qemu-8.0.1 is released with this fix, all combinations are
going to work as expected:

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)

So normality has been restored and everything is OK, right?

Not really: now our matrix is much bigger.  We start with the easy
cases; migration from a version to the same version always works:

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

Now the interesting ones, where the QEMU versions are different.  The
first set fails and there is nothing we can do about it: both versions
are released and we can't change anything.

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

These two now work.  The whole point of making the change in the
qemu-8.0.1 release was to fix this issue:

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

But now we find that qemu-8.0 can exchange migrations with neither
qemu-7.2 nor qemu-8.0.1.  The new failing cases are:

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

So, if we start a pc-7.2 machine in qemu-8.0, we can't migrate it to
anything except qemu-8.0.

Can we do better?

Yes.  If we know that we are going to do this migration:

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

Then we can launch the appropriate devices on the qemu-8.0.1
destination with::

  --device ...,x-pcie-err-unc-mask=on

And now we can receive a migration from qemu-8.0.  From then on, we
can keep migrating that guest to newer QEMU versions, as long as we
remember to enable that property for pc-7.2.  Notice that we need to
remember it; it is not enough to check whether the source of the
migration is qemu-8.0.  Think of this example::

  $ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2

In the second migration, the source is not qemu-8.0, but the guest
still has that "problem", so the property must still be enabled.  We
need to keep this mark/property until the machine is restarted.  A
normal guest reboot is not enough, because it does not restart the
QEMU process: the machine needs to be powered off and powered on again
on a fixed QEMU.  From then on, we can use the real pc-7.2 machine
type without the override.