Backwards compatibility
=======================

How backwards compatibility works
---------------------------------

When we do migration, we have two QEMU processes: the source and the
target. There are two cases: they are the same version, or they are
different versions. The easy case is when they are the same version.
The difficult one is when they are different versions.

There are two things that are different, but they have very similar
names and sometimes get confused:

- QEMU version
- machine type version

Let's start with a practical example. We start with:

- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
- qemu-system-x86_64 (v5.1), from now on qemu-5.1.

Related to this are the "latest" machine types defined on each of
them:

- pc-q35-5.2 (the newest one in qemu-5.2), from now on pc-5.2
- pc-q35-5.1 (the newest one in qemu-5.1), from now on pc-5.1

First of all, migration is only supposed to work if you use the same
machine type on both source and destination. The QEMU hardware
configuration also needs to be the same on source and destination.
Most aspects of the backend configuration can be changed at will,
except for a few cases where the backend features influence frontend
device feature exposure. But that is not relevant for this section.

I am going to list the combinations that we can have. Let's start
with the trivial ones, where QEMU is the same on source and
destination:

1 - qemu-5.2 -M pc-5.2  -> migrates to -> qemu-5.2 -M pc-5.2

    This is the latest QEMU with the latest machine type.
    This has to work, and if it doesn't work it is a bug.

2 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

    Exactly the same case as the previous one, but for 5.1.
    Nothing to see here either.

These are the easiest ones; we will not talk more about them in this
section.

Now we start with the more interesting cases. Consider the case where
we have the same QEMU version on both sides (qemu-5.2), but we are not
using the latest machine type for that version (pc-5.2); instead we use
one from an older QEMU version, in this case pc-5.1.

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    It needs to use the definition of pc-5.1 and the devices as they
    were configured on 5.1, but this should be easy in the sense that
    both sides are the same QEMU and both sides have exactly the same
    idea of what the pc-5.1 machine is.

4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2

    This combination is not possible, as qemu-5.1 doesn't understand
    the pc-5.2 machine type. So nothing to worry about here.
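
The combinations above can be captured by a toy model: a QEMU binary only
knows machine types up to its own version, and migration needs the same
machine type on both sides. This is a minimal sketch under that assumption,
not QEMU code; the ``Side`` struct, the version encoding and both helper
functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: version X.Y is stored as X * 100 + Y (5.1 -> 501). */
typedef struct {
    int qemu;     /* version of the QEMU binary */
    int machine;  /* version of the -M pc-X.Y machine type */
} Side;

/* Assumption of this toy model: a binary knows every pc-X.Y machine type
 * up to its own version (real binaries eventually drop very old machine
 * types, which this sketch ignores). */
static bool knows_machine(Side s)
{
    return s.machine <= s.qemu;
}

/* Migration is only supported with the same machine type on both sides,
 * and both binaries must be able to start that machine type. */
static bool migration_possible(Side src, Side dst)
{
    assert(src.qemu > 0 && dst.qemu > 0);
    return src.machine == dst.machine &&
           knows_machine(src) && knows_machine(dst);
}
```

With this model, case 4 (``{501, 502}`` on both sides) is rejected because
qemu-5.1 does not know pc-5.2. The model only says whether a combination can
be attempted at all, not whether the device state is compatible.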

Now come the interesting ones, where the two QEMU processes are
different. Notice also that the machine type needs to be pc-5.1,
because of the limitation that qemu-5.1 doesn't know pc-5.2. So
the possible cases are:

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

    This migration is known as newer to older. When we are developing
    5.2, we need to take care not to break migration to qemu-5.1.
    Notice that we can't update qemu-5.1 to understand whatever
    qemu-5.2 decides to change, so it is up to the qemu-5.2 side to
    make the relevant changes.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    This migration is known as older to newer. We need to make sure
    that we are able to receive migrations from qemu-5.1. The problem is
    similar to the previous one.

If qemu-5.1 and qemu-5.2 were the same, there would not be any
compatibility problems. But the reason that we create qemu-5.2 is to
get new features, devices, defaults, etc.

If we get a device that has a new feature, or change a default value,
we have a problem when we try to migrate between different QEMU
versions.

So we need a way to tell qemu-5.2 that, when we are using machine type
pc-5.1, it needs to **not** use the feature, to be able to migrate to
a real qemu-5.1.

The same applies when migrating from qemu-5.1 to qemu-5.2:
qemu-5.2 has to expect that it is not going to get data for the new
feature, because qemu-5.1 doesn't know about it.

How do we tell QEMU about these device feature changes? With the
hw_compat_X_Y arrays in hw/core/machine.c.

If we change a default value, we need to put the old value back in
that array, and during initialization the device picks up from that
array the value it needs to use for that feature. What we put in that
array is the value of a property.

To create a property for a device, we need to use one of the
DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the
macros that exist. With it, we set the default value for that
property, and that is the value it gets in the latest released
version. But if we want a different value for a previous version, we
can change that in the hw_compat_X_Y arrays.

hw_compat_X_Y is an array of entries that have the format:

- name_device
- name_property
- value

Let's see a practical example.

In qemu-5.2, virtio-blk-device changed its default number of queues.
This is a change that is not backward compatible. In qemu-5.1 it has
one queue by default. In qemu-5.2 it has the same number of queues as
the number of CPUs in the system.
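
The way a compat entry overrides a property default can be modelled in a
few lines of standalone C. This is a simplified sketch, not QEMU's
implementation: ``GlobalProperty`` below is a cut-down version of QEMU's
struct with only the device, property and value fields, while ``ToyDevice``,
``AUTO_NUM_QUEUES``, ``toy_create_virtio_blk()`` and ``toy_load_queues()``
are hypothetical. In real QEMU the qdev core applies the global properties
for you.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down mirror of one hw_compat_X_Y entry: device, property, value. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} GlobalProperty;

/* Modelled on the entry the virtio-blk change needs for pc-5.1 and older. */
static const GlobalProperty hw_compat_5_1[] = {
    { "virtio-blk-device", "num-queues", "1" },
};

/* Hypothetical marker for the new default: one queue per vCPU. */
#define AUTO_NUM_QUEUES 0xffffu

typedef struct {
    const char *type;
    unsigned num_queues;
} ToyDevice;

/* How a qemu-5.2-like binary creates the device: start from the property
 * default, let the compat table override it for old machine types, then
 * resolve the "auto" value. machine is encoded as X * 100 + Y. */
static ToyDevice toy_create_virtio_blk(int machine, unsigned n_cpus)
{
    ToyDevice dev = { "virtio-blk-device", AUTO_NUM_QUEUES };

    if (machine <= 501) {
        size_t n = sizeof(hw_compat_5_1) / sizeof(hw_compat_5_1[0]);
        for (size_t i = 0; i < n; i++) {
            const GlobalProperty *p = &hw_compat_5_1[i];
            if (strcmp(p->driver, dev.type) == 0 &&
                strcmp(p->property, "num-queues") == 0) {
                dev.num_queues = (unsigned)strtoul(p->value, NULL, 10);
            }
        }
    }
    if (dev.num_queues == AUTO_NUM_QUEUES) {
        dev.num_queues = n_cpus;
    }
    assert(dev.num_queues > 0);
    return dev;
}

/* The destination can only load state for the queues it actually has. */
static bool toy_load_queues(const ToyDevice *dst, unsigned incoming)
{
    return dst->num_queues == incoming;
}
```

For a 4-vCPU guest, ``toy_create_virtio_blk(501, 4)`` yields one queue and
``toy_create_virtio_blk(502, 4)`` yields four. Because the result depends
only on the machine type, every qemu-5.2 process running pc-5.1 agrees with
qemu-5.1, no matter where the guest was originally started.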

When we are doing migration, if we migrate from a device that has 4
queues to a device that has only one queue, we don't know where to
put the extra information for the other 3 queues, and migration
fails.

There is a similar problem when we migrate from qemu-5.1, which has
only one queue, to qemu-5.2: we only send information for one queue,
but the destination has 4, so we have 3 queues that are not properly
initialized and anything can happen.

So, how can we address this problem? Easy: just convince qemu-5.2
that, when it is running pc-5.1, it needs to set the number of queues
for virtio-blk devices to 1.

That way we fix cases 5 and 6.

5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1

    qemu-5.2 -M pc-5.1 sets the number of queues to 1.
    qemu-5.1 -M pc-5.1 expects the number of queues to be 1.

    Correct: migration works.

6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    qemu-5.1 -M pc-5.1 sets the number of queues to 1.
    qemu-5.2 -M pc-5.1 expects the number of queues to be 1.

    Correct: migration works.

And now the other interesting case, case 3. In this case we have:

3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1

    Here we have the same QEMU on both sides.
    So it doesn't matter much whether we have set the number of queues
    to 1 or not, because both sides are the same.

    WRONG!

    Think what happens if we do one of these double migrations:

    A -> migrates -> B -> migrates -> C

    where:

    A: qemu-5.1 -M pc-5.1
    B: qemu-5.2 -M pc-5.1
    C: qemu-5.2 -M pc-5.1

    Migration A -> B is case 6, so the number of queues needs to be 1.

    Migration B -> C is case 3, so we don't care. But actually we
    do care, because the guest was not started in qemu-5.2: it was
    migrated from qemu-5.1. So to be on the safe side, we always need
    to use 1 queue when we are using pc-5.1.

Now, how was this done in reality?
The following commit shows how it was done::

    commit 9445e1e15e66c19e42bea942ba810db28052cd05
    Author: Stefan Hajnoczi <stefanha@redhat.com>
    Date:   Tue Aug 18 15:33:47 2020 +0100

        virtio-blk-pci: default num_queues to -smp N

The relevant parts for migration are::

    @@ -1281,7 +1284,8 @@ static const Property virtio_blk_properties[] = {
     #endif
         DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                         true),
    -    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
    +    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
    +                       VIRTIO_BLK_AUTO_NUM_QUEUES),
         DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),

It changes the default value of num_queues, but it restores the old
value for old machine types so that they keep the right value::

    @@ -31,6 +31,7 @@
     GlobalProperty hw_compat_5_1[] = {
         ...
    +    { "virtio-blk-device", "num-queues", "1"},
         ...
     };

A device with different features on both sides
----------------------------------------------

Let's assume that we are using the same QEMU binary on both sides,
just to make things easier. But we have a device that has
different features on both sides of the migration.
That can be because the devices are different, because the kernel
drivers of the two devices have different features, or whatever.

How can we get this to work with migration? The way to do that is
"theoretically" easy. You have to get the features that the device
has on the source of the migration and the features that the device
has on the target of the migration, compute the intersection of the
features of both sides, and that is the way that you should launch
QEMU.

Notice that this is not completely related to QEMU. The most
important thing here is that this should be handled by the management
application that launches QEMU. If QEMU is configured correctly, the
migration will succeed.

That said, actually doing it is complicated. Almost all devices are
bad at being launched with only some features enabled. There is one
big exception: CPUs.

You can read the documentation for QEMU x86 CPU models here:

https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html

Notice that, when it talks about migration, it recommends choosing the
newest CPU model that is supported by all the host CPUs involved.

Let's say that we have:

Host A:

Device X has the feature Y

Host B:

Device X does not have the feature Y

If we try to migrate from Host A to Host B without any care, it will
fail, because when migration tries to load the feature Y on the
destination, it will find that the hardware is not there.

With CPUs, this would be the equivalent of doing:

Host A:

$ qemu-system-x86_64 -cpu host

Host B:

$ qemu-system-x86_64 -cpu host

When both hosts have different CPU features, this is not guaranteed
to work, and it is bound to fail if Host B has fewer features than
Host A. If Host A has fewer features than Host B, it sometimes works.
The important word in the last sentence is "sometimes".
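
In terms of feature sets, the failure above is a subset check: every
feature the source device is using must also exist on the destination,
and launching both sides with the intersection of their features is what
makes that check pass. A minimal sketch with hypothetical feature bits
(``FEATURE_Y``, ``FEATURE_Z``) and helper names, not QEMU code. The subset
check is only a necessary condition, which fits the observation that
migrating from the host with fewer features only sometimes works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits of "Device X". */
#define FEATURE_Y (UINT32_C(1) << 0)
#define FEATURE_Z (UINT32_C(1) << 1)

/* Every feature in use on the source must exist on the destination.
 * This is necessary, but not sufficient, for the state to load. */
static bool can_load(uint32_t src_features, uint32_t dst_features)
{
    return (src_features & ~dst_features) == 0;
}

/* The feature set both sides should be launched with. */
static uint32_t common_features(uint32_t a, uint32_t b)
{
    uint32_t common = a & b;

    /* Both hosts can provide every feature in the intersection. */
    assert(can_load(common, a) && can_load(common, b));
    return common;
}
```

With Host A offering ``FEATURE_Y | FEATURE_Z`` and Host B only
``FEATURE_Z``, ``can_load()`` rejects A -> B, while launching both sides
with ``common_features()`` leaves only ``FEATURE_Z`` enabled.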

So, forgetting about CPU models and continuing with the -cpu host
example, let's say that the CPUs of Host A and Host B differ in the
following features:

========  ======  =======  ========
Features  'pcid'  'stibp'  'taa-no'
========  ======  =======  ========
Host A    X       X
Host B                     X
========  ======  =======  ========

If we want to migrate between them, the way to configure the CPU in
both QEMU processes is:

Host A:

$ qemu-system-x86_64 -cpu host,pcid=off,stibp=off

Host B:

$ qemu-system-x86_64 -cpu host,taa-no=off

And then we would be able to migrate between them. It is the
responsibility of the management application or of the user to make
sure that the configuration is correct. QEMU doesn't know how to look
at this kind of feature in general.

Notice that we don't recommend using -cpu host for migration. It is
used in this example because it makes the example simpler.

Other devices have worse control over individual features. If they
want to be able to migrate between hosts that show different features,
the device needs a way to configure which ones it is going to use.

In this section we have considered that we are using the same QEMU
binary on both sides of the migration. If we use different QEMU
versions, then we need to take into account all the other
differences, and the examples become even more complicated.
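
The per-host command lines above can be derived mechanically: on each
host, turn off every feature that the other host lacks, so that both sides
expose only the intersection. A minimal sketch, assuming the feature lists
of both hosts are already known; the feature table and ``cpu_opts()`` are
hypothetical, and a real management application would query the host CPUs
instead of hard-coding them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature table; bit i corresponds to feature_names[i]. */
static const char *const feature_names[] = { "pcid", "stibp", "taa-no" };
#define N_FEATURES (sizeof(feature_names) / sizeof(feature_names[0]))

/* Writes "host,<feature>=off,..." into buf, disabling every feature that
 * this host has but the other host lacks. */
static void cpu_opts(unsigned self, unsigned other, char *buf, size_t len)
{
    unsigned extra = self & ~other;
    size_t used = (size_t)snprintf(buf, len, "host");

    for (size_t i = 0; i < N_FEATURES; i++) {
        if (extra & (1u << i)) {
            assert(used < len);  /* the caller's buffer must be big enough */
            used += (size_t)snprintf(buf + used, len - used, ",%s=off",
                                     feature_names[i]);
        }
    }
}
```

With Host A having ``pcid`` and ``stibp`` and Host B having ``taa-no``,
this produces ``host,pcid=off,stibp=off`` for Host A and
``host,taa-no=off`` for Host B, matching the command lines above.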

How to mitigate when we have a backward compatibility error
-----------------------------------------------------------

We break migration for old machine types continuously during
development. But as soon as we find that there is a problem, we fix
it. The problem is what happens when we detect, after a release, that
something has gone wrong.

Let's see how it worked with one example.

After the release of qemu-8.0 we found a problem when doing migration
of the machine type pc-7.2.

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration works

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration works

- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

  This migration fails

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

  This migration fails

So clearly something fails when migrating between qemu-7.2 and
qemu-8.0 with machine type pc-7.2. The error messages and git bisect
pointed to this commit.

In qemu-8.0 we got this commit::

    commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2
    Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Date:   Thu Mar 2 13:37:02 2023 +0000

        hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register

The relevant bits of the commit for our example are these::

    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_MASK_DEFAULT);
    +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                 PCI_ERR_UNC_SUPPORTED);

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

The patch changes how we configure the PCI configuration space for
AER. But migration fails when the PCI configuration space differs
between source and destination.

The following commit shows how this got fixed::

    commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f
    Author: Leonardo Bras <leobras@redhat.com>
    Date:   Tue May 2 21:27:02 2023 -0300

        hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0

        [...]

The relevant parts of the fix in QEMU are as follows.

First, we create a new property for the device, to be able to select
the old behaviour or the new behaviour::

    diff --git a/hw/pci/pci.c b/hw/pci/pci.c
    index 8a87ccc8b0..5153ad63d6 100644
    --- a/hw/pci/pci.c
    +++ b/hw/pci/pci.c
    @@ -79,6 +79,8 @@ static const Property pci_props[] = {
         DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
                            failover_pair_id),
         DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
    +    DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
    +                    QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
     };

Notice that the property defaults to true, so the feature stays
enabled for new machine types.

Now let's see how the fix is done.
This depends on the kind of breakage, but in this case it is quite
simple::

    diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
    index 103667c368..374d593ead 100644
    --- a/hw/pci/pcie_aer.c
    +++ b/hw/pci/pcie_aer.c
    @@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,

         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
                      PCI_ERR_UNC_SUPPORTED);
    -    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_MASK_DEFAULT);
    -    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    -                 PCI_ERR_UNC_SUPPORTED);
    +
    +    if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
    +        pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_MASK_DEFAULT);
    +        pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
    +                     PCI_ERR_UNC_SUPPORTED);
    +    }

         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
                      PCI_ERR_UNC_SEVERITY_DEFAULT);

I.e. if the property bit is set, we configure the register as
qemu-8.0 did. If the property bit is not set, we configure it as it
was in qemu-7.2.
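
The effect of the property bit can be sketched outside QEMU: the mask
register is only written when the bit is set, so devices created with
different bit values end up with different read-only configuration-space
bytes, and the destination rejects the incoming state. Everything below is
a hypothetical stand-in (offset, default value, writable bits and bit
number replace ``PCI_ERR_UNCOR_MASK``, ``PCI_ERR_UNC_MASK_DEFAULT``,
``PCI_ERR_UNC_SUPPORTED`` and ``QEMU_PCIE_ERR_UNC_MASK``), and the load
check is a simplification of the checks QEMU performs on PCI config space;
it does not model every failure mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real values live in QEMU's PCI headers. */
#define CFG_SIZE          16
#define UNC_MASK_OFFSET   8            /* stand-in register offset */
#define UNC_MASK_DEFAULT  0x00400000u  /* stand-in default mask value */
#define UNC_SUPPORTED     0x00ffffffu  /* stand-in guest-writable bits */
#define ERR_UNC_MASK_BIT  (1u << 3)    /* stand-in property bit */

typedef struct {
    uint32_t cap_present;       /* property bits, as in PCIDevice */
    uint8_t config[CFG_SIZE];   /* config space contents */
    uint8_t wmask[CFG_SIZE];    /* guest-writable bits */
} ToyPCIDevice;

/* Little-endian 32-bit store, like QEMU's pci_set_long(). */
static void toy_set_long(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

static void toy_aer_init(ToyPCIDevice *dev)
{
    memset(dev->config, 0, sizeof(dev->config));
    memset(dev->wmask, 0, sizeof(dev->wmask));
    /* Only set up the mask register if the property bit allows it. */
    if (dev->cap_present & ERR_UNC_MASK_BIT) {
        toy_set_long(dev->config + UNC_MASK_OFFSET, UNC_MASK_DEFAULT);
        toy_set_long(dev->wmask + UNC_MASK_OFFSET, UNC_SUPPORTED);
    }
}

/* Simplified load check: bits the guest cannot write on the destination
 * must be identical in the incoming state. */
static bool toy_config_load_ok(const ToyPCIDevice *dst,
                               const uint8_t incoming[CFG_SIZE])
{
    for (int i = 0; i < CFG_SIZE; i++) {
        if ((incoming[i] ^ dst->config[i]) & (uint8_t)~dst->wmask[i]) {
            return false;
        }
    }
    return true;
}
```

A device initialized with the bit set (qemu-8.0 running pc-7.2) carries a
non-zero read-only byte that a device without the bit (qemu-7.2) does not
expect, so the load is rejected; two devices with the bit cleared load each
other's state without trouble.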

All that is missing now is disabling the feature for old machine
types::

    diff --git a/hw/core/machine.c b/hw/core/machine.c
    index 47a34841a5..07f763eb2e 100644
    --- a/hw/core/machine.c
    +++ b/hw/core/machine.c
    @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = {
         { "e1000e", "migrate-timadj", "off" },
         { "virtio-mem", "x-early-migration", "false" },
         { "migration", "x-preempt-pre-7-2", "true" },
    +    { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
     };
     const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);

And now, when qemu-8.0.1 is released with this fix, all combinations
work as expected:

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2 (works)
- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2 (works)

So normality has been restored and everything is OK, right?

Not really: now our matrix is much bigger. We started with the easy
cases; migration from the same version to the same version always
works:

- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

Now the interesting ones, where the QEMU versions differ.
The first set fails and there is nothing we can do about it: both
versions are released and we can't change anything.

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

These two now work. Fixing them was the whole point of making the
change in the qemu-8.0.1 release:

- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-7.2 -M pc-7.2

But now we find that qemu-8.0 can migrate neither to qemu-7.2 nor to
qemu-8.0.1, and qemu-8.0.1 cannot migrate to qemu-8.0 either:

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
- $ qemu-8.0.1 -M pc-7.2  ->  qemu-8.0 -M pc-7.2

So, if we start a pc-7.2 machine in qemu-8.0, we can't migrate it to
anything except qemu-8.0.

Can we do better?

Yes. If we know that we are going to do this migration:

- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2

we can launch the appropriate devices on the destination with::

  --device...,x-pcie-err-unc-mask=on

And now we can receive a migration from qemu-8.0. From then on, we can
migrate that guest to newer QEMU versions as long as we remember to
enable that property for pc-7.2. Notice that we need to remember it;
it is not enough to know that the source of the migration is qemu-8.0.
Think of this example:

$ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2  ->  qemu-8.2 -M pc-7.2

In the second migration, the source is not qemu-8.0, but we still have
that "problem" and must keep that property enabled. We need to keep
this mark/property until the machine is rebooted. A normal guest
reboot is not enough, because it does not restart QEMU: the machine
needs to be powered off and started again on a fixed QEMU. From then
on, we can use the real pc-7.2 machine type without the override.
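
The whole matrix above boils down to one question: do source and
destination agree on whether the uncorrectable-error mask register is set
up for pc-7.2? A minimal sketch of that reasoning, assuming each binary
behaves as described in this section and that qemu-8.2 behaves like
qemu-8.0.1 for pc-7.2. The enums and functions are hypothetical, and
``PROP_ON`` / ``PROP_OFF`` model passing ``x-pcie-err-unc-mask=on`` or
``=off`` on the command line.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { QEMU_7_2, QEMU_8_0, QEMU_8_0_1, QEMU_8_2 } QemuBinary;

/* Command-line override of the x-pcie-err-unc-mask property. */
typedef enum { PROP_DEFAULT, PROP_ON, PROP_OFF } PropOverride;

/* Is the uncorrectable-error mask register set up when running -M pc-7.2? */
static bool unc_mask_enabled(QemuBinary qemu, PropOverride prop)
{
    switch (qemu) {
    case QEMU_7_2:
        assert(prop == PROP_DEFAULT);  /* the property does not exist yet */
        return false;                  /* the register is not implemented */
    case QEMU_8_0:
        assert(prop == PROP_DEFAULT);  /* the property does not exist yet */
        return true;                   /* the bug: no hw_compat_7_2 entry */
    default:
        /* 8.0.1 and later: hw_compat_7_2 turns it off unless overridden. */
        return prop == PROP_ON;
    }
}

/* Migration works only if both sides agree on the register layout. */
static bool can_migrate(QemuBinary src, PropOverride src_prop,
                        QemuBinary dst, PropOverride dst_prop)
{
    return unc_mask_enabled(src, src_prop) == unc_mask_enabled(dst, dst_prop);
}
```

In this model, qemu-8.0 -> qemu-8.0.1 fails with the default setting and
works with ``PROP_ON`` on the destination, and the next hop to qemu-8.2
only works if ``PROP_ON`` is passed again. That is the point about
remembering: the override has to follow the guest through every migration
until it is powered off and started again on a fixed QEMU.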