#
e806b4db |
| 28-Jun-2017 |
Laurent Vivier <lvivier@redhat.com> |
spapr: fix migration to pseries machine < 2.8
since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"), some migration fields are forged from the new ones in spapr_pci_pre_save().
spapr: fix migration to pseries machine < 2.8
since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"), some migration fields are forged from the new ones in spapr_pci_pre_save().
It works well, except when the number of MSI devices is 0, because in this case the function exits immediately.
This fix moves the migration code before the exit code.
The problem can be reproduced with these commands:
source qemu-2.9:
qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults -S
destination qemu-2.6:
qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults \ -incoming tcp:0:4444
on the source:
migrate tcp:localhost:4444
Destination fails with the following error:
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr_pci' qemu-system-ppc64: load of migration failed: Invalid argument
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
d2164ad3 |
| 23-Jun-2017 |
Halil Pasic <pasic@linux.vnet.ibm.com> |
vmstate: error hint for failed equal checks
In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug, but it's actually the best we can do. Especially in these cases a verbose error m
vmstate: error hint for failed equal checks
In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug, but it's actually the best we can do. Especially in these cases a verbose error message is required.
Let's introduce infrastructure for specifying a error hint to be used if equal check fails. Let's do this by adding a parameter to the _EQUAL macros called _err_hint. Also change all current users to pass NULL as last parameter so nothing changes for them.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170623144823.42936-1-pasic@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
show more ...
|
#
6304fd27 |
| 07-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Fold spapr_phb_{add,remove}_pci_device() into their only callers
Both functions are fairly short, and so are their callers. There's no particular logical distinction between them, so fold th
spapr: Fold spapr_phb_{add,remove}_pci_device() into their only callers
Both functions are fairly short, and so are their callers. There's no particular logical distinction between them, so fold them together.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
show more ...
|
#
0be4e886 |
| 06-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Change DRC attach & detach methods to functions
DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for differ
spapr: Change DRC attach & detach methods to functions
DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for different DRC types, the overall structure is the same, so while we might want different method implementations for some parts, we're unlikely to want them for the top-level functions.
So, replace them with direct function calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
show more ...
|
#
f224d35b |
| 07-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Clean up DR entity sense handling
DRC classes have an entity_sense method to determine (in a specific PAPR sense) the presence or absence of a device plugged into a DRC. However, we only hav
spapr: Clean up DR entity sense handling
DRC classes have an entity_sense method to determine (in a specific PAPR sense) the presence or absence of a device plugged into a DRC. However, we only have one implementation of the method, which explicitly tests for different DRC types. This changes it to instead have different method implementations for the two cases: "logical" and "physical" DRCs.
While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS, and the interesting value is returned via pass-by-reference. Simplify this to directly return the value we care about
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
show more ...
|
#
fbf55397 |
| 04-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Clean up spapr_dr_connector_by_*()
* Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAP
spapr: Clean up spapr_dr_connector_by_*()
* Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter
The latter allows removal of the get_type_shift() helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
show more ...
|
#
2d335818 |
| 04-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Introduce DRC subclasses
Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our ex
spapr: Introduce DRC subclasses
Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system.
So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on.
Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
show more ...
|
#
0b55aa91 |
| 02-Jun-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Make DRC get_index and get_type methods into plain functions
These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verg
spapr: Make DRC get_index and get_type methods into plain functions
These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible.
So replace them with simple functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
show more ...
|
#
31834723 |
| 22-May-2017 |
Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> |
hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This informatio
hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed.
After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either.
This patch makes the following changes:
- removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed.
- removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach().
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
e4f4fb1e |
| 03-May-2017 |
Eduardo Habkost <ehabkost@redhat.com> |
sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE
commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devi
sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE
commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devices appear on "-device help" and lack the "no-user" flag in "info qdm".
To fix this, we can set user_creatable=false by default on TYPE_SYS_BUS_DEVICE, but this requires setting user_creatable=true explicitly on the sysbus devices that actually work with -device.
Fortunately today we have just a few has_dynamic_sysbus=1 machines: virt, pc-q35-*, ppce500, and spapr.
virt, ppce500, and spapr have extra checks to ensure just a few device types can be instantiated:
* virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. * ppce500 supports only TYPE_ETSEC_COMMON. * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.
This patch sets user_creatable=true explicitly on those 4 device classes.
Now, the more complex cases:
pc-q35-*: q35 has no sysbus device whitelist yet (which is a separate bug). We are in the process of fixing it and building a sysbus whitelist on q35, but in the meantime we can fix the "-device help" and "info qdm" bugs mentioned above. Also, despite not being strictly necessary for fixing the q35 bug, reducing the list of user_creatable=true devices will help us be more confident when building the q35 whitelist.
xen: We also have a hack at xen_set_dynamic_sysbus(), that sets has_dynamic_sysbus=true at runtime when using the Xen accelerator. This hack is only used to allow xen-backend devices to be dynamically plugged/unplugged.
This means today we can use -device with the following 22 device types, that are the ones compiled into the qemu-system-x86_64 and qemu-system-i386 binaries:
* allwinner-ahci * amd-iommu * cfi.pflash01 * esp * fw_cfg_io * fw_cfg_mem * generic-sdhci * hpet * intel-iommu * ioapic * isabus-bridge * kvmclock * kvm-ioapic * kvmvapic * SUNW,fdtwo * sysbus-ahci * sysbus-fdc * sysbus-ohci * unimplemented-device * virtio-mmio * xen-backend * xen-sysdev
This patch adds user_creatable=true explicitly to those devices, temporarily, just to keep 100% compatibility with existing behavior of q35. Subsequent patches will remove user_creatable=true from the devices that are really not meant to user-creatable on any machine, and remove the FIXME comment from the ones that are really supposed to be user-creatable. This is being done in separate patches because we still don't have an obvious list of devices that will be whitelisted by q35, and I would like to get each device reviewed individually.
Cc: Alexander Graf <agraf@suse.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Beniamino Galvani <b.galvani@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: Gabriel L. Somlo <somlo@cmu.edu> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Pierre Morel <pmorel@linux.vnet.ibm.com> Cc: Prasad J Pandit <pjp@fedoraproject.org> Cc: qemu-arm@nongnu.org Cc: qemu-block@nongnu.org Cc: qemu-ppc@nongnu.org Cc: Richard Henderson <rth@twiddle.net> Cc: Rob Herring <robh@kernel.org> Cc: Shannon Zhao <zhaoshenglong@huawei.com> Cc: sstabellini@kernel.org Cc: Thomas Huth <thuth@redhat.com> Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-3-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [ehabkost: Small changes at sysbus_device_class_init() comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
show more ...
|
#
c88fa6dd |
| 29-Mar-2017 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
spapr_pci: Removed unused include
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
|
#
a01f3432 |
| 28-Mar-2017 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask
If a page size used by QEMU is not enabled in the PHB IOMMU page mask, in-kernel acceleration of TCE handling won't be enabled an
spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask
If a page size used by QEMU is not enabled in the PHB IOMMU page mask, in-kernel acceleration of TCE handling won't be enabled and performance might be slower than expected.
This prints a warning if system page size is not enabled. This should print a warning if huge pages are enabled but sphb.pgsz still uses the default value of 4K|64K.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
82516263 |
| 14-Mar-2017 |
David Gibson <david@gibson.dropbear.id.au> |
pseries: Don't expose PCIe extended config space on older machine types
bb9986452 "spapr_pci: Advertise access to PCIe extended config space" allowed guests to access the extended config space of PC
pseries: Don't expose PCIe extended config space on older machine types
bb9986452 "spapr_pci: Advertise access to PCIe extended config space" allowed guests to access the extended config space of PCI Express devices via the PAPR interfaces, even though the paravirtualized bus mostly acts like plain PCI.
However, that patch enabled access unconditionally, including for existing machine types, which is an unwise change in behaviour. This patch limits the change to pseries-2.9 (and later) machine types.
Suggested-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
bb998645 |
| 01-Mar-2017 |
David Gibson <david@gibson.dropbear.id.au> |
spapr_pci: Advertise access to PCIe extended config space
The (paravirtual) PCI host bridge on the 'pseries' machine in most regards acts like a regular PCI bus, rather than a PCIe bus. Despite thi
spapr_pci: Advertise access to PCIe extended config space
The (paravirtual) PCI host bridge on the 'pseries' machine in most regards acts like a regular PCI bus, rather than a PCIe bus. Despite this, though, it does allow access to the PCIe extended config space.
We already implemented the RTAS methods to allow this access.. but forgot to put the markers into the device tree so that guest's know it is there. This adds them in.
With this, a pseries guest is able to view extended config space on (for example an e1000e device. This should be enough to allow guests to use at least some PCIe devices.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
f7759e43 |
| 27-Feb-2017 |
Cédric Le Goater <clg@kaod.org> |
ppc/xics: use the QOM interface to get irqs
Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.
ppc/xics: use the QOM interface to get irqs
Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
681bfade |
| 27-Feb-2017 |
Cédric Le Goater <clg@kaod.org> |
ppc/xics: store the ICS object under the sPAPR machine
A list of ICS objects was introduced under the XICS object for the PowerNV machine but, for the sPAPR machine, it brings extra complexity as th
ppc/xics: store the ICS object under the sPAPR machine
A list of ICS objects was introduced under the XICS object for the PowerNV machine but, for the sPAPR machine, it brings extra complexity as there is only a single ICS. To simplify the code, let's add the ICS pointer under the sPAPR machine and try to reduce the use of this list where possible.
Also, change the xics_spapr_*() routines to use an ICS object instead of an XICSState and change their name to reflect that these are specific to the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
a8eeafda |
| 22-Feb-2017 |
Greg Kurz <gkurz@linux.vnet.ibm.com> |
spapr/pci: populate PCI DT in reverse order
Since commit 1d2d974244c6 "spapr_pci: enumerate and add PCI device tree", QEMU populates the PCI device tree in the opposite order compared to SLOF.
Befo
spapr/pci: populate PCI DT in reverse order
Since commit 1d2d974244c6 "spapr_pci: enumerate and add PCI device tree", QEMU populates the PCI device tree in the opposite order compared to SLOF.
Before 1d2d974244c6:
Populating /pci@800000020000000 00 0000 (D) : 1af4 1000 virtio [ net ] 00 0800 (D) : 1af4 1001 virtio [ block ] 00 1000 (D) : 1af4 1009 virtio [ network ] Populating /pci@800000020000000/unknown-legacy-device@2
7e5294b8 : /pci@800000020000000 7e52b998 : |-- ethernet@0 7e52c0c8 : |-- scsi@1 7e52c7e8 : +-- unknown-legacy-device@2 ok
Since 1d2d974244c6:
Populating /pci@800000020000000 00 1000 (D) : 1af4 1009 virtio [ network ] Populating /pci@800000020000000/unknown-legacy-device@2 00 0800 (D) : 1af4 1001 virtio [ block ] 00 0000 (D) : 1af4 1000 virtio [ net ]
7e5e8118 : /pci@800000020000000 7e5ea6a0 : |-- unknown-legacy-device@2 7e5eadb8 : |-- scsi@1 7e5eb4d8 : +-- ethernet@0 ok
This behaviour change is not actually a bug since no assumptions should be made on DT ordering. But it has no real justification either, other than being the consequence of the way fdt_add_subnode() inserts new elements to the front of the FDT rather than adding them to the tail.
This patch reverts to the historical SLOF ordering by walking PCI devices in reverse order. This reconciles pseries with x86 machine types behavior. It is expected to make things easier when porting existing applications to power.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> (slight update to the changelog) Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
2530a1a5 |
| 17-Feb-2017 |
Laurent Vivier <lvivier@redhat.com> |
spapr: generate DT node names
When DT node names for PCI devices are generated by SLOF, they are generated according to the type of the device (for instance, ethernet for virtio-net-pci device).
No
spapr: generate DT node names
When DT node names for PCI devices are generated by SLOF, they are generated according to the type of the device (for instance, ethernet for virtio-net-pci device).
Node name for hotplugged devices is generated by QEMU. This patch adds the mechanic to QEMU to create the node name according to the device type too.
The data structure has been roughly copied from OpenBIOS/OpenHackware, node names from SLOF.
Example:
Hotplugging some PCI cards with QEMU monitor:
device_add virtio-tablet-pci device_add virtio-serial-pci device_add virtio-mouse-pci device_add virtio-scsi-pci device_add virtio-gpu-pci device_add ne2k_pci device_add nec-usb-xhci device_add intel-hda
What we can see in linux device tree:
for dir in /proc/device-tree/pci@800000020000000/*@*/; do echo $dir cat $dir/name echo done
WITHOUT this patch:
/proc/device-tree/pci@800000020000000/pci@0/ pci /proc/device-tree/pci@800000020000000/pci@1/ pci /proc/device-tree/pci@800000020000000/pci@2/ pci /proc/device-tree/pci@800000020000000/pci@3/ pci /proc/device-tree/pci@800000020000000/pci@4/ pci /proc/device-tree/pci@800000020000000/pci@5/ pci /proc/device-tree/pci@800000020000000/pci@6/ pci /proc/device-tree/pci@800000020000000/pci@7/ pci
WITH this patch:
/proc/device-tree/pci@800000020000000/communication-controller@1/ communication-controller /proc/device-tree/pci@800000020000000/display@4/ display /proc/device-tree/pci@800000020000000/ethernet@5/ ethernet /proc/device-tree/pci@800000020000000/input-controller@0/ input-controller /proc/device-tree/pci@800000020000000/mouse@2/ mouse /proc/device-tree/pci@800000020000000/multimedia-device@7/ multimedia-device /proc/device-tree/pci@800000020000000/scsi@3/ scsi /proc/device-tree/pci@800000020000000/usb-xhci@6/ usb-xhci
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
5c4537bd |
| 22-Nov-2016 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Fix 2.7<->2.8 migration of PCI host bridge
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into t
spapr: Fix 2.7<->2.8 migration of PCI host bridge
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into two pieces for 32-bit and 64-bit MMIO.
The patch included backwards compatibility code to convert the old property into the new format. However, the property value was also transferred in the migration stream and compared with a (probably unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to the new style converted value from (pre-)2.8 giving a mismatch and migration failure.
Along with the actual field that caused the breakage, there are several other ill-advised VMSTATE_EQUAL()s. To fix forwards migration, we read the values in the stream into scratch variables and ignore them, instead of comparing for equality. To fix backwards migration, we populate those scratch variables in pre_save() with adjusted values to match the old behaviour.
To permit the eventual possibility of removing this cruft from the stream, we only include these compatibility fields if a new 'pre-2.8-migration' property is set. We clear it on the pseries-2.8 machine type, which obviously can't be migrated backwards, but set it on earlier machine type versions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
show more ...
|
#
5a78b821 |
| 21-Nov-2016 |
David Gibson <david@gibson.dropbear.id.au> |
Revert "spapr: Fix migration of PCI host bridges from qemu-2.7"
This reverts commit 9b54ca0ba781012eeea4237b7c4832ba2ea81d89.
The commit above corrected a migration breakage between qemu-2.7 and qe
Revert "spapr: Fix migration of PCI host bridges from qemu-2.7"
This reverts commit 9b54ca0ba781012eeea4237b7c4832ba2ea81d89.
The commit above corrected a migration breakage between qemu-2.7 and qemu-2.8. However it did so by advancing the migration version for the PCI host bridge, which obviously breaks migration backwards to earlier qemu versions.
Although it's not totally essential, we'd like to maintain the possibility for backwards migration, so revert the change in preparation for a better fix.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
show more ...
|
#
9b54ca0b |
| 14-Nov-2016 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Fix migration of PCI host bridges from qemu-2.7
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window i
spapr: Fix migration of PCI host bridges from qemu-2.7
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into two pieces for 32-bit and 64-bit MMIO.
The patch included backwards compatibility code to convert the old property into the new format. However, the property value was also transferred in the migration stream and compared with a (probably unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to the new style converted value from (pre-)2.8 giving a mismatch and migration failure.
Although it would be technically possible to fix this in a way allowing backwards migration, that would leave an ugly legacy around indefinitely. This patch takes the simpler approach of bumping the migration version, dropping the unwise VMSTATE_EQUAL (and some equally unwise ones around it) and ignoring them on an incoming migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
show more ...
|
#
4bcfa56c |
| 18-Oct-2016 |
Michael Roth <mdroth@linux.vnet.ibm.com> |
spapr_pci: advertise explicit numa IDs even when there's 1 node
With the addition of "numa_node" properties for PHBs we began advertising NUMA affinity in cases where nb_numa_nodes > 1.
Since the d
spapr_pci: advertise explicit numa IDs even when there's 1 node
With the addition of "numa_node" properties for PHBs we began advertising NUMA affinity in cases where nb_numa_nodes > 1.
Since the default on the guest side is to make no assumptions about PHB NUMA affinity (defaulting to -1), there is still a valid use-case for explicitly defining a PHB's NUMA affinity even when there's just one node. In particular, some workloads make faulty assumptions about /sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of this property as a workaround even if there's just 1 PHB or NUMA node.
Enable this use-case by always advertising the PHB's NUMA affinity if "numa_node" has been explicitly set.
We could achieve this by relaxing the check to simply be nb_numa_nodes > 0, but even safer would be to check numa_info[nodeid].present explicitly, and to fail at start time for cases where it does not exist.
This has an additional affect of no longer advertising PHB NUMA affinity unconditionally if nb_numa_nodes > 1 and "numa_node" property is unset/-1, but since the default value on the guest side for each PHB is also -1, the behavior should be the same for that situation. We could still retain the old behavior if desired, but the decision seems arbitrary, so we take the simpler route.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
357d1e3b |
| 16-Oct-2016 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Improved placement of PCI host bridges in guest memory map
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) ha
spapr: Improved placement of PCI host bridges in guest memory map
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows.
This scheme as several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty.
This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use.
Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each.
This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice.
We also change some of the default window sizes for manually configured PHBs to saner values.
Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
show more ...
|
#
daa23699 |
| 11-Oct-2016 |
David Gibson <david@gibson.dropbear.id.au> |
spapr_pci: Add a 64-bit MMIO window
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to P
spapr_pci: Add a 64-bit MMIO window
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case)
The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G.
This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem.
This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address).
So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one.
For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
show more ...
|
#
6737d9ad |
| 12-Oct-2016 |
David Gibson <david@gibson.dropbear.id.au> |
spapr_pci: Delegate placement of PCI host bridges to machine type
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (bo
spapr_pci: Delegate placement of PCI host bridges to machine type
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain.
There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB.
The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however.
This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version.
So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do.
For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
show more ...
|