History log of /qemu/hw/ppc/spapr_pci.c (Results 226 – 250 of 401)
Revision Date Author Comments
# e806b4db 28-Jun-2017 Laurent Vivier <lvivier@redhat.com>

spapr: fix migration to pseries machine < 2.8

since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"),
some migration fields are forged from the new ones in spapr_pci_pre_save().

spapr: fix migration to pseries machine < 2.8

since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"),
some migration fields are forged from the new ones in spapr_pci_pre_save().

It works well, except when the number of MSI devices is 0,
because in this case the function exits immediately.

This fix moves the migration code before the exit code.

The problem can be reproduced with these commands:

source qemu-2.9:

qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults -S

destination qemu-2.6:

qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults \
-incoming tcp:0:4444

on the source:

migrate tcp:localhost:4444

Destination fails with the following error:

qemu-system-ppc64: error while loading state for
instance 0x0 of device 'spapr_pci'
qemu-system-ppc64: load of migration failed: Invalid argument

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# d2164ad3 23-Jun-2017 Halil Pasic <pasic@linux.vnet.ibm.com>

vmstate: error hint for failed equal checks

In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug,
but it's actually the best we can do. Especially in these cases a verbose
error m

vmstate: error hint for failed equal checks

In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug,
but it's actually the best we can do. Especially in these cases a verbose
error message is required.

Let's introduce infrastructure for specifying a error hint to be used if
equal check fails. Let's do this by adding a parameter to the _EQUAL
macros called _err_hint. Also change all current users to pass NULL as
last parameter so nothing changes for them.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>

Message-Id: <20170623144823.42936-1-pasic@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

show more ...


# 6304fd27 07-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Fold spapr_phb_{add,remove}_pci_device() into their only callers

Both functions are fairly short, and so are their callers. There's no
particular logical distinction between them, so fold th

spapr: Fold spapr_phb_{add,remove}_pci_device() into their only callers

Both functions are fairly short, and so are their callers. There's no
particular logical distinction between them, so fold them together.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>

show more ...


# 0be4e886 06-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Change DRC attach & detach methods to functions

DRC objects have attach & detach methods, but there's only one
implementation. Although there are some differences in its behaviour for
differ

spapr: Change DRC attach & detach methods to functions

DRC objects have attach & detach methods, but there's only one
implementation. Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.

So, replace them with direct function calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>

show more ...


# f224d35b 07-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Clean up DR entity sense handling

DRC classes have an entity_sense method to determine (in a specific PAPR
sense) the presence or absence of a device plugged into a DRC. However,
we only hav

spapr: Clean up DR entity sense handling

DRC classes have an entity_sense method to determine (in a specific PAPR
sense) the presence or absence of a device plugged into a DRC. However,
we only have one implementation of the method, which explicitly tests for
different DRC types. This changes it to instead have different method
implementations for the two cases: "logical" and "physical" DRCs.

While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS,
and the interesting value is returned via pass-by-reference. Simplify this
to directly return the value we care about

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>

show more ...


# fbf55397 04-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Clean up spapr_dr_connector_by_*()

* Change names to something less ludicrously verbose
* Now that we have QOM subclasses for the different DRC types, use a QOM
typename instead of a PAP

spapr: Clean up spapr_dr_connector_by_*()

* Change names to something less ludicrously verbose
* Now that we have QOM subclasses for the different DRC types, use a QOM
typename instead of a PAPR type value parameter

The latter allows removal of the get_type_shift() helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>

show more ...


# 2d335818 04-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Introduce DRC subclasses

Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our ex

spapr: Introduce DRC subclasses

Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.

So, instead create QOM subclasses for each PAPR defined DRC type. We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.

Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure. There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>

show more ...


# 0b55aa91 02-Jun-2017 David Gibson <david@gibson.dropbear.id.au>

spapr: Make DRC get_index and get_type methods into plain functions

These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verg

spapr: Make DRC get_index and get_type methods into plain functions

These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.

So replace them with simple functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>

show more ...


# 31834723 22-May-2017 Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>

hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque

The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
informatio

hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque

The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.

After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callbacks functions. This is
yet another information that is now unused and, on top of that, can't
be migrated either.

This patch makes the following changes:

- removal of detach_cb_opaque. the 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.

- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the apropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# e4f4fb1e 03-May-2017 Eduardo Habkost <ehabkost@redhat.com>

sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE

commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset
cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
sysbus devi

sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE

commit 33cd52b5d7b9adfd009e95f07e6c64dd88ae2a31 unset
cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
sysbus devices appear on "-device help" and lack the "no-user"
flag in "info qdm".

To fix this, we can set user_creatable=false by default on
TYPE_SYS_BUS_DEVICE, but this requires setting
user_creatable=true explicitly on the sysbus devices that
actually work with -device.

Fortunately today we have just a few has_dynamic_sysbus=1
machines: virt, pc-q35-*, ppce500, and spapr.

virt, ppce500, and spapr have extra checks to ensure just a few
device types can be instantiated:

* virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE.
* ppce500 supports only TYPE_ETSEC_COMMON.
* spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.

This patch sets user_creatable=true explicitly on those 4 device
classes.

Now, the more complex cases:

pc-q35-*: q35 has no sysbus device whitelist yet (which is a
separate bug). We are in the process of fixing it and building a
sysbus whitelist on q35, but in the meantime we can fix the
"-device help" and "info qdm" bugs mentioned above. Also, despite
not being strictly necessary for fixing the q35 bug, reducing the
list of user_creatable=true devices will help us be more
confident when building the q35 whitelist.

xen: We also have a hack at xen_set_dynamic_sysbus(), that sets
has_dynamic_sysbus=true at runtime when using the Xen
accelerator. This hack is only used to allow xen-backend devices
to be dynamically plugged/unplugged.

This means today we can use -device with the following 22 device
types, that are the ones compiled into the qemu-system-x86_64 and
qemu-system-i386 binaries:

* allwinner-ahci
* amd-iommu
* cfi.pflash01
* esp
* fw_cfg_io
* fw_cfg_mem
* generic-sdhci
* hpet
* intel-iommu
* ioapic
* isabus-bridge
* kvmclock
* kvm-ioapic
* kvmvapic
* SUNW,fdtwo
* sysbus-ahci
* sysbus-fdc
* sysbus-ohci
* unimplemented-device
* virtio-mmio
* xen-backend
* xen-sysdev

This patch adds user_creatable=true explicitly to those devices,
temporarily, just to keep 100% compatibility with existing
behavior of q35. Subsequent patches will remove
user_creatable=true from the devices that are really not meant to
user-creatable on any machine, and remove the FIXME comment from
the ones that are really supposed to be user-creatable. This is
being done in separate patches because we still don't have an
obvious list of devices that will be whitelisted by q35, and I
would like to get each device reviewed individually.

Cc: Alexander Graf <agraf@suse.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Beniamino Galvani <b.galvani@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Pierre Morel <pmorel@linux.vnet.ibm.com>
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-arm@nongnu.org
Cc: qemu-block@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Shannon Zhao <zhaoshenglong@huawei.com>
Cc: sstabellini@kernel.org
Cc: Thomas Huth <thuth@redhat.com>
Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[ehabkost: Small changes at sysbus_device_class_init() comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

show more ...


# c88fa6dd 29-Mar-2017 Alexey Kardashevskiy <aik@ozlabs.ru>

spapr_pci: Removed unused include

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>


# a01f3432 28-Mar-2017 Alexey Kardashevskiy <aik@ozlabs.ru>

spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask

If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
in-kernel acceleration of TCE handling won't be enabled an

spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask

If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
in-kernel acceleration of TCE handling won't be enabled and performance
might be slower than expected.

This prints a warning if system page size is not enabled. This should
print a warning if huge pages are enabled but sphb.pgsz still uses
the default value of 4K|64K.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 82516263 14-Mar-2017 David Gibson <david@gibson.dropbear.id.au>

pseries: Don't expose PCIe extended config space on older machine types

bb9986452 "spapr_pci: Advertise access to PCIe extended config space"
allowed guests to access the extended config space of PC

pseries: Don't expose PCIe extended config space on older machine types

bb9986452 "spapr_pci: Advertise access to PCIe extended config space"
allowed guests to access the extended config space of PCI Express devices
via the PAPR interfaces, even though the paravirtualized bus mostly acts
like plain PCI.

However, that patch enabled access unconditionally, including for existing
machine types, which is an unwise change in behaviour. This patch limits
the change to pseries-2.9 (and later) machine types.

Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# bb998645 01-Mar-2017 David Gibson <david@gibson.dropbear.id.au>

spapr_pci: Advertise access to PCIe extended config space

The (paravirtual) PCI host bridge on the 'pseries' machine in most
regards acts like a regular PCI bus, rather than a PCIe bus. Despite
thi

spapr_pci: Advertise access to PCIe extended config space

The (paravirtual) PCI host bridge on the 'pseries' machine in most
regards acts like a regular PCI bus, rather than a PCIe bus. Despite
this, though, it does allow access to the PCIe extended config space.

We already implemented the RTAS methods to allow this access.. but
forgot to put the markers into the device tree so that guest's know it
is there. This adds them in.

With this, a pseries guest is able to view extended config space on
(for example an e1000e device. This should be enough to allow guests
to use at least some PCIe devices.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# f7759e43 27-Feb-2017 Cédric Le Goater <clg@kaod.org>

ppc/xics: use the QOM interface to get irqs

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.

ppc/xics: use the QOM interface to get irqs

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 681bfade 27-Feb-2017 Cédric Le Goater <clg@kaod.org>

ppc/xics: store the ICS object under the sPAPR machine

A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as th

ppc/xics: store the ICS object under the sPAPR machine

A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as there is only a single ICS. To simplify the code, let's add the ICS
pointer under the sPAPR machine and try to reduce the use of this list
where possible.

Also, change the xics_spapr_*() routines to use an ICS object instead
of an XICSState and change their name to reflect that these are
specific to the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# a8eeafda 22-Feb-2017 Greg Kurz <gkurz@linux.vnet.ibm.com>

spapr/pci: populate PCI DT in reverse order

Since commit 1d2d974244c6 "spapr_pci: enumerate and add PCI device tree", QEMU
populates the PCI device tree in the opposite order compared to SLOF.

Befo

spapr/pci: populate PCI DT in reverse order

Since commit 1d2d974244c6 "spapr_pci: enumerate and add PCI device tree", QEMU
populates the PCI device tree in the opposite order compared to SLOF.

Before 1d2d974244c6:

Populating /pci@800000020000000
00 0000 (D) : 1af4 1000 virtio [ net ]
00 0800 (D) : 1af4 1001 virtio [ block ]
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2

7e5294b8 : /pci@800000020000000
7e52b998 : |-- ethernet@0
7e52c0c8 : |-- scsi@1
7e52c7e8 : +-- unknown-legacy-device@2 ok

Since 1d2d974244c6:

Populating /pci@800000020000000
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2
00 0800 (D) : 1af4 1001 virtio [ block ]
00 0000 (D) : 1af4 1000 virtio [ net ]

7e5e8118 : /pci@800000020000000
7e5ea6a0 : |-- unknown-legacy-device@2
7e5eadb8 : |-- scsi@1
7e5eb4d8 : +-- ethernet@0 ok

This behaviour change is not actually a bug since no assumptions should be
made on DT ordering. But it has no real justification either, other than
being the consequence of the way fdt_add_subnode() inserts new elements
to the front of the FDT rather than adding them to the tail.

This patch reverts to the historical SLOF ordering by walking PCI devices
in reverse order. This reconciles pseries with x86 machine types behavior.
It is expected to make things easier when porting existing applications to
power.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
(slight update to the changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 2530a1a5 17-Feb-2017 Laurent Vivier <lvivier@redhat.com>

spapr: generate DT node names

When DT node names for PCI devices are generated by SLOF,
they are generated according to the type of the device
(for instance, ethernet for virtio-net-pci device).

No

spapr: generate DT node names

When DT node names for PCI devices are generated by SLOF,
they are generated according to the type of the device
(for instance, ethernet for virtio-net-pci device).

Node name for hotplugged devices is generated by QEMU.
This patch adds the mechanic to QEMU to create the node
name according to the device type too.

The data structure has been roughly copied from OpenBIOS/OpenHackware,
node names from SLOF.

Example:

Hotplugging some PCI cards with QEMU monitor:

device_add virtio-tablet-pci
device_add virtio-serial-pci
device_add virtio-mouse-pci
device_add virtio-scsi-pci
device_add virtio-gpu-pci
device_add ne2k_pci
device_add nec-usb-xhci
device_add intel-hda

What we can see in linux device tree:

for dir in /proc/device-tree/pci@800000020000000/*@*/; do
echo $dir
cat $dir/name
echo
done

WITHOUT this patch:

/proc/device-tree/pci@800000020000000/pci@0/
pci
/proc/device-tree/pci@800000020000000/pci@1/
pci
/proc/device-tree/pci@800000020000000/pci@2/
pci
/proc/device-tree/pci@800000020000000/pci@3/
pci
/proc/device-tree/pci@800000020000000/pci@4/
pci
/proc/device-tree/pci@800000020000000/pci@5/
pci
/proc/device-tree/pci@800000020000000/pci@6/
pci
/proc/device-tree/pci@800000020000000/pci@7/
pci

WITH this patch:

/proc/device-tree/pci@800000020000000/communication-controller@1/
communication-controller
/proc/device-tree/pci@800000020000000/display@4/
display
/proc/device-tree/pci@800000020000000/ethernet@5/
ethernet
/proc/device-tree/pci@800000020000000/input-controller@0/
input-controller
/proc/device-tree/pci@800000020000000/mouse@2/
mouse
/proc/device-tree/pci@800000020000000/multimedia-device@7/
multimedia-device
/proc/device-tree/pci@800000020000000/scsi@3/
scsi
/proc/device-tree/pci@800000020000000/usb-xhci@6/
usb-xhci

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 5c4537bd 22-Nov-2016 David Gibson <david@gibson.dropbear.id.au>

spapr: Fix 2.7<->2.8 migration of PCI host bridge

daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
from qemu-2.7 to the current version. It split the device's MMIO
window into t

spapr: Fix 2.7<->2.8 migration of PCI host bridge

daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
from qemu-2.7 to the current version. It split the device's MMIO
window into two pieces for 32-bit and 64-bit MMIO.

The patch included backwards compatibility code to convert the old
property into the new format. However, the property value was also
transferred in the migration stream and compared with a (probably
unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to
the new style converted value from (pre-)2.8 giving a mismatch and
migration failure.

Along with the actual field that caused the breakage, there are
several other ill-advised VMSTATE_EQUAL()s. To fix forwards
migration, we read the values in the stream into scratch variables and
ignore them, instead of comparing for equality. To fix backwards
migration, we populate those scratch variables in pre_save() with
adjusted values to match the old behaviour.

To permit the eventual possibility of removing this cruft from the
stream, we only include these compatibility fields if a new
'pre-2.8-migration' property is set. We clear it on the pseries-2.8
machine type, which obviously can't be migrated backwards, but set it
on earlier machine type versions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

show more ...


# 5a78b821 21-Nov-2016 David Gibson <david@gibson.dropbear.id.au>

Revert "spapr: Fix migration of PCI host bridges from qemu-2.7"

This reverts commit 9b54ca0ba781012eeea4237b7c4832ba2ea81d89.

The commit above corrected a migration breakage between qemu-2.7 and
qe

Revert "spapr: Fix migration of PCI host bridges from qemu-2.7"

This reverts commit 9b54ca0ba781012eeea4237b7c4832ba2ea81d89.

The commit above corrected a migration breakage between qemu-2.7 and
qemu-2.8. However it did so by advancing the migration version for
the PCI host bridge, which obviously breaks migration backwards to
earlier qemu versions.

Although it's not totally essential, we'd like to maintain the
possibility for backwards migration, so revert the change in
preparation for a better fix.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

show more ...


# 9b54ca0b 14-Nov-2016 David Gibson <david@gibson.dropbear.id.au>

spapr: Fix migration of PCI host bridges from qemu-2.7

daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from
qemu-2.7 to the current version. It split the device's MMIO window i

spapr: Fix migration of PCI host bridges from qemu-2.7

daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from
qemu-2.7 to the current version. It split the device's MMIO window into
two pieces for 32-bit and 64-bit MMIO.

The patch included backwards compatibility code to convert the old property
into the new format. However, the property value was also transferred in
the migration stream and compared with a (probably unwise) VMSTATE_EQUAL.
So, the "raw" value from 2.7 is compared to the new style converted value
from (pre-)2.8 giving a mismatch and migration failure.

Although it would be technically possible to fix this in a way allowing
backwards migration, that would leave an ugly legacy around indefinitely.
This patch takes the simpler approach of bumping the migration version,
dropping the unwise VMSTATE_EQUAL (and some equally unwise ones around it)
and ignoring them on an incoming migration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

show more ...


# 4bcfa56c 18-Oct-2016 Michael Roth <mdroth@linux.vnet.ibm.com>

spapr_pci: advertise explicit numa IDs even when there's 1 node

With the addition of "numa_node" properties for PHBs we began
advertising NUMA affinity in cases where nb_numa_nodes > 1.

Since the d

spapr_pci: advertise explicit numa IDs even when there's 1 node

With the addition of "numa_node" properties for PHBs we began
advertising NUMA affinity in cases where nb_numa_nodes > 1.

Since the default on the guest side is to make no assumptions about
PHB NUMA affinity (defaulting to -1), there is still a valid use-case
for explicitly defining a PHB's NUMA affinity even when there's just
one node. In particular, some workloads make faulty assumptions about
/sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of
this property as a workaround even if there's just 1 PHB or NUMA
node.

Enable this use-case by always advertising the PHB's NUMA affinity
if "numa_node" has been explicitly set.

We could achieve this by relaxing the check to simply be
nb_numa_nodes > 0, but even safer would be to check
numa_info[nodeid].present explicitly, and to fail at start time
for cases where it does not exist.

This has an additional affect of no longer advertising PHB NUMA
affinity unconditionally if nb_numa_nodes > 1 and "numa_node"
property is unset/-1, but since the default value on the guest
side for each PHB is also -1, the behavior should be the same for
that situation. We could still retain the old behavior if desired,
but the decision seems arbitrary, so we take the simpler route.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 357d1e3b 16-Oct-2016 David Gibson <david@gibson.dropbear.id.au>

spapr: Improved placement of PCI host bridges in guest memory map

Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space. Each PCI host bridge (PHB) ha

spapr: Improved placement of PCI host bridges in guest memory map

Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme as several problems:
- It limits guest RAM to 1 TiB (though we have a limited fix for this
now)
- It limits the total MMIO window to 64 GiB. This is not always enough
for some of the large nVidia GPGPU cards
- Putting all the windows into a single 64 GiB area means that naturally
aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB. We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the
PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that it correctly uses the new
default locations. Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>

show more ...


# daa23699 11-Oct-2016 David Gibson <david@gibson.dropbear.id.au>

spapr_pci: Add a 64-bit MMIO window

On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to P

spapr_pci: Add a 64-bit MMIO window

On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
- A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
- A 64-bit window which maps onto a large region somewhere high in PCI
address space (traditionally this used an identity mapping from guest
physical address to PCI address, but that's not always the case)

The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however. At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.

This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window. With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.

This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured. The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).

So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified. This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.

For now, this only adds the possibility of 64-bit windows. The default
configuration still uses the legacy mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>

show more ...


# 6737d9ad 12-Oct-2016 David Gibson <david@gibson.dropbear.id.au>

spapr_pci: Delegate placement of PCI host bridges to machine type

The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest. Unlike on x86, it's routine on Power (bo

spapr_pci: Delegate placement of PCI host bridges to machine type

The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.

There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows. This gives the most control, but is very awkward with 6
mandatory parameters. Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.

The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO). Obviously, for migration we can only change the
locations on a new machine type, however.

This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.

So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass. It's still a bit ugly, but it's about the best we
can do.

For now, this just changes where the calculation is done. It doesn't
change the actual location of the host bridges, or any other behaviour.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>

show more ...


12345678910>>...17