History log of /kvmtool/vfio/pci.c (Results 1 – 25 of 30)
Revision Date Author Comments
# 0b5e55fc 28-Jun-2023 Jean-Philippe Brucker <jean-philippe@linaro.org>

vfio/pci: Clarify the MSI states

The MSI and MSI-X implementations are a bit complex, because they keep
track of capability and vector states as seen by both the guest and the
host. Add a few comments about those states and rename them to something
more accurate.

What's called phys_state at the moment represents the software state
maintained by VFIO and kvmtool, rather than the physical MSI capability,
so host_state is more correct. To be consistent, rename virt_state to
guest_state as well.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>



# 3a36d341 28-Jun-2023 Jean-Philippe Brucker <jean-philippe@linaro.org>

vfio/pci: Initialize MSI vectors unmasked

MSI vectors can be masked and unmasked individually when using the MSI-X
capability, or when the classic MSI capability supports Per-Vector
Masking. At the moment we incorrectly initialize the guest's view of the
vectors (virt_state) as masked, so when using an MSI capability without
Per-Vector Masking, the vectors are never unmasked and MSIs don't work.
Initialize them unmasked instead.

Since VFIO doesn't support per-vector masking, we implement it by
disconnecting the irqfd, and keep track of it with the vector's
phys_state. Initially the irqfd is not connected so phys_state is
masked.

Reported-by: Vivek Gautam <vivek.gautam@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
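The two views described above can be sketched as follows. This is a minimal illustration, assuming a single MASKED flag per state word; struct msi_vector, MSI_STATE_MASKED, msi_vector_init() and msi_vector_sync() are hypothetical names, not kvmtool's actual identifiers, and the irqfd (dis)connection is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: set when a vector is masked. */
#define MSI_STATE_MASKED (1u << 0)

/* Hypothetical per-vector bookkeeping:
 * guest_state: the mask bit as programmed by the guest
 * host_state:  whether the irqfd is connected (masked == disconnected) */
struct msi_vector {
	unsigned int guest_state;
	unsigned int host_state;
};

static bool msi_is_masked(unsigned int state)
{
	return state & MSI_STATE_MASKED;
}

static void msi_set_masked(unsigned int *state, bool masked)
{
	if (masked)
		*state |= MSI_STATE_MASKED;
	else
		*state &= ~MSI_STATE_MASKED;
}

/* The guest view starts unmasked, so a classic MSI capability without
 * Per-Vector Masking (which never unmasks explicitly) still works.
 * The host view starts masked because no irqfd is connected yet. */
static void msi_vector_init(struct msi_vector *vec)
{
	vec->guest_state = 0;
	vec->host_state = MSI_STATE_MASKED;
}

/* Emulate per-vector masking by (dis)connecting the irqfd.
 * Returns true if the host state had to change. */
static bool msi_vector_sync(struct msi_vector *vec)
{
	assert(vec);
	bool want_masked = msi_is_masked(vec->guest_state);

	if (want_masked == msi_is_masked(vec->host_state))
		return false;	/* already in sync: nothing to do */
	/* Real code would connect or disconnect the irqfd here. */
	msi_set_masked(&vec->host_state, want_masked);
	return true;
}
```

After initialization the first sync connects the irqfd, which is exactly the step that never happened when the guest view started out masked.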



# 39181fc6 12-Oct-2021 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Align MSIX Table and PBA size to guest maximum page size

When allocating MMIO space for the MSI-X table, kvmtool rounds the
allocation to the host's page size to make it as easy as possible for the
guest to map the table to a page, if it wants to (and doesn't do BAR
reassignment, like the x86 architecture for example). However, the host's
page size can differ from the guest's on architectures which support
multiple page sizes. For example, arm64 supports three different page sizes,
and it is possible for the host to be using 4k pages, while the guest is
using 64k pages.

To make sure the allocation is always aligned to a guest's page size, round
it up to the maximum architectural page size. Do the same for the pending
bit array if it lives in its own BAR.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
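Rounding to the largest architectural page size rather than the host's can be sketched as below. MAX_GUEST_PAGE_SIZE, align_up() and msix_table_mmio_size() are hypothetical names; the 64K value assumes arm64, whose largest page size is 64K.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64K is the largest page size a guest may use
 * (arm64 allows 4K, 16K and 64K pages). */
#define MAX_GUEST_PAGE_SIZE ((size_t)64 * 1024)

/* Each MSI-X table entry is 16 bytes. */
#define MSIX_ENTRY_SIZE 16

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* MMIO space reserved for an MSI-X table with nr_entries vectors, rounded
 * so the guest can give the table a page of its own, whatever its page size. */
static size_t msix_table_mmio_size(unsigned int nr_entries)
{
	return align_up((size_t)nr_entries * MSIX_ENTRY_SIZE, MAX_GUEST_PAGE_SIZE);
}
```

With 4 vectors the table itself is only 64 bytes, but the 64K reservation means a 64K-page guest can map it without the page also covering another device's registers.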



# b20d6e30 12-Oct-2021 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Print an error when offset is outside of the MSIX table or PBA

Now that we keep track of the real size of the MSIX table and PBA, print an
error when the guest tries to write to an offset which is not inside the
correct regions.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
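A bounds check of this kind might look like the sketch below. struct mmio_region and region_offset() are hypothetical; the point is to compare the access against the real structure size and to order the comparisons so no subtraction can wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of an emulated region (MSI-X table or PBA). */
struct mmio_region {
	const char *name;
	uint64_t guest_phys_addr;
	uint64_t size;		/* real size of the structure */
};

/* If [addr, addr + len) lies entirely inside the region, store the offset
 * and return true; otherwise print an error and return false. */
static bool region_offset(const struct mmio_region *r, uint64_t addr,
			  uint32_t len, uint64_t *offset)
{
	if (addr < r->guest_phys_addr || len > r->size ||
	    addr - r->guest_phys_addr > r->size - len) {
		fprintf(stderr, "%s: access at 0x%llx (len %u) out of bounds\n",
			r->name, (unsigned long long)addr, (unsigned int)len);
		return false;
	}
	*offset = addr - r->guest_phys_addr;
	return true;
}
```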



# f93acc04 12-Oct-2021 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Rework MSIX table and PBA physical size allocation

When creating the MSIX table and PBA, kvmtool rounds up the table and
pending bit array sizes to the host's page size. Unfortunately, when doing
that, it doesn't take into account that the new size can exceed the device
BAR size, leading to hard-to-diagnose errors for certain configurations.

One theoretical example: PBA and table in the same 4k BAR, host's page size
is 4k. In this case, table->size = 4k, pba->size = 4k, map_size = 4k, which
means that pba->guest_phys_addr = table->guest_phys_addr + 4k, which is
outside of the 4k MMIO range allocated for both structures.

Another example, this time a real-world error that I encountered, happens
with a 64k host booting a 4k guest that has an RTL8168 PCIe NIC assigned
to it. In this case, kvmtool sets table->size = 64k (because it's rounded
to the host's page size) and pba->size = 64k.

Truncated output of lspci -vv on the host:

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 255
Region 0: I/O ports at 1000 [size=256]
Region 2: Memory at 40000000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
[..]
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
[..]

When booting the guest:

[..]
[ 0.207444] pci-host-generic 40000000.pci: host bridge /pci ranges:
[ 0.208564] pci-host-generic 40000000.pci: IO 0x0000000000..0x000000ffff -> 0x0000000000
[ 0.209857] pci-host-generic 40000000.pci: MEM 0x0050000000..0x007fffffff -> 0x0050000000
[ 0.211184] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x4fffffff] for [bus 00]
[ 0.212625] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
[ 0.213647] pci_bus 0000:00: root bus resource [bus 00]
[ 0.214429] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.215355] pci_bus 0000:00: root bus resource [mem 0x50000000-0x7fffffff]
[ 0.216676] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
[ 0.223771] pci 0000:00:00.0: reg 0x10: [io 0x6200-0x62ff]
[ 0.239765] pci 0000:00:00.0: reg 0x18: [mem 0x50010000-0x50010fff]
[ 0.244595] pci 0000:00:00.0: reg 0x20: [mem 0x50000000-0x50003fff]
[ 0.246331] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
[ 0.247278] pci 0000:00:01.0: reg 0x10: [io 0x6300-0x63ff]
[ 0.248212] pci 0000:00:01.0: reg 0x14: [mem 0x50020000-0x500200ff]
[ 0.249172] pci 0000:00:01.0: reg 0x18: [mem 0x50020400-0x500207ff]
[ 0.250450] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
[ 0.251392] pci 0000:00:02.0: reg 0x10: [io 0x6400-0x64ff]
[ 0.252351] pci 0000:00:02.0: reg 0x14: [mem 0x50020800-0x500208ff]
[ 0.253312] pci 0000:00:02.0: reg 0x18: [mem 0x50020c00-0x50020fff]
[ 0.254760] pci 0000:00:00.0: BAR 4: assigned [mem 0x50000000-0x50003fff] (1)
[ 0.255805] pci 0000:00:00.0: BAR 2: assigned [mem 0x50004000-0x50004fff] (2)
Warning: [10ec:8168] Error activating emulation for BAR 2
Warning: [10ec:8168] Error activating emulation for BAR 2
[ 0.260432] pci 0000:00:01.0: BAR 2: assigned [mem 0x50005000-0x500053ff]
Warning: [1af4:1000] Error activating emulation for BAR 2
Warning: [1af4:1000] Error activating emulation for BAR 2
[ 0.261469] pci 0000:00:02.0: BAR 2: assigned [mem 0x50005400-0x500057ff]
Warning: [1af4:1001] Error activating emulation for BAR 2
Warning: [1af4:1001] Error activating emulation for BAR 2
[ 0.262499] pci 0000:00:00.0: BAR 0: assigned [io 0x1000-0x10ff]
[ 0.263415] pci 0000:00:01.0: BAR 0: assigned [io 0x1100-0x11ff]
[ 0.264462] pci 0000:00:01.0: BAR 1: assigned [mem 0x50005800-0x500058ff]
Warning: [1af4:1000] Error activating emulation for BAR 1
Warning: [1af4:1000] Error activating emulation for BAR 1
[ 0.265481] pci 0000:00:02.0: BAR 0: assigned [io 0x1200-0x12ff]
[ 0.266397] pci 0000:00:02.0: BAR 1: assigned [mem 0x50005900-0x500059ff]
Warning: [1af4:1001] Error activating emulation for BAR 1
Warning: [1af4:1001] Error activating emulation for BAR 1
[ 0.267892] EINJ: ACPI disabled.
[ 0.269922] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
[ 0.271118] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
[ 0.274122] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.275930] printk: console [ttyS0] disabled
[ 0.276669] 1000000.U6_16550A: ttyS0 at MMIO 0x1000000 (irq = 13, base_baud = 115200) is a 16550A
[ 0.278058] printk: console [ttyS0] enabled
[ 0.278058] printk: console [ttyS0] enabled
[ 0.279304] printk: bootconsole [ns16550a0] disabled
[ 0.279304] printk: bootconsole [ns16550a0] disabled
[ 0.281252] 1001000.U6_16550A: ttyS1 at MMIO 0x1001000 (irq = 14, base_baud = 115200) is a 16550A
[ 0.282842] 1002000.U6_16550A: ttyS2 at MMIO 0x1002000 (irq = 15, base_baud = 115200) is a 16550A
[ 0.284611] 1003000.U6_16550A: ttyS3 at MMIO 0x1003000 (irq = 16, base_baud = 115200) is a 16550A
[ 0.286094] SuperH (H)SCI(F) driver initialized
[ 0.286868] msm_serial: driver initialized
[ 0.287890] [drm] radeon kernel modesetting enabled.
[ 0.288826] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 0.293321] loop: module loaded
KVM_SET_GSI_ROUTING: Invalid argument

At (1), the guest writes 0x50000000 into BAR 4 of the NIC (which holds
the MSIX table and PBA), expecting that it will cover only 16k of address
space (the BAR size), up to 0x50003fff, inclusive. On the host side, in
vfio_pci_bar_activate(), kvmtool will actually register for MMIO
emulation the region 0x50000000-0x5000ffff (64k in total) for the MSIX
table and 0x50010000-0x5001ffff (another 64k) for the PBA (kvmtool set
table->size and pba->size to 64k when it aligned them to the host's page
size).

Then at step (2), the guest writes the next available address (from its
point of view) into BAR 2 of the NIC, which is 0x50004000. On the host
side, the PCI emulation layer will search all the regions that overlap with
the BAR address range (0x50004000-0x50004fff) and will find none because,
just like the guest, it uses the BAR size to check for overlaps. When
vfio_pci_bar_activate() is reached, kvmtool will try to register memory for
this region, but the range is already registered for the MSIX table
emulation, so the registration fails.

The same scenario repeats for every following memory BAR, because the MSIX
table and PBA use memory from 0x50000000 to 0x5001ffff.

The error at the end, which finally terminates the VM, is caused by the
guest trying to write to a totally different BAR, which vfio-pci
interprets as a write to the MSI-X table because it falls in the 64k region
that was registered for emulation. The IRQ ID is not a valid SPI number and
gicv2m_update_routing() returns an error (and sets errno to EINVAL).

Fix this by aligning the table and PBA size to 8 bytes to allow for
qword accesses, like PCI 3.0 mandates.

For the sake of simplicity, the PBA offset in a BAR, in case of a shared
BAR, is kept the same as the offset of the physical device. One hopes that
the device respects the recommendations set forth in PCI LOCAL BUS
SPECIFICATION, REV. 3.0, section "MSI-X Capability and Table Structures".

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
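The size computation described above can be sketched with a few hypothetical helpers (not kvmtool's code), assuming 16-byte MSI-X table entries and one PBA bit per vector, each rounded up to whole qwords. fits_in_bar() replays the layouts discussed in the message: the RTL8168 table at offset 0 and PBA at 0x800 of a 16K BAR fit, while the old page-rounded PBA at +64K does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define ALIGN_UP(x, a)     (DIV_ROUND_UP(x, a) * (a))

/* MSI-X table: 16 bytes per entry, padded to a qword boundary. */
static uint64_t msix_table_size(unsigned int nr_entries)
{
	return ALIGN_UP((uint64_t)nr_entries * 16, 8);
}

/* PBA: one pending bit per entry, padded to whole qwords. */
static uint64_t msix_pba_size(unsigned int nr_entries)
{
	return ALIGN_UP(DIV_ROUND_UP((uint64_t)nr_entries, 8), 8);
}

/* Does [offset, offset + size) fit inside a BAR of bar_size bytes? */
static bool fits_in_bar(uint64_t offset, uint64_t size, uint64_t bar_size)
{
	return offset <= bar_size && size <= bar_size - offset;
}
```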



# 5f44d5d6 12-Oct-2021 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Rename PBA offset in device descriptor to fd_offset

The MSI-X capability defines a PBA offset, which is the offset of the PBA
array in the BAR that holds the array.

kvmtool uses the field "pba_offset" in struct msix_cap (which represents
the MSIX capability) to refer to the [PBA offset:BAR] field of the
capability; and the field "offset" in the struct vfio_pci_msix_pba to refer
to the offset of the PBA array in the device descriptor created by the VFIO
driver.

As we're getting ready to add yet another field that represents an offset
to struct vfio_pci_msix_pba, try to avoid ambiguities by renaming the
struct's "offset" field to "fd_offset".

No functional change intended.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# 3d3dca07 12-Oct-2021 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci.c: Remove double include for assert.h

assert.h is included twice, keep only one instance.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# 25c1dc6c 13-Jul-2021 Alexandru Elisei <alexandru.elisei@arm.com>

arm/arm64: vfio: Add PCI Express Capability Structure

It turns out that some Linux drivers (like Realtek R8169) fall back to a
device-specific configuration method if the device is not PCI Express
capable:

[ 1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI

Add the PCI Express Capability Structure and populate it for assigned
devices, as this is how the Linux PCI driver determines if a device is PCI
Express capable.

Because we don't emulate a PCI Express link, a root complex or any slot
related properties, the PCI Express capability is kept as small as possible
by ignoring those fields.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210713170631.155595-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# e69b7663 13-Jul-2021 Alexandru Elisei <alexandru.elisei@arm.com>

arm/arm64: Add PCI Express 1.1 support

PCI Express comes with an extended addressing scheme, which directly
translates into a bigger device configuration space (256->4096 bytes)
and bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management [1] and PCI Express capability [2]).

However, our virtio PCI implementation implements version 0.9 of the
protocol and it still uses transitional PCI device IDs, so we have
opted to omit the mandatory PCI Express capabilities. For VFIO, the power
management and PCI Express capabilities are left for a subsequent patch.

[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20210713170631.155595-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
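The configuration space sizes quoted above follow from how bus, device, function and register numbers are packed into a memory-mapped configuration address. A sketch, assuming the standard PCI Express ECAM layout (4096 bytes per function) and, for the legacy case, the equivalent layout with 256 bytes per function; ecam_offset() and cam_offset() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* ECAM: 256 buses x 32 devices x 8 functions x 4096 bytes = 256 MB. */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	return ((uint64_t)bus << 20) | ((uint64_t)(dev & 0x1f) << 15) |
	       ((uint64_t)(fn & 0x7) << 12) | (reg & 0xfff);
}

/* Legacy layout: 256 buses x 32 devices x 8 functions x 256 bytes = 16 MB. */
static uint64_t cam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
	return ((uint64_t)bus << 16) | ((uint64_t)(dev & 0x1f) << 11) |
	       ((uint64_t)(fn & 0x7) << 8) | reg;
}
```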



# 465edc9d 14-May-2020 Alexandru Elisei <alexandru.elisei@arm.com>

pci: Implement reassignable BARs

BARs are used by the guest to configure the access to the PCI device by
writing the address to which the device will respond. The basic idea for
adding support for reassignable BARs is straightforward: deactivate
emulation for the memory region described by the old BAR value, and
activate emulation for the new region.

BAR reassignment can be done while device access is enabled and memory
regions for different devices can overlap as long as no access is made to
the overlapping memory regions. This means that it is legal for the BARs of
two distinct devices to point to an overlapping memory region, and indeed,
this is how Linux does resource assignment at boot. To account for this
situation, the simple algorithm described above is enhanced to scan for all
devices and:

- Deactivate emulation for any BARs that might overlap with the new BAR
value.

- Enable emulation for any BARs that were overlapping with the old value
after the BAR has been updated.

Activating/deactivating emulation of a memory region has side effects. In
order to prevent the execution of the same callback twice we now keep track
of the state of the region emulation. For example, this can happen if we
program a BAR with an address that overlaps a second BAR, thus deactivating
emulation for the second BAR, and then we disable all region accesses to
the second BAR by writing to the command register.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-11-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
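The two-step algorithm and the state tracking above can be sketched as follows. This is a simplified model with hypothetical names (struct bar, bar_set_active(), bar_reassign()); it ignores the command register and any overlaps between third-party BARs, and it only counts callbacks instead of registering MMIO emulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory BAR and its emulation state. */
struct bar {
	uint64_t start;
	uint64_t size;
	bool active;	/* is MMIO emulation currently registered? */
};

/* Counts the side-effecting (de)activation callbacks actually run. */
static unsigned int nr_callbacks;

/* Half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(uint64_t a, uint64_t asz, uint64_t b, uint64_t bsz)
{
	return a < b + bsz && b < a + asz;
}

/* Track the state so the same callback is never executed twice. */
static void bar_set_active(struct bar *b, bool active)
{
	if (b->active == active)
		return;
	b->active = active;
	nr_callbacks++;	/* real code would (un)register MMIO emulation */
}

/* Move bars[idx] to new_start following the two steps described above. */
static void bar_reassign(struct bar *bars, size_t n, size_t idx,
			 uint64_t new_start)
{
	uint64_t old_start = bars[idx].start, size = bars[idx].size;
	size_t i;

	bar_set_active(&bars[idx], false);
	/* Deactivate any other BAR that overlaps the new value. */
	for (i = 0; i < n; i++)
		if (i != idx && ranges_overlap(bars[i].start, bars[i].size,
					       new_start, size))
			bar_set_active(&bars[i], false);

	bars[idx].start = new_start;
	bar_set_active(&bars[idx], true);

	/* Re-enable BARs that overlapped the old value but not the new one. */
	for (i = 0; i < n; i++)
		if (i != idx &&
		    ranges_overlap(bars[i].start, bars[i].size, old_start, size) &&
		    !ranges_overlap(bars[i].start, bars[i].size, new_start, size))
			bar_set_active(&bars[i], true);
}
```

Because bar_set_active() returns early when the state already matches, deactivating a BAR that an overlap has already switched off does not run the callback a second time.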



# 5a8e4f25 14-May-2020 Alexandru Elisei <alexandru.elisei@arm.com>

pci: Implement callbacks for toggling BAR emulation

Implement callbacks for activating and deactivating emulation for a BAR
region. This is in preparation for allowing a guest operating system to
enable and disable access to I/O or memory space, or to reassign the
BARs.

The emulated vesa device framebuffer isn't designed to allow stopping and
restarting at arbitrary points in the guest execution. Furthermore, on x86,
the kernel will not change the BAR addresses, which on bare metal are
programmed by the firmware, so take the easy way out and refuse to
activate/deactivate emulation for the BAR regions. We also take this
opportunity to make the vesa emulation code more consistent by moving all
static variable definitions to one place, at the top of the file.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-9-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# e1d0285c 14-May-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Don't write configuration value twice

After writing to the device fd as part of the PCI configuration space
emulation, we read back from the device to make sure that the write
finished. The value is read back into the PCI configuration space and
afterwards, the same value is copied by the PCI emulation code. Let's
read from the device fd into a temporary variable, to prevent this
double write.

The double write is harmless in itself. But when we implement
reassignable BARs, we need to keep track of the old BAR value, and the
VFIO code overwrites it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-7-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# a05e576f 14-May-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio: Reserve ioports when configuring the BAR

Let's be consistent and reserve ioports when we are configuring the BAR,
not when we map it, just like we do with mmio regions.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-5-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>



# c0c45eed 24-Apr-2020 Andre Przywara <andre.przywara@arm.com>

pci: Move legacy IRQ assignment into devices

So far the (legacy) IRQ line for a PCI device is allocated in devices.c,
which should actually not take care of that. Since we allocate all other
device specific resources in the actual device emulation code, the IRQ
should not be something special.

Remove the PCI specific code from devices.c, and move the IRQ line
allocation to the PCI code.
This drops the IRQ line from the VESA device, since it does not use one.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>



# e554aefd 24-Apr-2020 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

vfio: fix multi-MSI vector handling

A PCI device with an MSI capability enabling Multiple MSI messages
(through the Multiple Message Enable field in the Message Control
register[6:4]) is expected to drive the Message Data lower bits (the
number of bits is determined by the number of enabled vectors) to
generate the corresponding MSI message writes on the PCI bus.

Therefore, KVM expects kvmtool to set up the MSI data lower bits (the
number of bits depends on bits [6:4] of the Message Control register,
which in turn control the number of allocated vectors) while programming
the MSI IRQ routing entries. This ensures that KVM can demultiplex the
MSI entries and set up the IRQ routes accordingly, so that when the
hardware actually fires an interrupt, KVM can route it to the correct
entry in the interrupt controller (and set up a correct passthrough
route for directly injected interrupts).

The current kvmtool code does not set up the Message Data entries
correctly for multi-MSI vectors: the data field is left as programmed
in the MSI capability by the guest for all vector entries, which causes
IRQ misrouting.

Fix it.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
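The fix boils down to deriving a distinct Message Data value for each vector. A sketch, assuming vector i replaces the low log2(nvec) bits of the guest-programmed data with i; msi_nr_vectors() and msi_vector_data() are hypothetical helpers, not kvmtool's functions. Note that the spec only defines Multiple Message Enable values up to 0b101 (32 vectors).

```c
#include <assert.h>
#include <stdint.h>

/* Multiple Message Enable lives in Message Control bits [6:4] and encodes
 * log2 of the number of vectors the device may use. */
static unsigned int msi_nr_vectors(uint16_t msg_ctrl)
{
	return 1u << ((msg_ctrl >> 4) & 0x7);
}

/* The device signals vector i by driving the low log2(nvec) bits of the
 * Message Data, so routing entry i must use this data value. */
static uint32_t msi_vector_data(uint32_t guest_data, unsigned int nvec,
				unsigned int i)
{
	uint32_t low_mask = (uint32_t)(nvec - 1);

	return (guest_data & ~low_mask) | (i & low_mask);
}
```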



# 3665392a 14-Apr-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Don't access unallocated regions

Don't try to configure a BAR if there is no region associated with it.

Also move the variable declarations from inside the loop to the start of
the function for consistency.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>



# 5b7fef16 14-Apr-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Ignore expansion ROM BAR writes

To get the size of the expansion ROM, software writes 0xfffff800 to the
expansion ROM BAR in the PCI configuration space. PCI emulation executes
the optional configuration space write callback that a device can implement
before emulating this write.

kvmtool's implementation of VFIO doesn't have support for emulating
expansion ROMs. However, the callback writes the guest value to the
hardware BAR, and then it reads it back to the emulated BAR to make sure
the write has completed successfully.

After this, we return to regular PCI emulation and because the BAR is no
longer 0, we write back to the BAR the value that the guest used to get the
size. As a result, the guest will think that the ROM size is 0x800 after
the subsequent read and we end up unintentionally exposing to the guest a
BAR which we don't emulate.

Let's fix this by ignoring writes to the expansion ROM BAR.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
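The sizing handshake works like this: the guest writes 0xfffff800, reads the BAR back and derives the size from the address bits that stuck. The sketch below uses a hypothetical helper, rom_bar_size(), to show why echoing the written value back makes the guest see a 0x800-byte ROM, while a BAR that reads back as 0 is treated as absent.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits [31:11] of the expansion ROM BAR (bit 0 is the enable bit). */
#define ROM_ADDR_MASK 0xfffff800u

/* Size implied by the value read back after writing ROM_ADDR_MASK;
 * 0 means that no ROM is implemented. */
static uint32_t rom_bar_size(uint32_t readback)
{
	uint32_t addr_bits = readback & ROM_ADDR_MASK;

	return addr_bits ? ~addr_bits + 1 : 0;
}
```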



# 84998f21 14-Apr-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Don't assume that only even numbered BARs are 64bit

Not all devices have the bottom 32 bits of a 64-bit BAR in an
even-numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3
are 64-bit. Remove this assumption.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
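Whether a BAR is 64-bit can be read from its own low bits rather than inferred from its index. A sketch, assuming the standard BAR encoding (bit 0 set for I/O, bits [2:1] equal to 0b10 for a 64-bit memory BAR); bar_starts() is a hypothetical helper returning a bitmask of the register indices that begin a BAR. The register values in the test are made up to match the P400 layout described above (64-bit BARs at indices 1 and 3), which yields BARs starting at 0, 1, 3 and 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAR_IO_SPACE      0x1u	/* bit 0: I/O BAR */
#define BAR_MEM_TYPE_MASK 0x6u	/* bits [2:1]: memory type */
#define BAR_MEM_TYPE_64   0x4u	/* 0b10: 64-bit memory BAR */

static bool bar_is_64bit(uint32_t bar)
{
	return !(bar & BAR_IO_SPACE) &&
	       (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

/* Walk the six BAR registers and skip the upper half of every 64-bit BAR,
 * whether it starts at an even or an odd index. */
static unsigned int bar_starts(const uint32_t regs[6])
{
	unsigned int i, mask = 0;

	for (i = 0; i < 6; i++) {
		mask |= 1u << i;
		if (bar_is_64bit(regs[i]))
			i++;	/* regs[i + 1] holds the upper 32 bits */
	}
	return mask;
}
```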



# ed01a603 14-Apr-2020 Alexandru Elisei <alexandru.elisei@arm.com>

vfio/pci: Allocate correct size for MSIX table and PBA BARs

kvmtool assumes that the BAR that holds the address for the MSIX table
and PBA structure has a size which is equal to their total size and it
allocates memory from MMIO space accordingly. However, when
initializing the BARs, the BAR size is set to the region size reported
by VFIO. When the physical BAR size is greater than the MMIO space that
kvmtool allocates, we can have a situation where the BAR overlaps with
another BAR, in which case kvmtool will fail to map the memory. This was
found when trying to do PCI passthrough with a PCIe Realtek r8168 NIC,
when the guest was also using virtio-block and virtio-net devices:

[..]
[ 0.197926] PCI: OF: PROBE_ONLY enabled
[ 0.198454] pci-host-generic 40000000.pci: host bridge /pci ranges:
[ 0.199291] pci-host-generic 40000000.pci: IO 0x00007000..0x0000ffff -> 0x00007000
[ 0.200331] pci-host-generic 40000000.pci: MEM 0x41000000..0x7fffffff -> 0x41000000
[ 0.201480] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x40ffffff] for [bus 00]
[ 0.202635] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
[ 0.203535] pci_bus 0000:00: root bus resource [bus 00]
[ 0.204227] pci_bus 0000:00: root bus resource [io 0x0000-0x8fff] (bus address [0x7000-0xffff])
[ 0.205483] pci_bus 0000:00: root bus resource [mem 0x41000000-0x7fffffff]
[ 0.206456] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
[ 0.207399] pci 0000:00:00.0: reg 0x10: [io 0x0000-0x00ff]
[ 0.208252] pci 0000:00:00.0: reg 0x18: [mem 0x41002000-0x41002fff]
[ 0.209233] pci 0000:00:00.0: reg 0x20: [mem 0x41000000-0x41003fff]
[ 0.210481] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
[ 0.211349] pci 0000:00:01.0: reg 0x10: [io 0x0100-0x01ff]
[ 0.212118] pci 0000:00:01.0: reg 0x14: [mem 0x41003000-0x410030ff]
[ 0.212982] pci 0000:00:01.0: reg 0x18: [mem 0x41003200-0x410033ff]
[ 0.214247] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
[ 0.215096] pci 0000:00:02.0: reg 0x10: [io 0x0200-0x02ff]
[ 0.215863] pci 0000:00:02.0: reg 0x14: [mem 0x41003400-0x410034ff]
[ 0.216723] pci 0000:00:02.0: reg 0x18: [mem 0x41003600-0x410037ff]
[ 0.218105] pci 0000:00:00.0: can't claim BAR 4 [mem 0x41000000-0x41003fff]: address conflict with 0000:00:00.0 [mem 0x41002000-0x41002fff]
[..]

Guest output of lspci -vv:

00:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at 0000 [size=256]
Region 2: Memory at 41002000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at 41000000 (64-bit, prefetchable) [size=16K]
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00001000

Let's fix this by allocating an amount of MMIO memory equal to the size
of the BAR that contains the MSIX table and/or PBA.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>



# 854aa2ef 14-Apr-2020 Julien Thierry <julien.thierry@arm.com>

ioport: pci: Move port allocations to PCI devices

The dynamic ioport allocation with IOPORT_EMPTY is currently only used
by PCI devices. Other devices use fixed ports for which they request
registration to the ioport API.

PCI ports need to be in the PCI IO space and there is no reason the ioport
API should know that a PCI port is being allocated and needs to be placed
in PCI IO space. This currently just happens to be the case.

Move the responsibility for dynamic allocation of ioports from the ioport
API to PCI.

In the future, if other types of devices also need dynamic ioport
allocation, they'll have to figure out the range of ports they are
allowed to use.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Renamed functions for clarity]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>



# a3704b91 03-May-2019 Andre Przywara <andre.przywara@arm.com>

vfio: rework vfio_irq_set payload setting

struct vfio_irq_set from the kernel headers contains a variable sized
array to hold a payload. The vfio_irq_eventfd struct puts the "fd"
member right after this, hoping it will automatically fit in the payload slot.
But having a variable sized type not at the end of a struct is a GNU C
extension, so clang will refuse to compile this.

Solve this by somewhat doing the compiler's job and place the payload
manually at the end of the structure.
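
Below is a minimal sketch of placing the payload manually, assuming a heap allocation. The actual patch may use a fixed-size buffer instead. So that it builds without kernel headers, the sketch uses a local mirror of struct vfio_irq_set. The names irq_set_mirror and irq_set_alloc are hypothetical. Real code would set flags from the VFIO_IRQ_SET_* constants in <linux/vfio.h> and pass the buffer to the VFIO_DEVICE_SET_IRQS ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Local mirror of struct vfio_irq_set from <linux/vfio.h>. data[] is a
 * flexible array member, so in standard C nothing may follow it when
 * this struct is embedded in another one.
 */
struct irq_set_mirror {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t  data[];
};

/*
 * Allocate the header and the eventfd payload as one buffer and copy
 * the fds into data[] by hand, instead of declaring an "fd" member
 * after the flexible array (the GNU extension clang rejects).
 */
static struct irq_set_mirror *irq_set_alloc(uint32_t flags, uint32_t index,
					    const int32_t *fds, uint32_t count)
{
	size_t payload = (size_t)count * sizeof(int32_t);
	struct irq_set_mirror *set = malloc(sizeof(*set) + payload);

	if (!set)
		return NULL;
	set->argsz = (uint32_t)(sizeof(*set) + payload);
	set->flags = flags;
	set->index = index;
	set->start = 0;
	set->count = count;
	if (payload)
		memcpy(set->data, fds, payload);
	return set;
}
```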

Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 1ac5dce9 03-May-2019 Andre Przywara <andre.przywara@arm.com>

vfio: remove unneeded test

clang complained that comparing a u8 variable against 256 is pointless,
since a u8 can never hold a value above 255.

Just remove the check, as the condition can never be true.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 09533d3c 03-May-2019 Andre Przywara <andre.przywara@arm.com>

vfio: remove spurious ampersand

As clang rightfully pointed out, the ampersand in front of this member
looks wrong: it takes the member's address instead of reading its value.

Remove it so that we actually compare the count against 0.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 7302327a 08-Apr-2019 Leo Yan <leo.yan@linaro.org>

vfio-pci: Re-enable INTx mode when disable MSI/MSIX

Since PCI forbids enabling INTx, MSI and MSI-X at the same time, INTx
mode is disabled by default when MSI/MSI-X mode is enabled. But this
logic breaks easily if the guest PCI driver detects that MSI/MSI-X does
not work as expected and tries to fall back to INTx mode. In that case
INTx mode has already been disabled and is never re-enabled, so neither
INTx nor MSI/MSI-X mode works with VFIO.

The following call flow triggers the issue:

vfio_pci_configure_dev_irqs()
`-> vfio_pci_enable_intx()

vfio_pci_enable_msis()
`-> vfio_pci_disable_intx()

vfio_pci_disable_msis() => Guest PCI driver disables MSI

To fix this, when disabling MSI/MSI-X, check whether INTx mode is
available for the device; if it is, re-enable INTx so that the device
can fall back to it.

Since the vfio_pci_disable_intx()/vfio_pci_enable_intx() pair may be
called multiple times, this patch uses 'intx_fd == -1' to denote that
INTx is disabled, so each function can bail out early when it detects
that INTx is already disabled or already enabled, respectively.
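
The bail-out logic and the INTx fallback can be sketched as follows. The struct and function names are hypothetical stand-ins for the vfio_pci_* functions named above. An incrementing counter replaces real eventfd creation and VFIO/KVM wiring, so the sketch models only the state transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down device state; kvmtool's struct differs. */
struct intx_dev {
	bool intx_supported;	/* device exposes an INTx pin */
	int  intx_fd;		/* -1 means INTx is currently disabled */
	int  next_fd;		/* stand-in for a real eventfd() source */
	bool msi_enabled;
};

static void intx_enable(struct intx_dev *d)
{
	if (d->intx_fd != -1)	/* already enabled: nothing to do */
		return;
	d->intx_fd = d->next_fd++;	/* real code: create and wire an eventfd */
}

static void intx_disable(struct intx_dev *d)
{
	if (d->intx_fd == -1)	/* already disabled: nothing to do */
		return;
	d->intx_fd = -1;	/* real code: detach and close the eventfd */
}

static void msis_enable(struct intx_dev *d)
{
	intx_disable(d);	/* INTx must not be used alongside MSI/MSI-X */
	d->msi_enabled = true;
}

static void msis_disable(struct intx_dev *d)
{
	d->msi_enabled = false;
	if (d->intx_supported)	/* the fix: let the guest fall back to INTx */
		intx_enable(d);
}
```

Because both guards depend only on the intx_fd sentinel, repeated enable/disable calls coming from different paths are harmless.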

Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 12bd7a16 08-Apr-2019 Leo Yan <leo.yan@linaro.org>

vfio-pci: Add new function for INTx one-time initialisation

To support enabling INTx multiple times, we first need to extract the
one-time initialisation and move the related code into a new function,
vfio_pci_init_intx(); if INTx is later disabled and re-enabled, these
one-time operations can be skipped.

This patch moves the following three main operations for INTx one-time
initialisation from vfio_pci_enable_intx() into vfio_pci_init_intx():

- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Set up pdev->intx_gsi.
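
The resulting split between one-time and repeatable work might look like the sketch below. All names are hypothetical. A boolean stands in for the VFIO_DEVICE_GET_IRQ_INFO sanity check, and counters replace the real FD reservation and GSI setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical INTx bookkeeping; counters replace real side effects. */
struct intx_state {
	bool initialised;
	int  reserved_fds;	/* FDs reserved for INTx */
	int  gsi;		/* -1 until set up */
	int  enables;		/* how many times INTx was enabled */
};

/* One-time work: reserve FDs, sanity-check IRQ info, set up the GSI. */
static int intx_one_time_init(struct intx_state *s, bool has_intx, int gsi)
{
	if (!has_intx)	/* stand-in for the VFIO_DEVICE_GET_IRQ_INFO check */
		return -1;
	s->reserved_fds += 2;
	s->gsi = gsi;
	s->initialised = true;
	return 0;
}

/* Repeatable work: may run after every disable without redoing init. */
static int intx_enable_repeatable(struct intx_state *s)
{
	if (!s->initialised)
		return -1;
	s->enables++;
	return 0;
}
```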

Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
