#
0b5e55fc |
| 28-Jun-2023 |
Jean-Philippe Brucker <jean-philippe@linaro.org> |
vfio/pci: Clarify the MSI states
The MSI and MSI-X implementation is a bit complex, because it keeps track of capability and vector states as seen by both the guest and the host. Add a few comments about those states and rename them to something more accurate.
What's called phys_state at the moment represents the software state maintained by VFIO and kvmtool, rather than the physical MSI capability, so host_state is more correct. To be consistent, rename virt_state to guest_state as well.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
#
3a36d341 |
| 28-Jun-2023 |
Jean-Philippe Brucker <jean-philippe@linaro.org> |
vfio/pci: Initialize MSI vectors unmasked
MSI vectors can be masked and unmasked individually when using the MSI-X capability, or when the classic MSI capability supports Per-Vector Masking. At the moment we incorrectly initialize the guest's view of the vectors (virt_state) as masked, so when using an MSI capability without Per-Vector Masking, the vectors are never unmasked and MSIs don't work. Initialize them unmasked instead.
Since VFIO doesn't support per-vector masking we implement it by disconnecting the irqfd, and keep track of it with the vector's phys_state. Initially the irqfd is not connected so phys_state is masked.
Reported-by: Vivek Gautam <vivek.gautam@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
#
39181fc6 |
| 12-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Align MSIX Table and PBA size to guest maximum page size
When allocating MMIO space for the MSI-X table, kvmtool rounds the allocation to the host's page size to make it as easy as possible for the guest to map the table to a page, if it wants to (and doesn't do BAR reassignment, like the x86 architecture for example). However, the host's page size can differ from the guest's on architectures which support multiple page sizes. For example, arm64 supports three different page sizes, and it is possible for the host to be using 4k pages, while the guest is using 64k pages.
To make sure the allocation is always aligned to a guest's page size, round it up to the maximum architectural page size. Do the same for the pending bit array if it lives in its own BAR.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
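The rounding described in this commit can be sketched as follows. The constant `MAX_PAGE_SIZE` and both helper names are assumptions made for illustration; 64k is used because it is the largest of the arm64 page sizes mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  0x1000u
#define SZ_64K 0x10000u

/* Assumption: the largest page size the guest architecture may use. */
#define MAX_PAGE_SIZE SZ_64K

/* Round x up to the next multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* MMIO space to reserve for a table (or a PBA in its own BAR), so that
 * the allocation is page aligned for any guest page size. */
static uint64_t msix_mmio_alloc_size(uint64_t bytes)
{
	return align_up(bytes, MAX_PAGE_SIZE);
}
```

Rounding to the host page size would give 4k for a small table on a 4k host, which a 64k guest cannot map as a single page; rounding to the maximum covers every guest configuration at the cost of some address space.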
|
#
b20d6e30 |
| 12-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Print an error when offset is outside of the MSIX table or PBA
Now that we keep track of the real size of MSIX table and PBA, print an error when the guest tries to write to an offset which is not inside the correct regions.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
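A bounds check of the kind this commit describes might look like the sketch below. The function name, its parameters and the message text are hypothetical; only the idea (reject and report accesses that fall outside the real table or PBA size) comes from the commit message.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Return true if [offset, offset + len) lies inside a region of the
 * given size; otherwise print an error and return false. */
static bool msix_access_in_range(const char *what, uint64_t offset,
				 uint32_t len, uint64_t size)
{
	assert(len > 0);

	/* Written so that offset + len cannot overflow. */
	if (len > size || offset > size - len) {
		fprintf(stderr,
			"%s access outside of region: offset 0x%" PRIx64
			", len %" PRIu32 ", size 0x%" PRIx64 "\n",
			what, offset, len, size);
		return false;
	}
	return true;
}
```

For a 4-entry table (64 bytes), a 4-byte access at offset 60 is accepted, while one at offset 62 or 64 is reported.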
|
#
f93acc04 |
| 12-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Rework MSIX table and PBA physical size allocation
When creating the MSIX table and PBA, kvmtool rounds up the table and pending bit array sizes to the host's page size. Unfortunately, when doing that, it doesn't take into account that the new size can exceed the device BAR size, leading to hard-to-diagnose errors for certain configurations.
One theoretical example: PBA and table in the same 4k BAR, host's page size is 4k. In this case, table->size = 4k, pba->size = 4k, map_size = 4k, which means that pba->guest_phys_addr = table->guest_phys_addr + 4k, which is outside of the 4k MMIO range allocated for both structures.
Another example, this time a real-world error that I encountered: it happens with a 64k host booting a 4k guest, with an RTL8168 PCIe NIC assigned to the guest. In this case, kvmtool sets table->size = 64k (because it's rounded to the host's page size) and pba->size = 64k.
Truncated output of lspci -vv on the host:
    01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 255
        Region 0: I/O ports at 1000 [size=256]
        Region 2: Memory at 40000000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
        [..]
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
            Vector table: BAR=4 offset=00000000
            PBA: BAR=4 offset=00000800
        [..]
When booting the guest:
    [..]
    [ 0.207444] pci-host-generic 40000000.pci: host bridge /pci ranges:
    [ 0.208564] pci-host-generic 40000000.pci: IO 0x0000000000..0x000000ffff -> 0x0000000000
    [ 0.209857] pci-host-generic 40000000.pci: MEM 0x0050000000..0x007fffffff -> 0x0050000000
    [ 0.211184] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x4fffffff] for [bus 00]
    [ 0.212625] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
    [ 0.213647] pci_bus 0000:00: root bus resource [bus 00]
    [ 0.214429] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
    [ 0.215355] pci_bus 0000:00: root bus resource [mem 0x50000000-0x7fffffff]
    [ 0.216676] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
    [ 0.223771] pci 0000:00:00.0: reg 0x10: [io 0x6200-0x62ff]
    [ 0.239765] pci 0000:00:00.0: reg 0x18: [mem 0x50010000-0x50010fff]
    [ 0.244595] pci 0000:00:00.0: reg 0x20: [mem 0x50000000-0x50003fff]
    [ 0.246331] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
    [ 0.247278] pci 0000:00:01.0: reg 0x10: [io 0x6300-0x63ff]
    [ 0.248212] pci 0000:00:01.0: reg 0x14: [mem 0x50020000-0x500200ff]
    [ 0.249172] pci 0000:00:01.0: reg 0x18: [mem 0x50020400-0x500207ff]
    [ 0.250450] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
    [ 0.251392] pci 0000:00:02.0: reg 0x10: [io 0x6400-0x64ff]
    [ 0.252351] pci 0000:00:02.0: reg 0x14: [mem 0x50020800-0x500208ff]
    [ 0.253312] pci 0000:00:02.0: reg 0x18: [mem 0x50020c00-0x50020fff]
    [ 0.254760] pci 0000:00:00.0: BAR 4: assigned [mem 0x50000000-0x50003fff] (1)
    [ 0.255805] pci 0000:00:00.0: BAR 2: assigned [mem 0x50004000-0x50004fff] (2)
    Warning: [10ec:8168] Error activating emulation for BAR 2
    Warning: [10ec:8168] Error activating emulation for BAR 2
    [ 0.260432] pci 0000:00:01.0: BAR 2: assigned [mem 0x50005000-0x500053ff]
    Warning: [1af4:1000] Error activating emulation for BAR 2
    Warning: [1af4:1000] Error activating emulation for BAR 2
    [ 0.261469] pci 0000:00:02.0: BAR 2: assigned [mem 0x50005400-0x500057ff]
    Warning: [1af4:1001] Error activating emulation for BAR 2
    Warning: [1af4:1001] Error activating emulation for BAR 2
    [ 0.262499] pci 0000:00:00.0: BAR 0: assigned [io 0x1000-0x10ff]
    [ 0.263415] pci 0000:00:01.0: BAR 0: assigned [io 0x1100-0x11ff]
    [ 0.264462] pci 0000:00:01.0: BAR 1: assigned [mem 0x50005800-0x500058ff]
    Warning: [1af4:1000] Error activating emulation for BAR 1
    Warning: [1af4:1000] Error activating emulation for BAR 1
    [ 0.265481] pci 0000:00:02.0: BAR 0: assigned [io 0x1200-0x12ff]
    [ 0.266397] pci 0000:00:02.0: BAR 1: assigned [mem 0x50005900-0x500059ff]
    Warning: [1af4:1001] Error activating emulation for BAR 1
    Warning: [1af4:1001] Error activating emulation for BAR 1
    [ 0.267892] EINJ: ACPI disabled.
    [ 0.269922] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
    [ 0.271118] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
    [ 0.274122] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
    [ 0.275930] printk: console [ttyS0] disabled
    [ 0.276669] 1000000.U6_16550A: ttyS0 at MMIO 0x1000000 (irq = 13, base_baud = 115200) is a 16550A
    [ 0.278058] printk: console [ttyS0] enabled
    [ 0.278058] printk: console [ttyS0] enabled
    [ 0.279304] printk: bootconsole [ns16550a0] disabled
    [ 0.279304] printk: bootconsole [ns16550a0] disabled
    [ 0.281252] 1001000.U6_16550A: ttyS1 at MMIO 0x1001000 (irq = 14, base_baud = 115200) is a 16550A
    [ 0.282842] 1002000.U6_16550A: ttyS2 at MMIO 0x1002000 (irq = 15, base_baud = 115200) is a 16550A
    [ 0.284611] 1003000.U6_16550A: ttyS3 at MMIO 0x1003000 (irq = 16, base_baud = 115200) is a 16550A
    [ 0.286094] SuperH (H)SCI(F) driver initialized
    [ 0.286868] msm_serial: driver initialized
    [ 0.287890] [drm] radeon kernel modesetting enabled.
    [ 0.288826] cacheinfo: Unable to detect cache hierarchy for CPU 0
    [ 0.293321] loop: module loaded
    KVM_SET_GSI_ROUTING: Invalid argument
At (1), the guest writes 0x50000000 into BAR 4 of the NIC (which holds the MSIX table and PBA), expecting that will cover only 16k of address space (the BAR size), up to 0x50003fff, inclusive. On the host side, in vfio_pci_bar_activate(), kvmtool will actually register for MMIO emulation the region 0x50000000-0x5000ffff (64k in total) for the MSIX table and 0x50010000-0x5001ffff (another 64k) for the PBA (kvmtool set table->size and pba->size to 64k when it aligned them to the host's page size).
Then at step (2), the guest writes the next available address (from its point of view) into BAR 2 of the NIC, which is 0x50004000. On the host side, the PCI emulation layer will search all the regions that overlap with the BAR address range (0x50004000-0x50004fff) and will find none because, just like the guest, it uses the BAR size to check for overlaps. When vfio_pci_bar_activate() is reached, kvmtool will try to register memory for this region, but it is already registered for the MSIX table emulation and fails.
The same scenario repeats for every following memory BAR, because the MSIX table and PBA use memory from 0x50000000 to 0x5001ffff.
The error at the end, which finally terminates the VM, is caused by the guest trying to write to a totally different BAR, which vfio-pci interprets as a write to the MSI-X table because it falls in the 64k region that was registered for emulation. The IRQ ID is not a valid SPI number and gicv2m_update_routing() returns an error (and sets errno to EINVAL).
Fix this by aligning the table and PBA size to 8 bytes to allow for qword accesses, like PCI 3.0 mandates.
For the sake of simplicity, the PBA offset in a BAR, in case of a shared BAR, is kept the same as the offset of the physical device. One hopes that the device respects the recommendations set forth in PCI LOCAL BUS SPECIFICATION, REV. 3.0, section "MSI-X Capability and Table Structures".
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
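The size computation after this fix can be approximated by the sketch below. It assumes the standard MSI-X layout (16 bytes per table entry, one pending bit per vector); the helper names and the fit check are hypothetical and are not kvmtool's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE  16u   /* bytes per MSI-X table entry */
#define MSIX_MAX_ENTRIES 2048u /* limit of the Table Size field */

/* Round up to a multiple of 8 bytes to allow for qword accesses. */
static uint64_t align8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

static uint64_t msix_table_bytes(uint32_t nr_entries)
{
	assert(nr_entries >= 1 && nr_entries <= MSIX_MAX_ENTRIES);
	return align8((uint64_t)nr_entries * MSIX_ENTRY_SIZE);
}

/* One pending bit per vector, rounded up to whole qwords. */
static uint64_t msix_pba_bytes(uint32_t nr_entries)
{
	assert(nr_entries >= 1 && nr_entries <= MSIX_MAX_ENTRIES);
	return align8(((uint64_t)nr_entries + 7) / 8);
}

/* Both structures must end inside the BAR and must not overlap. */
static bool msix_layout_fits(uint64_t bar_size, uint64_t table_off,
			     uint64_t pba_off, uint32_t nr_entries)
{
	uint64_t table_end = table_off + msix_table_bytes(nr_entries);
	uint64_t pba_end = pba_off + msix_pba_bytes(nr_entries);
	bool disjoint = table_end <= pba_off || pba_end <= table_off;

	return table_end <= bar_size && pba_end <= bar_size && disjoint;
}
```

For the RTL8168 in the lspci output (Count=4, table at offset 0, PBA at offset 0x800, 16K BAR) this gives a 64-byte table and an 8-byte PBA, both comfortably inside the BAR, whereas two 64k page-rounded regions could never fit.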
|
#
5f44d5d6 |
| 12-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Rename PBA offset in device descriptor to fd_offset
The MSI-X capability defines a PBA offset, which is the offset of the PBA array in the BAR that holds the array.
kvmtool uses the field "pba_offset" in struct msix_cap (which represents the MSIX capability) to refer to the [PBA offset:BAR] field of the capability; and the field "offset" in the struct vfio_pci_msix_pba to refer to the offset of the PBA array in the device descriptor created by the VFIO driver.
As we're getting ready to add yet another field that represents an offset to struct vfio_pci_msix_pba, try to avoid ambiguities by renaming the struct's "offset" field to "fd_offset".
No functional change intended.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
3d3dca07 |
| 12-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci.c: Remove double include for assert.h
assert.h is included twice, keep only one instance.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
25c1dc6c |
| 13-Jul-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
arm/arm64: vfio: Add PCI Express Capability Structure
It turns out that some Linux drivers (like Realtek R8169) fall back to a device-specific configuration method if the device is not PCI Express capable:
[ 1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI
Add the PCI Express Capability Structure and populate it for assigned devices, as this is how the Linux PCI driver determines if a device is PCI Express capable.
Because we don't emulate a PCI Express link, a root complex or any slot related properties, the PCI Express capability is kept as small as possible by ignoring those fields.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210713170631.155595-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
e69b7663 |
| 13-Jul-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
arm/arm64: Add PCI Express 1.1 support
PCI Express comes with an extended addressing scheme, which directly translated into a bigger device configuration space (256->4096 bytes) and bigger PCI configuration space (16->256 MB), as well as mandatory capabilities (power management [1] and PCI Express capability [2]).
However, our virtio PCI implementation implements version 0.9 of the protocol and it still uses transitional PCI device IDs, so we have opted to omit the mandatory PCI Express capabilities. For VFIO, the power management and PCI Express capability are left for a subsequent patch.
[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20210713170631.155595-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
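The size figures quoted in this commit (256 to 4096 bytes per function, 16 to 256 MB for the whole configuration window) follow from the bus/device/function addressing. The sketch below works through the arithmetic; the helper names are hypothetical, and the offset layout is the usual ECAM encoding.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE  256u  /* conventional PCI, per function */
#define PCIE_CFG_SPACE_SIZE 4096u /* PCI Express, per function */

/* Window needed to cover 256 buses x 32 devices x 8 functions. */
static uint64_t cfg_window_size(uint32_t per_function_bytes)
{
	return 256ull * 32 * 8 * per_function_bytes;
}

/* Offset of a register inside the PCI Express configuration window. */
static uint32_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
			    uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < PCIE_CFG_SPACE_SIZE);
	return ((uint32_t)bus << 20) | ((uint32_t)dev << 15) |
	       ((uint32_t)fn << 12) | reg;
}
```

As a usage example, the first BAR (register 0x10) of device 00:01.0 sits at offset 0x8010 from the start of the window.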
|
#
465edc9d |
| 14-May-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
pci: Implement reassignable BARs
BARs are used by the guest to configure the access to the PCI device by writing the address to which the device will respond. The basic idea for adding support for reassignable BARs is straightforward: deactivate emulation for the memory region described by the old BAR value, and activate emulation for the new region.
BAR reassignment can be done while device access is enabled and memory regions for different devices can overlap as long as no access is made to the overlapping memory regions. This means that it is legal for the BARs of two distinct devices to point to an overlapping memory region, and indeed, this is how Linux does resource assignment at boot. To account for this situation, the simple algorithm described above is enhanced to scan for all devices and:
- Deactivate emulation for any BARs that might overlap with the new BAR value.
- Enable emulation for any BARs that were overlapping with the old value after the BAR has been updated.
Activating/deactivating emulation of a memory region has side effects. In order to prevent the execution of the same callback twice we now keep track of the state of the region emulation. For example, this can happen if we program a BAR with an address that overlaps a second BAR, thus deactivating emulation for the second BAR, and then we disable all region accesses to the second BAR by writing to the command register.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-11-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
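The algorithm outlined in this commit can be sketched in a simplified form. The `struct bar` type and all function names are hypothetical; the real code also has to honour the command register and call device-specific callbacks, which this sketch reduces to flipping an `active` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified BAR descriptor. */
struct bar {
	uint64_t addr;
	uint64_t size;
	bool active; /* is emulation currently registered? */
};

static bool bars_overlap(const struct bar *a, const struct bar *b)
{
	return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* Tracking the state prevents running the same callback twice. */
static void set_active(struct bar *b, bool on)
{
	if (b->active == on)
		return;
	b->active = on; /* a real implementation would call the callback */
}

static void bar_reassign(struct bar *bars, size_t n, size_t idx,
			 uint64_t new_addr)
{
	struct bar old;
	size_t i;

	assert(idx < n);
	old = bars[idx];

	set_active(&bars[idx], false);
	bars[idx].addr = new_addr;

	/* Deactivate every BAR that overlaps the new value. */
	for (i = 0; i < n; i++)
		if (i != idx && bars_overlap(&bars[i], &bars[idx]))
			set_active(&bars[i], false);

	set_active(&bars[idx], true);

	/* Re-enable BARs that overlapped only the old value. */
	for (i = 0; i < n; i++)
		if (i != idx && bars_overlap(&bars[i], &old) &&
		    !bars_overlap(&bars[i], &bars[idx]))
			set_active(&bars[i], true);
}
```

Giving priority to the most recently written BAR is a design choice of this sketch; it matches the description that overlapping BARs are legal as long as the guest does not access the overlapping region.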
|
#
5a8e4f25 |
| 14-May-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
pci: Implement callbacks for toggling BAR emulation
Implement callbacks for activating and deactivating emulation for a BAR region. This is in preparation for allowing a guest operating system to enable and disable access to I/O or memory space, or to reassign the BARs.
The emulated vesa device framebuffer isn't designed to allow stopping and restarting at arbitrary points in the guest execution. Furthermore, on x86, the kernel will not change the BAR addresses, which on bare metal are programmed by the firmware, so take the easy way out and refuse to activate/deactivate emulation for the BAR regions. We also take this opportunity to make the vesa emulation code more consistent by moving all static variable definitions in one place, at the top of the file.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-9-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
e1d0285c |
| 14-May-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Don't write configuration value twice
After writing to the device fd as part of the PCI configuration space emulation, we read back from the device to make sure that the write finished. The value is read back into the PCI configuration space and afterwards, the same value is copied by the PCI emulation code. Let's read from the device fd into a temporary variable, to prevent this double write.
The double write is harmless in itself. But when we implement reassignable BARs, we need to keep track of the old BAR value, and the VFIO code is overwriting it.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-7-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
a05e576f |
| 14-May-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio: Reserve ioports when configuring the BAR
Let's be consistent and reserve ioports when we are configuring the BAR, not when we map it, just like we do with mmio regions.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/1589470709-4104-5-git-send-email-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
#
c0c45eed |
| 24-Apr-2020 |
Andre Przywara <andre.przywara@arm.com> |
pci: Move legacy IRQ assignment into devices
So far the (legacy) IRQ line for a PCI device is allocated in devices.c, which should actually not take care of that. Since we allocate all other device specific resources in the actual device emulation code, the IRQ should not be something special.
Remove the PCI specific code from devices.c, and move the IRQ line allocation to the PCI code. This drops the IRQ line from the VESA device, since it does not use one.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
#
e554aefd |
| 24-Apr-2020 |
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> |
vfio: fix multi-MSI vector handling
A PCI device with an MSI capability that enables multiple MSI messages (through the Multiple Message Enable field, bits [6:4] of the Message Control register) is expected to drive the lower bits of the Message Data (how many depends on the number of selected vectors) to generate the corresponding MSI message writes on the PCI bus.
Therefore, KVM expects kvmtool to set up the lower bits of the MSI data (the number of bits depends on bits [6:4] of the Message Control register, which in turn control the number of vectors allocated) while programming the MSI IRQ routing entries. This ensures that KVM can demultiplex the MSI entries and set up the IRQ routes accordingly, so that when the actual hardware fires, KVM can route the interrupt to the correct entry in the interrupt controller (and set up a correct passthrough route for directly injected interrupts).
The current kvmtool code does not set up the Message Data entries correctly for multi-MSI vectors: the data field is left as programmed in the MSI capability by the guest for all vector entries, which causes IRQ misrouting.
Fix it.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
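The per-vector Message Data described in this commit can be derived as in the sketch below. The helper names are hypothetical; the encoding of the Multiple Message Enable field (log2 of the vector count in Message Control bits [6:4]) is the one defined for the MSI capability.

```c
#include <assert.h>
#include <stdint.h>

/* Number of vectors enabled by Multiple Message Enable, bits [6:4]. */
static unsigned int msi_nr_vectors(uint16_t msg_ctrl)
{
	unsigned int mme = (msg_ctrl >> 4) & 0x7;

	assert(mme <= 5); /* encodings 6 and 7 are reserved */
	return 1u << mme;
}

/* Data value for vector 'vec': the device replaces the low
 * log2(nvec) bits of the programmed data with the vector number. */
static uint16_t msi_vector_data(uint16_t base_data, uint16_t msg_ctrl,
				unsigned int vec)
{
	unsigned int nvec = msi_nr_vectors(msg_ctrl);

	assert(vec < nvec);
	return (uint16_t)((base_data & ~(nvec - 1)) | vec);
}
```

With four vectors enabled and a guest-programmed data value of 0x4020, the routing entries would carry 0x4020, 0x4021, 0x4022 and 0x4023, instead of 0x4020 four times.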
|
#
3665392a |
| 14-Apr-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Don't access unallocated regions
Don't try to configure a BAR if there is no region associated with it.
Also move the variable declarations from inside the loop to the start of the function for consistency.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
#
5b7fef16 |
| 14-Apr-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Ignore expansion ROM BAR writes
To get the size of the expansion ROM, software writes 0xfffff800 to the expansion ROM BAR in the PCI configuration space. PCI emulation executes the optional configuration space write callback that a device can implement before emulating this write.
kvmtool's implementation of VFIO doesn't have support for emulating expansion ROMs. However, the callback writes the guest value to the hardware BAR, and then it reads it back to the emulated BAR to make sure the write has completed successfully.
After this, we return to regular PCI emulation and because the BAR is no longer 0, we write back to the BAR the value that the guest used to get the size. As a result, the guest will think that the ROM size is 0x800 after the subsequent read and we end up unintentionally exposing to the guest a BAR which we don't emulate.
Let's fix this by ignoring writes to the expansion ROM BAR.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
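The sizing arithmetic behind this bug can be worked through with a short sketch. The function name and the local `ROM_ADDRESS_MASK` constant are assumptions for illustration; the value 0xfffff800 is the one quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits of the expansion ROM BAR; bit 0 is the enable bit. */
#define ROM_ADDRESS_MASK 0xfffff800u

/* Size the guest computes from the value it reads back after writing
 * ROM_ADDRESS_MASK to the expansion ROM BAR. Zero means "no ROM". */
static uint32_t rom_bar_size(uint32_t readback)
{
	uint32_t mask = readback & ROM_ADDRESS_MASK;
	uint32_t size;

	if (!mask)
		return 0;

	size = ~mask + 1;
	/* Sanity check for this sketch: sizes are powers of two. */
	assert((size & (size - 1)) == 0);
	return size;
}
```

When the emulated BAR simply echoes the guest's 0xfffff800 back, the computed size is 0x800, which is exactly the phantom ROM the commit describes. Ignoring the write leaves the BAR at 0 and the guest concludes that there is no ROM.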
|
#
84998f21 |
| 14-Apr-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Don't assume that only even numbered BARs are 64bit
Not all devices have the bottom 32 bits of a 64 bit BAR in an even numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3 are 64bit. Remove this assumption.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
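One way to walk the BARs without assuming anything about even or odd indices is to look at the type bits of each BAR and skip the slot that follows a 64-bit one. The sketch below is an illustration under that assumption; the macro and function names are hypothetical, while the bit layout (bit 0 for I/O space, bits [2:1] for the memory type) is the standard BAR encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BARS           6
#define BAR_SPACE_IO      0x1u
#define BAR_MEM_TYPE_MASK 0x6u
#define BAR_MEM_TYPE_64   0x4u

static bool bar_is_64bit(uint32_t bar)
{
	return !(bar & BAR_SPACE_IO) &&
	       (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

/* Mark the slots that hold the upper 32 bits of a 64-bit BAR. */
static void mark_upper_halves(const uint32_t bars[NR_BARS],
			      bool is_upper[NR_BARS])
{
	int i;

	assert(bars != NULL && is_upper != NULL);

	for (i = 0; i < NR_BARS; i++)
		is_upper[i] = false;

	for (i = 0; i < NR_BARS - 1; i++) {
		if (bar_is_64bit(bars[i])) {
			is_upper[i + 1] = true;
			i++; /* skip the upper half, whatever its bits are */
		}
	}
}
```

For a layout like the one mentioned for the Quadro P400, with 64-bit BARs starting at indices 1 and 3, slots 2 and 4 are marked as upper halves.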
|
#
ed01a603 |
| 14-Apr-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
vfio/pci: Allocate correct size for MSIX table and PBA BARs
kvmtool assumes that the BAR that holds the address for the MSIX table and PBA structure has a size which is equal to their total size and it allocates memory from MMIO space accordingly. However, when initializing the BARs, the BAR size is set to the region size reported by VFIO. When the physical BAR size is greater than the mmio space that kvmtool allocates, we can have a situation where the BAR overlaps with another BAR, in which case kvmtool will fail to map the memory. This was found when trying to do PCI passthrough with a PCIe Realtek r8168 NIC, when the guest was also using virtio-block and virtio-net devices:
    [..]
    [ 0.197926] PCI: OF: PROBE_ONLY enabled
    [ 0.198454] pci-host-generic 40000000.pci: host bridge /pci ranges:
    [ 0.199291] pci-host-generic 40000000.pci: IO 0x00007000..0x0000ffff -> 0x00007000
    [ 0.200331] pci-host-generic 40000000.pci: MEM 0x41000000..0x7fffffff -> 0x41000000
    [ 0.201480] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x40ffffff] for [bus 00]
    [ 0.202635] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
    [ 0.203535] pci_bus 0000:00: root bus resource [bus 00]
    [ 0.204227] pci_bus 0000:00: root bus resource [io 0x0000-0x8fff] (bus address [0x7000-0xffff])
    [ 0.205483] pci_bus 0000:00: root bus resource [mem 0x41000000-0x7fffffff]
    [ 0.206456] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
    [ 0.207399] pci 0000:00:00.0: reg 0x10: [io 0x0000-0x00ff]
    [ 0.208252] pci 0000:00:00.0: reg 0x18: [mem 0x41002000-0x41002fff]
    [ 0.209233] pci 0000:00:00.0: reg 0x20: [mem 0x41000000-0x41003fff]
    [ 0.210481] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
    [ 0.211349] pci 0000:00:01.0: reg 0x10: [io 0x0100-0x01ff]
    [ 0.212118] pci 0000:00:01.0: reg 0x14: [mem 0x41003000-0x410030ff]
    [ 0.212982] pci 0000:00:01.0: reg 0x18: [mem 0x41003200-0x410033ff]
    [ 0.214247] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
    [ 0.215096] pci 0000:00:02.0: reg 0x10: [io 0x0200-0x02ff]
    [ 0.215863] pci 0000:00:02.0: reg 0x14: [mem 0x41003400-0x410034ff]
    [ 0.216723] pci 0000:00:02.0: reg 0x18: [mem 0x41003600-0x410037ff]
    [ 0.218105] pci 0000:00:00.0: can't claim BAR 4 [mem 0x41000000-0x41003fff]: address conflict with 0000:00:00.0 [mem 0x41002000-0x41002fff]
    [..]
Guest output of lspci -vv:
    00:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at 0000 [size=256]
        Region 2: Memory at 41002000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 41000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
            Vector table: BAR=4 offset=00000000
            PBA: BAR=4 offset=00001000
Let's fix this by allocating an amount of MMIO memory equal to the size of the BAR that contains the MSIX table and/or PBA.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
#
854aa2ef |
| 14-Apr-2020 |
Julien Thierry <julien.thierry@arm.com> |
ioport: pci: Move port allocations to PCI devices
The dynamic ioport allocation with IOPORT_EMPTY is currently only used by PCI devices. Other devices use fixed ports for which they request registration to the ioport API.
PCI ports need to be in the PCI IO space and there is no reason the ioport API should know a PCI port is being allocated and needs to be placed in PCI IO space. This currently just happens to be the case.
Move the responsibility of dynamic allocation of ioports from the ioport API to PCI.
In the future, if other types of devices also need dynamic ioport allocation, they'll have to figure out the range of ports they are allowed to use.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Renamed functions for clarity]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
#
a3704b91 |
| 03-May-2019 |
Andre Przywara <andre.przywara@arm.com> |
vfio: rework vfio_irq_set payload setting
struct vfio_irq_set from the kernel headers contains a variable sized array to hold a payload. The vfio_irq_eventfd struct puts the "fd" member right after this, hoping that it will automatically fit in the payload slot. But having a variable sized type not at the end of a struct is a GNU C extension, so clang will refuse to compile this.
Solve this by somewhat doing the compiler's job and place the payload manually at the end of the structure.
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
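A minimal sketch of placing the payload manually behind a flexible array member. The struct example_irq_set below is a stand-in with the same field layout as the kernel's struct vfio_irq_set so that the example compiles on its own; the helper name and the heap allocation are assumptions and not necessarily the approach taken in the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in mirroring the shape of struct vfio_irq_set. */
struct example_irq_set {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t  data[];	/* flexible array member, must come last */
};

/*
 * Allocate a header plus room for one eventfd and copy the fd into the
 * payload area. The caller frees the result with free().
 */
struct example_irq_set *example_irq_set_alloc_eventfd(uint32_t index, int fd)
{
	size_t argsz = sizeof(struct example_irq_set) + sizeof(fd);
	struct example_irq_set *set = calloc(1, argsz);

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->index = index;
	set->start = 0;
	set->count = 1;
	/* Place the payload by hand instead of relying on a trailing member. */
	memcpy(set->data, &fd, sizeof(fd));
	return set;
}
```

Because nothing follows the flexible array member inside a struct, this form stays within standard C and does not depend on the GNU extension mentioned above.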
|
#
1ac5dce9 |
| 03-May-2019 |
Andre Przywara <andre.przywara@arm.com> |
vfio: remove unneeded test
clang complained that the comparison of a u8 variable against 256 is somewhat pointless.
Just remove the check, as the condition will never hit.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
09533d3c |
| 03-May-2019 |
Andre Przywara <andre.przywara@arm.com> |
vfio: remove spurious ampersand
As clang rightfully pointed out, the ampersand in front of this member looks wrong.
Remove it so we actually really compare against the count being 0.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
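A minimal sketch of the kind of check this fix describes. The struct and function names are hypothetical; only the idea of testing a count member rather than its address comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an IRQ info structure with a count member. */
struct example_irq_info {
	uint32_t count;
};

/* Returns true when the device reports no interrupts of this type. */
bool example_has_no_irqs(const struct example_irq_info *info)
{
	/*
	 * The buggy form would be `!&info->count`: the address of a member
	 * is never NULL, so that test is always false. Comparing the value
	 * itself is what was intended.
	 */
	return info->count == 0;
}
```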
|
#
7302327a |
| 08-Apr-2019 |
Leo Yan <leo.yan@linaro.org> |
vfio-pci: Re-enable INTx mode when disable MSI/MSIX
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode is disabled by default when MSI/MSIX mode is enabled. This logic breaks easily if the guest PCI driver detects that MSI/MSIX does not work as expected and tries to roll back to INTx mode: INTx has already been disabled and is never re-enabled, so neither INTx mode nor MSI/MSIX mode works in vfio.
The flow that triggers this issue is:
  vfio_pci_configure_dev_irqs()
    `-> vfio_pci_enable_intx()

  vfio_pci_enable_msis()
    `-> vfio_pci_disable_intx()

  vfio_pci_disable_msis()  => Guest PCI driver disables MSI
To fix this issue, when disabling MSI/MSIX we need to check whether INTx mode is available for this device; if the device supports INTx, re-enable it so that the device can fall back to it.
Since the vfio_pci_disable_intx() / vfio_pci_enable_intx() pair may be called multiple times, this patch uses 'intx_fd == -1' to denote that INTx is disabled. Each function bails out immediately when it detects that INTx is already disabled or already enabled, respectively.
Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
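A minimal sketch of the 'intx_fd == -1' guard described above. The struct, the function names, and the fd passed in as a parameter are simplifications made for illustration; the real functions operate on VFIO device state and create their own eventfds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced device state. */
struct example_pci_dev {
	int  intx_fd;		/* -1 means INTx is currently disabled */
	bool has_intx;		/* device supports INTx at all */
	int  enable_calls;	/* counts real enable operations */
};

/* Enable INTx; a no-op if it is already enabled. */
int example_enable_intx(struct example_pci_dev *dev, int new_fd)
{
	if (dev->intx_fd != -1)
		return 0;

	dev->intx_fd = new_fd;
	dev->enable_calls++;
	return 0;
}

/* Disable INTx; a no-op if it is already disabled. */
void example_disable_intx(struct example_pci_dev *dev)
{
	if (dev->intx_fd == -1)
		return;

	dev->intx_fd = -1;
}

/* After tearing down MSI/MSIX, fall back to INTx when available. */
int example_disable_msis(struct example_pci_dev *dev, int new_fd)
{
	/* ... MSI/MSIX teardown would happen here ... */
	if (dev->has_intx)
		return example_enable_intx(dev, new_fd);
	return 0;
}
```

With both functions guarded this way, callers do not need to track whether INTx is currently on before calling either one.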
|
#
12bd7a16 |
| 08-Apr-2019 |
Leo Yan <leo.yan@linaro.org> |
vfio-pci: Add new function for INTx one-time initialisation
To support enabling INTx multiple times, we first need to extract the one-time initialisation and move the related code into a new function, vfio_pci_init_intx(); if INTx is later disabled and re-enabled, these one-time operations can be skipped.
This patch moves the following three main operations for INTx one-time initialisation from vfio_pci_enable_intx() into vfio_pci_init_intx():
- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Setup pdev->intx_gsi.
Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|