Intel Graphics Device (IGD) assignment with vfio-pci
====================================================

Using vfio-pci, an Intel Graphics Device (IGD) can be passed through to a
guest, serving either as the primary and exclusive graphics adapter or in
combination with an emulated primary graphics device, depending on the
configuration and guest driver support. However, IGD devices are not "clean"
PCI devices: they use extra memory regions other than BARs. Special handling
is required to make them work properly, including:

* OpRegion for accessing the Virtual BIOS Table (VBT), which contains display
  output information.
* Data Stolen Memory (DSM) region used as VRAM at an early stage (BIOS/UEFI).

Certain guest software also depends on the following conditions to work
(* = required by):

| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
|---------------------------------------------|-------|---------|-------|---------|
| #1 IGD has a valid OpRegion containing VBT  | * ^1  |    *    |   *   |    *    |
| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    *    |
| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    *    |
| #4 IGD has VGA controller device class      |       |         |   *   |    *    |
| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |         |
| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    *    |

^1 Though the i915 driver is able to mock an OpRegion, it is still recommended
   to use the VBT copied from the host OpRegion to prevent incorrect
   configuration.

For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
to the guest via fw_cfg, which guest firmware can use to set up the guest
OpRegion.

For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and
host bridge to the guest. Currently this is only supported on i440fx machines:
q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may lead
to unexpected behavior.

For #3, "addr=2.0" assigns IGD to 00:02.0.

For #4, the primary display must be set to IGD in host BIOS.

For #5, "x-vga=on" enables guest access to the standard VGA IO/MMIO ranges.

For #6, a ROM provided either via the ROM BAR or the romfile= option is
needed. This Intel document [1] shows how to dump the VBIOS to a file. For a
UEFI Option ROM, see the "Guest firmware" section.

QEMU also provides a "Legacy" mode that implicitly enables full functionality
on IGD. It is automatically enabled when:
* IGD generation is 6 to 9 (Sandy Bridge to Comet Lake)
* Machine type is i440fx
* IGD is assigned to guest BDF 00:02.0
* ROM BAR or romfile is present

In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and
VGA range access, which is equivalent to:
  x-igd-opregion=on,x-igd-lpc=on,x-vga=on

By default, "Legacy" mode won't fail; it continues on error. Users can set
"x-igd-legacy-mode=on" to force legacy mode. This also checks whether the
conditions above for legacy mode are met, and if any error occurs, QEMU fails
immediately. Users can also set "x-igd-legacy-mode=off" to disable legacy
mode.

In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
other graphics devices should be removed. This can be done using "-nographic",
"-vga none" or "-nodefaults", along with adding the device using vfio-pci.

For either mode, depending on the host kernel, the i915 driver in the host
may generate faults and errors upon re-binding to an IGD device after it
has been assigned to a VM. It's therefore generally recommended to prevent
such driver binding unless the host driver is known to work well for this.
There are numerous ways to do this: i915 can be blacklisted on the host,
the driver_override option can be used to ensure that only vfio-pci can bind
to the device on the host [2], virsh nodedev-detach can be used to bind the
device to vfio drivers and then managed='no' set in the VM XML to prevent
re-binding to i915, etc.
Also note that IGD is typically the primary graphics in the host, and
special options may be required beyond simply blacklisting i915 or using
pci-stub/vfio-pci to take ownership of IGD as a PCI class device. Lower level
drivers exist that may still claim the device. It may therefore be necessary
to use the kernel boot options video=vesafb:off or video=efifb:off (depending
on host BIOS/UEFI), or to combine them into a catch-all,
video=vesafb:off,efifb:off. Error messages such as:

    Failed to mmap 0000:00:02.0 BAR <>. Performance may be slow

are a good indicator that such a problem exists. The host files /proc/iomem
and /proc/ioports are often useful for identifying drivers that consume ranges
of the device and cause such conflicts.

Additionally, IGD devices are known to generate small numbers of DMAR faults
when initially assigned. It is believed that this is simply the IGD attempting
to access the reserved GTT space after reset, which it no longer has access to
when accessed from userspace. So long as the DMAR faults are small in number
and, most importantly, not ongoing, these are not an indication of an error.

Additionally++, analog VGA output (as opposed to digital outputs like HDMI,
DVI, or DisplayPort) may be unsupported in some use cases. In the author's
experience, even DP to VGA adapters can be troublesome while adapters between
digital formats work well.


Options
=======
* x-igd-opregion=[*on*|off]
    Copy the host IGD OpRegion and expose it to the guest with fw_cfg

* x-igd-lpc=[on|*off*]
    Create a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)

* x-igd-legacy-mode=[on|off|*auto*]
    Enable/Disable legacy mode

* x-igd-gms=[hex, default 0]
    Override the DSM region size in the GGC register; 0 means use the host
    value. Use this only when the DSM size cannot be changed through the
    'DVMT Pre-Allocated' option in host BIOS.
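
As a rough illustration of how the three x-igd-legacy-mode settings relate to
the legacy-mode conditions listed earlier, the following Python sketch models
only the condition check. It is not QEMU's implementation: IgdConfig,
legacy_conditions_met and resolve_legacy_mode are hypothetical names, and the
"continue on error" behavior of the default mode is not modeled.

```python
from dataclasses import dataclass


@dataclass
class IgdConfig:
    """Hypothetical summary of the facts the legacy-mode check depends on."""
    generation: int   # IGD generation, e.g. 9
    machine: str      # machine type, "i440fx" or "q35"
    guest_bdf: str    # guest PCI address of the IGD, e.g. "00:02.0"
    has_rom: bool     # True if a ROM BAR or romfile is present


def legacy_conditions_met(cfg: IgdConfig) -> bool:
    # The four conditions listed in this document for legacy mode.
    return (6 <= cfg.generation <= 9
            and cfg.machine == "i440fx"
            and cfg.guest_bdf == "00:02.0"
            and cfg.has_rom)


def resolve_legacy_mode(setting: str, cfg: IgdConfig) -> bool:
    """Return True if legacy mode would be used under this simplified model."""
    if setting not in ("on", "off", "auto"):
        raise ValueError("setting must be 'on', 'off' or 'auto'")
    if setting == "off":
        return False
    met = legacy_conditions_met(cfg)
    if setting == "on" and not met:
        # "on" forces legacy mode and fails when the conditions are not met.
        raise ValueError("legacy mode forced but conditions are not met")
    return met
```

Under this model, "auto" on a q35 machine simply yields non-legacy operation,
while "on" with the same configuration is treated as a fatal error.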


Examples
========
* Adding IGD with automatic legacy mode support
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0

* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
  (For UEFI guests)
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-lpc=on,romfile=efi_oprom.rom


Guest firmware
==============
Guest firmware is responsible for setting up the OpRegion and the Base of Data
Stolen Memory (BDSM) in guest address space. IGD passthrough support imposes
two fw_cfg requirements on the VM firmware:

1) "etc/igd-opregion"

   This fw_cfg file exposes the OpRegion for the IGD device. A reserved
   region should be created below 4GB (recommended 4KB alignment), large
   enough to hold the fw_cfg file, and the content of this file copied
   to it. The dword based address of this reserved memory region must also
   be written to the ASLS register at offset 0xFC on the IGD device. It is
   recommended that firmware make use of this fw_cfg entry for any
   PCI class VGA device with Intel vendor ID. Behavior with multiple such
   devices within a VM is undefined.

2) "etc/igd-bdsm-size"

   This fw_cfg file contains an 8-byte, little endian integer indicating
   the size of the reserved memory region required for IGD stolen memory.
   Firmware must allocate a reserved memory region of this size below 4GB,
   with the required 1MB alignment. Additionally the base address of this
   reserved region must be written to the dword BDSM register in PCI config
   space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
   64-bit BDSM). As this support is related to running the IGD ROM, which
   has other dependencies on the device appearing at guest address 00:02.0,
   it's expected that this fw_cfg file is only relevant to a single PCI
   class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.

   Starting from Meteor Lake, IGD devices access stolen memory via the MMIO
   BAR2 (LMEMBAR), and the BDSM register in config space has been removed.
   There is no need for guest firmware to allocate data stolen memory in guest
   address space and write it to the BDSM register. The value of this fw_cfg
   file is 0 in such a case.

Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
However, the support is not accepted by upstream EDK2/OVMF. A recommended
solution is to create a virtual OpRom with the following DXE drivers:

* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (must)
* IntelGopDriver: Closed-source Intel GOP driver
* PlatformGopPolicy: Protocol required by IntelGopDriver

IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on
IGD.

The original IgdAssignmentDxe can be found at [3]. An Intel maintained version
with PlatformGopPolicy for industrial computing is at [4]. There is also an
unofficially maintained version with newer Gen11+ device support at [5].
You need to build them with EDK2.

Intel never released the IntelGopDriver to the public. If you are an Intel
Premier Support customer, you may contact Intel support to get one, as [4]
says, or you can try extracting it from your host firmware using
"UEFI BIOS Updater" [6].
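
The arithmetic behind the two fw_cfg requirements at the start of this section
can be sketched in a few lines of Python. This is only an illustration under
stated assumptions, not firmware code: parse_bdsm_size, reserve_below and
bdsm_register_offset are hypothetical helpers, and the choice of placing the
region as high as possible below a given limit is an assumption made for the
example. The offsets and alignments are the ones stated in this document.

```python
import struct

KIB = 1 << 10
MIB = 1 << 20
FOUR_GIB = 1 << 32

ASLS_OFFSET = 0xFC          # ASLS register offset on the IGD device
OPREGION_ALIGN = 4 * KIB    # recommended OpRegion alignment
BDSM_ALIGN = 1 * MIB        # required stolen memory alignment


def parse_bdsm_size(blob: bytes) -> int:
    """Decode "etc/igd-bdsm-size": an 8-byte little endian integer."""
    if len(blob) != 8:
        raise ValueError("expected exactly 8 bytes")
    (size,) = struct.unpack("<Q", blob)
    return size


def reserve_below(limit: int, size: int, align: int) -> int:
    """Pick the highest aligned base so that [base, base + size) ends at or
    below limit (hypothetical placement policy; align is a power of two)."""
    if limit > FOUR_GIB:
        raise ValueError("region must stay below 4GB")
    if size > limit:
        raise ValueError("region does not fit")
    return (limit - size) & ~(align - 1)


def bdsm_register_offset(generation: int) -> int:
    """BDSM offset in PCI config space: 0x5C, or 0xC0 for Gen 11+ devices."""
    return 0xC0 if generation >= 11 else 0x5C
```

A size of 0 (the Meteor Lake and later case described above) means no stolen
memory region needs to be reserved at all.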

Once you have all the required DXE drivers, an Option ROM can be generated
with the EfiRom utility in EDK2, using
  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
  -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi


Known issues
============
When using OVMF as guest firmware, you may encounter the following warning:
  warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)

Solution:
Set the host physical address bits to the IOMMU address width using
  -cpu host,host-phys-bits-limit=<IOMMU address width>
Or in libvirt XML with
  <cpu>
    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
  </cpu>
The IOMMU address width can be determined with
  echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more
details.


Memory View
===========
IGD has its own address space. To use system RAM as VRAM, a single-level page
table named Global Graphics Translation Table (GTT) is used for the address
translation. Each page table entry points to a 4KB page. The illustration
below shows the translation flow on IGD with 64-bit GTT PTEs.

(PTE_SIZE == 8)               +-------------+---+
                              |   Address   | V | V: Valid Bit
                              +-------------+---+
                              | ...         |   |
IGD:0x01ae9010          0xd740| 0x70ffc000  | 1 |    Mem:0x42ba3e010^
----------------------> 0xd748| 0x42ba3e000 | 1 +------------------>
(addr >> 12) * PTE_SIZE 0xd750| 0x42ba3f000 | 1 |
                              | ...         |   |
                              +-------------+---+
^ The address may be remapped by IOMMU

The memory region that stores the GTT is called GTT Stolen Memory (GSM). It is
located right below the Data Stolen Memory (DSM). Accessing this region
directly is not allowed; any access will immediately freeze the whole system.
The only way to access it is through the second half of MMIO BAR0.
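
The lookup in the GTT illustration above can be reproduced with a small Python
sketch. It is only a model of the arithmetic: the table is represented as a
hypothetical dict from PTE offset to an (address, valid) pair, mirroring the
two columns of the illustration rather than the actual PTE bit layout, and any
IOMMU remapping of the result is ignored.

```python
PAGE_SHIFT = 12                       # each PTE points to a 4KB page
PAGE_MASK = (1 << PAGE_SHIFT) - 1
PTE_SIZE = 8                          # 64-bit GTT PTEs


def pte_offset(igd_addr: int) -> int:
    """Byte offset of the PTE covering igd_addr: (addr >> 12) * PTE_SIZE."""
    return (igd_addr >> PAGE_SHIFT) * PTE_SIZE


def translate(gtt: dict, igd_addr: int) -> int:
    """Translate an IGD address using a {pte_offset: (page_addr, valid)} map."""
    page_addr, valid = gtt[pte_offset(igd_addr)]
    if not valid:
        raise ValueError("PTE is not valid")
    # Keep the offset within the 4KB page, replace the page address.
    return page_addr | (igd_addr & PAGE_MASK)


# The three entries shown in the illustration.
example_gtt = {
    0xd740: (0x70ffc000, 1),
    0xd748: (0x42ba3e000, 1),
    0xd750: (0x42ba3f000, 1),
}

print(hex(pte_offset(0x01ae9010)))              # 0xd748
print(hex(translate(example_gtt, 0x01ae9010)))  # 0x42ba3e010
```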

The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
reserving a contiguous region and programming its base address into the BDSM
register, then letting the VBIOS/GOP driver initialize this region. The
illustration below shows how DSM is mapped.

       IGD Addr Space                 Host Addr Space         Guest Addr Space
       +-------------+                +-------------+         +-------------+
       |             |                |             |         |             |
       |             |                |             |         |             |
       |             |                +-------------+         +-------------+
       |             |                | Data Stolen |         | Data Stolen |
       |             |                |   (Guest)   |         |   (Guest)   |
       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
       |             |  | Passthrough |             |   EPT   |             |   Emulated by QEMU
DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
       |             |  |             |             |         |             |
       |             |  |             |             |         |             |
      0+-------------+--+             |             |         |             |
                        |             +-------------+         |             |
                        |             | Data Stolen |         +-------------+
                        |             |   (Host)    |
                        +------------>+-------------+<--Host BDSM
                            Non-      |             |   "real" one in HW
                         Passthrough  |             |   Programmed by host FW
                                      +-------------+

Footnotes
=========
[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
[2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
    Tianocore bugzilla has been down since Jan 2025 :(
[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
[5] https://github.com/tomitamoeko/VfioIgdPkg
[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357