XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in the device tree
property ``ibm,arch-vec-5-platform-support`` in byte 23 and the OS
Selection for XIVE is indicated in the ``ibm,architecture-vec-5``
property byte 23.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual`` which is the
default mode. ``dual`` means that both modes XICS **and** XIVE are
supported and if the guest OS supports XIVE, this mode will be
selected.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM will advertise the XIVE capability to QEMU or not.

Nevertheless, the available interrupt modes in the machine should not
depend on the XIVE KVM capability of the host.
On older kernels
without XIVE KVM support, QEMU will use the emulated XIVE device as a
fallback and on newer kernels (>=5.2), the KVM XIVE device.

XIVE native exploitation mode is not supported for KVM nested guests,
VMs running under an L1 hypervisor (KVM on pSeries). In that case, the
hypervisor will not advertise the KVM capability and QEMU will use the
emulated XIVE device, same as for older versions of KVM.

As a final refinement, the user can also switch the use of the KVM
device with the machine option ``kernel_irqchip``.


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                          kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE KVM       XIVE emul.     XIVE KVM
xive            XIVE KVM       XIVE emul.     XIVE KVM
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                          kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XICS KVM       XICS emul.     XICS KVM
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                          kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE emul.(1)  XIVE emul.     QEMU error (2)
xive            XIVE emul.(1)  XIVE emul.     QEMU error (2)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================


(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``
    In some cases (old host kernels or KVM nested guests), one may hit a
    QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``
(2) QEMU fails with ``kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``


For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                          kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  QEMU error(4)  XICS emul.     QEMU error(4)
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``
(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``


XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the same
for both interrupt modes. The different ranges are defined as follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

    (qemu) info pic
    CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
    CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
    CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
    CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
    CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
    ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

    ...
    LISN         PQ    EISN     CPU/PRIO EQ
    00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
    00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
    00000004 MSI -Q  M 00000000
    00000005 MSI -Q  M 00000000
    00000006 MSI -Q  M 00000000
    00000007 MSI -Q  M 00000000
    00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00001101 MSI -Q  M 00000000
    00001200 LSI -Q  M 00000000
    00001201 LSI -Q  M 00000000
    00001202 LSI -Q  M 00000000
    00001203 LSI -Q  M 00000000
    00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
    00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
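
As an illustration of how the routing columns described above fit
together, here is a minimal Python sketch that decodes one routing line
of the ``info pic`` output. It is not part of QEMU. The names
``parse_route``, ``classify_lisn``, ``IRQ_RANGES``, ``PQ_STATES`` and
``ROUTE_RE`` are hypothetical, and the regular expression is modelled
only on the sample output shown in this document, so the exact column
layout of other QEMU versions is an assumption.

```python
import re

# IRQ number ranges, copied from the "IRQ number space" section.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPI"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1002, 0x10FF, "unused"),
    (0x1100, 0x11FF, "VIO device"),
    (0x1200, 0x127F, "PHB LSI"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSI"),
]

# Meaning of the PQ bits, as listed for the PQ column.
PQ_STATES = {"--": "ready", "P-": "pending", "PQ": "queued", "-Q": "off"}

# Assumed line layout: LISN, type, PQ, optional M, EISN, then (for
# configured sources only) CPU/PRIO, EQ index/size, @address, ^toggle.
ROUTE_RE = re.compile(
    r"\s*(?P<lisn>[0-9a-fA-F]{8})\s+(?P<type>MSI|LSI)\s+(?P<pq>[-P][-Q])"
    r"(?P<masked>\s+M)?\s+(?P<eisn>[0-9a-fA-F]{8})"
    r"(?:\s+(?P<cpu>\d+)/(?P<prio>\d+)"
    r"\s+(?P<index>\d+)/(?P<entries>\d+)"
    r"\s+@(?P<addr>[0-9a-fA-F]+)\s+\^(?P<toggle>[01]))?"
)


def classify_lisn(lisn):
    """Return the name of the IRQ range that contains this LISN."""
    for first, last, name in IRQ_RANGES:
        if first <= lisn <= last:
            return name
    raise ValueError("LISN 0x%x is outside the 8K IRQ number space" % lisn)


def parse_route(line):
    """Decode one routing line; return None for headers and other text."""
    m = ROUTE_RE.match(line)
    if m is None:
        return None
    lisn = int(m.group("lisn"), 16)
    routed = m.group("cpu") is not None
    return {
        "lisn": lisn,
        "range": classify_lisn(lisn),
        "type": m.group("type"),
        "state": PQ_STATES[m.group("pq")],
        "masked": m.group("masked") is not None,
        "eisn": int(m.group("eisn"), 16),
        # The targeting fields are absent on masked, unconfigured sources.
        "cpu": int(m.group("cpu")) if routed else None,
        "prio": int(m.group("prio")) if routed else None,
        "eq_index": int(m.group("index")) if routed else None,
        "eq_entries": int(m.group("entries")) if routed else None,
        "eq_addr": int(m.group("addr"), 16) if routed else None,
        "toggle": int(m.group("toggle")) if routed else None,
    }
```

Feeding the sketch the sample line for LISN ``00001300`` would yield a
source in the PHB MSI range, ready to take events, served by CPU 1 at
priority 6. Header lines and the ``...`` separators return ``None`` and
can simply be skipped.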