.. SPDX-License-Identifier: GPL-2.0

======================================================================
Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
======================================================================

:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.

:Author: Perry Yuan <perry.yuan@amd.com>
:Author: Mario Limonciello <mario.limonciello@amd.com>

Overview
--------

AMD Heterogeneous Core implementations consist of more than one
architectural class, with cores of differing efficiency and power
capabilities: performance-oriented *classic cores* and power-efficient
*dense cores*. Power management strategies must therefore accommodate the
complexities introduced by combining different core types. Heterogeneous
systems can also extend to more than two architectural classes. The purpose
of the scheduling feedback mechanism is to provide information to the
operating system scheduler in real time so that the scheduler can direct
threads to the optimal core.

The goal of AMD's heterogeneous architecture is to attain a power benefit by
sending background threads to the dense cores while sending high-priority
threads to the classic cores. From a performance perspective, sending
background threads to dense cores can free up power headroom and allow the
classic cores to optimally service demanding threads. Furthermore, the
area-optimized nature of the dense cores allows for a greater number of
physical cores. This improved core density benefits multithreaded
performance.

AMD Heterogeneous Core Driver
-----------------------------

The ``amd_hfi`` driver provides the operating system with performance and
energy efficiency capability data for each CPU in the system. The scheduler
can use the ranking data from the HFI driver to make task placement
decisions.
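
To illustrate how ranking data could inform placement, the C sketch below
orders CPUs by performance capability for demanding threads and by efficiency
capability for background threads, mirroring the classic-core/dense-core
split described above. The ``struct cpu_caps`` record and the ``rank_cpus()``
helper are hypothetical; they are not the ``amd_hfi`` driver's data
structures or the kernel scheduler's placement logic.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <stdlib.h>

   /* Hypothetical per-CPU record; not the amd_hfi driver's real layout. */
   struct cpu_caps {
       unsigned int cpu;   /* logical CPU number */
       uint8_t perf;       /* performance capability, 0-255 */
       uint8_t eff;        /* energy efficiency capability, 0-255 */
   };

   /* Descending performance; ties go to the lower CPU number. */
   static int cmp_perf_desc(const void *a, const void *b)
   {
       const struct cpu_caps *x = a, *y = b;

       if (x->perf != y->perf)
           return (int)y->perf - (int)x->perf;
       return (x->cpu > y->cpu) - (x->cpu < y->cpu);
   }

   /* Descending energy efficiency; ties go to the lower CPU number. */
   static int cmp_eff_desc(const void *a, const void *b)
   {
       const struct cpu_caps *x = a, *y = b;

       if (x->eff != y->eff)
           return (int)y->eff - (int)x->eff;
       return (x->cpu > y->cpu) - (x->cpu < y->cpu);
   }

   /*
    * Hypothetical helper: sort CPUs into a preference order, by
    * performance for demanding threads (typically classic cores first)
    * or by efficiency for background threads (typically dense cores first).
    */
   static void rank_cpus(struct cpu_caps *caps, size_t n, int demanding)
   {
       qsort(caps, n, sizeof(*caps), demanding ? cmp_perf_desc : cmp_eff_desc);
   }

After sorting, the first entry is the preferred CPU for that kind of thread.
The tie-break on CPU number only keeps the order deterministic; it is an
arbitrary choice made for this sketch.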

Thread Classification and Ranking Table Interaction
----------------------------------------------------

The thread classification is used to index into a ranking table that
describes an efficiency and performance ranking for each classification.

Threads are classified at runtime into enumerated classes. The classes
represent thread performance/power characteristics that may benefit from
special scheduling behaviors. The table below shows an example of thread
classification and, for each thread class, where a thread of that class
should preferably be scheduled. The real-time thread classification is
consumed by the operating system and is used to inform the scheduler of
where the thread should be placed.

Thread Classification Example Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+----------+----------------+-------------------------------+---------------------+---------+
| Class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
+==========+================+===============================+=====================+=========+
| 0        | Default        | Performant                    | Highest             |         |
+----------+----------------+-------------------------------+---------------------+---------+
| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
+----------+----------------+-------------------------------+---------------------+---------+
| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
+----------+----------------+-------------------------------+---------------------+---------+

Thread classification is performed by the hardware each time the thread is
switched out. Threads that don't meet any hardware-specified criteria are
classified as "default".

AMD Hardware Feedback Interface
--------------------------------

The Hardware Feedback Interface provides the operating system with
information about the performance and energy efficiency of each CPU in the
system.
Each capability is given as a unitless quantity in the range [0-255]. A
higher performance value indicates higher performance capability, and a
higher efficiency value indicates greater efficiency. Energy efficiency and
performance are reported as separate capabilities in the shared-memory-based
ranking table.

These capabilities may change at runtime as a result of changes in the
operating conditions of the system or the action of external factors.
Power management firmware is responsible for detecting events that require
a reordering of the performance and efficiency ranking. Table updates happen
relatively infrequently, on a time scale of seconds or more.

The following events trigger a table update:

* Thermal Stress Events
* Silent Compute
* Extremely Low Battery Scenarios

The kernel or a userspace policy daemon can use these capabilities to modify
task placement decisions. For instance, if either the performance or the
energy efficiency capability of a given logical processor becomes zero, the
hardware is recommending that the operating system not schedule any tasks on
that processor, for performance or energy efficiency reasons respectively.

Implementation details for Linux
--------------------------------

The implementation of thread scheduling consists of the following steps:

1. A thread is spawned and scheduled to the ideal core using the default
   heterogeneous scheduling policy.
2. The processor profiles thread execution and assigns an enumerated
   classification ID. This classification is communicated to the OS via a
   logical-processor-scope MSR.
3. When the thread is context-switched out, the operating system consumes
   the workload (WL) classification, which resides in a
   logical-processor-scope MSR.
4. After consuming the WL classification and before switching in the new
   thread, the OS triggers the hardware to clear its history by writing to
   an MSR.
5. If, due to the classification, ranking table, and processor availability,
   the thread is not on its ideal processor, the OS then considers
   scheduling the thread on its ideal processor (if available).

Ranking Table
-------------

The ranking table is a shared memory region that is used to communicate the
performance and energy efficiency capabilities of each CPU in the system.

The ranking table contains rankings for each APIC ID in the system, with
both a performance and an efficiency ranking for each workload
classification.

.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
   :doc: amd_shmem_info

Ranking Table update
--------------------

The power management firmware issues a platform interrupt once it has
updated the ranking table and the table is ready for the operating system to
consume. CPUs receive this interrupt and read the new ranking table from the
shared memory region provided by the PCCT (Platform Communications Channel
Table); the ``amd_hfi`` driver then parses the new table to provide updated
data for scheduling decisions.
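
Once the table has been parsed, a consumer could look up the entries for a
thread's class and pick a CPU. The C sketch below assumes a hypothetical
in-memory copy of the table, ``struct hfi_table``, indexed by class ID and
CPU, and a hypothetical ``hfi_pick_cpu()`` helper; the real layout is the one
documented for ``amd_shmem_info`` above, and none of these names come from
the ``amd_hfi`` driver. Following the example classification table, class 0
prefers performance while classes 1 and 2 prefer efficiency, and any CPU
whose relevant capability is zero is skipped.

.. code-block:: c

   #include <stdint.h>

   #define NR_CLASSES 3    /* class IDs 0-2 from the example table */
   #define NR_CPUS    4    /* hypothetical CPU count for illustration */

   /* Hypothetical parsed form of one ranking-table entry. */
   struct hfi_rank {
       uint8_t perf;       /* performance capability, 0-255 */
       uint8_t eff;        /* energy efficiency capability, 0-255 */
   };

   /* Hypothetical local copy, refreshed after each update interrupt. */
   struct hfi_table {
       struct hfi_rank rank[NR_CLASSES][NR_CPUS];
   };

   /*
    * Pick a CPU for a thread of class_id: class 0 (Default) prefers
    * performance, classes 1 and 2 prefer efficiency. A zero capability
    * never wins, because best_val starts at 0 and only strictly larger
    * values are accepted. Returns -1 for an invalid class or when every
    * CPU reports zero for the preferred capability.
    */
   static int hfi_pick_cpu(const struct hfi_table *t, int class_id)
   {
       int best = -1;
       unsigned int best_val = 0;

       if (class_id < 0 || class_id >= NR_CLASSES)
           return -1;

       for (int cpu = 0; cpu < NR_CPUS; cpu++) {
           const struct hfi_rank *r = &t->rank[class_id][cpu];
           unsigned int val = (class_id == 0) ? r->perf : r->eff;

           if (val > best_val) {
               best_val = val;
               best = cpu;
           }
       }
       return best;
   }

Skipping zero entries reflects the hardware recommendation described in the
Hardware Feedback Interface section; a real placement decision would also
have to account for processor availability, as step 5 of the implementation
outline notes.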