.. SPDX-License-Identifier: GPL-2.0

======================================================================
Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
======================================================================

:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.

:Author: Perry Yuan <perry.yuan@amd.com>
:Author: Mario Limonciello <mario.limonciello@amd.com>

Overview
--------

AMD heterogeneous core implementations comprise more than one architectural
class, and their CPUs contain cores with different efficiency and power
capabilities: performance-oriented *classic cores* and power-efficient
*dense cores*. Power management strategies must therefore be designed to
accommodate the complexities introduced by combining different core types.
Heterogeneous systems can also contain more than two architectural classes.
The purpose of the scheduling feedback mechanism is to provide information to
the operating system scheduler in real time so that the scheduler can direct
each thread to the optimal core.

The goal of AMD's heterogeneous architecture is to save power by sending
background threads to the dense cores while sending high-priority threads to
the classic cores. From a performance perspective, moving background threads
to the dense cores frees up power headroom and lets the classic cores service
demanding threads optimally. Furthermore, the area-optimized design of the
dense cores allows for a larger number of physical cores, and this higher core
density improves multithreaded performance.

AMD Heterogeneous Core Driver
-----------------------------

The ``amd_hfi`` driver provides the operating system with performance and
energy efficiency capability data for each CPU in the system. The scheduler
can use the ranking data from the HFI driver to make task placement decisions.

Thread Classification and Ranking Table Interaction
---------------------------------------------------

The thread classification selects which entries of the ranking table apply:
the table holds an efficiency ranking and a performance ranking for each
classification.

Threads are classified at runtime into enumerated classes. The classes
represent thread performance and power characteristics that may benefit from
special scheduling behavior. The table below shows an example of thread
classes and where a thread of each class should preferably be scheduled. The
operating system consumes this real-time thread classification and uses it to
inform the scheduler where the thread should be placed.

Thread Classification Example Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+----------+----------------+-------------------------------+---------------------+---------+
| Class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
+==========+================+===============================+=====================+=========+
| 0        | Default        | Performant                    | Highest             |         |
+----------+----------------+-------------------------------+---------------------+---------+
| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
+----------+----------------+-------------------------------+---------------------+---------+
| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
+----------+----------------+-------------------------------+---------------------+---------+

Thread classification is performed by the hardware each time the thread is
switched out. Threads that don't meet any hardware-specified criteria are
classified as "default".
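
The example table can be modelled as a constant lookup table. This is a
minimal sketch only: the names ``hfi_class``, ``hfi_class_info`` and
``hfi_class_lookup`` are hypothetical and do not come from the ``amd_hfi``
driver. The values copy the rows of the example table and, as an assumption
made for the sketch, any class ID outside the table is treated as "default".

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical names; the values mirror the example classification table. */
   enum hfi_class {
       HFI_CLASS_DEFAULT = 0,
       HFI_CLASS_NON_SCALABLE = 1,
       HFI_CLASS_IO_BOUND = 2,
       HFI_CLASS_COUNT
   };

   enum hfi_placement { HFI_PREFER_PERFORMANT, HFI_PREFER_EFFICIENT };
   enum hfi_preempt { HFI_PREEMPT_HIGHEST, HFI_PREEMPT_LOWEST };

   struct hfi_class_info {
       const char *name;
       enum hfi_placement placement;  /* preferred scheduling behavior */
       enum hfi_preempt preempt;      /* preemption priority */
       const char *counter;           /* NULL when the table lists no counter */
   };

   static const struct hfi_class_info hfi_classes[HFI_CLASS_COUNT] = {
       [HFI_CLASS_DEFAULT]      = { "Default",      HFI_PREFER_PERFORMANT, HFI_PREEMPT_HIGHEST, NULL },
       [HFI_CLASS_NON_SCALABLE] = { "Non-scalable", HFI_PREFER_EFFICIENT,  HFI_PREEMPT_LOWEST,  "PMCx1A1" },
       [HFI_CLASS_IO_BOUND]     = { "I/O bound",    HFI_PREFER_EFFICIENT,  HFI_PREEMPT_LOWEST,  "PMCx044" },
   };

   /* Sketch assumption: an unknown class ID falls back to "default". */
   static const struct hfi_class_info *hfi_class_lookup(unsigned int class_id)
   {
       if (class_id >= HFI_CLASS_COUNT)
           class_id = HFI_CLASS_DEFAULT;
       assert(class_id < HFI_CLASS_COUNT);
       return &hfi_classes[class_id];
   }

For instance, ``hfi_class_lookup(2)`` describes an I/O bound thread that
prefers an efficient core and has the lowest preemption priority.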

AMD Hardware Feedback Interface
-------------------------------

The Hardware Feedback Interface provides the operating system with
information about the performance and energy efficiency of each CPU in the
system. Each capability is given as a unit-less quantity in the range
[0-255]. A higher performance value indicates higher performance capability,
and a higher efficiency value indicates higher energy efficiency. Energy
efficiency and performance are reported as separate capabilities in the
shared-memory ranking table.
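
Because both capabilities are unit-less values in [0-255] where higher is
better, each of them defines an ordering of the CPUs. The sketch below
assumes a hypothetical ``struct hfi_caps`` holding one performance and one
efficiency value per CPU (this is not the shared memory layout) and a
hypothetical helper ``hfi_rank_cpus()`` that orders CPU indices from the most
to the least capable for either capability.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical per-CPU capability pair; each value is unit-less, 0..255. */
   struct hfi_caps {
       uint8_t perf;  /* higher = more performance capability */
       uint8_t eff;   /* higher = more energy efficient */
   };

   /*
    * Fill @order with CPU indices 0..ncpus-1 sorted from highest to lowest
    * performance (use_perf != 0) or efficiency capability. Insertion sort is
    * stable, so CPUs with equal values keep ascending index order.
    */
   static void hfi_rank_cpus(const struct hfi_caps *caps, size_t ncpus,
                             int use_perf, size_t *order)
   {
       assert(ncpus == 0 || (caps != NULL && order != NULL));

       for (size_t i = 0; i < ncpus; i++) {
           unsigned int v = use_perf ? caps[i].perf : caps[i].eff;
           size_t j = i;

           /* Shift lower-valued CPUs right to make room for CPU i. */
           while (j > 0) {
               size_t prev = order[j - 1];
               unsigned int pv = use_perf ? caps[prev].perf : caps[prev].eff;

               if (pv >= v)
                   break;
               order[j] = prev;
               j--;
           }
           order[j] = i;
       }
   }

Ranking the same CPUs by performance and by efficiency can give different
orders, which is why the two capabilities are reported separately.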

These capabilities may change at runtime as a result of changes in the
operating conditions of the system or the action of external factors.
Power management firmware is responsible for detecting events that require
a reordering of the performance and efficiency rankings. Table updates happen
relatively infrequently, on a time scale of seconds or longer.

The following events trigger a table update:

* Thermal Stress Events
* Silent Compute
* Extreme Low Battery Scenarios

The kernel or a userspace policy daemon can use these capabilities to modify
task placement decisions. For instance, if the performance or energy
efficiency capability of a given logical processor drops to zero, the
hardware is recommending that the operating system not schedule any tasks on
that processor, for performance or energy efficiency reasons respectively.
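
A placement policy that honors the zero-capability hint could look like the
following minimal sketch. ``struct hfi_caps`` and ``hfi_pick_cpu()`` are
hypothetical: the helper skips every CPU whose capability for the requested
goal is zero and otherwise picks the highest value. What a real policy should
do when every CPU reports zero is outside the scope of the sketch, which
simply returns -1.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical per-CPU capabilities (unit-less, 0..255). */
   struct hfi_caps {
       uint8_t perf;
       uint8_t eff;
   };

   /*
    * Pick the CPU with the highest capability for the requested goal,
    * skipping any CPU whose capability for that goal is 0. Ties go to the
    * lowest CPU index. Returns -1 if every CPU is excluded.
    */
   static int hfi_pick_cpu(const struct hfi_caps *caps, size_t ncpus, int want_perf)
   {
       int best = -1;
       unsigned int best_val = 0;

       assert(ncpus == 0 || caps != NULL);

       for (size_t i = 0; i < ncpus; i++) {
           unsigned int v = want_perf ? caps[i].perf : caps[i].eff;

           if (v == 0)
               continue;  /* hardware recommends not scheduling here */
           if (v > best_val) {
               best = (int)i;
               best_val = v;
           }
       }
       return best;
   }

Note that a CPU excluded for performance (CPU 0 in a table such as
``{ {0, 90}, {180, 0} }``) can still be the preferred choice for energy
efficiency, since the two hints are independent.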

Implementation details for Linux
--------------------------------

Thread scheduling is implemented in the following steps:

1. A thread is spawned and scheduled to the ideal core using the default
   heterogeneous scheduling policy.
2. The processor profiles the thread's execution and assigns it an enumerated
   classification ID. This classification is communicated to the OS through a
   logical-processor-scope MSR.
3. When the thread is context-switched out, the operating system consumes the
   workload (WL) classification from that logical-processor-scope MSR.
4. After consuming the WL classification and before switching in the new
   thread, the OS triggers the hardware to clear its history by writing to an
   MSR.
5. If, given the classification, the ranking table, and processor
   availability, the thread is not on its ideal processor, the OS then
   considers scheduling the thread on its ideal processor (if available).
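
Steps 3 to 5 can be illustrated with a small user-space simulation.
Everything in it is hypothetical: the arrays ``sim_wl_class_msr`` and
``sim_history_resets`` stand in for the logical-processor-scope MSRs, and
``sim_switch_out()`` and ``sim_next_cpu()`` are not kernel functions. The
ideal CPU is passed in by the caller rather than derived from the ranking
table.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /*
    * Hypothetical simulation: these arrays stand in for per-logical-processor
    * MSRs. Real code would read and write the actual hardware registers.
    */
   #define SIM_NR_CPUS 4
   static uint8_t sim_wl_class_msr[SIM_NR_CPUS];        /* classification "MSR" */
   static unsigned int sim_history_resets[SIM_NR_CPUS]; /* history clears seen */

   struct sim_task {
       int cpu;           /* logical CPU the task is running on */
       uint8_t wl_class;  /* last workload classification consumed by the OS */
   };

   /* Steps 3 and 4: on switch-out, consume the class, then clear history. */
   static void sim_switch_out(struct sim_task *t)
   {
       assert(t->cpu >= 0 && t->cpu < SIM_NR_CPUS);
       t->wl_class = sim_wl_class_msr[t->cpu];  /* step 3: read classification */
       sim_history_resets[t->cpu]++;            /* step 4: "write" the reset MSR */
   }

   /*
    * Step 5: @ideal_cpu is the best CPU for t->wl_class according to the
    * ranking table (lookup not shown). Move only if that CPU is available.
    */
   static int sim_next_cpu(const struct sim_task *t, int ideal_cpu, int ideal_available)
   {
       if (t->cpu != ideal_cpu && ideal_available)
           return ideal_cpu;
       return t->cpu;
   }

The order inside ``sim_switch_out()`` follows step 4: the classification is
read before the history is cleared, so the value for the outgoing thread is
not lost.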

Ranking Table
-------------

The ranking table is a shared memory region that is used to communicate the
performance and energy efficiency capabilities of each CPU in the system.

The ranking table contains, for each APIC ID in the system, a performance
ranking and an efficiency ranking for each workload classification.
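
To show how a thread's classification selects rankings, the sketch below
models the table as a two-dimensional array indexed by workload class and
CPU. It is a simplified in-memory model, not the real shared memory format
documented for ``amd_shmem_info`` in the driver: the sizes
``SKETCH_NR_CLASSES`` and ``SKETCH_NR_CPUS`` and the helpers ``rank_get()``
and ``rank_best_cpu()`` are hypothetical, and CPUs use a dense index instead
of raw APIC IDs.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical sizes for the sketch; real sizes come from firmware. */
   #define SKETCH_NR_CLASSES 3
   #define SKETCH_NR_CPUS    4

   struct rank_entry {
       uint8_t perf;  /* performance ranking, 0..255 */
       uint8_t eff;   /* efficiency ranking, 0..255 */
   };

   /* One entry per (workload class, CPU) pair. */
   struct rank_table {
       struct rank_entry e[SKETCH_NR_CLASSES][SKETCH_NR_CPUS];
   };

   static const struct rank_entry *rank_get(const struct rank_table *t,
                                            unsigned int wl_class, unsigned int cpu)
   {
       assert(wl_class < SKETCH_NR_CLASSES && cpu < SKETCH_NR_CPUS);
       return &t->e[wl_class][cpu];
   }

   /* Best CPU for a class: highest perf (want_perf != 0) or eff ranking. */
   static unsigned int rank_best_cpu(const struct rank_table *t,
                                     unsigned int wl_class, int want_perf)
   {
       unsigned int best = 0;

       for (unsigned int cpu = 1; cpu < SKETCH_NR_CPUS; cpu++) {
           const struct rank_entry *c = rank_get(t, wl_class, cpu);
           const struct rank_entry *b = rank_get(t, wl_class, best);

           if ((want_perf ? c->perf : c->eff) > (want_perf ? b->perf : b->eff))
               best = cpu;
       }
       return best;
   }

The same CPU can be the best choice for one class and not for another, so
the rankings have to be looked up for the thread's current class.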

.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
   :doc: amd_shmem_info

Ranking Table update
--------------------

The power management firmware issues a platform interrupt after it has
updated the ranking table and the table is ready for the operating system to
consume. When CPUs receive this interrupt, they read the new ranking table
from the shared memory region provided by the PCCT (Platform Communications
Channel Table), and the ``amd_hfi`` driver then parses the new table to
provide updated data for scheduling decisions.
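
The update path can be sketched as re-reading the shared memory and
rebuilding a per-CPU copy for scheduling decisions to consume. In this
hypothetical model, ``struct shmem_sim`` stands in for the firmware-written
shared memory with one entry per APIC ID slot, ``apic_of_cpu`` maps logical
CPUs to those slots, and ``on_ranking_update()`` represents work that would
be triggered by the platform interrupt. Locking and the real table format are
deliberately omitted.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define SK_NR 4  /* hypothetical number of CPUs / APIC ID slots */

   struct hfi_caps {
       uint8_t perf;  /* performance capability, 0..255 */
       uint8_t eff;   /* energy efficiency capability, 0..255 */
   };

   /* Stand-in for the firmware-written shared memory, one entry per APIC ID slot. */
   struct shmem_sim {
       struct hfi_caps by_apic[SK_NR];
   };

   /* Driver-side copy that scheduling decisions read, indexed by logical CPU. */
   struct driver_state {
       struct hfi_caps by_cpu[SK_NR];
       unsigned long generation;  /* incremented on every processed update */
   };

   /* Hypothetical update handler: rebuild the per-CPU view from shared memory. */
   static void on_ranking_update(struct driver_state *d, const struct shmem_sim *shm,
                                 const unsigned int apic_of_cpu[SK_NR])
   {
       for (unsigned int cpu = 0; cpu < SK_NR; cpu++) {
           unsigned int apic = apic_of_cpu[cpu];

           assert(apic < SK_NR);
           d->by_cpu[cpu] = shm->by_apic[apic];
       }
       d->generation++;
   }

The generation counter lets a consumer notice that the rankings changed since
it last looked, without comparing the whole table.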
134