.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is the by now
well-known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL but which is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.
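
As an illustration only, the family check above can be approximated from
user space. The following Python sketch parses the text of /proc/cpuinfo;
the helper names parse_cpuinfo() and in_documented_families() are
hypothetical and not part of any kernel interface, and the sketch assumes
the usual x86 "vendor_id" and "cpu family" fields are present::

        # Sketch only: these helpers are illustrative, not a kernel interface.
        SRSO_FAMILIES = {0x17, 0x19}    # families listed as affected above

        def parse_cpuinfo(text):
            """Return (vendor_id, cpu family) of the first CPU in cpuinfo text."""
            vendor = family = None
            for line in text.splitlines():
                key, sep, value = line.partition(":")
                if not sep:
                    continue
                key, value = key.strip(), value.strip()
                if key == "vendor_id" and vendor is None:
                    vendor = value
                elif key == "cpu family" and family is None:
                    family = int(value)     # printed in decimal, 25 == 0x19
            return vendor, family

        def in_documented_families(vendor, family):
            """True if the CPU is an AMD part of family 0x17 or 0x19."""
            return vendor == "AuthenticAMD" and family in SRSO_FAMILIES

Feeding the contents of /proc/cpuinfo to parse_cpuinfo() and passing the
result to in_documented_families() gives only a rough hint. What the
kernel itself reports for the running machine should be preferred.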

System information and options
------------------------------

First of all, the latest microcode must be loaded for the mitigations to
be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable; no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied.  User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   The microcode patch with extended IBPB functionality has been applied.
   It does not protect User->Kernel and Guest->Host transitions, but it
   does address the User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect in
   the Spectre v2 mitigation is selected:

    * conditional IBPB:

      where each process can select whether it needs an IBPB issued
      around it (PR_SPEC_DISABLE/_ENABLE etc.), see :doc:`spectre`

    * strict:

      i.e., always on - by supplying spectre_v2_user=on on the kernel
      command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by protecting
   User->Kernel and Guest->Host transitions.

   Selected by default or by spec_rstack_overflow=safe-ret

 * 'Mitigation: IBPB':

   Similar protection to "safe RET" above but employs an IBPB on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario - the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)

 * 'Mitigation: Reduced Speculation':

   This mitigation is enabled automatically when the "IBPB on VMEXIT"
   mitigation above has been selected and the CPU supports the
   BpSpecReduce bit.

   It is enabled automatically on machines which have the
   SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
   to the above =ibpb-vmexit mitigation because the user/kernel boundary is
   not affected anymore and thus "safe RET" is not needed.

   After the IBPB on VMEXIT mitigation option has been enabled, the
   BpSpecReduce bit is detected (the functionality is present on all such
   machines) and that practically overrides IBPB on VMEXIT, as it has a lot
   less performance impact and takes care of the guest->host attack vector
   too.
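
The status strings listed above lend themselves to a simple check from a
script. The following Python sketch reads the sysfs file and sorts its
content into coarse classes based only on the string prefixes shown above;
the function names and the class labels are hypothetical and chosen for
illustration::

        from pathlib import Path

        SRSO_SYSFS = Path(
            "/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")

        def read_status(path=SRSO_SYSFS):
            """Return the status string, or None if the file cannot be read."""
            try:
                return path.read_text().strip()
            except OSError:
                return None

        def classify(status):
            """Map a status string to a coarse, illustrative class label."""
            status = status.strip()
            if status == "Not affected":
                return "not-affected"
            if status.startswith("Vulnerable"):
                return "vulnerable"
            if status.startswith("Mitigation:"):
                return "mitigated"
            return "unknown"

Note that a value such as 'Vulnerable: Safe RET, no microcode' is reported
as "vulnerable" by this sketch even though the kernel itself is protected,
so the full string should still be inspected.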

In order to exploit the vulnerability, an attacker needs to:

 - gain local access on the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on fam 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET' which should take care of most
attack vectors, including the local User->Kernel one.

As always, the user is advised to keep her/his system up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance depending on the workload. If one trusts her/his userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.
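
To see which spec_rstack_overflow= option a machine was booted with, one
can inspect the kernel command line (for example the content of
/proc/cmdline). The following Python sketch extracts the option from such
a string; the function name is hypothetical, and the sketch assumes that
the last occurrence of the option takes effect when it is given more than
once::

        # Option values mentioned in this document.
        DOCUMENTED_VALUES = ("off", "microcode", "safe-ret", "ibpb",
                             "ibpb-vmexit")

        def srso_cmdline_option(cmdline):
            """Return the spec_rstack_overflow= value, or None if absent."""
            value = None
            for token in cmdline.split():
                name, sep, arg = token.partition("=")
                if sep and name == "spec_rstack_overflow":
                    value = arg     # assumption: the last occurrence wins
            return value

A return value of None means that the option was not given, in which case
the default selection described above applies.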

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indirect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation also comes at
a performance cost.

Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

For this mitigation to be safe, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().
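
The split between the two variants can be summarized as a small lookup.
The following Python sketch is purely illustrative: it assumes that Zen1
and Zen2 correspond to family 0x17 and Zen3 and Zen4 to family 0x19, and
the table and function names are hypothetical::

        # (untraining function, safe return function) per CPU family.
        # Assumption: Zen1/Zen2 are family 0x17, Zen3/Zen4 are family 0x19.
        SAFE_RET_FUNCS = {
            0x17: ("srso_untrain_ret", "srso_safe_ret"),
            0x19: ("srso_alias_untrain_ret", "srso_alias_safe_ret"),
        }

        def safe_ret_functions(family):
            """Return the function pair for a family, or None if unknown."""
            return SAFE_RET_FUNCS.get(family)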

Checking the safe RET mitigation actually works
-----------------------------------------------

In case one wants to validate whether the SRSO safe RET mitigation works
on a kernel, one could use two performance counters

* PMC_0xc8 - Count of RET/RET lw retired
* PMC_0xc9 - Count of RET/RET lw retired mispredicted

and compare the number of RETs retired properly vs those retired
mispredicted, in kernel mode. Another way of specifying those events
is::

        # perf list ex_ret_near_ret

        List of pre-defined events (to be used in -e or -M):

        core:
          ex_ret_near_ret
               [Retired Near Returns]
          ex_ret_near_ret_mispred
               [Retired Near Returns Mispredicted]

Either the command using the event mnemonics::

        # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s

or using the raw PMC numbers::

        # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

should report roughly the same count for both events. I.e., every RET
retired should be mispredicted::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   137,167      cpu/event=0xc8,umask=0/k
                   137,173      cpu/event=0xc9,umask=0/k

              10.004110303 seconds time elapsed

               0.000000000 seconds user
               0.004462000 seconds sys

versus the case when the mitigation is disabled (spec_rstack_overflow=off)
or not functioning properly, which usually shows a much smaller number of
mispredicted retired RETs than the overall count of retired RETs during
a workload::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   201,627      cpu/event=0xc8,umask=0/k
                     4,074      cpu/event=0xc9,umask=0/k

              10.003267252 seconds time elapsed

               0.002729000 seconds user
               0.000000000 seconds sys
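
The comparison of the two counters can also be scripted. The following
Python sketch parses perf stat output of the form shown above and computes
the ratio of mispredicted to retired RETs; the function names are
hypothetical, and the sketch assumes the raw event names used in the
examples above::

        import re

        RETIRED = "cpu/event=0xc8,umask=0/k"
        MISPREDICTED = "cpu/event=0xc9,umask=0/k"

        def parse_perf_counts(output):
            """Return {event name: count} for the counter lines in the output."""
            counts = {}
            for line in output.splitlines():
                # Counter lines look like "   137,167      <event name>".
                m = re.match(r"\s*([\d,]+)\s+(\S+)", line)
                if m:
                    counts[m.group(2)] = int(m.group(1).replace(",", ""))
            return counts

        def mispredict_ratio(counts):
            """Return mispredicted retired RETs divided by retired RETs."""
            return counts[MISPREDICTED] / counts[RETIRED]

With the sample outputs above, the ratio is close to 1 when the mitigation
is active and around 0.02 when it is disabled. No exact threshold is
implied by this document, so any cut-off used in a script is a judgment
call.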

Also, there is a selftest which performs this check; go to
tools/testing/selftests/x86/ and do::

        make srso
        ./srso