.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is by now the
well-known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL but which is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture,
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.

System information and options
------------------------------

First of all, the latest microcode must be loaded for the mitigations
to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable and no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied. User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   The extended IBPB functionality microcode patch has been applied. It
   does not protect User->Kernel and Guest->Host transitions, but it
   does address the User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect in
   the Spectre v2 mitigation is selected:

   * conditional IBPB:

     where each process can select whether it needs an IBPB issued
     around it via PR_SPEC_DISABLE/_ENABLE etc., see :doc:`spectre`

   * strict:

     i.e., always on - by supplying spectre_v2_user=on on the kernel
     command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by protecting
   User->Kernel and Guest->Host transitions.

   Selected by default or by spec_rstack_overflow=safe-ret

 * 'Mitigation: IBPB':

   Similar protection to "safe RET" above, but employs an IBPB barrier on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario - the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)

 * 'Mitigation: Reduced Speculation':

   This mitigation is enabled automatically when the "IBPB on VMEXIT"
   mitigation above has been selected and the CPU supports the
   BpSpecReduce bit.

   It is also enabled automatically on machines which have the
   SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the kernel switches to
   the =ibpb-vmexit mitigation above because the user/kernel boundary is
   no longer affected and thus "safe RET" is not needed.

   Once the IBPB on VMEXIT mitigation option is selected, the kernel
   detects the BpSpecReduce bit (functionality present on all such
   machines), and this effectively overrides IBPB on VMEXIT because it
   has much lower performance impact and also takes care of the
   Guest->Host attack vector.

To exploit the vulnerability, an attacker needs to:

 - gain local access to the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on family
   0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET', which should take care of most
attack vectors, including the local User->Kernel one.

As always, the user is advised to keep her/his system up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance, depending on the workload. If one trusts her/his userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type, employing
an indirect branch prediction barrier after the required microcode patch
has been applied to one's system. This mitigation also comes at
a performance cost.
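
The status strings listed above can also be checked from a script, for
example when auditing a fleet of machines. The sketch below is only an
illustration: the function name srso_status_class and the grouping of
the two "Vulnerable: ..." states that carry one half of the protection
into a "partial" class are assumptions made here, not something the
kernel defines. It simply matches on the prefixes of the documented
strings::

        #!/bin/sh
        # Hypothetical helper (not part of the kernel): map the contents of
        # /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow to a
        # coarse class. The "partial" grouping is an illustrative assumption.
        srso_status_class() {
            case "$1" in
            "Not affected")
                echo "not-affected" ;;
            "Mitigation: "*)
                echo "mitigated" ;;
            "Vulnerable: Safe RET, no microcode"|"Vulnerable: Microcode, no safe RET")
                echo "partial" ;;
            "Vulnerable"*)
                echo "vulnerable" ;;
            *)
                echo "unknown" ;;
            esac
        }

        # Only read the live status if the file is readable; it may be
        # absent, e.g. on kernels that predate SRSO reporting.
        f=/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
        if [ -r "$f" ]; then
            status=$(cat "$f")
            printf '%s -> %s\n' "$status" "$(srso_status_class "$status")"
        fi

Treating the two partial states separately is a judgment call; a script
that only cares about full protection could just as well report them as
vulnerable.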

Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

For this mitigation to be safe, the kernel must ensure that the safe
return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret(), which evicts a potentially poisoned
BTB entry and uses the safe one for all function returns.

In the older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Checking the safe RET mitigation actually works
-----------------------------------------------

In case one wants to validate whether the SRSO safe RET mitigation works
on a kernel, one could use two performance counters:

* PMC_0xc8 - Count of RET/RET lw retired
* PMC_0xc9 - Count of RET/RET lw retired mispredicted

and compare, in kernel mode, the number of RETs retired with the number
of those that were retired mispredicted. The same events are also
available under symbolic perf event names::

        # perf list ex_ret_near_ret

        List of pre-defined events (to be used in -e or -M):

        core:
          ex_ret_near_ret
               [Retired Near Returns]
          ex_ret_near_ret_mispred
               [Retired Near Returns Mispredicted]

Either the command using the event mnemonics::

        # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s

or the one using the raw PMC numbers::

        # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

should show (nearly) equal counts for the two events.
I.e., every retired RET should be mispredicted::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   137,167      cpu/event=0xc8,umask=0/k
                   137,173      cpu/event=0xc9,umask=0/k

              10.004110303 seconds time elapsed

               0.000000000 seconds user
               0.004462000 seconds sys

Compare this with the case where the mitigation is disabled
(spec_rstack_overflow=off) or not functioning properly, which usually
shows a much smaller number of mispredicted retired RETs compared to the
overall count of retired RETs during a workload::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   201,627      cpu/event=0xc8,umask=0/k
                     4,074      cpu/event=0xc9,umask=0/k

              10.003267252 seconds time elapsed

               0.002729000 seconds user
               0.000000000 seconds sys

Also, there is a selftest which performs the above; go to
tools/testing/selftests/x86/ and do::

        make srso
        ./srso
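
The comparison described in this section comes down to the ratio of
mispredicted to retired RETs: close to 1 with safe RET active, far below
1 otherwise. A minimal sketch of that check follows. The function name
srso_check_counts and the 0.9 cut-off are assumptions for illustration
only (the text above just says the two counts should match), and the
counts still have to be copied by hand from a perf stat run::

        #!/bin/sh
        # Hypothetical helper (not part of the kernel or perf): compare the
        # kernel-mode counts of retired RETs (event 0xc8) and mispredicted
        # retired RETs (event 0xc9). The 0.9 cut-off is an assumption.
        srso_check_counts() {
            retired=$(printf '%s' "$1" | tr -d ',')   # accept "137,167"
            mispred=$(printf '%s' "$2" | tr -d ',')
            # awk does the floating-point division that sh cannot.
            awk -v r="$retired" -v m="$mispred" 'BEGIN {
                if (r + 0 <= 0) { print "no-data"; exit }
                ratio = m / r
                printf "%.3f %s\n", ratio,
                    (ratio >= 0.9 ? "looks-mitigated" : "not-mitigated")
            }'
        }

        # Counts taken from the two perf runs shown above:
        srso_check_counts 137,167 137,173   # prints "1.000 looks-mitigated"
        srso_check_counts 201,627 4,074     # prints "0.020 not-mitigated"

A fixed threshold is crude. Short or idle workloads retire few RETs, so a
longer or more kernel-heavy run gives a more meaningful ratio.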