L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set, the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by the end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
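
The sysfs file can also be evaluated from a script. The following Python
sketch is illustrative only; the class and function names are hypothetical.
It matches the value strings listed in the L1TF system information section
by substring, on the assumption that the exact layout of the line
(separators, ordering) may differ between kernel versions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Path documented in the L1TF system information section.
L1TF_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


@dataclass
class L1tfStatus:
    affected: bool
    pte_inversion: bool
    smt: Optional[str]        # "enabled", "disabled" or None if not reported
    l1d_flush: Optional[str]  # "conditional", "always", "disabled" or None


def classify_l1tf(text: str) -> L1tfStatus:
    """Classify one line read from the l1tf sysfs file (hypothetical helper)."""
    s = text.strip()

    smt = None
    if "SMT vulnerable" in s:
        smt = "enabled"
    elif "SMT disabled" in s:
        smt = "disabled"

    flush = None
    # Test the longer string first: "cache flushes" is also contained
    # in "conditional cache flushes".
    if "conditional cache flushes" in s:
        flush = "conditional"
    elif "cache flushes" in s:
        flush = "always"
    elif "L1D vulnerable" in s:
        flush = "disabled"

    return L1tfStatus(
        affected=not s.startswith("Not affected"),
        pte_inversion="PTE Inversion" in s,
        smt=smt,
        l1d_flush=flush,
    )


def read_l1tf_status(path: Path = L1TF_SYSFS) -> L1tfStatus:
    """Read and classify the sysfs file on the running system."""
    return classify_l1tf(path.read_text())
```

On a running system, read_l1tf_status() classifies the live file, while
classify_l1tf() accepts any captured line, which keeps the matching logic
testable without access to sysfs.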

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE were still present and accessible.

While this is a purely speculative mechanism and the instruction will raise
a page fault when it eventually retires, the mere act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks to unprivileged malicious code,
similar to the Meltdown attack.
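
The condition can be made concrete by decoding a page table entry. The
Python sketch below is a simplified model, not kernel code, and its function
names are hypothetical. It assumes x86-64 PTEs mapping 4 KiB pages, with the
Present flag in bit 0 and the page address in bits 12 up to MAXPHYADDR - 1.
The default of 46 physical address bits is an assumed value (the actual
width is CPU specific and shown in the "address sizes" line of
/proc/cpuinfo), and the reserved-bit case is not modelled.

```python
from typing import Optional

PTE_PRESENT = 1 << 0  # x86 "P" flag in bit 0
PAGE_SHIFT = 12       # 4 KiB pages


def pte_address_bits(pte: int, max_phys_bits: int = 46) -> int:
    """Return the page address held in bits 12 .. max_phys_bits - 1."""
    mask = (1 << max_phys_bits) - (1 << PAGE_SHIFT)
    return pte & mask


def architectural_translation(pte: int, max_phys_bits: int = 46) -> Optional[int]:
    """Architecturally, a non-present PTE translates to nothing: the access
    raises a page fault and the address bits are ignored."""
    if not pte & PTE_PRESENT:
        return None
    return pte_address_bits(pte, max_phys_bits)


def l1tf_speculative_target(pte: int, max_phys_bits: int = 46) -> int:
    """Model of an affected CPU: the speculative load uses the address bits
    regardless of the Present flag; data is forwarded only if that physical
    line currently sits in the L1D."""
    return pte_address_bits(pte, max_phys_bits)
```

For a PTE whose Present bit has been cleared but whose address bits still
hold a page address, architectural_translation() returns None while
l1tf_speculative_target() still yields that address. The PTE inversion
mitigation described under Attack scenarios addresses this by making sure
such address bits never point to cacheable physical memory.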

While Meltdown breaks the user space to kernel space protection, L1TF
allows an attacker to target any physical memory address in the system, and
the attack works across all protection domains. It allows attacks on SGX and
also works from inside virtual machines because the speculation bypasses
the extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating systems store arbitrary information in the address bits of a
   PTE which is marked not present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs, which are not
   marked present, never point to cacheable physical memory space.

   A system with an up-to-date kernel is protected against attacks from
   malicious user space applications.

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protections allows malicious guest
   OSes, which can control the PTEs directly, and malicious guest user
   space applications, which run on an unprotected guest kernel lacking the
   PTE inversion mitigation for L1TF, to attack physical host memory.

   A special aspect of L1TF in the context of virtualization is simultaneous
   multithreading (SMT). The Intel implementation of SMT is called
   Hyper-Threading. The fact that hyperthreads on the affected processors
   share the L1 Data Cache (L1D) is important for this. As the flaw only
   allows attacking data which is present in the L1D, a malicious guest
   running on one hyperthread can attack the data which is brought into the
   L1D by the context which runs on the sibling hyperthread of the same
   physical core. This context can be the host OS, host user space or a
   different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly.
   The kernel provides several mechanisms which can be used to address the
   problem depending on the deployment scenario. The mitigations, their
   protection scope and impact are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

   ===========================  ===============================
   'Not affected'               The processor is not vulnerable
   'Mitigation: PTE Inversion'  The host protection is active
   ===========================  ===============================

If KVM/VMX is enabled and the processor is vulnerable, then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting grade of protection is discussed in the following sections.


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D,
   the hypervisor flushes the L1D before entering the guest.

   Flushing the L1D not only evicts the data which should not be accessed
   by a potentially malicious guest, it also flushes the guest data.
   Flushing the L1D has a performance impact as the processor has to bring
   the flushed guest data back into the L1D. Depending on the frequency of
   VMEXIT/VMENTER and the type of computations in the guest, performance
   degradation in the range of 1% to 50% has been observed. For scenarios
   where guest VMEXIT/VMENTER are rare, the performance impact is minimal.
   Virtio and mechanisms like posted interrupts are designed to confine the
   VMEXITs to a bare minimum, but specific configurations and application
   scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:

   - conditional ('cond')
   - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the address
   space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode.
   The overhead cannot be quantified precisely as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.

   **Note** that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring its data back into the L1D, which makes it
   attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to use exclusive cpusets to ensure that no other guest or
   host tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core, then they can only attack their own memory and
   restricted parts of the host memory.
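
Before placing a guest into an exclusive cpuset, it can help to verify that
the chosen logical CPUs cover complete physical cores, so that no sibling
thread is left to other guests or host tasks. The Python sketch below is
illustrative and its helper names are hypothetical. It reads the per-CPU
topology/thread_siblings_list files in sysfs and parses only the plain CPU
list format (e.g. 0-3,8); it complements, and does not replace, the
exclusive cpuset itself.

```python
from pathlib import Path
from typing import Dict, FrozenSet, Iterable, Set

CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpulist(text: str) -> Set[int]:
    """Parse a plain kernel CPU list such as "0-3,8,10-11"."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_sibling_map(root: Path = CPU_SYSFS) -> Dict[int, FrozenSet[int]]:
    """Map each CPU with a topology directory to the SMT siblings of its core."""
    siblings: Dict[int, FrozenSet[int]] = {}
    for path in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[len("cpu"):])
        siblings[cpu] = frozenset(parse_cpulist(path.read_text()))
    return siblings


def partially_used_cores(
    guest_cpus: Iterable[int], siblings: Dict[int, FrozenSet[int]]
) -> Set[FrozenSet[int]]:
    """Return the physical cores of which the guest gets only some threads.

    An empty result means that every selected core is used completely,
    so no sibling thread is left for other guests or host tasks.
    """
    guest = set(guest_cpus)
    partial: Set[FrozenSet[int]] = set()
    for cpu in guest:
        core = siblings.get(cpu, frozenset({cpu}))
        if not core <= guest:
            partial.add(core)
    return partial
```

For example, partially_used_cores(parse_cpulist("0-1"), read_sibling_map())
lists the cores whose other threads would remain available to the host or
to other guests if only CPUs 0 and 1 were assigned to the guest.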

   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on the context
   which the host OS executes, i.e. interrupts, soft interrupts and kernel
   threads. The amount of valuable data from these contexts cannot be
   declared as non-interesting for an attacker without deep inspection of
   the code.

   **Note** that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single or to a group
   of cores, consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst

.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per CPU
   interrupts, e.g. the local timer interrupt.
   Aside from that, multi-queue devices affine their interrupts to single
   CPUs or groups of CPUs per queue without allowing the administrator to
   control the affinities.

   Moving the interrupts which can be affinity controlled away from CPUs
   which run untrusted guests reduces the attack vector space.

   Whether the interrupts which are affine to CPUs running untrusted
   guests provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

   ===========  ==========================================================
   nosmt        Affects the bring-up of the secondary CPUs during boot. The
                kernel tries to bring all present CPUs online during the
                boot process. "nosmt" makes sure that from each physical
                core only one thread, the so-called primary (hyper) thread,
                is activated. Due to a design flaw of Intel processors
                related to Machine Check Exceptions, the non-primary
                siblings have to be brought up at least partially and are
                then shut down again. "nosmt" can be undone via the sysfs
                interface.

   nosmt=force  Has the same effect as "nosmt", but it does not allow
                undoing the SMT disable via the sysfs interface.
   ===========  ==========================================================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows reading out the SMT control state and provides the
     ability to disable or (re)enable SMT. The possible states are:

     ==============  ===================================================
     on              SMT is supported by the CPU and enabled. All
                     logical CPUs can be onlined and offlined without
                     restrictions.

     off             SMT is supported by the CPU and disabled. Only
                     the so-called primary SMT threads can be onlined
                     and offlined without restrictions. An attempt to
                     online a non-primary sibling is rejected.

     forceoff        Same as 'off', but the state cannot be controlled.
                     Attempts to write to the control file are rejected.

     notsupported    The processor does not support SMT. It is therefore
                     not affected by the SMT implications of L1TF.
                     Attempts to write to the control file are rejected.
     ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.

5. Disabling EPT
^^^^^^^^^^^^^^^^

   Disabling EPT for virtual machines provides full mitigation for L1TF even
   with SMT enabled, because the effective page tables for guests are
   managed and sanitized by the hypervisor. However, disabling EPT has a
   significant performance impact, especially when the Meltdown mitigation
   KPTI is enabled.

   EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option
                (i.e. sysfs control of SMT is disabled).

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings. It also drops the swap size and available RAM
                limit restrictions on both hypervisor and bare metal.

  ============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
---------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This can still leak host memory
                which allows, e.g., determining the host's address space
                layout.

  never         Disables the mitigation.
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the module, and can be modified at runtime via the
sysfs file:

/sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'.
If 'l1tf=full,force' is given on the kernel command line, then 'always'
is enforced, the kvm-intel.vmentry_l1d_flush module parameter is ignored
and writes to the sysfs file are rejected.

.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The system is protected by the kernel unconditionally and no further
   action is required.

2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   If the guest comes from a trusted source and the guest OS kernel is
   guaranteed to have the L1TF mitigations in place, the system is fully
   protected against L1TF and no further action is required.

   To avoid the overhead of the default L1D flushing on VMENTER, the
   administrator can disable the flushing via the kernel command line and
   sysfs control files. See :ref:`mitigation_control_command_line` and
   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

   If SMT is not supported by the processor or disabled in the BIOS or by
   the kernel, only L1D flushing on VMENTER needs to be enforced.

   Conditional L1D flushing is the default behaviour and can be tuned. See
   :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

   If EPT is not supported by the processor or disabled in the hypervisor,
   the system is fully protected. SMT can stay enabled and L1D flushing on
   VMENTER is not required.

   EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

   If SMT and EPT are supported and active, then various degrees of
   mitigations can be employed:

   - L1D flushing on VMENTER:

     L1D flushing on VMENTER is the minimal protection requirement, but it
     is only effective in combination with other mitigation methods.

     Conditional L1D flushing is the default behaviour and can be tuned. See
     :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

  - Guest confinement:

    Confinement of guests to a single physical core or a group of physical
    cores which are not running any other processes can reduce the attack
    surface significantly, but interrupts, soft interrupts and kernel
    threads can still expose valuable data to a potential attacker. See
    :ref:`guest_confinement`.

  - Interrupt isolation:

    Isolating the guest CPUs from interrupts can reduce the attack surface
    further, but still allows a malicious guest to explore a limited amount
    of host physical memory. This can at least be used to gain knowledge
    about the host address space layout. Depending on the scenario, the
    interrupts which have a fixed affinity to the CPUs which run the
    untrusted guests can still trigger soft interrupts and schedule kernel
    threads which might expose valuable information. See
    :ref:`interrupt_isolation`.

The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:

  - Disabling SMT:

    Disabling SMT and enforcing the L1D flushing provides the maximum
    amount of protection. This mitigation does not depend on any of the
    above mitigation methods.

    SMT control and L1D flushing can be tuned by the command line
    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
    time with the matching sysfs control files. See :ref:`smt_control`,
    :ref:`mitigation_control_command_line` and
    :ref:`mitigation_control_kvm`.

  - Disabling EPT:

    Disabling EPT provides the maximum amount of protection as well. It
    does not depend on any of the above mitigation methods. SMT can stay
    enabled and L1D flushing is not required, but the performance impact is
    significant.

    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
    parameter.

3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor, it will:

 - Flush the L1D cache on every switch from the nested hypervisor to the
   nested virtual machine, so that the nested hypervisor's secrets are not
   exposed to the nested virtual machine;

 - Flush the L1D cache on every switch from the nested virtual machine to
   the nested hypervisor; this is a complex operation, and flushing the L1D
   cache prevents the bare metal hypervisor's secrets from being exposed to
   the nested virtual machine;

 - Instruct the nested hypervisor not to perform any L1D cache flush. This
   is an optimization to avoid double L1D flushing.


.. _default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - PTE inversion to protect against malicious user space. This is done
    unconditionally and cannot be controlled. The swap storage is limited
    to ~16TB.

  - L1D conditional flushing on VMENTER when EPT is enabled for
    a guest.

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted guests with EPT enabled.
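
  The exposed case described above can be checked from user space. The
  following Python sketch is a hedged illustration, not a kernel tool: it
  assumes the sysfs attributes /sys/devices/system/cpu/smt/active (``0`` or
  ``1``), /sys/module/kvm_intel/parameters/ept (``Y`` or ``N``) and
  /sys/module/kvm_intel/parameters/vmentry_l1d_flush, while the helpers
  ``read_attr()`` and ``classify()`` and their result strings are
  hypothetical. It does not consult the L1TF vulnerability file, so it is
  only meaningful on processors which that file reports as affected, and
  anything it cannot read is reported as unknown rather than as safe.

  .. code-block:: python

      from pathlib import Path
      from typing import Optional

      # Assumed sysfs attribute paths; helper names below are hypothetical.
      SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")
      KVM_EPT = Path("/sys/module/kvm_intel/parameters/ept")
      L1D_FLUSH = Path("/sys/module/kvm_intel/parameters/vmentry_l1d_flush")


      def read_attr(path: Path) -> Optional[str]:
          """Return the stripped attribute content, or None if unreadable."""
          try:
              return path.read_text().strip()
          except OSError:
              return None


      def classify(smt_active: Optional[str], ept: Optional[str],
                   l1d_flush: Optional[str]) -> str:
          """Hypothetical helper: classify a KVM host for untrusted guests."""
          if ept is None:
              # kvm_intel is not loaded, or its parameters are unreadable.
              return "unknown: kvm_intel parameters not readable"
          if ept == "N":
              # Case 3.2: EPT disabled, SMT may stay on, no L1D flush needed.
              return "protected: EPT disabled"
          if smt_active is None:
              return "unknown: SMT state not available"
          if smt_active == "0":
              # Case 3.1: only L1D flushing on VMENTER has to be enforced.
              if l1d_flush in ("cond", "always"):
                  return f"mitigated: SMT inactive, L1D flush {l1d_flush!r}"
              if l1d_flush == "never":
                  return "at risk: SMT inactive but L1D flush disabled"
              return f"unknown: L1D flush state {l1d_flush!r}"
          # Case 3.3: SMT active with EPT enabled, not covered by defaults.
          return "at risk: SMT active with EPT enabled"


      if __name__ == "__main__":
          print(classify(read_attr(SMT_ACTIVE), read_attr(KVM_EPT),
                         read_attr(L1D_FLUSH)))

  Guest confinement and interrupt isolation cannot be detected this way, so
  an "at risk" result only means that case 3.3 of
  :ref:`mitigation_selection` applies and needs the analysis described
  there.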

  The rationale for not disabling SMT by default is:

  - Force disabling SMT can break existing setups, especially with
    unattended updates.

  - If regular users run untrusted guests on their machine, then L1TF is
    just an add-on to other malware which might be embedded in an untrusted
    guest, e.g. spam-bots or attacks on the local network.

    There is no technical way to prevent a user from blindly running
    untrusted code on their machines.

  - It's technically extremely unlikely and, from today's knowledge, even
    impossible that L1TF can be exploited via the most popular attack
    mechanisms like JavaScript because these mechanisms have no way to
    control PTEs. If this were possible and no other mitigation were
    possible, then the default might be different.

  - The administrators of cloud and hosting setups have to carefully
    analyze the risk for their scenarios and make the appropriate
    mitigation choices, which might even vary across their deployed
    machines and also result in other changes of their overall setup.
    There is no way for the kernel to provide a sensible default for this
    kind of scenario.
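
  For the per-deployment analysis described in the last point, the cases of
  :ref:`mitigation_selection` can be condensed into a small lookup. The
  Python sketch below is hypothetical: the ``Host`` fields and
  ``l1tf_options()`` are not a kernel or KVM interface, and the helper only
  restates the guide, so it cannot replace the risk assessment itself.

  .. code-block:: python

      from dataclasses import dataclass
      from typing import List


      # Hypothetical host description; not a kernel or KVM interface.
      @dataclass(frozen=True)
      class Host:
          virtualization: bool    # any guests are run at all
          untrusted_guests: bool  # guests not known to carry L1TF mitigations
          smt_active: bool        # SMT supported and enabled
          ept_enabled: bool       # EPT supported and enabled in KVM


      def l1tf_options(host: Host) -> List[str]:
          """Restate the mitigation selection guide for one host (sketch)."""
          if not host.virtualization:
              # Case 1: the kernel protects the system unconditionally.
              return ["no action required"]
          if not host.untrusted_guests:
              # Case 2: protected; the default VMENTER flush may be dropped.
              return ["no action required",
                      "optional: disable the VMENTER L1D flush "
                      "(kvm-intel.vmentry_l1d_flush=never)"]
          if not host.ept_enabled:
              # Case 3.2: fully protected, SMT can stay enabled.
              return ["no action required"]
          if not host.smt_active:
              # Case 3.1: only L1D flushing on VMENTER has to be enforced.
              return ["enforce L1D flushing on VMENTER (default: 'cond')"]
          # Case 3.3: partial measures, or one of two full-protection options.
          return ["partial: L1D flush + guest confinement + interrupt isolation",
                  "full: disable SMT (e.g. 'nosmt') and enforce L1D flushing",
                  "full: disable EPT via 'kvm-intel.ept'"]


      if __name__ == "__main__":
          # Example: an untrusted-guest host with SMT and EPT both active.
          for option in l1tf_options(Host(True, True, True, True)):
              print(option)

  EPT is checked before SMT because, per case 3.2, a host without EPT is
  fully protected even with SMT enabled, so the SMT state does not change
  the outcome there.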