L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data in the Level 1 Data Cache when the page table
entry controlling the virtual address used for the access has the Present
bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel Atom processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel Xeon Phi family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set, the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by the end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
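
As a rough illustration, the criteria above can be expressed as a Python check. This is a sketch under stated assumptions, not the kernel's detection logic. The function `l1tf_possibly_affected` and its parameters are hypothetical. ARCH_CAP_RDCL_NO is assumed to be bit 0 of IA32_ARCH_CAPABILITIES. The exempt Atom and Xeon Phi models are left to the caller, because their model numbers are not listed here. The sysfs file remains the authoritative answer.

```python
# Assumption: ARCH_CAP_RDCL_NO is bit 0 of the IA32_ARCH_CAPABILITIES MSR.
RDCL_NO = 1 << 0


def l1tf_possibly_affected(vendor: str, family: int, exempt_model: bool,
                           arch_capabilities: int = 0) -> bool:
    """Approximate the 'not affected' criteria listed above.

    vendor: CPUID vendor string, e.g. "GenuineIntel"
    family: CPU family number
    exempt_model: supplied by the caller, True for the listed Atom and
        Xeon Phi models (model numbers are not encoded here)
    arch_capabilities: raw IA32_ARCH_CAPABILITIES value, 0 if absent
    """
    if vendor != "GenuineIntel":
        return False        # AMD, Centaur and other non-Intel vendors
    if family < 6:
        return False        # older processor models
    if exempt_model:
        return False        # listed Atom and Xeon Phi parts
    if arch_capabilities & RDCL_NO:
        return False        # CPU declares itself not affected
    return True
```

For example, `l1tf_possibly_affected("GenuineIntel", 6, exempt_model=False)` returns True. Any non-Intel vendor string returns False.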

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE were still present and accessible.

While this is a purely speculative mechanism and the instruction eventually
raises a page fault when it retires, the mere act of loading the data and
making it available to other speculative instructions gives unprivileged
malicious code the opportunity for side channel attacks, similar to the
Meltdown attack.

While Meltdown breaks the protection between user space and kernel space,
L1TF allows attacks on any physical memory address in the system, and the
attack works across all protection domains. It allows attacks on SGX and
also works from inside virtual machines, because the speculation bypasses
the extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating systems store arbitrary information in the address bits of
   PTEs which are marked not present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs which are not
   marked present never point to cacheable physical memory space.

   A system with an up-to-date kernel is protected against attacks from
   malicious user space applications.
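
The idea behind PTE inversion can be modelled in a few lines of Python. The sketch below is a simplified model, not the kernel's implementation. It assumes 4 KiB pages, the x86 Present bit in bit 0, and address bits 12 to 51 (a 52-bit physical address width). It inverts the address bits of every non-present entry and ignores details such as the CPU's actual physical address width. The OS can still recover the stored value by inverting again. The raw bits that a vulnerable CPU would use speculatively point near the top of the physical address space, far above installed RAM.

```python
PAGE_SHIFT = 12                       # 4 KiB pages
PHYS_BITS = 52                        # assumed physical address width
PTE_PRESENT = 1 << 0                  # x86 Present bit
# Address field of the PTE: bits 12..51.
ADDR_MASK = ((1 << PHYS_BITS) - 1) & ~((1 << PAGE_SHIFT) - 1)


def make_pte(phys_addr: int, flags: int) -> int:
    """Encode a PTE, inverting the address bits when it is not present."""
    pte = (phys_addr & ADDR_MASK) | flags
    if not (flags & PTE_PRESENT):
        pte ^= ADDR_MASK              # stale address now points far above RAM
    return pte


def stored_addr(pte: int) -> int:
    """Address the OS meant to store, undoing the inversion."""
    addr = pte & ADDR_MASK
    if not (pte & PTE_PRESENT):
        addr ^= ADDR_MASK
    return addr


def speculated_addr(pte: int) -> int:
    """Address a vulnerable CPU uses speculatively: the raw PTE bits."""
    return pte & ADDR_MASK
```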

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   Because L1TF breaks all domain protections, malicious guest OSes, which
   can control the PTEs directly, and malicious guest user space
   applications, which run on an unprotected guest kernel lacking the PTE
   inversion mitigation for L1TF, can attack physical host memory.

   A special aspect of L1TF in the context of virtualization is
   simultaneous multithreading (SMT). The Intel implementation of SMT is
   called Hyper-Threading. What matters here is that hyperthreads on the
   affected processors share the L1 Data Cache (L1D). As the flaw only
   allows attacks on data which is present in the L1D, a malicious guest
   running on one hyperthread can attack the data which is brought into the
   L1D by the context running on the sibling hyperthread of the same
   physical core. This context can be the host OS, host user space or a
   different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly. The kernel provides several
   mechanisms which can be used to address the problem depending on the
   deployment scenario. The mitigations, their protection scope and impact
   are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================   ===============================
  'Not affected'                The processor is not vulnerable
  'Mitigation: PTE Inversion'   The host protection is active
  ===========================   ===============================

If KVM/VMX is enabled and the processor is vulnerable, then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting degree of protection is discussed in the following sections.
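
Scripts that check the vulnerability file can match on the documented phrases rather than on the exact line layout. The following Python sketch does that. The field order and separators in the combined string are assumptions, which is why it only tests whether each documented phrase is present. `parse_l1tf_status` and `L1tfStatus` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

L1TF_SYSFS = "/sys/devices/system/cpu/vulnerabilities/l1tf"


@dataclass
class L1tfStatus:
    affected: bool
    pte_inversion: bool
    smt_vulnerable: Optional[bool]   # None when no VMX information is shown
    l1d_flush: Optional[str]         # "never", "cond", "always" or None


def parse_l1tf_status(text: str) -> L1tfStatus:
    text = text.strip()
    if text == "Not affected":
        return L1tfStatus(False, False, None, None)

    smt: Optional[bool] = None
    if "SMT vulnerable" in text:
        smt = True
    elif "SMT disabled" in text:
        smt = False

    flush: Optional[str] = None
    # Test the longer phrase first: it also contains "cache flushes".
    if "conditional cache flushes" in text:
        flush = "cond"
    elif "cache flushes" in text:
        flush = "always"
    elif "L1D vulnerable" in text:
        flush = "never"

    return L1tfStatus(True, "PTE Inversion" in text, smt, flush)


def read_l1tf_status(path: str = L1TF_SYSFS) -> L1tfStatus:
    with open(path) as f:
        return parse_l1tf_status(f.read())
```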


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D,
   the hypervisor flushes the L1D before entering the guest.

   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, but also the guest's own data.
   Flushing the L1D has a performance impact, as the processor has to bring
   the flushed guest data back into the L1D. Depending on the frequency of
   VMEXIT/VMENTER and the type of computations in the guest, performance
   degradation in the range of 1% to 50% has been observed. For scenarios
   where guest VMEXIT/VMENTER are rare, the performance impact is
   minimal. Virtio and mechanisms like posted interrupts are designed to
   confine the VMEXITs to a bare minimum, but specific configurations and
   application scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:

   - conditional ('cond')
   - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the address
   space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode. The overhead cannot be quantified precisely, as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.
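
The two modes, plus the 'never' setting of the KVM module parameter, can be modelled as a decision taken before each VMENTER. In this Python sketch, `only_audited_paths` is a hypothetical input. It summarises whether the exit handling ran only audited code. How KVM actually tracks this is not shown.

```python
from enum import Enum


class FlushMode(Enum):
    NEVER = "never"      # mitigation disabled
    COND = "cond"        # skip the flush after audited exit paths
    ALWAYS = "always"    # flush on every VMENTER


def flush_before_vmenter(mode: FlushMode, only_audited_paths: bool) -> bool:
    """Return True if the L1D should be flushed before entering the guest.

    only_audited_paths: hypothetical flag, True if everything executed
    since the last VMEXIT belongs to the audited code paths.
    """
    if mode is FlushMode.ALWAYS:
        return True
    if mode is FlushMode.COND:
        return not only_audited_paths
    return False
```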

   **Note** that L1D flushing does not prevent the SMT problem, because the
   sibling thread will also bring its data back into the L1D, which makes
   it attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to use exclusive cpusets to ensure that no other guest or host
   tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core, then they can only attack their own memory and
   restricted parts of the host memory.

   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information exposed from the host OS context depends on
   what the host OS executes there, i.e. interrupts, soft interrupts and
   kernel threads. The data from these contexts cannot be declared
   uninteresting for an attacker without deep inspection of the code.

   **Note** that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single core or to a
   group of cores, consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
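
Confinement only helps if a guest owns complete physical cores. The following Python sketch groups logical CPUs into cores and checks whether a proposed guest CPU set covers whole cores. It assumes that each online CPU lists its SMT siblings in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, using the kernel's CPU list format (e.g. "0,4" or "0-1"). The helper names are hypothetical, and the check does not replace setting up exclusive cpusets.

```python
import glob
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse the kernel CPU list format, e.g. "0-3,8"."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cores(sysfs: str = "/sys/devices/system/cpu") -> list[frozenset[int]]:
    """Group logical CPUs into physical cores using their SMT sibling lists."""
    pattern = os.path.join(sysfs, "cpu[0-9]*", "topology", "thread_siblings_list")
    cores = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            cores.add(frozenset(parse_cpulist(f.read())))
    return sorted(cores, key=min)


def covers_whole_cores(guest_cpus: set[int], cores: list[frozenset[int]]) -> bool:
    """True if every core the guest touches is entirely assigned to it."""
    return all(core <= guest_cpus for core in cores if core & guest_cpus)
```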

.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per-CPU
   interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
   devices affine their interrupts to single CPUs or groups of CPUs per
   queue without allowing the administrator to control the affinities.

   Moving the interrupts whose affinity can be controlled away from CPUs
   which run untrusted guests reduces the attack surface.

   Whether the interrupts which are affine to CPUs running untrusted
   guests provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
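
A minimal Python sketch of moving controllable interrupts away from the CPUs that run untrusted guests might look as follows. It assumes that /proc/irq/<irq>/smp_affinity_list accepts the same CPU list format (e.g. "0-3,8") that it reports. It also assumes that interrupts whose affinity the administrator cannot change make the write fail. The helper names are hypothetical.

```python
def format_cpulist(cpus: set[int]) -> str:
    """Format CPUs in the kernel CPU list format, e.g. {0,1,2,3,8} -> "0-3,8"."""
    ranges: list[list[int]] = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu            # extend the current run
        else:
            ranges.append([cpu, cpu])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def host_irq_cpus(all_cpus: set[int], guest_cpus: set[int]) -> str:
    """CPU list for host interrupts: every CPU not running untrusted guests."""
    host = all_cpus - guest_cpus
    if not host:
        raise ValueError("no CPUs left for host interrupts")
    return format_cpulist(host)


def set_irq_affinity(irq: int, cpulist: str, proc: str = "/proc/irq") -> None:
    """Write cpulist to /proc/irq/<irq>/smp_affinity_list (requires root).

    Interrupts whose affinity the administrator cannot control are expected
    to reject the write, which surfaces as an OSError.
    """
    with open(f"{proc}/{irq}/smp_affinity_list", "w") as f:
        f.write(cpulist)
```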

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

     =========== ==========================================================
     nosmt       Affects the bring-up of the secondary CPUs during boot. The
                 kernel tries to bring all present CPUs online during the
                 boot process. "nosmt" makes sure that from each physical
                 core only one, the so-called primary (hyper) thread, is
                 activated. Due to a design flaw of Intel processors related
                 to Machine Check Exceptions, the non-primary siblings have
                 to be brought up at least partially and are then shut down
                 again. "nosmt" can be undone via the sysfs interface.

     nosmt=force Has the same effect as "nosmt", but does not allow
                 undoing the SMT disable via the sysfs interface.
     =========== ==========================================================
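
To see which of these options a running system was booted with, the command line can be inspected. The Python sketch below classifies the contents of /proc/cmdline. As documented in :ref:`mitigation_control_command_line`, it treats 'l1tf=full,force' as implying 'nosmt=force', and 'l1tf=full' and 'l1tf=flush,nosmt' as disabling SMT. It ignores repeated or conflicting options, and the function names are hypothetical.

```python
def smt_boot_policy(cmdline: str) -> str:
    """Return "force", "nosmt" or "none" for a kernel command line."""
    policy = "none"
    for arg in cmdline.split():
        if arg in ("nosmt=force", "l1tf=full,force"):
            return "force"           # cannot be undone via sysfs
        if arg in ("nosmt", "l1tf=full", "l1tf=flush,nosmt"):
            policy = "nosmt"         # can be undone via sysfs
    return policy


def read_smt_boot_policy(path: str = "/proc/cmdline") -> str:
    with open(path) as f:
        return smt_boot_policy(f.read())
```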

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows reading out the SMT control state and provides the
     ability to disable or (re)enable SMT. The possible states are:

        ==============  ===================================================
        on              SMT is supported by the CPU and enabled. All
                        logical CPUs can be onlined and offlined without
                        restrictions.

        off             SMT is supported by the CPU and disabled. Only
                        the so-called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.

        forceoff        Same as 'off' but the state cannot be controlled.
                        Attempts to write to the control file are rejected.

        notsupported    The processor does not support SMT. It's therefore
                        not affected by the SMT implications of L1TF.
                        Attempts to write to the control file are rejected.
        ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.
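
The documented rules for the control file can be checked before writing to it. This Python sketch assumes that the active file reads as "1" or "0". The helper names are hypothetical. The kernel remains the final arbiter and rejects invalid writes itself.

```python
SMT_DIR = "/sys/devices/system/cpu/smt"
WRITABLE_STATES = {"on", "off", "forceoff"}
LOCKED_STATES = {"forceoff", "notsupported"}   # writes are rejected


def smt_write_allowed(current: str, wanted: str) -> bool:
    """Apply the documented rules for /sys/devices/system/cpu/smt/control."""
    return wanted in WRITABLE_STATES and current.strip() not in LOCKED_STATES


def set_smt_control(wanted: str, smt_dir: str = SMT_DIR) -> None:
    """Change the SMT state (requires root); the kernel still validates it."""
    with open(f"{smt_dir}/control") as f:
        current = f.read().strip()
    if not smt_write_allowed(current, wanted):
        raise ValueError(f"cannot switch SMT control from {current!r} to {wanted!r}")
    with open(f"{smt_dir}/control", "w") as f:
        f.write(wanted)


def smt_active(smt_dir: str = SMT_DIR) -> bool:
    """Assumes the active file contains "1" or "0"."""
    with open(f"{smt_dir}/active") as f:
        return f.read().strip() == "1"
```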

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.

5. Disabling EPT
^^^^^^^^^^^^^^^^

  Disabling EPT for virtual machines provides full mitigation for L1TF even
  with SMT enabled, because the effective page tables for guests are
  managed and sanitized by the hypervisor. However, disabling EPT has a
  significant performance impact, especially when the Meltdown mitigation
  KPTI is enabled.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option,
                i.e. sysfs control of SMT is disabled.

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings. It also drops the swap size and available RAM
                limit restrictions on both hypervisor and bare metal.

  ============  =============================================================

The default is 'flush'. For details about L1D flushing, see :ref:`l1d_flush`.
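
Tooling that needs to reason about the boot configuration can transcribe the table above into a lookup. This Python sketch does that. `L1tfPolicy` and `l1tf_policy` are hypothetical names, and entries the table does not state are left as None. When l1tf= appears several times, the sketch uses the last occurrence, which is an assumption.

```python
from typing import NamedTuple, Optional


class L1tfPolicy(NamedTuple):
    smt_disabled: bool
    l1d_flush: str                   # "always", "cond" or "never"
    runtime_control: Optional[bool]  # sysfs control after boot; None = not stated
    warn: bool                       # warn when a VM starts insecurely


# Transcription of the l1tf= table above.
L1TF_OPTIONS = {
    "full":         L1tfPolicy(True,  "always", True,  True),
    "full,force":   L1tfPolicy(True,  "always", False, True),
    "flush":        L1tfPolicy(False, "cond",   True,  True),
    "flush,nosmt":  L1tfPolicy(True,  "cond",   True,  True),
    "flush,nowarn": L1tfPolicy(False, "cond",   True,  False),
    "off":          L1tfPolicy(False, "never",  None,  False),
}


def l1tf_policy(cmdline: str) -> L1tfPolicy:
    """Resolve the l1tf= setting of a kernel command line."""
    value = "flush"                              # documented default
    for arg in cmdline.split():
        if arg.startswith("l1tf="):
            value = arg[len("l1tf="):]           # assumption: last one wins
    if value not in L1TF_OPTIONS:
        raise ValueError(f"unknown l1tf= argument: {value!r}")
    return L1TF_OPTIONS[value]
```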


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
-------------------------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This can still leak host
                memory which allows, e.g., determining the host's address
                space layout.

  never         Disables the mitigation.
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the module, and modified at runtime via the sysfs
file:

/sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced, the kvm-intel.vmentry_l1d_flush module
parameter is ignored, and writes to the sysfs file are rejected.
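
A Python sketch of adjusting the parameter at runtime, with the documented constraints checked first, could look like this. The helper names are hypothetical. The caller is assumed to know which l1tf= argument the system booted with, for example from /proc/cmdline. On some systems, reading the file back may return states other than the three listed above, so the raw string is returned unchanged.

```python
PARAM = "/sys/module/kvm_intel/parameters/vmentry_l1d_flush"
VALID_VALUES = ("always", "cond", "never")


def check_flush_write(value: str, l1tf_arg: str = "flush") -> str:
    """Validate a new vmentry_l1d_flush value against the documented rules."""
    if l1tf_arg == "full,force":
        # 'always' is enforced and any write to the sysfs file is rejected.
        raise PermissionError("vmentry_l1d_flush is locked by l1tf=full,force")
    value = value.strip()
    if value not in VALID_VALUES:
        raise ValueError(f"expected one of {VALID_VALUES}, got {value!r}")
    return value


def set_vmentry_l1d_flush(value: str, l1tf_arg: str = "flush",
                          path: str = PARAM) -> None:
    """Write the validated value to the module parameter (requires root)."""
    value = check_flush_write(value, l1tf_arg)
    with open(path, "w") as f:
        f.write(value)


def get_vmentry_l1d_flush(path: str = PARAM) -> str:
    """Return the raw parameter value without interpreting it."""
    with open(path) as f:
        return f.read().strip()
```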
4473ec8ce5dSThomas Gleixner
4485999bbe7SThomas Gleixner.. _mitigation_selection:
4493ec8ce5dSThomas Gleixner
4503ec8ce5dSThomas GleixnerMitigation selection guide
4513ec8ce5dSThomas Gleixner--------------------------
4523ec8ce5dSThomas Gleixner
4533ec8ce5dSThomas Gleixner1. No virtualization in use
4543ec8ce5dSThomas Gleixner^^^^^^^^^^^^^^^^^^^^^^^^^^^
4553ec8ce5dSThomas Gleixner
4563ec8ce5dSThomas Gleixner   The system is protected by the kernel unconditionally and no further
4573ec8ce5dSThomas Gleixner   action is required.
4583ec8ce5dSThomas Gleixner
4593ec8ce5dSThomas Gleixner2. Virtualization with trusted guests
4603ec8ce5dSThomas Gleixner^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4613ec8ce5dSThomas Gleixner
4623ec8ce5dSThomas Gleixner   If the guest comes from a trusted source and the guest OS kernel is
4633ec8ce5dSThomas Gleixner   guaranteed to have the L1TF mitigations in place the system is fully
4643ec8ce5dSThomas Gleixner   protected against L1TF and no further action is required.
4653ec8ce5dSThomas Gleixner
4663ec8ce5dSThomas Gleixner   To avoid the overhead of the default L1D flushing on VMENTER the
4673ec8ce5dSThomas Gleixner   administrator can disable the flushing via the kernel command line and
4683ec8ce5dSThomas Gleixner   sysfs control files. See :ref:`mitigation_control_command_line` and
4693ec8ce5dSThomas Gleixner   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

  If SMT is not supported by the processor or disabled in the BIOS or by
  the kernel, only L1D flushing on VMENTER needs to be enforced.

  Conditional L1D flushing is the default behaviour and can be tuned. See
  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
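
Before relying on this case, the current SMT state can be checked. The Python sketch below assumes the /sys/devices/system/cpu/smt/active file (see :ref:`smt_control`), which reports '1' while SMT is active and '0' otherwise. 'smt_active()' is a hypothetical helper. Treating a missing file as "possibly active" is a deliberately conservative assumption.

```python
SMT_ACTIVE_PATH = "/sys/devices/system/cpu/smt/active"


def smt_active(path: str = SMT_ACTIVE_PATH) -> bool:
    """Return True unless sysfs reports that SMT is inactive ('0')."""
    try:
        with open(path) as f:
            return f.read().strip() != "0"
    except FileNotFoundError:
        # Conservative assumption: without the SMT control interface we
        # cannot rule SMT out, so treat it as possibly active.
        return True
```

When 'smt_active()' returns False, this case applies and only the VMENTER flushing needs attention.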

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

  If EPT is not supported by the processor or disabled in the hypervisor,
  the system is fully protected. SMT can stay enabled and L1D flushing on
  VMENTER is not required.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
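
Whether KVM is using EPT can be checked through the kvm_intel 'ept' module parameter. The parameter is a boolean, so it reads back as 'Y' or 'N'. To disable EPT, 'kvm-intel.ept=0' can be given on the kernel command line. The sketch below is hypothetical: 'ept_enabled()' is not an existing tool. It returns None when the parameter file is absent, which this sketch takes to mean that kvm_intel is not loaded.

```python
from typing import Optional

EPT_PARAM = "/sys/module/kvm_intel/parameters/ept"


def ept_enabled(path: str = EPT_PARAM) -> Optional[bool]:
    """Return True/False from the boolean 'ept' parameter, None if absent."""
    try:
        with open(path) as f:
            return f.read().strip() == "Y"
    except FileNotFoundError:
        # Assumption: no parameter file means kvm_intel is not loaded.
        return None
```

'ept_enabled() is False' corresponds to this case.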

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

  If SMT and EPT are supported and active, various degrees of mitigation
  can be employed:

  - L1D flushing on VMENTER:

    L1D flushing on VMENTER is the minimal protection requirement, but it
    is only effective in combination with other mitigation methods.

    Conditional L1D flushing is the default behaviour and can be tuned. See
    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

  - Guest confinement:

    Confining guests to a single physical core or a group of physical
    cores which do not run any other processes can reduce the attack
    surface significantly, but interrupts, soft interrupts and kernel
    threads can still expose valuable data to a potential attacker. See
    :ref:`guest_confinement`.

  - Interrupt isolation:

    Isolating the guest CPUs from interrupts can reduce the attack surface
    further, but still allows a malicious guest to explore a limited amount
    of host physical memory. This can at least be used to gain knowledge
    about the host address space layout. Depending on the scenario, the
    interrupts which have a fixed affinity to the CPUs running the
    untrusted guests can still trigger soft interrupts and schedule kernel
    threads which might expose valuable information. See
    :ref:`interrupt_isolation`.
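
Guest confinement and interrupt isolation both come down to partitioning CPUs. Because the L1D is shared between the SMT siblings of a physical core, the guest's CPU set has to cover whole cores. The Python sketch below uses hypothetical helper names and does two things. First, it expands a set of guest CPUs to complete cores using a sibling map; on Linux, the per-CPU /sys/devices/system/cpu/cpuN/topology/thread_siblings_list files describe the siblings. Second, it formats the remaining host CPUs as a cpulist, the format accepted by /proc/irq/<N>/smp_affinity_list. Applying the result (cpusets, pinning, IRQ affinity) is outside the sketch.

```python
from typing import Dict, Iterable, List, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a simple cpulist such as '0-3,8' into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus: Iterable[int]) -> str:
    """Format CPU numbers as a compact cpulist, collapsing consecutive runs."""
    ordered = sorted(set(cpus))
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def confine_to_cores(guest: Set[int], siblings: Dict[int, Set[int]]) -> Set[int]:
    """Expand guest CPUs so that all SMT siblings of each guest CPU are included."""
    full: Set[int] = set()
    for cpu in guest:
        full |= siblings.get(cpu, {cpu})
    return full


def host_cpulist(all_cpus: Set[int], guest: Set[int],
                 siblings: Dict[int, Set[int]]) -> str:
    """CPUs left for host tasks and interrupts once the guest owns whole cores."""
    return format_cpulist(all_cpus - confine_to_cores(guest, siblings))


# Example: two cores with sibling sets {0, 2} and {1, 3}; the sets could be
# built with parse_cpulist() from each CPU's thread_siblings_list file.
siblings = {0: {0, 2}, 2: {0, 2}, 1: {1, 3}, 3: {1, 3}}
print(host_cpulist({0, 1, 2, 3}, {2}, siblings))  # prints 1,3
```

Placing a guest on CPU 2 reserves the whole core {0, 2}, so only CPUs 1 and 3 remain for the host and its interrupts.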

The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection, the following methods are
available:

  - Disabling SMT:

    Disabling SMT and enforcing the L1D flushing provides the maximum
    amount of protection. This mitigation does not depend on any of the
    above mitigation methods.

    SMT control and L1D flushing can be tuned with the command line
    parameters 'nosmt', 'l1tf' and 'kvm-intel.vmentry_l1d_flush', and at
    run time with the matching sysfs control files. See :ref:`smt_control`,
    :ref:`mitigation_control_command_line` and
    :ref:`mitigation_control_kvm`.

  - Disabling EPT:

    Disabling EPT provides the maximum amount of protection as well. It
    does not depend on any of the above mitigation methods. SMT can stay
    enabled and L1D flushing is not required, but the performance impact
    is significant.

    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
    parameter.
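
The decision flow of cases 1 to 3.3 can be condensed into a small Python helper. It is an illustrative sketch: 'select_case()' and the 'Case' and 'ADVICE' names are hypothetical. The inputs are assumed to be known already, for instance from the SMT and EPT sysfs files. 'guests_trusted' stands for "trusted source and L1TF-mitigated guest kernel", as required by case 2.

```python
from enum import Enum


class Case(Enum):
    """Cases of the mitigation selection guide, by section number."""
    NO_VIRTUALIZATION = "1"
    TRUSTED_GUESTS = "2"
    SMT_OFF = "3.1"
    EPT_OFF = "3.2"
    SMT_AND_EPT = "3.3"


# One-line summary of the advice given for each case.
ADVICE = {
    Case.NO_VIRTUALIZATION: "no further action required",
    Case.TRUSTED_GUESTS: "no further action; L1D flushing may be disabled",
    Case.SMT_OFF: "enforce L1D flushing on VMENTER (default: 'cond')",
    Case.EPT_OFF: "fully protected; L1D flushing on VMENTER not required",
    Case.SMT_AND_EPT: ("L1D flushing plus confinement/isolation, or disable "
                       "SMT (with flushing) or EPT for full protection"),
}


def select_case(virtualization: bool, guests_trusted: bool,
                smt_active: bool, ept_enabled: bool) -> Case:
    """Map the host situation onto the cases of this guide."""
    if not virtualization:
        return Case.NO_VIRTUALIZATION
    if guests_trusted:
        return Case.TRUSTED_GUESTS
    # Disabled EPT protects fully regardless of SMT, so check it first.
    if not ept_enabled:
        return Case.EPT_OFF
    if not smt_active:
        return Case.SMT_OFF
    return Case.SMT_AND_EPT
```

For example, 'ADVICE[select_case(True, False, True, True)]' returns the summary for case 3.3, the only case where a choice between partial and full protection remains.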

3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor, it will:

 - Flush the L1D cache on every switch from the nested hypervisor to the
   nested virtual machine, so that the nested hypervisor's secrets are not
   exposed to the nested virtual machine;

 - Flush the L1D cache on every switch from the nested virtual machine to
   the nested hypervisor; this is a complex operation, and flushing the L1D
   cache prevents the bare metal hypervisor's secrets from being exposed
   to the nested virtual machine;

 - Instruct the nested hypervisor not to perform any L1D cache flush. This
   is an optimization to avoid double L1D flushing.
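
The flushing behaviour for nested guests can be pictured with a toy Python model that logs the L1D flushes for a number of entry/exit round trips. It is a simplification with hypothetical names. It shows which party flushes, and why the third point above saves one flush per round trip. It does not model how KVM conveys the instruction to the nested hypervisor.

```python
from typing import List


def nested_round_trips(count: int, nested_hv_skips_flush: bool = True) -> List[str]:
    """Log the L1D flushes for 'count' nested guest entry/exit round trips."""
    log: List[str] = []
    for _ in range(count):
        if not nested_hv_skips_flush:
            # Redundant: the bare metal hypervisor flushes on this switch anyway.
            log.append("nested hypervisor: flush L1D before VMENTER")
        # VMENTER from the nested hypervisor is processed by the bare metal one.
        log.append("bare metal: flush L1D, enter nested VM")
        log.append("bare metal: flush L1D, return to nested hypervisor")
    return log
```

With the optimization, 'nested_round_trips(3)' logs 6 flushes; without it, 9.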


.. _default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - PTE inversion to protect against malicious user space. This is done
    unconditionally and cannot be controlled. The swap storage is limited
    to ~16TB.

  - L1D conditional flushing on VMENTER when EPT is enabled for
    a guest.
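
The idea behind PTE inversion can be shown with a simplified Python model of an x86-64 page table entry. It is a sketch, not the kernel's implementation. It assumes 4 KiB pages, a PFN field in physical address bits 12..51 and the Present bit in bit 0. It models only the rule that a non-zero entry with the Present bit cleared stores its PFN bits inverted. A speculative access through such an entry then targets an address near the top of the physical address space, typically far above installed memory, while 'pte_pfn()' still recovers the original PFN. All names are hypothetical.

```python
PAGE_SHIFT = 12
PRESENT = 1 << 0
# Assumed layout: PFN in physical address bits 12..51.
PFN_MASK = ((1 << 52) - 1) & ~((1 << PAGE_SHIFT) - 1)


def needs_invert(pte: int) -> bool:
    """Non-zero entries with the Present bit cleared are stored inverted."""
    return pte != 0 and not (pte & PRESENT)


def make_pte(pfn: int, flags: int) -> int:
    """Build an entry, inverting the PFN bits if the entry is not present."""
    pte = ((pfn << PAGE_SHIFT) & PFN_MASK) | flags
    return pte ^ PFN_MASK if needs_invert(pte) else pte


def pte_pfn(pte: int) -> int:
    """Recover the PFN, undoing the inversion for non-present entries."""
    if needs_invert(pte):
        pte ^= PFN_MASK
    return (pte & PFN_MASK) >> PAGE_SHIFT


def address_bits(pte: int) -> int:
    """Physical address a speculative access through the raw entry would use."""
    return pte & PFN_MASK


pte = make_pte(0x1234, flags=0)       # not present, so stored inverted
print(hex(pte_pfn(pte)))              # 0x1234
print(address_bits(pte) >= 1 << 46)   # True
```

Present entries are stored unchanged, so only entries the hardware must not use are redirected.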

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted guests with EPT enabled.

  The rationale for this choice is:

  - Forcibly disabling SMT can break existing setups, especially with
    unattended updates.

  - If regular users run untrusted guests on their machine, then L1TF is
    just an add-on to other malware which might be embedded in an untrusted
    guest, e.g. spam-bots or attacks on the local network.

    There is no technical way to prevent a user from blindly running
    untrusted code on their machines.

  - It is technically extremely unlikely, and based on current knowledge
    even impossible, that L1TF can be exploited via the most popular attack
    mechanisms like JavaScript, because these mechanisms have no way to
    control PTEs. If this were possible and no other mitigation were
    available, then the default might be different.

  - The administrators of cloud and hosting setups have to carefully
    analyze the risk for their scenarios and make the appropriate
    mitigation choices, which might even vary across their deployed
    machines and also result in other changes to their overall setup.
    There is no way for the kernel to provide a sensible default for this
    kind of scenario.