.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time.  While the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
CPU denies access to the region from outside the enclave, and the contents are
encrypted before they leave the LLC.

Whether SGX is supported can be determined with

	``grep sgx /proc/cpuinfo``

SGX must both be supported in the processor and enabled by the BIOS.  If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS.  If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".

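A program can perform the same check as the ``grep`` command by looking for
the ``sgx`` token in a ``flags`` line read from ``/proc/cpuinfo``.  The
following is only a minimal sketch: the helper name ``cpuinfo_has_flag()`` is
hypothetical, and reading the file itself is left to the caller.  Unlike a
plain substring search, it matches whole tokens, so ``sgx_lc`` alone does not
count as ``sgx``.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <string.h>

   /*
    * Hypothetical helper: return true if `flag` appears as a whole,
    * whitespace-separated token in a "flags" line from /proc/cpuinfo.
    */
   bool cpuinfo_has_flag(const char *flags_line, const char *flag)
   {
       size_t len = strlen(flag);
       const char *p = flags_line;

       if (len == 0)
           return false;

       while ((p = strstr(p, flag)) != NULL) {
           /* A token starts at the line start or after whitespace. */
           bool starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
           char end = p[len];
           /* A token ends at whitespace or at the end of the line. */
           bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

           if (starts && ends)
               return true;
           p += len;
       }
       return false;
   }

For example, ``cpuinfo_has_flag(line, "sgx")`` is true for a line containing
``fpu sgx sgx_lc`` and false for a line containing only ``fpu sgx_lc``.
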
Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside of the enclave during enclave construction with special, limited SGX
instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similarly to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   Enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.

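The 512-slot figure makes it easy to estimate how many VA pages are needed to
track a given number of evicted pages.  The sketch below is illustrative only:
the function and macro names are hypothetical, and the 8-byte slot size is an
assumption derived from dividing a 4 KiB page into 512 slots.

.. code-block:: c

   #include <stdint.h>

   #define VA_SLOTS_PER_PAGE 512u  /* slots per VA page, as stated above */
   #define VA_SLOT_SIZE      8u    /* assumption: 4096 bytes / 512 slots */

   /* Number of VA pages needed to hold one slot per evicted page. */
   uint64_t va_pages_needed(uint64_t nr_evicted)
   {
       return nr_evicted / VA_SLOTS_PER_PAGE +
              (nr_evicted % VA_SLOTS_PER_PAGE != 0);
   }

   /* Byte offset of a slot inside its VA page, under the assumption above. */
   uint32_t va_slot_offset(uint32_t slot)
   {
       return (slot % VA_SLOTS_PER_PAGE) * VA_SLOT_SIZE;
   }

Under these assumptions, evicting 513 pages needs two VA pages, and the last
slot of a VA page starts at byte offset 4088.
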
Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*.  The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only.  EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.

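Conceptually, an access to an enclave page succeeds only if both the page
tables and the EPCM allow it.  The following sketch models that as an
intersection of two permission masks.  It is a simplified illustration, not
the hardware's actual encoding: the ``PERM_*`` bit values and the function
name are hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical permission bits, used only for this illustration. */
   #define PERM_R 0x1u
   #define PERM_W 0x2u
   #define PERM_X 0x4u

   /*
    * Model of the rule above: EPCM permissions can only restrict what the
    * page tables already allow, so the result is the intersection.
    */
   uint8_t effective_perms(uint8_t pte_perms, uint8_t epcm_perms)
   {
       return pte_perms & epcm_perms;
   }

With this model, a page that the kernel maps read-write but that the EPCM
marks read-only stays read-only, regardless of the page table contents.
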
For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will.  This requires that software be prepared to
handle an EPCM fault at any time.  In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process.  Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device.  Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

Enclave runtime management
--------------------------

Systems supporting SGX2 additionally support changes to initialized
enclaves: modifying enclave page permissions and type, and dynamically
adding and removing enclave pages. When an enclave accesses an address
within its address range that does not have a backing page, a new
regular page is dynamically added to the enclave. The enclave is
still required to run EACCEPT on the new page before it can be used.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_restrict_permissions
               sgx_ioc_enclave_modify_types
               sgx_ioc_enclave_remove_pages

Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process.  Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions.  This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO.  The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME.  The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller.  This avoids
the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes.  Enclave memory is typically ready
for use when the processor powers on or resets.  However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state.  This might
occur after a crash and kexec() cycle, for instance.  At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by going through EPC address space and applying the
EREMOVE function to each physical page. Some enclave pages like SECS pages have
hardware dependencies on other pages which prevents EREMOVE from functioning.
Executing two EREMOVE passes removes the dependencies.

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory.  If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement.  The
function checks that the measurement is correct and that the signature was made
with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either read-only or writable.
Linux supports only the writable configuration in order to give the kernel full
control over launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.

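A SHA256 digest is 32 bytes long, which is why it spans four 64-bit MSRs.  The
sketch below shows one way to split such a digest into four 64-bit values.  It
is illustrative only: the function name is hypothetical, and the little-endian
byte order is an assumption based on how x86 views a byte array as 64-bit
words.  It does not compute the hash and does not write any MSR.

.. code-block:: c

   #include <stdint.h>

   /*
    * Hypothetical helper: split a 32-byte SHA256 digest into four 64-bit
    * words, one per IA32_SGXLEPUBKEYHASH{0, 1, 2, 3} MSR.  Each word is
    * assembled in little-endian order (assumption, see the text above).
    */
   void digest_to_lepubkeyhash(const uint8_t digest[32], uint64_t hash[4])
   {
       for (int i = 0; i < 4; i++) {
           uint64_t word = 0;

           /* Walk the 8 bytes from most to least significant. */
           for (int j = 7; j >= 0; j--)
               word = (word << 8) | digest[8 * i + j];
           hash[i] = word;
       }
   }

For a digest whose bytes are 0x00 through 0x1f in order, the first word is
0x0706050403020100 and the last word is 0x1f1e1d1c1b1a1918.
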
Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with root in
SRAM to maintain integrity of the encrypted data. This provides integrity and
anti-replay protection but does not scale to large memory sizes because the time
required to update the Merkle tree grows logarithmically in relation to the
memory size.

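The logarithmic growth can be made concrete by counting the tree levels that
lie between a protected block and the root.  The sketch below is a generic
illustration for an n-ary tree: the function name is hypothetical, and the
arity used in the usage note is an arbitrary example value, not a statement
about the MEE design.

.. code-block:: c

   #include <stdint.h>

   /*
    * Number of levels above the leaves that an n-ary tree needs to cover
    * nr_leaves leaves.  Returns 0 for a single leaf or an invalid arity.
    */
   unsigned int merkle_tree_depth(uint64_t nr_leaves, unsigned int arity)
   {
       unsigned int depth = 0;
       uint64_t covered = 1;

       if (arity < 2)
           return 0;

       while (covered < nr_leaves) {
           depth++;
           /* Stop before the multiplication could overflow. */
           if (covered > UINT64_MAX / arity)
               break;
           covered *= arity;
       }
       return depth;
   }

With an example arity of 8, 64 leaves need 2 levels and 65 leaves need 3, so
every update of a leaf touches one more node each time the protected memory
grows by a factor of 8.
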
CPUs starting from Ice Lake use Total Memory Encryption (TME) in place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated.  However, TME includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, the user should stop running any new
SGX workloads (or just any new workloads) and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.

Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting
the total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.

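The EPC left for host applications follows directly from that subtraction.
The sketch below only illustrates the arithmetic: the function name is
hypothetical, and the sizes are whatever the administrator has configured for
each VM, in bytes.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /*
    * Bytes of physical EPC left for host enclaves after every VM's virtual
    * EPC has been subtracted.  Returns 0 if the VMs consume all of it.
    */
   uint64_t host_epc_remaining(uint64_t phys_epc, const uint64_t *vepc_sizes,
                               size_t nr_vms)
   {
       uint64_t remaining = phys_epc;

       for (size_t i = 0; i < nr_vms; i++) {
           if (vepc_sizes[i] >= remaining)
               return 0;
           remaining -= vepc_sizes[i];
       }
       return remaining;
   }

For example, with 128 MiB of physical EPC and two VMs using 32 MiB and 16 MiB
of virtual EPC, 80 MiB remain for host enclaves.
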
Architectural behavior is to restore all EPC pages to an uninitialized
state also after a guest reboot.  Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons.  Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs.  In this case the ioctl will
   return ``EBUSY`` independent of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages.  This can happen if there are concurrent
   invocations to ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages.  Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest.  This means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages.  The second set of calls is only
   required for those mappings that returned a nonzero value from the
   first call.  It indicates a bug in the kernel or the userspace client
   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
   a return code other than 0.

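The two rounds of calls described in the last item can be sketched as follows.
This is a minimal illustration, not a reference implementation: the callback
type ``vepc_remove_all_fn`` is a hypothetical stand-in for
``ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL)`` so that the control flow can be
exercised without SGX hardware, and ``sim_remove_all()`` is a toy backend that
exists only for demonstration.  The sketch assumes that all vcpus are already
stopped and that calls are serialized, as discussed in the first two items.

.. code-block:: c

   #include <stddef.h>
   #include <stdlib.h>

   /*
    * Hypothetical stand-in for ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL): returns 0
    * if all pages were removed, a positive value if some pages are left,
    * and a negative value on error.
    */
   typedef int (*vepc_remove_all_fn)(int fd);

   /* Returns 0 on success, -1 on an error or an unexpected leftover. */
   int vepc_reset(const int *fds, size_t nr_fds, vepc_remove_all_fn remove_all)
   {
       unsigned char *pending;
       int ret = 0;
       size_t i;

       if (nr_fds == 0)
           return 0;

       pending = calloc(nr_fds, sizeof(*pending));
       if (!pending)
           return -1;

       /* First round: remove child pages on every fd before retrying any. */
       for (i = 0; i < nr_fds; i++) {
           int r = remove_all(fds[i]);

           if (r < 0) {
               ret = -1;
               goto out;
           }
           pending[i] = r > 0;
       }

       /* Second round: only fds that reported leftovers need another call. */
       for (i = 0; i < nr_fds; i++) {
           if (pending[i] && remove_all(fds[i]) != 0) {
               ret = -1;   /* per the text above, this indicates a bug */
               goto out;
           }
       }
   out:
       free(pending);
       return ret;
   }

   /* Toy backend: fd N reports sim_leftover[N] pages once, then 0. */
   int sim_leftover[4] = { 0, 2, 0, 1 };
   int sim_calls[4];

   int sim_remove_all(int fd)
   {
       int left = sim_leftover[fd];

       sim_calls[fd]++;
       sim_leftover[fd] = 0;
       return left;
   }

With the toy backend, ``vepc_reset()`` over file descriptors 0 to 3 calls the
backend once for descriptors 0 and 2 and twice for descriptors 1 and 3.  The
first round finishes on every descriptor before the second begins, because a
SECS page behind one descriptor may have child pages behind another.
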