.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications to
set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time. While the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
CPU denies access to the region from outside the enclave, and encrypts its
contents before they leave the last-level cache (LLC).

SGX support can be determined by running:

        ``grep sgx /proc/cpuinfo``

SGX must both be supported in the processor and enabled by the BIOS. If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS. If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".

Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside of the enclave during enclave construction, using special, limited SGX
instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similarly to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   The enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.

Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*. The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables. This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only. EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will. This requires that software be prepared to
handle an EPCM fault at any time. In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.
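
The rule that EPCM permissions can only restrict what the page tables already
allow can be pictured as a bitwise intersection. The following is a minimal
sketch of that idea only, not kernel or hardware code: the ``PERM_*`` bits and
the ``effective_perms()`` helper are hypothetical names chosen for
illustration.

.. code-block:: c

   #include <assert.h>

   /* Hypothetical permission bits for this sketch (not architectural values). */
   #define PERM_R   0x1u
   #define PERM_W   0x2u
   #define PERM_X   0x4u
   #define PERM_ALL (PERM_R | PERM_W | PERM_X)

   /*
    * Model of the rule in the text: an access must be allowed by both the
    * page tables and the EPCM, so the EPCM can only take rights away.
    */
   static unsigned int effective_perms(unsigned int pte_perms,
                                       unsigned int epcm_perms)
   {
       /* Reject bits that this toy model does not define. */
       assert((pte_perms & ~PERM_ALL) == 0);
       assert((epcm_perms & ~PERM_ALL) == 0);

       return pte_perms & epcm_perms;
   }

In this model, a page that the kernel maps read/write but that the EPCM marks
read-only yields read-only access, which matches the read-only example above.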

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device. Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

Enclave runtime management
--------------------------

Systems supporting SGX2 additionally support changes to initialized
enclaves: modifying enclave page permissions and type, and dynamically
adding and removing enclave pages. When an enclave accesses an address
within its address range that does not have a backing page, a new
regular page is dynamically added to the enclave. The enclave is
still required to run EACCEPT on the new page before it can be used.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_restrict_permissions
               sgx_ioc_enclave_modify_types
               sgx_ioc_enclave_remove_pages

Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process. Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions. This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO. The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME. The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller. This avoids
the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes. Enclave memory is typically ready
for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by going through the EPC address space and applying
the EREMOVE function to each physical page. Some enclave pages like SECS pages
have hardware dependencies on other pages which prevent EREMOVE from
functioning. Executing two EREMOVE passes removes the dependencies.

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.
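
The reason two EREMOVE passes are enough during EPC sanitization can be
illustrated with a toy model. This is only a sketch under simplifying
assumptions, not the ksgxd implementation: ``struct toy_page``,
``toy_eremove()`` and ``toy_sanitize_pass()`` are hypothetical names, and the
only dependency modelled is that a SECS page cannot be removed while any of
its child pages is still in use.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Toy model of an EPC page; not a kernel structure. */
   struct toy_page {
       int in_use;      /* 1 while the page still holds enclave state */
       int is_secs;     /* 1 for a SECS page */
       int secs_index;  /* for child pages: index of the owning SECS, else -1 */
   };

   /* Toy EREMOVE: returns -1 for a SECS page that still has children. */
   static int toy_eremove(struct toy_page *pages, size_t n, size_t idx)
   {
       size_t i;

       assert(idx < n);
       if (!pages[idx].in_use)
           return 0;

       if (pages[idx].is_secs) {
           for (i = 0; i < n; i++) {
               if (pages[i].in_use && pages[i].secs_index == (int)idx)
                   return -1;
           }
       }

       pages[idx].in_use = 0;
       return 0;
   }

   /* One pass over every page; returns how many pages could not be removed. */
   static size_t toy_sanitize_pass(struct toy_page *pages, size_t n)
   {
       size_t i, failed = 0;

       for (i = 0; i < n; i++) {
           if (toy_eremove(pages, n, i) != 0)
               failed++;
       }
       return failed;
   }

When a SECS page sits at a lower address than its children, the first pass
fails on the SECS page but frees the children, and the second pass can then
remove the SECS page itself.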

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement. The
function checks that the measurement is correct and that the signature was made
with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration in order to give full control to
the kernel on launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.

Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with root in
SRAM to maintain integrity of the encrypted data. This provides integrity and
anti-replay protection but does not scale to large memory sizes because the time
required to update the Merkle tree grows logarithmically in relation to the
memory size.

CPUs starting from Ice Lake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated. However, it includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked. SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, the user should stop running any new
SGX workloads (or just any new workloads), and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.


Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting the
total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.

Architectural behavior is to restore all EPC pages to an uninitialized
state also after a guest reboot. Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons. Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs. In this case the ioctl will
   return ``EBUSY`` independent of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages. This can happen if there are concurrent
   invocations to ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages. Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest. This means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages. The second set of calls is only
   required for those mappings that returned a nonzero value from the
   first call. It indicates a bug in the kernel or the userspace client
   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
   a return code other than 0.
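
The two rounds of calls described in item 3 can be sketched as follows. This
is a minimal illustration of the control flow, not code taken from any VMM.
The callback type ``remove_all_fn`` is a hypothetical stand-in for
``ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL)`` so that the logic can be exercised
without SGX hardware, ``fake_remove_all()`` is a test double, and
``MAX_VEPC_FDS`` is an arbitrary limit. Treating a negative result as an error
(such as ``EBUSY``) and a positive result as "some pages are still left" is an
assumption of this sketch.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   #define MAX_VEPC_FDS 64  /* arbitrary limit for this sketch */

   /* Hypothetical wrapper type around ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL). */
   typedef int (*remove_all_fn)(int fd);

   /* Returns 0 when every virtual EPC was fully cleared, -1 otherwise. */
   static int vepc_remove_all_two_pass(const int *fds, size_t nfds,
                                       remove_all_fn remove_all)
   {
       unsigned char retry[MAX_VEPC_FDS] = {0};
       size_t i;

       assert(remove_all != NULL);
       if (nfds > MAX_VEPC_FDS)
           return -1;

       /* Round 1: remove child pages; remember mappings with leftovers. */
       for (i = 0; i < nfds; i++) {
           int ret = remove_all(fds[i]);

           if (ret < 0)
               return -1;  /* assumed error path, for example EBUSY */
           retry[i] = (ret != 0);
       }

       /* Round 2: only for mappings that returned nonzero in round 1. */
       for (i = 0; i < nfds; i++) {
           if (retry[i] && remove_all(fds[i]) != 0)
               return -1;  /* per the text, this indicates a bug */
       }
       return 0;
   }

   /* Test double: pages left after the first call, none after the second. */
   static int fake_leftover[4] = {1, 0, 2, 0};
   static int fake_calls[4];

   static int fake_remove_all(int fd)
   {
       int left;

       assert(fd >= 0 && fd < 4);
       left = fake_leftover[fd];
       fake_calls[fd]++;
       fake_leftover[fd] = 0;  /* SECS pages are removable once children are gone */
       return left;
   }

A real caller would pass a wrapper around the ioctl() instead of
``fake_remove_all()``, and would stop the vcpus and serialize the calls as
described in items 1 and 2.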