.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by ensuring that some types of memory
corruption are prevented.

Page table check performs extra verification at the time when new pages become
accessible from userspace, that is, when their page table entries (PTEs, PMDs,
etc.) are added into the page table.

For most detected corruptions, the kernel is crashed. There is a small
performance and memory overhead associated with the page table check. Therefore,
it is disabled by default, but can optionally be enabled on systems where the
extra hardening outweighs the performance costs. Also, because page table check
is synchronous, it can help with debugging double map memory corruption issues
by crashing the kernel at the time the wrong mapping occurs instead of later,
which is often the case with memory corruption bugs.

It can also be used to check page table entries for various flags and to dump
warnings when illegal combinations of entry flags are detected. Currently,
userfaultfd is the only user of this feature, to sanity check the wr-protect
bit against any writable flags.
Illegal flag combinations do not cause data corruption immediately in this
case, but they make read-only data writable, which leads to corruption when the
page content is later modified.

Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New Mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+

Enabling Page Table Check
=========================

Build the kernel with:

- PAGE_TABLE_CHECK=y
  Note that it can only be enabled on platforms where
  ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.

- Boot with the 'page_table_check=on' kernel parameter.

Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have
page table check enabled without the extra kernel parameter.

Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from Linux-MM, that verifies that user-accessible pages
are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change
their properties (e.g., from anonymous pages to named pages) while they are
still mapped in userspace, leading to "corruption" detected by the page table
check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered named pages, so they
won't break the logic used in the page table check.
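
The rules in the "Double mapping detection logic" table can be read as a small
decision function. The following is a minimal sketch in C, not the kernel's
implementation: the names ptc_map_type, ptc_mapping_allowed and ptc_selftest
are hypothetical, and it assumes that the "Permissions" column describes
whether the new mapping is writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two mapping kinds used in the table. */
enum ptc_map_type {
	PTC_ANON,	/* anonymous mapping */
	PTC_NAMED,	/* named mapping */
};

/*
 * Return true if a page that already has a mapping of type 'current'
 * may gain an additional mapping of type 'incoming'.
 * 'writable' models the "Read / Write" permission of the table
 * (assumption: it refers to the new mapping).
 */
bool ptc_mapping_allowed(enum ptc_map_type current,
			 enum ptc_map_type incoming, bool writable)
{
	/* Mixing anonymous and named mappings is always prohibited. */
	if (current != incoming)
		return false;

	/* Named pages may be shared with any permissions. */
	if (current == PTC_NAMED)
		return true;

	/* Anonymous pages may only be shared read-only. */
	return !writable;
}

/* Walk through every row of the table once. */
void ptc_selftest(void)
{
	assert(ptc_mapping_allowed(PTC_ANON, PTC_ANON, false));
	assert(!ptc_mapping_allowed(PTC_ANON, PTC_ANON, true));
	assert(!ptc_mapping_allowed(PTC_ANON, PTC_NAMED, false));
	assert(!ptc_mapping_allowed(PTC_ANON, PTC_NAMED, true));
	assert(!ptc_mapping_allowed(PTC_NAMED, PTC_ANON, false));
	assert(!ptc_mapping_allowed(PTC_NAMED, PTC_ANON, true));
	assert(ptc_mapping_allowed(PTC_NAMED, PTC_NAMED, false));
	assert(ptc_mapping_allowed(PTC_NAMED, PTC_NAMED, true));
}
```

In this model a prohibited combination simply returns false, whereas the page
table check crashes the kernel for most detected corruptions, as described in
the introduction.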