.. SPDX-License-Identifier: GPL-2.0

================
Page Table Check
================

Introduction
============

Page table check hardens the kernel by preventing some types of memory
corruption.

Page table check performs extra verification at the time when new pages become
accessible from userspace, that is, when their page table entries (PTEs, PMDs,
etc.) are added to the page table.

For most detected corruption, the kernel is crashed. Page table check has a
small performance and memory overhead. Therefore, it is disabled by default,
but it can be enabled on systems where the extra hardening outweighs the
performance cost. Also, because page table check is synchronous, it can help
with debugging double-mapping memory corruption issues: the kernel crashes at
the time the wrong mapping occurs rather than later, which is often the case
with memory corruption bugs.

Page table check can also validate page table entry flags and dump a warning
when an illegal combination of entry flags is detected. Currently, userfaultfd
is the only user of this: it sanity-checks the write-protect bit against any
writable flag. An illegal flag combination does not corrupt data immediately,
but it makes read-only data writable, which leads to corruption when the page
content is later modified.
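
The flag check described above can be illustrated with a minimal userspace
sketch. It is not the kernel implementation: the flag bits and the function
name are hypothetical and chosen only for this example (real PTE layouts are
architecture specific), and only present entries are modelled.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical flag bits, made up for this sketch only. */
    #define SKETCH_PTE_PRESENT  (1u << 0)
    #define SKETCH_PTE_WRITE    (1u << 1)
    #define SKETCH_PTE_UFFD_WP  (1u << 2)

    /*
     * Return true when the flags form a legal combination. A present entry
     * must not be both userfaultfd write-protected and writable.
     */
    bool sketch_pte_flags_ok(uint32_t flags)
    {
        bool present  = (flags & SKETCH_PTE_PRESENT) != 0;
        bool writable = (flags & SKETCH_PTE_WRITE) != 0;
        bool uffd_wp  = (flags & SKETCH_PTE_UFFD_WP) != 0;

        return !(present && uffd_wp && writable);
    }

In this sketch a caller would only warn, not crash, when the function returns
false, matching the warning behaviour described above.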

Double mapping detection logic
==============================

+-------------------+-------------------+-------------------+------------------+
| Current Mapping   | New Mapping       | Permissions       | Rule             |
+===================+===================+===================+==================+
| Anonymous         | Anonymous         | Read              | Allow            |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Anonymous         | Named             | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Anonymous         | Any               | Prohibit         |
+-------------------+-------------------+-------------------+------------------+
| Named             | Named             | Any               | Allow            |
+-------------------+-------------------+-------------------+------------------+
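
The rules in the table can be expressed as a small decision function. This is
an illustrative sketch only: the enum, the function name and the boolean
permission argument are assumptions made for this example and do not mirror
the kernel's internal data structures.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical mapping types used only in this sketch. */
    enum sketch_mapping {
        SKETCH_ANON,
        SKETCH_NAMED,
    };

    /*
     * Decide whether a page that is currently mapped as cur may gain an
     * additional userspace mapping of type new_map. The writable argument
     * is true for a Read / Write mapping and false for a Read mapping.
     */
    bool sketch_double_map_allowed(enum sketch_mapping cur,
                                   enum sketch_mapping new_map,
                                   bool writable)
    {
        /* Mixing anonymous and named mappings is always prohibited. */
        if (cur != new_map)
            return false;

        /* Named pages may be shared with any permissions. */
        if (cur == SKETCH_NAMED)
            return true;

        /* Anonymous pages may only be shared read-only. */
        return !writable;
    }

For example, sketch_double_map_allowed(SKETCH_ANON, SKETCH_ANON, true)
corresponds to the second row of the table and returns false.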

Enabling Page Table Check
=========================

To enable page table check:

- Build the kernel with PAGE_TABLE_CHECK=y. Note that it can only be enabled on
  platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.

- Boot with the 'page_table_check=on' kernel parameter.

Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have
page table check enabled without the extra kernel parameter.
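
As an illustration, and assuming the usual convention that Kconfig symbols
carry a CONFIG_ prefix in the generated .config file, the build options above
would appear as follows. The second entry is only needed for the enforced
variant::

    CONFIG_PAGE_TABLE_CHECK=y
    CONFIG_PAGE_TABLE_CHECK_ENFORCED=y

Without the enforced variant, the feature is switched on by appending the
parameter to the kernel command line::

    page_table_check=on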

Implementation notes
====================

We specifically decided not to use VMA information in order to avoid relying on
MM states (except for limited "struct page" info). The page table check is a
state machine, separate from Linux-MM, that verifies that user-accessible
pages are not falsely shared.

PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
regions into userspace via /dev/mem. At the same time, pages may change
their properties (e.g., from anonymous pages to named pages) while they are
still mapped in userspace, leading to "corruption" detected by the page table
check.

Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via
/dev/mem. However, these pages are always considered named pages, so they
won't break the logic used in the page table check.