==================================================
page owner: Tracking who allocated each page
==================================================

Introduction
============

page owner tracks who allocated each page.
It can be used to debug memory leaks or to find a memory hog.
When an allocation happens, information about it, such as the call stack
and the order of the pages, is stored in dedicated storage for each page.
When we need to know the status of all pages, we can retrieve and analyze
this information.

Although we already have tracepoints for tracing page allocation/free,
using them to analyze who allocated each page is rather complex. We need
to enlarge the trace buffer to prevent it from being overwritten before
the userspace program is launched. The launched program must then
continually dump the trace buffer for later analysis, which is more likely
to change system behaviour than simply keeping the information in memory,
and that is bad for debugging.

page owner can also be used for various other purposes. For example,
accurate fragmentation statistics can be obtained from the gfp flag
information of each page. This is already implemented and activated if
page owner is enabled. Other usages are more than welcome.

It can also be used to show all the stacks and their current number of
allocated base pages, which gives a quick overview of where the memory
is going without the need to go through all the pages and match the
allocation and free operations.

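As a rough illustration of that overview, the sketch below ranks stacks by
their number of base pages. It assumes the ``show_stacks`` text format shown
in the Usage section (stack frames followed by an ``nr_base_pages:`` line);
the helper names ``parse_show_stacks`` and ``top_stacks`` and the shortened
sample stacks are made up for this example.

```python
def parse_show_stacks(text):
    """Split show_stacks-style text into (frames, nr_base_pages) records."""
    records = []
    frames = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "...":
            continue  # skip separators between records
        if stripped.startswith("nr_base_pages:"):
            # The counter line closes the current stack.
            count = int(stripped.split(":", 1)[1])
            records.append((frames, count))
            frames = []
        else:
            frames.append(stripped)
    return records


def top_stacks(text, limit=10):
    """Return the records holding the most base pages, largest first."""
    records = parse_show_stacks(text)
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:limit]


# Abbreviated, hand-written sample in the assumed format.
sample = """\
 post_alloc_hook+0x177/0x1a0
 get_page_from_freelist+0xd01/0xd80
 allocate_slab+0xbc/0x3f0
nr_base_pages: 16

 post_alloc_hook+0x177/0x1a0
 filemap_alloc_folio+0x78/0x230
nr_base_pages: 20824
"""

for frames, pages in top_stacks(sample):
    print(pages, frames[-1])
```

On a live system the text would come from a saved copy of
``/sys/kernel/debug/page_owner_stacks/show_stacks`` rather than from an
inline sample.
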
page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
with page owner but page owner is disabled at runtime because the boot
option is not given, the runtime overhead is marginal. When disabled at
runtime, it doesn't require memory to store owner information, so there is
no runtime memory overhead. Also, page owner inserts just two unlikely
branches into the page allocator hotpath, and if it is not enabled,
allocation proceeds as in a kernel without page owner. These two unlikely
branches should not affect allocation performance, especially if the
static keys jump label patching functionality is available. The effect of
this facility on the kernel's code size is as follows.

Although enabling page owner increases kernel size by several kilobytes,
most of this code is outside the page allocator and its hot path. Building
the kernel with page owner and turning it on when needed is a good
option for debugging kernel memory problems.

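Before looking for page owner data, it can help to confirm that the boot
option was actually given. The following is a minimal sketch that only looks
for the literal ``page_owner=on`` token described above; the function name
``page_owner_requested`` is made up for this example, and the kernel may
accept other spellings that this check does not cover.

```python
def page_owner_requested(cmdline):
    """Return True if the literal token page_owner=on is on the cmdline."""
    # Boot parameters are separated by whitespace, so compare whole tokens
    # instead of doing a substring search.
    return "page_owner=on" in cmdline.split()


# Hypothetical command line used only for illustration.
example = "root=/dev/sda1 ro page_owner=on quiet"
print(page_owner_requested(example))
```

On a running Linux system the string to check can be read from
``/proc/cmdline``.
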
There is one caveat caused by an implementation detail. page owner
stores information in memory from the struct page extension. On sparse
memory systems, this memory is initialized some time after the page
allocator starts, so many pages can be allocated before initialization and
they would have no owner information. To fix this up, these early allocated
pages are examined and marked as allocated during the initialization phase.
Although this doesn't mean that they have the right owner information,
at least we can tell more accurately whether a page is allocated or not.
On an x86-64 VM with 2GB of memory, 13343 early allocated pages
are caught and marked, although they are mostly allocated by the struct
page extension feature. In any case, after that, no page is left in an
untracked state.

Usage
=====

1) Build user-space helper::

	cd tools/mm
	make page_owner_sort

2) Enable page owner: add "page_owner=on" to boot cmdline.

3) Do the job that you want to debug.

4) Analyze information from page owner::

	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
	cat stacks.txt
	 post_alloc_hook+0x177/0x1a0
	 get_page_from_freelist+0xd01/0xd80
	 __alloc_pages+0x39e/0x7e0
	 allocate_slab+0xbc/0x3f0
	 ___slab_alloc+0x528/0x8a0
	 kmem_cache_alloc+0x224/0x3b0
	 sk_prot_alloc+0x58/0x1a0
	 sk_alloc+0x32/0x4f0
	 inet_create+0x427/0xb50
	 __sock_create+0x2e4/0x650
	 inet_ctl_sock_create+0x30/0x180
	 igmp_net_init+0xc1/0x130
	 ops_init+0x167/0x410
	 setup_net+0x304/0xa60
	 copy_net_ns+0x29b/0x4a0
	 create_new_namespaces+0x4a1/0x820
	nr_base_pages: 16
	...
	...
	echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks_7000.txt
	cat stacks_7000.txt
	 post_alloc_hook+0x177/0x1a0
	 get_page_from_freelist+0xd01/0xd80
	 __alloc_pages+0x39e/0x7e0
	 alloc_pages_mpol+0x22e/0x490
	 folio_alloc+0xd5/0x110
	 filemap_alloc_folio+0x78/0x230
	 page_cache_ra_order+0x287/0x6f0
	 filemap_get_pages+0x517/0x1160
	 filemap_read+0x304/0x9f0
	 xfs_file_buffered_read+0xe6/0x1d0 [xfs]
	 xfs_file_read_iter+0x1f0/0x380 [xfs]
	 __kernel_read+0x3b9/0x730
	 kernel_read_file+0x309/0x4d0
	 __do_sys_finit_module+0x381/0x730
	 do_syscall_64+0x8d/0x150
	 entry_SYSCALL_64_after_hwframe+0x62/0x6a
	nr_base_pages: 20824
	...

	cat /sys/kernel/debug/page_owner > page_owner_full.txt
	./page_owner_sort page_owner_full.txt sorted_page_owner.txt

   The general output of ``page_owner_full.txt`` is as follows::

	Page allocated via order XXX, ...
	PFN XXX ...
	// Detailed stack

	Page allocated via order XXX, ...
	PFN XXX ...
	// Detailed stack

   By default, the full range of PFNs is dumped. To start from a given PFN,
   use fseek, which page_owner supports::

	/* pfn_start is the first PFN to dump */
	FILE *fp = fopen("/sys/kernel/debug/page_owner", "r");
	fseek(fp, pfn_start, SEEK_SET);

   The ``page_owner_sort`` tool ignores ``PFN`` rows, stores the remaining
   rows in a buffer (buf), uses a regular expression to extract the page
   order value, counts the occurrences (times) and pages of each buf, and
   finally sorts them according to the given parameter(s).

   The result, showing who allocated each page, is in
   ``sorted_page_owner.txt``. Its general output is::

	XXX times, XXX pages:
	Page allocated via order XXX, ...
	// Detailed stack

   By default, ``page_owner_sort`` sorts by the times of buf.
   If you want to sort by the number of pages of buf, use the ``-m`` parameter.
   The detailed parameters are:

   fundamental function::

	Sort:
		-a		Sort by memory allocation time.
		-m		Sort by total memory.
		-p		Sort by pid.
		-P		Sort by tgid.
		-n		Sort by task command name.
		-r		Sort by memory release time.
		-s		Sort by stack trace.
		-t		Sort by times (default).
		--sort <order>	Specify sorting order.  Sorting syntax is [+|-]key[,[+|-]key[,...]].
				Choose a key from the **STANDARD FORMAT SPECIFIERS** section. The "+" is
				optional since default direction is increasing numerical or lexicographic
				order. Mixed use of abbreviated and complete-form of keys is allowed.

		Examples:
				./page_owner_sort <input> <output> --sort=n,+pid,-tgid
				./page_owner_sort <input> <output> --sort=at

   additional function::

	Cull:
		--cull <rules>
				Specify culling rules. Culling syntax is key[,key[,...]]. Choose a
				multi-letter key from the **STANDARD FORMAT SPECIFIERS** section.

		<rules> is a single argument in the form of a comma-separated list,
		which offers a way to specify individual culling rules.  The recognized
		keywords are described in the **STANDARD FORMAT SPECIFIERS** section below.
		<rules> can be specified by the sequence of keys k1,k2, ..., as described in
		the **STANDARD FORMAT SPECIFIERS** section below. Mixed use of abbreviated and
		complete-form of keys is allowed.

		Examples:
				./page_owner_sort <input> <output> --cull=stacktrace
				./page_owner_sort <input> <output> --cull=st,pid,name
				./page_owner_sort <input> <output> --cull=n,f

	Filter:
		-f		Filter out the information of blocks whose memory has been released.

	Select:
		--pid <pidlist>		Select by pid. This selects the blocks whose process ID
					numbers appear in <pidlist>.
		--tgid <tgidlist>	Select by tgid. This selects the blocks whose thread
					group ID numbers appear in <tgidlist>.
		--name <cmdlist>	Select by task command name. This selects the blocks whose
					task command names appear in <cmdlist>.

		<pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list,
		which offers a way to specify individual selecting rules.

		Examples:
				./page_owner_sort <input> <output> --pid=1
				./page_owner_sort <input> <output> --tgid=1,2,3
				./page_owner_sort <input> <output> --name name1,name2

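The processing that ``page_owner_sort`` is described as doing (drop ``PFN``
rows, keep the rest as buf, extract the order with a regular expression,
count times and pages, then sort) can be sketched roughly as follows. This is
an illustration of the described steps only, not the tool's actual
implementation; the function names, the sample records and the assumption
that an order-N record accounts for 2^N pages are made up for this example.

```python
import re
from collections import defaultdict

ORDER_RE = re.compile(r"order (\d+)")


def aggregate(text):
    """Group identical records; return {buf: [times, pages]}."""
    totals = defaultdict(lambda: [0, 0])
    # Records are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Ignore PFN rows, keep everything else as the record text (buf).
        lines = [l for l in block.splitlines() if not l.startswith("PFN")]
        buf = "\n".join(lines)
        match = ORDER_RE.search(buf)
        if match is None:
            continue  # not a page owner record
        totals[buf][0] += 1
        totals[buf][1] += 1 << int(match.group(1))  # assumed: 2**order pages
    return totals


def sort_records(totals, by_pages=False):
    """Sort by times (default) or by pages, largest first."""
    index = 1 if by_pages else 0
    return sorted(totals.items(), key=lambda kv: kv[1][index], reverse=True)


# Hand-written sample with placeholder fields and made-up stack frames.
sample = """\
Page allocated via order 2, ...
PFN 100 ...
 func_a+0x10/0x20
 func_b+0x30/0x40

Page allocated via order 0, ...
PFN 200 ...
 func_c+0x10/0x20

Page allocated via order 0, ...
PFN 300 ...
 func_c+0x10/0x20
"""

for buf, (times, pages) in sort_records(aggregate(sample)):
    print(f"{times} times, {pages} pages:")
    print(buf)
    print()
```

Passing ``by_pages=True`` to the hypothetical ``sort_records`` helper mirrors
the effect described for the ``-m`` parameter.
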
STANDARD FORMAT SPECIFIERS
==========================
::

  For --sort option:

	KEY		LONG		DESCRIPTION
	p		pid		process ID
	tg		tgid		thread group ID
	n		name		task command name
	st		stacktrace	stack trace of the page allocation
	T		txt		full text of block
	ft		free_ts		timestamp of the page when it was released
	at		alloc_ts	timestamp of the page when it was allocated
	ator		allocator	memory allocator for pages

  For --cull option:

	KEY		LONG		DESCRIPTION
	p		pid		process ID
	tg		tgid		thread group ID
	n		name		task command name
	f		free		whether the page has been released or not
	st		stacktrace	stack trace of the page allocation
	ator		allocator	memory allocator for pages
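
To make the ``--sort`` syntax ``[+|-]key[,[+|-]key[,...]]`` concrete, here is
a small sketch that expands a sort specification into long key names and
directions using the ``--sort`` table above. It only illustrates the syntax;
``parse_sort_spec`` is a made-up helper and does not reflect how
``page_owner_sort`` parses its arguments internally.

```python
# Abbreviated key -> long name, copied from the --sort table.
SORT_KEYS = {
    "p": "pid",
    "tg": "tgid",
    "n": "name",
    "st": "stacktrace",
    "T": "txt",
    "ft": "free_ts",
    "at": "alloc_ts",
    "ator": "allocator",
}
LONG_NAMES = set(SORT_KEYS.values())


def parse_sort_spec(spec):
    """Return a list of (long_key, ascending) pairs for a --sort value."""
    result = []
    for item in spec.split(","):
        ascending = True  # "+" is optional; increasing order is the default
        key = item
        if item[:1] in ("+", "-"):
            ascending = item[0] == "+"
            key = item[1:]
        # Abbreviated and long forms may be mixed.
        long_name = SORT_KEYS.get(key, key)
        if long_name not in LONG_NAMES:
            raise ValueError(f"unknown sort key: {item!r}")
        result.append((long_name, ascending))
    return result


print(parse_sort_spec("n,+pid,-tgid"))
```

The same approach would apply to ``--cull`` values, using the ``--cull`` table
and no direction prefix.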