perf-mem(1)
===========

NAME
----
perf-mem - Profile memory accesses

SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)

DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it into perf.data. Options for perf record are accepted and passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit sampling to loads or stores.

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queuing delays in addition to the memory subsystem latency.

On Arm64 this uses SPE to sample load and store operations, so hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.

On AMD this uses the IBS Op PMU to sample load and store operations.

COMMON OPTIONS
--------------
-f::
--force::
	Don't do ownership validation.

-t::
--type=<type>::
	Select the memory operation type: load or store (default: load,store).

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-p::
--phys-data::
	Record/report sample physical addresses.

--data-page-size::
	Record/report sample data address page size.

RECORD OPTIONS
--------------
<command>...::
	Any command you can specify in a shell.

-e::
--event <event>::
	Event selector. Use 'perf mem record -e list' to list available events.

-K::
--all-kernel::
	Configure all used events to run in kernel space.

-U::
--all-user::
	Configure all used events to run in user space.

--ldlat <n>::
	Specify the desired latency threshold for the loads event. Supported on
	Intel, Arm64 and some AMD processors. Ignored on other architectures.

	On supported AMD processors:
	- The /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
	- Supported latency values are 128 to 2048 (both inclusive).
	- Latency values that are multiples of 128 incur slightly less profiling
	  overhead than other values.
	- Load latency filtering is disabled by default.
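
	As an illustration of the AMD constraints above, the Python sketch below checks whether the IBS Op PMU advertises load latency filtering and whether a proposed --ldlat value is in range and a multiple of 128. The helper names amd_ldlat_supported and check_amd_ldlat are hypothetical; the sketch only restates the rules listed on this page and is not perf's own validation code.

```python
# Hypothetical helpers restating the AMD --ldlat rules listed above;
# this is an illustration, not perf's implementation.
from __future__ import annotations

from pathlib import Path

# sysfs capability file named in this page.
IBS_LDLAT_CAP = Path("/sys/bus/event_source/devices/ibs_op/caps/ldlat")


def amd_ldlat_supported(cap_file: Path = IBS_LDLAT_CAP) -> bool:
    """True if the capability file exists and contains '1'."""
    try:
        return cap_file.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file (no IBS, non-AMD CPU, ...): unsupported.
        return False


def check_amd_ldlat(value: int) -> tuple[bool, bool]:
    """Return (valid, cheaper) for a proposed --ldlat value on AMD.

    valid:   128 <= value <= 2048, both ends inclusive.
    cheaper: valid and a multiple of 128 (slightly less profiling overhead).
    """
    valid = 128 <= value <= 2048
    return valid, valid and value % 128 == 0
```

	For example, check_amd_ldlat(384) returns (True, True), while check_amd_ldlat(200) returns (True, False): 200 is in range but is not a multiple of 128.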

REPORT OPTIONS
--------------
-i::
--input=<file>::
	Input file name.

-C::
--cpu=<cpu>::
	Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -,
	as in 0-2. Default is to monitor all CPUs.
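
	The CPU list syntax accepted by -C can be sketched in Python as follows, assuming entries are separated by commas with no spaces and each entry is either a single CPU or an inclusive "first-last" range. parse_cpu_list is a hypothetical helper for illustration only; it is not perf's parser and performs no validation beyond what int() raises.

```python
from __future__ import annotations


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a -C/--cpu style list such as "0,1" or "0-2" (hypothetical helper).

    Entries are comma-separated with no spaces; each entry is a single
    CPU number or an inclusive "first-last" range.
    """
    cpus: set[int] = set()
    for entry in spec.split(","):
        if "-" in entry:
            first, last = entry.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))  # inclusive range
        else:
            cpus.add(int(entry))
    return sorted(cpus)
```

	With this sketch, parse_cpu_list("0-2,5") yields [0, 1, 2, 5].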

-D::
--dump-raw-samples::
	Dump the raw decoded samples on the screen in a format that is easy to parse,
	with one sample per line.

-s::
--sort=<key>::
	Group result by given key(s); multiple keys can be specified
	in CSV format. The keys specific to memory samples are:
	symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
	dcacheline, phys_daddr, data_page_size, blocked.

	- symbol_daddr: name of the data symbol being accessed at the time of the sample
	- symbol_iaddr: name of the code symbol being executed at the time of the sample
	- dso_daddr: name of the library or module containing the data being accessed
	             at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of TLB access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cache line containing the data address at the time of the sample
	- phys_daddr: physical address of the data being accessed at the time of the sample
	- data_page_size: page size of the data being accessed at the time of the sample
	- blocked: reason the load access for the data was blocked at the time of the sample

	The default sort keys are changed to: local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
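
	The dcacheline key groups samples by the cache line that contains their data address. A minimal Python sketch of that grouping follows; it sums sample weights per line, in line with the OVERHEAD CALCULATION section. The 64-byte line size, the (daddr, weight) sample tuples and the helper names are assumptions for illustration; the real line size depends on the CPU.

```python
from collections import Counter

# Assumption: 64-byte cache lines; the real size depends on the CPU.
CACHELINE_SIZE = 64


def cacheline_of(daddr: int, line_size: int = CACHELINE_SIZE) -> int:
    """Start address of the cache line containing data address daddr."""
    return daddr & ~(line_size - 1)  # line_size must be a power of two


def weight_by_cacheline(samples, line_size: int = CACHELINE_SIZE) -> Counter:
    """Sum sample weights per cache line; samples are (daddr, weight) pairs."""
    totals: Counter = Counter()
    for daddr, weight in samples:
        totals[cacheline_of(daddr, line_size)] += weight
    return totals
```

	Two addresses within the same 64-byte line, such as 0x1000 and 0x103f, land in one group, while 0x1040 starts a new one.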

-F::
--fields=::
	Specify output fields; multiple keys can be specified in CSV format.
	Please see linkperf:perf-report[1] for details.

	In addition to the default fields, 'perf mem report' provides the
	following fields to break down sample periods:

	- op: operation in the sample instruction (load, store, prefetch, ...)
	- cache: location in CPU cache (L1, L2, ...) where the sample hit
	- mem: location in memory or other places where the sample hit
	- dtlb: location in data TLB (L1, L2) where the sample hit
	- snoop: snoop result for the sampled data access

	Please see the OUTPUT FIELD SELECTION section for caveats.

-T::
--type-profile::
	Show data-type profile results instead of code symbols. This requires
	debug information, and it changes the default sort keys to:
	mem, snoop, tlb, type.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-x::
--field-separator=<separator>::
	Specify the field separator used when dumping raw samples (-D option).
	By default, the separator is the space character.

In addition, all perf report options are valid for report, and all perf
record options are valid for record.

OVERHEAD CALCULATION
--------------------
Unlike linkperf:perf-report[1], which calculates overhead from the actual
sample period, perf-mem calculates overhead using the sample weight. For
example, suppose a perf.data file contains two samples with the same sample
period, one with weight 180 and the other with weight 20:

  $ perf script -F period,data_src,weight,ip,sym
  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy

  $ perf report -F overhead,symbol
  50%   [.] strcmp
  50%   [k] memcpy

  $ perf mem report -F overhead,symbol
  90%   [k] memcpy
  10%   [.] strcmp
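
The arithmetic behind the two reports can be reproduced with a short Python sketch that sums either the period or the weight per symbol and converts the totals to percentages. It models only the two samples shown above; the Sample type and overhead function are hypothetical and this is not perf's implementation.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    symbol: str
    period: int
    weight: int


# The two samples from the perf script output above.
SAMPLES = [
    Sample("strcmp", period=100000, weight=20),
    Sample("memcpy", period=100000, weight=180),
]


def overhead(samples, field: str) -> dict[str, float]:
    """Percentage per symbol, summing the named sample field."""
    totals: dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.symbol] += getattr(s, field)
    grand = sum(totals.values())
    return {sym: 100.0 * value / grand for sym, value in totals.items()}


print(overhead(SAMPLES, "period"))  # {'strcmp': 50.0, 'memcpy': 50.0}
print(overhead(SAMPLES, "weight"))  # {'strcmp': 10.0, 'memcpy': 90.0}
```

Equal periods split the overhead evenly, while weighting by 20 and 180 out of a total of 200 gives the 10% and 90% shown by perf mem report.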

OUTPUT FIELD SELECTION
----------------------
"perf mem report" adds a number of new output fields specific to data source
information in the sample. Some of them have the same names as existing
sort keys ("mem" and "snoop"), so, unlike other fields and sort keys, they
behave differently depending on whether they are used with -F/--fields or
-s/--sort.

Using those two as output fields aggregates all samples together and shows
a breakdown:

  $ perf mem report -F mem,snoop
  ...
  # ------ Memory -------  --- Snoop ----
  #     RAM Uncach  Other     HitM  Other
  # .....................  ..............
  #
       3.5%   0.0%  96.5%    25.1%  74.9%

Using the same names as sort keys, however, aggregates samples for each type
separately:

  $ perf mem report -s mem,snoop
  # Overhead       Samples  Memory access                            Snoop
  # ........  ............  .......................................  ............
  #
      47.99%          1509  L2 hit                                   N/A
      25.08%           338  core, same node Any cache hit            HitM
      10.24%         54374  N/A                                      N/A
       6.77%         35938  L1 hit                                   N/A
       6.39%           101  core, same node Any cache hit            N/A
       3.50%            69  RAM hit                                  N/A
       0.03%           158  LFB/MAB hit                              N/A
       0.00%             2  Uncached hit                             N/A
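
The difference between the two modes can be sketched in Python: as output fields, each sample's weight is added to one column per field, giving a single aggregated row; as sort keys, weights are summed per distinct (memory access, snoop) pair, giving one row per combination. The sample tuples and the mapping of memory levels onto the RAM/Uncach/Other and HitM/Other columns are assumptions modelled on the output above, not perf's actual classification logic.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical samples: (memory access, snoop result, weight).
SAMPLES = [
    ("L1 hit", "N/A", 50),
    ("RAM hit", "N/A", 30),
    ("L2 hit", "HitM", 20),
]

# Assumed column mapping modelled on the -F mem,snoop header above;
# anything not listed falls into "Other".
MEM_COLUMNS = {"RAM hit": "RAM", "Uncached hit": "Uncach"}
SNOOP_COLUMNS = {"HitM": "HitM"}


def percent(totals: Counter) -> dict:
    """Convert summed weights into percentages of the grand total."""
    grand = sum(totals.values())
    return {key: 100.0 * value / grand for key, value in totals.items()}


def as_output_fields(samples):
    """-F mem,snoop style: one aggregate row, each field split into columns."""
    mem, snoop = Counter(), Counter()
    for mem_level, snoop_result, weight in samples:
        mem[MEM_COLUMNS.get(mem_level, "Other")] += weight
        snoop[SNOOP_COLUMNS.get(snoop_result, "Other")] += weight
    return percent(mem), percent(snoop)


def as_sort_keys(samples):
    """-s mem,snoop style: one row per distinct (memory access, snoop) pair."""
    rows = Counter()
    for mem_level, snoop_result, weight in samples:
        rows[(mem_level, snoop_result)] += weight
    return percent(rows)
```

With these samples, as_output_fields reports RAM 30% / Other 70% and HitM 20% / Other 80% in a single row, whereas as_sort_keys yields three separate rows, one per (memory access, snoop) pair.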

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]