perf/Documentation/perf-stat.txt

1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 	- a symbolic event name (use 'perf list' to list all events)
39 	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
42         - a symbolic or raw PMU event followed by an optional colon
43 	  and a list of event modifiers, e.g., cpu-cycles:p.  See the
44 	  linkperf:perf-list[1] man page for details on event modifiers.
46 	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
52 	  perf stat -A -a -e cpu/event,percore=1/,otherevent ...
54 	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
67 -i::
68 --no-inherit::
70 -p::
71 --pid=<pid>::
74 -t::
75 --tid=<tid>::
79 --pfm-events events::
81 including support for event filters. For example '--pfm-events
84 events cannot be mixed together. The latter must be used with the -e
85 option. The -e option and this one can be mixed and matched.  Events
89 -a::
90 --all-cpus::
91         system-wide collection from all CPUs (default if no target is specified)
93 --no-scale::
96 -d::
97 --detailed::
100 	   -d:          detailed events, L1 and LLC data cache
101         -d -d:     more detailed events, dTLB and iTLB events
102      -d -d -d:     very detailed events, adding prefetch events
104 -r::
105 --repeat=<n>::
108 -B::
109 --big-num::
111 	Enabled by default. Use "--no-big-num" to disable.
112 	Default setting can be changed with "perf config stat.big-num=false".
114 -C::
115 --cpu=::
117 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
118 In per-thread mode, this option is ignored. The -a option is still necessary
119 to activate system-wide monitoring. Default is to count on all CPUs.
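The list syntax accepted by -C can be illustrated with a small shell sketch. The helper name `expand_cpu_list` is hypothetical and not part of perf; it only shows how a specification such as 0,2-4 maps to individual CPU numbers, assuming well-formed input.

```shell
# Hypothetical helper (not part of perf): expand a CPU list in the
# -C/--cpu syntax, e.g. "0,2-4", into one CPU number per line.
expand_cpu_list() {
    # Split on commas, then split each element on "-" into first/last.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single CPU ("3") has no upper bound; treat it as the range 3-3.
        seq "$first" "${last:-$first}"
    done
}

expand_cpu_list "0,2-4"
```

With the input above, the sketch lists CPUs 0, 2, 3 and 4, which is the set that 'perf stat -a -C 0,2-4' would be asked to count on.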
121 -A::
122 --no-aggr::
125 -n::
126 --null::
127         null run - don't start any counters
129 -v::
130 --verbose::
133 -x SEP::
134 --field-separator SEP::
135 print counts using a CSV-style output to make it easy to import directly into
138 --table:: Display time for each run (-r option), in a table format, e.g.:
140   $ perf stat --null -r 5 --table perf bench sched pipe
145              5.189 (-0.293) #
146              5.189 (-0.294) #
147              5.186 (-0.296) #
152              5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )
154 -G name::
155 --cgroup name::
157 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
161 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
164 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
167 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
169 --for-each-cgroup name::
171 by comma).  This has the same effect as repeating the -e option and -G option for
172 each event x name.  This option cannot be used with -G/--cgroup option.
174 -o file::
175 --output file::
178 --append::
179 Append to the output file designated with the -o option. Ignored if -o is not specified.
181 --log-fd::
183 Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive
184 with it.  --append may be used here.  Examples:
185      3>results  perf stat --log-fd 3          -- $cmd
186      3>>results perf stat --log-fd 3 --append -- $cmd
188 --control=fifo:ctl-fifo[,ack-fifo]::
189 --control=fd:ctl-fd[,ack-fd]::
190 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
191 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
193 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
202  test -p ${ctl_fifo} && unlink ${ctl_fifo}
207  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
211  perf stat -D -1 -e cpu-cycles -a -I 1000       \
212            --control fd:${ctl_fd},${ctl_fd_ack} \
213            -- sleep 30 &
216  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
217  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
219  exec {ctl_fd_ack}>&-
222  exec {ctl_fd}>&-
225  wait -n ${perf_pid}
229 --pre::
230 --post::
233 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defco…
235 -I msecs::
236 --interval-print msecs::
239 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
243 --interval-count times::
245 This option should be used together with the "-I" option.
246 	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
248 --interval-clear::
251 --timeout msecs::
253 This option is not supported with the "-I" option.
254 	example: 'perf stat --timeout 2000 -e cycles -a'
256 --metric-only::
258 Don't show any raw values. Not supported with --per-thread.
260 --per-socket::
261 Aggregate counts per processor socket for system-wide mode measurements.  This
263 use --per-socket in addition to -a (system-wide).  The output includes the
267 --per-die::
268 Aggregate counts per processor die for system-wide mode measurements.  This
270 use --per-die in addition to -a (system-wide).  The output includes the
274 --per-core::
275 Aggregate counts per physical core for system-wide mode measurements.  This
277 use --per-core in addition to -a (system-wide).  The output includes the
280 --per-thread::
281 Aggregate counts per monitored thread, when monitoring threads (-t option)
282 or processes (-p option).
284 --per-node::
285 Aggregate counts per NUMA node for system-wide mode measurements. This
287 mode, use --per-node in addition to -a (system-wide).
289 -D msecs::
290 --delay msecs::
291 After starting the program, wait msecs before measuring (-1: start with events
295 -T::
296 --transaction::
300 --metric-no-group::
303 --metric-no-group option places events outside of groups and may
304 increase the chance of the event being scheduled - leading to more
306 for metrics like instructions per cycle can be lower - as both metrics
309 --metric-no-merge::
320 -----------
323 -o file::
324 --output file::
328 -----------
331 -i file::
332 --input file::
335 --per-socket::
336 Aggregate counts per processor socket for system-wide mode measurements.
338 --per-die::
339 Aggregate counts per processor die for system-wide mode measurements.
341 --per-core::
342 Aggregate counts per physical core for system-wide mode measurements.
344 -M::
345 --metrics::
351 -A::
352 --no-aggr::
355 --topdown::
358 by breaking the cycles consumed down into frontend bound, backend bound,
362 enough. Backend bound means that computation or memory access is the bottle
369 mode like -I 1000, as the bottleneck of workloads can change often.
371 This enables --metric-only, unless overridden with --no-metric-only.
378 and -a (global monitoring) is needed, requiring root rights or
379 kernel.perf_event_paranoid=-1.
391 --no-merge::
404 --smi-cost::
410 The cost of SMI can be measured by (aperf - unhalted core cycles).
413 oriented analysis. --metric-only will be applied by default.
414 The output is SMI cycles%, equal to (aperf - unhalted core cycles) / aperf
416 Users who want to get the actual value can apply --no-metric-only.
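The SMI cycles% formula can be worked through by hand. The sketch below is only an illustration of the arithmetic: the function name `smi_cycles_percent` is hypothetical and the two counter readings are invented, not measured.

```shell
# Invented counter readings, chosen only to illustrate the arithmetic.
aperf=1000000
unhalted_core_cycles=950000

# Hypothetical helper: SMI cycles% = (aperf - unhalted core cycles) / aperf
smi_cycles_percent() {
    # $1 = aperf, $2 = unhalted core cycles
    awk -v a="$1" -v u="$2" 'BEGIN { printf "%.1f\n", (a - u) * 100 / a }'
}

smi_cycles_percent "$aperf" "$unhalted_core_cycles"   # prints 5.0
```

Here 50,000 of the 1,000,000 aperf cycles are unaccounted for by unhalted core cycles, so the sketch reports 5.0 percent.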
418 --all-kernel::
421 --all-user::
424 --percore-show-thread::
433 --summary::
434 Print summary for interval mode (-I).
437 --------
439 $ perf stat -- make
443         83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
444                    0      context-switches:u        #    0.000 K/sec
445                    0      cpu-migrations:u          #    0.000 K/sec
446            3,228,188      page-faults:u             #    0.039 M/sec
450        2,078,861,393      branch-misses:u           #    2.98% of all branches
458 -------
459 As the example above shows, perf stat can display three types of timings.
460 We always display the time the counters were enabled/alive:
464 For workload sessions we also display time the workloads spent in
473 ----------
475 With -x, perf stat can produce not-quite-CSV output
477 it is recommended to use a different character like -x \;
481 	- optional usec time stamp in fractions of a second (with -I xxx)
482 	- optional CPU, core, or socket identifier
483 	- optional number of logical CPUs aggregated
484 	- counter value
485 	- unit of the counter value or empty
486 	- event name
487 	- run time of counter
488 	- percentage of measurement time the counter was running
489 	- optional variance if multiple values are collected with -r
490 	- optional metric value
491 	- optional unit of metric
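The field order above can be consumed with standard text tools. The following is a minimal sketch, assuming output produced with '-x \;' and without -I or any aggregation option, so that the optional leading fields are absent and the counter value is the first field. The function name `parse_perf_csv` is hypothetical and the sample line contains invented values.

```shell
# Hypothetical helper: read 'perf stat -x \;' style lines on stdin and
# print "event=value (running pct%)" for each counter line.
# Assumed layout: value;unit;event;run time;percentage[;...]
parse_perf_csv() {
    awk -F';' '
        /^#/ || NF < 5 { next }   # skip comment lines and short lines
        { printf "%s=%s (running %s%%)\n", $3, $1, $5 }
    '
}

# Invented sample line, for illustration only.
printf '%s\n' '1234567;;cycles;1000000;100.00' | parse_perf_csv
```

When -I or an aggregation option such as -A or --per-socket is in use, the optional leading fields are present and the awk field numbers in this sketch would need to be shifted accordingly.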
496 --------
497 linkperf:perf-top[1], linkperf:perf-list[1]