perf/Documentation/perf-stat.txt

1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 	- a symbolic event name (use 'perf list' to list all events)
39 	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
42         - a symbolic or raw PMU event followed by an optional colon
43 	  and a list of event modifiers, e.g., cpu-cycles:p.  See the
44 	  linkperf:perf-list[1] man page for details on event modifiers.
46 	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
52 	  perf stat -A -a -e cpu/event,percore=1/,otherevent ...
54 	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
67 -i::
68 --no-inherit::
70 -p::
71 --pid=<pid>::
74 -t::
75 --tid=<tid>::
79 --pfm-events events::
81 including support for event filters. For example '--pfm-events
84 events cannot be mixed together. The latter must be used with the -e
85 option. The -e option and this one can be mixed and matched.  Events
89 -a::
90 --all-cpus::
91         system-wide collection from all CPUs (default if no target is specified)
93 --no-scale::
96 -d::
97 --detailed::
100 	   -d:          detailed events, L1 and LLC data cache
101         -d -d:     more detailed events, dTLB and iTLB events
102      -d -d -d:     very detailed events, adding prefetch events
104 -r::
105 --repeat=<n>::
108 -B::
109 --big-num::
111 	Enabled by default. Use "--no-big-num" to disable.
112 	Default setting can be changed with "perf config stat.big-num=false".
114 -C::
115 --cpu=::
117 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
118 In per-thread mode, this option is ignored. The -a option is still necessary
119 to activate system-wide monitoring. Default is to count on all CPUs.
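The list syntax accepted by -C can be illustrated with a small helper. This is a minimal sketch rather than anything perf ships: `expand_cpu_list` is a hypothetical function name, and it assumes a well-formed list of non-negative CPU numbers and ascending ranges.

```shell
# Hypothetical helper: expand a -C style CPU list ("0,2-4") into the
# individual CPU numbers it selects, separated by spaces.
expand_cpu_list() {
    list=$1
    out=""
    old_ifs=$IFS
    IFS=','                      # split the list on commas
    for part in $list; do
        case $part in
            *-*)                 # a range such as 2-4
                cpu=${part%-*}
                last=${part#*-}
                while [ "$cpu" -le "$last" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *)                   # a single CPU number
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"              # drop the leading space
}

expand_cpu_list 0,2-4            # prints: 0 2 3 4
```

A list such as 0,2-4 therefore selects four CPUs, which is what a command like 'perf stat -a -C 0,2-4 ...' would count on.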
121 -A::
122 --no-aggr::
125 -n::
126 --null::
127         null run - don't start any counters
129 -v::
130 --verbose::
133 -x SEP::
134 --field-separator SEP::
135 print counts using a CSV-style output to make it easy to import directly into
138 --table:: Display time for each run (-r option), in a table format, e.g.:
140   $ perf stat --null -r 5 --table perf bench sched pipe
145              5.189 (-0.293) #
146              5.189 (-0.294) #
147              5.186 (-0.296) #
152              5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )
154 -G name::
155 --cgroup name::
157 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
161 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
164 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
167 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
169 --for-each-cgroup name::
171 by comma).  This has the same effect as repeating the -e and -G options for
172 each event x name.  This option cannot be used with -G/--cgroup option.
174 -o file::
175 --output file::
178 --append::
179 Append to the output file designated with the -o option. Ignored if -o is not specified.
181 --log-fd::
183 Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive
184 with it.  --append may be used here.  Examples:
185      3>results  perf stat --log-fd 3          -- $cmd
186      3>>results perf stat --log-fd 3 --append -- $cmd
188 --control=fifo:ctl-fifo[,ack-fifo]::
189 --control=fd:ctl-fd[,ack-fd]::
190 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
191 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
193 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
202  test -p ${ctl_fifo} && unlink ${ctl_fifo}
207  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
211  perf stat -D -1 -e cpu-cycles -a -I 1000       \
212            --control fd:${ctl_fd},${ctl_fd_ack} \
213            -- sleep 30 &
216  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
217  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
219  exec {ctl_fd_ack}>&-
222  exec {ctl_fd}>&-
225  wait -n ${perf_pid}
229 --pre::
230 --post::
233 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defco…
235 -I msecs::
236 --interval-print msecs::
237 Print count deltas every N milliseconds (minimum: 1ms).
238 The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. …
239 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
243 --interval-count times::
245 This option should be used together with the "-I" option.
246 	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
248 --interval-clear::
251 --timeout msecs::
252 Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
253 This option is not supported with the "-I" option.
254 	example: 'perf stat --timeout 2000 -e cycles -a'
256 --metric-only::
258 Don't show any raw values. Not supported with --per-thread.
260 --per-socket::
261 Aggregate counts per processor socket for system-wide mode measurements.  This
263 use --per-socket in addition to -a (system-wide).  The output includes the
267 --per-die::
268 Aggregate counts per processor die for system-wide mode measurements.  This
270 use --per-die in addition to -a (system-wide).  The output includes the
274 --per-core::
275 Aggregate counts per physical core for system-wide mode measurements.  This
277 use --per-core in addition to -a (system-wide).  The output includes the
280 --per-thread::
281 Aggregate counts per monitored thread, when monitoring threads (-t option)
282 or processes (-p option).
284 --per-node::
285 Aggregate counts per NUMA node for system-wide mode measurements. This
287 mode, use --per-node in addition to -a (system-wide).
289 -D msecs::
290 --delay msecs::
291 After starting the program, wait msecs before measuring (-1: start with events
292 disabled). This is useful to filter out the startup phase of the program,
295 -T::
296 --transaction::
300 --metric-no-group::
303 --metric-no-group option places events outside of groups and may
304 increase the chance of the event being scheduled - leading to more
306 for metrics like instructions per cycle can be lower - as both metrics
307 may no longer be measured at the same time.
309 --metric-no-merge::
320 -----------
323 -o file::
324 --output file::
328 -----------
331 -i file::
332 --input file::
335 --per-socket::
336 Aggregate counts per processor socket for system-wide mode measurements.
338 --per-die::
339 Aggregate counts per processor die for system-wide mode measurements.
341 --per-core::
342 Aggregate counts per physical core for system-wide mode measurements.
344 -M::
345 --metrics::
351 -A::
352 --no-aggr::
355 --topdown::
369 mode like -I 1000, as the bottleneck of workloads can change often.
371 This enables --metric-only, unless overridden with --no-metric-only.
378 and -a (global monitoring) is needed, requiring root rights or
379 kernel.perf_event_paranoid=-1.
391 --no-merge::
404 --smi-cost::
410 The cost of SMI can be measured by (aperf - unhalted core cycles).
413 oriented analysis. --metric-only will be applied by default.
414 The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf
416 Users who want to get the actual value can apply --no-metric-only.
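The SMI cycles% formula above can be worked through by hand. The sketch below uses made-up counter values chosen only to show the arithmetic. They are not measurements, and `smi_cycles_pct` is a hypothetical helper, not part of perf.

```shell
# Hypothetical counter values, for illustration only.
aperf=1000000
unhalted_core_cycles=950000

# SMI cycles% = (aperf - unhalted core cycles) / aperf, shown as a percentage.
smi_cycles_pct() {
    awk -v a="$1" -v u="$2" 'BEGIN { printf "%.1f%%\n", (a - u) * 100 / a }'
}

smi_cycles_pct "$aperf" "$unhalted_core_cycles"   # prints: 5.0%
```

With these values, 50000 of the 1000000 aperf cycles are not accounted for by unhalted core cycles, so 5.0% of the cycles are attributed to SMI.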
418 --all-kernel::
421 --all-user::
424 --percore-show-thread::
433 --summary::
434 Print summary for interval mode (-I).
437 --------
439 $ perf stat -- make
443         83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
444                    0      context-switches:u        #    0.000 K/sec
445                    0      cpu-migrations:u          #    0.000 K/sec
446            3,228,188      page-faults:u             #    0.039 M/sec
450        2,078,861,393      branch-misses:u           #    2.98% of all branches
452         83.409183620 seconds time elapsed
458 -------
460 We always display the time the counters were enabled/alive:
462         83.409183620 seconds time elapsed
464 For workload sessions we also display the time the workloads spent in
470 These times are the same as those displayed by the 'time' tool.
473 ----------
475 With -x, perf stat can produce not-quite-CSV output
477 it is recommended to use a different character like -x \;
481 	- optional timestamp in seconds, with microsecond resolution (with -I xxx)
482 	- optional CPU, core, or socket identifier
483 	- optional number of logical CPUs aggregated
484 	- counter value
485 	- unit of the counter value or empty
486 	- event name
487 	- run time of counter
488 	- percentage of measurement time the counter was running
489 	- optional variance if multiple values are collected with -r
490 	- optional metric value
491 	- optional unit of metric
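The field order listed above can be put to use with a short awk filter. This is a minimal sketch under stated assumptions: the sample line is hypothetical rather than captured from a real run, `parse_stat_csv` is a made-up helper name, and the input is assumed to come from 'perf stat -x \;' without -I or any per-CPU/per-socket mode, so that the counter value is the first field.

```shell
# Hypothetical helper: read ';'-separated perf stat lines on stdin and
# print the event name, counter value, unit and running percentage.
# Assumes no optional leading timestamp/CPU fields are present.
parse_stat_csv() {
    awk -F';' '{
        unit = ($2 == "" ? "-" : $2)   # the unit field may be empty
        printf "event=%s value=%s unit=%s running=%s%%\n", $3, $1, unit, $5
    }'
}

# Hypothetical sample line: value;unit;event;run time;percentage
printf '%s\n' '1500;;cycles;2000000;100.00' | parse_stat_csv
# prints: event=cycles value=1500 unit=- running=100.00%
```

When -I or an aggregation mode is in use, the optional leading fields shift every later field to the right, so the field numbers in the awk program would need adjusting.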
496 --------
497 linkperf:perf-top[1], linkperf:perf-list[1]