perf/Documentation/perf-stat.txt

39 	- a raw PMU event in the form of rN where N is a hexadecimal value
52 	  'percore' is an event qualifier that sums up the event counts for both
66 	'uncore_' is also ignored when performing this match.
109 	The default path is /sys/fs/bpf/perf_attr_map.
124         system-wide collection from all CPUs (default if no target is specified)
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
192 monitor only in the container (cgroup) called "name". This option is available only
195 can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
196 to first event, second cgroup to second event and so on. It is possible to provide
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
266 --pre::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
278 If the metric exists, it is calculated by the counts generated in this interval and the metric is p…
290 This option is not supported with the "-I" option.
299 is a useful mode to detect imbalance between sockets.  To enable this mode,
301 socket number and the number of online processors on that socket. This is
306 is a useful mode to detect imbalance between dies.  To enable this mode,
308 die number and the number of online processors on that die. This is
313 is a useful mode to detect imbalance between clusters.  To enable this mode,
315 cluster number and the number of online processors on that cluster. This is
329 is a useful mode to detect imbalance between physical cores.  To enable this mode,
339 is a useful mode to detect imbalance between NUMA nodes. To enable this
345 disabled). This is useful to filter out the startup phase of the program,
346 which is often very different.
362 --metric-no-merge::
367 group is that the group may require multiplexing and so accuracy for a
368 small group that need not have multiplexing is lowered. This option
378 compute the threshold then the threshold is still computed and used to
382 Don't print output, warnings or messages. This is useful with perf stat
428 	When threshold information is available for a metric, the
429 	color red is used to signify a metric has exceeded a threshold
436 --no-merge::
437 Do not aggregate/merge counts across monitored CPUs or PMUs.
448    opened on each thread and aggregation is performed across them.
450 2. Prefix or glob wildcard matching is used for the PMU name. For
453    combined if the PMU is specified without the suffix such as
459 --hybrid-merge::
472 enough. Backend bound means that computation or memory access is the bottle
475 an apparent bottleneck. The bottleneck is only the real bottleneck
476 if the workload is actually bound by the CPU and not by something else.
478 For best results it is usually a good idea to use it with interval
487 CPU thread. Per core mode is automatically enabled
488 and -a (global monitoring) is needed, requiring root rights or
497 To interpret the results it is usually necessary to know on which
504 at runtime. Currently, a zero value is assigned to the retire_latency event when
505 this option is not set. The TPEBS hardware feature starts from Intel Granite
506 Rapids microarchitecture. This option only exists in X86_64 and is meaningful on
522 Error out if the input is higher than the supported max level.
532 In practice, the percentage of SMI cycles is very useful for performance
534 The output is SMI cycles%, equal to (aperf - unhalted core cycles) / aperf
550 hardware thread. This is essentially a replacement for the any bit and
608 With -x, perf stat is able to produce not-quite-CSV format output
610 it is recommended to use a different character like -x \;
633 With -j, perf stat is able to print out JSON format output
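
The -j line above only says that JSON format output is available. As a rough sketch of how a script might consume it, the Python below assumes one JSON object per output line and assumes key names such as "event" and "counter-value". Neither assumption is confirmed by the matched lines, and the sample records are invented for illustration, so check both against real output before relying on this.

```python
import json

# Illustrative sample only: these records are invented for this sketch, and
# the key names ("counter-value", "unit", "event") are assumptions rather than
# something confirmed by the lines listed above.
SAMPLE = """\
{"counter-value" : "1234.000000", "unit" : "", "event" : "cycles"}
{"counter-value" : "567.000000", "unit" : "", "event" : "instructions"}
"""


def parse_json_lines(text):
    """Parse one JSON object per non-empty line into a list of dicts."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        records.append(json.loads(line))
    return records


def counts_by_event(records):
    """Map each event name to its counter value as a float."""
    return {rec["event"]: float(rec["counter-value"]) for rec in records}


records = parse_json_lines(SAMPLE)
counts = counts_by_event(records)
print(counts)
```

Parsing line by line rather than calling json.loads on the whole text keeps the sketch usable when records are emitted one at a time, for example during interval printing.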