-------------------------
There are two notions of time: wall-clock time and CPU time.
For a single-threaded program, or a program running on a single-core machine,
these notions are the same. However, for a multi-threaded/multi-process program
running on a multi-core machine, these notions are significantly different:
for each second of wall-clock time we get number-of-cores seconds of CPU time.
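As a minimal sketch of this relationship (the `cpu_seconds` helper is hypothetical, not part of perf):

```python
# Hypothetical helper: on a machine that keeps all cores busy, each second
# of wall-clock time accounts for `cores` seconds of CPU time.
def cpu_seconds(wall_clock_seconds: float, cores: int) -> float:
    """Maximum CPU time accumulated during a wall-clock interval."""
    return wall_clock_seconds * cores

print(cpu_seconds(1.0, 8))     # a fully busy 8-core machine: 8.0 CPU-seconds
print(cpu_seconds(10.0, 128))  # 10 wall-clock seconds on 128 cores: 1280.0
```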
Perf can report both notions, in the 'overhead' and
'latency' columns for CPU and wall-clock time respectively.
The former may be useful to optimize
CPU utilization, while the latter may be useful to improve user-perceived
latency.
consider a program that executes function 'foo' for 9 seconds with 1 thread,
and then executes function 'bar' for 1 second with 128 threads (consuming
128 seconds of CPU time). The CPU overhead is 6.6% for 'foo' and 93.4% for
'bar', while the latency overhead is 90% for 'foo' and 10% for 'bar'.
If we try to optimize based on the CPU overhead, we would concentrate on the
function 'bar', but it can yield only a 10% running time improvement at best.
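The arithmetic behind these percentages can be reproduced with a short sketch (the variable names are ours; the numbers come from the example above):

```python
# Numbers from the example: 'foo' runs 9s on 1 thread, 'bar' 1s on 128 threads.
foo_wall, foo_threads = 9.0, 1
bar_wall, bar_threads = 1.0, 128

foo_cpu = foo_wall * foo_threads   # 9 CPU-seconds
bar_cpu = bar_wall * bar_threads   # 128 CPU-seconds
total_cpu = foo_cpu + bar_cpu      # 137 CPU-seconds
total_wall = foo_wall + bar_wall   # 10 wall-clock seconds

# CPU overhead: share of total CPU time (what perf reports by default).
print(f"foo: {100 * foo_cpu / total_cpu:.1f}%, bar: {100 * bar_cpu / total_cpu:.1f}%")
# -> foo: 6.6%, bar: 93.4%

# Latency overhead: share of total wall-clock time ('perf report --latency').
print(f"foo: {100 * foo_wall / total_wall:.0f}%, bar: {100 * bar_wall / total_wall:.0f}%")
# -> foo: 90%, bar: 10%
```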
Use 'perf record --latency' and 'perf report':

-----------------------------------
  0.99%  10.16%  dpkg-deb
-----------------------------------

To sort by latency overhead, use 'perf report --latency':

-----------------------------------
 10.16%   0.99%  dpkg-deb
-----------------------------------
A parallelization histogram is available via the
'--sort=latency,parallelism,comm,symbol --hierarchy' flags.
It shows the fraction of (wall-clock) time the workload utilizes
different parallelism levels. The example workload has both serial and
highly-parallel phases, which explains the significant difference between
CPU and wall-clock overheads:
-----------------------------------

-----------------------------------

-----------------------------------
-  56.98%   2.29%  1
     2.43%   0.10%  dpkg-source
     2.10%   0.08%  dpkg-genchanges
-----------------------------------
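Conceptually, the parallelism histogram answers "what fraction of wall-clock time ran at each parallelism level". A toy sketch of that computation (not perf's actual implementation; the interval data is invented, reusing the shape of the earlier 'foo'/'bar' example):

```python
from collections import defaultdict

# Invented sample: (wall-clock seconds, number of threads actively running).
intervals = [(9.0, 1), (1.0, 128)]

time_at_level = defaultdict(float)
for seconds, parallelism in intervals:
    time_at_level[parallelism] += seconds
total_wall = sum(seconds for seconds, _ in intervals)

for level in sorted(time_at_level):
    share = 100 * time_at_level[level] / total_wall
    print(f"parallelism {level:>3}: {share:.0f}% of wall-clock time")
# -> parallelism   1: 90% of wall-clock time
# -> parallelism 128: 10% of wall-clock time
```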
To see the normal function-level profile for particular parallelism levels
(number of threads actively running on CPUs), you may use the '--parallelism'
filter. For example, to profile the low-parallelism phases
of a workload use '--latency --parallelism=1-2' flags.