1		ftrace - Function Tracer
2		========================
3
4Copyright 2008 Red Hat Inc.
5   Author:   Steven Rostedt <srostedt@redhat.com>
6  License:   The GNU Free Documentation License, Version 1.2
7               (dual licensed under the GPL v2)
8Reviewers:   Elias Oltmanns, Randy Dunlap, Andrew Morton,
9	     John Kacur, and David Teigland.
10Written for: 2.6.28-rc2
11
12Introduction
13------------
14
15Ftrace is an internal tracer designed to help out developers and
16designers of systems to find what is going on inside the kernel.
17It can be used for debugging or analyzing latencies and
18performance issues that take place outside of user-space.
19
20Although ftrace is the function tracer, it also includes an
21infrastructure that allows for other types of tracing. Some of
22the tracers that are currently in ftrace include a tracer to
23trace context switches, the time it takes for a high priority
24task to run after it was woken up, the time interrupts are
25disabled, and more (ftrace allows for tracer plugins, which
26means that the list of tracers can always grow).
27
28
29Implementation Details
30----------------------
31
32See ftrace-design.txt for details for arch porters and such.
33
34
35The File System
36---------------
37
38Ftrace uses the debugfs file system to hold the control files as
39well as the files to display output.
40
41When debugfs is configured into the kernel (which selecting any ftrace
42option will do) the directory /sys/kernel/debug will be created. To mount
43this directory, you can add to your /etc/fstab file:
44
45 debugfs       /sys/kernel/debug          debugfs defaults        0       0
46
47Or you can mount it at run time with:
48
49 mount -t debugfs nodev /sys/kernel/debug
50
51For quicker access to that directory you may want to make a soft link to
52it:
53
54 ln -s /sys/kernel/debug /debug
55
56Any selected ftrace option will also create a directory called tracing
57within the debugfs. The rest of the document will assume that you are in
58the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate
59on the files within that directory and not distract from the content with
60the extended "/sys/kernel/debug/tracing" path name.
61
62That's it! (assuming that you have ftrace configured into your kernel)
63
64After mounting the debugfs, you can see a directory called
65"tracing".  This directory contains the control and output files
66of ftrace. Here is a list of some of the key files:
67
68
69 Note: all time values are in microseconds.
70
71  current_tracer:
72
73	This is used to set or display the current tracer
74	that is configured.
75
76  available_tracers:
77
78	This holds the different types of tracers that
79	have been compiled into the kernel. The
80	tracers listed here can be configured by
81	echoing their name into current_tracer.
82
83  tracing_on:
84
85	This sets or displays whether writing to the trace
86	ring buffer is enabled. Echo 0 into this file to disable
87	the tracer or 1 to enable it.
88
89  trace:
90
91	This file holds the output of the trace in a human
92	readable format (described below).
93
94  trace_pipe:
95
96	The output is the same as the "trace" file but this
97	file is meant to be streamed with live tracing.
98	Reads from this file will block until new data is
99	retrieved.  Unlike the "trace" file, this file is a
100	consumer. This means reading from this file causes
101	sequential reads to display more current data. Once
102	data is read from this file, it is consumed, and
103	will not be read again with a sequential read. The
104	"trace" file is static, and if the tracer is not
105	adding more data,they will display the same
106	information every time they are read.
107
108  trace_options:
109
110	This file lets the user control the amount of data
111	that is displayed in one of the above output
112	files.
113
114  tracing_max_latency:
115
116	Some of the tracers record the max latency.
117	For example, the time interrupts are disabled.
118	This time is saved in this file. The max trace
119	will also be stored, and displayed by "trace".
120	A new max trace will only be recorded if the
121	latency is greater than the value in this
122	file. (in microseconds)
123
124  buffer_size_kb:
125
126	This sets or displays the number of kilobytes each CPU
127	buffer can hold. The tracer buffers are the same size
128	for each CPU. The displayed number is the size of the
129	CPU buffer and not total size of all buffers. The
130	trace buffers are allocated in pages (blocks of memory
131	that the kernel uses for allocation, usually 4 KB in size).
132	If the last page allocated has room for more bytes
133	than requested, the rest of the page will be used,
134	making the actual allocation bigger than requested.
135	( Note, the size may not be a multiple of the page size
136	  due to buffer management overhead. )
137
138	This can only be updated when the current_tracer
139	is set to "nop".
140
141  tracing_cpumask:
142
143	This is a mask that lets the user only trace
144	on specified CPUS. The format is a hex string
145	representing the CPUS.
146
147  set_ftrace_filter:
148
149	When dynamic ftrace is configured in (see the
150	section below "dynamic ftrace"), the code is dynamically
151	modified (code text rewrite) to disable calling of the
152	function profiler (mcount). This lets tracing be configured
153	in with practically no overhead in performance.  This also
154	has a side effect of enabling or disabling specific functions
155	to be traced. Echoing names of functions into this file
156	will limit the trace to only those functions.
157
158	This interface also allows for commands to be used. See the
159	"Filter commands" section for more details.
160
161  set_ftrace_notrace:
162
163	This has an effect opposite to that of
164	set_ftrace_filter. Any function that is added here will not
165	be traced. If a function exists in both set_ftrace_filter
166	and set_ftrace_notrace,	the function will _not_ be traced.
167
168  set_ftrace_pid:
169
170	Have the function tracer only trace a single thread.
171
172  set_graph_function:
173
174	Set a "trigger" function where tracing should start
175	with the function graph tracer (See the section
176	"dynamic ftrace" for more details).
177
178  available_filter_functions:
179
180	This lists the functions that ftrace
181	has processed and can trace. These are the function
182	names that you can pass to "set_ftrace_filter" or
183	"set_ftrace_notrace". (See the section "dynamic ftrace"
184	below for more details.)
185
186
187The Tracers
188-----------
189
190Here is the list of current tracers that may be configured.
191
192  "function"
193
194	Function call tracer to trace all kernel functions.
195
196  "function_graph"
197
198	Similar to the function tracer except that the
199	function tracer probes the functions on their entry
200	whereas the function graph tracer traces on both entry
201	and exit of the functions. It then provides the ability
202	to draw a graph of function calls similar to C code
203	source.
204
205  "irqsoff"
206
207	Traces the areas that disable interrupts and saves
208	the trace with the longest max latency.
209	See tracing_max_latency. When a new max is recorded,
210	it replaces the old trace. It is best to view this
211	trace with the latency-format option enabled.
212
213  "preemptoff"
214
215	Similar to irqsoff but traces and records the amount of
216	time for which preemption is disabled.
217
218  "preemptirqsoff"
219
220	Similar to irqsoff and preemptoff, but traces and
221	records the largest time for which irqs and/or preemption
222	is disabled.
223
224  "wakeup"
225
226	Traces and records the max latency that it takes for
227	the highest priority task to get scheduled after
228	it has been woken up.
229
230  "hw-branch-tracer"
231
232	Uses the BTS CPU feature on x86 CPUs to traces all
233	branches executed.
234
235  "nop"
236
237	This is the "trace nothing" tracer. To remove all
238	tracers from tracing simply echo "nop" into
239	current_tracer.
240
241
242Examples of using the tracer
243----------------------------
244
245Here are typical examples of using the tracers when controlling
246them only with the debugfs interface (without using any
247user-land utilities).
248
249Output format:
250--------------
251
252Here is an example of the output format of the file "trace"
253
254                             --------
255# tracer: function
256#
257#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
258#              | |      |          |         |
259            bash-4251  [01] 10152.583854: path_put <-path_walk
260            bash-4251  [01] 10152.583855: dput <-path_put
261            bash-4251  [01] 10152.583855: _atomic_dec_and_lock <-dput
262                             --------
263
264A header is printed with the tracer name that is represented by
265the trace. In this case the tracer is "function". Then a header
266showing the format. Task name "bash", the task PID "4251", the
267CPU that it was running on "01", the timestamp in <secs>.<usecs>
268format, the function name that was traced "path_put" and the
269parent function that called this function "path_walk". The
270timestamp is the time at which the function was entered.
271
272Latency trace format
273--------------------
274
275When the latency-format option is enabled, the trace file gives
276somewhat more information to see why a latency happened.
277Here is a typical trace.
278
279# tracer: irqsoff
280#
281irqsoff latency trace v1.1.5 on 2.6.26-rc8
282--------------------------------------------------------------------
283 latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
284    -----------------
285    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
286    -----------------
287 => started at: apic_timer_interrupt
288 => ended at:   do_softirq
289
290#                _------=> CPU#
291#               / _-----=> irqs-off
292#              | / _----=> need-resched
293#              || / _---=> hardirq/softirq
294#              ||| / _--=> preempt-depth
295#              |||| /
296#              |||||     delay
297#  cmd     pid ||||| time  |   caller
298#     \   /    |||||   \   |   /
299  <idle>-0     0d..1    0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
300  <idle>-0     0d.s.   97us : __do_softirq (do_softirq)
301  <idle>-0     0d.s1   98us : trace_hardirqs_on (do_softirq)
302
303
304This shows that the current tracer is "irqsoff" tracing the time
305for which interrupts were disabled. It gives the trace version
306and the version of the kernel upon which this was executed on
307(2.6.26-rc8). Then it displays the max latency in microsecs (97
308us). The number of trace entries displayed and the total number
309recorded (both are three: #3/3). The type of preemption that was
310used (PREEMPT). VP, KP, SP, and HP are always zero and are
311reserved for later use. #P is the number of online CPUS (#P:2).
312
313The task is the process that was running when the latency
314occurred. (swapper pid: 0).
315
316The start and stop (the functions in which the interrupts were
317disabled and enabled respectively) that caused the latencies:
318
319  apic_timer_interrupt is where the interrupts were disabled.
320  do_softirq is where they were enabled again.
321
322The next lines after the header are the trace itself. The header
323explains which is which.
324
325  cmd: The name of the process in the trace.
326
327  pid: The PID of that process.
328
329  CPU#: The CPU which the process was running on.
330
331  irqs-off: 'd' interrupts are disabled. '.' otherwise.
332	    Note: If the architecture does not support a way to
333		  read the irq flags variable, an 'X' will always
334		  be printed here.
335
336  need-resched: 'N' task need_resched is set, '.' otherwise.
337
338  hardirq/softirq:
339	'H' - hard irq occurred inside a softirq.
340	'h' - hard irq is running
341	's' - soft irq is running
342	'.' - normal context.
343
344  preempt-depth: The level of preempt_disabled
345
346The above is mostly meaningful for kernel developers.
347
348  time: When the latency-format option is enabled, the trace file
349	output includes a timestamp relative to the start of the
350	trace. This differs from the output when latency-format
351	is disabled, which includes an absolute timestamp.
352
353  delay: This is just to help catch your eye a bit better. And
354	 needs to be fixed to be only relative to the same CPU.
355	 The marks are determined by the difference between this
356	 current trace and the next trace.
357	  '!' - greater than preempt_mark_thresh (default 100)
358	  '+' - greater than 1 microsecond
359	  ' ' - less than or equal to 1 microsecond.
360
361  The rest is the same as the 'trace' file.
362
363
364trace_options
365-------------
366
367The trace_options file is used to control what gets printed in
368the trace output. To see what is available, simply cat the file:
369
370  cat trace_options
371  print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
372  noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
373
374To disable one of the options, echo in the option prepended with
375"no".
376
377  echo noprint-parent > trace_options
378
379To enable an option, leave off the "no".
380
381  echo sym-offset > trace_options
382
383Here are the available options:
384
385  print-parent - On function traces, display the calling (parent)
386		 function as well as the function being traced.
387
388  print-parent:
389   bash-4000  [01]  1477.606694: simple_strtoul <-strict_strtoul
390
391  noprint-parent:
392   bash-4000  [01]  1477.606694: simple_strtoul
393
394
395  sym-offset - Display not only the function name, but also the
396	       offset in the function. For example, instead of
397	       seeing just "ktime_get", you will see
398	       "ktime_get+0xb/0x20".
399
400  sym-offset:
401   bash-4000  [01]  1477.606694: simple_strtoul+0x6/0xa0
402
403  sym-addr - this will also display the function address as well
404	     as the function name.
405
406  sym-addr:
407   bash-4000  [01]  1477.606694: simple_strtoul <c0339346>
408
409  verbose - This deals with the trace file when the
410            latency-format option is enabled.
411
412    bash  4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
413    (+0.000ms): simple_strtoul (strict_strtoul)
414
415  raw - This will display raw numbers. This option is best for
416	use with user applications that can translate the raw
417	numbers better than having it done in the kernel.
418
419  hex - Similar to raw, but the numbers will be in a hexadecimal
420	format.
421
422  bin - This will print out the formats in raw binary.
423
424  block - TBD (needs update)
425
426  stacktrace - This is one of the options that changes the trace
427	       itself. When a trace is recorded, so is the stack
428	       of functions. This allows for back traces of
429	       trace sites.
430
431  userstacktrace - This option changes the trace. It records a
432		   stacktrace of the current userspace thread.
433
434  sym-userobj - when user stacktrace are enabled, look up which
435		object the address belongs to, and print a
436		relative address. This is especially useful when
437		ASLR is on, otherwise you don't get a chance to
438		resolve the address to object/file/line after
439		the app is no longer running
440
441		The lookup is performed when you read
442		trace,trace_pipe. Example:
443
444		a.out-1623  [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
445x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
446
447  sched-tree - trace all tasks that are on the runqueue, at
448	       every scheduling event. Will add overhead if
449	       there's a lot of tasks running at once.
450
451  latency-format - This option changes the trace. When
452                   it is enabled, the trace displays
453                   additional information about the
454                   latencies, as described in "Latency
455                   trace format".
456
457  overwrite - This controls what happens when the trace buffer is
458              full. If "1" (default), the oldest events are
459              discarded and overwritten. If "0", then the newest
460              events are discarded.
461
462ftrace_enabled
463--------------
464
465The following tracers (listed below) give different output
466depending on whether or not the sysctl ftrace_enabled is set. To
467set ftrace_enabled, one can either use the sysctl function or
468set it via the proc file system interface.
469
470  sysctl kernel.ftrace_enabled=1
471
472 or
473
474  echo 1 > /proc/sys/kernel/ftrace_enabled
475
476To disable ftrace_enabled simply replace the '1' with '0' in the
477above commands.
478
479When ftrace_enabled is set the tracers will also record the
480functions that are within the trace. The descriptions of the
481tracers will also show an example with ftrace enabled.
482
483
484irqsoff
485-------
486
487When interrupts are disabled, the CPU can not react to any other
488external event (besides NMIs and SMIs). This prevents the timer
489interrupt from triggering or the mouse interrupt from letting
490the kernel know of a new mouse event. The result is a latency
491with the reaction time.
492
493The irqsoff tracer tracks the time for which interrupts are
494disabled. When a new maximum latency is hit, the tracer saves
495the trace leading up to that latency point so that every time a
496new maximum is reached, the old saved trace is discarded and the
497new trace is saved.
498
499To reset the maximum, echo 0 into tracing_max_latency. Here is
500an example:
501
502 # echo irqsoff > current_tracer
503 # echo latency-format > trace_options
504 # echo 0 > tracing_max_latency
505 # echo 1 > tracing_on
506 # ls -ltr
507 [...]
508 # echo 0 > tracing_on
509 # cat trace
510# tracer: irqsoff
511#
512irqsoff latency trace v1.1.5 on 2.6.26
513--------------------------------------------------------------------
514 latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
515    -----------------
516    | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
517    -----------------
518 => started at: sys_setpgid
519 => ended at:   sys_setpgid
520
521#                _------=> CPU#
522#               / _-----=> irqs-off
523#              | / _----=> need-resched
524#              || / _---=> hardirq/softirq
525#              ||| / _--=> preempt-depth
526#              |||| /
527#              |||||     delay
528#  cmd     pid ||||| time  |   caller
529#     \   /    |||||   \   |   /
530    bash-3730  1d...    0us : _write_lock_irq (sys_setpgid)
531    bash-3730  1d..1    1us+: _write_unlock_irq (sys_setpgid)
532    bash-3730  1d..2   14us : trace_hardirqs_on (sys_setpgid)
533
534
535Here we see that that we had a latency of 12 microsecs (which is
536very good). The _write_lock_irq in sys_setpgid disabled
537interrupts. The difference between the 12 and the displayed
538timestamp 14us occurred because the clock was incremented
539between the time of recording the max latency and the time of
540recording the function that had that latency.
541
542Note the above example had ftrace_enabled not set. If we set the
543ftrace_enabled, we get a much larger output:
544
545# tracer: irqsoff
546#
547irqsoff latency trace v1.1.5 on 2.6.26-rc8
548--------------------------------------------------------------------
549 latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
550    -----------------
551    | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
552    -----------------
553 => started at: __alloc_pages_internal
554 => ended at:   __alloc_pages_internal
555
556#                _------=> CPU#
557#               / _-----=> irqs-off
558#              | / _----=> need-resched
559#              || / _---=> hardirq/softirq
560#              ||| / _--=> preempt-depth
561#              |||| /
562#              |||||     delay
563#  cmd     pid ||||| time  |   caller
564#     \   /    |||||   \   |   /
565      ls-4339  0...1    0us+: get_page_from_freelist (__alloc_pages_internal)
566      ls-4339  0d..1    3us : rmqueue_bulk (get_page_from_freelist)
567      ls-4339  0d..1    3us : _spin_lock (rmqueue_bulk)
568      ls-4339  0d..1    4us : add_preempt_count (_spin_lock)
569      ls-4339  0d..2    4us : __rmqueue (rmqueue_bulk)
570      ls-4339  0d..2    5us : __rmqueue_smallest (__rmqueue)
571      ls-4339  0d..2    5us : __mod_zone_page_state (__rmqueue_smallest)
572      ls-4339  0d..2    6us : __rmqueue (rmqueue_bulk)
573      ls-4339  0d..2    6us : __rmqueue_smallest (__rmqueue)
574      ls-4339  0d..2    7us : __mod_zone_page_state (__rmqueue_smallest)
575      ls-4339  0d..2    7us : __rmqueue (rmqueue_bulk)
576      ls-4339  0d..2    8us : __rmqueue_smallest (__rmqueue)
577[...]
578      ls-4339  0d..2   46us : __rmqueue_smallest (__rmqueue)
579      ls-4339  0d..2   47us : __mod_zone_page_state (__rmqueue_smallest)
580      ls-4339  0d..2   47us : __rmqueue (rmqueue_bulk)
581      ls-4339  0d..2   48us : __rmqueue_smallest (__rmqueue)
582      ls-4339  0d..2   48us : __mod_zone_page_state (__rmqueue_smallest)
583      ls-4339  0d..2   49us : _spin_unlock (rmqueue_bulk)
584      ls-4339  0d..2   49us : sub_preempt_count (_spin_unlock)
585      ls-4339  0d..1   50us : get_page_from_freelist (__alloc_pages_internal)
586      ls-4339  0d..2   51us : trace_hardirqs_on (__alloc_pages_internal)
587
588
589
590Here we traced a 50 microsecond latency. But we also see all the
591functions that were called during that time. Note that by
592enabling function tracing, we incur an added overhead. This
593overhead may extend the latency times. But nevertheless, this
594trace has provided some very helpful debugging information.
595
596
597preemptoff
598----------
599
600When preemption is disabled, we may be able to receive
601interrupts but the task cannot be preempted and a higher
602priority task must wait for preemption to be enabled again
603before it can preempt a lower priority task.
604
605The preemptoff tracer traces the places that disable preemption.
606Like the irqsoff tracer, it records the maximum latency for
607which preemption was disabled. The control of preemptoff tracer
608is much like the irqsoff tracer.
609
610 # echo preemptoff > current_tracer
611 # echo latency-format > trace_options
612 # echo 0 > tracing_max_latency
613 # echo 1 > tracing_on
614 # ls -ltr
615 [...]
616 # echo 0 > tracing_on
617 # cat trace
618# tracer: preemptoff
619#
620preemptoff latency trace v1.1.5 on 2.6.26-rc8
621--------------------------------------------------------------------
622 latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
623    -----------------
624    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
625    -----------------
626 => started at: do_IRQ
627 => ended at:   __do_softirq
628
629#                _------=> CPU#
630#               / _-----=> irqs-off
631#              | / _----=> need-resched
632#              || / _---=> hardirq/softirq
633#              ||| / _--=> preempt-depth
634#              |||| /
635#              |||||     delay
636#  cmd     pid ||||| time  |   caller
637#     \   /    |||||   \   |   /
638    sshd-4261  0d.h.    0us+: irq_enter (do_IRQ)
639    sshd-4261  0d.s.   29us : _local_bh_enable (__do_softirq)
640    sshd-4261  0d.s1   30us : trace_preempt_on (__do_softirq)
641
642
643This has some more changes. Preemption was disabled when an
644interrupt came in (notice the 'h'), and was enabled while doing
645a softirq. (notice the 's'). But we also see that interrupts
646have been disabled when entering the preempt off section and
647leaving it (the 'd'). We do not know if interrupts were enabled
648in the mean time.
649
650# tracer: preemptoff
651#
652preemptoff latency trace v1.1.5 on 2.6.26-rc8
653--------------------------------------------------------------------
654 latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
655    -----------------
656    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
657    -----------------
658 => started at: remove_wait_queue
659 => ended at:   __do_softirq
660
661#                _------=> CPU#
662#               / _-----=> irqs-off
663#              | / _----=> need-resched
664#              || / _---=> hardirq/softirq
665#              ||| / _--=> preempt-depth
666#              |||| /
667#              |||||     delay
668#  cmd     pid ||||| time  |   caller
669#     \   /    |||||   \   |   /
670    sshd-4261  0d..1    0us : _spin_lock_irqsave (remove_wait_queue)
671    sshd-4261  0d..1    1us : _spin_unlock_irqrestore (remove_wait_queue)
672    sshd-4261  0d..1    2us : do_IRQ (common_interrupt)
673    sshd-4261  0d..1    2us : irq_enter (do_IRQ)
674    sshd-4261  0d..1    2us : idle_cpu (irq_enter)
675    sshd-4261  0d..1    3us : add_preempt_count (irq_enter)
676    sshd-4261  0d.h1    3us : idle_cpu (irq_enter)
677    sshd-4261  0d.h.    4us : handle_fasteoi_irq (do_IRQ)
678[...]
679    sshd-4261  0d.h.   12us : add_preempt_count (_spin_lock)
680    sshd-4261  0d.h1   12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
681    sshd-4261  0d.h1   13us : move_native_irq (ack_ioapic_quirk_irq)
682    sshd-4261  0d.h1   13us : _spin_unlock (handle_fasteoi_irq)
683    sshd-4261  0d.h1   14us : sub_preempt_count (_spin_unlock)
684    sshd-4261  0d.h1   14us : irq_exit (do_IRQ)
685    sshd-4261  0d.h1   15us : sub_preempt_count (irq_exit)
686    sshd-4261  0d..2   15us : do_softirq (irq_exit)
687    sshd-4261  0d...   15us : __do_softirq (do_softirq)
688    sshd-4261  0d...   16us : __local_bh_disable (__do_softirq)
689    sshd-4261  0d...   16us+: add_preempt_count (__local_bh_disable)
690    sshd-4261  0d.s4   20us : add_preempt_count (__local_bh_disable)
691    sshd-4261  0d.s4   21us : sub_preempt_count (local_bh_enable)
692    sshd-4261  0d.s5   21us : sub_preempt_count (local_bh_enable)
693[...]
694    sshd-4261  0d.s6   41us : add_preempt_count (__local_bh_disable)
695    sshd-4261  0d.s6   42us : sub_preempt_count (local_bh_enable)
696    sshd-4261  0d.s7   42us : sub_preempt_count (local_bh_enable)
697    sshd-4261  0d.s5   43us : add_preempt_count (__local_bh_disable)
698    sshd-4261  0d.s5   43us : sub_preempt_count (local_bh_enable_ip)
699    sshd-4261  0d.s6   44us : sub_preempt_count (local_bh_enable_ip)
700    sshd-4261  0d.s5   44us : add_preempt_count (__local_bh_disable)
701    sshd-4261  0d.s5   45us : sub_preempt_count (local_bh_enable)
702[...]
703    sshd-4261  0d.s.   63us : _local_bh_enable (__do_softirq)
704    sshd-4261  0d.s1   64us : trace_preempt_on (__do_softirq)
705
706
707The above is an example of the preemptoff trace with
708ftrace_enabled set. Here we see that interrupts were disabled
709the entire time. The irq_enter code lets us know that we entered
710an interrupt 'h'. Before that, the functions being traced still
711show that it is not in an interrupt, but we can see from the
712functions themselves that this is not the case.
713
714Notice that __do_softirq when called does not have a
715preempt_count. It may seem that we missed a preempt enabling.
716What really happened is that the preempt count is held on the
717thread's stack and we switched to the softirq stack (4K stacks
718in effect). The code does not copy the preempt count, but
719because interrupts are disabled, we do not need to worry about
720it. Having a tracer like this is good for letting people know
721what really happens inside the kernel.
722
723
724preemptirqsoff
725--------------
726
727Knowing the locations that have interrupts disabled or
728preemption disabled for the longest times is helpful. But
729sometimes we would like to know when either preemption and/or
730interrupts are disabled.
731
732Consider the following code:
733
734    local_irq_disable();
735    call_function_with_irqs_off();
736    preempt_disable();
737    call_function_with_irqs_and_preemption_off();
738    local_irq_enable();
739    call_function_with_preemption_off();
740    preempt_enable();
741
742The irqsoff tracer will record the total length of
743call_function_with_irqs_off() and
744call_function_with_irqs_and_preemption_off().
745
746The preemptoff tracer will record the total length of
747call_function_with_irqs_and_preemption_off() and
748call_function_with_preemption_off().
749
750But neither will trace the time that interrupts and/or
751preemption is disabled. This total time is the time that we can
752not schedule. To record this time, use the preemptirqsoff
753tracer.
754
755Again, using this trace is much like the irqsoff and preemptoff
756tracers.
757
758 # echo preemptirqsoff > current_tracer
759 # echo latency-format > trace_options
760 # echo 0 > tracing_max_latency
761 # echo 1 > tracing_on
762 # ls -ltr
763 [...]
764 # echo 0 > tracing_on
765 # cat trace
766# tracer: preemptirqsoff
767#
768preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
769--------------------------------------------------------------------
770 latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
771    -----------------
772    | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
773    -----------------
774 => started at: apic_timer_interrupt
775 => ended at:   __do_softirq
776
777#                _------=> CPU#
778#               / _-----=> irqs-off
779#              | / _----=> need-resched
780#              || / _---=> hardirq/softirq
781#              ||| / _--=> preempt-depth
782#              |||| /
783#              |||||     delay
784#  cmd     pid ||||| time  |   caller
785#     \   /    |||||   \   |   /
786      ls-4860  0d...    0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
787      ls-4860  0d.s.  294us : _local_bh_enable (__do_softirq)
788      ls-4860  0d.s1  294us : trace_preempt_on (__do_softirq)
789
790
791
792The trace_hardirqs_off_thunk is called from assembly on x86 when
793interrupts are disabled in the assembly code. Without the
794function tracing, we do not know if interrupts were enabled
795within the preemption points. We do see that it started with
796preemption enabled.
797
798Here is a trace with ftrace_enabled set:
799
800
801# tracer: preemptirqsoff
802#
803preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
804--------------------------------------------------------------------
805 latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
806    -----------------
807    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
808    -----------------
809 => started at: write_chan
810 => ended at:   __do_softirq
811
812#                _------=> CPU#
813#               / _-----=> irqs-off
814#              | / _----=> need-resched
815#              || / _---=> hardirq/softirq
816#              ||| / _--=> preempt-depth
817#              |||| /
818#              |||||     delay
819#  cmd     pid ||||| time  |   caller
820#     \   /    |||||   \   |   /
821      ls-4473  0.N..    0us : preempt_schedule (write_chan)
822      ls-4473  0dN.1    1us : _spin_lock (schedule)
823      ls-4473  0dN.1    2us : add_preempt_count (_spin_lock)
824      ls-4473  0d..2    2us : put_prev_task_fair (schedule)
825[...]
826      ls-4473  0d..2   13us : set_normalized_timespec (ktime_get_ts)
827      ls-4473  0d..2   13us : __switch_to (schedule)
828    sshd-4261  0d..2   14us : finish_task_switch (schedule)
829    sshd-4261  0d..2   14us : _spin_unlock_irq (finish_task_switch)
830    sshd-4261  0d..1   15us : add_preempt_count (_spin_lock_irqsave)
831    sshd-4261  0d..2   16us : _spin_unlock_irqrestore (hrtick_set)
832    sshd-4261  0d..2   16us : do_IRQ (common_interrupt)
833    sshd-4261  0d..2   17us : irq_enter (do_IRQ)
834    sshd-4261  0d..2   17us : idle_cpu (irq_enter)
835    sshd-4261  0d..2   18us : add_preempt_count (irq_enter)
836    sshd-4261  0d.h2   18us : idle_cpu (irq_enter)
837    sshd-4261  0d.h.   18us : handle_fasteoi_irq (do_IRQ)
838    sshd-4261  0d.h.   19us : _spin_lock (handle_fasteoi_irq)
839    sshd-4261  0d.h.   19us : add_preempt_count (_spin_lock)
840    sshd-4261  0d.h1   20us : _spin_unlock (handle_fasteoi_irq)
841    sshd-4261  0d.h1   20us : sub_preempt_count (_spin_unlock)
842[...]
843    sshd-4261  0d.h1   28us : _spin_unlock (handle_fasteoi_irq)
844    sshd-4261  0d.h1   29us : sub_preempt_count (_spin_unlock)
845    sshd-4261  0d.h2   29us : irq_exit (do_IRQ)
846    sshd-4261  0d.h2   29us : sub_preempt_count (irq_exit)
847    sshd-4261  0d..3   30us : do_softirq (irq_exit)
848    sshd-4261  0d...   30us : __do_softirq (do_softirq)
849    sshd-4261  0d...   31us : __local_bh_disable (__do_softirq)
850    sshd-4261  0d...   31us+: add_preempt_count (__local_bh_disable)
851    sshd-4261  0d.s4   34us : add_preempt_count (__local_bh_disable)
852[...]
853    sshd-4261  0d.s3   43us : sub_preempt_count (local_bh_enable_ip)
854    sshd-4261  0d.s4   44us : sub_preempt_count (local_bh_enable_ip)
855    sshd-4261  0d.s3   44us : smp_apic_timer_interrupt (apic_timer_interrupt)
856    sshd-4261  0d.s3   45us : irq_enter (smp_apic_timer_interrupt)
857    sshd-4261  0d.s3   45us : idle_cpu (irq_enter)
858    sshd-4261  0d.s3   46us : add_preempt_count (irq_enter)
859    sshd-4261  0d.H3   46us : idle_cpu (irq_enter)
860    sshd-4261  0d.H3   47us : hrtimer_interrupt (smp_apic_timer_interrupt)
861    sshd-4261  0d.H3   47us : ktime_get (hrtimer_interrupt)
862[...]
863    sshd-4261  0d.H3   81us : tick_program_event (hrtimer_interrupt)
864    sshd-4261  0d.H3   82us : ktime_get (tick_program_event)
865    sshd-4261  0d.H3   82us : ktime_get_ts (ktime_get)
866    sshd-4261  0d.H3   83us : getnstimeofday (ktime_get_ts)
867    sshd-4261  0d.H3   83us : set_normalized_timespec (ktime_get_ts)
868    sshd-4261  0d.H3   84us : clockevents_program_event (tick_program_event)
869    sshd-4261  0d.H3   84us : lapic_next_event (clockevents_program_event)
870    sshd-4261  0d.H3   85us : irq_exit (smp_apic_timer_interrupt)
871    sshd-4261  0d.H3   85us : sub_preempt_count (irq_exit)
872    sshd-4261  0d.s4   86us : sub_preempt_count (irq_exit)
873    sshd-4261  0d.s3   86us : add_preempt_count (__local_bh_disable)
874[...]
875    sshd-4261  0d.s1   98us : sub_preempt_count (net_rx_action)
876    sshd-4261  0d.s.   99us : add_preempt_count (_spin_lock_irq)
877    sshd-4261  0d.s1   99us+: _spin_unlock_irq (run_timer_softirq)
878    sshd-4261  0d.s.  104us : _local_bh_enable (__do_softirq)
879    sshd-4261  0d.s.  104us : sub_preempt_count (_local_bh_enable)
880    sshd-4261  0d.s.  105us : _local_bh_enable (__do_softirq)
881    sshd-4261  0d.s1  105us : trace_preempt_on (__do_softirq)
882
883
884This is a very interesting trace. It started with the preemption
885of the ls task. We see that the task had the "need_resched" bit
886set via the 'N' in the trace.  Interrupts were disabled before
887the spin_lock at the beginning of the trace. We see that a
888schedule took place to run sshd.  When the interrupts were
889enabled, we took an interrupt. On return from the interrupt
890handler, the softirq ran. We took another interrupt while
891running the softirq as we see from the capital 'H'.
892
893
894wakeup
895------
896
897In a Real-Time environment it is very important to know the
898wakeup time it takes for the highest priority task that is woken
899up to the time that it executes. This is also known as "schedule
900latency". I stress the point that this is about RT tasks. It is
901also important to know the scheduling latency of non-RT tasks,
902but the average schedule latency is better for non-RT tasks.
903Tools like LatencyTop are more appropriate for such
904measurements.
905
906Real-Time environments are interested in the worst case latency.
907That is the longest latency it takes for something to happen,
908and not the average. We can have a very fast scheduler that may
909only have a large latency once in a while, but that would not
910work well with Real-Time tasks.  The wakeup tracer was designed
911to record the worst case wakeups of RT tasks. Non-RT tasks are
912not recorded because the tracer only records one worst case and
913tracing non-RT tasks that are unpredictable will overwrite the
914worst case latency of RT tasks.
915
916Since this tracer only deals with RT tasks, we will run this
917slightly differently than we did with the previous tracers.
918Instead of performing an 'ls', we will run 'sleep 1' under
919'chrt' which changes the priority of the task.
920
921 # echo wakeup > current_tracer
922 # echo latency-format > trace_options
923 # echo 0 > tracing_max_latency
924 # echo 1 > tracing_on
925 # chrt -f 5 sleep 1
926 # echo 0 > tracing_on
927 # cat trace
928# tracer: wakeup
929#
930wakeup latency trace v1.1.5 on 2.6.26-rc8
931--------------------------------------------------------------------
932 latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
933    -----------------
934    | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
935    -----------------
936
937#                _------=> CPU#
938#               / _-----=> irqs-off
939#              | / _----=> need-resched
940#              || / _---=> hardirq/softirq
941#              ||| / _--=> preempt-depth
942#              |||| /
943#              |||||     delay
944#  cmd     pid ||||| time  |   caller
945#     \   /    |||||   \   |   /
946  <idle>-0     1d.h4    0us+: try_to_wake_up (wake_up_process)
947  <idle>-0     1d..4    4us : schedule (cpu_idle)
948
949
950Running this on an idle system, we see that it only took 4
951microseconds to perform the task switch.  Note, since the trace
952marker in the schedule is before the actual "switch", we stop
953the tracing when the recorded task is about to schedule in. This
954may change if we add a new marker at the end of the scheduler.
955
956Notice that the recorded task is 'sleep' with the PID of 4901
957and it has an rt_prio of 5. This priority is user-space priority
958and not the internal kernel priority. The policy is 1 for
959SCHED_FIFO and 2 for SCHED_RR.
960
961Doing the same with chrt -r 5 and ftrace_enabled set.
962
963# tracer: wakeup
964#
965wakeup latency trace v1.1.5 on 2.6.26-rc8
966--------------------------------------------------------------------
967 latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
968    -----------------
969    | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
970    -----------------
971
972#                _------=> CPU#
973#               / _-----=> irqs-off
974#              | / _----=> need-resched
975#              || / _---=> hardirq/softirq
976#              ||| / _--=> preempt-depth
977#              |||| /
978#              |||||     delay
979#  cmd     pid ||||| time  |   caller
980#     \   /    |||||   \   |   /
981ksoftirq-7     1d.H3    0us : try_to_wake_up (wake_up_process)
982ksoftirq-7     1d.H4    1us : sub_preempt_count (marker_probe_cb)
983ksoftirq-7     1d.H3    2us : check_preempt_wakeup (try_to_wake_up)
984ksoftirq-7     1d.H3    3us : update_curr (check_preempt_wakeup)
985ksoftirq-7     1d.H3    4us : calc_delta_mine (update_curr)
986ksoftirq-7     1d.H3    5us : __resched_task (check_preempt_wakeup)
987ksoftirq-7     1d.H3    6us : task_wake_up_rt (try_to_wake_up)
988ksoftirq-7     1d.H3    7us : _spin_unlock_irqrestore (try_to_wake_up)
989[...]
990ksoftirq-7     1d.H2   17us : irq_exit (smp_apic_timer_interrupt)
991ksoftirq-7     1d.H2   18us : sub_preempt_count (irq_exit)
992ksoftirq-7     1d.s3   19us : sub_preempt_count (irq_exit)
993ksoftirq-7     1..s2   20us : rcu_process_callbacks (__do_softirq)
994[...]
995ksoftirq-7     1..s2   26us : __rcu_process_callbacks (rcu_process_callbacks)
996ksoftirq-7     1d.s2   27us : _local_bh_enable (__do_softirq)
997ksoftirq-7     1d.s2   28us : sub_preempt_count (_local_bh_enable)
998ksoftirq-7     1.N.3   29us : sub_preempt_count (ksoftirqd)
999ksoftirq-7     1.N.2   30us : _cond_resched (ksoftirqd)
1000ksoftirq-7     1.N.2   31us : __cond_resched (_cond_resched)
1001ksoftirq-7     1.N.2   32us : add_preempt_count (__cond_resched)
1002ksoftirq-7     1.N.2   33us : schedule (__cond_resched)
1003ksoftirq-7     1.N.2   33us : add_preempt_count (schedule)
1004ksoftirq-7     1.N.3   34us : hrtick_clear (schedule)
1005ksoftirq-7     1dN.3   35us : _spin_lock (schedule)
1006ksoftirq-7     1dN.3   36us : add_preempt_count (_spin_lock)
1007ksoftirq-7     1d..4   37us : put_prev_task_fair (schedule)
1008ksoftirq-7     1d..4   38us : update_curr (put_prev_task_fair)
1009[...]
1010ksoftirq-7     1d..5   47us : _spin_trylock (tracing_record_cmdline)
1011ksoftirq-7     1d..5   48us : add_preempt_count (_spin_trylock)
1012ksoftirq-7     1d..6   49us : _spin_unlock (tracing_record_cmdline)
1013ksoftirq-7     1d..6   49us : sub_preempt_count (_spin_unlock)
1014ksoftirq-7     1d..4   50us : schedule (__cond_resched)
1015
1016The interrupt went off while running ksoftirqd. This task runs
1017at SCHED_OTHER. Why did not we see the 'N' set early? This may
1018be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K
1019stacks configured, the interrupt and softirq run with their own
1020stack. Some information is held on the top of the task's stack
1021(need_resched and preempt_count are both stored there). The
1022setting of the NEED_RESCHED bit is done directly to the task's
1023stack, but the reading of the NEED_RESCHED is done by looking at
1024the current stack, which in this case is the stack for the hard
1025interrupt. This hides the fact that NEED_RESCHED has been set.
1026We do not see the 'N' until we switch back to the task's
1027assigned stack.
1028
1029function
1030--------
1031
1032This tracer is the function tracer. Enabling the function tracer
1033can be done from the debug file system. Make sure the
1034ftrace_enabled is set; otherwise this tracer is a nop.
1035
1036 # sysctl kernel.ftrace_enabled=1
1037 # echo function > current_tracer
1038 # echo 1 > tracing_on
1039 # usleep 1
1040 # echo 0 > tracing_on
1041 # cat trace
1042# tracer: function
1043#
1044#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
1045#              | |      |          |         |
1046            bash-4003  [00]   123.638713: finish_task_switch <-schedule
1047            bash-4003  [00]   123.638714: _spin_unlock_irq <-finish_task_switch
1048            bash-4003  [00]   123.638714: sub_preempt_count <-_spin_unlock_irq
1049            bash-4003  [00]   123.638715: hrtick_set <-schedule
1050            bash-4003  [00]   123.638715: _spin_lock_irqsave <-hrtick_set
1051            bash-4003  [00]   123.638716: add_preempt_count <-_spin_lock_irqsave
1052            bash-4003  [00]   123.638716: _spin_unlock_irqrestore <-hrtick_set
1053            bash-4003  [00]   123.638717: sub_preempt_count <-_spin_unlock_irqrestore
1054            bash-4003  [00]   123.638717: hrtick_clear <-hrtick_set
1055            bash-4003  [00]   123.638718: sub_preempt_count <-schedule
1056            bash-4003  [00]   123.638718: sub_preempt_count <-preempt_schedule
1057            bash-4003  [00]   123.638719: wait_for_completion <-__stop_machine_run
1058            bash-4003  [00]   123.638719: wait_for_common <-wait_for_completion
1059            bash-4003  [00]   123.638720: _spin_lock_irq <-wait_for_common
1060            bash-4003  [00]   123.638720: add_preempt_count <-_spin_lock_irq
1061[...]
1062
1063
1064Note: function tracer uses ring buffers to store the above
1065entries. The newest data may overwrite the oldest data.
1066Sometimes using echo to stop the trace is not sufficient because
1067the tracing could have overwritten the data that you wanted to
1068record. For this reason, it is sometimes better to disable
1069tracing directly from a program. This allows you to stop the
1070tracing at the point that you hit the part that you are
1071interested in. To disable the tracing directly from a C program,
1072something like following code snippet can be used:
1073
1074int trace_fd;
1075[...]
1076int main(int argc, char *argv[]) {
1077	[...]
1078	trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
1079	[...]
1080	if (condition_hit()) {
1081		write(trace_fd, "0", 1);
1082	}
1083	[...]
1084}
1085
1086
1087Single thread tracing
1088---------------------
1089
1090By writing into set_ftrace_pid you can trace a
1091single thread. For example:
1092
1093# cat set_ftrace_pid
1094no pid
1095# echo 3111 > set_ftrace_pid
1096# cat set_ftrace_pid
10973111
1098# echo function > current_tracer
1099# cat trace | head
1100 # tracer: function
1101 #
1102 #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
1103 #              | |       |          |         |
1104     yum-updatesd-3111  [003]  1637.254676: finish_task_switch <-thread_return
1105     yum-updatesd-3111  [003]  1637.254681: hrtimer_cancel <-schedule_hrtimeout_range
1106     yum-updatesd-3111  [003]  1637.254682: hrtimer_try_to_cancel <-hrtimer_cancel
1107     yum-updatesd-3111  [003]  1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel
1108     yum-updatesd-3111  [003]  1637.254685: fget_light <-do_sys_poll
1109     yum-updatesd-3111  [003]  1637.254686: pipe_poll <-do_sys_poll
1110# echo -1 > set_ftrace_pid
1111# cat trace |head
1112 # tracer: function
1113 #
1114 #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
1115 #              | |       |          |         |
1116 ##### CPU 3 buffer started ####
1117     yum-updatesd-3111  [003]  1701.957688: free_poll_entry <-poll_freewait
1118     yum-updatesd-3111  [003]  1701.957689: remove_wait_queue <-free_poll_entry
1119     yum-updatesd-3111  [003]  1701.957691: fput <-free_poll_entry
1120     yum-updatesd-3111  [003]  1701.957692: audit_syscall_exit <-sysret_audit
1121     yum-updatesd-3111  [003]  1701.957693: path_put <-audit_syscall_exit
1122
1123If you want to trace a function when executing, you could use
1124something like this simple program:
1125
1126#include <stdio.h>
1127#include <stdlib.h>
1128#include <sys/types.h>
1129#include <sys/stat.h>
1130#include <fcntl.h>
1131#include <unistd.h>
1132#include <string.h>
1133
1134#define _STR(x) #x
1135#define STR(x) _STR(x)
1136#define MAX_PATH 256
1137
1138const char *find_debugfs(void)
1139{
1140       static char debugfs[MAX_PATH+1];
1141       static int debugfs_found;
1142       char type[100];
1143       FILE *fp;
1144
1145       if (debugfs_found)
1146               return debugfs;
1147
1148       if ((fp = fopen("/proc/mounts","r")) == NULL) {
1149               perror("/proc/mounts");
1150               return NULL;
1151       }
1152
1153       while (fscanf(fp, "%*s %"
1154                     STR(MAX_PATH)
1155                     "s %99s %*s %*d %*d\n",
1156                     debugfs, type) == 2) {
1157               if (strcmp(type, "debugfs") == 0)
1158                       break;
1159       }
1160       fclose(fp);
1161
1162       if (strcmp(type, "debugfs") != 0) {
1163               fprintf(stderr, "debugfs not mounted");
1164               return NULL;
1165       }
1166
1167       strcat(debugfs, "/tracing/");
1168       debugfs_found = 1;
1169
1170       return debugfs;
1171}
1172
1173const char *tracing_file(const char *file_name)
1174{
1175       static char trace_file[MAX_PATH+1];
1176       snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name);
1177       return trace_file;
1178}
1179
1180int main (int argc, char **argv)
1181{
1182        if (argc < 1)
1183                exit(-1);
1184
1185        if (fork() > 0) {
1186                int fd, ffd;
1187                char line[64];
1188                int s;
1189
1190                ffd = open(tracing_file("current_tracer"), O_WRONLY);
1191                if (ffd < 0)
1192                        exit(-1);
1193                write(ffd, "nop", 3);
1194
1195                fd = open(tracing_file("set_ftrace_pid"), O_WRONLY);
1196                s = sprintf(line, "%d\n", getpid());
1197                write(fd, line, s);
1198
1199                write(ffd, "function", 8);
1200
1201                close(fd);
1202                close(ffd);
1203
1204                execvp(argv[1], argv+1);
1205        }
1206
1207        return 0;
1208}
1209
1210
1211hw-branch-tracer (x86 only)
1212---------------------------
1213
1214This tracer uses the x86 last branch tracing hardware feature to
1215collect a branch trace on all cpus with relatively low overhead.
1216
1217The tracer uses a fixed-size circular buffer per cpu and only
1218traces ring 0 branches. The trace file dumps that buffer in the
1219following format:
1220
1221# tracer: hw-branch-tracer
1222#
1223# CPU#        TO  <-  FROM
1224   0  scheduler_tick+0xb5/0x1bf	  <-  task_tick_idle+0x5/0x6
1225   2  run_posix_cpu_timers+0x2b/0x72a	  <-  run_posix_cpu_timers+0x25/0x72a
1226   0  scheduler_tick+0x139/0x1bf	  <-  scheduler_tick+0xed/0x1bf
1227   0  scheduler_tick+0x17c/0x1bf	  <-  scheduler_tick+0x148/0x1bf
1228   2  run_posix_cpu_timers+0x9e/0x72a	  <-  run_posix_cpu_timers+0x5e/0x72a
1229   0  scheduler_tick+0x1b6/0x1bf	  <-  scheduler_tick+0x1aa/0x1bf
1230
1231
1232The tracer may be used to dump the trace for the oops'ing cpu on
1233a kernel oops into the system log. To enable this,
1234ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one
1235can either use the sysctl function or set it via the proc system
1236interface.
1237
1238  sysctl kernel.ftrace_dump_on_oops=n
1239
1240or
1241
1242  echo n > /proc/sys/kernel/ftrace_dump_on_oops
1243
1244If n = 1, ftrace will dump buffers of all CPUs, if n = 2 ftrace will
1245only dump the buffer of the CPU that triggered the oops.
1246
1247Here's an example of such a dump after a null pointer
1248dereference in a kernel module:
1249
1250[57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
1251[57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops]
1252[57848.106019] PGD 2354e9067 PUD 2375e7067 PMD 0
1253[57848.106019] Oops: 0002 [#1] SMP
1254[57848.106019] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus
1255[57848.106019] Dumping ftrace buffer:
1256[57848.106019] ---------------------------------
1257[...]
1258[57848.106019]    0  chrdev_open+0xe6/0x165	  <-  cdev_put+0x23/0x24
1259[57848.106019]    0  chrdev_open+0x117/0x165	  <-  chrdev_open+0xfa/0x165
1260[57848.106019]    0  chrdev_open+0x120/0x165	  <-  chrdev_open+0x11c/0x165
1261[57848.106019]    0  chrdev_open+0x134/0x165	  <-  chrdev_open+0x12b/0x165
1262[57848.106019]    0  open+0x0/0x14 [oops]	  <-  chrdev_open+0x144/0x165
1263[57848.106019]    0  page_fault+0x0/0x30	  <-  open+0x6/0x14 [oops]
1264[57848.106019]    0  error_entry+0x0/0x5b	  <-  page_fault+0x4/0x30
1265[57848.106019]    0  error_kernelspace+0x0/0x31	  <-  error_entry+0x59/0x5b
1266[57848.106019]    0  error_sti+0x0/0x1	  <-  error_kernelspace+0x2d/0x31
1267[57848.106019]    0  page_fault+0x9/0x30	  <-  error_sti+0x0/0x1
1268[57848.106019]    0  do_page_fault+0x0/0x881	  <-  page_fault+0x1a/0x30
1269[...]
1270[57848.106019]    0  do_page_fault+0x66b/0x881	  <-  is_prefetch+0x1ee/0x1f2
1271[57848.106019]    0  do_page_fault+0x6e0/0x881	  <-  do_page_fault+0x67a/0x881
1272[57848.106019]    0  oops_begin+0x0/0x96	  <-  do_page_fault+0x6e0/0x881
1273[57848.106019]    0  trace_hw_branch_oops+0x0/0x2d	  <-  oops_begin+0x9/0x96
1274[...]
1275[57848.106019]    0  ds_suspend_bts+0x2a/0xe3	  <-  ds_suspend_bts+0x1a/0xe3
1276[57848.106019] ---------------------------------
1277[57848.106019] CPU 0
1278[57848.106019] Modules linked in: oops
1279[57848.106019] Pid: 5542, comm: cat Tainted: G        W  2.6.28 #23
1280[57848.106019] RIP: 0010:[<ffffffffa0000006>]  [<ffffffffa0000006>] open+0x6/0x14 [oops]
1281[57848.106019] RSP: 0018:ffff880235457d48  EFLAGS: 00010246
1282[...]
1283
1284
1285function graph tracer
1286---------------------------
1287
1288This tracer is similar to the function tracer except that it
1289probes a function on its entry and its exit. This is done by
1290using a dynamically allocated stack of return addresses in each
1291task_struct. On function entry the tracer overwrites the return
1292address of each function traced to set a custom probe. Thus the
1293original return address is stored on the stack of return address
1294in the task_struct.
1295
1296Probing on both ends of a function leads to special features
1297such as:
1298
1299- measure of a function's time execution
1300- having a reliable call stack to draw function calls graph
1301
1302This tracer is useful in several situations:
1303
1304- you want to find the reason of a strange kernel behavior and
1305  need to see what happens in detail on any areas (or specific
1306  ones).
1307
1308- you are experiencing weird latencies but it's difficult to
1309  find its origin.
1310
1311- you want to find quickly which path is taken by a specific
1312  function
1313
1314- you just want to peek inside a working kernel and want to see
1315  what happens there.
1316
1317# tracer: function_graph
1318#
1319# CPU  DURATION                  FUNCTION CALLS
1320# |     |   |                     |   |   |   |
1321
1322 0)               |  sys_open() {
1323 0)               |    do_sys_open() {
1324 0)               |      getname() {
1325 0)               |        kmem_cache_alloc() {
1326 0)   1.382 us    |          __might_sleep();
1327 0)   2.478 us    |        }
1328 0)               |        strncpy_from_user() {
1329 0)               |          might_fault() {
1330 0)   1.389 us    |            __might_sleep();
1331 0)   2.553 us    |          }
1332 0)   3.807 us    |        }
1333 0)   7.876 us    |      }
1334 0)               |      alloc_fd() {
1335 0)   0.668 us    |        _spin_lock();
1336 0)   0.570 us    |        expand_files();
1337 0)   0.586 us    |        _spin_unlock();
1338
1339
1340There are several columns that can be dynamically
1341enabled/disabled. You can use every combination of options you
1342want, depending on your needs.
1343
1344- The cpu number on which the function executed is default
1345  enabled.  It is sometimes better to only trace one cpu (see
1346  tracing_cpu_mask file) or you might sometimes see unordered
1347  function calls while cpu tracing switch.
1348
1349	hide: echo nofuncgraph-cpu > trace_options
1350	show: echo funcgraph-cpu > trace_options
1351
1352- The duration (function's time of execution) is displayed on
1353  the closing bracket line of a function or on the same line
1354  than the current function in case of a leaf one. It is default
1355  enabled.
1356
1357	hide: echo nofuncgraph-duration > trace_options
1358	show: echo funcgraph-duration > trace_options
1359
1360- The overhead field precedes the duration field in case of
1361  reached duration thresholds.
1362
1363	hide: echo nofuncgraph-overhead > trace_options
1364	show: echo funcgraph-overhead > trace_options
1365	depends on: funcgraph-duration
1366
1367  ie:
1368
1369  0)               |    up_write() {
1370  0)   0.646 us    |      _spin_lock_irqsave();
1371  0)   0.684 us    |      _spin_unlock_irqrestore();
1372  0)   3.123 us    |    }
1373  0)   0.548 us    |    fput();
1374  0) + 58.628 us   |  }
1375
1376  [...]
1377
1378  0)               |      putname() {
1379  0)               |        kmem_cache_free() {
1380  0)   0.518 us    |          __phys_addr();
1381  0)   1.757 us    |        }
1382  0)   2.861 us    |      }
1383  0) ! 115.305 us  |    }
1384  0) ! 116.402 us  |  }
1385
1386  + means that the function exceeded 10 usecs.
1387  ! means that the function exceeded 100 usecs.
1388
1389
1390- The task/pid field displays the thread cmdline and pid which
1391  executed the function. It is default disabled.
1392
1393	hide: echo nofuncgraph-proc > trace_options
1394	show: echo funcgraph-proc > trace_options
1395
1396  ie:
1397
1398  # tracer: function_graph
1399  #
1400  # CPU  TASK/PID        DURATION                  FUNCTION CALLS
1401  # |    |    |           |   |                     |   |   |   |
1402  0)    sh-4802     |               |                  d_free() {
1403  0)    sh-4802     |               |                    call_rcu() {
1404  0)    sh-4802     |               |                      __call_rcu() {
1405  0)    sh-4802     |   0.616 us    |                        rcu_process_gp_end();
1406  0)    sh-4802     |   0.586 us    |                        check_for_new_grace_period();
1407  0)    sh-4802     |   2.899 us    |                      }
1408  0)    sh-4802     |   4.040 us    |                    }
1409  0)    sh-4802     |   5.151 us    |                  }
1410  0)    sh-4802     | + 49.370 us   |                }
1411
1412
1413- The absolute time field is an absolute timestamp given by the
1414  system clock since it started. A snapshot of this time is
1415  given on each entry/exit of functions
1416
1417	hide: echo nofuncgraph-abstime > trace_options
1418	show: echo funcgraph-abstime > trace_options
1419
1420  ie:
1421
1422  #
1423  #      TIME       CPU  DURATION                  FUNCTION CALLS
1424  #       |         |     |   |                     |   |   |   |
1425  360.774522 |   1)   0.541 us    |                                          }
1426  360.774522 |   1)   4.663 us    |                                        }
1427  360.774523 |   1)   0.541 us    |                                        __wake_up_bit();
1428  360.774524 |   1)   6.796 us    |                                      }
1429  360.774524 |   1)   7.952 us    |                                    }
1430  360.774525 |   1)   9.063 us    |                                  }
1431  360.774525 |   1)   0.615 us    |                                  journal_mark_dirty();
1432  360.774527 |   1)   0.578 us    |                                  __brelse();
1433  360.774528 |   1)               |                                  reiserfs_prepare_for_journal() {
1434  360.774528 |   1)               |                                    unlock_buffer() {
1435  360.774529 |   1)               |                                      wake_up_bit() {
1436  360.774529 |   1)               |                                        bit_waitqueue() {
1437  360.774530 |   1)   0.594 us    |                                          __phys_addr();
1438
1439
1440You can put some comments on specific functions by using
1441trace_printk() For example, if you want to put a comment inside
1442the __might_sleep() function, you just have to include
1443<linux/ftrace.h> and call trace_printk() inside __might_sleep()
1444
1445trace_printk("I'm a comment!\n")
1446
1447will produce:
1448
1449 1)               |             __might_sleep() {
1450 1)               |                /* I'm a comment! */
1451 1)   1.449 us    |             }
1452
1453
1454You might find other useful features for this tracer in the
1455following "dynamic ftrace" section such as tracing only specific
1456functions or tasks.
1457
1458dynamic ftrace
1459--------------
1460
1461If CONFIG_DYNAMIC_FTRACE is set, the system will run with
1462virtually no overhead when function tracing is disabled. The way
1463this works is the mcount function call (placed at the start of
1464every kernel function, produced by the -pg switch in gcc),
1465starts of pointing to a simple return. (Enabling FTRACE will
1466include the -pg switch in the compiling of the kernel.)
1467
1468At compile time every C file object is run through the
1469recordmcount.pl script (located in the scripts directory). This
1470script will process the C object using objdump to find all the
1471locations in the .text section that call mcount. (Note, only the
1472.text section is processed, since processing other sections like
1473.init.text may cause races due to those sections being freed).
1474
1475A new section called "__mcount_loc" is created that holds
1476references to all the mcount call sites in the .text section.
1477This section is compiled back into the original object. The
1478final linker will add all these references into a single table.
1479
1480On boot up, before SMP is initialized, the dynamic ftrace code
1481scans this table and updates all the locations into nops. It
1482also records the locations, which are added to the
1483available_filter_functions list.  Modules are processed as they
1484are loaded and before they are executed.  When a module is
1485unloaded, it also removes its functions from the ftrace function
1486list. This is automatic in the module unload code, and the
1487module author does not need to worry about it.
1488
1489When tracing is enabled, kstop_machine is called to prevent
1490races with the CPUS executing code being modified (which can
1491cause the CPU to do undesirable things), and the nops are
1492patched back to calls. But this time, they do not call mcount
1493(which is just a function stub). They now call into the ftrace
1494infrastructure.
1495
1496One special side-effect to the recording of the functions being
1497traced is that we can now selectively choose which functions we
1498wish to trace and which ones we want the mcount calls to remain
1499as nops.
1500
1501Two files are used, one for enabling and one for disabling the
1502tracing of specified functions. They are:
1503
1504  set_ftrace_filter
1505
1506and
1507
1508  set_ftrace_notrace
1509
1510A list of available functions that you can add to these files is
1511listed in:
1512
1513   available_filter_functions
1514
1515 # cat available_filter_functions
1516put_prev_task_idle
1517kmem_cache_create
1518pick_next_task_rt
1519get_online_cpus
1520pick_next_task_fair
1521mutex_lock
1522[...]
1523
1524If I am only interested in sys_nanosleep and hrtimer_interrupt:
1525
1526 # echo sys_nanosleep hrtimer_interrupt \
1527		> set_ftrace_filter
1528 # echo function > current_tracer
1529 # echo 1 > tracing_on
1530 # usleep 1
1531 # echo 0 > tracing_on
1532 # cat trace
1533# tracer: ftrace
1534#
1535#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
1536#              | |      |          |         |
1537          usleep-4134  [00]  1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
1538          usleep-4134  [00]  1317.070111: sys_nanosleep <-syscall_call
1539          <idle>-0     [00]  1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
1540
1541To see which functions are being traced, you can cat the file:
1542
1543 # cat set_ftrace_filter
1544hrtimer_interrupt
1545sys_nanosleep
1546
1547
1548Perhaps this is not enough. The filters also allow simple wild
1549cards. Only the following are currently available
1550
1551  <match>*  - will match functions that begin with <match>
1552  *<match>  - will match functions that end with <match>
1553  *<match>* - will match functions that have <match> in it
1554
1555These are the only wild cards which are supported.
1556
1557  <match>*<match> will not work.
1558
1559Note: It is better to use quotes to enclose the wild cards,
1560      otherwise the shell may expand the parameters into names
1561      of files in the local directory.
1562
1563 # echo 'hrtimer_*' > set_ftrace_filter
1564
1565Produces:
1566
1567# tracer: ftrace
1568#
1569#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
1570#              | |      |          |         |
1571            bash-4003  [00]  1480.611794: hrtimer_init <-copy_process
1572            bash-4003  [00]  1480.611941: hrtimer_start <-hrtick_set
1573            bash-4003  [00]  1480.611956: hrtimer_cancel <-hrtick_clear
1574            bash-4003  [00]  1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
1575          <idle>-0     [00]  1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
1576          <idle>-0     [00]  1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
1577          <idle>-0     [00]  1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
1578          <idle>-0     [00]  1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
1579          <idle>-0     [00]  1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
1580
1581
1582Notice that we lost the sys_nanosleep.
1583
1584 # cat set_ftrace_filter
1585hrtimer_run_queues
1586hrtimer_run_pending
1587hrtimer_init
1588hrtimer_cancel
1589hrtimer_try_to_cancel
1590hrtimer_forward
1591hrtimer_start
1592hrtimer_reprogram
1593hrtimer_force_reprogram
1594hrtimer_get_next_event
1595hrtimer_interrupt
1596hrtimer_nanosleep
1597hrtimer_wakeup
1598hrtimer_get_remaining
1599hrtimer_get_res
1600hrtimer_init_sleeper
1601
1602
1603This is because the '>' and '>>' act just like they do in bash.
1604To rewrite the filters, use '>'
1605To append to the filters, use '>>'
1606
1607To clear out a filter so that all functions will be recorded
1608again:
1609
1610 # echo > set_ftrace_filter
1611 # cat set_ftrace_filter
1612 #
1613
1614Again, now we want to append.
1615
1616 # echo sys_nanosleep > set_ftrace_filter
1617 # cat set_ftrace_filter
1618sys_nanosleep
1619 # echo 'hrtimer_*' >> set_ftrace_filter
1620 # cat set_ftrace_filter
1621hrtimer_run_queues
1622hrtimer_run_pending
1623hrtimer_init
1624hrtimer_cancel
1625hrtimer_try_to_cancel
1626hrtimer_forward
1627hrtimer_start
1628hrtimer_reprogram
1629hrtimer_force_reprogram
1630hrtimer_get_next_event
1631hrtimer_interrupt
1632sys_nanosleep
1633hrtimer_nanosleep
1634hrtimer_wakeup
1635hrtimer_get_remaining
1636hrtimer_get_res
1637hrtimer_init_sleeper
1638
1639
1640The set_ftrace_notrace prevents those functions from being
1641traced.
1642
1643 # echo '*preempt*' '*lock*' > set_ftrace_notrace
1644
1645Produces:
1646
1647# tracer: ftrace
1648#
1649#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
1650#              | |      |          |         |
1651            bash-4043  [01]   115.281644: finish_task_switch <-schedule
1652            bash-4043  [01]   115.281645: hrtick_set <-schedule
1653            bash-4043  [01]   115.281645: hrtick_clear <-hrtick_set
1654            bash-4043  [01]   115.281646: wait_for_completion <-__stop_machine_run
1655            bash-4043  [01]   115.281647: wait_for_common <-wait_for_completion
1656            bash-4043  [01]   115.281647: kthread_stop <-stop_machine_run
1657            bash-4043  [01]   115.281648: init_waitqueue_head <-kthread_stop
1658            bash-4043  [01]   115.281648: wake_up_process <-kthread_stop
1659            bash-4043  [01]   115.281649: try_to_wake_up <-wake_up_process
1660
1661We can see that there's no more lock or preempt tracing.
1662
1663
1664Dynamic ftrace with the function graph tracer
1665---------------------------------------------
1666
1667Although what has been explained above concerns both the
1668function tracer and the function-graph-tracer, there are some
1669special features only available in the function-graph tracer.
1670
1671If you want to trace only one function and all of its children,
1672you just have to echo its name into set_graph_function:
1673
1674 echo __do_fault > set_graph_function
1675
1676will produce the following "expanded" trace of the __do_fault()
1677function:
1678
1679 0)               |  __do_fault() {
1680 0)               |    filemap_fault() {
1681 0)               |      find_lock_page() {
1682 0)   0.804 us    |        find_get_page();
1683 0)               |        __might_sleep() {
1684 0)   1.329 us    |        }
1685 0)   3.904 us    |      }
1686 0)   4.979 us    |    }
1687 0)   0.653 us    |    _spin_lock();
1688 0)   0.578 us    |    page_add_file_rmap();
1689 0)   0.525 us    |    native_set_pte_at();
1690 0)   0.585 us    |    _spin_unlock();
1691 0)               |    unlock_page() {
1692 0)   0.541 us    |      page_waitqueue();
1693 0)   0.639 us    |      __wake_up_bit();
1694 0)   2.786 us    |    }
1695 0) + 14.237 us   |  }
1696 0)               |  __do_fault() {
1697 0)               |    filemap_fault() {
1698 0)               |      find_lock_page() {
1699 0)   0.698 us    |        find_get_page();
1700 0)               |        __might_sleep() {
1701 0)   1.412 us    |        }
1702 0)   3.950 us    |      }
1703 0)   5.098 us    |    }
1704 0)   0.631 us    |    _spin_lock();
1705 0)   0.571 us    |    page_add_file_rmap();
1706 0)   0.526 us    |    native_set_pte_at();
1707 0)   0.586 us    |    _spin_unlock();
1708 0)               |    unlock_page() {
1709 0)   0.533 us    |      page_waitqueue();
1710 0)   0.638 us    |      __wake_up_bit();
1711 0)   2.793 us    |    }
1712 0) + 14.012 us   |  }
1713
1714You can also expand several functions at once:
1715
1716 echo sys_open > set_graph_function
1717 echo sys_close >> set_graph_function
1718
1719Now if you want to go back to trace all functions you can clear
1720this special filter via:
1721
1722 echo > set_graph_function
1723
1724
1725Filter commands
1726---------------
1727
1728A few commands are supported by the set_ftrace_filter interface.
1729Trace commands have the following format:
1730
1731<function>:<command>:<parameter>
1732
1733The following commands are supported:
1734
1735- mod
1736  This command enables function filtering per module. The
1737  parameter defines the module. For example, if only the write*
1738  functions in the ext3 module are desired, run:
1739
1740   echo 'write*:mod:ext3' > set_ftrace_filter
1741
1742  This command interacts with the filter in the same way as
1743  filtering based on function names. Thus, adding more functions
1744  in a different module is accomplished by appending (>>) to the
1745  filter file. Remove specific module functions by prepending
1746  '!':
1747
1748   echo '!writeback*:mod:ext3' >> set_ftrace_filter
1749
1750- traceon/traceoff
1751  These commands turn tracing on and off when the specified
1752  functions are hit. The parameter determines how many times the
1753  tracing system is turned on and off. If unspecified, there is
1754  no limit. For example, to disable tracing when a schedule bug
1755  is hit the first 5 times, run:
1756
1757   echo '__schedule_bug:traceoff:5' > set_ftrace_filter
1758
1759  These commands are cumulative whether or not they are appended
1760  to set_ftrace_filter. To remove a command, prepend it by '!'
1761  and drop the parameter:
1762
1763   echo '!__schedule_bug:traceoff' > set_ftrace_filter
1764
1765
1766trace_pipe
1767----------
1768
1769The trace_pipe outputs the same content as the trace file, but
1770the effect on the tracing is different. Every read from
1771trace_pipe is consumed. This means that subsequent reads will be
1772different. The trace is live.
1773
1774 # echo function > current_tracer
1775 # cat trace_pipe > /tmp/trace.out &
1776[1] 4153
1777 # echo 1 > tracing_on
1778 # usleep 1
1779 # echo 0 > tracing_on
1780 # cat trace
1781# tracer: function
1782#
1783#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
1784#              | |      |          |         |
1785
1786 #
1787 # cat /tmp/trace.out
1788            bash-4043  [00] 41.267106: finish_task_switch <-schedule
1789            bash-4043  [00] 41.267106: hrtick_set <-schedule
1790            bash-4043  [00] 41.267107: hrtick_clear <-hrtick_set
1791            bash-4043  [00] 41.267108: wait_for_completion <-__stop_machine_run
1792            bash-4043  [00] 41.267108: wait_for_common <-wait_for_completion
1793            bash-4043  [00] 41.267109: kthread_stop <-stop_machine_run
1794            bash-4043  [00] 41.267109: init_waitqueue_head <-kthread_stop
1795            bash-4043  [00] 41.267110: wake_up_process <-kthread_stop
1796            bash-4043  [00] 41.267110: try_to_wake_up <-wake_up_process
1797            bash-4043  [00] 41.267111: select_task_rq_rt <-try_to_wake_up
1798
1799
1800Note, reading the trace_pipe file will block until more input is
1801added. By changing the tracer, trace_pipe will issue an EOF. We
1802needed to set the function tracer _before_ we "cat" the
1803trace_pipe file.
1804
1805
1806trace entries
1807-------------
1808
1809Having too much or not enough data can be troublesome in
1810diagnosing an issue in the kernel. The file buffer_size_kb is
1811used to modify the size of the internal trace buffers. The
1812number listed is the number of entries that can be recorded per
1813CPU. To know the full size, multiply the number of possible CPUS
1814with the number of entries.
1815
1816 # cat buffer_size_kb
18171408 (units kilobytes)
1818
1819Note, to modify this, you must have tracing completely disabled.
1820To do that, echo "nop" into the current_tracer. If the
1821current_tracer is not set to "nop", an EINVAL error will be
1822returned.
1823
1824 # echo nop > current_tracer
1825 # echo 10000 > buffer_size_kb
1826 # cat buffer_size_kb
182710000 (units kilobytes)
1828
1829The number of pages which will be allocated is limited to a
1830percentage of available memory. Allocating too much will produce
1831an error.
1832
1833 # echo 1000000000000 > buffer_size_kb
1834-bash: echo: write error: Cannot allocate memory
1835 # cat buffer_size_kb
183685
1837
1838-----------
1839
1840More details can be found in the source code, in the
1841kernel/trace/*.c files.
1842