Documentation/RCU/stallwarn.rst

1 .. SPDX-License-Identifier: GPL-2.0
9 options that can be used to fine-tune the detector's operation.  Finally,
20 -	A CPU looping in an RCU read-side critical section.
22 -	A CPU looping with interrupts disabled.
24 -	A CPU looping with preemption disabled.
26 -	A CPU looping with bottom halves disabled.
28 -	For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
33 -	Booting Linux using a console connection that is too slow to
34 	keep up with the boot-time console-message rate.  For example,
36 	with boot-time message rates, and will frequently result in
40 -	Anything that prevents RCU's grace-period kthreads from running.
41 	This can result in the "All QSes seen" console-log message.
44 	result in the ``rcu_.*kthread starved for`` console-log message,
47 -	A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might
48 	happen to preempt a low-priority task in the middle of an RCU
49 	read-side critical section.   This is especially damaging if
50 	that low-priority task is not permitted to run on any other CPU,
54 	memory, you might see stall-warning messages.
56 -	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
62 	CONFIG_PREEMPT_RCU case, you might see stall-warning
65 	You can use the rcutree.kthread_prio kernel boot parameter to
68 	can increase your system's context-switch rate and thus degrade
71 -	A periodic interrupt whose handler takes longer than the time
74 	Note that certain high-overhead debugging options, for example
79 -	Testing a workload on a fast system, tuning the stall-warning
81 	running the same workload with the same stall-warning timeout on a
82 	slow system.  Note that thermal throttling and on-demand governors
85 -	A hardware or software issue shuts off the scheduler-clock
86 	interrupt on a CPU that is not in dyntick-idle mode.  This
90 -	A hardware or software issue that prevents time-based wakeups
96 	the ``rcu_.*timer wakeup didn't happen for`` console-log message,
99 -	A timer issue causes time to appear to jump forward, so that RCU
100 	believes that the RCU CPU stall-warning timeout has been exceeded
101 	when in fact much less time has passed.  This could be due to
106 -	A low-level kernel issue that either fails to invoke one of the
114 	of issues, which sometimes arise in architecture-specific code.
116 -	A bug in the RCU implementation.
118 -	A hardware failure.  This is quite unlikely, but is not at all
125 The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
143 Fine-Tuning the RCU CPU Stall Detector
146 The rcupdate.rcu_cpu_stall_suppress module parameter disables RCU's
148 periods.  This module parameter enables CPU stall detection by default,
149 but may be overridden via boot-time parameter or at runtime via sysfs.
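
As a minimal sketch of the runtime side of this, the following Python helper reads the suppress parameter and reports whether stall warnings are currently enabled. The sysfs path is an assumption based on the usual /sys/module/<module>/parameters/ layout, and the helper name ``stall_detection_enabled`` is hypothetical. The sketch also assumes the parameter reads back as a decimal integer, with zero meaning "not suppressed".

```python
from pathlib import Path

# Assumed location of the module parameter in sysfs (not taken from the
# text above; adjust if your kernel exposes it elsewhere).
SUPPRESS_PARAM = Path("/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress")


def stall_detection_enabled(param_path=SUPPRESS_PARAM):
    """Return True if stall warnings are not suppressed, False if they
    are suppressed, and None if the parameter cannot be read."""
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        # File missing or unreadable (for example, not running on Linux).
        return None
    return int(raw) == 0


if __name__ == "__main__":
    print("stall detection enabled:", stall_detection_enabled())
```

Passing the path as an argument keeps the helper usable against a copy of the file, which is convenient when inspecting a saved sysfs snapshot rather than a live system.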
154 ----------------------------
156 	This kernel configuration parameter defines the period of time
161 	This configuration parameter may be changed at runtime via the
163 	this parameter is checked only at the beginning of a cycle.
164 	So if you are 10 seconds into a 40-second stall, setting this
165 	sysfs parameter to (say) five will shorten the timeout for the
170 	Stall-warning messages may be enabled and disabled completely via
174 --------------------------------
176 	Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
177 	the expedited grace period. This parameter defines the period
184 	This configuration parameter may be changed at runtime via the
186 	this parameter is checked only at the beginning of a cycle. If you
188 	the timeout for the -next- stall.
190 	Stall-warning messages may be enabled and disabled completely via
194 ---------------------
200 	macro, not a kernel configuration parameter.)
203 -------------------
206 	own warnings, as this often gives better-quality stack traces.
211 	parameter.)
214 -------------------------------
216 	This boot/sysfs parameter controls the RCU-tasks and
217 	RCU-tasks-trace stall warning intervals.  A value of zero or less
218 	suppresses RCU-tasks stall warnings.  A positive value sets the
219 	stall-warning interval in seconds.  An RCU-tasks stall warning
225 	task stalling the current RCU-tasks grace period.
227 	An RCU-tasks-trace stall warning starts (and continues) similarly:
232 Interpreting RCU's CPU Stall-Detector "Splats"
235 For non-RCU-tasks flavors of RCU, when a CPU detects that some other
239 	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
240 	16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
244 causing stalls, and that the stall was affecting RCU-sched.  This message
251 in a self-detected stall.
255 ticks this GP)" indicates that this CPU has not taken any scheduling-clock
258 The "idle=" portion of the message prints the dyntick-idle state.
259 The hex number before the first "/" is the low-order 16 bits of the
260 dynticks counter, which will have an even-numbered value if the CPU
261 is in dyntick-idle mode and an odd-numbered value otherwise.  The hex
263 a small non-negative number if in the idle loop (as shown above) and a
265 "/" is the NMI nesting, which will be a small non-negative number.
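
The decoding rules above can be sketched in a few lines of Python. This is an illustration only: the function name ``parse_idle_field`` and the returned dictionary keys are hypothetical, and all three components are assumed to be hexadecimal, as the description above suggests.

```python
def parse_idle_field(field):
    """Decode an 'idle=<dynticks>/<nesting>/<nmi>' field from a stall
    warning, treating all three components as hexadecimal."""
    key, sep, value = field.partition("=")
    if key != "idle" or not sep:
        raise ValueError("not an idle= field: %r" % field)
    dynticks_hex, nesting_hex, nmi_hex = value.split("/")
    dynticks = int(dynticks_hex, 16)
    return {
        # Low-order 16 bits of the dynticks counter.
        "dynticks_low16": dynticks,
        # Even counter value means the CPU is in dyntick-idle mode.
        "in_dyntick_idle": dynticks % 2 == 0,
        "nesting": int(nesting_hex, 16),
        "nmi_nesting": int(nmi_hex, 16),
    }


if __name__ == "__main__":
    for sample in ("idle=06c/0/0", "idle=81c/0/0"):
        print(sample, parse_idle_field(sample))
```

Both sample fields from the splat shown earlier carry an even counter value (0x06c and 0x81c), so the sketch reports both CPUs as being in dyntick-idle mode.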
272 example, if the CPU might have been in dyntick-idle mode for an extended
275 across repeated stall-warning messages, it is possible that RCU's softirq
277 the stalled CPU is spinning with interrupts disabled, or, in -rt

278 kernels, if a high-priority process is starving RCU's softirq handler.
280 The "fqs=" shows the number of force-quiescent-state idle/offline
281 detection passes that the grace-period kthread has made across this
287 period (in this case 2603), the grace-period sequence number (7075), and
292 there will be a spurious stall-warning message, which will include
298 possible for a zero-jiffy stall to be flagged in this case, depending
299 on how the stall warning and the grace-period initialization happen to
305 grace period has nevertheless failed to end, the stall-warning splat
308 …n, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root …
311 since the grace-period kthread ran.  The "jiffies_till_next_fqs"
313 of jiffies between force-quiescent-state scans, in this case three,
314 which is way less than 23807.  Finally, the root rcu_node structure's
315 ->qsmask field is printed, which will normally be zero.
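
The numbers in this message can be cross-checked mechanically. The sketch below assumes that the parenthesized pair is a later jiffies value followed by an earlier one, so that their difference should match the reported inactivity count. The function name ``kthread_inactivity`` is hypothetical, and the regular expression covers only the fragment of the message visible above.

```python
import re

# Matches "kthread activity <count> (<later>-<earlier>)" and the
# jiffies_till_next_fqs value that follows it.
_ACTIVITY_RE = re.compile(
    r"kthread activity (\d+) \((\d+)-(\d+)\), jiffies_till_next_fqs=(\d+)"
)


def kthread_inactivity(fragment):
    """Return (reported, computed, fqs_interval) in jiffies, or None if
    the fragment does not contain the expected fields."""
    m = _ACTIVITY_RE.search(fragment)
    if m is None:
        return None
    reported, later, earlier, fqs_interval = (int(g) for g in m.groups())
    return reported, later - earlier, fqs_interval


if __name__ == "__main__":
    sample = ("last rcu_preempt kthread activity 23807 "
              "(4297905177-4297881370), jiffies_till_next_fqs=3")
    print(kthread_inactivity(sample))
```

For the sample message, the difference of the two parenthesized values equals the reported 23807 jiffies, which is far larger than the three-jiffy interval between force-quiescent-state scans.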
317 If the relevant grace-period kthread has been unable to run prior to
321 	rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
324 Starving the grace-period kthreads of CPU time can of course result
327 grace-period sequence number, the "f" precedes the ->gp_flags command
328 to the grace-period kthread, the "RCU_GP_WAIT_FQS" indicates that the
330 task_struct ->state field, and the "cpu" indicates that the grace-period
333 If the relevant grace-period kthread does not wake from FQS wait in a
336 	kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
344 	Possible timer handling issue on cpu=4 timer-softirq=11142
346 Here "cpu" indicates that the grace-period kthread last ran on CPU 4,
347 where it queued the fqs timer.  The number following the "timer-softirq"
361 If a stall lasts long enough, multiple stall-warning messages will
375 	INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
386 grace-period sequence counter is 73.  The fact that this last value is
418 in milliseconds.  Because user-mode tasks normally do not cause RCU CPU
424   |<------------first timeout---------->|<-----second timeout----->|
425   |<--half timeout-->|<--half timeout-->|                          |
426   |                  |<--first period-->|                          |
427   |                  |<-----------second sampling period---------->|
429              snapshot time point    1st-stall                  2nd-stall
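
The timeline above can be restated as simple arithmetic. The following sketch assumes only what the diagram shows: the snapshot is taken halfway through the first timeout, the first period runs from the snapshot to the first stall, and the second sampling period runs from the snapshot to the second stall. The function name ``sampling_windows`` and the example timeout values are hypothetical, and the unit is whatever unit the timeouts are given in.

```python
def sampling_windows(first_timeout, second_timeout):
    """Return the snapshot time and the two sampling-period lengths
    implied by the timeline diagram."""
    snapshot = first_timeout / 2            # half of the first timeout
    first_stall = first_timeout             # end of the first timeout
    second_stall = first_timeout + second_timeout
    return {
        "snapshot": snapshot,
        "first_period": first_stall - snapshot,
        "second_period": second_stall - snapshot,
    }


if __name__ == "__main__":
    # Illustrative values only; not defaults taken from the kernel.
    print(sampling_windows(20, 60))
```

With these illustrative values the first period is half of the first timeout, while the second sampling period also covers the whole of the second timeout.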
450    This is similar to the previous example, but with a non-zero number
451    of hard interrupts and non-zero CPU time consumed by them, along with
452    non-zero CPU time consumed by in-kernel execution::
483    Here, the number and CPU time of hard interrupts are all non-zero,
484    but the number of context switches and the in-kernel CPU time consumed
486    non-zero, but could be zero, for example, if the CPU was spinning