===================================================
Using Coresight for Kernel panic and Watchdog reset
===================================================

Introduction
------------
This document describes how to use Linux coresight trace support to
debug kernel panic and watchdog reset scenarios.

Coresight trace during Kernel panic
-----------------------------------
From the coresight driver's point of view, handling a kernel panic
has four main requirements.

a. Support for allocating trace buffer pages from a reserved memory area.
   A platform can advertise this using a new device tree property added to
   the relevant coresight nodes.

b. Support for stopping coresight blocks at the time of panic.

c. Saving the required metadata in the specified format.

d. Support for reading trace data captured at the time of panic.

Allocation of trace buffer pages from reserved RAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new optional device tree property "memory-region" is added to the
Coresight TMC device nodes; it gives the base address and size of the
trace buffer.

Static allocation of trace buffers ensures that both the IOMMU-enabled
and IOMMU-disabled cases are handled. In addition, platforms that support
persistent RAM allow users to read trace data in the subsequent boot
without booting the crashdump kernel.

Note:
For ETR sink devices, this reserved region is used for both trace
capture and trace data retrieval.
For ETF sink devices, internal SRAM is used for trace capture, and its
contents are synced to the reserved region for retrieval.


Disabling coresight blocks at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid losing relevant trace data after a kernel panic, it is
desirable to stop the coresight blocks at the time of panic.

This can be achieved by configuring the comparator, CTI and sink
devices as below::

           Trigger on panic
    Comparator --->External out --->CTI -->External In---->ETR/ETF stop

Saving metadata at the time of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Coresight metadata comprises all the data, beyond the trace data itself,
that is required for a successful trace decode. This includes the
ETR/ETF/ETB register snapshot, etc.

A new optional device tree property "memory-region" is added to
the ETR/ETF/ETB device nodes for this.

Reading trace data captured at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trace data captured at the time of panic can be read from the rebooted
kernel or from the crashdump kernel using a special device file,
/dev/crash_tmc_xxx. This device file is created only when valid crash
data is available.

General flow of trace capture and decode in case of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Enable source and sink on all the cores using the sysfs interface.
   ETR sinks should have trace buffers allocated from reserved memory,
   by selecting the "resrv" buffer mode from sysfs.

2. Run relevant tests.

3. On a kernel panic, all coresight blocks are disabled and the necessary
   metadata is synced by the kernel panic handler.

   The system would eventually reboot or boot a crashdump kernel.

4. For platforms that support a crashdump kernel, raw trace data can be
   dumped using the coresight sysfs interface from the crashdump kernel
   itself. Persistent RAM is not a requirement in this case.

5. For platforms that support persistent RAM, trace data can be dumped
   using the coresight sysfs interface in the subsequent Linux boot.
   A crashdump kernel is not a requirement in this case. Persistent RAM
   ensures that trace data is intact across reboot.
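
Step 1 of the flow above can be scripted. The following is a minimal
sketch, not a definitive implementation: the function name
"enable_panic_trace" and the "CS_ROOT" variable are hypothetical, and the
sink and source device names passed to it are platform-specific
assumptions. It relies only on the "buf_mode_preferred", "enable_sink"
and "enable_source" sysfs attributes.

```shell
#!/bin/sh
# Sketch only: enable a reserved-memory ETR sink and per-core trace sources.
# CS_ROOT defaults to the coresight sysfs directory; the device names passed
# in (for example tmc_etr0, ete1, ete2) are platform-specific assumptions.
CS_ROOT="${CS_ROOT:-/sys/bus/coresight/devices}"

# enable_panic_trace SINK SOURCE...
# Selects the "resrv" buffer mode on SINK, enables it, then enables each SOURCE.
enable_panic_trace() {
    sink="$1"
    shift
    # Ask the ETR to allocate its trace buffer from the reserved region.
    echo resrv > "$CS_ROOT/$sink/buf_mode_preferred" || return 1
    # Enable the sink first so the sources have a destination to trace into.
    echo 1 > "$CS_ROOT/$sink/enable_sink" || return 1
    for src in "$@"; do
        echo 1 > "$CS_ROOT/$src/enable_source" || return 1
    done
}
```

A call such as `enable_panic_trace tmc_etr0 ete1 ete2` would then cover
one sink and two cores, provided those device names exist on the platform.
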

Coresight trace during Watchdog reset
-------------------------------------
The main differences between handling the watchdog reset and kernel panic
cases are as follows:

a. Saving coresight metadata needs to be taken care of by the
   SCP (system control processor) firmware, in the specified format,
   instead of by the kernel.

b. The reserved memory region given by firmware for the trace buffer and
   metadata has to be in persistent RAM.
   Note: This is a requirement for the watchdog reset case but optional
   in the kernel panic case.

Watchdog reset can be supported only on platforms that meet the above
two requirements.

Sample commands for testing a Kernel panic case with ETR sink
-------------------------------------------------------------

1. Boot the Linux kernel with "crash_kexec_post_notifiers" added to the
   kernel bootargs. This is mandatory if the user would like to read the
   trace data from the crashdump kernel.

2. Enable the preloaded ETM configuration::

    #echo 1 > /sys/kernel/config/cs-syscfg/configurations/panicstop/enable

3. Configure CTI using the sysfs interface::

    #./cti_setup.sh

    #cat cti_setup.sh


    cd /sys/bus/coresight/devices/

    ap_cti_config () {
      #ETM trig out[0] trigger to Channel 0
      echo 0 4 > channels/trigin_attach
    }

    etf_cti_config () {
      #ETF Flush in trigger from Channel 0
      echo 0 1 > channels/trigout_attach
      echo 1 > channels/trig_filter_enable
    }

    etr_cti_config () {
      #ETR Flush in from Channel 0
      echo 0 1 > channels/trigout_attach
      echo 1 > channels/trig_filter_enable
    }

    #Collect every CTI device, then configure it according to what it connects to
    ctidevs=$(find . -name "cti*")

    for i in $ctidevs
    do
            cd "$i"

            connection=$(find . -name "ete*")
            if [ -n "$connection" ]
            then
                    echo "AP CTI config for $i"
                    ap_cti_config
            fi

            connection=$(find . -name "tmc_etf*")
            if [ -n "$connection" ]
            then
                    echo "ETF CTI config for $i"
                    etf_cti_config
            fi

            connection=$(find . -name "tmc_etr*")
            if [ -n "$connection" ]
            then
                    echo "ETR CTI config for $i"
                    etr_cti_config
            fi

            cd ..
    done

Note: CTI connections are SoC-specific and hence the above script is
added just for reference.

4. Choose the reserved buffer mode for the ETR buffer::

    #echo "resrv" > /sys/bus/coresight/devices/tmc_etr0/buf_mode_preferred

5. Enable the stop-on-flush trigger configuration::

    #echo 1 > /sys/bus/coresight/devices/tmc_etr0/stop_on_flush

6. Start Coresight tracing on cores 1 and 2 using the sysfs interface.

7. Run some application on core 1::

    #taskset -c 1 dd if=/dev/urandom of=/dev/null &

8. Invoke a kernel panic on core 2::

    #echo 1 > /proc/sys/kernel/panic
    #taskset -c 2 echo c > /proc/sysrq-trigger

9. From the rebooted kernel or the crashdump kernel, read the crash data::

    #dd if=/dev/crash_tmc_etr0 of=/trace/cstrace.bin

10. Run opencsd decoder tools/scripts to generate the instruction trace.

Sample instruction trace dump
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Core 1 dump::

    A etm4_enable_hw: ffff800008ae1dd4
    CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4
    I etm4_enable_hw: ffff800008ae1dd4:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1dd8:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1ddc:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de0:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de4:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de8:
    d503233f paciasp
    I etm4_enable_hw: ffff800008ae1dec:
    a9be7bfd stp x29, x30, [sp, #-32]!
    I etm4_enable_hw: ffff800008ae1df0:
    910003fd mov x29, sp
    I etm4_enable_hw: ffff800008ae1df4:
    a90153f3 stp x19, x20, [sp, #16]
    I etm4_enable_hw: ffff800008ae1df8:
    2a0003f4 mov w20, w0
    I etm4_enable_hw: ffff800008ae1dfc:
    900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48>
    I etm4_enable_hw: ffff800008ae1e00:
    910f4273 add x19, x19, #0x3d0
    I etm4_enable_hw: ffff800008ae1e04:
    f8747a60 ldr x0, [x19, x20, lsl #3]
    E etm4_enable_hw: ffff800008ae1e08:
    b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
    I 149.039572921 etm4_enable_hw: ffff800008ae1e30:
    a94153f3 ldp x19, x20, [sp, #16]
    I 149.039572921 etm4_enable_hw: ffff800008ae1e34:
    52800000 mov w0, #0x0 // #0
    I 149.039572921 etm4_enable_hw: ffff800008ae1e38:
    a8c27bfd ldp x29, x30, [sp], #32

    ..snip

    149.052324811 chacha_block_generic: ffff800008642d80:
    9100a3e0 add x0,
    I 149.052324811 chacha_block_generic: ffff800008642d84:
    b86178a2 ldr w2, [x5, x1, lsl #2]
    I 149.052324811 chacha_block_generic: ffff800008642d88:
    8b010803 add x3, x0, x1, lsl #2
    I 149.052324811 chacha_block_generic: ffff800008642d8c:
    b85fc063 ldur w3, [x3, #-4]
    I 149.052324811 chacha_block_generic: ffff800008642d90:
    0b030042 add w2, w2, w3
    I 149.052324811 chacha_block_generic: ffff800008642d94:
    b8217882 str w2, [x4, x1, lsl #2]
    I 149.052324811 chacha_block_generic: ffff800008642d98:
    91000421 add x1, x1, #0x1
    I 149.052324811 chacha_block_generic: ffff800008642d9c:
    f100443f cmp x1, #0x11


Core 2 dump::

    A etm4_enable_hw: ffff800008ae1dd4
    CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4
    I etm4_enable_hw: ffff800008ae1dd4:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1dd8:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1ddc:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de0:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de4:
    d503201f nop
    I etm4_enable_hw: ffff800008ae1de8:
    d503233f paciasp
    I etm4_enable_hw: ffff800008ae1dec:
    a9be7bfd stp x29, x30, [sp, #-32]!
    I etm4_enable_hw: ffff800008ae1df0:
    910003fd mov x29, sp
    I etm4_enable_hw: ffff800008ae1df4:
    a90153f3 stp x19, x20, [sp, #16]
    I etm4_enable_hw: ffff800008ae1df8:
    2a0003f4 mov w20, w0
    I etm4_enable_hw: ffff800008ae1dfc:
    900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48>
    I etm4_enable_hw: ffff800008ae1e00:
    910f4273 add x19, x19, #0x3d0
    I etm4_enable_hw: ffff800008ae1e04:
    f8747a60 ldr x0, [x19, x20, lsl #3]
    E etm4_enable_hw: ffff800008ae1e08:
    b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
    I 149.046243445 etm4_enable_hw: ffff800008ae1e30:
    a94153f3 ldp x19, x20, [sp, #16]
    I 149.046243445 etm4_enable_hw: ffff800008ae1e34:
    52800000 mov w0, #0x0 // #0
    I 149.046243445 etm4_enable_hw: ffff800008ae1e38:
    a8c27bfd ldp x29, x30, [sp], #32
    I 149.046243445 etm4_enable_hw: ffff800008ae1e3c:
    d50323bf autiasp
    E 149.046243445 etm4_enable_hw: ffff800008ae1e40:
    d65f03c0 ret
    A ete_sysreg_write: ffff800008adfa18

    ..snip

    I 149.05422547 panic: ffff800008096300:
    a90363f7 stp x23, x24, [sp, #48]
    I 149.05422547 panic: ffff800008096304:
    6b00003f cmp w1, w0
    I 149.05422547 panic: ffff800008096308:
    3a411804 ccmn w0, #0x1, #0x4, ne // ne = any
    N 149.05422547 panic: ffff80000809630c:
    540001e0 b.eq ffff800008096348 <panic+0xe0> // b.none
    I 149.05422547 panic: ffff800008096310:
    f90023f9 str x25, [sp, #64]
    E 149.05422547 panic: ffff800008096314:
    97fe44ef bl ffff8000080276d0 <panic_smp_self_stop>
    A panic: ffff80000809634c
    I 149.05422547 panic: ffff80000809634c:
    910102d5 add x21, x22, #0x40
    I 149.05422547 panic: ffff800008096350:
    52800020 mov w0, #0x1 // #1
    E 149.05422547 panic: ffff800008096354:
    94166b8b bl ffff800008631180 <bust_spinlocks>
    N 149.054225518 bust_spinlocks: ffff800008631180:
    340000c0 cbz w0, ffff800008631198 <bust_spinlocks+0x18>
    I 149.054225518 bust_spinlocks: ffff800008631184:
    f000a321 adrp x1, ffff800009a98000 <pbufs.0+0xbb8>
    I 149.054225518 bust_spinlocks: ffff800008631188:
    b9405c20 ldr w0, [x1, #92]
    I 149.054225518 bust_spinlocks: ffff80000863118c:
    11000400 add w0, w0, #0x1
    I 149.054225518 bust_spinlocks: ffff800008631190:
    b9005c20 str w0, [x1, #92]
    E 149.054225518 bust_spinlocks: ffff800008631194:
    d65f03c0 ret
    A panic: ffff800008096358

Perf based testing
------------------

Starting perf session
~~~~~~~~~~~~~~~~~~~~~
ETF::

    perf record -e cs_etm/panicstop,@tmc_etf1/ -C 1
    perf record -e cs_etm/panicstop,@tmc_etf2/ -C 2

ETR::

    perf record -e cs_etm/panicstop,@tmc_etr0/ -C 1,2

Reading trace data after panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same sysfs-based method explained above can be used to retrieve and
decode the trace data after the reboot that follows a kernel panic.
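
Retrieval after the panic can also be scripted when more than one sink
holds crash data. The following is a minimal sketch under stated
assumptions: the function name "dump_crash_traces", the "DEV_DIR"
variable and the ".bin" output naming are hypothetical and chosen for
illustration only. It copies every /dev/crash_tmc_* node with dd, in the
same way as the single-device command shown earlier.

```shell
#!/bin/sh
# Sketch only: copy every crash trace device to a file for offline decoding.
# DEV_DIR and the output file naming are assumptions made for this example;
# the crash_tmc_* nodes exist only when valid crash data was found.
DEV_DIR="${DEV_DIR:-/dev}"

# dump_crash_traces OUT_DIR
# Writes OUT_DIR/<device name>.bin for each crash_tmc_* node and returns
# non-zero if no such node exists.
dump_crash_traces() {
    out_dir="$1"
    found=1
    mkdir -p "$out_dir" || return 1
    for dev in "$DEV_DIR"/crash_tmc_*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$dev" ] || continue
        dd if="$dev" of="$out_dir/$(basename "$dev").bin" 2>/dev/null || return 1
        found=0
    done
    return "$found"
}
```

For example, `dump_crash_traces /trace` would leave one raw trace file per
crash device in /trace, ready to be passed to the opencsd decoder tools.
A non-zero return value indicates that no crash device was present, which
is expected when the previous boot did not end in a panic.
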