===============
 GPU Debugging
===============

General Debugging Options
=========================

The DebugFS section provides documentation on a number of files to aid in
debugging issues on the GPU.


GPUVM Debugging
===============

To aid in debugging GPU virtual memory related problems, the driver supports a
number of module parameters:

`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.

`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
the GPU.


Decoding a GPUVM Page Fault
===========================

If you see a GPU page fault in the kernel log, you can decode it to figure
out what is going wrong in your application. A page fault in your kernel
log may look something like this:

::

    [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
      in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2)
    VM_L2_PROTECTION_FAULT_STATUS:0x00301030
      Faulty UTCL2 client ID: TCP (0x8)
      MORE_FAULTS: 0x0
      WALKER_ERROR: 0x0
      PERMISSION_FAULTS: 0x3
      MAPPING_ERROR: 0x0
      RW: 0x0

The first field is the memory hub, either gfxhub or mmhub. gfxhub is the
memory hub used for graphics, compute, and sdma on some chips. mmhub is the
memory hub used for multi-media and sdma on some chips.

Next you have the vmid and pasid. If the vmid is 0, this fault was likely
caused by the kernel driver or firmware. If the vmid is non-0, it is generally
a fault in a user application. The pasid is used to link a vmid to a system
process id. If the process is active when the fault happens, the process
information will be printed.

The GPU virtual address that caused the fault comes next.

The client ID indicates the GPU block that caused the fault.
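
The header fields discussed so far (hub, vmid, pasid, and faulting address)
can be pulled out of the log text with a short script. The following is only a
sketch in Python: the regular expressions are derived from the single example
report in this section, so other kernel versions or hubs may print a different
layout, and the names ``parse_fault_header`` and ``likely_source`` are
hypothetical helpers, not part of the driver or of any tool.

```python
import re

# Patterns modelled on the one example report in this section (assumption:
# the field order "vmid:N pasid:N" and the word "address" stay the same).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\].*?page fault \(.*?vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
)
ADDR_RE = re.compile(r"address (?P<addr>0x[0-9a-fA-F]+)")


def parse_fault_header(text):
    """Return hub, vmid, pasid, and faulting address, or None if not found."""
    head = FAULT_RE.search(text)
    addr = ADDR_RE.search(text)
    if head is None or addr is None:
        return None
    vmid = int(head.group("vmid"))
    return {
        "hub": head.group("hub"),
        "vmid": vmid,
        "pasid": int(head.group("pasid")),
        "address": int(addr.group("addr"), 16),
        # vmid 0 suggests the kernel driver or firmware; a non-zero vmid
        # generally suggests a user application.
        "likely_source": (
            "kernel driver or firmware" if vmid == 0 else "user application"
        ),
    }


sample = (
    "[gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, "
    "for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)\n"
    "  in page starting at address 0x0000800102800000 "
    "from IH client 0x1b (UTCL2)\n"
)
info = parse_fault_header(sample)
print(info["hub"], info["vmid"], info["pasid"], hex(info["address"]))
```

Returning ``None`` for text that does not match keeps the helper safe to run
over a whole kernel log, where most lines are not fault reports.
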
Some common client IDs:

- CB/DB: The color/depth backend of the graphics pipe
- CPF: Command Processor Frontend
- CPC: Command Processor Compute
- CPG: Command Processor Graphics
- TCP/SQC/SQG: Shaders
- SDMA: SDMA engines
- VCN: Video encode/decode engines
- JPEG: JPEG engines

PERMISSION_FAULTS describes which faults were encountered:

- bit 0: the PTE was not valid
- bit 1: the PTE read bit was not set
- bit 2: the PTE write bit was not set
- bit 3: the PTE execute bit was not set

Finally, RW indicates whether the access was a read (0) or a write (1).

In the example above, a shader (client ID = TCP) generated a read (RW = 0x0) to
an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
0x0000800102800000. The user can then inspect their shader code and resource
descriptor state to determine what caused the GPU page fault.

UMR
===

`umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose
GPU debugging and diagnostics tool. Please see the umr
`documentation <https://umr.readthedocs.io/en/main/>`_ for more information
about its capabilities.
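
Returning to the page fault report decoded earlier, the PERMISSION_FAULTS and
RW values can also be interpreted programmatically. The sketch below, in
Python, tests each bit using the meanings listed in that section. The function
and table names are hypothetical and exist only for illustration; nothing here
is provided by the driver or by umr.

```python
# Bit meanings as listed for PERMISSION_FAULTS in the fault decoding section.
PERMISSION_FAULT_BITS = {
    0: "PTE was not valid",
    1: "PTE read bit was not set",
    2: "PTE write bit was not set",
    3: "PTE execute bit was not set",
}


def describe_permission_faults(value):
    """Return the description of every bit set in PERMISSION_FAULTS."""
    return [
        text
        for bit, text in PERMISSION_FAULT_BITS.items()
        if value & (1 << bit)
    ]


def describe_access(rw):
    """RW is 0 for a read and 1 for a write."""
    return "write" if rw else "read"


# Values from the example report: PERMISSION_FAULTS: 0x3, RW: 0x0
reasons = describe_permission_faults(0x3)
access = describe_access(0x0)
print(access, reasons)
```

For the example values this reports a read with bits 0 and 1 set, which
matches the reading given earlier of a shader read to an invalid page.
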