perf-mem(1)
===========

NAME
----
perf-mem - Profile memory accesses

SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)

DESCRIPTION
-----------
"perf mem record" runs a command, gathers memory operation data from it and
stores the data in perf.data. Perf record options are accepted and passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit sampling to loads or stores.

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queuing delays in addition to the memory subsystem latency.

On Arm64 this uses SPE to sample load and store operations, therefore hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.

On AMD this uses the IBS Op PMU to sample load and store operations.

COMMON OPTIONS
--------------
-f::
--force::
	Don't do ownership validation.

-t::
--type=<type>::
	Select the memory operation type: load or store (default: load,store).

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-p::
--phys-data::
	Record/Report sample physical addresses.

--data-page-size::
	Record/Report sample data address page size.

RECORD OPTIONS
--------------
<command>...::
	Any command you can specify in a shell.

-e::
--event <event>::
	Event selector. Use 'perf mem record -e list' to list available events.

-K::
--all-kernel::
	Configure all used events to run in kernel space.

-U::
--all-user::
	Configure all used events to run in user space.

--ldlat <n>::
	Specify the desired latency for the loads event. Supported on Intel, Arm64
	and some AMD processors. Ignored on other architectures.

	On supported AMD processors:
	- Support is indicated by /sys/bus/event_source/devices/ibs_op/caps/ldlat
	  containing '1'.
	- Supported latency values are 128 to 2048 (both inclusive).
	- A latency value that is a multiple of 128 incurs slightly less profiling
	  overhead than other values.
	- Load latency filtering is disabled by default.

REPORT OPTIONS
--------------
-i::
--input=<file>::
	Input file name.

-C::
--cpu=<cpu>::
	Only report samples from the list of CPUs provided. Multiple CPUs can be
	provided as a comma-separated list with no space: 0,1. Ranges of CPUs are
	specified with '-', e.g. 0-2. By default, samples from all CPUs are reported.

-D::
--dump-raw-samples::
	Dump the raw decoded samples on the screen, one sample per line, in a
	format that is easy to parse.

-s::
--sort=<key>::
	Group results by the given key(s). Multiple keys can be specified
	in CSV format. The keys specific to memory samples are:
	symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
	dcacheline, phys_daddr, data_page_size, blocked.

	- symbol_daddr: name of the data symbol being accessed at the time of the sample
	- symbol_iaddr: name of the code symbol being executed at the time of the sample
	- dso_daddr: name of the library or module containing the data being accessed
	  at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of TLB access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of the data being accessed at the time of the sample
	- data_page_size: page size of the data being accessed at the time of the sample
	- blocked: reason the load access was blocked at the time of the sample

	The default sort keys are also changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.

-F::
--fields=::
	Specify output fields. Multiple keys can be specified in CSV format.
	Please see linkperf:perf-report[1] for details.

	In addition to the default fields, 'perf mem report' provides the
	following fields to break down sample periods:

	- op: operation in the sample instruction (load, store, prefetch, ...)
	- cache: location in CPU cache (L1, L2, ...) where the sample hit
	- mem: location in memory or other places the sample hit
	- dtlb: location in Data TLB (L1, L2) where the sample hit
	- snoop: snoop result for the sampled data access

	Please take a look at the OUTPUT FIELD SELECTION section for caveats.

-T::
--type-profile::
	Show data-type profile results instead of code symbols. This requires
	debug information and changes the default sort keys to:
	mem, snoop, tlb, type.
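The sort keys above decide how samples are grouped; 'perf mem report' then sizes each group by sample weight rather than sample period, as described in the OVERHEAD CALCULATION section. The Python sketch below reproduces that arithmetic for the two samples used in that section. It is only an illustration, not perf's implementation: the dictionary layout of SAMPLES and the overhead() helper are hypothetical, and only the period, weight and symbol values come from the example.

```python
from collections import defaultdict

# Period, weight and symbol values taken from the OVERHEAD CALCULATION example.
# Representing each sample as a dict with these keys is an assumption of this sketch.
SAMPLES = [
    {"sym": "strcmp", "period": 100000, "weight": 20},
    {"sym": "memcpy", "period": 100000, "weight": 180},
]


def overhead(samples, key, field):
    """Hypothetical helper (not part of perf): group samples by `key` and
    return each group's share of the summed `field` as a percentage,
    largest share first."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[key]] += sample[field]
    grand_total = sum(totals.values())
    shares = {name: 100.0 * value / grand_total for name, value in totals.items()}
    # sorted() is stable, so groups with equal shares keep their input order.
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Period-based, like perf report: both symbols get the same share.
    print(overhead(SAMPLES, "sym", "period"))
    # Weight-based, like perf mem report: memcpy dominates.
    print(overhead(SAMPLES, "sym", "weight"))
```

Running it prints {'strcmp': 50.0, 'memcpy': 50.0} and then {'memcpy': 90.0, 'strcmp': 10.0}, the same 50%/50% and 90%/10% splits shown for perf report and perf mem report in the OVERHEAD CALCULATION section.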

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-x::
--field-separator=<separator>::
	Specify the field separator used when dumping raw samples (-D option). By
	default, the separator is the space character.

In addition, all perf report options are valid for report, and all perf
record options are valid for record.

OVERHEAD CALCULATION
--------------------
Unlike linkperf:perf-report[1], which calculates overhead from the actual
sample period, perf-mem overhead is calculated using sample weight. For
example, suppose the perf.data file contains two samples with the same sample
period, one with weight 180 and the other with weight 20:

  $ perf script -F period,data_src,weight,ip,sym
  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy

  $ perf report -F overhead,symbol
  50%   [.] strcmp
  50%   [k] memcpy

  $ perf mem report -F overhead,symbol
  90%   [k] memcpy
  10%   [.] strcmp

Weighted this way, memcpy accounts for 180 / (180 + 20) = 90% of the total.

OUTPUT FIELD SELECTION
----------------------
"perf mem report" adds a number of new output fields specific to the data
source information in the sample. Some of them have the same names as existing
sort keys ("mem" and "snoop"). Unlike other fields and sort keys, they behave
differently depending on whether they are used with -F/--fields or -s/--sort.

Using these two as output fields aggregates all samples together and shows a
breakdown:

  $ perf mem report -F mem,snoop
  ...
  #  ------ Memory -------  --- Snoop ----
  #      RAM Uncach  Other     HitM  Other
  #  .....................  ..............
  #
        3.5%   0.0%  96.5%    25.1%  74.9%

Using the same names as sort keys instead aggregates samples for each type
separately:

  $ perf mem report -s mem,snoop
  # Overhead       Samples  Memory access                            Snoop
  # ........  ............  .......................................  ............
  #
      47.99%          1509  L2 hit                                   N/A
      25.08%           338  core, same node Any cache hit            HitM
      10.24%         54374  N/A                                      N/A
       6.77%         35938  L1 hit                                   N/A
       6.39%           101  core, same node Any cache hit            N/A
       3.50%            69  RAM hit                                  N/A
       0.03%           158  LFB/MAB hit                              N/A
       0.00%             2  Uncached hit                             N/A

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]