#
d99c22ea |
| 22-Apr-2020 |
Stephane Eranian <eranian@google.com> |
perf record: Add num-synthesize-threads option
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way
perf record: Add num-synthesize-threads option
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option.
On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable.
As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place: https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/ shows: ... - 32.59% __perf_event__synthesize_threads - 32.54% __event__synthesize_thread + 22.13% perf_event__synthesize_mmap_events + 6.68% perf_event__get_comm_ids.constprop.0 + 1.49% process_synthesized_event + 1.29% __GI___readdir64 + 0.60% __opendir ... That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch:
https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/
Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 127729.000 usec (+- 3372.880 usec) Average num. events: 21548.600 (+- 0.306) Average time per event 5.927 usec Number of synthesis threads: 2 Average synthesis took: 88863.500 usec (+- 385.168 usec) Average num. events: 21552.800 (+- 0.327) Average time per event 4.123 usec Number of synthesis threads: 3 Average synthesis took: 83257.400 usec (+- 348.617 usec) Average num. events: 21553.200 (+- 0.327) Average time per event 3.863 usec Number of synthesis threads: 4 Average synthesis took: 75093.000 usec (+- 422.978 usec) Average num. events: 21554.200 (+- 0.200) Average time per event 3.484 usec Number of synthesis threads: 5 Average synthesis took: 64896.600 usec (+- 353.348 usec) Average num. events: 21558.000 (+- 0.000) Average time per event 3.010 usec Number of synthesis threads: 6 Average synthesis took: 59210.200 usec (+- 342.890 usec) Average num. events: 21560.000 (+- 0.000) Average time per event 2.746 usec Number of synthesis threads: 7 Average synthesis took: 54093.900 usec (+- 306.247 usec) Average num. events: 21562.000 (+- 0.000) Average time per event 2.509 usec Number of synthesis threads: 8 Average synthesis took: 48938.700 usec (+- 341.732 usec) Average num. events: 21564.000 (+- 0.000) Average time per event 2.269 usec
Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4.
Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Jones <tonyj@suse.de> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
41d91ec3 |
| 22-Apr-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'tegra-for-5.7-asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into asoc-5.7
ASoC: tegra: Fixes for v5.7-rc3
This contains a couple of fixes that are needed to properly
Merge tag 'tegra-for-5.7-asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into asoc-5.7
ASoC: tegra: Fixes for v5.7-rc3
This contains a couple of fixes that are needed to properly reconfigure the audio clocks on older Tegra devices.
show more ...
|
#
175ae3ad |
| 21-Apr-2020 |
Tony Lindgren <tony@atomide.com> |
Merge branch 'fixes-v5.7' into fixes
|
#
3bda0386 |
| 21-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fix for 5.7 and maintainer update
- Silence false positive lockdep warnin
Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fix for 5.7 and maintainer update
- Silence false positive lockdep warning - add Claudio as reviewer
show more ...
|
Revision tags: v5.7-rc2 |
|
#
08d99b2c |
| 17-Apr-2020 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging required to pull topic/phy-compliance.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
2b703bbd |
| 16-Apr-2020 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Backmerging in order to pull "topic/phy-compliance".
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
a4721ced |
| 14-Apr-2020 |
Maxime Ripard <maxime@cerno.tech> |
Merge v5.7-rc1 into drm-misc-fixes
Start the new drm-misc-fixes cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
3b02a051 |
| 13-Apr-2020 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh
Resolve these conflicts:
arch/x86/Kconfig arch/x86/kernel/Makefile
Do a minor "evil merge" to move the KCSAN entry up a
Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh
Resolve these conflicts:
arch/x86/Kconfig arch/x86/kernel/Makefile
Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines in the Kconfig to reduce the probability of future conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5 |
|
#
28336be5 |
| 30-Dec-2019 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v5.5-rc4' into locking/kcsan, to resolve conflicts
Conflicts: init/main.c lib/Kconfig.debug
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c48b0722 |
| 05-Apr-2020 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner: "Perf updates all over the place:
core:
- Support for
Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner: "Perf updates all over the place:
core:
- Support for cgroup tracking in samples to allow cgroup based analysis
tools:
- Support for cgroup analysis
- Commandline option and hotkey for perf top to change the sort order
- A set of fixes all over the place
- Various build system related improvements
- Updates of the X86 pmu event JSON data
- Documentation updates"
* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf python: Fix clang detection to strip out options passed in $CC perf tools: Support Python 3.8+ in Makefile perf script: Fix invalid read of directory entry after closedir() perf script report: Fix SEGFAULT when using DWARF mode perf script: add -S/--symbols documentation perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric perf events parser: Add missing Intel CPU events to parser perf script: Allow --symbol to accept hexadecimal addresses perf report/top TUI: Fix title line formatting perf top: Support hotkey to change sort order perf top: Support --group-sort-idx to change the sort order perf symbols: Fix arm64 gap between kernel start and module end perf build-test: Honour JOBS to override detection of number of cores perf script: Add --show-cgroup-events option perf top: Add --all-cgroups option perf record: Add --all-cgroups option perf record: Support synthesizing cgroup events perf report: Add 'cgroup' sort key perf cgroup: Maintain cgroup hierarchy perf tools: Basic support for CGROUP event ...
show more ...
|
#
7dc41b9b |
| 04-Apr-2020 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
pe
Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
perf python:
Arnaldo Carvalho de Melo:
- Fix clang detection to strip out options passed in $CC.
build:
He Zhe:
- Normalize gcc parameter when generating arch errno table, fixing the build by removing options from $(CC).
Sam Lunt:
- Support Python 3.8+ in Makefile.
perf report/top:
Arnaldo Carvalho de Melo:
- Fix title line formatting.
perf script:
Andreas Gerstmayr:
- Fix SEGFAULT when using DWARF mode.
- Fix invalid read of directory entry after closedir(), found with valgrind.
Hagen Paul Pfeifer:
- Introduce --deltatime option.
Stephane Eranian:
- Allow --symbol to accept hexadecimal addresses.
Ian Rogers:
- Add -S/--symbols documentation
Namhyung Kim:
- Add --show-cgroup-events option.
perf python:
Arnaldo Carvalho de Melo:
- Include rwsem.c in the python binding, needed by the cgroups improvements.
build-test:
Arnaldo Carvalho de Melo:
- Honour JOBS to override detection of number of cores
perf top:
Jin Yao:
- Support --group-sort-idx to change the sort order
- perf top: Support hotkey to change sort order
perf pmu-events x86:
Jin Yao:
- Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf symbols arm64:
Kemeng Shi:
- Fix arm64 gap between kernel start and module end
kernel perf subsystem:
Namhyung Kim:
- Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature, to allow cgroup tracking, saving a link between cgroup path and its id number.
perf cgroup:
Namhyung Kim:
- Maintain cgroup hierarchy.
perf report:
Namhyung Kim:
- Add 'cgroup' sort key.
perf record:
Namhyung Kim:
- Support synthesizing cgroup events for pre-existing cgroups.
- Add --all-cgroups option
Documentation:
Tony Jones:
- Update docs regarding kernel/user space unwinding.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
8fb4b679 |
| 25-Mar-2020 |
Namhyung Kim <namhyung@kernel.org> |
perf record: Add --all-cgroups option
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that perf report can identify tas
perf record: Add --all-cgroups option
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that perf report can identify task/cgroup association later.
[root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ] [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 558 of event 'cycles' # Event count (approx.): 458017341 # # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 33.15% 4/0xeffffffb /sub 9615:looper0 32.83% 4/0xf00002f5 /sub/cgrp2 9620:looper2 32.79% 4/0xf00002f4 /sub/cgrp1 9619:looper1 0.35% 4/0xf00002f5 /sub/cgrp2 9618:cgtest 0.34% 4/0xf00002f4 /sub/cgrp1 9617:cgtest 0.32% 4/0xeffffffb / 9615:looper0 0.11% 4/0xeffffffb /sub 9617:cgtest 0.10% 4/0xeffffffb /sub 9618:cgtest
# # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S') # [root@seventh ~]#
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
c95baf12 |
| 20-Feb-2020 |
Zhenyu Wang <zhenyuw@linux.intel.com> |
Merge drm-intel-next-queued into gvt-next
Backmerge to pull in https://patchwork.freedesktop.org/patch/353621/?series=73544&rev=1
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
#
b19efcab |
| 01-Feb-2020 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.6 merge window.
|
#
1bdd3e05 |
| 10-Jan-2020 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.5-rc5' into next
Sync up with mainline to get SPI "delay" API changes.
|
#
22164fbe |
| 06-Jan-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Merge drm/drm-next into drm-misc-next
Requested, and we need v5.5-rc1 backported as our current branch is still based on v5.4.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
Revision tags: v5.5-rc4, v5.5-rc3, v5.5-rc2 |
|
#
023265ed |
| 11-Dec-2019 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Sync up with v5.5-rc1 to get the updated lock_release() API among other things. Fix the conflict reported by Stephen Rothwell [1].
[1] http://lore.kern
Merge drm/drm-next into drm-intel-next-queued
Sync up with v5.5-rc1 to get the updated lock_release() API among other things. Fix the conflict reported by Stephen Rothwell [1].
[1] http://lore.kernel.org/r/20191210093957.5120f717@canb.auug.org.au
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
Revision tags: v5.5-rc1 |
|
#
942e6f8a |
| 05-Dec-2019 |
Olof Johansson <olof@lixom.net> |
Merge mainline/master into arm/fixes
This brings in the mainline tree right after armsoc contents was merged this release cycle, so that we can re-run savedefconfig, etc.
Signed-off-by: Olof Johans
Merge mainline/master into arm/fixes
This brings in the mainline tree right after armsoc contents was merged this release cycle, so that we can re-run savedefconfig, etc.
Signed-off-by: Olof Johansson <olof@lixom.net>
show more ...
|
#
3f59dbca |
| 26-Nov-2019 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "The main kernel side changes in this cycle were:
- Various Intel
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "The main kernel side changes in this cycle were:
- Various Intel-PT updates and optimizations (Alexander Shishkin)
- Prohibit kprobes on Xen/KVM emulate prefixes (Masami Hiramatsu)
- Add support for LSM and SELinux checks to control access to the perf syscall (Joel Fernandes)
- Misc other changes, optimizations, fixes and cleanups - see the shortlog for details.
There were numerous tooling changes as well - 254 non-merge commits. Here are the main changes - too many to list in detail:
- Enhancements to core tooling infrastructure, perf.data, libperf, libtraceevent, event parsing, vendor events, Intel PT, callchains, BPF support and instruction decoding.
- There were updates to the following tools:
perf annotate perf diff perf inject perf kvm perf list perf maps perf parse perf probe perf record perf report perf script perf stat perf test perf trace
- And a lot of other changes: please see the shortlog and Git log for more details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (279 commits) perf parse: Fix potential memory leak when handling tracepoint errors perf probe: Fix spelling mistake "addrees" -> "address" libtraceevent: Fix memory leakage in copy_filter_type libtraceevent: Fix header installation perf intel-bts: Does not support AUX area sampling perf intel-pt: Add support for decoding AUX area samples perf intel-pt: Add support for recording AUX area samples perf pmu: When using default config, record which bits of config were changed by the user perf auxtrace: Add support for queuing AUX area samples perf session: Add facility to peek at all events perf auxtrace: Add support for dumping AUX area samples perf inject: Cut AUX area samples perf record: Add aux-sample-size config term perf record: Add support for AUX area sampling perf auxtrace: Add support for AUX area sample recording perf auxtrace: Move perf_evsel__find_pmu() perf record: Add a function to test for kernel support for AUX area sampling perf tools: Add kernel AUX area sampling definitions perf/core: Make the mlock accounting simple again perf report: Jump to symbol source view from total cycles view ...
show more ...
|
#
976e3645 |
| 25-Nov-2019 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.5 merge window.
|
Revision tags: v5.4 |
|
#
8cacac6e |
| 23-Nov-2019 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'perf-core-for-mingo-5.5-20191122' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf rep
Merge tag 'perf-core-for-mingo-5.5-20191122' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf report:
Jin Yao:
- Allow entering the annotation view (symbol source/assembly + overhead/cycles/etc column) from the 'perf report --total-cycles' interface.
E.g.:
# perf record --all-cpus --branch-any --all-kernel ^C[ perf record: Woken up 5 times to write data ] # # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles # # Samples: 78762 of event 'cycles' Sampled Sampled Avg Avg Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object 1.72% 95.8K 0.00% 254 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 1.56% 107.6K 0.00% 618 [compiler.h:199 -> common.c:301] [kernel.vmlinux] 0.83% 46.3K 0.00% 409 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.83% 46.1K 0.00% 83 [jump_label.h:41 -> tsc.c:230] [kernel.vmlinux] 0.64% 36.9K 0.01% 1.4K [hda_intel.c:904 -> hda_intel.c:916] [snd_hda_intel] 0.57% 30.2K 0.00% 282 [file.c:710 -> file.c:730] [kernel.vmlinux] 0.48% 25.8K 0.00% 82 [spinlock.c:158 -> spinlock.c:160] [kernel.vmlinux] 0.45% 23.7K 0.00% 369 [tick-broadcast.c:585 -> tick-broadcast.c:586] [kernel.vmlinux] 0.44% 24.4K 0.00% 73 [msr.h:236 -> tsc.c:1088] [kernel.vmlinux] 0.43% 22.7K 0.00% 144 [cpuidle.c:229 -> cpuidle.c:232] [kernel.vmlinux]
Then press 'A' or Enter on one of those lines, just like with 'perf top', say the top one: [msr.h:105 -> msr.h:166], then this shows up:
Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 78762 native_write_msr /lib/modules/5.4.0-rc8/build/vmlinux [Percent: local period] Percent│ IPC Cycle (Average IPC: 0.02, IPC Coverage: 50.0%) │ │ Disassembly of section .text: │ │ ffffffff8106c480 <native_write_msr>: │ __wrmsr(): │ return EAX_EDX_VAL(val, low, high); │ } │ │ static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high) │ { │ asm volatile("1: wrmsr\n" 49.16 │0.02 mov %edi,%ecx │0.02 mov %esi,%eax │0.02 wrmsr │ arch_static_branch(): │ #include <linux/stringify.h> │ #include <linux/types.h> │ │ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) │ { │ asm_volatile_goto("1:" 0.79 │0.02 nop │ native_write_msr(): │ { │ __wrmsr(msr, low, high); │ │ if (msr_tracepoint_active(__tracepoint_write_msr)) │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ } 50.05 │0.02 254 ← retq │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ shl $0x20,%rdx │ mov %esi,%esi │ or %rdx,%rsi │ xor %edx,%edx │ → jmpq do_trace_write_msr
We need to improve this to show the source code line numbers in the annotation view, so one can go from that program block to the annotation view and see those source code line numbers straight away.
auxtrace/Intel PT:
Adrian Hunter:
- Add support for AUX area sampling, requires new functionality that will land in 5.5, its already in tip.
This includes kernel capability querying so that it fails gracefully with older kernels, duimping aux area samples in 'perf report -D' and 'perf script'.
perf.data:
Alexey Budankov:
- Fix decompression of PERF_RECORD_COMPRESSED records.
core:
Arnaldo Carvalho de Melo:
- Use the 'dcacheline' cmp routine to find the right DSOs taking into account the 'maj', 'min', 'ino' and 'ino_generation', that got moved from 'struct map' to 'struct dso', where it belongs.
This further reduces the size of 'struct map', there is still more work to do to maybe get it to max one cacheline.
libtraceevent:
Hewenliang:
- Fix memory leakage in copy_filter_type().
Sudip Mukherjee:
- Fix header installation.
perf parse:
Ian Rogers :
- Fix potential memory leak when handling tracepoint errors, found using LLVM's libFuzzer.
perf probe:
Colin Ian King:
- Fix spelling mistake "addrees" -> "address".
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v5.4-rc8 |
|
#
f0bb7ee8 |
| 15-Nov-2019 |
Adrian Hunter <adrian.hunter@intel.com> |
perf auxtrace: Add support for AUX area sample recording
Add support for parsing and validating AUX area sample options. At present, the only option is the sample size, but it is also necessary to e
perf auxtrace: Add support for AUX area sample recording
Add support for parsing and validating AUX area sample options. At present, the only option is the sample size, but it is also necessary to ensure that events are in a group with an AUX area event as the leader.
Committer note:
Add missing 'static inline' in front of auxtrace_parse_sample_options() for when we don't HAVE_AUXTRACE_SUPPORT.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
9f4813b5 |
| 19-Nov-2019 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v5.4-rc8' into WIP.x86/mm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ac94be49 |
| 15-Nov-2019 |
Thomas Gleixner <tglx@linutronix.de> |
Merge branch 'linus' into x86/hyperv
Pick up upstream fixes to avoid conflicts.
|
#
56b2147f |
| 12-Nov-2019 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf rep
Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf report:
Jin Yao:
- Introduce --total-cycles, for basic block profiling, further using data obtained from LBR, an example should suffice:
# perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
# perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
# perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Sampled Avg Avg # Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object # ....... ...... ....... ..... .................................... ................ # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
perf record:
Adrian Hunter:
- Allow storing perf.data in a directory together with a copy of /proc/kcore.
Jiwei Sun:
- Add support for limit perf output file size, i.e.:
# perf record --all-cpus -F 10000 --max-size=4M sleep 10h [ perf record: perf size limit reached (4097 KB), stopping session ] [ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ] Terminated # ls -lah perf.data -rw-------. 1 root root 4.1M Nov 7 15:27 perf.data #
perf stat:
Jiri Olsa:
- Add --per-node agregation support:
In live mode:
# perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ...
Or in the record/report stat session:
# perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles
# perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles ...
perf probe:
Masami Hiramatsu:
Various fixes related to recent additions to the DWARF format:
- Fix to find range-only function instance
- Walk function lines in lexical blocks
- Fix to show function entry line as probe-able
- Fix wrong address verification
- Fix to probe a function which has no entry pc
- Fix to probe an inline function which has no entry pc
- Fix to list probe event with correct line number
- Fix to show inlined function callsite without entry_pc
- Fix to show ranges of variables in functions without entry_pc
- Return a better scope DIE if there is no best scope
- Skip end-of-sequence and non statement lines
- Filter out instances except for inlined subroutine and subprogram
- Fix to show calling lines of inlined functions
- Skip overlapped location on searching variables
perf inject:
Adrian Hunter:
- Do not strip evsels with --strip, as they are needed for create_gcov (see the autofdo example in tools/perf/Documentation/intel-pt.txt).
Intel PT:
Adrian Hunter:
- Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid repeated decoding. Add an auxtrace_cache__remove to handle text poke events.
core:
Andi Kleen:
- Always preserve errno while cleaning up perf_event_open failures.
llvm:
Arnaldo Carvalho de Melo:
- No need to tell that the request for saving a .o file for BPF events, as expressed in ~/.perfconfig was satisfied, make that a debug message.
perf vendor events:
Intel:
Haiyan Song:
- Update CascadelakeX events to v1.05.
- Update all the Intel JSON metrics from TMAM 3.6.
Treewide:
Ian Rogers:
- Improve error paths, plugging leaks found using LLVM tools such as libFuzzer.
jevents:
Yunfeng Ye:
- Fix resource leak in process_mapfile() and main()
perf kvm:
Igor Lubashev:
- Use evlist layer api when possible.
libsubcmd:
James Clark:
- Move EXTRA_FLAGS to the end to allow overriding existing flags.
- Use -O0 with DEBUG=1
perf diff:
Jin Yao:
- Don't use hack to skip column length calculation
CoreSight ETM:
Leo yan:
- Fix definition of macro TO_CS_QUEUE_NR
ARM64:
John Garry:
- Do not try to include libelf header files when its feature detection failed, fixing the cross build for ARM64.
perf tests:
Leo Yan:
- Fix out of bounds memory access in the backward ring buffer test.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|