# async-profiler
This project is a low overhead sampling profiler for Java
that does not suffer from [Safepoint bias problem](http://psy-lob-saw.blogspot.ru/2016/02/why-most-sampling-java-profilers-are.html).
It features HotSpot-specific APIs to collect stack traces
and to track memory allocations. The profiler works with
OpenJDK, Oracle JDK and other Java runtimes based on the HotSpot JVM.
async-profiler can trace the following kinds of events:
- CPU cycles
- Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
- Allocations in Java Heap
- Contented lock attempts, including both Java object monitors and ReentrantLocks
See our [Wiki](https://github.com/jvm-profiling-tools/async-profiler/wiki) or
[3 hours playlist](https://www.youtube.com/playlist?list=PLNCLTEx3B8h4Yo_WvKWdLvI9mj1XpTKBr)
to learn about all features.
## Download
Current release (2.9):
- Linux x64 (glibc): [async-profiler-2.9-linux-x64.tar.gz](https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz)
- Linux x64 (musl): [async-profiler-2.9-linux-musl-x64.tar.gz](https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-musl-x64.tar.gz)
- Linux arm64: [async-profiler-2.9-linux-arm64.tar.gz](https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-arm64.tar.gz)
- macOS x64/arm64: [async-profiler-2.9-macos.zip](https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-macos.zip)
- Converters between profile formats: [converter.jar](https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/converter.jar)
(JFR to Flame Graph, JFR to FlameScope, collapsed stacks to Flame Graph)
[Previous releases](https://github.com/jvm-profiling-tools/async-profiler/releases)
Note: async-profiler also comes bundled with IntelliJ IDEA Ultimate 2018.3 and later.
For more information refer to [IntelliJ IDEA documentation](https://www.jetbrains.com/help/idea/cpu-and-allocation-profiling-basic-concepts.html).
## Supported platforms
- **Linux** / x64 / x86 / arm64 / arm32 / ppc64le
- **macOS** / x64 / arm64
### Community supported builds
- **Windows** / x64 - <img src="https://upload.wikimedia.org/wikipedia/commons/9/9c/IntelliJ_IDEA_Icon.svg" width="16" height="16"/> [IntelliJ IDEA](https://www.jetbrains.com/idea/) 2021.2 and later
- [**ap-loader**](https://github.com/jvm-profiling-tools/ap-loader) -
all-in-one JAR for using async-profiler in Java programs and as a CLI tool
## CPU profiling
In this mode profiler collects stack trace samples that include **Java** methods,
**native** calls, **JVM** code and **kernel** functions.
The general approach is receiving call stacks generated by `perf_events`
and matching them up with call stacks generated by `AsyncGetCallTrace`,
in order to produce an accurate profile of both Java and native code.
Additionally, async-profiler provides a workaround to recover stack traces
in some [corner cases](https://bugs.openjdk.java.net/browse/JDK-8178287)
where `AsyncGetCallTrace` fails.
This approach has the following advantages compared to using `perf_events`
directly with a Java agent that translates addresses to Java method names:
* Works on older Java versions because it doesn't require
`-XX:+PreserveFramePointer`, which is only available in JDK 8u60 and later.
* Does not introduce the performance overhead from `-XX:+PreserveFramePointer`,
which can in rare cases be as high as 10%.
* Does not require generating a map file to map Java code addresses to method
names.
* Works with interpreter frames.
* Does not require writing out a perf.data file for further processing in
user space scripts.
If you wish to resolve frames within `libjvm`, the [debug symbols](#installing-debug-symbols) are required.
## ALLOCATION profiling
Instead of detecting CPU-consuming code, the profiler can be configured
to collect call sites where the largest amount of heap memory is allocated.
async-profiler does not use intrusive techniques like bytecode instrumentation
or expensive DTrace probes which have significant performance impact.
It also does not affect Escape Analysis or prevent from JIT optimizations
like allocation elimination. Only actual heap allocations are measured.
The profiler features TLAB-driven sampling. It relies on HotSpot-specific
callbacks to receive two kinds of notifications:
- when an object is allocated in a newly created TLAB (aqua frames in a Flame Graph);
- when an object is allocated on a slow path outside TLAB (brown frames).
This means not each allocation is counted, but only allocations every _N_ kB,
where _N_ is the average size of TLAB. This makes heap sampling very cheap
and suitable for production. On the other hand, the collected data
may be incomplete, though in practice it will often reflect the top allocation
sources.
Sampling interval can be adjusted with `--alloc` option.
For example, `--alloc 500k` will take one sample after 500 KB of allocated
space on average. However, intervals less than TLAB size will not take effect.
The minimum supported JDK version is 7u40 where the TLAB callbacks appeared.
### Installing Debug Symbols
The allocation profiler requires HotSpot debug symbols. Oracle JDK already has them
embedded in `libjvm.so`, but in OpenJDK builds they are typically shipped
in a separate package. For example, to install OpenJDK debug symbols on
Debian / Ubuntu, run:
```
# apt install openjdk-8-dbg
```
or for OpenJDK 11:
```
# apt install openjdk-11-dbg
```
On CentOS, RHEL and some other RPM-based distributions, this could be done with
[debuginfo-install](http://man7.org/linux/man-pages/man1/debuginfo-install.1.html) utility:
```
# debuginfo-install java-1.8.0-openjdk
```
On Gentoo the `icedtea` OpenJDK package can be built with the per-package setting
`FEATURES="nostrip"` to retain symbols.
The `gdb` tool can be used to verify if the debug symbols are properly installed for the `libjvm` library.
For example on Linux:
```
$ gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'
```
This command's output will either contain `Symbol "UseG1GC" is at 0xxxxx`
or `No symbol "UseG1GC" in current context`.
## Wall-clock profiling
`-e wall` option tells async-profiler to sample all threads equally every given
period of time regardless of thread status: Running, Sleeping or Blocked.
For instance, this can be helpful when profiling application start-up time.
Wall-clock profiler is most useful in per-thread mode: `-t`.
Example: `./profiler.sh -e wall -t -i 5ms -f result.html 8983`
## Java method profiling
`-e ClassName.methodName` option instruments the given Java method
in order to record all invocations of this method with the stack traces.
Example: `-e java.util.Properties.getProperty` will profile all places
where `getProperty` method is called from.
Only non-native Java methods are supported. To profile a native method,
use hardware breakpoint event instead, e.g. `-e Java_java_lang_Throwable_fillInStackTrace`
**Be aware** that if you attach async-profiler at runtime, the first instrumentation
of a non-native Java method may cause the [deoptimization](https://github.com/openjdk/jdk/blob/bf2e9ee9d321ed289466b2410f12ad10504d01a2/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L4092-L4096)
of all compiled methods. The subsequent instrumentation flushes only the _dependent code_.
The massive CodeCache flush doesn't occur if attaching async-profiler as an agent.
Here are some useful native methods that you may want to profile:
* ```G1CollectedHeap::humongous_obj_allocate``` - trace the _humongous allocation_ of the G1 GC,
* ```JVM_StartThread``` - trace the new thread creation,
* ```Java_java_lang_ClassLoader_defineClass1``` - trace class loading.
## Building
Build status: [![Build Status](https://github.com/jvm-profiling-tool