JVM performance optimization, Part 3: Garbage collection
The Java platform's garbage collection mechanism greatly increases developer productivity, but a poorly implemented garbage collector can over-consume
application resources. In this third article in the JVM performance optimization series, Eva Andreasson offers Java beginners an overview of the Java
platform's memory model and GC mechanism. She then explains why fragmentation (and not GC) is the major "gotcha!" of Java application performance, and
why generational garbage collection and compaction are currently the leading (though not most innovative) approaches to managing heap fragmentation in
Java applications.
Garbage collection (GC) is the process that aims to free up occupied memory that is no longer referenced by
any reachable Java object, and is an essential part of the Java virtual machine's (JVM's) dynamic memory
management system. In a typical garbage collection cycle all objects that are still referenced, and thus
reachable, are kept. The space occupied by previously referenced objects is freed and reclaimed to enable new
object allocation.
In order to understand garbage collection and the various GC approaches and algorithms, you must first know a
few things about the Java platform's memory model.
Garbage collection and the Java platform memory model
When you specify the startup option -Xmx on the command line of your Java application (for instance:
java -Xmx2g MyApp), memory is assigned to a Java process. This memory is referred to as the Java heap (or just
heap). This is the dedicated memory address space where all objects created by your Java program (or
sometimes the JVM) will be allocated. As your Java program keeps running and allocating new objects, the Java
heap (meaning that address space) will fill up.
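To make the idea of a heap that fills up concrete, here is a minimal sketch that reads the heap figures the JVM reports for the current process. The class name HeapInfo and its helper methods are hypothetical names chosen for illustration; the only platform calls used are Runtime.maxMemory(), Runtime.totalMemory() and Runtime.freeMemory(). The numbers printed depend on the JVM, its settings and the machine, so treat them as approximate.

```java
// Illustrative sketch only: HeapInfo and its method names are hypothetical.
final class HeapInfo {
    /** Upper bound the JVM will try to use for the heap, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap memory currently reserved by the JVM, in bytes. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Approximate heap memory currently occupied by objects, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final int mb = 1024 * 1024;
        System.out.println("max heap (MB):       " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
        System.out.println("used before (MB):    " + usedHeapBytes() / mb);

        // Allocate 8 MB and keep it referenced so that it stays on the heap.
        byte[][] blocks = new byte[8][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[mb];
        }
        System.out.println("used after (MB):     " + usedHeapBytes() / mb);
        System.out.println("blocks held:         " + blocks.length);
    }
}
```

Running the program with different maximum heap settings should change the first figure, while the "used" figure grows as the program allocates and keeps objects.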
Eventually, the Java heap will be full, which means that an allocating thread is unable to find a large enough
contiguous section of free memory for the object it wants to allocate. At that point, the JVM determines that a
garbage collection needs to happen and notifies the garbage collector. A garbage collection can also be
requested when a Java program calls System.gc(), although calling System.gc() does not guarantee a garbage
collection. Before any garbage collection can start, the GC mechanism first determines whether it is safe to start
it. It is safe to start a garbage collection when all of the application's active threads are at a safe point that allows for
it. Simply explained, it would be bad to start garbage collecting in the middle of an ongoing object allocation,
or in the middle of executing a sequence of optimized CPU instructions (see my previous article on compilers),
as you might lose context and thereby corrupt the end results.
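The fact that an explicit request is only a hint can be observed with a weak reference, which does not keep its target alive. The following is a sketch under stated assumptions: the class GcHintDemo and the method clearedAfterGcRequests are hypothetical names, and the retry count and sleep interval are arbitrary. Whether the object is actually collected depends on the JVM and its settings, so the program reports either outcome.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch only: GcHintDemo and clearedAfterGcRequests are hypothetical names.
final class GcHintDemo {
    /**
     * Requests a collection up to maxAttempts times and reports whether the
     * weakly referenced object was observed to have been collected.
     */
    static boolean clearedAfterGcRequests(WeakReference<?> ref, int maxAttempts) {
        for (int i = 0; i < maxAttempts && ref.get() != null; i++) {
            System.gc(); // a request only; the JVM is free to ignore it
            try {
                Thread.sleep(10); // give a collector that runs concurrently some time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null; // drop the only strong reference

        boolean cleared = clearedAfterGcRequests(ref, 50);
        System.out.println(cleared
                ? "object was collected"
                : "object was not collected (the request may have been ignored)");
    }
}
```

An object that is still strongly referenced must never be reported as collected by this helper, no matter how many times a collection is requested.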
A garbage collector should never reclaim an actively referenced object; to do so would break the Java virtual
machine specification. A garbage collector is also not required to immediately collect dead objects. Dead objects
are eventually collected during subsequent garbage collection cycles. While there are many ways to implement
garbage collection, these two assumptions are true for all varieties. The real challenge of garbage collection is to
identify everything that is live (still referenced) and reclaim any unreferenced memory, but do so without
impacting running applications any more than necessary. A garbage collector thus has two mandates:
1. To quickly free unreferenced memory in order to satisfy an application's allocation rate so that it doesn't run out of memory.
2. To reclaim memory while minimally impacting the performance (e.g., latency and throughput) of a running application.
Two kinds of garbage collection
In the first article in this series I touched on the two main approaches to garbage collection, which are reference counting and tracing
collectors. This time I'll drill down further into each approach, then introduce some of the algorithms used to implement tracing collectors in
production environments.
Reference counting collectors
Reference counting collectors keep track of how many references are pointing to each Java object. Once the count for an object becomes
zero, the memory can be immediately reclaimed. This immediate access to reclaimed memory is the major advantage of the reference-counting
approach to garbage collection. There is very little overhead when it comes to holding on to unreferenced memory. Keeping all
reference counts up to date can be quite costly, however.
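The bookkeeping described above can be sketched as a toy model in plain Java. This is not how any production JVM collects garbage: the class RefCountDemo, the nested Counted type and the methods retain, release and addField are all hypothetical, and "reclaiming" is simulated with a flag. The sketch only shows that every stored reference costs a counter update and that memory becomes reusable the moment a count reaches zero.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative toy model only: all names here are hypothetical.
final class RefCountDemo {
    /** An object whose incoming references are counted by hand. */
    static final class Counted {
        final String name;
        private int refCount = 0;
        private boolean reclaimed = false;
        private final List<Counted> fields = new ArrayList<>();

        Counted(String name) {
            this.name = name;
        }

        /** Called whenever a new reference to this object is created. */
        void retain() {
            refCount++;
        }

        /** Called whenever a reference to this object is dropped. */
        void release() {
            if (refCount <= 0) {
                throw new IllegalStateException(name + " released too many times");
            }
            refCount--;
            if (refCount == 0) {
                reclaimed = true; // simulated reclamation, immediate at zero
                for (Counted target : fields) {
                    target.release(); // references held by a dead object go away too
                }
                fields.clear();
            }
        }

        /** Stores a reference to target in this object, updating target's count. */
        void addField(Counted target) {
            target.retain();
            fields.add(target);
        }

        int count() {
            return refCount;
        }

        boolean isReclaimed() {
            return reclaimed;
        }
    }

    public static void main(String[] args) {
        Counted parent = new Counted("parent");
        parent.retain(); // reference from a local variable (a root)
        Counted child = new Counted("child");
        parent.addField(child); // parent -> child

        System.out.println("child count while parent is live: " + child.count());
        parent.release(); // the root reference goes away
        System.out.println("parent reclaimed: " + parent.isReclaimed());
        System.out.println("child reclaimed:  " + child.isReclaimed());
    }
}
```

Note that every call to addField and release touches a counter, which is where the cost of keeping counts up to date comes from.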
The main difficulty with reference counting collectors is keeping the reference counts accurate. Another well-known challenge is the
complexity associated with handling circular structures. If two objects reference each other and no live object refers to them, their memory