Java Memory Deep Dive

Part 3: Garbage Collection

Published
8 min read

1. Core GC Concepts

1.1 What is Garbage?

An object is garbage when it is unreachable from every GC root. The JVM finds garbage by tracing reachability, not by reference counting (unlike CPython or Swift).

[!IMPORTANT] Interview Key: Objects D, E, F form a reference cycle but are STILL garbage because no GC root can reach them. Java GC handles cycles correctly — unlike naive reference counting.
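
To make the cycle argument concrete, here is a minimal sketch of a mark phase over a toy object graph. The `Node` class and the objects A, B, and C are hypothetical stand-ins for illustration; only D, E, and F come from the text above. A real collector walks raw heap memory rather than Java lists, so treat this as a model of the idea, not of HotSpot's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the mark phase; Node is a hypothetical stand-in for a heap object.
class ReachabilityDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns the names of every node reachable from the given roots.
    static Set<String> mark(List<Node> roots) {
        Set<Node> visited = new HashSet<>();
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (visited.add(n)) {          // first visit: follow outgoing references
                worklist.addAll(n.refs);
            }
        }
        Set<String> names = new TreeSet<>();
        for (Node n : visited) names.add(n.name);
        return names;
    }

    static Set<String> demo() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        Node d = new Node("D"), e = new Node("E"), f = new Node("F");
        a.refs.add(b);
        b.refs.add(c);
        // D -> E -> F -> D is a cycle that no root points into.
        d.refs.add(e);
        e.refs.add(f);
        f.refs.add(d);
        return mark(List.of(a));           // A is the only GC root in this model
    }

    public static void main(String[] args) {
        System.out.println("Live: " + demo()); // Live: [A, B, C]
    }
}
```

D, E, and F never enter the worklist, so they stay unmarked even though each of them still has an incoming reference. A naive reference counter would see a count of 1 on each and keep all three alive.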

1.2 Reachability Analysis (Mark Phase)

1.3 The Three Fundamental GC Operations


2. GC Algorithm Strategies

2.1 Mark-Sweep

2.2 Mark-Sweep-Compact

2.3 Copying Collector

2.4 Algorithm Comparison

| Algorithm | Fragmentation | Speed | Memory Overhead | Used In |
| --- | --- | --- | --- | --- |
| Mark-Sweep | Yes | Fast sweep | None | CMS (old gen) |
| Mark-Compact | No | Slower (moving) | None | Serial, Parallel (old gen) |
| Copying | No | Fastest for low survival | 2x space (from/to) | All (young gen) |
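
The fragmentation column can be illustrated with a small sketch. It models the heap as an array of equal-sized slots, which is a simplification chosen for this example (real objects vary in size), and compares the largest contiguous free run after a sweep with the one after a compaction.

```java
import java.util.Arrays;

// Toy heap: each array slot is one object-sized cell (an assumption for illustration).
class CompactionDemo {
    // Longest run of free slots, i.e. the largest allocation that still fits.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0, run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Mark-Sweep: dead slots are freed in place, live slots do not move.
    static boolean[] sweep(boolean[] live) {
        return Arrays.copyOf(live, live.length);
    }

    // Mark-Compact: live slots slide to the start of the heap.
    static boolean[] compact(boolean[] live) {
        boolean[] heap = new boolean[live.length];
        int next = 0;
        for (boolean isLive : live) {
            if (isLive) heap[next++] = true;
        }
        return heap;
    }

    public static void main(String[] args) {
        // true = live object, false = garbage; every other object survives.
        boolean[] live = {true, false, true, false, true, false, true, false};
        System.out.println(largestFreeRun(sweep(live)));   // 1
        System.out.println(largestFreeRun(compact(live))); // 4
    }
}
```

Both strategies free the same four slots, but only the compacted heap can satisfy an allocation larger than one slot. That is the trade the table describes: compaction pays for moving objects and gets contiguous free space in return.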

3. GC Types: Minor, Major, Full

3.1 Minor GC Walkthrough

3.2 Card Table and Remembered Sets

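Problem: During Minor GC, we only scan Young Gen. But Old Gen objects might reference Young Gen objects. How do we find those references without scanning the entire Old Gen?

Solution: the card table. Old Gen is divided into small fixed-size cards (512 bytes by default in HotSpot), and one byte per card records whether that card may hold a reference into Young Gen. A Minor GC scans only the dirty cards and treats the references found there as additional roots.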

[!TIP] Write Barrier: A small piece of code that the interpreter and the JIT compilers emit at every reference store. When you write oldObj.field = youngObj, the barrier marks the card containing that field as dirty. This is the cost of generational GC: every reference write carries a small overhead.
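
The card-marking idea can be sketched in a few lines. This is a simplified model, not HotSpot code: the class name, the `BitSet` representation, and the use of byte offsets into Old Gen are assumptions made for the example. The 512-byte card size matches HotSpot's usual default.

```java
import java.util.Arrays;
import java.util.BitSet;

// Simplified card table: one bit per 512-byte card of a hypothetical Old Gen.
class CardTableDemo {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards

    final BitSet dirty;

    CardTableDemo(long oldGenBytes) {
        dirty = new BitSet((int) (oldGenBytes >>> CARD_SHIFT));
    }

    // Post-write barrier: runs after a reference store into a field located
    // fieldOffset bytes from the start of Old Gen.
    void writeBarrier(long fieldOffset) {
        dirty.set((int) (fieldOffset >>> CARD_SHIFT));
    }

    // Minor GC scans only the dirty cards, then resets them.
    int[] drainDirtyCards() {
        int[] cards = dirty.stream().toArray();
        dirty.clear();
        return cards;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(1 << 20); // 1 MB Old Gen = 2048 cards
        table.writeBarrier(100);      // card 0
        table.writeBarrier(700);      // card 1
        table.writeBarrier(1000);     // card 1 again, already dirty
        table.writeBarrier(600_000);  // card 1171
        System.out.println(Arrays.toString(table.drainDirtyCards())); // [0, 1, 1171]
    }
}
```

In this run the Minor GC would scan 3 cards (about 1.5 KB) instead of the full 1 MB Old Gen. The barrier itself is just a shift and a store, which is why its per-write cost stays small.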


4. JVM Garbage Collectors

4.1 Collector Evolution Timeline

4.2 Serial GC (-XX:+UseSerialGC)

Use case: Client apps, small heaps under 100MB, single-CPU machines.

4.3 Parallel GC (-XX:+UseParallelGC)

Default in Java 8. Uses multiple GC threads for higher throughput.

| Flag | Purpose |
| --- | --- |
| -XX:ParallelGCThreads=N | Number of GC threads |
| -XX:MaxGCPauseMillis=200 | Target max pause (ms) |
| -XX:GCTimeRatio=99 | Target at most 1% of total time in GC |
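
The 1% figure for -XX:GCTimeRatio=99 comes from the ratio 1 / (1 + N), where N is the flag value. The helper below is a hypothetical name used only to show the arithmetic.

```java
import java.util.Locale;

class GcTimeRatioDemo {
    // Target fraction of total time spent in GC for -XX:GCTimeRatio=n.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int n : new int[] {99, 19, 9}) {
            System.out.println(String.format(Locale.ROOT,
                    "GCTimeRatio=%d -> %.1f%% of time in GC", n, 100 * gcTimeFraction(n)));
        }
        // GCTimeRatio=99 -> 1.0% of time in GC
        // GCTimeRatio=19 -> 5.0% of time in GC
        // GCTimeRatio=9 -> 10.0% of time in GC
    }
}
```

A larger value therefore asks for less GC time, which the collector typically pursues by growing the heap.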

4.4 G1 GC (-XX:+UseG1GC)

Default since Java 9. The most important collector to understand for interviews.

G1 Region-Based Heap

[!NOTE] G1 divides the heap into equal-sized regions (1MB–32MB, auto-calculated). Any region can serve any role. This flexibility allows G1 to collect the most garbage-filled regions first — hence "Garbage First."
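
As a rough illustration of "auto-calculated", the sketch below approximates the region-size heuristic. It assumes G1 aims for about 2048 regions and picks a power-of-two size within the 1MB to 32MB range. The exact inputs and rounding differ between JDK versions, so read this as an approximation rather than the actual HotSpot logic.

```java
class G1RegionSizeDemo {
    static final long MB = 1024L * 1024;
    static final int TARGET_REGIONS = 2048; // assumed target region count

    // Approximate region size for a given max heap: heap / 2048, clamped to
    // [1MB, 32MB], then rounded down to a power of two (rounding is an assumption).
    static long regionSize(long maxHeapBytes) {
        long raw = maxHeapBytes / TARGET_REGIONS;
        long clamped = Math.max(MB, Math.min(32 * MB, raw));
        return Long.highestOneBit(clamped);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 4 * 1024, 100 * 1024};
        for (long heapMb : heapsInMb) {
            System.out.println(heapMb + " MB heap -> "
                    + regionSize(heapMb * MB) / MB + " MB regions");
        }
        // 512 MB heap -> 1 MB regions
        // 4096 MB heap -> 2 MB regions
        // 102400 MB heap -> 32 MB regions
    }
}
```

To see the value a particular JVM actually chose, -Xlog:gc+heap (or setting -XX:G1HeapRegionSize explicitly) is more reliable than any estimate.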

G1 Collection Phases

G1 Key Concepts

IHOP (Initiating Heap Occupancy Percent): When Old Gen occupancy reaches this percentage of the total heap, concurrent marking starts. Default: adaptive, starting from 45%.

Evacuation: G1 does not sweep in place — it copies live objects from collected regions to free regions.

Mixed GC: Collects both Young AND selected Old regions in a single STW pause.

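SATB (Snapshot-At-The-Beginning): G1 takes a logical snapshot of the object graph at the start of concurrent marking. Whenever the application overwrites a reference during marking, a pre-write barrier records the old value, so every object that was reachable in the snapshot still gets marked. Objects allocated during marking are treated as live.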
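
The SATB idea is easier to see in a toy model. In the sketch below, `Obj`, `store`, and `markFrom` are hypothetical names, each object has a single reference field, and the "concurrent" interleaving is simulated by calling the marker and the mutator in a fixed order. It shows the case the pre-write barrier exists for: a live object is moved behind the marker's back.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Toy model of SATB marking; not G1's actual data structures.
class SatbDemo {
    static final class Obj {
        final String name;
        Obj field;
        Obj(String name) { this.name = name; }
    }

    static boolean marking;
    static final Deque<Obj> satbQueue = new ArrayDeque<>();

    // Reference store with a SATB pre-write barrier.
    static void store(Obj holder, Obj newValue) {
        if (marking && holder.field != null) {
            satbQueue.push(holder.field); // remember the value being overwritten
        }
        holder.field = newValue;
    }

    // Marks start and everything reachable through the single field chain.
    static void markFrom(Obj start, Set<String> marked) {
        for (Obj o = start; o != null && marked.add(o.name); o = o.field) {
            // marked.add returns false for already-marked objects, ending the walk
        }
    }

    static Set<String> run(boolean barrierEnabled) {
        satbQueue.clear();
        Obj r1 = new Obj("r1"), r2 = new Obj("r2"), a = new Obj("a"), b = new Obj("b");
        r1.field = a;
        a.field = b;                 // snapshot: r1 -> a -> b, r2 points nowhere
        Set<String> marked = new TreeSet<>();
        marking = barrierEnabled;
        markFrom(r2, marked);        // marker finishes r2 first
        store(r2, a.field);          // mutator: r2.field = b (old value null, nothing logged)
        store(a, null);              // mutator: a.field = null (old value b is logged)
        markFrom(r1, marked);        // marker resumes: r1 -> a, and never reaches b
        while (!satbQueue.isEmpty()) {
            markFrom(satbQueue.pop(), marked); // drain the SATB queue
        }
        marking = false;
        return marked;
    }

    public static void main(String[] args) {
        System.out.println("with barrier:    " + run(true));  // [a, b, r1, r2]
        System.out.println("without barrier: " + run(false)); // [a, r1, r2]
    }
}
```

Without the barrier, b is still reachable through r2 but is never marked, so it would be freed while in use. With the barrier, the overwritten reference keeps b in the snapshot.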

G1 Tuning Flags

| Flag | Purpose | Default |
| --- | --- | --- |
| -XX:MaxGCPauseMillis | Target pause time | 200ms |
| -XX:G1HeapRegionSize | Region size | Auto (1-32MB) |
| -XX:InitiatingHeapOccupancyPercent | IHOP trigger | 45% (adaptive) |
| -XX:G1MixedGCCountTarget | Mixed GCs per cycle | 8 |
| -XX:G1HeapWastePercent | Stop mixed GCs once reclaimable space falls below this share of the heap | 5% |

4.5 ZGC (-XX:+UseZGC)

Ultra-low latency collector. Pauses are sub-millisecond regardless of heap size.

ZGC Key Properties:

  • Pauses are O(1) — do NOT scale with heap or live set size

  • Supports multi-terabyte heaps

  • Concurrent relocation (compaction while app runs)

  • Uses colored pointers and load barriers

  • Generational ZGC (Java 21+) adds generations for better throughput

4.6 Collector Comparison

| Collector | Pauses | Throughput | Heap Size | Best For |
| --- | --- | --- | --- | --- |
| Serial | Long, single-thread | Low | Small | Embedded, client |
| Parallel | Medium, multi-thread | Highest | Medium-Large | Batch, throughput |
| G1 | Predictable target | Good | Large | General purpose |
| ZGC | Sub-ms | Good | Any (TB scale) | Latency-critical |
| Shenandoah | Sub-ms | Good | Any | Latency-critical (Red Hat) |

5. Safepoints — How the JVM Pauses Threads

GC cannot pause threads at arbitrary points. Threads must reach a safepoint first.

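[!WARNING] Time-To-Safepoint (TTSP) can be a hidden latency source. A counted loop compiled without a safepoint poll (e.g., for(int i=0; i<1_000_000; i++)) can delay the start of a GC pause, because every other thread has to wait until that loop exits. -XX:+UseCountedLoopSafepoints makes the JIT keep safepoint polls in counted loops. It is enabled by default with G1 since JDK 10 (together with loop strip mining), while other collectors such as Parallel may still need it set explicitly.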


6. Reference Types and GC

Java provides four reference strengths that interact with GC:

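import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Strong - default; the object stays alive while this reference is reachable
Object strong = new Object();

// Soft - cleared when memory is low (always before an OutOfMemoryError)
SoftReference<byte[]> cache = new SoftReference<>(new byte[1024*1024]);

// Weak - cleared at the next GC once no strong or soft references remain
WeakReference<Object> weak = new WeakReference<>(new Object());
// WeakHashMap uses this for auto-expiring entries

// Phantom - for post-mortem cleanup (replaces finalize()); get() always returns null
ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
Object obj = new Object();
PhantomReference<Object> phantom = new PhantomReference<>(obj, referenceQueue);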
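
In application code, phantom references are usually consumed through java.lang.ref.Cleaner (Java 9+) rather than by polling a ReferenceQueue directly. The following is a minimal sketch of that pattern. `Resource` and the `released` flag are hypothetical names for this example, and the cleanup action deliberately captures only the flag, never the `Resource` itself, since capturing it would keep the object reachable forever.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class CleanerDemo {
    static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical resource that must release something when it is no longer used.
    static final class Resource implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;

        Resource(AtomicBoolean released) {
            // The action runs at most once: on close(), or after the Resource
            // becomes phantom reachable if close() was never called.
            this.cleanable = CLEANER.register(this, () -> released.set(true));
        }

        @Override
        public void close() {
            cleanable.clean(); // deterministic path, does not wait for the GC
        }
    }

    static boolean demo() {
        AtomicBoolean released = new AtomicBoolean(false);
        try (Resource r = new Resource(released)) {
            // use the resource here
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("released: " + demo()); // released: true
    }
}
```

The GC-triggered path is only a safety net, because there is no guarantee about when (or whether) a collection will run. Explicit close() remains the primary cleanup mechanism.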

GC Logging and Analysis

Enable GC Logging

# Java 9+ (Unified Logging)
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar

# Key log tags
# gc           - basic GC events
# gc+heap      - heap before/after
# gc+phases    - GC phase timings
# gc+age       - tenuring age distribution
# gc+promotion - promotion details

Key Metrics to Monitor
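
Besides log files, basic GC metrics can be read in-process through the standard java.lang.management API. The sketch below sums collection time across all collectors and relates it to JVM uptime. The class and method names are made up for this example, and the resulting percentage is only a rough indicator: for concurrent collectors the reported time may include work that ran alongside the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsDemo {
    // Approximate share of JVM uptime spent in GC, summed over all collectors.
    static double gcOverheadPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis == 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC overhead: %.2f%%%n", gcOverheadPercent());
    }
}
```

The bean names depend on the collector in use, so the output differs between G1, Parallel, and ZGC. Sampling these counters periodically and looking at the deltas is more informative than a single reading.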


Common GC Problems and Solutions

Memory Leak Pattern

Common leak sources:

  • Static collections that grow unbounded

  • Listener/callback registrations never removed

  • ThreadLocal variables not cleared

  • ClassLoader leaks (hot redeployment)

  • Unclosed resources holding references
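
The first item on this list, an unbounded static collection, is sketched below next to one possible fix. The names `LEAKY_CACHE` and `boundedCache` are hypothetical, and an LRU bound built on LinkedHashMap is just one option among several (weak or soft references and explicit eviction are others).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeakDemo {
    // Leak: a static list is a GC root, so every entry stays strongly reachable.
    static final List<byte[]> LEAKY_CACHE = new ArrayList<>();

    // One possible fix: an access-ordered map that evicts its eldest entry.
    static <K, V> Map<K, V> boundedCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns {entries retained by the leaky list, entries retained by the bounded map}.
    static int[] demo() {
        Map<Integer, byte[]> bounded = boundedCache(100);
        for (int i = 0; i < 1000; i++) {
            LEAKY_CACHE.add(new byte[16]); // grows on every call, never shrinks
            bounded.put(i, new byte[16]);  // evicts once it holds more than 100 entries
        }
        return new int[] {LEAKY_CACHE.size(), bounded.size()};
    }

    public static void main(String[] args) {
        int[] sizes = demo();
        System.out.println("leaky=" + sizes[0] + ", bounded=" + sizes[1]);
    }
}
```

In a heap dump, the leaky variant shows up as a static field dominating a steadily growing retained size, while the bounded map levels off at its limit.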

Premature Promotion

GC Thrashing

When the JVM spends more time in GC than running the application:

// JVM throws this when both conditions hold:
// - More than 98% of time is spent in GC
// - Less than 2% of heap is recovered
java.lang.OutOfMemoryError: GC overhead limit exceeded

FAQs — Garbage Collection

Q1: How does the JVM determine which objects are garbage?

Reachability analysis from GC Roots (stack locals, static fields, active threads, JNI refs). Any object not reachable from a root is garbage. Java does NOT use reference counting, so circular references are handled correctly.

Q2: What is the difference between Minor GC, Major GC, and Full GC?

Minor GC collects Young Gen only (Eden + Survivors). Major GC collects Old Gen. Full GC collects the entire heap plus Metaspace. Minor GCs are fast (most objects die young). Full GC is the most expensive and should be minimized.

Q3: Explain G1 GC in depth

G1 divides the heap into equal-sized regions (1-32MB). Any region can be Eden, Survivor, Old, or Humongous. G1 runs Young GCs (evacuate Eden/Survivor regions) and triggers Concurrent Marking when heap reaches IHOP. After marking, it runs Mixed GCs that collect Young AND the most garbage-filled Old regions. G1 targets a configurable max pause time (default 200ms) by limiting how many regions it collects per pause.

Q4: What is a safepoint and why does it matter?

A safepoint is a point in executing code where a thread can be safely paused for GC. The JVM cannot stop threads at arbitrary points because object references might be in an inconsistent state. Compiled code polls a safepoint flag at method returns and loop back-edges, and the interpreter can poll between bytecodes. Time-to-safepoint can be a hidden latency source if long-running loops lack safepoint polls.

Q5: How would you diagnose a memory leak in production?

  1. Enable GC logging (-Xlog:gc*) and watch if Old Gen baseline keeps rising after Full GCs.

  2. Take heap dumps (jmap -dump:live,format=b,file=heap.hprof <pid>).

  3. Analyze with Eclipse MAT or VisualVM — look at dominator tree and histogram.

  4. Check retained size by class to find what is holding memory.

  5. Look for GC root paths to leaked objects — the path shows what is preventing collection.

  6. Common culprits: static maps, unclosed resources, ThreadLocal, listener leaks.

Q6: When would you choose ZGC over G1?

Choose ZGC when sub-millisecond pause times are critical regardless of heap size (financial trading, real-time systems). G1 is better for general-purpose workloads where pauses around its 200ms default target are acceptable. ZGC has slightly lower throughput than G1 due to load barrier overhead. Generational ZGC (Java 21+) closes the throughput gap significantly.

Q7: What is the write barrier in G1 and why is it needed?

G1 uses two write barriers: a pre-write barrier for SATB marking (captures the old reference before it is overwritten) and a post-write barrier for remembered sets (tracks cross-region references). Without the pre-write barrier, concurrent marking could miss objects that are still live. Without the post-write barrier, G1 would need to scan the entire heap to find inter-region references during a partial collection.


GC Selection Decision Tree