Java Memory Deep Dive
Part 3: Garbage Collection
Core GC Concepts
What is Garbage?
An object is garbage when it is unreachable from any GC Root. HotSpot's collectors trace reachability from the roots and never use reference counting (unlike CPython or Swift).
[!IMPORTANT] Interview Key: Objects D, E, F form a reference cycle but are STILL garbage because no GC root can reach them. Java GC handles cycles correctly — unlike naive reference counting.
Reachability Analysis (Mark Phase)
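A minimal sketch of the mark phase on a toy object graph. It assumes objects are modelled as string names and references as adjacency lists; both are illustrative and not how HotSpot represents objects. The D, E, F cycle is never reached from the root, so it lands in the garbage set even though each member is still referenced by another.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Mark phase: collect every object reachable from the roots by following references.
    static Set<String> mark(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            String obj = worklist.pop();
            if (marked.add(obj)) {                       // first visit only
                worklist.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    // Hypothetical graph: A -> B, A -> C are reachable; D -> E -> F -> D is a detached cycle.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "A", List.of("B", "C"),
            "B", List.of(),
            "C", List.of(),
            "D", List.of("E"),
            "E", List.of("F"),
            "F", List.of("D"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = sampleGraph();
        Set<String> live = mark(graph, List.of("A"));    // "A" plays the role of a GC root target
        Set<String> garbage = new TreeSet<>(graph.keySet());
        garbage.removeAll(live);
        System.out.println("live=" + new TreeSet<>(live));   // live=[A, B, C]
        System.out.println("garbage=" + garbage);            // garbage=[D, E, F]
    }
}
```

A reference-counting scheme would see a count of 1 on each of D, E and F and keep them; tracing from the roots does not care about counts.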
The Three Fundamental GC Operations
2. GC Algorithm Strategies
2.1 Mark-Sweep
2.2 Mark-Sweep-Compact
2.3 Copying Collector
2.4 Algorithm Comparison
| Algorithm | Fragmentation | Speed | Memory Overhead | Used In |
|---|---|---|---|---|
| Mark-Sweep | Yes | Fast sweep | None | CMS (old gen) |
| Mark-Compact | No | Slower (moving) | None | Serial, Parallel (old gen) |
| Copying | No | Fastest for low survival | 2x space (from/to) | All (young gen) |
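The fragmentation column can be illustrated with a toy heap of fixed-size slots. This is a sketch under simplifying assumptions (one object per slot, liveness already known from a mark phase); the class and method names are hypothetical.

```java
class CompactionDemo {
    // Length of the longest run of free slots: the largest allocation that still fits.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Mark-sweep: dead slots are freed in place, live slots do not move.
    static boolean[] sweep(boolean[] occupied, boolean[] live) {
        boolean[] result = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            result[i] = occupied[i] && live[i];
        }
        return result;
    }

    // Mark-compact: live slots slide to the start of the heap, leaving one free block.
    static boolean[] compact(boolean[] swept) {
        boolean[] result = new boolean[swept.length];
        int next = 0;
        for (boolean slot : swept) {
            if (slot) {
                result[next++] = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] occupied = {true, true, true, true, true, true, true, true};
        boolean[] live     = {true, false, true, false, true, false, true, false};
        boolean[] swept = sweep(occupied, live);
        System.out.println(largestFreeRun(swept));            // 1
        System.out.println(largestFreeRun(compact(swept)));   // 4
    }
}
```

Both variants free four slots, but only the compacted heap can satisfy a four-slot allocation. A copying collector gets the same contiguous result by evacuating survivors into a separate space, which is where its extra memory cost comes from.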
3. GC Types: Minor, Major, Full
3.1 Minor GC Walkthrough
3.2 Card Table and Remembered Sets
Problem: During Minor GC, we only scan Young Gen. But Old Gen objects might reference Young Gen objects. How do we find those references without scanning the entire Old Gen?
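[!TIP] Write Barrier: A small piece of code injected by the JIT at every reference store. When you write oldObj.field = youngObj, the barrier marks the card (the small fixed-size chunk of Old Gen) that contains oldObj as dirty, so a Minor GC only has to scan dirty cards for old-to-young references. This is the cost of generational GC: every reference write has a small overhead.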
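A toy card table, to make the write barrier concrete. The sketch assumes 512-byte cards (HotSpot's usual default) and represents Old Gen addresses as plain integer offsets; the class, its methods and the boolean encoding are illustrative, not HotSpot's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class CardTableDemo {
    static final int CARD_SIZE = 512;   // bytes of Old Gen covered by one card
    private final boolean[] dirty;      // one flag per card (HotSpot uses a byte per card)

    CardTableDemo(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Write barrier: called for a reference store into the Old Gen object at holderOffset.
    void onReferenceStore(int holderOffset) {
        dirty[holderOffset / CARD_SIZE] = true;
    }

    // Minor GC: return the cards to scan for old-to-young references, then clean them.
    List<Integer> takeDirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                result.add(i);
                dirty[i] = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);  // 8 cards
        table.onReferenceStore(1030);                   // lands in card 2
        table.onReferenceStore(1500);                   // same card, still one entry
        table.onReferenceStore(4000);                   // lands in card 7
        System.out.println(table.takeDirtyCards());     // [2, 7]
        System.out.println(table.takeDirtyCards());     // []
    }
}
```

With this layout a Minor GC scans 2 of 8 cards instead of the whole Old Gen; the price is one extra store on every reference write.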
4. JVM Garbage Collectors
4.1 Collector Evolution Timeline
4.2 Serial GC (-XX:+UseSerialGC)
Use case: Client apps, small heaps under 100MB, single-CPU machines.
4.3 Parallel GC (-XX:+UseParallelGC)
Default in Java 8. Uses multiple GC threads for higher throughput.
| Flag | Purpose |
|---|---|
| -XX:ParallelGCThreads=N | Number of GC threads |
| -XX:MaxGCPauseMillis=200 | Target max pause |
| -XX:GCTimeRatio=99 | 1% time in GC |
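The "1% time in GC" entry follows from how GCTimeRatio is defined: a value of N asks for a GC-to-application time ratio of 1:N, so the targeted GC share of total time is 1 / (1 + N). A small sketch of that arithmetic (the class name is hypothetical):

```java
import java.util.Locale;

class GcTimeRatioDemo {
    // Target fraction of total time spent in GC for -XX:GCTimeRatio=N.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {99, 19, 9}) {
            System.out.println(String.format(Locale.ROOT,
                "GCTimeRatio=%d -> %.1f%% of time in GC", ratio, 100 * gcTimeFraction(ratio)));
        }
    }
}
```

So 99 targets 1%, 19 targets 5%, and 9 targets 10%. This is a goal the collector sizes the heap toward, not a guarantee.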
4.4 G1 GC (-XX:+UseG1GC)
Default since Java 9. The most important collector to understand for interviews.
G1 Region-Based Heap
[!NOTE] G1 divides the heap into equal-sized regions (1MB–32MB, auto-calculated). Any region can serve any role. This flexibility allows G1 to collect the most garbage-filled regions first — hence "Garbage First."
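A rough sketch of how the auto-calculated region size behaves. It assumes the commonly documented ergonomic goal of about 2048 regions and rounds down to a power of two; the exact inputs and rounding differ between JDK versions, so treat the method as an approximation rather than HotSpot's code.

```java
class G1RegionSizeDemo {
    static final long MB = 1024L * 1024;
    static final long TARGET_REGION_COUNT = 2048;   // assumption: G1's ergonomic target

    // Approximate ergonomic region size for a given heap size, clamped to 1-32 MB.
    static long estimateRegionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MB);
        long powerOfTwo = Long.highestOneBit(raw);   // round down to a power of two
        return Math.min(powerOfTwo, 32 * MB);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 4096, 102_400};
        for (long heapMb : heapsInMb) {
            long region = estimateRegionSize(heapMb * MB);
            System.out.println(heapMb + " MB heap -> " + (region / MB) + " MB regions");
        }
    }
}
```

Under these assumptions a 512 MB heap gets 1 MB regions, a 4 GB heap gets 2 MB regions, and a 100 GB heap hits the 32 MB cap. Setting -XX:G1HeapRegionSize overrides the calculation.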
G1 Collection Phases
G1 Key Concepts
IHOP (Initiating Heap Occupancy Percent): When Old Gen reaches this threshold, concurrent marking starts. Default: adaptive (starts ~45%).
Evacuation: G1 does not sweep in place — it copies live objects from collected regions to free regions.
Mixed GC: Collects both Young AND selected Old regions in a single STW pause.
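SATB (Snapshot-At-The-Beginning): G1 takes a logical snapshot of the object graph at the start of concurrent marking. When a reference is overwritten during marking, a pre-write barrier records the old value, so every object that was live at the snapshot still gets marked. Objects allocated during marking are treated as live.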
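The SATB idea can be sketched with a toy marker. The model below is deliberately tiny (each node has one outgoing reference, marking is single-threaded) and every name in it is hypothetical; it only shows why logging the overwritten value keeps the snapshot intact.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

class SatbDemo {
    static final class Node {
        final String name;
        Node ref;                       // a single outgoing reference keeps the sketch small
        Node(String name) { this.name = name; }
    }

    private boolean markingActive;
    private final Deque<Node> satbQueue = new ArrayDeque<>();
    private final Set<String> marked = new TreeSet<>();

    // Pre-write barrier: while marking runs, log the reference that is about to be overwritten.
    void store(Node holder, Node newValue) {
        if (markingActive && holder.ref != null) {
            satbQueue.add(holder.ref);
        }
        holder.ref = newValue;
    }

    // Follow the chain of references from start, stopping at already-marked nodes.
    private void markFrom(Node start) {
        for (Node n = start; n != null && marked.add(n.name); n = n.ref) {
            // marking happens in the loop condition
        }
    }

    static Set<String> run() {
        SatbDemo gc = new SatbDemo();
        Node a = new Node("a");
        Node b = new Node("b");
        a.ref = b;                      // graph at snapshot time: root -> a -> b

        gc.markingActive = true;        // concurrent marking starts (snapshot taken)
        gc.store(a, null);              // application unlinks b before the marker reaches a
        gc.markFrom(a);                 // the marker sees only a
        while (!gc.satbQueue.isEmpty()) {
            gc.markFrom(gc.satbQueue.poll());   // logged old values are marked too
        }
        return gc.marked;
    }

    public static void main(String[] args) {
        System.out.println(run());      // [a, b]
    }
}
```

Here b is kept alive for this cycle even though it became unreachable during marking. That "floating garbage" is the price of never missing an object that the application might have stashed elsewhere before the overwrite.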
G1 Tuning Flags
| Flag | Purpose | Default |
|---|---|---|
| -XX:MaxGCPauseMillis | Target pause time | 200ms |
| -XX:G1HeapRegionSize | Region size | Auto (1-32MB) |
| -XX:InitiatingHeapOccupancyPercent | IHOP trigger | 45% (adaptive) |
| -XX:G1MixedGCCountTarget | Mixed GCs per cycle | 8 |
| -XX:G1HeapWastePercent | Stop mixed if waste below | 5% |
4.5 ZGC (-XX:+UseZGC)
Ultra-low latency collector. Pauses are sub-millisecond regardless of heap size.
ZGC Key Properties:
Pauses are O(1) — do NOT scale with heap or live set size
Supports multi-terabyte heaps
Concurrent relocation (compaction while app runs)
Uses colored pointers and load barriers
Generational ZGC (Java 21+) adds generations for better throughput
4.6 Collector Comparison
| Collector | Pauses | Throughput | Heap Size | Best For |
|---|---|---|---|---|
| Serial | Long, single-thread | Low | Small | Embedded, client |
| Parallel | Medium, multi-thread | Highest | Medium-Large | Batch, throughput |
| G1 | Predictable target | Good | Large | General purpose |
| ZGC | Sub-ms | Good | Any (TB scale) | Latency-critical |
| Shenandoah | Sub-ms | Good | Any | Latency-critical (Red Hat) |
5. Safepoints — How the JVM Pauses Threads
GC cannot pause threads at arbitrary points. Threads must reach a safepoint first.
[!WARNING] Time-To-Safepoint (TTSP) can be a hidden latency source. A counted loop without a safepoint poll (e.g., for (int i = 0; i < 1_000_000; i++)) can delay GC start. Use -XX:+UseCountedLoopSafepoints to insert safepoint polls in counted loops; with G1 this has been on by default since JDK 10, when loop strip mining was introduced.
6. Reference Types and GC
Java provides four reference strengths that interact with GC:
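import java.lang.ref.*;

// Strong - default; the object stays alive while this reference is reachable
Object strong = new Object();
// Soft - cleared when memory is low
SoftReference<byte[]> cache = new SoftReference<>(new byte[1024 * 1024]);
// Weak - cleared at the next GC once no strong reference remains, regardless of memory
WeakReference<Object> weak = new WeakReference<>(new Object());
// WeakHashMap uses weak keys for auto-expiring entries
// Phantom - for post-mortem cleanup (replaces finalize()); get() always returns null
ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
Object obj = new Object();
PhantomReference<Object> phantom = new PhantomReference<>(obj, referenceQueue);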
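A small runnable sketch of how these reference objects behave from the application's side. Because real clearing depends on when the GC runs, the phantom case calls enqueue() by hand as a stand-in for what the collector does after the referent becomes unreachable; the class and method names are hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceDemo {
    // A weak reference is not cleared while a strong reference to the target exists.
    static boolean weakWhileStronglyHeld() {
        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        return weak.get() == target;
    }

    // Returns true if a phantom reference enqueued by hand comes back from its queue.
    static boolean phantomRoundTrip() {
        Object resource = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);
        boolean hidden = phantom.get() == null;   // phantom referents are never handed out
        phantom.enqueue();                        // stand-in for the GC's enqueue step
        Reference<?> ready = queue.poll();        // cleanup code would poll() or remove() here
        return hidden && ready == phantom;
    }

    public static void main(String[] args) {
        System.out.println(weakWhileStronglyHeld());   // true
        System.out.println(phantomRoundTrip());        // true
    }
}
```

In production code the queue is usually drained by a dedicated thread, or the whole pattern is delegated to java.lang.ref.Cleaner (Java 9+).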
GC Logging and Analysis
Enable GC Logging
# Java 9+ (Unified Logging)
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar
# Key log tags
# gc - basic GC events
# gc+heap - heap before/after
# gc+phases - GC phase timings
# gc+age - tenuring age distribution
# gc+promotion - promotion details
Key Metrics to Monitor
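Besides reading the log, collection counts and accumulated GC time can be sampled in-process through the standard java.lang.management API. A minimal sketch follows; the class name is hypothetical, and for concurrent collectors the reported time may include concurrent work rather than only pauses, so treat the percentage as a rough indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcMetricsDemo {
    // Share of JVM uptime attributed to collections, as reported by the GC MXBeans.
    static double gcTimePercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(gc.getCollectionTime(), 0);   // -1 means "undefined"
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis == 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("GC time share: " + gcTimePercent() + "%");
    }
}
```

Sampling this periodically gives a throughput trend; pause-time percentiles still need the GC log.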
Common GC Problems and Solutions
Memory Leak Pattern
Common leak sources:
Static collections that grow unbounded
Listener/callback registrations never removed
ThreadLocal variables not cleared
ClassLoader leaks (hot redeployment)
Unclosed resources holding references
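For the first item in the list, one common remedy is to put a bound on the collection. The sketch below uses an access-ordered LinkedHashMap as a small LRU cache; the helper name and the limit of 100 are arbitrary choices for illustration, and the map is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCacheDemo {
    // Leak-prone alternative: a static HashMap that only ever receives put() calls.
    // This access-ordered LinkedHashMap evicts its least recently used entry once full.
    static <K, V> Map<K, V> boundedLru(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = boundedLru(100);
        for (int i = 0; i < 1_000; i++) {
            cache.put("key-" + i, new byte[1024]);
        }
        System.out.println(cache.size());                   // 100
        System.out.println(cache.containsKey("key-0"));     // false: evicted
        System.out.println(cache.containsKey("key-999"));   // true: most recent
    }
}
```

Evicted entries lose their last strong reference from the map and become collectable, so the Old Gen baseline stops climbing with the number of distinct keys.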
Premature Promotion
GC Thrashing
When the JVM spends more time in GC than running the application:
// JVM throws this when:
// - More than 98% of time is spent in GC
// - Less than 2% of heap is recovered
java.lang.OutOfMemoryError: GC overhead limit exceeded
FAQs — Garbage Collection
Q1: How does the JVM determine which objects are garbage?
Reachability analysis from GC Roots (stack locals, static fields, active threads, JNI refs). Any object not reachable from a root is garbage. Java does NOT use reference counting, so circular references are handled correctly.
Q2: What is the difference between Minor GC, Major GC, and Full GC?
Minor GC collects Young Gen only (Eden + Survivors). Major GC collects Old Gen. Full GC collects the entire heap plus Metaspace. Minor GCs are fast (most objects die young). Full GC is the most expensive and should be minimized.
Q3: Explain G1 GC in depth
G1 divides the heap into equal-sized regions (1-32MB). Any region can be Eden, Survivor, Old, or Humongous. G1 runs Young GCs (evacuate Eden/Survivor regions) and triggers Concurrent Marking when heap reaches IHOP. After marking, it runs Mixed GCs that collect Young AND the most garbage-filled Old regions. G1 targets a configurable max pause time (default 200ms) by limiting how many regions it collects per pause.
Q4: What is a safepoint and why does it matter?
A safepoint is a point in executing code where a thread can be safely paused for GC. The JVM cannot stop threads at arbitrary points because object references might be in an inconsistent state. Threads check a safepoint flag at method returns, at loop back-edges (JIT-compiled counted loops may omit the check), and, in the interpreter, between bytecodes. Time-to-safepoint can be a hidden latency source if long-running loops lack safepoint polls.
Q5: How would you diagnose a memory leak in production?
Enable GC logging (-Xlog:gc*) and watch whether the Old Gen baseline keeps rising after Full GCs.
Take heap dumps (jmap -dump:live,format=b,file=heap.hprof <pid>).
Analyze with Eclipse MAT or VisualVM: look at the dominator tree and histogram.
Check retained size by class to find what is holding memory.
Look for GC root paths to leaked objects — the path shows what is preventing collection.
Common culprits: static maps, unclosed resources, ThreadLocal, listener leaks.
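The "baseline after GC" signal from the first step can also be read in-process. The sketch below uses MemoryPoolMXBean.getCollectionUsage(), which reports a pool's usage after its most recent collection; the class name is hypothetical, and pool names depend on the collector (for example "G1 Old Gen" or "PS Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

class PostGcUsageDemo {
    // Bytes still in use in each heap pool after its most recent collection.
    static Map<String, Long> usedAfterLastGc() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;                                   // skip Metaspace, code cache, etc.
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                result.put(pool.getName(), afterGc.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        usedAfterLastGc().forEach((pool, used) ->
            System.out.println(pool + ": " + (used / 1024) + " KB after last GC"));
    }
}
```

If the old-generation figure keeps growing across many samples, that is a cue to take the heap dump and look for the retaining path.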
Q6: When would you choose ZGC over G1?
ZGC when sub-millisecond pause times are critical regardless of heap size (financial trading, real-time systems). G1 is better for general-purpose workloads where 200ms pauses are acceptable. ZGC has slightly lower throughput than G1 due to load barrier overhead. Generational ZGC (Java 21+) closes the throughput gap significantly.
Q7: What is the write barrier in G1 and why is it needed?
G1 uses two write barriers: Pre-write barrier for SATB marking (captures old reference before overwrite) and post-write barrier for remembered sets (tracks cross-region references). Without these, G1 would need to scan the entire heap to find inter-region references during partial collection.
GC Selection Decision Tree
