Lock-Free Programming - Java (Part 8)
Lock-Free Patterns in Java's Concurrent Collections
You've learned how to build lock-free data structures from scratch. Now let's see how the JDK's best engineers applied these same techniques to build the concurrent collections you use every day.
This article is a guided tour through the internals of ConcurrentLinkedQueue, ConcurrentHashMap, and LongAdder — three crown jewels of java.util.concurrent.
ConcurrentLinkedQueue: Michael-Scott Under the Hood
Java's ConcurrentLinkedQueue is directly based on the Michael-Scott Queue we built in Article 6, with several practical optimizations.
The Core Similarity
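As a refresher, here is a minimal Michael-Scott queue in the spirit of Article 6 (the `MsQueue` and `Node` names are illustrative, not the JDK's). The enqueue loop is the part ConcurrentLinkedQueue keeps almost verbatim:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Michael-Scott queue sketch — illustrative, not the OpenJDK source.
class MsQueue<E> {
    static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    MsQueue() {
        Node<E> dummy = new Node<>(null);   // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(E item) {
        Node<E> node = new Node<>(item);
        while (true) {
            Node<E> last = tail.get();
            Node<E> next = last.next.get();
            if (next == null) {
                // tail really is last: link the new node
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node); // swing tail; may fail harmlessly
                    return;
                }
            } else {
                tail.compareAndSet(last, next);     // help a lagging tail forward
            }
        }
    }

    E dequeue() {
        while (true) {
            Node<E> first = head.get();
            Node<E> next = first.next.get();
            if (next == null) return null;          // empty
            if (head.compareAndSet(first, next)) {
                return next.item;                   // next becomes the new sentinel
            }
        }
    }
}
```

The `tail.compareAndSet` after linking is allowed to fail, because any other thread will help advance it. That tolerance for a stale tail is exactly the hook the JDK's lazy-tail optimization builds on.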
Optimization: Lazy Tail Updates
Our queue from Article 6 advances tail on every enqueue. ConcurrentLinkedQueue only advances tail once it has fallen two or more nodes behind the true last node. This roughly halves the number of CAS operations on tail.
Optimization: Self-Linking
When a node is dequeued, its next pointer is set to point to itself. This helps other threads detect that they're looking at a stale node and need to re-read head.
```java
// Simplified from OpenJDK source
private void updateHead(Node<E> h, Node<E> p) {
    if (h != p && head.compareAndSet(h, p)) {
        // Point old head's next to itself — a "tombstone"
        h.next.lazySet(h); // Self-link
    }
}
```
Using ConcurrentLinkedQueue
```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class EventBus {
    private final ConcurrentLinkedQueue<Event> events =
            new ConcurrentLinkedQueue<>();

    // Producers — any thread, any time
    public void publish(Event event) {
        events.offer(event); // Lock-free enqueue
    }

    // Consumer — processes events
    public void processAll() {
        Event event;
        while ((event = events.poll()) != null) { // Lock-free dequeue
            handle(event);
        }
    }

    private void handle(Event event) { /* ... */ }
}
```
ConcurrentHashMap: A Hybrid Masterpiece
ConcurrentHashMap is not purely lock-free — it uses a hybrid approach. Some operations are lock-free (reads, size estimation), while others use fine-grained locking (inserts into a bucket, tree rebalancing).
Architecture Overview
At a high level, the map is a volatile Node<K,V>[] table. Each bucket holds a linked chain of nodes (converted to a balanced tree when a chain grows long), inserts into an empty bucket use CAS, and all other bucket mutations synchronize on that bucket's head node.
Lock-Free Reads with Volatile
Reads in ConcurrentHashMap are entirely lock-free. The table array is volatile, and each node's val field is also volatile:
```java
// Simplified from OpenJDK ConcurrentHashMap.get()
public V get(Object key) {
    int hash = spread(key.hashCode());
    Node<K,V>[] tab = table;                 // Volatile read — sees latest table
    if (tab != null) {
        int index = (tab.length - 1) & hash;
        Node<K,V> first = tabAt(tab, index); // Volatile read of bucket head
        if (first != null) {
            if (first.hash == hash && first.key.equals(key)) {
                return first.val;            // Found at head — no traversal needed
            }
            // Traverse the chain...
            Node<K,V> node = first.next;
            while (node != null) {
                if (node.hash == hash && node.key.equals(key)) {
                    return node.val;
                }
                node = node.next;
            }
        }
    }
    return null;
}
```
No locks anywhere. The volatile reads provide the necessary visibility: a get() observes values published by writes that completed before it.
CAS-Based Insert (Empty Bucket)
When inserting into an empty bucket, CAS is used:
```java
// Simplified from OpenJDK ConcurrentHashMap.putVal()
if (tabAt(tab, index) == null) {
    // Empty bucket — try CAS to insert first node
    if (casTabAt(tab, index, null, new Node<>(hash, key, value))) {
        break; // Success! No locking needed
    }
}
```
Synchronized Insert (Non-Empty Bucket)
When a bucket already has nodes, ConcurrentHashMap synchronizes on the head node of that specific bucket:
```java
// Simplified from OpenJDK
synchronized (headNode) {
    // Only this bucket is locked — all other buckets are accessible!
    // Traverse chain, insert or update
}
```
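Putting the two insert paths together, here is an illustrative map that follows the same strategy: CAS for an empty bucket, a per-bucket lock otherwise. The `TinyMap` class models the layering only, not ConcurrentHashMap's real internals (no resizing, no tree bins, no hash spreading):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Layered insert strategy sketch: CAS first, per-bucket lock as fallback.
class TinyMap<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K key, V val, Node<K, V> next) { this.key = key; this.val = val; this.next = next; }
    }

    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    private int indexFor(K key) {
        return (table.length() - 1) & (key.hashCode() & 0x7fffffff);
    }

    public void put(K key, V val) {
        int i = indexFor(key);
        while (true) {
            Node<K, V> first = table.get(i);
            if (first == null) {
                // Fast path: empty bucket — one CAS, no lock
                if (table.compareAndSet(i, null, new Node<>(key, val, null))) return;
            } else {
                // Slow path: lock only this bucket's head node
                synchronized (first) {
                    if (table.get(i) == first) {  // head still current?
                        for (Node<K, V> n = first; ; n = n.next) {
                            if (n.key.equals(key)) { n.val = val; return; }
                            if (n.next == null) { n.next = new Node<>(key, val, null); return; }
                        }
                    }
                }
                // Head changed under us — retry the whole flow
            }
        }
    }

    public V get(K key) {
        // Lock-free read: volatile array slot + volatile next/val fields
        for (Node<K, V> n = table.get(indexFor(key)); n != null; n = n.next) {
            if (n.key.equals(key)) return n.val;
        }
        return null;
    }
}
```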
Lock-Free Size Estimation
Counting elements in a concurrent map is hard. ConcurrentHashMap uses LongAdder-style cells for its size counter:
```java
// Each thread increments its own counter cell
// size() sums all cells — approximate but fast
map.size();         // Returns an int, capped at Integer.MAX_VALUE
map.mappingCount(); // Returns a long (use for maps that may exceed Integer.MAX_VALUE)
```
The Hybrid Summary

| Operation | Mechanism |
|---|---|
| get() | Lock-free: volatile reads |
| put() into an empty bucket | Lock-free: CAS |
| put() into an occupied bucket | synchronized on the bucket's head node |
| size() / mappingCount() | LongAdder-style counter cells |
| Resize | Cooperative: threads help transfer buckets |
LongAdder: Distributed Contention
We introduced LongAdder in Article 4. Let's dive deeper into how it works.
The Striped Design
LongAdder extends the Striped64 class, which maintains a base value and an array of Cell objects:
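Conceptually, the layout looks like the sketch below. This is a simplified stand-in, not the real Striped64, which sizes and grows its cell array dynamically and pads each cell:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Structural sketch of the Striped64 idea (names illustrative, not the JDK's):
// a shared base counter plus an array of cells for contended updates.
class StripedCounter {
    final AtomicLong base = new AtomicLong();          // used while uncontended
    final AtomicLongArray cells = new AtomicLongArray( // used once CAS on base fails
            Runtime.getRuntime().availableProcessors());

    // Like LongAdder.sum(): base plus every cell, read without locking
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }
}
```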
The @Contended Annotation
Each Cell is annotated with @jdk.internal.vm.annotation.Contended, which pads the object with ~128 bytes of empty space. This prevents false sharing — where two unrelated cells happen to land on the same cache line.
The Increment Flow
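A simplified version of that flow, assuming a fixed-size cell array and a ThreadLocalRandom-based probe (the real Striped64 rehashes its per-thread probe and grows the cell array under contention):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of the LongAdder increment flow — simplified, not the real Striped64.
class FlowCounter {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(8); // power of two

    void increment() {
        long b = base.get();
        // 1. Uncontended fast path: a single CAS on base
        if (!base.compareAndSet(b, b + 1)) {
            // 2. Contended: fall back to a cell chosen by a per-thread probe
            int i = (cells.length() - 1) & ThreadLocalRandom.current().nextInt();
            cells.getAndIncrement(i); // retries internally, on a private cache line
        }
    }

    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }
}
```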
Performance Impact
```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Under heavy contention (32 threads):
// AtomicLong.incrementAndGet(): ~800 ns/op (cache line bouncing)
// LongAdder.increment():         ~30 ns/op (distributed across cells) 🚀
```
The trade-off: sum() is not an atomic snapshot; increments can land while the cells are being summed. For an exact count you'd need external synchronization.
Patterns Across JDK Concurrent Classes
After examining these implementations, common patterns emerge:
Pattern 1: CAS for Simple Writes, Volatile for Reads
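A sketch of the pattern using AtomicReference: plain volatile reads on the hot path, a CAS only when mutating. The `ConfigHolder` class here is a made-up example, not from the JDK:

```java
import java.util.concurrent.atomic.AtomicReference;

// Volatile read for the common case, CAS for the rare write.
class ConfigHolder {
    private final AtomicReference<String> config = new AtomicReference<>("v1");

    String current() {
        return config.get();                          // volatile read: cheap, lock-free
    }

    boolean upgrade(String expected, String next) {
        return config.compareAndSet(expected, next);  // CAS only on mutation
    }
}
```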
Pattern 2: Helping
When one thread finds another thread's operation half-completed, it helps finish it:
- ConcurrentLinkedQueue: advance a lagging tail pointer
- ConcurrentHashMap: help with table resizing (cooperative transfer)
Pattern 3: Layered Strategy
Use the cheapest operation first and fall back to more expensive ones. ConcurrentHashMap's putVal, for example, first tries a single CAS on an empty bucket and only locks the bucket head when the bucket is already occupied.
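A classic example of layering is StampedLock's optimistic read: start with the cheapest option (no lock at all), validate, and fall back to a real read lock only if a writer intervened. A sketch, adapted from the pattern shown in the StampedLock javadoc:

```java
import java.util.concurrent.locks.StampedLock;

// Layered reads: free optimistic attempt first, read lock only on conflict.
class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // cheapest: no lock acquired at all
        double cx = x, cy = y;
        if (!lock.validate(stamp)) {           // a writer slipped in — retry under lock
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }
}
```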
Pattern 4: Contention Spreading
When CAS contention is high, spread the work:
| Class | Spreading Strategy |
|---|---|
| LongAdder | Separate cells per thread hash |
| ConcurrentHashMap | Separate lock per bucket |
| StampedLock | Optimistic reads avoid the lock entirely |
Using These Classes Effectively
```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class ConcurrencyCheatSheet {
    // ✅ Use ConcurrentHashMap for shared key-value data
    private final ConcurrentHashMap<String, UserSession> sessions =
            new ConcurrentHashMap<>();

    // ✅ Use ConcurrentLinkedQueue for producer-consumer
    private final ConcurrentLinkedQueue<Task> taskQueue =
            new ConcurrentLinkedQueue<>();

    // ✅ Use LongAdder for high-throughput counters
    private final LongAdder requestCount = new LongAdder();

    // ✅ Use AtomicReference for lock-free config swaps
    private final AtomicReference<Config> config =
            new AtomicReference<>(Config.defaults());

    // ❌ DON'T: Use Collections.synchronizedMap — wraps with one big lock
    // ❌ DON'T: Use Hashtable — ancient, one big lock
    // ❌ DON'T: Use AtomicLong when LongAdder fits — more contention
}
```
Key Takeaways
- ConcurrentLinkedQueue is a Michael-Scott Queue with lazy tail updates and self-linking optimizations
- ConcurrentHashMap is hybrid: lock-free reads, CAS for empty buckets, per-bucket locks for insertions
- LongAdder distributes contention across @Contended-padded cells, about 25x faster than AtomicLong under contention
- Common patterns: CAS for simple mutations, volatile for reads, helping for progress, contention spreading for throughput
- Use the JDK classes: they've been tested by millions of applications and optimized by world-class engineers