Lock-Free Programming - Java (Part 8)
Lock-Free Patterns in Java's Concurrent Collections
You've learned how to build lock-free data structures from scratch. Now let's see how the JDK's best engineers applied these same techniques to build the concurrent collections you use every day.
This article is a guided tour through the internals of ConcurrentLinkedQueue, ConcurrentHashMap, and LongAdder — three crown jewels of java.util.concurrent.
ConcurrentLinkedQueue: Michael-Scott Under the Hood
Java's ConcurrentLinkedQueue is directly based on the Michael-Scott Queue we built in Article 6, with several practical optimizations.
The Core Similarity
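As a refresher, here is a minimal Michael-Scott queue in the spirit of Article 6 (the `MsQueue` and `Node` names are illustrative, not the JDK's). The enqueue loop is the part ConcurrentLinkedQueue keeps almost verbatim:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Michael-Scott queue sketch — illustrative, not the OpenJDK source.
class MsQueue<E> {
    static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    MsQueue() {
        Node<E> dummy = new Node<>(null);   // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(E item) {
        Node<E> node = new Node<>(item);
        while (true) {
            Node<E> last = tail.get();
            Node<E> next = last.next.get();
            if (next == null) {
                // tail really is last: link the new node
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node); // swing tail; may fail harmlessly
                    return;
                }
            } else {
                tail.compareAndSet(last, next);     // help a lagging tail forward
            }
        }
    }

    E dequeue() {
        while (true) {
            Node<E> first = head.get();
            Node<E> next = first.next.get();
            if (next == null) return null;          // empty
            if (head.compareAndSet(first, next)) {
                return next.item;                   // next becomes the new sentinel
            }
        }
    }
}
```

The `tail.compareAndSet` after linking is allowed to fail, because any other thread will help advance it. That tolerance for a stale tail is exactly the hook the JDK's lazy-tail optimization builds on.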
Optimization: Lazy Tail Updates
Our queue from Article 6 advances tail on every enqueue. ConcurrentLinkedQueue only advances tail once it has fallen two or more nodes behind the true last node. This roughly halves the number of CAS operations on tail.
Optimization: Self-Linking
When a node is dequeued, its next pointer is set to point to itself. This helps other threads detect that they're looking at a stale node and need to re-read head.
```java
// Simplified from OpenJDK source
private void updateHead(Node<E> h, Node<E> p) {
    if (h != p && head.compareAndSet(h, p)) {
        // Point old head's next to itself — a "tombstone"
        h.next.lazySet(h); // Self-link
    }
}
```
Using ConcurrentLinkedQueue
```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class EventBus {
    private final ConcurrentLinkedQueue<Event> events =
            new ConcurrentLinkedQueue<>();

    // Producers — any thread, any time
    public void publish(Event event) {
        events.offer(event); // Lock-free enqueue
    }

    // Consumer — processes events
    public void processAll() {
        Event event;
        while ((event = events.poll()) != null) { // Lock-free dequeue
            handle(event);
        }
    }

    private void handle(Event event) { /* ... */ }
}
```
ConcurrentHashMap: A Hybrid Masterpiece
ConcurrentHashMap is not purely lock-free — it uses a hybrid approach. Some operations are lock-free (reads, size estimation), while others use fine-grained locking (inserts into a bucket, tree rebalancing).
Architecture Overview
At a high level, the map is a volatile Node<K,V>[] table. Each bucket holds a linked chain of nodes (converted to a balanced tree when a chain grows long), inserts into an empty bucket use CAS, and all other bucket mutations synchronize on that bucket's head node.
Lock-Free Reads with Volatile
Reads in ConcurrentHashMap are entirely lock-free. The table array is volatile, and each node's val field is also volatile:
```java
// Simplified from OpenJDK ConcurrentHashMap.get()
public V get(Object key) {
    int hash = spread(key.hashCode());
    Node<K,V>[] tab = table;                 // Volatile read — sees latest table
    if (tab != null) {
        int index = (tab.length - 1) & hash;
        Node<K,V> first = tabAt(tab, index); // Volatile read of bucket head
        if (first != null) {
            if (first.hash == hash && first.key.equals(key)) {
                return first.val;            // Found at head — no traversal needed
            }
            // Traverse the chain...
            Node<K,V> node = first.next;
            while (node != null) {
                if (node.hash == hash && node.key.equals(key)) {
                    return node.val;
                }
                node = node.next;
            }
        }
    }
    return null;
}
```
No locks anywhere. The volatile reads provide the necessary visibility: a get() observes values published by writes that completed before it.
CAS-Based Insert (Empty Bucket)
When inserting into an empty bucket, CAS is used:
```java
// Simplified from OpenJDK ConcurrentHashMap.putVal()
if (tabAt(tab, index) == null) {
    // Empty bucket — try CAS to insert first node
    if (casTabAt(tab, index, null, new Node<>(hash, key, value))) {
        break; // Success! No locking needed
    }
}
```
Synchronized Insert (Non-Empty Bucket)
When a bucket already has nodes, ConcurrentHashMap synchronizes on the head node of that specific bucket:
```java
// Simplified from OpenJDK
synchronized (headNode) {
    // Only this bucket is locked — all other buckets are accessible!
    // Traverse chain, insert or update
}
```
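Putting the two insert paths together, here is an illustrative map that follows the same strategy: CAS for an empty bucket, a per-bucket lock otherwise. The `TinyMap` class models the layering only, not ConcurrentHashMap's real internals (no resizing, no tree bins, no hash spreading):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Layered insert strategy sketch: CAS first, per-bucket lock as fallback.
class TinyMap<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K key, V val, Node<K, V> next) { this.key = key; this.val = val; this.next = next; }
    }

    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    private int indexFor(K key) {
        return (table.length() - 1) & (key.hashCode() & 0x7fffffff);
    }

    public void put(K key, V val) {
        int i = indexFor(key);
        while (true) {
            Node<K, V> first = table.get(i);
            if (first == null) {
                // Fast path: empty bucket — one CAS, no lock
                if (table.compareAndSet(i, null, new Node<>(key, val, null))) return;
            } else {
                // Slow path: lock only this bucket's head node
                synchronized (first) {
                    if (table.get(i) == first) {  // head still current?
                        for (Node<K, V> n = first; ; n = n.next) {
                            if (n.key.equals(key)) { n.val = val; return; }
                            if (n.next == null) { n.next = new Node<>(key, val, null); return; }
                        }
                    }
                }
                // Head changed under us — retry the whole flow
            }
        }
    }

    public V get(K key) {
        // Lock-free read: volatile array slot + volatile next/val fields
        for (Node<K, V> n = table.get(indexFor(key)); n != null; n = n.next) {
            if (n.key.equals(key)) return n.val;
        }
        return null;
    }
}
```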
Lock-Free Size Estimation
Counting elements in a concurrent map is hard. ConcurrentHashMap uses LongAdder-style cells for its size counter:
```java
// Each thread increments its own counter cell
// size() sums all cells — approximate but fast
map.size();         // Returns an int, capped at Integer.MAX_VALUE
map.mappingCount(); // Returns a long (use for maps that may exceed Integer.MAX_VALUE)
```
The Hybrid Summary

| Operation | Mechanism |
|---|---|
| get() | Lock-free: volatile reads |
| put() into an empty bucket | Lock-free: CAS |
| put() into an occupied bucket | synchronized on the bucket's head node |
| size() / mappingCount() | LongAdder-style counter cells |
| Resize | Cooperative: threads help transfer buckets |
LongAdder: Distributed Contention
We introduced LongAdder in Article 4. Let's dive deeper into how it works.
The Striped Design
LongAdder extends the Striped64 class, which maintains a base value and an array of Cell objects:
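Conceptually, the layout looks like the sketch below. This is a simplified stand-in, not the real Striped64, which sizes and grows its cell array dynamically and pads each cell:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Structural sketch of the Striped64 idea (names illustrative, not the JDK's):
// a shared base counter plus an array of cells for contended updates.
class StripedCounter {
    final AtomicLong base = new AtomicLong();          // used while uncontended
    final AtomicLongArray cells = new AtomicLongArray( // used once CAS on base fails
            Runtime.getRuntime().availableProcessors());

    // Like LongAdder.sum(): base plus every cell, read without locking
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }
}
```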
The @Contended Annotation
Each Cell is annotated with @jdk.internal.vm.annotation.Contended, which pads the object with ~128 bytes of empty space. This prevents false sharing — where two unrelated cells happen to land on the same cache line.
The Increment Flow
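A simplified version of that flow, assuming a fixed-size cell array and a ThreadLocalRandom-based probe (the real Striped64 rehashes its per-thread probe and grows the cell array under contention):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of the LongAdder increment flow — simplified, not the real Striped64.
class FlowCounter {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(8); // power of two

    void increment() {
        long b = base.get();
        // 1. Uncontended fast path: a single CAS on base
        if (!base.compareAndSet(b, b + 1)) {
            // 2. Contended: fall back to a cell chosen by a per-thread probe
            int i = (cells.length() - 1) & ThreadLocalRandom.current().nextInt();
            cells.getAndIncrement(i); // retries internally, on a private cache line
        }
    }

    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }
}
```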
Performance Impact
```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Under heavy contention (32 threads):
// AtomicLong.incrementAndGet(): ~800 ns/op (cache line bouncing)
// LongAdder.increment():         ~30 ns/op (distributed across cells) 🚀
```
The trade-off: sum() is not an atomic snapshot; increments can land while the cells are being summed. For an exact count you'd need external synchronization.
Patterns Across JDK Concurrent Classes
After examining these implementations, common patterns emerge:
Pattern 1: CAS for Simple Writes, Volatile for Reads
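A sketch of the pattern using AtomicReference: plain volatile reads on the hot path, a CAS only when mutating. The `ConfigHolder` class here is a made-up example, not from the JDK:

```java
import java.util.concurrent.atomic.AtomicReference;

// Volatile read for the common case, CAS for the rare write.
class ConfigHolder {
    private final AtomicReference<String> config = new AtomicReference<>("v1");

    String current() {
        return config.get();                          // volatile read: cheap, lock-free
    }

    boolean upgrade(String expected, String next) {
        return config.compareAndSet(expected, next);  // CAS only on mutation
    }
}
```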
Pattern 2: Helping
When one thread finds another thread's operation half-completed, it helps finish it:
- ConcurrentLinkedQueue: advance a lagging tail pointer
- ConcurrentHashMap: help with table resizing (cooperative transfer)
Pattern 3: Layered Strategy
Use the cheapest operation first and fall back to more expensive ones. ConcurrentHashMap's putVal, for example, first tries a single CAS on an empty bucket and only locks the bucket head when the bucket is already occupied.
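A classic example of layering is StampedLock's optimistic read: start with the cheapest option (no lock at all), validate, and fall back to a real read lock only if a writer intervened. A sketch, adapted from the pattern shown in the StampedLock javadoc:

```java
import java.util.concurrent.locks.StampedLock;

// Layered reads: free optimistic attempt first, read lock only on conflict.
class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // cheapest: no lock acquired at all
        double cx = x, cy = y;
        if (!lock.validate(stamp)) {           // a writer slipped in — retry under lock
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(cx * cx + cy * cy);
    }
}
```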
Pattern 4: Contention Spreading
When CAS contention is high, spread the work:
| Class | Spreading Strategy |
|---|---|
| LongAdder | Separate cells per thread hash |
| ConcurrentHashMap | Separate lock per bucket |
| StampedLock | Optimistic reads avoid the lock entirely |
Using These Classes Effectively
```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class ConcurrencyCheatSheet {
    // ✅ Use ConcurrentHashMap for shared key-value data
    private final ConcurrentHashMap<String, UserSession> sessions =
            new ConcurrentHashMap<>();

    // ✅ Use ConcurrentLinkedQueue for producer-consumer
    private final ConcurrentLinkedQueue<Task> taskQueue =
            new ConcurrentLinkedQueue<>();

    // ✅ Use LongAdder for high-throughput counters
    private final LongAdder requestCount = new LongAdder();

    // ✅ Use AtomicReference for lock-free config swaps
    private final AtomicReference<Config> config =
            new AtomicReference<>(Config.defaults());

    // ❌ DON'T: Use Collections.synchronizedMap — wraps with one big lock
    // ❌ DON'T: Use Hashtable — ancient, one big lock
    // ❌ DON'T: Use AtomicLong when LongAdder fits — more contention
}
```
Key Takeaways
- ConcurrentLinkedQueue is a Michael-Scott Queue with lazy tail updates and self-linking optimizations
- ConcurrentHashMap is hybrid: lock-free reads, CAS for empty buckets, per-bucket locks for insertions
- LongAdder distributes contention across @Contended-padded cells, about 25x faster than AtomicLong under contention
- Common patterns: CAS for simple mutations, volatile for reads, helping for progress, contention spreading for throughput
- Use the JDK classes: they've been tested by millions of applications and optimized by world-class engineers