Lock-Free Programming - Java (Part 5)

The Treiber Stack (published by R. Kent Treiber in 1986) is the simplest lock-free data structure, and it's the perfect starting point for understanding how CAS and AtomicReference come together to build concurrent collections without locks.

By the end of this article, you'll have a fully working lock-free stack and a deep understanding of why it works.

What's a Stack?

A stack is a Last-In, First-Out (LIFO) data structure. Think of a stack of plates:

Push: Place a plate on top
Pop: Take the plate off the top

The challenge: how do we make push and pop thread-safe without using locks?

The Naive Lock-Based Stack

First, let's see the lock-based version so we have a baseline:

import java.util.concurrent.locks.ReentrantLock;

public class LockedStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        
        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }
    
    private Node<T> top = null;
    private final ReentrantLock lock = new ReentrantLock();
    
    public void push(T value) {
        lock.lock();
        try {
            top = new Node<>(value, top);
        } finally {
            lock.unlock();
        }
    }
    
    public T pop() {
        lock.lock();
        try {
            if (top == null) return null;
            T value = top.value;
            top = top.next;
            return value;
        } finally {
            lock.unlock();
        }
    }
}

This works, but:

Only one thread can push or pop at a time
If the thread holding the lock is suspended by the OS, all other threads are blocked
Potential for deadlock if combined with other locks

The Treiber Stack: Lock-Free Design

The Treiber Stack uses a singly-linked list with an AtomicReference pointing to the top node. Pushes and pops use CAS loops on the top reference.

Data Structure

Full Implementation

import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack<T> {
    
    /**
     * Immutable node in the stack.
     * Once created, a node's value and next pointer never change.
     */
    private static class Node<T> {
        final T value;
        final Node<T> next;
        
        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }
    
    /** 
     * Pointer to the top of the stack.
     * This is the ONLY mutable state — and it's atomic.
     */
    private final AtomicReference<Node<T>> head = 
        new AtomicReference<>(null);
    
    /**
     * Push a value onto the stack.
     * Lock-free: uses CAS loop, never blocks.
     */
    public void push(T value) {
        Node<T> newHead;
        Node<T> currentHead;
        do {
            currentHead = head.get();                    // 1. Read current top
            newHead = new Node<>(value, currentHead);    // 2. Create new node pointing to current top
        } while (!head.compareAndSet(currentHead, newHead)); // 3. CAS: set top to new node
    }
    
    /**
     * Pop a value from the stack.
     * Lock-free: uses CAS loop, never blocks.
     * Returns null if the stack is empty.
     */
    public T pop() {
        Node<T> currentHead;
        Node<T> newHead;
        do {
            currentHead = head.get();                    // 1. Read current top
            if (currentHead == null) {
                return null;                              // Stack is empty
            }
            newHead = currentHead.next;                  // 2. Next node becomes new top
        } while (!head.compareAndSet(currentHead, newHead)); // 3. CAS: set top to next node
        return currentHead.value;
    }
    
    /**
     * Peek at the top value without removing it.
     */
    public T peek() {
        Node<T> currentHead = head.get();
        return currentHead != null ? currentHead.value : null;
    }
    
    /**
     * Check if the stack is empty.
     */
    public boolean isEmpty() {
        return head.get() == null;
    }
}

That's it. The entire lock-free stack in ~50 lines. Let's trace through each operation.

Push: Step by Step

Let's see how push("D") works when the stack contains [C, B, A]:

Push with Contention

Two threads pushing simultaneously:

Both values made it onto the stack. No locks, no lost pushes.

Pop: Step by Step

Let's see how pop() works when the stack contains [D, C, B, A]:

Pop with Contention

Each thread gets a different value. Nothing was popped twice or lost.

Why It's Correct

The correctness of the Treiber Stack rests on a few key properties:

1. Nodes Are Immutable

Once created, a node's value and next never change. This means if you read a node, you can trust its contents — no other thread will modify it under you.

2. Single Atomic Point of Mutation

The only thing that changes is the head reference, and it changes via CAS. This means:

Concurrent pushes never lose a node (failed CAS → retry with correct top)
Concurrent pops never pop the same node twice (failed CAS → retry with new top)

3. Linearizability

Each operation appears to take effect at a single atomic instant (the moment the CAS succeeds). This is called linearizability — the gold standard of correctness for concurrent data structures.

Testing the Treiber Stack

Let's write a concurrent test to verify correctness:

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class TreiberStackTest {
    
    public static void main(String[] args) throws Exception {
        TreiberStack<Integer> stack = new TreiberStack<>();
        int numThreads = 8;
        int numOpsPerThread = 100_000;
        
        // Phase 1: All threads push
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        CountDownLatch latch = new CountDownLatch(numThreads);
        
        for (int t = 0; t < numThreads; t++) {
            final int threadId = t;
            executor.submit(() -> {
                for (int i = 0; i < numOpsPerThread; i++) {
                    stack.push(threadId * numOpsPerThread + i);
                }
                latch.countDown();
            });
        }
        
        latch.await();
        
        // Phase 2: Pop all and verify
        Set<Integer> popped = ConcurrentHashMap.newKeySet();
        AtomicInteger popCount = new AtomicInteger(0);
        CountDownLatch latch2 = new CountDownLatch(numThreads);
        
        for (int t = 0; t < numThreads; t++) {
            executor.submit(() -> {
                Integer val;
                while ((val = stack.pop()) != null) {
                    if (!popped.add(val)) {
                        System.err.println("DUPLICATE: " + val);
                    }
                    popCount.incrementAndGet();
                }
                latch2.countDown();
            });
        }
        
        latch2.await();
        executor.shutdown();
        
        int expected = numThreads * numOpsPerThread;
        System.out.printf("Pushed: %d, Popped: %d, Unique: %d%n", 
            expected, popCount.get(), popped.size());
        System.out.println(popCount.get() == expected ? "✅ PASS" : "❌ FAIL");
    }
}

Expected output:

Pushed: 800000, Popped: 800000, Unique: 800000
✅ PASS

Performance Comparison

How does the Treiber Stack compare to a synchronized stack?

Note: These are illustrative numbers. Actual performance depends on your hardware, JVM version, and workload. Always benchmark your specific use case (we'll cover JMH benchmarking in Article 9).

The performance gap widens as contention increases, because:

Locked stack: Threads waiting for the lock get context-switched by the OS (expensive)
Lock-free stack: Threads that lose the CAS race just spin and retry (cheap, stays on CPU)

Limitations and Caveats

1. Memory Reclamation

In garbage-collected languages like Java, memory reclamation is handled by the GC. But in C/C++, deciding when to free a popped node is extremely hard — another thread might still be reading it. This problem is solved by hazard pointers or epoch-based reclamation (not needed in Java — the GC handles it!).

2. The ABA Problem

The Treiber Stack is susceptible to the ABA problem in certain scenarios. We'll cover this later in the series, but here's the short version: if a node is popped and a new node with the same reference is pushed before a competing thread finishes its CAS, the CAS might succeed when it shouldn't.

In practice, this is extremely rare in Java because:

The GC prevents address reuse while any reference exists
New nodes are always fresh objects with unique references

3. No Size Operation

A lock-free stack doesn't have an efficient size() method. You'd need to traverse the entire list, and the result would be stale by the time you return it. If you need a size, maintain a separate AtomicInteger counter (at the cost of an extra CAS per push/pop).

Key Takeaways

The Treiber Stack is the simplest lock-free data structure — a linked list with CAS on the head pointer
The pattern is: immutable nodes + AtomicReference to the head + CAS loops
Push: Create node pointing to current head → CAS head to new node
Pop: Read current head → CAS head to head.next → return old head's value
It's linearizable — each operation appears to happen at a single instant (the CAS moment)
Performance advantage grows with contention level