Lock-Free Programming - Java (Part 6)
Building a Lock-Free Queue (Michael-Scott Queue)
The Treiber Stack was our warm-up. Now for the main event: the Michael-Scott Queue — arguably the most important lock-free data structure ever designed. Published by Maged Michael and Michael Scott in 1996, it's the algorithm behind Java's ConcurrentLinkedQueue and has influenced every lock-free queue since.
A queue is harder than a stack because it has two ends: items go in at the tail and come out at the head. That means we need two atomic pointers working in harmony — and that's where things get interesting.
Queues vs Stacks: Why Queues Are Harder
A stack has one "hot spot" — the top pointer. A queue has two: the head, where dequeue removes items, and the tail, where enqueue adds them.
The challenge: enqueue modifies the tail, dequeue modifies the head, and the last node's next pointer bridges them. We need all three to stay consistent without locks.
The Sentinel Node Trick
The Michael-Scott Queue uses a clever trick: a dummy (sentinel) node. The queue always starts with an empty node that doesn't contain real data. This eliminates tricky edge cases when the queue is empty.
Empty Queue
Both head and tail point to the same sentinel node. The queue is "empty" even though there's one node.
After enqueue("A")
The sentinel's next pointer now links to node A, and tail has swung forward to A; head still points at the sentinel.
After enqueue("B")
Node A's next pointer links to node B, and tail advances to B; head is unchanged.
After dequeue() → returns "A"
head advances to node A. Notice: the old sentinel is discarded and the dequeued node becomes the new sentinel. Its value has been consumed; it now serves as the divider between "already consumed" and "available."
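The four states above can be traced in a stripped-down, single-threaded sketch. The Node class and check helper below are simplified stand-ins for illustration only (plain references instead of AtomicReference), not the real implementation:

```java
// Single-threaded walkthrough of the sentinel-node states (illustrative only).
public class SentinelWalkthrough {
    static class Node {
        String value; // null in the sentinel
        Node next;
        Node(String value) { this.value = value; }
    }

    static void check(boolean ok, String msg) {
        if (!ok) throw new AssertionError(msg);
    }

    public static void main(String[] args) {
        // Empty queue: head and tail share a single sentinel node.
        Node sentinel = new Node(null);
        Node head = sentinel, tail = sentinel;
        check(head == tail && head.next == null, "empty state");

        // enqueue("A"): link after the tail, then swing the tail forward.
        Node a = new Node("A");
        tail.next = a;
        tail = a;
        check(head.next == a && tail == a, "after enqueue A");

        // enqueue("B"): same two steps again.
        Node b = new Node("B");
        tail.next = b;
        tail = b;
        check(a.next == b && tail == b, "after enqueue B");

        // dequeue(): take the value from head.next, then advance head.
        String out = head.next.value;   // "A"
        head = head.next;               // node A becomes the new sentinel
        check("A".equals(out) && head == a && head.next == b, "after dequeue");
        System.out.println("dequeued " + out
                + "; new sentinel still holds the consumed value " + head.value);
    }
}
```

Note that after the dequeue, node A is still in the list; it just no longer counts as queue content.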
The Full Implementation
import java.util.concurrent.atomic.AtomicReference;

public class MichaelScottQueue<T> {

    private static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next;

        Node(T value) {
            this.value = value;
            this.next = new AtomicReference<>(null);
        }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    public MichaelScottQueue() {
        Node<T> sentinel = new Node<>(null); // Dummy node
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    /**
     * Enqueue: Add an item to the tail of the queue.
     * Lock-free: never blocks.
     */
    public void enqueue(T value) {
        Node<T> newNode = new Node<>(value);
        while (true) {
            Node<T> currentTail = tail.get();
            Node<T> tailNext = currentTail.next.get();
            // Is tail still the last node?
            if (currentTail == tail.get()) { // Re-check consistency
                if (tailNext == null) {
                    // Tail IS the last node. Try to link our new node after it.
                    if (currentTail.next.compareAndSet(null, newNode)) {
                        // Linked! Now try to swing tail forward.
                        tail.compareAndSet(currentTail, newNode);
                        return; // Enqueue complete
                    }
                } else {
                    // Tail is NOT the last node — another enqueue is in progress!
                    // Help it by advancing tail forward.
                    tail.compareAndSet(currentTail, tailNext);
                }
            }
        }
    }

    /**
     * Dequeue: Remove and return the item at the head of the queue.
     * Returns null if the queue is empty.
     * Lock-free: never blocks.
     */
    public T dequeue() {
        while (true) {
            Node<T> currentHead = head.get();
            Node<T> currentTail = tail.get();
            Node<T> headNext = currentHead.next.get();
            // Is head still consistent?
            if (currentHead == head.get()) {
                if (currentHead == currentTail) {
                    // Head and tail point to the same node
                    if (headNext == null) {
                        return null; // Queue is truly empty
                    }
                    // Tail is lagging behind — advance it
                    tail.compareAndSet(currentTail, headNext);
                } else {
                    // Read the value BEFORE the CAS (once the CAS succeeds,
                    // headNext becomes the new sentinel)
                    T value = headNext.value;
                    // Try to advance head to the next node
                    if (head.compareAndSet(currentHead, headNext)) {
                        return value; // Dequeue complete
                    }
                }
            }
        }
    }

    // Point-in-time snapshot: may be stale by the time the caller acts on it.
    public boolean isEmpty() {
        Node<T> currentHead = head.get();
        return currentHead.next.get() == null;
    }
}
This is more complex than the Treiber Stack. Let's break down the tricky parts.
Enqueue: Step by Step
The enqueue has two phases.
The Two-Step Protocol
Enqueue is not a single CAS. It requires two modifications:
1. Link the new node after the current tail (tail.next = newNode)
2. Swing the tail pointer forward (tail = newNode)
These two steps are separate CAS operations, which means there's a brief moment when the tail pointer is "lagging behind" — it still points to the old last node even though a new node has been linked.
The "Helping" Mechanism
This is the brilliant part. If Thread 2 starts an enqueue and finds that tail.next is not null, it means Thread 1 is between step 1 and step 2. Instead of waiting, Thread 2 helps by advancing the tail pointer for Thread 1!
This helping pattern is what makes the algorithm lock-free. No thread ever waits — if you find someone else mid-operation, you help them finish and then proceed with your own work.
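To see helping in isolation, the sketch below manually freezes a queue in the "between step 1 and step 2" state (the new node is linked, but tail still points at the old node) and shows that any thread can complete the swing with a single CAS. The class and variable names here are made up for the demo:

```java
import java.util.concurrent.atomic.AtomicReference;

// Demonstrates the helping step: any thread can advance a lagging tail.
public class HelpingDemo {
    static class Node {
        final String value;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(String value) { this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        Node sentinel = new Node(null);
        AtomicReference<Node> tail = new AtomicReference<>(sentinel);

        // "Thread 1" completes step 1 (link) but pauses before step 2 (swing).
        Node fresh = new Node("A");
        sentinel.next.compareAndSet(null, fresh); // step 1 done; tail now lags

        // "Thread 2" arrives, observes tail.next != null, and helps.
        Thread helper = new Thread(() -> {
            Node observedTail = tail.get();
            Node observedNext = observedTail.next.get();
            if (observedNext != null) {
                tail.compareAndSet(observedTail, observedNext); // the helping CAS
            }
        });
        helper.start();
        helper.join();

        if (tail.get() != fresh) throw new AssertionError("tail was not advanced");
        System.out.println("helper advanced tail to node " + tail.get().value);
    }
}
```

Because the helping CAS names the exact old tail it observed, it is harmless if Thread 1 wakes up and performs its own swing first: one of the two CASes succeeds, the other fails, and the queue ends up in the same state either way.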
Dequeue: Step by Step
Why Read the Value BEFORE the CAS?
This is a subtle but critical detail:
T value = headNext.value;                        // Read FIRST
if (head.compareAndSet(currentHead, headNext)) { // CAS SECOND
    return value;
}
We read headNext.value before the CAS because once the CAS succeeds, headNext becomes the new sentinel node and no longer belongs to us. In this Java version the value field is final, so a late read would still happen to work, but the discipline comes from the original algorithm, where a dequeued node can be freed or reused as soon as head moves past it — and the JDK's ConcurrentLinkedQueue really does null out consumed items. Reading the value first makes the code correct regardless of what other threads do to the node afterwards.
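The hazard is easiest to see in a queue variant that clears consumed values (as ConcurrentLinkedQueue does). The contrived, single-threaded interleaving below uses a mutable value field, unlike the listing above, purely to simulate what a read done after the CAS could observe:

```java
import java.util.concurrent.atomic.AtomicReference;

// Simulated interleaving: in a variant that clears consumed values,
// reading the value AFTER the head CAS can observe null.
public class ReadOrderDemo {
    static class Node {
        volatile String value; // mutable here, unlike the article's listing
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        Node sentinel = new Node(null);
        Node a = new Node("A");
        sentinel.next.set(a);
        AtomicReference<Node> head = new AtomicReference<>(sentinel);

        // Correct order: read first, then CAS.
        String safe = a.value;           // read BEFORE
        head.compareAndSet(sentinel, a); // CAS SECOND

        // Simulate another thread's dequeue clearing the new sentinel's slot.
        a.value = null;

        // Wrong order (simulated): a read done only now sees the cleared slot.
        String late = a.value;

        System.out.println("read-before-CAS: " + safe + ", read-after-CAS: " + late);
        if (!"A".equals(safe) || late != null) throw new AssertionError();
    }
}
```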
Concurrent Enqueue and Dequeue
Consider three threads operating simultaneously: Thread 1 enqueues and stalls between its two steps, Thread 2's enqueue notices the lagging tail and helps advance it before linking its own node, and Thread 3 dequeues from the head the whole time. Because head and tail operations touch opposite ends of the list (except when the queue is nearly empty, which the currentHead == currentTail branch handles), everything works correctly despite all three threads racing.
Testing the Michael-Scott Queue
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class MichaelScottQueueTest {
    public static void main(String[] args) throws Exception {
        MichaelScottQueue<Integer> queue = new MichaelScottQueue<>();
        int numProducers = 4;
        int numConsumers = 4;
        int itemsPerProducer = 100_000;

        Set<Integer> produced = ConcurrentHashMap.newKeySet();
        Set<Integer> consumed = ConcurrentHashMap.newKeySet();
        AtomicInteger produceCount = new AtomicInteger(0);
        AtomicInteger consumeCount = new AtomicInteger(0);

        ExecutorService executor = Executors.newFixedThreadPool(
                numProducers + numConsumers
        );

        // Start producers
        CountDownLatch producersDone = new CountDownLatch(numProducers);
        for (int p = 0; p < numProducers; p++) {
            final int producerId = p;
            executor.submit(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    int item = producerId * itemsPerProducer + i;
                    queue.enqueue(item);
                    produced.add(item);
                    produceCount.incrementAndGet();
                }
                producersDone.countDown();
            });
        }

        // Wait for all producers to finish
        producersDone.await();

        // Start consumers — drain the queue
        CountDownLatch consumersDone = new CountDownLatch(numConsumers);
        for (int c = 0; c < numConsumers; c++) {
            executor.submit(() -> {
                Integer item;
                while ((item = queue.dequeue()) != null) {
                    if (!consumed.add(item)) {
                        System.err.println("DUPLICATE consumed: " + item);
                    }
                    consumeCount.incrementAndGet();
                }
                consumersDone.countDown();
            });
        }
        consumersDone.await();
        executor.shutdown();

        int expected = numProducers * itemsPerProducer;
        System.out.printf("Produced: %d, Consumed: %d (expected %d)%n",
                produceCount.get(), consumeCount.get(), expected);
        System.out.printf("Unique produced: %d, Unique consumed: %d%n",
                produced.size(), consumed.size());
        System.out.println(consumed.equals(produced) ? "✅ PASS" : "❌ FAIL");
    }
}
Treiber Stack vs Michael-Scott Queue
| Feature | Treiber Stack | Michael-Scott Queue |
|---|---|---|
| Data structure | LIFO (Last-In, First-Out) | FIFO (First-In, First-Out) |
| Atomic pointers | 1 (head) | 2 (head + tail) |
| Nodes | Fully immutable | Mutable next pointer (AtomicReference) |
| Sentinel node | Not needed | Required (simplifies empty queue) |
| Helping | Not needed | Yes — threads help advance lagging tail |
| CAS operations per op | 1 | 1-2 |
| Complexity | Simple | Moderate |
| Real-world usage | Less common | ConcurrentLinkedQueue is based on this |
Why the Helping Pattern Matters
The helping mechanism is what elevates this from a clever hack to a principled design: progress never depends on any particular thread. If the thread that linked a node is descheduled before swinging the tail, the next thread to arrive finishes the job, so the system as a whole always makes progress.
This "helping" principle appears across many lock-free algorithms:
If you find the data structure in an inconsistent state, fix it yourself and continue.
Key Takeaways
The Michael-Scott Queue uses a linked list with CAS on both head and tail pointers
A sentinel (dummy) node simplifies empty-queue handling
Enqueue is a two-step operation: link the new node, then swing the tail
The helping mechanism ensures lock-freedom: if tail is lagging, any thread can advance it
Read values before CAS in dequeue — after CAS, the node becomes the new sentinel
Java's ConcurrentLinkedQueue is based on this algorithm
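In production code you would reach for the JDK's battle-tested implementation rather than hand-rolling one. ConcurrentLinkedQueue exposes the same non-blocking semantics through offer and poll, with poll returning null on an empty queue just like our dequeue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// The JDK's Michael-Scott-derived queue: same semantics as our hand-rolled one.
public class JdkQueueDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("A"); // never blocks; lock-free enqueue
        queue.offer("B");

        System.out.println(queue.poll()); // "A" — FIFO order
        System.out.println(queue.poll()); // "B"
        System.out.println(queue.poll()); // null — empty, like our dequeue()
    }
}
```

The JDK version adds refinements on top of the 1996 algorithm, such as nulling out consumed items for the garbage collector and updating head/tail only every couple of hops to reduce CAS traffic.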