
Lock-Free Programming - Java (Part 6)

Building a Lock-Free Queue (Michael-Scott Queue)


The Treiber Stack was our warm-up. Now for the main event: the Michael-Scott Queue — arguably the most important lock-free data structure ever designed. Published by Maged Michael and Michael Scott in 1996, it's the algorithm behind Java's ConcurrentLinkedQueue and has influenced every lock-free queue since.

A queue is harder than a stack because it has two ends: items go in at the tail and come out at the head. That means we need two atomic pointers working in harmony — and that's where things get interesting.


Queues vs Stacks: Why Queues Are Harder

A stack has one "hot spot" — the top pointer. A queue has two: the head, where items are dequeued, and the tail, where items are enqueued.

The challenge: enqueue modifies the tail, dequeue modifies the head, and the last node's next pointer bridges them. We need all three to stay consistent without locks.


The Sentinel Node Trick

The Michael-Scott Queue uses a clever trick: a dummy (sentinel) node. The queue always starts with an empty node that doesn't contain real data. This eliminates tricky edge cases when the queue is empty.

Empty Queue

Both head and tail point to the same sentinel node. The queue is "empty" even though there's one node.

After enqueue("A")

The sentinel's next now points to node A, and tail has swung forward to A. head still points to the sentinel.

After enqueue("B")

Node A's next points to B, and tail points to B. The list is sentinel → A → B.

After dequeue() → returns "A"

head advances from the sentinel to node A. A's value is returned, and A becomes the new sentinel; the old sentinel becomes unreachable and is garbage-collected.

Notice: when we dequeue, the old sentinel is discarded and the dequeued node becomes the new sentinel. Its value has been consumed; it now serves as the divider between "already consumed" and "available."
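The state transitions above can be sketched single-threaded with a plain, non-atomic Node. This is a hypothetical illustration of the sentinel bookkeeping only, not the real implementation that follows:

```java
// Single-threaded sketch of the sentinel states (plain fields;
// the real implementation uses AtomicReference for thread safety).
public class SentinelDemo {
    static class Node {
        String value;
        Node next;
        Node(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        Node sentinel = new Node(null);
        Node head = sentinel, tail = sentinel;  // empty: head == tail

        // enqueue("A"): link after tail, then swing tail forward
        Node a = new Node("A");
        tail.next = a; tail = a;

        // enqueue("B"): same two steps
        Node b = new Node("B");
        tail.next = b; tail = b;

        // dequeue(): the value lives in head.next, which becomes the new sentinel
        String out = head.next.value;
        head = head.next;

        System.out.println(out);         // A — the dequeued value
        System.out.println(head.value);  // A — the new sentinel still holds the consumed value
    }
}
```

The last line shows the key property of the trick: the dequeued node is not removed, it is promoted to sentinel, and its (already consumed) value is simply never read again.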


The Full Implementation

import java.util.concurrent.atomic.AtomicReference;

public class MichaelScottQueue<T> {
    
    private static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next;
        
        Node(T value) {
            this.value = value;
            this.next = new AtomicReference<>(null);
        }
    }
    
    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;
    
    public MichaelScottQueue() {
        Node<T> sentinel = new Node<>(null);  // Dummy node
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }
    
    /**
     * Enqueue: Add an item to the tail of the queue.
     * Lock-free: never blocks.
     */
    public void enqueue(T value) {
        Node<T> newNode = new Node<>(value);
        
        while (true) {
            Node<T> currentTail = tail.get();
            Node<T> tailNext = currentTail.next.get();
            
            // Is tail still the last node?
            if (currentTail == tail.get()) {  // Re-check consistency
                
                if (tailNext == null) {
                    // Tail IS the last node. Try to link our new node after it.
                    if (currentTail.next.compareAndSet(null, newNode)) {
                        // Linked! Now try to swing tail forward.
                        tail.compareAndSet(currentTail, newNode);
                        return;  // Enqueue complete
                    }
                } else {
                    // Tail is NOT the last node — another enqueue is in progress!
                    // Help it by advancing tail forward.
                    tail.compareAndSet(currentTail, tailNext);
                }
            }
        }
    }
    
    /**
     * Dequeue: Remove and return the item at the head of the queue.
     * Returns null if the queue is empty.
     * Lock-free: never blocks.
     */
    public T dequeue() {
        while (true) {
            Node<T> currentHead = head.get();
            Node<T> currentTail = tail.get();
            Node<T> headNext = currentHead.next.get();
            
            // Is head still consistent?
            if (currentHead == head.get()) {
                
                if (currentHead == currentTail) {
                    // Head and tail point to the same node
                    if (headNext == null) {
                        return null;  // Queue is truly empty
                    }
                    // Tail is lagging behind — advance it
                    tail.compareAndSet(currentTail, headNext);
                    
                } else {
                    // Read the value BEFORE the CAS. Once our CAS succeeds,
                    // headNext becomes the new sentinel; variants that null out
                    // the sentinel's value could clear it before a later read.
                    T value = headNext.value;
                    
                    // Try to advance head to the next node
                    if (head.compareAndSet(currentHead, headNext)) {
                        return value;  // Dequeue complete
                    }
                }
            }
        }
    }
    
    public boolean isEmpty() {
        Node<T> currentHead = head.get();
        return currentHead.next.get() == null;
    }
}

This is more complex than the Treiber Stack. Let's break down the tricky parts.


Enqueue: Step by Step

The enqueue has two phases: linking the new node, then swinging the tail pointer.

The Two-Step Protocol

Enqueue is not a single CAS. It requires two modifications:

  1. Link the new node after the current tail (tail.next = newNode)

  2. Swing the tail pointer forward (tail = newNode)

These two steps are separate CAS operations, which means there's a brief moment when the tail pointer is "lagging behind" — it still points to the old last node even though a new node has been linked.

The "Helping" Mechanism

This is the brilliant part. If Thread 2 starts an enqueue and finds that tail.next is not null, it means Thread 1 is between step 1 and step 2. Instead of waiting, Thread 2 helps by advancing the tail pointer for Thread 1!

This helping pattern is what makes the algorithm lock-free. No thread ever waits — if you find someone else mid-operation, you help them finish and then proceed with your own work.
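The helping step can be demonstrated in isolation. In this hypothetical standalone sketch, "Thread 1" performs only step 1 of an enqueue (linking the node), and then "Thread 2" finds the lagging tail and repairs it exactly as the else branch of enqueue does:

```java
import java.util.concurrent.atomic.AtomicReference;

// Standalone sketch of the helping mechanism: one enqueue is left
// half-finished (node linked, tail not yet swung), and a second
// "thread" completes step 2 on its behalf.
public class HelpingDemo {
    static class Node {
        final String value;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        Node sentinel = new Node(null);
        AtomicReference<Node> tail = new AtomicReference<>(sentinel);

        // "Thread 1" does step 1 only: link the node, but do NOT swing tail.
        Node newNode = new Node("A");
        sentinel.next.compareAndSet(null, newNode);

        // "Thread 2" arrives: tail.next is non-null, so the tail is lagging.
        Node observedTail = tail.get();
        Node tailNext = observedTail.next.get();
        if (tailNext != null) {
            tail.compareAndSet(observedTail, tailNext);  // help: advance tail
        }

        System.out.println(tail.get().value);  // A — tail was repaired by the helper
    }
}
```

Note that the helper's CAS and Thread 1's own step-2 CAS cannot both succeed: whichever runs second fails harmlessly because tail no longer holds the expected old node.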


Dequeue: Step by Step

Why Read the Value BEFORE the CAS?

This is a subtle but critical detail:

T value = headNext.value;                       // Read FIRST
if (head.compareAndSet(currentHead, headNext)) { // CAS SECOND
    return value;
}

We read headNext.value before the CAS because once the CAS succeeds, headNext becomes the new sentinel node. In the original algorithm (written in C) the old node could be reclaimed and reused; in Java variants that null out the sentinel's value, as ConcurrentLinkedQueue does, a late read could return null. Reading the value first is correct in every variant.
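To make the hazard concrete, here is a hypothetical sketch of a variant where dequeue nulls out the new sentinel's value (our implementation keeps value final, so the mutable field here is illustrative). It shows why a value read deferred until after the CAS can be too late:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical variant with a nullable value field: once a node becomes
// the sentinel, another dequeuer may clear its value. Reading before the
// CAS captures the value while it is still guaranteed to be present.
public class ReadOrderDemo {
    static class Node {
        volatile String value;  // nullable here, unlike the final field above
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        Node sentinel = new Node(null);
        Node a = new Node("A");
        sentinel.next.set(a);
        AtomicReference<Node> head = new AtomicReference<>(sentinel);

        // Correct order: read FIRST, CAS second.
        String safe = a.value;
        head.compareAndSet(sentinel, a);  // a is now the sentinel

        // Simulate a second dequeuer clearing the new sentinel's value.
        a.value = null;

        System.out.println(safe);     // A — captured before the CAS
        System.out.println(a.value);  // null — a read after the CAS would be too late
    }
}
```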


Concurrent Enqueue and Dequeue

Here's a sketch of a scenario with three threads operating simultaneously. Thread 1 enqueues "X" and stalls after linking its node (step 1). Thread 2 starts its own enqueue, sees that tail.next is non-null, helps by swinging tail forward, and then links "Y". Meanwhile, Thread 3 dequeues from the head, which never conflicts with either tail operation: once the queue is non-empty, head and tail touch different nodes.

Everything works correctly despite all three threads operating at the same time!


Testing the Michael-Scott Queue

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class MichaelScottQueueTest {
    
    public static void main(String[] args) throws Exception {
        MichaelScottQueue<Integer> queue = new MichaelScottQueue<>();
        int numProducers = 4;
        int numConsumers = 4;
        int itemsPerProducer = 100_000;
        
        Set<Integer> produced = ConcurrentHashMap.newKeySet();
        Set<Integer> consumed = ConcurrentHashMap.newKeySet();
        AtomicInteger produceCount = new AtomicInteger(0);
        AtomicInteger consumeCount = new AtomicInteger(0);
        
        ExecutorService executor = Executors.newFixedThreadPool(
            numProducers + numConsumers
        );
        
        // Start producers
        CountDownLatch producersDone = new CountDownLatch(numProducers);
        for (int p = 0; p < numProducers; p++) {
            final int producerId = p;
            executor.submit(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    int item = producerId * itemsPerProducer + i;
                    queue.enqueue(item);
                    produced.add(item);
                    produceCount.incrementAndGet();
                }
                producersDone.countDown();
            });
        }
        
        // Wait for all producers to finish
        producersDone.await();
        
        // Start consumers — drain the queue
        CountDownLatch consumersDone = new CountDownLatch(numConsumers);
        for (int c = 0; c < numConsumers; c++) {
            executor.submit(() -> {
                Integer item;
                while ((item = queue.dequeue()) != null) {
                    if (!consumed.add(item)) {
                        System.err.println("DUPLICATE consumed: " + item);
                    }
                    consumeCount.incrementAndGet();
                }
                consumersDone.countDown();
            });
        }
        
        consumersDone.await();
        executor.shutdown();
        
        int expected = numProducers * itemsPerProducer;
        System.out.printf("Expected: %d, Produced: %d, Consumed: %d%n",
            expected, produceCount.get(), consumeCount.get());
        System.out.printf("Unique produced: %d, Unique consumed: %d%n",
            produced.size(), consumed.size());
        System.out.println(consumed.equals(produced) ? "✅ PASS" : "❌ FAIL");
    }
}

Treiber Stack vs Michael-Scott Queue

Feature               | Treiber Stack             | Michael-Scott Queue
----------------------|---------------------------|----------------------------------------
Data structure        | LIFO (Last-In, First-Out) | FIFO (First-In, First-Out)
Atomic pointers       | 1 (head)                  | 2 (head + tail)
Nodes                 | Fully immutable           | Mutable next pointer (AtomicReference)
Sentinel node         | Not needed                | Required (simplifies empty queue)
Helping               | Not needed                | Yes — threads help advance lagging tail
CAS operations per op | 1                         | 1–2
Complexity            | Simple                    | Moderate
Real-world usage      | Less common               | ConcurrentLinkedQueue is based on this
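Since the table mentions ConcurrentLinkedQueue, a quick sanity check of the JDK class shows the same FIFO semantics as our dequeue(), including returning null when empty:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// The JDK's lock-free queue (based on the Michael-Scott algorithm)
// behaves like our implementation: FIFO order, null when empty.
public class JdkQueueDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<>();
        q.offer("A");
        q.offer("B");
        System.out.println(q.poll());  // A — first in, first out
        System.out.println(q.poll());  // B
        System.out.println(q.poll());  // null — empty, like our dequeue()
    }
}
```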

Why the Helping Pattern Matters

The helping mechanism is what elevates this from a clever hack to a principled design: no thread's progress ever depends on another thread being scheduled, which is exactly the lock-freedom guarantee.

This "helping" principle appears across many lock-free algorithms:

If you find the data structure in an inconsistent state, fix it yourself and continue.


Key Takeaways

  1. The Michael-Scott Queue uses a linked list with CAS on both head and tail pointers

  2. A sentinel (dummy) node simplifies empty-queue handling

  3. Enqueue is a two-step operation: link the new node, then swing the tail

  4. The helping mechanism ensures lock-freedom: if tail is lagging, any thread can advance it

  5. Read values before CAS in dequeue — after CAS, the node becomes the new sentinel

  6. Java's ConcurrentLinkedQueue is based on this algorithm