
Deep Dive (java.util.concurrent) - ConcurrentLinkedQueue and ConcurrentLinkedDeque


"No locks at all — just atomic compare-and-swap, all the way down."

ConcurrentLinkedQueue and ConcurrentLinkedDeque are lock-free concurrent collections built entirely on CAS (Compare-And-Swap) operations. This deep dive explains the Michael-Scott non-blocking queue algorithm and the helping mechanism that makes it work.


Why Lock-Free?

Lock-based data structures have problems:

  • Deadlock risk — threads can deadlock on nested locks

  • Priority inversion — high-priority thread waits for low-priority lock holder

  • Convoying — if lock holder is preempted, ALL waiters stall

Lock-free structures use CAS (Compare-And-Swap) instead:

Lock-free guarantee: At least one thread is always making progress, regardless of what happens to other threads (preemption, suspension, etc.).

CAS Refresher

// Atomic compare-and-swap (single CPU instruction on x86: CMPXCHG)
boolean CAS(address, expectedValue, newValue) {
    if (*address == expectedValue) {
        *address = newValue;
        return true;   // Success!
    }
    return false;      // Failed — someone else changed it
}
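In Java, this primitive is exposed through the atomic classes. A minimal demonstration using `AtomicInteger.compareAndSet`, which maps to the same hardware instruction:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(10);

        // Succeeds: the current value matches the expected value (10)
        boolean first = value.compareAndSet(10, 20);

        // Fails: the value is now 20, not 10 — "someone else changed it"
        boolean second = value.compareAndSet(10, 30);

        System.out.println(first + " " + second + " " + value.get()); // true false 20
    }
}
```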

ConcurrentLinkedQueue — The Michael-Scott Queue

ConcurrentLinkedQueue implements the famous Michael-Scott non-blocking queue (1996). It's a singly-linked list with head and tail pointers:

public class ConcurrentLinkedQueue<E> extends AbstractQueue<E>
        implements Queue<E> {
    
    private transient volatile Node<E> head;
    private transient volatile Node<E> tail;
    
    private static class Node<E> {
        volatile E item;
        volatile Node<E> next;
    }
}

Important: The head always points to a sentinel node (a dummy node whose item is null). This simplifies edge cases.
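A simplified model of that initial state (not the JDK source — just the sentinel idea in isolation): head and tail start out pointing at the same dummy node, so an empty queue always contains at least one node and offer()/poll() never have to special-case a null head or tail.

```java
// Simplified model: head and tail share one sentinel node on construction.
class SentinelQueue<E> {
    static final class Node<E> {
        volatile E item;
        volatile Node<E> next;
        Node(E item) { this.item = item; }
    }

    volatile Node<E> head;
    volatile Node<E> tail;

    SentinelQueue() {
        Node<E> dummy = new Node<>(null); // sentinel: item is null
        head = dummy;
        tail = dummy;
    }
}
```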


How offer() Works — Lock-Free Insert

public boolean offer(E e) {
    final Node<E> newNode = new Node<E>(e);
    
    for (Node<E> t = tail, p = t;;) {    // Retry loop
        Node<E> q = p.next;
        
        if (q == null) {
            // p is the last node — try to CAS our node as the next
            if (NEXT.compareAndSet(p, null, newNode)) {
                // CAS succeeded! Node is linked.
                if (p != t) // Tail fell behind — update it
                    TAIL.compareAndSet(this, t, newNode);
                return true;
            }
            // CAS failed — another thread won. Retry.
        }
        else if (p == q)
            // Self-link (removed node) — restart from head or tail
            p = (t != (t = tail)) ? t : head;
        else
            // Tail fell behind — advance to the real tail
            p = (p != t && t != (t = tail)) ? t : q;
    }
}

The Two-Phase Insert

The insert is split into two steps:

  1. Link the node: CAS the next pointer of the current last node

  2. Update tail: CAS the tail pointer to the new node


How poll() Works — Lock-Free Remove

public E poll() {
    restartFromHead: for (;;) {
        for (Node<E> h = head, p = h, q;; ) {
            final E item;
            
            if ((item = p.item) != null && p.compareAndSetItem(item, null)) {
                // CAS nulled out the item — we "own" this element
                if (p != h)
                    updateHead(h, ((q = p.next) != null) ? q : p);
                return item;
            }
            else if ((q = p.next) == null) {
                updateHead(h, p);
                return null;  // Queue is empty
            }
            else if (p == q)
                continue restartFromHead;
            else
                p = q;  // Advance to next node
        }
    }
}

Key insight: Removal doesn't physically unlink the node immediately. Instead, the item is CAS'd to null, and the head pointer is lazily updated. Nodes with null items are "logically removed" but may still be in the chain briefly.
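All that internal machinery produces a very simple visible contract: poll() either claims one element atomically or returns null, and it never blocks:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class PollDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a");
        queue.offer("b");

        System.out.println(queue.poll()); // a — FIFO order
        System.out.println(queue.poll()); // b
        System.out.println(queue.poll()); // null — empty queue, never blocks
    }
}
```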


The Helping Mechanism

The most elegant design aspect: threads help each other. If Thread A links a node but is preempted before updating tail, the next thread to arrive sees the lagging tail (its next pointer is non-null), swings tail forward itself, and then proceeds with its own insert.

This "helping" is what makes the algorithm lock-free — progress is guaranteed even if any thread is suspended at any point.
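A quick way to see the guarantee in practice is to hammer offer() from several threads and check that nothing is lost — a demonstration, not a proof (any preempted thread's tail update is completed by whoever comes next):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class HelpingDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        final int threads = 8, perThread = 10_000;

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    queue.offer(id * perThread + i); // each offer may "help" fix a lagging tail
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        System.out.println(queue.size()); // 80000 — no element lost, no lock taken
    }
}
```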


ConcurrentLinkedDeque — Double-Ended

ConcurrentLinkedDeque extends the concept to a doubly-linked lock-free structure with operations at both ends:

public class ConcurrentLinkedDeque<E> extends AbstractCollection<E>
        implements Deque<E> {
    
    volatile Node<E> head;
    volatile Node<E> tail;
    
    static final class Node<E> {
        volatile Node<E> prev;  // Added for deque functionality
        volatile E item;
        volatile Node<E> next;
    }
}

The doubly-linked lock-free deque is significantly more complex than the singly-linked queue because it must maintain both next and prev pointers atomically. It uses a sophisticated multi-step CAS protocol.

Operations

| Method | Description | Complexity |
|---|---|---|
| `offerFirst(e)` | Add at head | O(1) |
| `offerLast(e)` | Add at tail | O(1) |
| `pollFirst()` | Remove from head | O(1) |
| `pollLast()` | Remove from tail | O(1) |
| `peekFirst()` | Look at head | O(1) |
| `peekLast()` | Look at tail | O(1) |
| `push(e)` / `pop()` | Stack operations (head) | O(1) |
| `size()` | Count elements | O(n) |
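One instance can serve as a FIFO queue at one end and a LIFO stack at the other, since `push`/`pop` are just head operations:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class DequeDemo {
    public static void main(String[] args) {
        ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();

        // Queue use: add at tail, remove from head (FIFO)
        deque.offerLast("first");
        deque.offerLast("second");
        System.out.println(deque.pollFirst()); // first

        // Stack use: push/pop both operate on the head (LIFO)
        deque.push("top");
        System.out.println(deque.pop());       // top
        System.out.println(deque.pollFirst()); // second
    }
}
```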

Why size() is O(n)

Both ConcurrentLinkedQueue and ConcurrentLinkedDeque have O(n) size():

public int size() {
    restartFromHead: for (;;) {
        int count = 0;
        for (Node<E> p = first(); p != null; ) {
            if (p.item != null)
                if (++count == Integer.MAX_VALUE)
                    break;  // Overflow protection
            if (p == (p = p.next))
                continue restartFromHead;
        }
        return count;
    }
}

Why? There's no atomic counter. Adding one would require CAS on every insert and remove, creating a contention bottleneck — the exact thing lock-free structures are designed to avoid.

Rule: If you need to check size() frequently, ConcurrentLinkedQueue is the wrong choice. Use an AtomicInteger counter alongside it, or use a BlockingQueue.
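A minimal sketch of the counter-alongside approach (hypothetical `CountedQueue` wrapper — the count is an O(1) moving estimate, momentarily off while updates are in flight, but it never requires traversal):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CountedQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    boolean offer(E e) {
        boolean added = queue.offer(e); // always true for ConcurrentLinkedQueue
        if (added) count.incrementAndGet();
        return added;
    }

    E poll() {
        E item = queue.poll();
        if (item != null) count.decrementAndGet();
        return item;
    }

    int size() {
        return count.get(); // O(1) estimate, not a linear traversal
    }
}
```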


ConcurrentLinkedQueue vs BlockingQueue

| Feature | ConcurrentLinkedQueue | BlockingQueue |
|---|---|---|
| Blocking | ❌ Never blocks | ✅ put()/take() block |
| Lock-free | ✅ CAS-only | ❌ Uses ReentrantLock |
| Bounded | ❌ Unbounded | ✅ (some implementations) |
| size() | O(n), approximate | O(1), exact |
| Use case | Non-blocking producers/consumers | Blocking producer/consumer |
| Back-pressure | None | Natural (full queue blocks) |
| Best when | poll() in a spin loop or event loop | Threads should sleep when idle |

Common Pitfalls

Pitfall 1: Spinning on poll() Instead of Blocking

// ❌ TERRIBLE: Burns CPU when queue is empty!
while (true) {
    String item = concurrentQueue.poll();
    if (item != null) {
        process(item);
    }
    // When queue is empty, this loops at 100% CPU doing nothing!
}

// ✅ If you need blocking behavior, use BlockingQueue.take()
String item = blockingQueue.take();  // Thread sleeps when empty

// ✅ Or add a small sleep/yield
while (true) {
    String item = concurrentQueue.poll();
    if (item != null) {
        process(item);
    } else {
        Thread.onSpinWait(); // Java 9+ hint to CPU
    }
}

Pitfall 2: Relying on size()

// ❌ size() is O(n) AND may be inaccurate under concurrency
if (queue.size() == 0) {
    // Another thread might add an element between size() and your next line!
}

// ✅ Use isEmpty() — still a snapshot, but cheaper
if (queue.isEmpty()) { ... }

Pitfall 3: Unbounded Growth

// ❌ ConcurrentLinkedQueue is unbounded — can grow until OOM
ConcurrentLinkedQueue<Task> queue = new ConcurrentLinkedQueue<>();
// If producers outpace consumers, memory grows without limit
// No back-pressure mechanism!

// ✅ Use a bounded BlockingQueue for back-pressure
BlockingQueue<Task> queue = new ArrayBlockingQueue<>(10_000);

Summary

The one-liner: ConcurrentLinkedQueue/Deque are lock-free, CAS-based collections providing O(1) non-blocking insert/remove with a helping mechanism — choose them when blocking is unacceptable and you can handle unbounded growth, otherwise use BlockingQueue.