Deep Dive (java.util.concurrent) - ConcurrentLinkedQueue and ConcurrentLinkedDeque
"No locks at all — just atomic compare-and-swap, all the way down."
ConcurrentLinkedQueue and ConcurrentLinkedDeque are lock-free concurrent collections built entirely on CAS (Compare-And-Swap) operations. This deep dive explains the Michael-Scott non-blocking queue algorithm and the helping mechanism that makes it work.
Why Lock-Free?
Lock-based data structures have problems:
Deadlock risk — threads can deadlock on nested locks
Priority inversion — high-priority thread waits for low-priority lock holder
Convoying — if lock holder is preempted, ALL waiters stall
Lock-free structures use CAS (Compare-And-Swap) instead:
Lock-free guarantee: At least one thread is always making progress, regardless of what happens to other threads (preemption, suspension, etc.).
CAS Refresher
// Atomic compare-and-swap (a single CPU instruction on x86: CMPXCHG,
// with the LOCK prefix to make it atomic across cores)
boolean CAS(address, expectedValue, newValue) {
    if (*address == expectedValue) {
        *address = newValue;
        return true;   // Success!
    }
    return false;      // Failed — someone else changed it
}
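In Java, this primitive is exposed through the `java.util.concurrent.atomic` classes (and, in the JDK internals, `VarHandle`). A minimal sketch of the CAS retry loop that both collections are built from, here applied to a lock-free counter (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasDemo {
    // A lock-free increment: read the current value, compute the new one,
    // attempt the swap, and retry if another thread got there first.
    static long incrementAndGet(AtomicLong counter) {
        for (;;) {
            long current = counter.get();              // read
            long next = current + 1;                   // compute
            if (counter.compareAndSet(current, next))  // CAS
                return next;                           // success
            // CAS failed: another thread won the race, so retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) incrementAndGet(counter);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 40000: no updates lost, no locks taken
    }
}
```

No thread ever waits on another; a failed CAS just means someone else made progress, which is exactly the lock-free guarantee described above.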
ConcurrentLinkedQueue — The Michael-Scott Queue
ConcurrentLinkedQueue implements the famous Michael-Scott non-blocking queue (1996). It's a singly-linked list with head and tail pointers:
public class ConcurrentLinkedQueue<E> extends AbstractQueue<E>
        implements Queue<E> {
    private transient volatile Node<E> head;
    private transient volatile Node<E> tail;

    private static class Node<E> {
        volatile E item;
        volatile Node<E> next;
    }
}
Important: The queue is created with a sentinel node (a dummy node whose item is null) that both head and tail initially point to. This simplifies the empty-queue edge cases: the invariant is that all live nodes are reachable from head, not that head itself always holds an element.
How offer() Works — Lock-Free Insert
public boolean offer(E e) {
    final Node<E> newNode = new Node<E>(e);

    for (Node<E> t = tail, p = t;;) {       // Retry loop
        Node<E> q = p.next;
        if (q == null) {
            // p is the last node — try to CAS our node in as its next
            if (NEXT.compareAndSet(p, null, newNode)) {
                // CAS succeeded! Node is linked.
                if (p != t)                 // Tail fell behind — update it
                    TAIL.compareAndSet(this, t, newNode);
                return true;
            }
            // CAS failed — another thread won. Retry.
        }
        else if (p == q)
            // Self-link (removed node) — restart from head or tail
            p = (t != (t = tail)) ? t : head;
        else
            // Tail fell behind — advance toward the real tail
            p = (p != t && t != (t = tail)) ? t : q;
    }
}
The Two-Phase Insert
The insert is split into two steps:
1. Link the node: CAS the `next` pointer of the current last node to the new node.
2. Update tail: CAS the `tail` pointer to the new node.
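A small demonstration (names are illustrative) that concurrent `offer()` calls never lose an element, even though no lock is ever taken: every CAS race is resolved by one thread winning and the others retrying.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class OfferDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();

        // Four producers each offer 1_000 items concurrently.
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            final int id = i;
            producers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    queue.offer(id * 1_000 + j); // never blocks, never fails
                }
            });
            producers[i].start();
        }
        for (Thread t : producers) t.join();

        // size() is O(n), but the queue is quiescent here, so it is exact.
        System.out.println(queue.size()); // 4000: every offer was linked
    }
}
```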
How poll() Works — Lock-Free Remove
public E poll() {
    restartFromHead: for (;;) {
        for (Node<E> h = head, p = h, q;;) {
            final E item;
            if ((item = p.item) != null && p.compareAndSetItem(item, null)) {
                // CAS nulled out the item — we "own" this element
                if (p != h)
                    updateHead(h, ((q = p.next) != null) ? q : p);
                return item;
            }
            else if ((q = p.next) == null) {
                updateHead(h, p);
                return null;   // Queue is empty
            }
            else if (p == q)
                continue restartFromHead;
            else
                p = q;         // Advance to the next node
        }
    }
}
Key insight: Removal doesn't physically unlink the node immediately. Instead, the item is CAS'd to null, and the head pointer is lazily updated. Nodes with null items are "logically removed" but may still be in the chain briefly.
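Despite the subtle internals, the observable contract is simple: `poll()` returns elements in FIFO order and returns null, rather than blocking, when the queue is empty. A quick single-threaded sketch:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class PollDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a");
        queue.offer("b");
        queue.offer("c");
        System.out.println(queue.poll()); // a  (FIFO order)
        System.out.println(queue.poll()); // b
        System.out.println(queue.poll()); // c
        System.out.println(queue.poll()); // null: empty queue, no blocking
    }
}
```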
The Helping Mechanism
The most elegant design aspect: threads help each other. If Thread A links a node but gets preempted before updating tail, Thread B will notice the lagging tail and advance it itself before retrying its own operation.
This "helping" is what makes the algorithm lock-free — progress is guaranteed even if any thread is suspended at any point.
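The helping step is easiest to see in a stripped-down Michael-Scott queue built on `AtomicReference`. This is an educational sketch under simplifying assumptions (no tail-help in dequeue, no self-linking), not the JDK implementation; the `else` branch in `enqueue` is where one thread finishes another thread's half-done insert:

```java
import java.util.concurrent.atomic.AtomicReference;

public class HelpingDemo {
    // Educational Michael-Scott queue sketch — NOT the JDK code.
    static final class MsQueue<E> {
        static final class Node<E> {
            final E item;
            final AtomicReference<Node<E>> next = new AtomicReference<>();
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<E>> head, tail;

        MsQueue() {
            Node<E> sentinel = new Node<>(null); // dummy node
            head = new AtomicReference<>(sentinel);
            tail = new AtomicReference<>(sentinel);
        }

        void enqueue(E e) {
            Node<E> node = new Node<>(e);
            for (;;) {
                Node<E> last = tail.get();
                Node<E> next = last.next.get();
                if (next == null) {
                    // Phase 1: link the new node after the last node
                    if (last.next.compareAndSet(null, node)) {
                        // Phase 2: swing tail; failure is fine, someone helped
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    // Tail is lagging: HELP the stalled thread by advancing
                    // tail one node, then retry our own insert.
                    tail.compareAndSet(last, next);
                }
            }
        }

        E dequeue() {
            for (;;) {
                Node<E> first = head.get();
                Node<E> next = first.next.get();
                if (next == null) return null;              // empty
                if (head.compareAndSet(first, next))
                    return next.item;                       // next is the new sentinel
            }
        }
    }

    public static void main(String[] args) {
        MsQueue<Integer> q = new MsQueue<>();
        q.enqueue(1); q.enqueue(2); q.enqueue(3);
        System.out.println(q.dequeue()); // 1
        System.out.println(q.dequeue()); // 2
        System.out.println(q.dequeue()); // 3
        System.out.println(q.dequeue()); // null
    }
}
```

Because any thread that finds a half-finished insert completes it, a suspended thread can never block the others: that is the lock-free progress guarantee in action.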
ConcurrentLinkedDeque — Double-Ended
ConcurrentLinkedDeque extends the concept to a doubly-linked lock-free structure with operations at both ends:
public class ConcurrentLinkedDeque<E> extends AbstractCollection<E>
        implements Deque<E> {
    volatile Node<E> head;
    volatile Node<E> tail;

    static final class Node<E> {
        volatile Node<E> prev;   // Added for deque functionality
        volatile E item;
        volatile Node<E> next;
    }
}
The doubly-linked lock-free deque is significantly more complex than the singly-linked queue because it must maintain both next and prev pointers atomically. It uses a sophisticated multi-step CAS protocol.
Operations
| Method | Description | Complexity |
|---|---|---|
| `offerFirst(e)` | Add at head | O(1) |
| `offerLast(e)` | Add at tail | O(1) |
| `pollFirst()` | Remove from head | O(1) |
| `pollLast()` | Remove from tail | O(1) |
| `peekFirst()` | Look at head | O(1) |
| `peekLast()` | Look at tail | O(1) |
| `push(e)` / `pop()` | Stack operations (head) | O(1) |
| `size()` | Count elements | O(n) |
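The operations above in action; note that `push`/`pop` are just the head-end operations, so the same structure serves as a lock-free stack:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class DequeDemo {
    public static void main(String[] args) {
        ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();
        deque.offerLast("b");
        deque.offerFirst("a");
        deque.offerLast("c");                   // deque is now: a, b, c
        System.out.println(deque.pollFirst());  // a
        System.out.println(deque.pollLast());   // c   (deque: b)
        deque.push("x");                        // stack op: adds at head (deque: x, b)
        System.out.println(deque.pop());        // x
        System.out.println(deque.peekFirst());  // b   (peek does not remove)
    }
}
```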
Why size() is O(n)
Both ConcurrentLinkedQueue and ConcurrentLinkedDeque have O(n) size():
public int size() {
    restartFromHead: for (;;) {
        int count = 0;
        for (Node<E> p = first(); p != null; ) {
            if (p.item != null)
                if (++count == Integer.MAX_VALUE)
                    break;   // Overflow protection
            if (p == (p = p.next))
                continue restartFromHead;
        }
        return count;
    }
}
Why? There's no atomic counter. Adding one would require CAS on every insert and remove, creating a contention bottleneck — the exact thing lock-free structures are designed to avoid.
Rule: If you need to check `size()` frequently, ConcurrentLinkedQueue is the wrong choice. Use an `AtomicInteger` counter alongside it, or use a `BlockingQueue`.
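One way to follow that rule is a small wrapper (`CountedQueue` is a hypothetical name, not a JDK class) that maintains an O(1) approximate count beside the queue. The queue update and the counter update are separate operations, so under concurrency the count is a moving estimate, not a transactional snapshot:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CountedQueueDemo {
    // Hypothetical wrapper: an O(1) approximate size beside the queue.
    static class CountedQueue<E> {
        private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
        private final AtomicInteger count = new AtomicInteger();

        boolean offer(E e) {
            queue.offer(e);              // always succeeds: the queue is unbounded
            count.incrementAndGet();
            return true;
        }

        E poll() {
            E item = queue.poll();
            if (item != null) count.decrementAndGet();
            return item;
        }

        // O(1), but only approximate while other threads are mid-operation
        int size() { return count.get(); }
    }

    public static void main(String[] args) {
        CountedQueue<String> q = new CountedQueue<>();
        q.offer("a"); q.offer("b"); q.offer("c");
        q.poll();
        System.out.println(q.size()); // 2
    }
}
```

Note the counter does reintroduce a single contended cache line; if that contention matters, `java.util.concurrent.atomic.LongAdder` is the usual alternative for a frequently-updated count.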
ConcurrentLinkedQueue vs BlockingQueue
| Feature | ConcurrentLinkedQueue | BlockingQueue |
|---|---|---|
| Blocking | ❌ Never blocks | ✅ put()/take() block |
| Lock-free | ✅ CAS-only | ❌ Uses ReentrantLock |
| Bounded | ❌ Unbounded | ✅ (some implementations) |
| size() | O(n) approximate | O(1) exact |
| Use case | Non-blocking producers/consumers | Blocking producer/consumer |
| Back-pressure | None | Natural (full queue blocks) |
| Best when | poll() in a spin loop or event loop | Threads should sleep when idle |
Common Pitfalls
Pitfall 1: Spinning on poll() Instead of Blocking
// ❌ TERRIBLE: Burns CPU when queue is empty!
while (true) {
    String item = concurrentQueue.poll();
    if (item != null) {
        process(item);
    }
    // When the queue is empty, this loops at 100% CPU doing nothing!
}

// ✅ If you need blocking behavior, use BlockingQueue.take()
String item = blockingQueue.take();   // Thread sleeps when empty

// ✅ Or at least tell the CPU you are busy-waiting
while (true) {
    String item = concurrentQueue.poll();
    if (item != null) {
        process(item);
    } else {
        Thread.onSpinWait();   // Java 9+ spin-loop hint to the CPU
    }
}
Pitfall 2: Relying on size()
// ❌ size() is O(n) AND may be stale under concurrency
if (queue.size() == 0) {
    // Another thread might add an element between size() and your next line!
}

// ✅ Use isEmpty() — still a snapshot, but cheaper
if (queue.isEmpty()) { ... }
Pitfall 3: Unbounded Growth
// ❌ ConcurrentLinkedQueue is unbounded — it can grow until OOM
ConcurrentLinkedQueue<Task> queue = new ConcurrentLinkedQueue<>();
// If producers outpace consumers, memory grows without limit.
// No back-pressure mechanism!

// ✅ Use a bounded BlockingQueue for back-pressure
BlockingQueue<Task> queue = new ArrayBlockingQueue<>(10_000);
Summary
The one-liner: ConcurrentLinkedQueue/Deque are lock-free, CAS-based collections providing O(1) non-blocking insert/remove with a helping mechanism — choose them when blocking is unacceptable and you can handle unbounded growth, otherwise use BlockingQueue.