Lock-Free Programming - Java (Part 3)
Compare-And-Swap (CAS) — The Heart of Lock-Free
If lock-free programming were a house, Compare-And-Swap (CAS) would be the foundation. Every lock-free data structure, every atomic variable, and every non-blocking algorithm in Java ultimately boils down to this one magical hardware instruction.
In this article, we'll take CAS apart, see how it works at the hardware level, build CAS loops from scratch, and understand its guarantees and limitations.
The Big Idea
CAS is a single CPU instruction that does three things atomically (all-or-nothing):
Read the current value in a memory location
Compare it to an expected value
If they match, swap in the new value. If not, do nothing.
The critical property: this entire operation is a single atomic step. No other thread can sneak in between the compare and the swap. The hardware guarantees it.
CAS at the Hardware Level
On x86 processors, CAS is implemented by the CMPXCHG (Compare and Exchange) instruction with a LOCK prefix:
LOCK CMPXCHG [memory_address], new_value
The LOCK prefix tells the CPU to lock the cache line containing that memory address for the duration of the instruction. This means:
This is a hardware-level lock on a single cache line (typically 64 bytes), but it's much, much faster than a software lock because:
It doesn't involve the operating system
It doesn't cause thread context switches
It locks only one cache line, not an entire data structure
It takes ~10-100 nanoseconds (vs microseconds for OS locks)
CAS in Java: The Evolution
Java has provided CAS through several APIs over its history:
Era 1: sun.misc.Unsafe (Java 5-8)
import sun.misc.Unsafe;
// ⚠️ Don't use this directly — it's internal API
public class UnsafeCAS {
private static final Unsafe unsafe = getUnsafe();
private static final long VALUE_OFFSET;
private volatile int value = 0;
static {
try {
VALUE_OFFSET = unsafe.objectFieldOffset(
UnsafeCAS.class.getDeclaredField("value")
);
} catch (NoSuchFieldException e) {
throw new RuntimeException(e);
}
}
public boolean compareAndSet(int expected, int newValue) {
return unsafe.compareAndSwapInt(this, VALUE_OFFSET, expected, newValue);
}
}
Era 2: java.util.concurrent.atomic (Java 5+, recommended)
import java.util.concurrent.atomic.AtomicInteger;
public class AtomicCAS {
private final AtomicInteger value = new AtomicInteger(0);
public boolean compareAndSet(int expected, int newValue) {
return value.compareAndSet(expected, newValue);
}
}
Era 3: VarHandle (Java 9+, most modern)
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
public class VarHandleCAS {
private volatile int value = 0;
private static final VarHandle VALUE_HANDLE;
static {
try {
VALUE_HANDLE = MethodHandles.lookup()
.findVarHandle(VarHandleCAS.class, "value", int.class);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public boolean compareAndSet(int expected, int newValue) {
return VALUE_HANDLE.compareAndSet(this, expected, newValue);
}
}
For this series, we'll use AtomicInteger and AtomicReference — they're the clearest and most widely used.
The CAS Loop: Where the Magic Happens
A single CAS operation isn't very useful by itself. The real power comes from the CAS loop (also called a retry loop or spin loop):
public void increment(AtomicInteger counter) {
while (true) {
int current = counter.get(); // Step 1: Read
int next = current + 1; // Step 2: Compute
if (counter.compareAndSet(current, next)) { // Step 3: CAS
return; // Success!
}
// Failed — someone else changed it. Loop and try again.
}
}
Let's trace through this with three threads:
Every increment was counted! No locks, no lost updates, no blocking. Threads that "lost" the CAS race simply try again with the updated value.
Anatomy of a CAS Loop
Every CAS loop follows the same structure:
Here are some common examples:
CAS Loop: Increment
public int incrementAndGet(AtomicInteger counter) {
int current, next;
do {
current = counter.get();
next = current + 1;
} while (!counter.compareAndSet(current, next));
return next;
}
CAS Loop: Maximum
public void updateMax(AtomicInteger max, int candidate) {
int current;
do {
current = max.get();
if (candidate <= current) {
return; // Not a new max — no update needed
}
} while (!max.compareAndSet(current, candidate));
}
CAS Loop: Accumulate
public int addAndGet(AtomicInteger counter, int delta) {
int current, next;
do {
current = counter.get();
next = current + delta;
} while (!counter.compareAndSet(current, next));
return next;
}
Fun fact: Java 8 added
AtomicInteger.getAndUpdate()andupdateAndGet()which do exactly this CAS loop for you:counter.updateAndGet(current -> current + 1); counter.getAndUpdate(current -> Math.max(current, candidate));
CAS on References: Swapping Whole Objects
CAS isn't limited to integers. AtomicReference<T> lets you CAS entire object references:
import java.util.concurrent.atomic.AtomicReference;
public class CASReference {
static class Config {
final String dbUrl;
final int maxConnections;
Config(String dbUrl, int maxConnections) {
this.dbUrl = dbUrl;
this.maxConnections = maxConnections;
}
}
private final AtomicReference<Config> config =
new AtomicReference<>(new Config("localhost:5432", 10));
public void updateMaxConnections(int newMax) {
Config current, next;
do {
current = config.get();
next = new Config(current.dbUrl, newMax);
} while (!config.compareAndSet(current, next));
}
}
This is incredibly powerful. Instead of locking a config object and mutating its fields, we:
Read the current (immutable) config
Create a brand new config with the change
CAS-swap the reference
If another thread changed the config between our read and our CAS, we simply retry with the new config. This pattern — immutable objects + CAS on the reference — is the bread and butter of lock-free programming.
Is a CAS Loop Really Lock-Free?
Yes! Here's the formal argument:
Lock-free guarantee: In any period of time where threads are running, at least one thread is making progress.
In a CAS loop:
If a CAS succeeds, that thread made progress ✅
If a CAS fails, it means another thread's CAS must have succeeded — that thread made progress ✅
So at any given time, the system as a whole is always making forward progress. No thread can block the entire system.
Compare this with a lock-based approach: if the thread holding the lock gets suspended by the OS, all other threads waiting for that lock are stuck. Nobody makes progress. That's why lock-based code is blocking, not lock-free.
Performance Characteristics
CAS has interesting performance properties:
Low Contention (Few Threads)
CAS loops are extremely fast — often faster than locks because there's no OS involvement. The CAS usually succeeds on the first try.
High Contention (Many Threads)
Under heavy contention, CAS loops can cause spinning — threads burning CPU cycles retrying failed CAS operations. This is called contention.
Cache Line Bouncing
When multiple cores CAS the same variable, the cache line containing that variable bounces between cores:
This "ping-pong" is expensive. For counters with extremely high contention, Java provides LongAdder , which distributes the value across multiple cells to reduce this effect.
Common CAS Pitfalls
Pitfall 1: CAS Without a Loop
// ❌ WRONG — CAS can fail!
public void brokenIncrement(AtomicInteger counter) {
int current = counter.get();
counter.compareAndSet(current, current + 1);
// If CAS fails, we just... skip the increment? 😱
}
Always wrap CAS in a loop (unless you genuinely don't care about failures).
Pitfall 2: Side Effects Before CAS
// ❌ WRONG — side effect happens even if CAS fails!
public void brokenTransfer(AtomicInteger from, AtomicInteger to) {
int current = from.get();
to.addAndGet(1); // Side effect! Can't undo if CAS below fails
from.compareAndSet(current, current - 1);
}
Never perform irreversible side effects before a CAS. The CAS might fail, and you'll have to retry — but you can't undo the side effect.
Pitfall 3: Non-Deterministic Compute
// ⚠️ CAREFUL — result changes on each retry
public void randomUpdate(AtomicInteger value) {
int current, next;
do {
current = value.get();
next = current + ThreadLocalRandom.current().nextInt(10);
} while (!value.compareAndSet(current, next));
// Each retry produces a different random number!
}
This might be intentional, but be aware that the "compute" step runs multiple times if CAS fails.
CAS vs Locks: A Summary
| Aspect | CAS (Lock-Free) | Locks |
|---|---|---|
| Blocking | Never blocks | Can block indefinitely |
| Deadlock | Impossible | Possible |
| Performance (low contention) | Excellent | Good |
| Performance (high contention) | Can spin (CPU waste) | Context switch overhead |
| Scope | Single variable | Can protect multiple variables |
| Complexity | Tricky (ABA, ordering) | Straightforward |
| Composability | Hard | Easier (nested locks, with care) |
| OS involvement | None | Yes (context switches) |
Building Blocks for What's Next
Now that you understand CAS, you have the key ingredient for every lock-free data structure we'll build:
Key Takeaways
CAS is a hardware instruction that atomically compares and swaps a value
CAS loops (read → compute → CAS → retry) are the fundamental pattern of lock-free programming
CAS is lock-free: if one CAS fails, it means another thread's CAS succeeded — the system always progresses
Use
AtomicInteger/AtomicReferencefor CAS in Java — avoidsun.misc.UnsafeCAS is fast under low contention, but under heavy contention, cache line bouncing can hurt performance
Never perform side effects before a CAS — the operation might need to retry.