Deep Dive (java.util) - HashMap
Hashing, Buckets, and Tree Bins
"HashMap is the Swiss Army knife of Java — and like any tool, you need to understand the blade to avoid cutting yourself."
HashMap is the most-used Map implementation in Java. This deep dive tears apart its internals: hash functions, bucket arrays, linked-list chains, treeification, resizing, and the subtle contracts that make it all work.
What is HashMap, Really?
A HashMap is an array of buckets, where each bucket holds entries that hash to the same index. It provides average O(1) put/get/remove by using hash codes to jump directly to the right bucket.
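A minimal sketch of that behavior from the outside (plain java.util, nothing internal assumed):

```java
import java.util.HashMap;

public class BucketDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> scores = new HashMap<>();
        scores.put("Alice", 100); // hash("Alice") selects the bucket directly
        scores.put("Bob", 95);

        // get() jumps straight to Alice's bucket -- no scan over other entries
        System.out.println(scores.get("Alice")); // 100
        System.out.println(scores.remove("Bob")); // 95
    }
}
```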
The Bucket Array
The heart of HashMap is a Node<K,V>[] array called table:
transient Node<K,V>[] table; // The bucket array
static class Node<K,V> implements Map.Entry<K,V> {
final int hash; // Cached hash of the key
final K key; // The key
V value; // The value
Node<K,V> next; // Next node in the chain (for collisions)
}
When two keys land in the same slot — say both "Alice" and "Bob" hash to bucket 2 — that bucket has a collision, and the colliding entries are linked together through the next field.
Hash Function — Spreading the Bits
Java doesn't use key.hashCode() directly. It applies a secondary hash function to spread the bits:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Why XOR the upper 16 bits with the lower 16?
The bucket index is computed as: index = hash & (table.length - 1)
If table.length is 16 (binary: 10000), the mask is 1111 — only the lowest 4 bits determine the bucket. Without the XOR spread, keys whose hashCodes differ only in the upper bits would all land in the same bucket.
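The effect is easy to see with two hash values that differ only in their upper 16 bits (a sketch — the helper name spread is mine, but its body is the same mixing step as HashMap.hash() minus the null check):

```java
public class SpreadDemo {
    // Same mixing step as HashMap.hash(), minus the null check
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1;  // table.length = 16
        int h1 = 0x10000;   // h1 and h2 differ only in bits above the mask
        int h2 = 0x20000;

        // Without spreading: the low 4 bits are identical -> same bucket
        System.out.println((h1 & mask) + " vs " + (h2 & mask)); // 0 vs 0

        // With spreading: the upper bits now influence the index
        System.out.println((spread(h1) & mask) + " vs " + (spread(h2) & mask)); // 1 vs 2
    }
}
```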
put() — Step by Step
Here's the complete flow of map.put("Alice", 100):
Simplified Source Code
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// 1. Initialize table if needed
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// 2. Find the bucket
if ((p = tab[i = (n - 1) & hash]) == null)
// 3a. Empty bucket — create new node
tab[i] = newNode(hash, key, value, null);
else {
// 3b. Collision — find or create entry
Node<K,V> e; K k;
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
e = p; // Key matches first node
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
// Walk the chain
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // threshold = 8
treeifyBin(tab, hash);
break;
}
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
// Update existing value
if (e != null) {
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
return null;
}
get() — Finding a Value
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
// Always check first node
if (first.hash == hash &&
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
// Walk the chain
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
Time complexity:
Best case (no collision): O(1) — single array access + one equals check
With chain: O(k) where k is the chain length
With tree (Java 8+): O(log k) when the bucket is treeified
Worst case (all keys in one bucket): O(n) or O(log n) with treeification
Collision Handling — Chains and Trees
When two keys hash to the same bucket, they form a chain (linked list within the bucket):
Why collisions happen:
Same hashCode — two objects return the same hashCode() value.
Different hashCode, same bucket — after hash & (n - 1), different hash values map to the same index (since only the low log₂(n) bits are used).
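Both cases can be observed directly. "Aa" and "BB" are the classic example of distinct strings with identical hash codes, and any two ints that differ by exactly the table length collide after masking:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // Case 1: identical hashCodes -- these collide at every table size
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Case 2: different hashes, same bucket after masking
        int n = 16;
        int a = 5, b = 5 + 16; // differ only above the low log2(16) = 4 bits
        System.out.println((a & (n - 1)) == (b & (n - 1))); // true
    }
}
```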
Treeification — When Chains Become Red-Black Trees
This is a Java 8+ optimization. When a chain exceeds 8 nodes (and the table size is at least 64), the chain is converted to a Red-Black Tree for O(log n) lookups within that bucket.
The Constants
static final int TREEIFY_THRESHOLD = 8; // Chain → Tree when chain ≥ 8
static final int UNTREEIFY_THRESHOLD = 6; // Tree → Chain when tree ≤ 6 (during resize)
static final int MIN_TREEIFY_CAPACITY = 64; // Don't treeify if table < 64
Why 8? Under random hashing, the probability of 8+ collisions in a single bucket follows a Poisson distribution with λ = 0.5 (at default load factor 0.75). The probability is about 1 in 10 million — so treeification is a safety net against bad hash functions, not normal operation.
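That figure can be reproduced by summing the Poisson terms P(k) = e^(−λ) λ^k / k! for k < 8 and taking the remainder (a quick sketch, using λ = 0.5 as above):

```java
public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5;
        double term = Math.exp(-lambda); // P(0) = e^-lambda
        double below8 = 0;
        for (int k = 0; k < 8; k++) {
            below8 += term;
            term *= lambda / (k + 1);    // derive P(k+1) from P(k)
        }
        // Probability of 8 or more entries landing in one bucket
        System.out.printf("P(X >= 8) = %.1e%n", 1 - below8); // on the order of 1e-8
    }
}
```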
Treeification Requirements
To place a key in a Red-Black Tree node, HashMap needs an ordering among keys whose hashes collide. It tries, in order:
1. The Comparable interface (if the key implements it)
2. Class.getName().compareTo() (comparison by class name)
3. System.identityHashCode() as a tiebreaker
Resizing — Growing the Table
When size > threshold (where threshold = capacity × loadFactor), the table doubles in size and all entries are rehashed.
The Clever Bit: No Rehashing Needed!
Since the table size is always a power of 2, and the new size is exactly double, each entry goes to either:
Same index (if the new bit is 0)
Old index + old capacity (if the new bit is 1)
// The actual check in resize():
if ((e.hash & oldCap) == 0)
// Stays at index i
else
// Moves to index i + oldCap
Because each Node caches its hash, resize never calls hashCode() again — it only checks one bit to decide which of the two candidate buckets the entry belongs to.
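A sketch of that split decision for one bucket, using the bit test above:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 5;            // bit 16 (oldCap) is 0
        int h2 = 5 + oldCap;   // bit 16 (oldCap) is 1

        // Both share bucket 5 in the old table
        System.out.println((h1 & (oldCap - 1)) + " " + (h2 & (oldCap - 1))); // 5 5

        // One bit decides each entry's new home
        System.out.println((h1 & oldCap) == 0 ? "h1 stays at 5" : "h1 moves");
        System.out.println((h2 & oldCap) == 0 ? "h2 stays" : "h2 moves to " + (5 + oldCap));

        // Cross-check against the full index formula at doubled capacity
        System.out.println(h2 & (2 * oldCap - 1)); // 21
    }
}
```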
Load Factor — Tuning the Trade-off
The load factor controls the space/time trade-off:
// Default: 75% full before resize
HashMap<String, Integer> map1 = new HashMap<>(); // loadFactor = 0.75
// Custom: resize sooner (more memory, fewer collisions)
HashMap<String, Integer> map2 = new HashMap<>(16, 0.5f);
// Custom: resize later (less memory, more collisions)
HashMap<String, Integer> map3 = new HashMap<>(16, 1.0f);
Formula: threshold = capacity × loadFactor
| Capacity | Load Factor | Threshold (resize at) |
|---|---|---|
| 16 | 0.75 | 12 |
| 32 | 0.75 | 24 |
| 64 | 0.75 | 48 |
| 16 | 0.50 | 8 |
| 16 | 1.00 | 16 |
Rule of thumb: Leave the load factor at 0.75 unless you have profiling data showing it matters. If you know the approximate size, set the initial capacity with new HashMap<>(expectedSize * 4 / 3 + 1) — that is, expectedSize / 0.75, rounded up — to avoid any resize.
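A small sketch of that rule (the helper name capacityFor is mine; on Java 19+, the stock factory HashMap.newHashMap(int) performs the equivalent computation for you):

```java
import java.util.HashMap;

public class Presize {
    // Hypothetical helper mirroring the rule of thumb above
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(10_000)); // 13334

        // HashMap rounds 13334 up to the next power of two (16384),
        // giving threshold 12288 -- so 10,000 puts never trigger a resize
        HashMap<String, Integer> map = new HashMap<>(capacityFor(10_000));
        for (int i = 0; i < 10_000; i++) map.put("key-" + i, i);
        System.out.println(map.size()); // 10000
    }
}
```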
The hashCode() and equals() Contract
This is the most critical contract in all of Java Collections: if a.equals(b) is true, then a.hashCode() == b.hashCode() must also be true. (Unequal objects may share a hash code; equal objects must.)
What Happens If You Violate the Contract
// ❌ BROKEN: equals() without matching hashCode()
class BadKey {
String name;
@Override
public boolean equals(Object o) {
return o instanceof BadKey && ((BadKey) o).name.equals(this.name);
}
// hashCode() NOT overridden — uses Object.hashCode() (identity)
}
BadKey k1 = new BadKey();
k1.name = "test";
BadKey k2 = new BadKey();
k2.name = "test";
map.put(k1, "value");
map.get(k2); // Returns NULL!
// k1.equals(k2) is true, but k1.hashCode() != k2.hashCode()
// So they hash to DIFFERENT buckets — get() never finds it
Correct Implementation
class GoodKey {
String name;
int id;
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof GoodKey other)) return false;
return id == other.id && Objects.equals(name, other.name);
}
@Override
public int hashCode() {
return Objects.hash(name, id); // Must use same fields as equals()
}
}
Null Keys and Values
HashMap allows one null key and any number of null values:
HashMap<String, String> map = new HashMap<>();
map.put(null, "null-key-value"); // ✅ Stored in bucket 0
map.put("key1", null); // ✅ Null value is fine
map.put("key2", null); // ✅ Multiple null values OK
map.get(null); // "null-key-value"
map.get("key1"); // null — but is the key absent or is the value null?
// Use containsKey to distinguish:
map.containsKey("key1"); // true — key exists, value is null
map.containsKey("key3"); // false — key doesn't exist
Note: ConcurrentHashMap does not allow null keys or values. If you're migrating from HashMap to ConcurrentHashMap, this is a common breaking change.
Iteration Order — No Guarantees
HashMap makes no guarantees about iteration order:
HashMap<String, Integer> map = new HashMap<>();
map.put("Banana", 2);
map.put("Apple", 1);
map.put("Cherry", 3);
for (String key : map.keySet()) {
System.out.println(key);
}
// Could print in ANY order: Apple, Cherry, Banana
// Order may CHANGE after resizing!
If you need ordering, use:
LinkedHashMap — insertion order (or access order)
TreeMap — sorted by key
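For contrast, a quick sketch feeding both ordered implementations the same keys:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        List<String> keys = List.of("Banana", "Apple", "Cherry");

        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> sorted = new TreeMap<>();
        for (String k : keys) {
            linked.put(k, 0);
            sorted.put(k, 0);
        }

        System.out.println(linked.keySet()); // [Banana, Apple, Cherry] -- insertion order
        System.out.println(sorted.keySet()); // [Apple, Banana, Cherry] -- sorted by key
    }
}
```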
Common Pitfalls
Pitfall 1: Mutable Keys
// ❌ DANGEROUS: Modifying a key after insertion
List<String> key = new ArrayList<>();
key.add("hello");
map.put(key, "value");
key.add("world"); // hashCode changes!
map.get(key); // NULL — hash points to wrong bucket now
// The entry still exists in the old bucket, but it's unreachable!
// This is a silent memory leak.
Rule: HashMap keys should be immutable. Use String, Integer, enums, or your own immutable classes.
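On Java 16+, records are a convenient way to get such keys: shallowly immutable, with equals() and hashCode() generated from the components (the Point type below is illustrative):

```java
import java.util.HashMap;

public class RecordKeyDemo {
    // Immutable by construction; equals()/hashCode() derived from x and y
    record Point(int x, int y) {}

    public static void main(String[] args) {
        HashMap<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "treasure");

        // A structurally equal key finds the entry, even as a fresh object,
        // and no later mutation can ever strand it in the wrong bucket
        System.out.println(map.get(new Point(1, 2))); // treasure
    }
}
```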
Pitfall 2: Not Sizing Properly
// ❌ BAD: Will resize multiple times
HashMap<String, Integer> map = new HashMap<>(); // capacity=16, threshold=12
for (int i = 0; i < 10_000; i++) {
map.put("key-" + i, i); // Resizes at 12, 24, 48, 96, 192, 384, ...
}
// That's ~10 resizes (16 → 16,384), each copying ALL entries!
// ✅ GOOD: Pre-size to avoid resizing
HashMap<String, Integer> map = new HashMap<>(13_334); // (10000 / 0.75) + 1; rounds up to 16,384 internally
Pitfall 3: Using HashMap with Multiple Threads
// ❌ NEVER DO THIS — HashMap is NOT thread-safe
// Two threads calling put() simultaneously can corrupt the internal structure:
// - Infinite loops (pre-Java 8)
// - Lost updates
// - Corrupted chains
// ✅ Use ConcurrentHashMap instead
ConcurrentHashMap<String, Integer> safeMap = new ConcurrentHashMap<>();
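A sketch of the difference in practice — ConcurrentHashMap.merge() is documented as atomic, so concurrent increments are never lost:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SafeCountDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        // Each thread increments the same key 1000 times
        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // With a plain HashMap this could lose updates and print less;
        // with ConcurrentHashMap it is always exactly 2000
        System.out.println(counts.get("hits")); // 2000
    }
}
```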
Summary
The one-liner: HashMap is an array of buckets with separate chaining, auto-resizing at 75% capacity, treeified chains for collision safety, and O(1) average performance — but it demands immutable keys with correct hashCode()/equals().