
Deep Dive (java.util) - HashMap

Hashing, Buckets, and Tree Bins


"HashMap is the Swiss Army knife of Java — and like any tool, you need to understand the blade to avoid cutting yourself."

HashMap is the most-used Map implementation in Java. This deep dive tears apart its internals: hash functions, bucket arrays, linked-list chains, treeification, resizing, and the subtle contracts that make it all work.


What is HashMap, Really?

A HashMap is an array of buckets, where each bucket holds entries that hash to the same index. It provides average O(1) put/get/remove by using hash codes to jump directly to the right bucket.
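That average-case behavior is easy to see in a minimal, self-contained sketch:

```java
import java.util.HashMap;

public class BasicUsage {
    public static void main(String[] args) {
        HashMap<String, Integer> scores = new HashMap<>();
        scores.put("Alice", 100);  // hash("Alice") picks the bucket directly
        scores.put("Bob", 85);

        System.out.println(scores.get("Alice"));       // 100 (one array access on average)
        scores.remove("Bob");
        System.out.println(scores.containsKey("Bob")); // false
    }
}
```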


The Bucket Array

The heart of HashMap is a Node<K,V>[] array called table:

transient Node<K,V>[] table;  // The bucket array

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;    // Cached hash of the key
    final K key;       // The key
    V value;           // The value
    Node<K,V> next;    // Next node in the chain (for collisions)
}

When a bucket suffers a collision, the entries that hashed to the same index (say "Alice" and "Bob") are linked together through their next pointers, forming a chain within that bucket.


Hash Function — Spreading the Bits

Java doesn't use key.hashCode() directly. It applies a secondary hash function to spread the bits:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Why XOR the upper 16 bits with the lower 16?

The bucket index is computed as: index = hash & (table.length - 1)

If table.length is 16 (binary: 10000), the mask is 1111 — only the lowest 4 bits determine the bucket. Without the XOR spread, keys whose hashCodes differ only in the upper bits would all land in the same bucket.
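A small demo makes the effect of the spread visible. The hash values 0x10000 and 0x20000 are hand-picked for illustration (they differ only in the upper bits), not real hashCodes:

```java
public class SpreadDemo {
    // The same secondary hash HashMap applies (null check omitted for raw ints)
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16; // table length (power of two), mask = n - 1 = 0b1111
        int h1 = 0x10000, h2 = 0x20000; // differ only in the upper bits

        // Without spreading, both collapse to bucket 0:
        System.out.println((h1 & (n - 1)) + " " + (h2 & (n - 1)));             // 0 0

        // With spreading, the upper bits influence the index:
        System.out.println((spread(h1) & (n - 1)) + " " + (spread(h2) & (n - 1))); // 1 2
    }
}
```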


put() — Step by Step

Here's the complete flow of map.put("Alice", 100):

Simplified Source Code

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // 1. Initialize table if needed
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    
    // 2. Find the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        // 3a. Empty bucket — create new node
        tab[i] = newNode(hash, key, value, null);
    else {
        // 3b. Collision — find or create entry
        Node<K,V> e; K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // Key matches first node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // threshold = 8
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Update existing value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    
    ++modCount;
    if (++size > threshold)
        resize();
    return null;
}
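One detail worth checking against the flow above: putVal returns the previous value when a key was already mapped, and the onlyIfAbsent flag is what putIfAbsent passes as true. A minimal check:

```java
import java.util.HashMap;

public class PutSemantics {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("Alice", 100));         // null (no previous mapping)
        System.out.println(map.put("Alice", 200));         // 100 (old value returned)
        System.out.println(map.putIfAbsent("Alice", 999)); // 200 (onlyIfAbsent path, no overwrite)
        System.out.println(map.get("Alice"));              // 200
    }
}
```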

get() — Finding a Value

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        
        // Always check first node
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Walk the chain
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

Time complexity:

  • Best case (no collision): O(1) — single array access + one equals check

  • With chain: O(k) where k is the chain length

  • With tree (Java 8+): O(log k) when the bucket is treeified

  • Worst case (all keys in one bucket): O(n) or O(log n) with treeification


Collision Handling — Chains and Trees

When two keys hash to the same bucket, they form a chain (linked list within the bucket):

Why collisions happen:

  1. Same hashCode — two objects return the same hashCode() value

  2. Different hashCode, same bucket — after hash & (n-1), different hashes map to the same index (since we're masking to log₂(n) bits)
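Both cases are easy to reproduce. "Aa" and "BB" are a classic pair of strings with identical hashCodes; the second pair of numbers is an assumed illustration of masking:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // Case 1: identical hashCodes, so a guaranteed collision in any table
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Case 2: different hashes, same bucket after masking to 4 bits
        int n = 16;
        int h1 = 5, h2 = 21; // 21 = 5 + 16, so they differ only above the mask
        System.out.println((h1 & (n - 1)) + " " + (h2 & (n - 1))); // 5 5
    }
}
```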


Treeification — When Chains Become Red-Black Trees

This is a Java 8+ optimization. When a chain exceeds 8 nodes (and the table size is at least 64), the chain is converted to a Red-Black Tree for O(log n) lookups within that bucket.

The Constants

static final int TREEIFY_THRESHOLD = 8;    // Chain → Tree when chain ≥ 8
static final int UNTREEIFY_THRESHOLD = 6;  // Tree → Chain when tree ≤ 6 (during resize)
static final int MIN_TREEIFY_CAPACITY = 64; // Don't treeify if table < 64

Why 8? Under random hashing, the number of entries in a bucket follows a Poisson distribution with λ ≈ 0.5 (at the default load factor of 0.75). The probability of 8 entries landing in one bucket is about 0.00000006, roughly 6 in 100 million — so treeification is a safety net against bad hash functions, not normal operation.
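You can reproduce that probability directly from the Poisson formula P(k) = e^(−λ) λ^k / k!; this sketch just plugs in λ = 0.5 and k = 8:

```java
public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5; // expected entries per bucket at default load factor
        double fact = 1;
        for (int i = 1; i <= 8; i++) fact *= i; // 8! = 40320
        double p8 = Math.exp(-lambda) * Math.pow(lambda, 8) / fact;
        System.out.printf("P(exactly 8 in one bucket) = %.1e%n", p8); // 5.9e-08
    }
}
```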

Treeification Requirements

For a key to be stored in a Red-Black Tree node, it must be comparable. HashMap tries, in order:

  1. Comparable interface (if the key implements it)

  2. Class.getName().compareTo() (comparison by class name)

  3. System.identityHashCode() as a tiebreaker


Resizing — Growing the Table

When size > threshold (where threshold = capacity × loadFactor), the table doubles in size and all entries are rehashed.

The Clever Bit: No Rehashing Needed!

Since the table size is always a power of 2, and the new size is exactly double, each entry goes to either:

  • Same index (if the new bit is 0)

  • Old index + old capacity (if the new bit is 1)

// The actual check in resize():
if ((e.hash & oldCap) == 0)
    // Stays at index i
else
    // Moves to index i + oldCap

This avoids recomputing hash(key) for every entry — just check one bit!
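A quick sketch verifying the bit trick against a full recompute (the hash values 5, 21, and 37 are arbitrary examples):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37}; // 21 and 37 share index 5 with 5 in the old table

        for (int h : hashes) {
            int oldIdx = h & (oldCap - 1);
            // The "new bit" is exactly (h & oldCap): 0 means stay, nonzero means move up
            int newIdx = (h & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
            System.out.println(h + ": " + oldIdx + " -> " + newIdx);
            // Sanity check: identical to recomputing against the new mask
            assert newIdx == (h & (newCap - 1));
        }
    }
}
```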


Load Factor — Tuning the Trade-off

The load factor controls the space/time trade-off:

// Default: 75% full before resize
HashMap<String, Integer> map1 = new HashMap<>(); // loadFactor = 0.75

// Custom: resize sooner (more memory, fewer collisions)
HashMap<String, Integer> map2 = new HashMap<>(16, 0.5f);

// Custom: resize later (less memory, more collisions)
HashMap<String, Integer> map3 = new HashMap<>(16, 1.0f);

Formula: threshold = capacity × loadFactor

Capacity   Load Factor   Threshold (resize at)
16         0.75          12
32         0.75          24
64         0.75          48
16         0.50          8
16         1.00          16

Rule of thumb: Leave load factor at 0.75 unless you have profiling data showing it matters. If you know the approximate size, set initial capacity: new HashMap<>(expectedSize * 4 / 3 + 1) to avoid any resize.
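Sketching that rule of thumb for 10,000 expected entries (and if you are on Java 19+, HashMap.newHashMap does the same computation for you):

```java
import java.util.HashMap;

public class PreSizing {
    public static void main(String[] args) {
        int expected = 10_000;
        // Capacity big enough that capacity * 0.75 >= expected, so no resize:
        int initialCapacity = expected * 4 / 3 + 1;
        System.out.println(initialCapacity); // 13334
        HashMap<String, Integer> map = new HashMap<>(initialCapacity);

        // Java 19+ equivalent:
        // HashMap<String, Integer> map2 = HashMap.newHashMap(expected);
    }
}
```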


The hashCode() and equals() Contract

This is the most critical contract in all of Java Collections:

What Happens If You Violate the Contract

// ❌ BROKEN: equals() without matching hashCode()
class BadKey {
    String name;
    
    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(this.name);
    }
    
    // hashCode() NOT overridden — uses Object.hashCode() (identity)
}

BadKey k1 = new BadKey();
k1.name = "test";
BadKey k2 = new BadKey();
k2.name = "test";

map.put(k1, "value");
map.get(k2);  // Returns NULL!
// k1.equals(k2) is true, but k1.hashCode() != k2.hashCode()
// So they hash to DIFFERENT buckets — get() never finds it

Correct Implementation

class GoodKey {
    String name;
    int id;
    
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GoodKey other)) return false;
        return id == other.id && Objects.equals(name, other.name);
    }
    
    @Override
    public int hashCode() {
        return Objects.hash(name, id); // Must use same fields as equals()
    }
}

Null Keys and Values

HashMap allows one null key and any number of null values:

HashMap<String, String> map = new HashMap<>();
map.put(null, "null-key-value");  // ✅ Stored in bucket 0
map.put("key1", null);            // ✅ Null value is fine
map.put("key2", null);            // ✅ Multiple null values OK

map.get(null);   // "null-key-value"
map.get("key1"); // null — but is the key absent or is the value null?

// Use containsKey to distinguish:
map.containsKey("key1"); // true — key exists, value is null
map.containsKey("key3"); // false — key doesn't exist

Note: ConcurrentHashMap does not allow null keys or values. If you're migrating from HashMap to ConcurrentHashMap, this is a common breaking change.
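A related subtlety worth knowing: getOrDefault honors a present-but-null mapping, so the fallback only kicks in for truly absent keys. A minimal sketch:

```java
import java.util.HashMap;

public class NullHandling {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("key1", null);

        // Key present with null value: the mapping wins, not the default
        System.out.println(map.getOrDefault("key1", "fallback")); // null
        // Key absent: the default is returned
        System.out.println(map.getOrDefault("key3", "fallback")); // fallback
    }
}
```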


Iteration Order — No Guarantees

HashMap makes no guarantees about iteration order:

HashMap<String, Integer> map = new HashMap<>();
map.put("Banana", 2);
map.put("Apple", 1);
map.put("Cherry", 3);

for (String key : map.keySet()) {
    System.out.println(key);
}
// Could print in ANY order: Apple, Cherry, Banana
// Order may CHANGE after resizing!

If you need ordering, use:

  • LinkedHashMap — insertion order (or access order)

  • TreeMap — sorted by key
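For example, feeding the same entries into both alternatives:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderingDemo {
    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("Banana", 2);
        linked.put("Apple", 1);
        linked.put("Cherry", 3);
        System.out.println(linked.keySet()); // [Banana, Apple, Cherry] (insertion order)

        Map<String, Integer> sorted = new TreeMap<>(linked);
        System.out.println(sorted.keySet()); // [Apple, Banana, Cherry] (sorted by key)
    }
}
```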


Common Pitfalls

Pitfall 1: Mutable Keys

// ❌ DANGEROUS: Modifying a key after insertion
List<String> key = new ArrayList<>();
key.add("hello");

map.put(key, "value");

key.add("world");  // hashCode changes!
map.get(key);      // NULL — hash points to wrong bucket now

// The entry still exists in the old bucket, but it's unreachable!
// This is a silent memory leak.

Rule: HashMap keys should be immutable. Use String, Integer, enums, or your own immutable classes.
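A convenient way to follow that rule on Java 16+ is a record, which is shallowly immutable and generates matching equals()/hashCode() from its components:

```java
import java.util.HashMap;

public class RecordKeyDemo {
    // Records derive equals() and hashCode() from the declared components,
    // so two Points with the same coordinates always find the same bucket.
    record Point(int x, int y) {}

    public static void main(String[] args) {
        HashMap<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "origin-ish");
        System.out.println(map.get(new Point(1, 2))); // origin-ish
    }
}
```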

Pitfall 2: Not Sizing Properly

// ❌ BAD: Will resize multiple times
HashMap<String, Integer> map = new HashMap<>(); // capacity=16, threshold=12
for (int i = 0; i < 10_000; i++) {
    map.put("key-" + i, i);  // Resizes at 12, 24, 48, 96, 192, 384, ...
}
// That's 10 resizes (capacity 16 → 16384), each copying ALL entries!

// ✅ GOOD: Pre-size to avoid resizing
HashMap<String, Integer> map = new HashMap<>(13_334); // (10_000 / 0.75) + 1

Pitfall 3: Using HashMap with Multiple Threads

// ❌ NEVER DO THIS — HashMap is NOT thread-safe
// Two threads calling put() simultaneously can corrupt the internal structure:
// - Infinite loops (pre-Java 8)
// - Lost updates
// - Corrupted chains

// ✅ Use ConcurrentHashMap instead
ConcurrentHashMap<String, Integer> safeMap = new ConcurrentHashMap<>();

Summary

The one-liner: HashMap is an array of buckets with separate chaining, auto-resizing at 75% capacity, treeified chains for collision safety, and O(1) average performance — but it demands immutable keys with correct hashCode()/equals().