# Deep Dive (java.util) - HashMap 

> *"HashMap is the Swiss Army knife of Java — and like any tool, you need to understand the blade to avoid cutting yourself."*

`HashMap` is the most-used `Map` implementation in Java. This deep dive tears apart its internals: hash functions, bucket arrays, linked-list chains, treeification, resizing, and the subtle contracts that make it all work.

* * *

## What is HashMap, Really?

A `HashMap` is an **array of buckets**, where each bucket holds entries that hash to the same index. It provides average O(1) put/get/remove by using hash codes to jump directly to the right bucket.

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/b8e937f6-2e61-4175-ad99-01c86d3872c9.png align="center")

* * *

## The Bucket Array

The heart of HashMap is a `Node<K,V>[]` array called `table`:

```java
transient Node<K,V>[] table;  // The bucket array

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;    // Cached hash of the key
    final K key;       // The key
    V value;           // The value
    Node<K,V> next;    // Next node in the chain (for collisions)
}
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/880d43c9-d0e3-4979-8bcf-86a445201148.png align="center")

Bucket 2 has a **collision** — two entries (Alice and Bob) hashed to the same bucket and are linked together.

* * *

## Hash Function — Spreading the Bits

Java doesn't use `key.hashCode()` directly. It applies a secondary hash function to spread the bits:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

### Why XOR the upper 16 bits with the lower 16?

The bucket index is computed as: `index = hash & (table.length - 1)`

If `table.length` is 16 (binary: `10000`), the mask is `1111` — only the **lowest 4 bits** determine the bucket. Without the XOR spread, keys whose hashCodes differ only in the upper bits would all land in the same bucket.

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/d4a8d7fd-500a-4d35-a696-eb0611309e21.png align="center")

* * *

## 4\. put() — Step by Step

Here's the complete flow of `map.put("Alice", 100)`:

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/53c1c088-d86e-467a-8237-e8c61a9478c5.png align="center")

### Simplified Source Code

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // 1. Initialize table if needed
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    
    // 2. Find the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        // 3a. Empty bucket — create new node
        tab[i] = newNode(hash, key, value, null);
    else {
        // 3b. Collision — find or create entry
        Node<K,V> e; K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // Key matches first node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // threshold = 8
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Update existing value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    
    ++modCount;
    if (++size > threshold)
        resize();
    return null;
}
```

* * *

## get() — Finding a Value

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        
        // Always check first node
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Walk the chain
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/58097e6a-1c7f-435f-8c80-dfe30e8604af.png align="center")

**Time complexity:**

*   Best case (no collision): **O(1)** — single array access + one equals check
    
*   With chain: **O(k)** where k is the chain length
    
*   With tree (Java 8+): **O(log k)** when the bucket is treeified
    
*   Worst case (all keys in one bucket): **O(n)** or **O(log n)** with treeification
    

* * *

## Collision Handling — Chains and Trees

When two keys hash to the same bucket, they form a **chain** (linked list within the bucket):

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/a7ee3db9-cf58-4f6b-8a12-cf7ae58a7625.png align="center")

**Why collisions happen:**

1.  **Same hashCode** — two objects return the same `hashCode()` value
    
2.  **Different hashCode, same bucket** — after `hash & (n-1)`, different hashes map to the same index (since we're masking to `log₂(n)` bits)
    

* * *

## Treeification — When Chains Become Red-Black Trees

This is a Java 8+ optimization. When a chain exceeds **8 nodes** (and the table size is at least 64), the chain is converted to a **Red-Black Tree** for O(log n) lookups within that bucket.

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/4297c4bc-6820-43ad-8531-4e473d90ae4e.png align="center")

### The Constants

```java
static final int TREEIFY_THRESHOLD = 8;    // Chain → Tree when chain ≥ 8
static final int UNTREEIFY_THRESHOLD = 6;  // Tree → Chain when tree ≤ 6 (during resize)
static final int MIN_TREEIFY_CAPACITY = 64; // Don't treeify if table < 64
```

> **Why 8?** Under random hashing, the probability of 8+ collisions in a single bucket follows a Poisson distribution with λ = 0.5 (at default load factor 0.75). The probability is about **1 in 10 million** — so treeification is a safety net against bad hash functions, not normal operation.

### Treeification Requirements

For a key to be stored in a Red-Black Tree node, it must be **comparable**. HashMap tries, in order:

1.  `Comparable` interface (if the key implements it)
    
2.  `Class.getName().compareTo()` (comparison by class name)
    
3.  `System.identityHashCode()` as a tiebreaker
    

* * *

## Resizing — Growing the Table

When `size > threshold` (where `threshold = capacity × loadFactor`), the table doubles in size and all entries are **rehashed**.

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/1ab89e3b-8b3a-458c-893b-1497a90f87c6.png align="center")

### The Clever Bit: No Rehashing Needed!

Since the table size is always a power of 2, and the new size is exactly double, each entry goes to either:

*   **Same index** (if the new bit is 0)
    
*   **Old index + old capacity** (if the new bit is 1)
    

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/36729951-80b8-4c17-931f-fa49d5d4d7dc.png align="center")

```java
// The actual check in resize():
if ((e.hash & oldCap) == 0)
    // Stays at index i
else
    // Moves to index i + oldCap
```

This avoids recomputing `hash(key)` for every entry — just check one bit!

* * *

## Load Factor — Tuning the Trade-off

The **load factor** controls the space/time trade-off:

```java
// Default: 75% full before resize
HashMap<String, Integer> map1 = new HashMap<>(); // loadFactor = 0.75

// Custom: resize sooner (more memory, fewer collisions)
HashMap<String, Integer> map2 = new HashMap<>(16, 0.5f);

// Custom: resize later (less memory, more collisions)
HashMap<String, Integer> map3 = new HashMap<>(16, 1.0f);
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/ab08571e-1222-4eec-bda2-dbca888cc2b9.png align="center")

**Formula:** `threshold = capacity × loadFactor`

| Capacity | Load Factor | Threshold (resize at) |
| --- | --- | --- |
| 16 | 0.75 | 12 |
| 32 | 0.75 | 24 |
| 64 | 0.75 | 48 |
| 16 | 0.50 | 8 |
| 16 | 1.00 | 16 |

> **Rule of thumb:** Leave load factor at 0.75 unless you have profiling data showing it matters. If you know the approximate size, set initial capacity: `new HashMap<>(expectedSize * 4 / 3 + 1)` to avoid any resize.

* * *

## The hashCode() and equals() Contract

This is the most critical contract in all of Java Collections:

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/2c6b5b4a-3e88-4607-8dcd-6e11c17b62a1.png align="center")

### What Happens If You Violate the Contract

```java
// ❌ BROKEN: equals() without matching hashCode()
class BadKey {
    String name;
    
    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(this.name);
    }
    
    // hashCode() NOT overridden — uses Object.hashCode() (identity)
}

BadKey k1 = new BadKey();
k1.name = "test";
BadKey k2 = new BadKey();
k2.name = "test";

map.put(k1, "value");
map.get(k2);  // Returns NULL!
// k1.equals(k2) is true, but k1.hashCode() != k2.hashCode()
// So they hash to DIFFERENT buckets — get() never finds it
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/137d2efc-255d-4d50-83b8-f307a0ad88f0.png align="center")

### Correct Implementation

```java
class GoodKey {
    String name;
    int id;
    
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GoodKey other)) return false;
        return id == other.id && Objects.equals(name, other.name);
    }
    
    @Override
    public int hashCode() {
        return Objects.hash(name, id); // Must use same fields as equals()
    }
}
```

* * *

## Null Keys and Values

HashMap allows **one null key** and **any number of null values**:

```java
HashMap<String, String> map = new HashMap<>();
map.put(null, "null-key-value");  // ✅ Stored in bucket 0
map.put("key1", null);            // ✅ Null value is fine
map.put("key2", null);            // ✅ Multiple null values OK

map.get(null);   // "null-key-value"
map.get("key1"); // null — but is the key absent or is the value null?

// Use containsKey to distinguish:
map.containsKey("key1"); // true — key exists, value is null
map.containsKey("key3"); // false — key doesn't exist
```

> **Note:** `ConcurrentHashMap` does **not** allow null keys or values. If you're migrating from HashMap to ConcurrentHashMap, this is a common breaking change.

* * *

## Iteration Order — No Guarantees

HashMap makes **no guarantees** about iteration order:

```java
HashMap<String, Integer> map = new HashMap<>();
map.put("Banana", 2);
map.put("Apple", 1);
map.put("Cherry", 3);

for (String key : map.keySet()) {
    System.out.println(key);
}
// Could print in ANY order: Apple, Cherry, Banana
// Order may CHANGE after resizing!
```

If you need ordering, use:

*   `LinkedHashMap` — insertion order (or access order)
    
*   `TreeMap` — sorted by key
    

* * *

## Common Pitfalls

### Pitfall 1: Mutable Keys

```java
// ❌ DANGEROUS: Modifying a key after insertion
List<String> key = new ArrayList<>();
key.add("hello");

map.put(key, "value");

key.add("world");  // hashCode changes!
map.get(key);      // NULL — hash points to wrong bucket now

// The entry still exists in the old bucket, but it's unreachable!
// This is a silent memory leak.
```

> **Rule:** HashMap keys should be **immutable**. Use `String`, `Integer`, enums, or your own immutable classes.

### Pitfall 2: Not Sizing Properly

```java
// ❌ BAD: Will resize multiple times
HashMap<String, Integer> map = new HashMap<>(); // capacity=16, threshold=12
for (int i = 0; i < 10_000; i++) {
    map.put("key-" + i, i);  // Resizes at 12, 24, 48, 96, 192, 384, ...
}
// That's ~13 resizes, each copying ALL entries!

// ✅ GOOD: Pre-size to avoid resizing
HashMap<String, Integer> map = new HashMap<>(14_000); // (10000 / 0.75) + 1
```

### Pitfall 3: Using HashMap with Multiple Threads

```java
// ❌ NEVER DO THIS — HashMap is NOT thread-safe
// Two threads calling put() simultaneously can corrupt the internal structure:
// - Infinite loops (pre-Java 8)
// - Lost updates
// - Corrupted chains

// ✅ Use ConcurrentHashMap instead
ConcurrentHashMap<String, Integer> safeMap = new ConcurrentHashMap<>();
```

* * *

## Summary

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/9cf8a1ec-8811-425b-8cfb-8aa6b251699b.png align="center")

**The one-liner:** HashMap is an array of buckets with separate chaining, auto-resizing at 75% capacity, treeified chains for collision safety, and O(1) average performance — but it demands immutable keys with correct `hashCode()`/`equals()`.
