# Deep Dive (java.util) - Sets

> *"Every Set in Java is secretly a Map wearing a trenchcoat."*

This article reveals the dirty secret of Java Sets: they're all just **wrappers around Maps** with dummy values. Understanding this makes Sets trivial if you've already read the deep dives into HashMap, TreeMap, and LinkedHashMap.

* * *

## The Secret: Sets Are Maps

Here's the actual source code of `HashSet`:

```java
public class HashSet<E> extends AbstractSet<E> implements Set<E> {
    
    private transient HashMap<E, Object> map;
    
    // Dummy value — every key maps to this same object
    private static final Object PRESENT = new Object();
    
    public HashSet() {
        map = new HashMap<>();
    }
    
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }
    
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }
    
    public boolean contains(Object o) {
        return map.containsKey(o);
    }
    
    public int size() {
        return map.size();
    }
    
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
}
```

That's essentially the entire implementation. **Every** `Set` **method just delegates to the backing** `Map`**.**

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/ff96c385-5d4d-4fd1-ba20-d0f25418bac5.png align="center")

* * *

## HashSet — Backed by HashMap

`HashSet` is the most commonly used Set. Since it delegates to HashMap:

*   **add()**: O(1) average — calls `map.put(element, PRESENT)`
    
*   **contains()**: O(1) average — calls `map.containsKey(element)`
    
*   **remove()**: O(1) average — calls `map.remove(element)`
    
*   **Iteration order**: No guarantee (same as HashMap)
    
*   **Null elements**: One null allowed (HashMap allows one null key)
    

```java
HashSet<String> fruits = new HashSet<>();
fruits.add("Apple");   // map.put("Apple", PRESENT) → null (new) → returns true
fruits.add("Banana");  // returns true
fruits.add("Apple");   // map.put("Apple", PRESENT) → PRESENT (exists) → returns false!

fruits.size();            // 2
fruits.contains("Apple"); // map.containsKey("Apple") → true
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/a0518a22-9076-4b5c-848f-dd4bcc3208cf.png align="center")

### Memory Overhead

Each element in a HashSet costs:

*   HashMap.Node: ~32 bytes (hash, key, value, next)
    
*   The dummy `PRESENT` value: just one static object (shared)
    
*   **Total: ~32 bytes per element** (plus the element object itself)
    

* * *

## TreeSet — Backed by TreeMap

`TreeSet` maintains elements in **sorted order** using a Red-Black tree (via TreeMap):

```java
public class TreeSet<E> extends AbstractSet<E> implements NavigableSet<E> {
    private transient NavigableMap<E, Object> m;
    
    public TreeSet() {
        this(new TreeMap<>());
    }
    
    public TreeSet(Comparator<? super E> comparator) {
        this(new TreeMap<>(comparator));
    }
}
```

```java
TreeSet<Integer> numbers = new TreeSet<>();
numbers.add(30);
numbers.add(10);
numbers.add(20);
numbers.add(40);

for (int n : numbers) {
    System.out.println(n); // 10, 20, 30, 40 — always sorted!
}

// NavigableSet operations (delegated to TreeMap)
numbers.first();      // 10
numbers.last();       // 40
numbers.floor(25);    // 20 (greatest ≤ 25)
numbers.ceiling(25);  // 30 (smallest ≥ 25)
numbers.headSet(25);  // [10, 20]
numbers.tailSet(25);  // [30, 40]
numbers.subSet(15, 35); // [20, 30]
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/03fac13f-ed6d-481b-a82f-ea357b4d3470.png align="center")

### Performance

All operations are **O(log n)** because they delegate to TreeMap:

*   `add()`, `remove()`, `contains()`: O(log n)
    
*   `first()`, `last()`, `floor()`, `ceiling()`: O(log n)
    

* * *

## LinkedHashSet — Backed by LinkedHashMap

`LinkedHashSet` maintains **insertion order** — elements iterate in the order you added them:

```java
public class LinkedHashSet<E> extends HashSet<E> {
    public LinkedHashSet() {
        super(16, .75f, true); // Calls a special HashSet constructor
    }
}

// Inside HashSet:
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
```

Yes, `LinkedHashSet` is just `HashSet` with a `LinkedHashMap` instead of a `HashMap`. It's maybe 10 lines of code total.

```java
LinkedHashSet<String> set = new LinkedHashSet<>();
set.add("Cherry");
set.add("Apple");
set.add("Banana");

for (String s : set) {
    System.out.println(s); // Cherry, Apple, Banana — insertion order!
}
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/d04af0b1-ad77-4fe7-87e9-6d12fa277130.png align="center")

* * *

## How Deduplication Works

Since Sets are Maps, deduplication uses the **exact same mechanism** as HashMap key uniqueness:

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/8dccf5f1-dd86-4d8a-a75f-5267220fa569.png align="center")

**Critical requirement:** For deduplication to work, your objects must have proper `hashCode()` and `equals()` implementations (see [HashMap Deep Dive](03-hashmap.md)).

```java
// ❌ Without proper equals/hashCode — duplicates sneak in!
class Person {
    String name;
    Person(String name) { this.name = name; }
}

Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 2! Different objects with different identity hashCodes

// ✅ With proper equals/hashCode — deduplication works
class Person {
    String name;
    Person(String name) { this.name = name; }
    
    @Override
    public boolean equals(Object o) {
        return o instanceof Person p && name.equals(p.name);
    }
    
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 1 ✓
```

* * *

## Set Operations — Union, Intersection, Difference

Java doesn't have dedicated union/intersection operators, but you can use `addAll`, `retainAll`, and `removeAll`:

```java
Set<String> setA = new HashSet<>(Set.of("A", "B", "C", "D"));
Set<String> setB = new HashSet<>(Set.of("C", "D", "E", "F"));

// UNION: A ∪ B
Set<String> union = new HashSet<>(setA);
union.addAll(setB);
// {A, B, C, D, E, F}

// INTERSECTION: A ∩ B
Set<String> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
// {C, D}

// DIFFERENCE: A \ B
Set<String> difference = new HashSet<>(setA);
difference.removeAll(setB);
// {A, B}

// SYMMETRIC DIFFERENCE: A △ B
Set<String> symDiff = new HashSet<>(setA);
symDiff.addAll(setB);
Set<String> temp = new HashSet<>(setA);
temp.retainAll(setB);
symDiff.removeAll(temp);
// {A, B, E, F}
```

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/65ab4113-ade6-4a8e-83b6-35e97c8daf5f.png align="center")

* * *

## Choosing the Right Set

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/da8eddc4-b3ba-4f65-8f79-2cca2312a876.png align="center")

| Feature | HashSet | LinkedHashSet | TreeSet |
| --- | --- | --- | --- |
| **Backed by** | HashMap | LinkedHashMap | TreeMap |
| **add/contains/remove** | O(1) | O(1) | O(log n) |
| **Iteration order** | Unpredictable | Insertion order | Sorted |
| **Null elements** | 1 allowed | 1 allowed | ❌ Not allowed |
| **Memory per element** | ~32 bytes | ~48 bytes | ~48 bytes |
| **Navigation ops** | ❌ | ❌ | ✅ floor/ceiling/subset |
| **Thread-safe** | ❌ | ❌ | ❌ |

* * *

## Common Pitfalls

### Pitfall 1: Mutable Elements in a Set

```java
// ❌ Same problem as mutable HashMap keys
List<String> item = new ArrayList<>(List.of("hello"));
Set<List<String>> set = new HashSet<>();
set.add(item);

item.add("world"); // hashCode changes!
set.contains(item); // false! Can't find it anymore
set.size(); // 1, but it's effectively lost
```

### Pitfall 2: TreeSet with Non-Comparable Elements

```java
// ❌ ClassCastException!
TreeSet<Object> set = new TreeSet<>();
set.add(new Object()); // Object doesn't implement Comparable!
```

### Pitfall 3: Using Wrong Set for Performance-Critical Code

```java
// ❌ TreeSet when you just need uniqueness
TreeSet<String> ids = new TreeSet<>(); // O(log n) per add — unnecessary!
for (String id : millionIds) {
    ids.add(id);
}

// ✅ HashSet for simple deduplication
HashSet<String> ids = new HashSet<>(); // O(1) per add
```

* * *

## Summary

![](https://cdn.hashnode.com/uploads/covers/637f189ed7d9bcd845996b4b/d49241fa-5d32-47d0-ae81-d08337fca192.png align="center")

**The one-liner:** Java Sets are thin wrappers over Maps — HashSet (fast, unordered), TreeSet (sorted, O(log n)), LinkedHashSet (insertion-ordered) — choose based on whether you need ordering, not the Set API itself.
