Deep Dive (java.util)

"Every Set in Java is secretly a Map wearing a trenchcoat."

This article reveals the dirty secret of Java Sets: they're all just wrappers around Maps with dummy values. Understanding this makes Sets trivial if you've already read the deep dives into HashMap, TreeMap, and LinkedHashMap.

The Secret: Sets Are Maps

Here's the actual source code of HashSet:

public class HashSet<E> extends AbstractSet<E> implements Set<E> {
    
    private transient HashMap<E, Object> map;
    
    // Dummy value — every key maps to this same object
    private static final Object PRESENT = new Object();
    
    public HashSet() {
        map = new HashMap<>();
    }
    
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }
    
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }
    
    public boolean contains(Object o) {
        return map.containsKey(o);
    }
    
    public int size() {
        return map.size();
    }
    
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
}

That's essentially the entire implementation. Every Set method just delegates to the backing Map.

HashSet — Backed by HashMap

HashSet is the most commonly used Set. Since it delegates to HashMap:

add(): O(1) average — calls map.put(element, PRESENT)
contains(): O(1) average — calls map.containsKey(element)
remove(): O(1) average — calls map.remove(element)
Iteration order: No guarantee (same as HashMap)
Null elements: One null allowed (HashMap allows one null key)

HashSet<String> fruits = new HashSet<>();
fruits.add("Apple");   // map.put("Apple", PRESENT) → null (new) → returns true
fruits.add("Banana");  // returns true
fruits.add("Apple");   // map.put("Apple", PRESENT) → PRESENT (exists) → returns false!

fruits.size();            // 2
fruits.contains("Apple"); // map.containsKey("Apple") → true

Memory Overhead

Each element in a HashSet costs:

HashMap.Node: ~32 bytes (hash, key, value, next)
The dummy PRESENT value: just one static object (shared)
Total: ~32 bytes per element (plus the element object itself)

TreeSet — Backed by TreeMap

TreeSet maintains elements in sorted order using a Red-Black tree (via TreeMap):

public class TreeSet<E> extends AbstractSet<E> implements NavigableSet<E> {
    private transient NavigableMap<E, Object> m;
    
    public TreeSet() {
        this(new TreeMap<>());
    }
    
    public TreeSet(Comparator<? super E> comparator) {
        this(new TreeMap<>(comparator));
    }
}

TreeSet<Integer> numbers = new TreeSet<>();
numbers.add(30);
numbers.add(10);
numbers.add(20);
numbers.add(40);

for (int n : numbers) {
    System.out.println(n); // 10, 20, 30, 40 — always sorted!
}

// NavigableSet operations (delegated to TreeMap)
numbers.first();      // 10
numbers.last();       // 40
numbers.floor(25);    // 20 (greatest ≤ 25)
numbers.ceiling(25);  // 30 (smallest ≥ 25)
numbers.headSet(25);  // [10, 20]
numbers.tailSet(25);  // [30, 40]
numbers.subSet(15, 35); // [20, 30]

Performance

All operations are O(log n) because they delegate to TreeMap:

add(), remove(), contains(): O(log n)
first(), last(), floor(), ceiling(): O(log n)

LinkedHashSet — Backed by LinkedHashMap

LinkedHashSet maintains insertion order — elements iterate in the order you added them:

public class LinkedHashSet<E> extends HashSet<E> {
    public LinkedHashSet() {
        super(16, .75f, true); // Calls a special HashSet constructor
    }
}

// Inside HashSet:
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}

Yes, LinkedHashSet is just HashSet with a LinkedHashMap instead of a HashMap. It's maybe 10 lines of code total.

LinkedHashSet<String> set = new LinkedHashSet<>();
set.add("Cherry");
set.add("Apple");
set.add("Banana");

for (String s : set) {
    System.out.println(s); // Cherry, Apple, Banana — insertion order!
}

How Deduplication Works

Since Sets are Maps, deduplication uses the exact same mechanism as HashMap key uniqueness:

Critical requirement: For deduplication to work, your objects must have proper hashCode() and equals() implementations (see HashMap Deep Dive).

// ❌ Without proper equals/hashCode — duplicates sneak in!
class Person {
    String name;
    Person(String name) { this.name = name; }
}

Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 2! Different objects with different identity hashCodes

// ✅ With proper equals/hashCode — deduplication works
class Person {
    String name;
    Person(String name) { this.name = name; }
    
    @Override
    public boolean equals(Object o) {
        return o instanceof Person p && name.equals(p.name);
    }
    
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 1 ✓

Set Operations — Union, Intersection, Difference

Java doesn't have dedicated union/intersection operators, but you can use addAll, retainAll, and removeAll:

Set<String> setA = new HashSet<>(Set.of("A", "B", "C", "D"));
Set<String> setB = new HashSet<>(Set.of("C", "D", "E", "F"));

// UNION: A ∪ B
Set<String> union = new HashSet<>(setA);
union.addAll(setB);
// {A, B, C, D, E, F}

// INTERSECTION: A ∩ B
Set<String> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
// {C, D}

// DIFFERENCE: A \ B
Set<String> difference = new HashSet<>(setA);
difference.removeAll(setB);
// {A, B}

// SYMMETRIC DIFFERENCE: A △ B
Set<String> symDiff = new HashSet<>(setA);
symDiff.addAll(setB);
Set<String> temp = new HashSet<>(setA);
temp.retainAll(setB);
symDiff.removeAll(temp);
// {A, B, E, F}

Choosing the Right Set

Feature	HashSet	LinkedHashSet	TreeSet
Backed by	HashMap	LinkedHashMap	TreeMap
add/contains/remove	O(1)	O(1)	O(log n)
Iteration order	Unpredictable	Insertion order	Sorted
Null elements	1 allowed	1 allowed	❌ Not allowed
Memory per element	~32 bytes	~48 bytes	~48 bytes
Navigation ops	❌	❌	✅ floor/ceiling/subset
Thread-safe	❌	❌	❌

Common Pitfalls

Pitfall 1: Mutable Elements in a Set

// ❌ Same problem as mutable HashMap keys
List<String> item = new ArrayList<>(List.of("hello"));
Set<List<String>> set = new HashSet<>();
set.add(item);

item.add("world"); // hashCode changes!
set.contains(item); // false! Can't find it anymore
set.size(); // 1, but it's effectively lost

Pitfall 2: TreeSet with Non-Comparable Elements

// ❌ ClassCastException!
TreeSet<Object> set = new TreeSet<>();
set.add(new Object()); // Object doesn't implement Comparable!

Pitfall 3: Using Wrong Set for Performance-Critical Code

// ❌ TreeSet when you just need uniqueness
TreeSet<String> ids = new TreeSet<>(); // O(log n) per add — unnecessary!
for (String id : millionIds) {
    ids.add(id);
}

// ✅ HashSet for simple deduplication
HashSet<String> ids = new HashSet<>(); // O(1) per add

Summary

The one-liner: Java Sets are thin wrappers over Maps — HashSet (fast, unordered), TreeSet (sorted, O(log n)), LinkedHashSet (insertion-ordered) — choose based on whether you need ordering, not the Set API itself.

Deep Dive (java.util) - Sets