Deep Dive (java.util) - Sets
HashSet, TreeSet and LinkedHashSet
"Every Set in Java is secretly a Map wearing a trenchcoat."
This article reveals the dirty secret of Java Sets: they're all just wrappers around Maps with dummy values. Understanding this makes Sets trivial if you've already read the deep dives into HashMap, TreeMap, and LinkedHashMap.
The Secret: Sets Are Maps
Here's the actual source code of HashSet:
public class HashSet<E> extends AbstractSet<E> implements Set<E> {
private transient HashMap<E, Object> map;
// Dummy value — every key maps to this same object
private static final Object PRESENT = new Object();
public HashSet() {
map = new HashMap<>();
}
public boolean add(E e) {
return map.put(e, PRESENT) == null;
}
public boolean remove(Object o) {
return map.remove(o) == PRESENT;
}
public boolean contains(Object o) {
return map.containsKey(o);
}
public int size() {
return map.size();
}
public Iterator<E> iterator() {
return map.keySet().iterator();
}
}
That's essentially the entire implementation. Every Set method just delegates to the backing Map.
HashSet — Backed by HashMap
HashSet is the most commonly used Set. Since it delegates to HashMap:
add(): O(1) average — calls
map.put(element, PRESENT)contains(): O(1) average — calls
map.containsKey(element)remove(): O(1) average — calls
map.remove(element)Iteration order: No guarantee (same as HashMap)
Null elements: One null allowed (HashMap allows one null key)
HashSet<String> fruits = new HashSet<>();
fruits.add("Apple"); // map.put("Apple", PRESENT) → null (new) → returns true
fruits.add("Banana"); // returns true
fruits.add("Apple"); // map.put("Apple", PRESENT) → PRESENT (exists) → returns false!
fruits.size(); // 2
fruits.contains("Apple"); // map.containsKey("Apple") → true
Memory Overhead
Each element in a HashSet costs:
HashMap.Node: ~32 bytes (hash, key, value, next)
The dummy
PRESENTvalue: just one static object (shared)Total: ~32 bytes per element (plus the element object itself)
TreeSet — Backed by TreeMap
TreeSet maintains elements in sorted order using a Red-Black tree (via TreeMap):
public class TreeSet<E> extends AbstractSet<E> implements NavigableSet<E> {
private transient NavigableMap<E, Object> m;
public TreeSet() {
this(new TreeMap<>());
}
public TreeSet(Comparator<? super E> comparator) {
this(new TreeMap<>(comparator));
}
}
TreeSet<Integer> numbers = new TreeSet<>();
numbers.add(30);
numbers.add(10);
numbers.add(20);
numbers.add(40);
for (int n : numbers) {
System.out.println(n); // 10, 20, 30, 40 — always sorted!
}
// NavigableSet operations (delegated to TreeMap)
numbers.first(); // 10
numbers.last(); // 40
numbers.floor(25); // 20 (greatest ≤ 25)
numbers.ceiling(25); // 30 (smallest ≥ 25)
numbers.headSet(25); // [10, 20]
numbers.tailSet(25); // [30, 40]
numbers.subSet(15, 35); // [20, 30]
Performance
All operations are O(log n) because they delegate to TreeMap:
add(),remove(),contains(): O(log n)first(),last(),floor(),ceiling(): O(log n)
LinkedHashSet — Backed by LinkedHashMap
LinkedHashSet maintains insertion order — elements iterate in the order you added them:
public class LinkedHashSet<E> extends HashSet<E> {
public LinkedHashSet() {
super(16, .75f, true); // Calls a special HashSet constructor
}
}
// Inside HashSet:
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
Yes, LinkedHashSet is just HashSet with a LinkedHashMap instead of a HashMap. It's maybe 10 lines of code total.
LinkedHashSet<String> set = new LinkedHashSet<>();
set.add("Cherry");
set.add("Apple");
set.add("Banana");
for (String s : set) {
System.out.println(s); // Cherry, Apple, Banana — insertion order!
}
How Deduplication Works
Since Sets are Maps, deduplication uses the exact same mechanism as HashMap key uniqueness:
Critical requirement: For deduplication to work, your objects must have proper hashCode() and equals() implementations (see HashMap Deep Dive).
// ❌ Without proper equals/hashCode — duplicates sneak in!
class Person {
String name;
Person(String name) { this.name = name; }
}
Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 2! Different objects with different identity hashCodes
// ✅ With proper equals/hashCode — deduplication works
class Person {
String name;
Person(String name) { this.name = name; }
@Override
public boolean equals(Object o) {
return o instanceof Person p && name.equals(p.name);
}
@Override
public int hashCode() {
return name.hashCode();
}
}
Set<Person> set = new HashSet<>();
set.add(new Person("Alice"));
set.add(new Person("Alice"));
set.size(); // 1 ✓
Set Operations — Union, Intersection, Difference
Java doesn't have dedicated union/intersection operators, but you can use addAll, retainAll, and removeAll:
Set<String> setA = new HashSet<>(Set.of("A", "B", "C", "D"));
Set<String> setB = new HashSet<>(Set.of("C", "D", "E", "F"));
// UNION: A ∪ B
Set<String> union = new HashSet<>(setA);
union.addAll(setB);
// {A, B, C, D, E, F}
// INTERSECTION: A ∩ B
Set<String> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
// {C, D}
// DIFFERENCE: A \ B
Set<String> difference = new HashSet<>(setA);
difference.removeAll(setB);
// {A, B}
// SYMMETRIC DIFFERENCE: A △ B
Set<String> symDiff = new HashSet<>(setA);
symDiff.addAll(setB);
Set<String> temp = new HashSet<>(setA);
temp.retainAll(setB);
symDiff.removeAll(temp);
// {A, B, E, F}
Choosing the Right Set
| Feature | HashSet | LinkedHashSet | TreeSet |
|---|---|---|---|
| Backed by | HashMap | LinkedHashMap | TreeMap |
| add/contains/remove | O(1) | O(1) | O(log n) |
| Iteration order | Unpredictable | Insertion order | Sorted |
| Null elements | 1 allowed | 1 allowed | ❌ Not allowed |
| Memory per element | ~32 bytes | ~48 bytes | ~48 bytes |
| Navigation ops | ❌ | ❌ | ✅ floor/ceiling/subset |
| Thread-safe | ❌ | ❌ | ❌ |
Common Pitfalls
Pitfall 1: Mutable Elements in a Set
// ❌ Same problem as mutable HashMap keys
List<String> item = new ArrayList<>(List.of("hello"));
Set<List<String>> set = new HashSet<>();
set.add(item);
item.add("world"); // hashCode changes!
set.contains(item); // false! Can't find it anymore
set.size(); // 1, but it's effectively lost
Pitfall 2: TreeSet with Non-Comparable Elements
// ❌ ClassCastException!
TreeSet<Object> set = new TreeSet<>();
set.add(new Object()); // Object doesn't implement Comparable!
Pitfall 3: Using Wrong Set for Performance-Critical Code
// ❌ TreeSet when you just need uniqueness
TreeSet<String> ids = new TreeSet<>(); // O(log n) per add — unnecessary!
for (String id : millionIds) {
ids.add(id);
}
// ✅ HashSet for simple deduplication
HashSet<String> ids = new HashSet<>(); // O(1) per add
Summary
The one-liner: Java Sets are thin wrappers over Maps — HashSet (fast, unordered), TreeSet (sorted, O(log n)), LinkedHashSet (insertion-ordered) — choose based on whether you need ordering, not the Set API itself.