Skip to main content

Command Palette

Search for a command to run...

Deep Dive (java.util) - ArrayList

What really happens under the hood

Published
9 min read

"The simplest data structure is often the fastest — if you understand what it's doing underneath."

Welcome to the first deep dive. We're going to tear apart java.util.ArrayList — the most commonly used collection in Java. By the end, you'll understand not just how to call .add() and .get(), but exactly what the JVM is doing with memory when you do.


What is ArrayList, Really?

At its core, ArrayList is a resizable array. That's it. It wraps a regular Java Object[] array and provides dynamic resizing when the array fills up.

The RandomAccess marker interface is important — it tells algorithms (like Collections.sort()) that this list supports constant-time random access via index. This changes how algorithms choose to iterate.


The Backing Array

Here's what ArrayList looks like inside the JVM:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    
    // The actual storage — a plain Object array
    transient Object[] elementData;
    
    // The number of elements actually stored (NOT the array length)
    private int size;
}

Critical distinction: sizeelementData.length.

  • size = how many elements you've added

  • elementData.length = the capacity (how many it can hold before resizing)

The blue slots contain our data. The dark slots are allocated but empty — wasted memory that exists so we don't have to resize on every single add().


How add() Works — Amortized Growth

This is the most important mechanism to understand. When you call list.add(element), here's what happens:

// Simplified from OpenJDK source
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Step 1: Make sure there's room
    elementData[size++] = e;           // Step 2: Store element, increment size
    return true;
}

Step 1: Capacity Check

If size + 1 > elementData.length, the array is full. Time to grow.

Step 2: The Growth Algorithm

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1); // Grow by 50%
    if (newCapacity < minCapacity)
        newCapacity = minCapacity;
    elementData = Arrays.copyOf(elementData, newCapacity);
}

The key formula: newCapacity = oldCapacity × 1.5 (using bit shift >> 1 for integer division by 2).

What Arrays.copyOf Does Under the Hood

System.arraycopy() is a native method — it drops down to the operating system's memcpy or memmove, which can use SIMD instructions for bulk memory copying. It's much faster than a Java for loop.

Amortized O(1)

Even though resizing is O(n), it happens so infrequently that the amortized cost of add() is O(1). Here's the math:

  • Adding 10 elements: 0 copies (fits initial capacity)

  • Adding element 11: 10 copies (resize to 15)

  • Adding elements 12–15: 0 copies

  • Adding element 16: 15 copies (resize to 22)

  • Total copies after 22 adds: 10 + 15 = 25 copies for 22 elements ≈ 1.1 copies/element


How get() and set() Work — O(1) Access

This is where ArrayList shines:

public E get(int index) {
    rangeCheck(index);          // throws IndexOutOfBoundsException if invalid
    return elementData(index);  // return elementData[index] — that's IT
}

public E set(int index, E element) {
    rangeCheck(index);
    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

It's a single array access. Arrays in Java are contiguous blocks of memory, so accessing elementData[index] is just:

address = baseAddress + (index × referenceSize)

On a 64-bit JVM with compressed oops (default for heaps < 32GB):

  • referenceSize = 4 bytes

  • elementData[5] = baseAddress + 20 bytes

That's one addition and one memory read. O(1) — constant time, regardless of list size.


How remove() Works — The Expensive Shift

This is where ArrayList pays a price:

public E remove(int index) {
    rangeCheck(index);
    modCount++;
    
    E oldValue = elementData(index);
    
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    
    elementData[--size] = null; // Let GC collect the removed element
    return oldValue;
}

Every element after the removed one must shift left by one position.

Time complexity:

  • remove(0) — worst case: O(n), shifts ALL elements

  • remove(size-1) — best case: O(1), no shifting needed

  • Average: O(n/2) = O(n)

Pro tip: If you're removing elements while iterating, iterate backwards to avoid shifting issues and index corruption.


Memory Layout & Cache Locality

Here's why ArrayList is often faster than LinkedList even for operations where LinkedList has better theoretical complexity:

The Object[] array stores references (pointers) contiguously. When the CPU reads one reference, the entire cache line (typically 64 bytes) is loaded, bringing neighboring references into L1 cache for free.

On a compressed-oops JVM (4-byte references), one 64-byte cache line holds 16 references. Iterating through an ArrayList touches very few cache lines.

Key insight: The objects themselves may be scattered in the heap, but the references to them are packed together. This matters enormously for iteration speed.


Iterator and ConcurrentModificationException

ArrayList uses a fail-fast iterator:

private class Itr implements Iterator<E> {
    int cursor;       // next element to return
    int lastRet = -1; // last element returned
    int expectedModCount = modCount;  // snapshot of structural modification count
    
    public E next() {
        checkForComodification();  // Throws if list was modified!
        // ...
    }
    
    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

Safe removal during iteration:

// ❌ WRONG — throws ConcurrentModificationException
for (String s : list) {
    if (s.equals("remove-me")) {
        list.remove(s);  // Modifies list during iteration!
    }
}

// ✅ CORRECT — use Iterator.remove()
Iterator<String> it = list.iterator();
while (it.hasNext()) {
    if (it.next().equals("remove-me")) {
        it.remove();  // Safe — modCount is updated in the iterator too
    }
}

// ✅ ALSO CORRECT — use removeIf (Java 8+)
list.removeIf(s -> s.equals("remove-me"));

Sizing & Capacity Management

Default Capacity Behavior

// Default constructor — starts with EMPTY array (not 10!)
ArrayList<String> list1 = new ArrayList<>();
// elementData = {} (length 0)
// On first add(), expands to DEFAULT_CAPACITY = 10

// Pre-sized — avoids early resizing
ArrayList<String> list2 = new ArrayList<>(1000);
// elementData = new Object[1000]

Since Java 8: The default constructor uses an empty array {} and only allocates the default capacity of 10 on the first add(). This saves memory for lists that are never populated.

trimToSize() — Reclaim Wasted Memory

ArrayList<String> list = new ArrayList<>(1000); // capacity=1000
list.add("only");
list.add("two");
list.add("items");
// size=3, capacity=1000 — wasting 997 slots!

list.trimToSize();
// size=3, capacity=3 — array shrunk to fit

ensureCapacity() — Pre-allocate for Bulk Inserts

ArrayList<String> list = new ArrayList<>();
list.ensureCapacity(50_000); // Pre-allocate before bulk insert

for (int i = 0; i < 50_000; i++) {
    list.add("item-" + i);  // No resizing needed!
}

ArrayList vs Arrays

Aspect Object[] Array ArrayList<E>
Size Fixed at creation Dynamic (auto-resizes)
Type safety No generics (pre-Java) Full generics support
Primitives int[], double[] etc. ❌ Must use Integer, Double (boxing)
Memory overhead None ~40 bytes for ArrayList object + unused capacity
API Length only (arr.length) Rich API (sort, search, stream, etc.)
Thread safety None None (use Collections.synchronizedList())
Null elements ✅ Allowed ✅ Allowed

Performance tip: If you know the exact size and are working with primitives, use a raw array. The boxing/unboxing overhead of ArrayList<Integer> vs int[] is significant in hot loops.


Common Pitfalls

Pitfall 1: Creating from Arrays.asList()

// ❌ TRAP: This returns a FIXED-SIZE list, not a real ArrayList!
List<String> fixed = Arrays.asList("a", "b", "c");
fixed.add("d");    // 💥 UnsupportedOperationException!
fixed.set(0, "z"); // ✅ This works — modification is OK, just not resizing

// ✅ FIX: Wrap in a real ArrayList
List<String> mutable = new ArrayList<>(Arrays.asList("a", "b", "c"));
mutable.add("d");  // ✅ Works

// ✅ Java 9+ alternative
List<String> immutable = List.of("a", "b", "c");  // Truly immutable

Pitfall 2: Forgetting Boxing Cost

// ❌ SLOW: Each int is autoboxed to Integer object
ArrayList<Integer> list = new ArrayList<>();
for (int i = 0; i < 1_000_000; i++) {
    list.add(i);  // autoboxing: Integer.valueOf(i) called each time
}

// Memory: 1M Integer objects × ~16 bytes each = ~16 MB of objects
// Plus: 1M references in Object[] = ~4 MB
// Total: ~20 MB for 1M ints

// ✅ BETTER: Use int[] if you just need storage
int[] array = new int[1_000_000];
// Memory: ~4 MB total (1M × 4 bytes)

Pitfall 3: Removing in a Forward Loop

// ❌ BUG: Skips elements after removal
List<String> list = new ArrayList<>(List.of("a", "b", "b", "c"));
for (int i = 0; i < list.size(); i++) {
    if (list.get(i).equals("b")) {
        list.remove(i);  // After removing index 1, "b" at index 2 shifts to index 1
                         // Loop increments i to 2, skipping the second "b"!
    }
}
// Result: ["a", "b", "c"] — missed one!

// ✅ FIX: Iterate backwards
for (int i = list.size() - 1; i >= 0; i--) {
    if (list.get(i).equals("b")) {
        list.remove(i);
    }
}

When to Use ArrayList

Use ArrayList when:

  • You need indexed access (get(i))

  • You mostly add to the end (append)

  • You iterate sequentially (cache-friendly)

  • You don't know the size upfront

Avoid ArrayList when:

  • You frequently insert/remove at the beginning (O(n) shifting)

  • You need thread-safe operations (use CopyOnWriteArrayList or Collections.synchronizedList)

  • You're storing millions of primitives (use raw arrays or specialized libraries)


Summary

The one-liner: ArrayList is a dynamically-resized Object[] with O(1) random access, amortized O(1) appends, but O(n) inserts and removals that require shifting.