CMO & CTO

Closing the Bridge Between Marketing and Technology, By Luis Fernandez


Performance Pitfalls in Collections: Big-O Meets Reality

Posted on October 26, 2009 By Luis Fernandez

Java Collections from a practitioner's point of view, with lessons that outlast any release. If you have ever picked a structure because the chart said O(1) or O(log n), this is for you.

Context

Sun is on its way into Oracle, Java 6 is the daily driver, and most laptops ship with two or four cores. We have bigger heaps and faster CPUs, yet apps still crawl when a list grows just a bit. Everyone quotes Big O in code reviews. Then we ship, and the part that looked quick on paper runs slow in production. The gap between the math and what the machine really does is where our bugs hide. Let us walk through the everyday traps in Java Collections performance and keep it grounded in what actually happens on HotSpot today.

Definitions we keep mixing up

  • Big O: upper bound growth as n grows. It ignores constants. Those constants are where all the pain lives in real code.
  • Memory locality: data placed close together gives the CPU cache a chance. Pointers scatter data and make the CPU wait.
  • Autoboxing: int to Integer and back. Free in syntax, not free in time or memory.
  • Hash quality: a good hashCode spreads keys. A bad one makes a HashMap crawl.
  • Load factor and capacity: how full a HashMap or HashSet gets before it grows. Grow events are spikes you can avoid.
  • Allocation cost: new objects stress the allocator and the collector. Many tiny nodes do more GC work than a single big array.
  • JIT warmup: HotSpot needs steady loops to produce quick code. One shot micro tests lie to you.
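The autoboxing point is easy to demonstrate. Summing the same numbers through an int[] and through a List<Integer> does the same math, but the boxed version pays for an Integer object per element. A minimal sketch, with class and method names of my own choosing:

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingCost {
    // Sum over a plain int array: no per-element allocation
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) total += values[i];
        return total;
    }

    // Sum over a boxed list: every get unboxes, every add boxed on the way in
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (int i = 0; i < values.size(); i++) total += values.get(i);
        return total;
    }

    public static void main(String[] args) {
        int n = 1000000;
        int[] raw = new int[n];
        List<Integer> boxed = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) { raw[i] = i; boxed.add(i); }

        long t1 = System.nanoTime();
        long s1 = sumPrimitive(raw);
        long t2 = System.nanoTime();
        long s2 = sumBoxed(boxed);
        long t3 = System.nanoTime();

        System.out.println("primitive ms: " + (t2 - t1) / 1000000);
        System.out.println("boxed ms: " + (t3 - t2) / 1000000);
        if (s1 != s2) throw new AssertionError("sums differ"); // sanity check
    }
}
```

Same loop, same answer; only the allocation and unboxing differ.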

Examples from the trenches

ArrayList and LinkedList look similar on the surface and live in opposite worlds inside. One is a growing array. The other is a chain of nodes. Big O says random access on ArrayList is O(1) and on LinkedList is O(n). That part is true. The sneaky bit is that almost everything you do all day benefits from the array because it is cache friendly.

// Append N numbers
static void appendList(List<Integer> list, int n) {
    for (int i = 0; i < n; i++) {
        list.add(i); // autoboxing here
    }
}

public static void main(String[] args) {
    int n = 2000000; // two million appends

    long t1 = System.nanoTime();
    List<Integer> a = new ArrayList<Integer>();
    appendList(a, n);
    long t2 = System.nanoTime();

    List<Integer> l = new LinkedList<Integer>();
    appendList(l, n);
    long t3 = System.nanoTime();

    System.out.println("ArrayList append ms: " + (t2 - t1) / 1000000);
    System.out.println("LinkedList append ms: " + (t3 - t2) / 1000000);
}

Both are O(1) per append on average, yet the array version usually wins by a large margin. The array copies on growth, but it copies a flat block that the CPU loves. The linked list allocates a node per element, each with pointers. More allocations, more cache misses, more GC. Constant factors are king.

Now look at remove from the front. Big O says ArrayList.remove(0) is O(n) due to the shift, and LinkedList.removeFirst() is O(1). That looks like a win for the list of nodes.

// Remove first element until empty
static long drainFront(List<Integer> list) {
    long start = System.nanoTime();
    while (!list.isEmpty()) {
        list.remove(0);
    }
    return System.nanoTime() - start;
}

public static void main(String[] args) {
    int n = 500000;
    List<Integer> a = new ArrayList<Integer>(n);
    for (int i = 0; i < n; i++) a.add(i);

    List<Integer> l = new LinkedList<Integer>();
    for (int i = 0; i < n; i++) l.add(i);

    System.out.println("ArrayList drainFront ms: " + drainFront(a) / 1000000);
    System.out.println("LinkedList drainFront ms: " + drainFront(l) / 1000000);
}

On a fresh JVM both are slow. After warmup the linked list tends to win here. The catch is that this is a strange thing to do in app code. You can avoid it by using a Deque. In Java 6 that means ArrayDeque, which gives you O(1) at both ends with array speed and no node churn.
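The same drain written against ArrayDeque looks like this. A sketch, with names of my own choosing:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDrain {
    // Drain from the front: O(1) per poll, no shifting, no node allocation
    static long drain(Deque<Integer> deque) {
        long sum = 0;
        while (!deque.isEmpty()) sum += deque.pollFirst();
        return sum;
    }

    public static void main(String[] args) {
        int n = 500000;
        Deque<Integer> d = new ArrayDeque<Integer>(n);
        for (int i = 0; i < n; i++) d.addLast(i);

        long start = System.nanoTime();
        long sum = drain(d);
        System.out.println("ArrayDeque drain ms: " + (System.nanoTime() - start) / 1000000);
        System.out.println("sum: " + sum); // touch the result so the work is not dead code
    }
}
```

It is a circular array underneath, so both ends move a pointer instead of shifting elements.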

contains on a List is O(n). A lot of teams still do it on a list because they already have one. If you need membership checks, move the keys to a HashSet. It is very often the biggest win you can get with one line.

// Slow path
boolean inList = usersList.contains("alice");

// Quick path
Set<String> users = new HashSet<String>(usersList);
boolean inSet = users.contains("alice");

HashMap tuning is also easy. People let it grow on default settings and then wonder about pauses and spikes. If you know you will put about 50 thousand entries, set the capacity up front. Table sizes are powers of two, and the map resizes once size passes capacity times load factor, so pick a bit more than you need and let the load factor stay at 0.75.

// Expect around 50k entries
int expected = 50000;
int cap = 1;
while (cap < expected / 0.75) cap <<= 1;
Map<String, Integer> map = new HashMap<String, Integer>(cap);

Now the secret cost center in maps: your own code. A map call goes through hashCode, bucket chase, and equals. If you write heavy equals or a slow Comparator, you pay every time. The data structure gets the blame, but your method is the real tax.

final class User {
    final String first;
    final String last;
    final Date created;

    User(String f, String l, Date c) {
        this.first = f; this.last = l; this.created = c;
    }

    // Slow hash: allocates, copies, and depends on Date
    public int hashCode() {
        return (first + "|" + last + "|" + created.getTime()).hashCode();
    }

    // Better hash: mix fields without new Strings
    public int fastHashCode() {
        int h = 17;
        h = 31 * h + first.hashCode();
        h = 31 * h + last.hashCode();
        h = 31 * h + (int)(created.getTime() ^ (created.getTime() >>> 32));
        return h;
    }
}

Notice the string concat in the first version. It allocates a fresh String on every lookup. The second one is just math on ints and longs. Same Big O, wildly different speed.
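equals deserves the same care, since hashCode only narrows the search to a bucket and equals settles the ties. The User example above omits it, so here is a hedged sketch of a cheap equals for a class with those fields (class name changed to keep it self-contained):

```java
import java.util.Date;

public final class User2 {
    final String first;
    final String last;
    final Date created;

    User2(String f, String l, Date c) {
        this.first = f; this.last = l; this.created = c;
    }

    // Cheap equals: field comparisons only, quickest check first, no allocation
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User2)) return false;
        User2 other = (User2) o;
        return created.getTime() == other.created.getTime()
                && first.equals(other.first)
                && last.equals(other.last);
    }

    // Matching hash: same fields, mixed without new Strings
    @Override
    public int hashCode() {
        int h = 17;
        h = 31 * h + first.hashCode();
        h = 31 * h + last.hashCode();
        h = 31 * h + (int) (created.getTime() ^ (created.getTime() >>> 32));
        return h;
    }
}
```

The long comparison goes first because it is the cheapest check and, for timestamped records, often the most selective.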

TreeMap and TreeSet give sorted order with O(log n) operations. They rely on a Comparator or Comparable. If the compare is simple, all good. If it calls the database or parses text, you just turned your tree into a parking brake. Keep compares cheap and pure.

Comparator<String> cmp = new Comparator<String>() {
    public int compare(String a, String b) {
        // Do not do this
        return expensiveNormalize(a).compareTo(expensiveNormalize(b));
    }
};
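The usual fix is to do the expensive work once per element instead of once per comparison, then let the tree compare the precomputed keys. A sketch, where normalize is a cheap stand-in for whatever expensiveNormalize does:

```java
import java.util.Map;
import java.util.TreeMap;

public class NormalizedSort {
    // Stand-in for an expensive normalization; assumption for illustration
    static String normalize(String s) {
        return s.trim().toLowerCase();
    }

    // Normalize each value once on the way in: n normalizations total,
    // instead of one per comparison inside the O(n log n) sort
    static Map<String, String> sortNormalized(Iterable<String> values) {
        Map<String, String> sorted = new TreeMap<String, String>();
        for (String v : values) {
            sorted.put(normalize(v), v); // tree compares the cheap key
        }
        return sorted;
    }
}
```

Trading a little memory for a lot of repeated work is almost always the right side of that trade in a comparator.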

ConcurrentHashMap is the default for shared maps. Collections.synchronizedMap is a single big lock. On a quad core box the striped map flies while the sync wrapper lines up threads. For mostly reads with rare writes, CopyOnWriteArrayList makes sense. For write heavy paths, it is a bad fit.

// Good for many readers, few writers
List<Listener> listeners = new CopyOnWriteArrayList<Listener>();

// Good for concurrent updates
Map<String, User> cache = new ConcurrentHashMap<String, User>();
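With a shared cache, the classic trap is check-then-put, which races between threads. ConcurrentMap's putIfAbsent closes that window. A sketch with a hypothetical loadUser stand-in for the real lookup:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UserCache {
    private final ConcurrentMap<String, String> cache =
            new ConcurrentHashMap<String, String>();

    // Stand-in for an expensive load; assumption for illustration
    String loadUser(String id) {
        return "user-" + id;
    }

    String get(String id) {
        String user = cache.get(id);
        if (user == null) {
            String fresh = loadUser(id);
            // Atomic: if another thread beat us to it, keep its value
            String prev = cache.putIfAbsent(id, fresh);
            user = (prev != null) ? prev : fresh;
        }
        return user;
    }
}
```

Two threads may still both run loadUser for the same key, but the map never ends up with two different winners, and no big lock serializes the readers.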

Counterexamples that break the simple advice

LinkedList can be the right tool when you already sit on a ListIterator at the target spot and need to add or remove near that position many times. No index math, no array copies, and the iterator stays valid. That is rare but real in some parsers and token streams.
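That iterator case looks like this. All edits go through the ListIterator itself, so each one is O(1) at the cursor with no index math. A sketch with a made-up token stream:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class Splice {
    // Replace each marker token with two expanded tokens, editing in place
    static List<String> expandMarkers(List<String> tokens) {
        ListIterator<String> it = tokens.listIterator();
        while (it.hasNext()) {
            String t = it.next();
            if ("MARKER".equals(t)) {
                it.remove();      // O(1): unlink the node under the cursor
                it.add("begin");  // O(1): link new nodes at the cursor
                it.add("end");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = new LinkedList<String>(
                Arrays.asList("a", "MARKER", "b"));
        System.out.println(expandMarkers(tokens)); // [a, begin, end, b]
    }
}
```

Try the same splice on an ArrayList and every remove and add shifts the tail; here the iterator stays valid and nothing moves.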

ArrayList can lose if you regularly insert near the front and you cannot batch. The shift cost adds up. If order does not matter, you can swap with the last element and pop. That trick turns a remove into O(1).

// Remove by index without preserving order
static <T> void fastRemove(List<T> list, int idx) {
    int last = list.size() - 1;
    Collections.swap(list, idx, last);
    list.remove(last);
}

HashSet can crawl when the keys have a poor hash. A common one is an object that leaves hashCode as the default index of the object. That spreads fine for identity based sets, not for logical sets. Pair that with a flood of collisions and you will see long chains and lots of equals calls. Fix the hash and the structure wakes up.

Big O hides memory. A linked list uses far more memory per element than an array backed list. When the heap grows, the collector works harder. That is not counted in the time complexity but it is paid in your process time. If your data fits in cache on one choice and spills on the other, the math loses to physics.

A quick micro test that will not lie to you as much

Microbenchmarks are tricky. HotSpot throws away dead code, inlines, unrolls, and needs warmup. JMH is not part of our toolbox yet. You can still do a simple repeat loop with warmup and touch the results so the JIT cannot throw them away.

static long bench(Runnable r, int warm, int reps) {
    for (int i = 0; i < warm; i++) r.run(); // warmup
    long best = Long.MAX_VALUE;
    for (int i = 0; i < reps; i++) {
        long t = System.nanoTime();
        r.run();
        long dt = System.nanoTime() - t;
        if (dt < best) best = dt; // track best to reduce noise
    }
    return best;
}

Run your candidates back to back in the same JVM, in server mode (-server), with the same data. Always sanity check the result so you know the code really did the work.
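Pointed at the membership question from earlier, the harness looks like this. Sizes and rep counts are arbitrary choices for the sketch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BenchDemo {
    static long bench(Runnable r, int warm, int reps) {
        for (int i = 0; i < warm; i++) r.run(); // warmup
        long best = Long.MAX_VALUE;
        for (int i = 0; i < reps; i++) {
            long t = System.nanoTime();
            r.run();
            long dt = System.nanoTime() - t;
            if (dt < best) best = dt; // track best to reduce noise
        }
        return best;
    }

    static int hits; // written by the tasks so the JIT cannot drop the work

    public static void main(String[] args) {
        final List<String> list = new ArrayList<String>();
        for (int i = 0; i < 20000; i++) list.add("user" + i);
        final Set<String> set = new HashSet<String>(list);
        final String needle = "user19999"; // worst case for the list scan

        long listNs = bench(new Runnable() {
            public void run() { if (list.contains(needle)) hits++; }
        }, 200, 200);
        long setNs = bench(new Runnable() {
            public void run() { if (set.contains(needle)) hits++; }
        }, 200, 200);

        System.out.println("list contains ns: " + listNs);
        System.out.println("set contains ns: " + setNs);
        System.out.println("hits: " + hits); // sanity check: the work really ran
    }
}
```

The hits counter is the "touch the results" part: without a visible side effect, HotSpot is allowed to throw the whole loop away.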

Decision rubric you can apply on Monday

  • Know the hot path. Do you mostly read, mostly write, or both?
  • Need random access to elements by index? Use ArrayList
  • Need to add or remove at both ends? Use ArrayDeque
  • Need membership checks or uniqueness? Use HashSet with a real hashCode
  • Need key value lookups? Use HashMap. If you also need insertion order, pick LinkedHashMap
  • Need sorted order and range queries? Use TreeMap or TreeSet with a cheap compare
  • Shared maps with many threads? Use ConcurrentHashMap
  • Many readers and rare writes on a list? Use CopyOnWriteArrayList
  • Lots of primitives and memory pressure? Consider Trove primitive collections to dodge autoboxing
  • Large predictable sizes? Pre size with capacity so you do not pay for growth
  • Order does not matter on remove? Use swap then remove on ArrayList
  • Need to splice while iterating and you have a position? Use ListIterator on LinkedList
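One more trick hiding behind the LinkedHashMap entry in that rubric: its access-order constructor plus removeEldestEntry gives you a small LRU cache with almost no code. A sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true: each get() moves the entry to the back of the order
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true evicts
    // the least recently used entry
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A put past capacity silently drops the least recently touched entry, so this fits bounded memoization, not anything that must keep every key. Wrap it with Collections.synchronizedMap if threads share it.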

One more practical step. Profile before and after. Put a stopwatch around the call site or use your profiler. Code by rumor is how we end up with a linked list that nobody needs or a tree map sorted by a slow compare.

Lesson learned

Big O is great at telling you if an approach is doomed as n grows. It does not tell you what feels quick on a real machine today. The surprisingly fast stuff in Java Collections is the code that plays nice with the CPU cache, avoids extra objects, and keeps your own methods lean. The surprisingly slow stuff hides behind friendly method names like contains and remove on the front of a list. The garbage collector collects what you allocate. If your structure needs a fresh node for each step, you also pay with pauses and lost throughput.

Write your picks like this:

  • Start with ArrayList, HashMap, HashSet, and ArrayDeque
  • Switch to TreeMap or TreeSet only when you need sorted order
  • Switch to LinkedList only when you truly benefit from node based edits with a live iterator
  • Use ConcurrentHashMap for shared mutable maps
  • Pre size when you know the shape of the data
  • Keep equals, hashCode, and Comparator simple and cheap
  • Measure on the same box with a fair warmup

We spend a lot of time arguing Big O on whiteboards. The CPU does not care about our drawings. It cares about cache lines, pointers, and how much junk we allocate. If you take one thing from this, let it be this line: Choose the structure by the work you actually do, not by the chart you remember. The collections that look boring are often the quickest. The fixes are usually small. Pre size a map. Swap then remove. Move membership checks to a set. And when in doubt, write the tiny test and watch the clock.

Filed under: Software Architecture, Software Engineering
