Java Collections in Practice: Choosing the Right List or Map
Picking the right collection is not a theory exercise. It decides how many servers you need, how long a page takes to render, and whether a late night pager wakes you up. I keep seeing the same mistakes in code reviews. Someone reaches for LinkedList for fast inserts, or uses a plain HashMap under multiple threads and hopes for the best. Then the bug reports arrive. Or the latency graphs look like a heart monitor.
Today I want to look at Java Collections the way we use them at work. Lists and maps. No academic talk. Just what you need to choose well. Java 6 is on most servers right now, Java 7 is on the horizon, and teams are still moving from Java 5 in a lot of shops. That means generics are standard, ConcurrentHashMap is widely available, and many of us are flirting with Google Guava. The tools are on the table. Let’s put them to work.
Problem framing
When you pick a collection, you are trading between speed, memory, ordering, and thread safety. Those four pull on each other. You will not get all four at once. The trick is to match the shape of your problem with the structure the collection gives you.
Ask yourself these questions before you type new Something:
- How do I access elements most of the time. Random index, by key, or just iterate.
- Do I need ordering. Insertion order, sorted by key, or no order at all.
- Will multiple threads touch it. Read mostly, or read and write.
- What is the size. Tens, thousands, or millions. Memory overhead starts to bite as you grow.
With that in mind, let’s walk three cases that come up all the time, and pick a list or map that fits.
Three cases from the field
Case 1. Append a lot and iterate: ArrayList wins
You are reading a large file of events, storing them, then iterating to process. The usual choice is between ArrayList and LinkedList. This one is simple. If you are mostly appending and then scanning, ArrayList is the right call.
Why. ArrayList keeps a packed array of references. Appends are amortized constant time. Iteration touches memory in order. That is cache friendly. LinkedList splits each element into a node object with two references plus the value. That adds memory overhead and extra pointer chasing during iteration. The result is slower code and a larger heap footprint.
List<String> events = new ArrayList<String>(10000);
BufferedReader reader = new BufferedReader(new FileReader("events.log"));
try {
String line;
while ((line = reader.readLine()) != null) {
events.add(line); // amortized O(1)
}
} finally {
reader.close();
}
// later
for (int i = 0; i < events.size(); i++) {
process(events.get(i));
}When would LinkedList fit. If you are doing many inserts or removes at the head or tail, or you hold a small deque and you push and pop a lot. For random access by index, LinkedList is a bad fit. Indexing into it walks half the list on average.
// A small deque. LinkedList is fine here.
Deque<String> q = new LinkedList<String>();
q.addFirst("a");
q.addLast("b");
String x = q.removeFirst();Rule of thumb: If you ever write list.get(i) inside a loop that runs often, do not use LinkedList.
Case 2. Web sessions by id with many reads: ConcurrentHashMap
You store session data by session id. A web app thread reads on every request. Some requests write. You need lookups by key with no ordering requirement. You also need it to be safe under many threads. ConcurrentHashMap is built for this.
Map<String, Session> sessions =
new ConcurrentHashMap<String, Session>(1024, 0.75f, 16);
// read
Session s = sessions.get(sessionId);
// write if absent
Session created = new Session(sessionId);
Session existing = sessions.putIfAbsent(sessionId, created);
Session use = (existing != null) ? existing : created;
// update
sessions.put(sessionId, updatedSession);
// remove
sessions.remove(sessionId);HashMap is not safe for concurrent writes. If multiple threads write to a plain HashMap, you can get lost updates and strange behavior. In old HotSpot builds you could even trigger a rehash loop. If you do not want to bring in java.util.concurrent, you can wrap a HashMap with Collections.synchronizedMap, but that puts a single lock around all operations which hurts throughput under load.
Do you need predictable iteration order for sessions, maybe to expire the oldest first. Use LinkedHashMap with access order and a tiny LRU trick. If you need thread safety, wrap it or guard it with your own lock.
final int maxEntries = 10000;
Map<String, Session> lru =
new LinkedHashMap<String, Session>(16, 0.75f, true) {
protected boolean removeEldestEntry(Map.Entry<String, Session> eldest) {
return size() > maxEntries;
}
};
Map<String, Session> safeLru = Collections.synchronizedMap(lru);If you want an LRU that scales better, take a look at Guava’s CacheBuilder. Many teams are adopting it already. If you stay on the JDK only, the LinkedHashMap trick is perfectly fine for moderate sizes.
Case 3. Sorted reports by key: TreeMap
You need to keep stats by user name and later print a report in sorted order. You might also need range views like from A to M. This is what TreeMap is for. It keeps keys in sorted order with a balanced tree. Inserts, gets, and removes are O(log n). Iteration walks from smallest to largest.
Map<String, Integer> counts = new TreeMap<String, Integer>();
add(counts, "alice");
add(counts, "bob");
add(counts, "alice");
// later, already sorted
for (Map.Entry<String, Integer> e : counts.entrySet()) {
System.out.println(e.getKey() + ": " + e.getValue());
}
void add(Map<String, Integer> m, String key) {
Integer n = m.get(key);
m.put(key, (n == null) ? 1 : n + 1);
}Need a custom sort. Pass a Comparator when you build it.
Comparator<UserId> byDomainThenName = new Comparator<UserId>() {
public int compare(UserId a, UserId b) {
int c = a.getDomain().compareTo(b.getDomain());
return (c != 0) ? c : a.getName().compareTo(b.getName());
}
};
Map<UserId, Stats> stats = new TreeMap<UserId, Stats>(byDomainThenName);If you do not need sorting, prefer HashMap. It is faster and uses less memory per entry for the same number of keys. Use LinkedHashMap if you want iteration in insertion order.
Objections and replies
But LinkedList is faster for inserts in the middle. Yes, for a list of nodes that you already navigated to with a ListIterator, and only if you do that a lot. Many apps either append or remove from the ends. If you insert in the middle by index, finding the position costs you a walk first. The win often disappears once you measure the full path.
I need a synchronized map, so I will wrap HashMap. That works for correctness, and sometimes it is enough. You will take a single lock for all operations. On a single core box or low traffic that can be fine. Under many threads on a server, ConcurrentHashMap gives better throughput because it splits operations across internal segments, so reads can proceed in parallel with many writes.
I want to keep the order I inserted items. Use LinkedHashMap for maps, and for lists think about whether order actually matters or if you just need to iterate. For predictable map iteration without sorting cost, LinkedHashMap is a great fit. For a small list where you insert in many spots, consider writing to an ArrayList and sorting later if you truly need ordering, or switch to a TreeMap if the order is by key.
What about memory. A LinkedList node adds an object per element, plus two references. A HashMap entry adds an object per key value pair plus references for key and value. ArrayList uses a single backing array of references and grows as needed. If you store millions of items, the extra objects in LinkedList and TreeMap will pressure the heap and the garbage collector will work harder. When you are near your heap limit, use ArrayList and HashMap unless ordering or special behavior is required.
I will just measure and pick the winner. Great idea, just measure the right thing. Warm up the code so the JIT can do its work. Use a loop that runs long enough to drown out noise. Call System.nanoTime around the whole operation, not each small call. Avoid printing in the inner loop. If the test is short, you can fool yourself very fast.
Why not always use TreeMap for a bit of order. Because you pay log n cost for every operation and carry a heavier node per entry. If you do not need sorted order, it is wasted work. If you need an order tied to time or access, LinkedHashMap handles that with less overhead.
We use arrays for speed, should we avoid collections. Arrays are great on the hot path. Collections give you clarity, generic types, and helpers like views and bulk operations. My rule is simple. Start with the right collection. If a profiler says this spot is hot and arrays will save the day, refactor the hot piece and keep the rest readable.
Action oriented close
Here is a quick cheat sheet to stick on your monitor:
- ArrayList: default list. Append and iterate fast. Random access is constant time.
- LinkedList: good deque. Small collections where you push and pop at ends.
- HashMap: default map. Fast, no order.
- LinkedHashMap: predictable iteration order. Can act as a simple LRU.
- TreeMap: sorted by key. Range queries and ordered reports.
- ConcurrentHashMap: many threads read and write. Better throughput than a synchronized wrapper.
To make this concrete, here is a tiny factory that bakes these choices into names. Drop it in a util package and stop guessing each time.
public final class Collections2 {
private Collections2() {}
public static <E> List<E> fastList() {
return new ArrayList<E>();
}
public static <E> Deque<E> deque() {
return new LinkedList<E>();
}
public static <K,V> Map<K,V> fastMap() {
return new HashMap<K,V>();
}
public static <K,V> Map<K,V> insertionOrderMap() {
return new LinkedHashMap<K,V>();
}
public static <K,V> Map<K,V> sortedMap(Comparator<K> cmp) {
return new TreeMap<K,V>(cmp);
}
public static <K,V> ConcurrentMap<K,V> concurrentMap() {
return new ConcurrentHashMap<K,V>();
}
public static <K,V> Map<K,V> lruCache(final int maxEntries) {
return Collections.synchronizedMap(
new LinkedHashMap<K,V>(16, 0.75f, true) {
protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
return size() > maxEntries;
}
}
);
}
}On Monday, try this:
- Scan your code for LinkedList. If you use it as a random access list, switch to ArrayList.
- Find any shared HashMap across threads. Replace with ConcurrentHashMap or guard it properly.
- If you manually maintain order while using HashMap, check if LinkedHashMap or TreeMap would simplify the code.
- Add initial capacities when you know sizes. That avoids growth copies in ArrayList and rehash in HashMap.
The Java Collections Framework gives you the right tools. The hard part is picking with intent. Start with a clear question. What am I storing. How do I read it. Do I care about order. Who touches it at the same time. Match that to a list or a map with the right shape. Your code gets simpler, the heap stays calmer, and your graphs will thank you.
Timeless lesson: default to ArrayList and HashMap, reach for LinkedHashMap when you want order, choose TreeMap when you need sorted keys, and bring in ConcurrentHashMap when threads enter the room.