Performance Comparison of Different Java List Deduplication Methods
This article examines several Java deduplication techniques (List.contains, HashSet, double-loop removal, and Stream.distinct) by providing sample code, measuring execution time on a 20,000-element list, and analyzing their time complexities to guide developers toward efficient duplicate-removal strategies.
The author creates a test list of 20,000 strings in which each of 10,000 distinct values appears exactly twice, then demonstrates four ways to remove duplicates in Java.
```java
public static List<String> getTestList() {
    List<String> list = new ArrayList<>();
    for (int i = 1; i <= 10000; i++) {
        list.add(String.valueOf(i));
    }
    for (int i = 10000; i >= 1; i--) {
        list.add(String.valueOf(i));
    }
    return list;
}
```
**Method 1 – Using list.contains**
```java
private static void useContainDistinct(List<String> testList) {
    System.out.println("contains dedup started, count: " + testList.size());
    List<String> result = new ArrayList<>();
    for (String str : testList) {
        if (!result.contains(str)) {
            result.add(str);
        }
    }
    System.out.println("contains dedup finished, count: " + result.size());
}
```
Timing code:
```java
public static void main(String[] args) {
    List<String> testList = getTestList();
    StopWatch sw = new StopWatch(); // org.springframework.util.StopWatch
    sw.start();
    useContainDistinct(testList);
    sw.stop();
    System.out.println("Dedup total time: " + sw.getTotalTimeMillis());
}
```
Result: the contains approach is very slow (O(n²) behavior) and the author recommends not using it.
**Method 2 – Using HashSet**
```java
private static void useSetDistinct(List<String> testList) {
    System.out.println("HashSet dedup started, count: " + testList.size());
    List<String> result = new ArrayList<>(new HashSet<>(testList));
    System.out.println("HashSet dedup finished, count: " + result.size());
}
```
Timing code is analogous to the previous main method, calling useSetDistinct. The HashSet approach runs in O(n) time (average O(1) per insertion) and is recommended.
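For readers without the Spring StopWatch on the classpath, the same measurement can be sketched with plain System.nanoTime. This is a minimal standalone sketch; the class name DedupTiming is illustrative, and the dedup helper is simplified to return its result:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DedupTiming {
    // Same shape as the article's test data: 1..10000 followed by 10000..1,
    // so every value appears exactly twice (20,000 elements, 10,000 unique).
    static List<String> getTestList() {
        List<String> list = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) list.add(String.valueOf(i));
        for (int i = 10000; i >= 1; i--) list.add(String.valueOf(i));
        return list;
    }

    // HashSet-based dedup, returning the deduplicated list.
    static List<String> useSetDistinct(List<String> testList) {
        return new ArrayList<>(new HashSet<>(testList));
    }

    public static void main(String[] args) {
        List<String> testList = getTestList();
        long start = System.nanoTime();
        List<String> result = useSetDistinct(testList);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Dedup finished, count: " + result.size()
                + ", took " + elapsedMs + " ms");
    }
}
```

Note that nanoTime measures elapsed wall-clock time for a single run; for serious benchmarking, a harness such as JMH that handles JIT warmup would give more trustworthy numbers.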
**Method 3 – Double‑for‑loop removal**
```java
private static void use2ForDistinct(List<String> testList) {
    System.out.println("list double-loop dedup started, count: " + testList.size());
    for (int i = 0; i < testList.size(); i++) {
        for (int j = i + 1; j < testList.size(); j++) {
            if (testList.get(i).equals(testList.get(j))) {
                testList.remove(j);
                j--; // step back so the element shifted into slot j is not skipped
            }
        }
    }
    System.out.println("list double-loop dedup finished, count: " + testList.size());
}
```
This method is the slowest (O(n²)) and produces messy code; the author advises against using it.
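Beyond being slow, the remove-inside-loop pattern is also a classic bug source: after testList.remove(j), the next element shifts into slot j, so unless j is stepped back, adjacent duplicates are skipped. A minimal illustration (the class and helper names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveInLoopDemo {
    // Dedup WITHOUT stepping j back after remove: adjacent duplicates survive.
    static List<String> dedupBuggy(List<String> list) {
        List<String> copy = new ArrayList<>(list);
        for (int i = 0; i < copy.size(); i++) {
            for (int j = i + 1; j < copy.size(); j++) {
                if (copy.get(i).equals(copy.get(j))) {
                    copy.remove(j); // next element shifts into slot j, then j++ skips it
                }
            }
        }
        return copy;
    }

    // Corrected: decrement j after removal so no element is skipped.
    static List<String> dedupFixed(List<String> list) {
        List<String> copy = new ArrayList<>(list);
        for (int i = 0; i < copy.size(); i++) {
            for (int j = i + 1; j < copy.size(); j++) {
                if (copy.get(i).equals(copy.get(j))) {
                    copy.remove(j);
                    j--;
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("a", "a", "a");
        System.out.println(dedupBuggy(in)); // [a, a] -- one duplicate survives
        System.out.println(dedupFixed(in)); // [a]
    }
}
```

The article's test data happens to contain each value only twice with duplicates far apart, so the flaw does not show up there, but on input like the three-element list above it silently leaves duplicates behind.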
**Method 4 – Java 8 Stream distinct()**
```java
private static void useStreamDistinct(List<String> testList) {
    System.out.println("stream dedup started, count: " + testList.size());
    List<String> result = testList.stream().distinct().collect(Collectors.toList());
    System.out.println("stream dedup finished, count: " + result.size());
}
```
The Stream approach offers concise code and acceptable performance, though not as fast as the HashSet method.
**Complexity analysis**: list.contains internally uses indexOf, which scans the list in O(n) per check, giving O(n²) overall; HashSet.add hashes the element and inserts in O(1) average time, giving O(n) overall; Stream.distinct also relies on a hash-based set internally, so it is likewise O(n) overall, though with somewhat higher constant overhead than building a HashSet directly.
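One practical difference worth noting: new HashSet<>(list) does not preserve the original element order, while Stream.distinct on an ordered stream does (each element's first occurrence keeps its position). When order matters but set-based speed is wanted, a LinkedHashSet is a common middle ground. A small sketch, with an illustrative class name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderPreservingDedup {
    // LinkedHashSet keeps insertion order while still giving O(1) average
    // lookups, so the result preserves each element's first-occurrence position.
    static List<String> dedupKeepOrder(List<String> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("3", "1", "2", "1", "3");
        System.out.println(dedupKeepOrder(in)); // [3, 1, 2]
    }
}
```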
**Conclusion**: For large collections, prefer HashSet‑based deduplication (or Stream.distinct for readability). The contains and double‑loop methods are inefficient and should be avoided.