Performance Comparison of Different Java List Deduplication Methods
This article examines several Java deduplication techniques (List.contains, HashSet, double-loop removal, and Stream.distinct) by providing sample code, measuring execution time on a 20,000-element list, and analyzing their time complexities to guide developers toward efficient duplicate-removal strategies.
The author creates a test list of 20,000 strings in which each of 10,000 distinct values appears exactly twice, then demonstrates four ways to remove duplicates in Java.
```java
public static List<String> getTestList() {
    List<String> list = new ArrayList<>();
    for (int i = 1; i <= 10000; i++) {
        list.add(String.valueOf(i));
    }
    for (int i = 10000; i >= 1; i--) {
        list.add(String.valueOf(i));
    }
    return list;
}
```
**Method 1 – Using list.contains**
```java
private static void useContainDistinct(List<String> testList) {
    System.out.println("contains dedup started, count: " + testList.size());
    List<String> result = new ArrayList<>();
    for (String str : testList) {
        if (!result.contains(str)) {
            result.add(str);
        }
    }
    System.out.println("contains dedup finished, count: " + result.size());
}
```
Timing code:
```java
public static void main(String[] args) {
    List<String> testList = getTestList();
    StopWatch sw = new StopWatch(); // org.springframework.util.StopWatch
    sw.start();
    useContainDistinct(testList);
    sw.stop();
    System.out.println("Dedup total time: " + sw.getTotalTimeMillis());
}
```
Result: the contains approach is very slow (O(n²) behavior) and the author recommends not using it.
**Method 2 – Using HashSet**
```java
private static void useSetDistinct(List<String> testList) {
    System.out.println("HashSet dedup started, count: " + testList.size());
    List<String> result = new ArrayList<>(new HashSet<>(testList));
    System.out.println("HashSet dedup finished, count: " + result.size());
}
```
Timing code is analogous to the previous main method, calling useSetDistinct. The HashSet approach runs in O(n) time (average O(1) per insertion) and is recommended.
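For readers without the Spring StopWatch on the classpath, the same measurement can be sketched with plain System.nanoTime. This is a minimal standalone sketch; the class name DedupTiming is illustrative, and the dedup helper is simplified to return its result:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DedupTiming {
    // Same shape as the article's test data: 1..10000 followed by 10000..1,
    // so every value appears exactly twice (20,000 elements, 10,000 unique).
    static List<String> getTestList() {
        List<String> list = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) list.add(String.valueOf(i));
        for (int i = 10000; i >= 1; i--) list.add(String.valueOf(i));
        return list;
    }

    // HashSet-based dedup, returning the deduplicated list.
    static List<String> useSetDistinct(List<String> testList) {
        return new ArrayList<>(new HashSet<>(testList));
    }

    public static void main(String[] args) {
        List<String> testList = getTestList();
        long start = System.nanoTime();
        List<String> result = useSetDistinct(testList);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Dedup finished, count: " + result.size()
                + ", took " + elapsedMs + " ms");
    }
}
```

Note that nanoTime measures elapsed wall-clock time for a single run; for serious benchmarking, a harness such as JMH that handles JIT warmup would give more trustworthy numbers.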
**Method 3 – Double‑for‑loop removal**
```java
private static void use2ForDistinct(List<String> testList) {
    System.out.println("list double-loop dedup started, count: " + testList.size());
    for (int i = 0; i < testList.size(); i++) {
        for (int j = i + 1; j < testList.size(); j++) {
            if (testList.get(i).equals(testList.get(j))) {
                testList.remove(j);
                j--; // step back so the element shifted into slot j is not skipped
            }
        }
    }
    System.out.println("list double-loop dedup finished, count: " + testList.size());
}
```
This method is the slowest (O(n²)) and produces messy code; the author advises against using it.
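Beyond being slow, the remove-inside-loop pattern is also a classic bug source: after testList.remove(j), the next element shifts into slot j, so unless j is stepped back, adjacent duplicates are skipped. A minimal illustration (the class and helper names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveInLoopDemo {
    // Dedup WITHOUT stepping j back after remove: adjacent duplicates survive.
    static List<String> dedupBuggy(List<String> list) {
        List<String> copy = new ArrayList<>(list);
        for (int i = 0; i < copy.size(); i++) {
            for (int j = i + 1; j < copy.size(); j++) {
                if (copy.get(i).equals(copy.get(j))) {
                    copy.remove(j); // next element shifts into slot j, then j++ skips it
                }
            }
        }
        return copy;
    }

    // Corrected: decrement j after removal so no element is skipped.
    static List<String> dedupFixed(List<String> list) {
        List<String> copy = new ArrayList<>(list);
        for (int i = 0; i < copy.size(); i++) {
            for (int j = i + 1; j < copy.size(); j++) {
                if (copy.get(i).equals(copy.get(j))) {
                    copy.remove(j);
                    j--;
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("a", "a", "a");
        System.out.println(dedupBuggy(in)); // [a, a] -- one duplicate survives
        System.out.println(dedupFixed(in)); // [a]
    }
}
```

The article's test data happens to contain each value only twice with duplicates far apart, so the flaw does not show up there, but on input like the three-element list above it silently leaves duplicates behind.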
**Method 4 – Java 8 Stream distinct()**
```java
private static void useStreamDistinct(List<String> testList) {
    System.out.println("stream dedup started, count: " + testList.size());
    List<String> result = testList.stream().distinct().collect(Collectors.toList());
    System.out.println("stream dedup finished, count: " + result.size());
}
```
The Stream approach offers concise code and acceptable performance, though not as fast as the HashSet method.
**Complexity analysis**: list.contains internally uses indexOf, which scans the list in O(n) per check, giving O(n²) overall; HashSet.add hashes the element and inserts in O(1) average time, giving O(n) overall; Stream.distinct also relies on a hash-based set internally, so it is likewise O(n) overall, though with somewhat higher constant overhead than building a HashSet directly.
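One practical difference worth noting: new HashSet<>(list) does not preserve the original element order, while Stream.distinct on an ordered stream does (each element's first occurrence keeps its position). When order matters but set-based speed is wanted, a LinkedHashSet is a common middle ground. A small sketch, with an illustrative class name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderPreservingDedup {
    // LinkedHashSet keeps insertion order while still giving O(1) average
    // lookups, so the result preserves each element's first-occurrence position.
    static List<String> dedupKeepOrder(List<String> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("3", "1", "2", "1", "3");
        System.out.println(dedupKeepOrder(in)); // [3, 1, 2]
    }
}
```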
**Conclusion**: For large collections, prefer HashSet‑based deduplication (or Stream.distinct for readability). The contains and double‑loop methods are inefficient and should be avoided.