Understanding ExecutorCompletionService: Root Cause Analysis and Best Practices

This article analyzes a production outage caused by misuse of ExecutorCompletionService, explains the underlying Java concurrency mechanisms, compares it with ExecutorService, provides correct code examples, and offers practical guidelines to avoid memory leaks and improve backend reliability.

Top Architect

Jul 22, 2022

Incident Description

Starting at 6:32 am, a small number of users saw errors on the app's homepage. The problem escalated to a large‑scale outage by 7:20 am and was resolved at 7:36 am.

Overall Process

At 6:58 am an alarm was triggered and the team considered rolling back recent code changes. By 7:07 am they began investigating, and at 7:36 am the rollback completed and the service recovered.

Root Cause

The problematic code submitted tasks to an ExecutorCompletionService but never called take() or poll(). Completed futures therefore accumulated in the service's internal queue, and the heap eventually filled up, causing an OutOfMemoryError (OOM).

Incorrect code example:

public static void test() throws InterruptedException, ExecutionException {
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    // Missing service.take() or service.poll()
}

Correct usage includes retrieving results:

// Requires: import java.util.concurrent.*;
public static void test() throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    try {
        CompletionService<String> service = new ExecutorCompletionService<>(executor);
        service.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return "HelloWorld--" + Thread.currentThread().getName();
            }
        });
        // take() removes the completed Future from the internal queue; get() reads its result.
        service.take().get();
    } finally {
        executor.shutdown(); // let the pool threads exit so they do not outlive the call
    }
}

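The article then demonstrates how a plain ExecutorService behaves when three tasks with different durations (10 s, 3 s, 6 s) are submitted. Iterating over the Future list in submission order blocks on the first, longest task, so the 3 s and 6 s results cannot be processed until the 10 s task has finished, even though they completed earlier.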
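A minimal sketch of that behaviour follows. The durations are scaled down from seconds to hundreds of milliseconds so the example runs quickly, and the class name FutureOrderDemo, the helper submitSleeper, and the labels A, B, C are illustrative assumptions rather than code from the original incident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FutureOrderDemo {

    // Submits a task that sleeps for the given time and then returns its label.
    static Future<String> submitSleeper(ExecutorService pool, String label, long millis) {
        return pool.submit(() -> {
            TimeUnit.MILLISECONDS.sleep(millis);
            return label;
        });
    }

    // Returns the results in the order in which the caller received them.
    static List<String> collectInSubmissionOrder() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<String>> futures = new ArrayList<>();
            futures.add(submitSleeper(pool, "A", 1000)); // stands in for the 10 s task
            futures.add(submitSleeper(pool, "B", 300));  // stands in for the 3 s task
            futures.add(submitSleeper(pool, "C", 600));  // stands in for the 6 s task

            List<String> received = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() on A blocks until A is done, although B and C finished earlier.
                received.add(future.get());
            }
            return received;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectInSubmissionOrder()); // prints [A, B, C]
    }
}
```

The caller always sees A first because the loop follows submission order, not completion order.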

It contrasts this with ExecutorCompletionService, which returns completed tasks as soon as they finish, avoiding unnecessary blocking:

ExecutorService executorService = Executors.newCachedThreadPool();
ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(executorService);
completionService.submit(() -> { TimeUnit.SECONDS.sleep(10); return "Result A"; });
completionService.submit(() -> { TimeUnit.SECONDS.sleep(3); return "Result B"; });
completionService.submit(() -> { TimeUnit.SECONDS.sleep(6); return "Result C"; });
// take() hands back futures in completion order: B, then C, then A.
for (int i = 0; i < 3; i++) {
    String result = completionService.take().get();
    System.out.println(result + ", go pick him up");
}
executorService.shutdown(); // all results consumed; let the idle pool threads exit

Why use take() then get()? take() blocks until a completed task is available, guaranteeing that the subsequent get() returns immediately without further waiting.
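take() waits indefinitely for the next completed task. Where the wait has to be bounded, CompletionService also provides poll(long, TimeUnit), which returns null if nothing completes within the timeout. The following is a sketch under assumptions: the class name PollWithDeadlineDemo, the method collectWithin, the task labels, and the scaled-down durations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PollWithDeadlineDemo {

    // Collects whatever finishes before the overall deadline; slower tasks are cancelled.
    static List<String> collectWithin(long deadlineMillis) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<String> service = new ExecutorCompletionService<>(pool);
        List<Future<String>> submitted = new ArrayList<>();
        submitted.add(service.submit(() -> { TimeUnit.MILLISECONDS.sleep(5000); return "slow"; }));
        submitted.add(service.submit(() -> { TimeUnit.MILLISECONDS.sleep(100); return "fast"; }));
        submitted.add(service.submit(() -> { TimeUnit.MILLISECONDS.sleep(300); return "medium"; }));

        List<String> results = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
        try {
            for (int i = 0; i < submitted.size(); i++) {
                long remaining = deadline - System.nanoTime();
                // poll returns null when nothing completes in the remaining time.
                Future<String> done = service.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    break;
                }
                results.add(done.get()); // already complete, so this does not block
            }
        } finally {
            for (Future<String> future : submitted) {
                future.cancel(true); // no effect on tasks that already completed
            }
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectWithin(1500)); // prints [fast, medium]
    }
}
```

Tracking one overall deadline, rather than passing a fresh timeout to every poll call, keeps the total wait bounded no matter how many tasks were submitted.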

The article dives into the source of ExecutorCompletionService. The class implements CompletionService and keeps completed futures in a BlockingQueue, which by default is an unbounded LinkedBlockingQueue. Each submitted task is wrapped in a QueueingFuture, a FutureTask subclass whose done() method adds the task's future to that queue once the task finishes.
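The mechanism can be illustrated with a heavily simplified re-implementation. This is not the JDK source: MiniCompletionService and pendingResults are hypothetical names, and the sketch overrides done() directly on the task's own FutureTask, whereas the JDK uses a separate QueueingFuture wrapper.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified illustration of the mechanism; not the JDK implementation.
class MiniCompletionService<V> {
    private final Executor executor;
    // Unbounded queue: futures stay here until a caller takes them out.
    private final BlockingQueue<Future<V>> completionQueue = new LinkedBlockingQueue<>();

    MiniCompletionService(Executor executor) {
        this.executor = executor;
    }

    Future<V> submit(Callable<V> task) {
        // done() is the FutureTask hook invoked once the task completes,
        // fails, or is cancelled; here it publishes the future to the queue.
        FutureTask<V> future = new FutureTask<V>(task) {
            @Override
            protected void done() {
                completionQueue.add(this);
            }
        };
        executor.execute(future);
        return future;
    }

    Future<V> take() throws InterruptedException {
        return completionQueue.take(); // blocks until some task has finished
    }

    // Number of finished futures that nobody has consumed yet.
    int pendingResults() {
        return completionQueue.size();
    }
}
```

Because nothing other than take() ever removes an entry, every completed future that is not consumed stays reachable for as long as the service object itself is reachable.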

Key takeaways:

When using ExecutorService.submit, you must handle each Future individually. CompletionService automatically tracks completed futures, allowing you to process results as soon as they finish.

In gateway RPC scenarios, CompletionService lets the gateway handle each downstream response as soon as it arrives, so a single slow downstream call does not hold up the processing of results that are already available.

If you do not consume completed tasks by calling take() or poll(), the internal queue retains references to their futures and results, leading to memory leaks and eventually OOM.
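The last takeaway can be observed directly by passing a queue of your own to the two-argument ExecutorCompletionService constructor and checking its size. This is a sketch under assumptions: the class name CompletionQueueGrowthDemo, the method retainedAfter, the task count, and the 1 KB payload are illustrative, not taken from the incident.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class CompletionQueueGrowthDemo {

    // Submits `tasks` jobs, optionally drains the results, and reports how many
    // completed futures are still held by the completion queue afterwards.
    static int retainedAfter(int tasks, boolean drain) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Supplying the queue ourselves makes its size observable.
        BlockingQueue<Future<byte[]>> queue = new LinkedBlockingQueue<>();
        CompletionService<byte[]> service = new ExecutorCompletionService<>(pool, queue);

        for (int i = 0; i < tasks; i++) {
            service.submit(() -> new byte[1024]); // each result stays reachable via its Future
        }
        if (drain) {
            for (int i = 0; i < tasks; i++) {
                service.take().get(); // removes one completed future from the queue
            }
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return queue.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("not drained: " + retainedAfter(1000, false)); // prints not drained: 1000
        System.out.println("drained: " + retainedAfter(1000, true));      // prints drained: 0
    }
}
```

The retained futures only become garbage once the queue itself is unreachable, so a long-lived service shared across requests keeps accumulating them for as long as nothing drains it.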

Summary and Recommendations

Before deployment, enforce strict code review, record rollback versions, and verify downgrade paths. After deployment, monitor memory growth, GC activity, thread counts, and latency percentiles (TP99/TP999). Ensure that any use of ExecutorCompletionService always removes completed tasks from the queue to avoid heap accumulation.
