Understanding ExecutorCompletionService: Root Cause Analysis and Best Practices
This article analyzes a production outage caused by misuse of ExecutorCompletionService, explains the underlying Java concurrency mechanisms, compares it with ExecutorService, provides correct code examples, and offers practical guidelines to avoid memory leaks and improve backend reliability.
Incident Description
Starting at 6:32 am, a small number of users saw errors on the app's homepage. The errors escalated to a large-scale outage by 7:20 am and were resolved at 7:36 am.
Overall Process
At 6:58 am an alarm was triggered and the team considered rolling back recent code changes. By 7:07 am they began investigating, and at 7:36 am the rollback completed and the service recovered.
Root Cause
The problematic code used ExecutorCompletionService but never called take() or poll(). Every completed task therefore stayed in the service's internal completion queue, the queue kept growing, and the JVM eventually ran out of memory (OOM).
Incorrect code example:
// Needs: import java.util.concurrent.*;
public static void test() throws InterruptedException, ExecutionException {
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    // Missing service.take() or service.poll(): the completed Future is never removed from the queue
}
Correct usage includes retrieving results:
// Needs: import java.util.concurrent.*;
public static void test() throws InterruptedException, ExecutionException {
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    service.take().get(); // Properly consumes the completed task
}
The article then demonstrates how ExecutorService behaves when three tasks with different durations (10 s, 3 s, 6 s) are submitted. Iterating over the list of Futures in submission order blocks on the first get() until the 10 s task finishes, so the 3 s and 6 s results cannot be processed earlier even though they are already available.
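The blocking behaviour of a plain Future list can be reproduced with a small sketch. The class name FutureListDemo and the helper task() are made up for illustration, and the durations are scaled down from seconds to hundreds of milliseconds so the example finishes quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureListDemo {
    // Hypothetical helper: a task that sleeps for `millis` and then returns its name.
    static Callable<String> task(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Submits three tasks and reads the futures in submission order.
    static List<String> collectInSubmissionOrder() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<String>> futures = new ArrayList<>();
            futures.add(pool.submit(task("A", 1000))); // stands in for the 10 s task
            futures.add(pool.submit(task("B", 300)));  // stands in for the 3 s task
            futures.add(pool.submit(task("C", 600)));  // stands in for the 6 s task

            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // The first get() waits for A, although B and C finish earlier.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectInSubmissionOrder());
    }
}
```

The results always come back in submission order (A, B, C), regardless of which task finished first. Nothing can be done with B or C until A's get() returns.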
It contrasts this with ExecutorCompletionService, which returns completed tasks as soon as they finish, avoiding unnecessary blocking:
ExecutorService executorService = Executors.newCachedThreadPool();
ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(executorService);
completionService.submit(() -> { /* 10 s task */ return "Result A"; });
completionService.submit(() -> { /* 3 s task */ return "Result B"; });
completionService.submit(() -> { /* 6 s task */ return "Result C"; });
for (int i = 0; i < 3; i++) {
    // take() waits for the next task to complete, so get() does not block here
    String result = completionService.take().get();
    System.out.println(result + ", go pick them up");
}
// The original joined the current thread on itself, which never returns; shut the pool down instead
executorService.shutdown();
Why use take() then get()? take() blocks until a completed task is available and removes its Future from the queue, so the subsequent get() returns immediately with the result (or throws ExecutionException if the task failed).
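The article then dives into the source of ExecutorCompletionService. The class implements CompletionService and wraps each submitted task in a QueueingFuture, a FutureTask subclass whose done() method adds the finished task's Future to an internal BlockingQueue when the task completes. By default that queue is a LinkedBlockingQueue with no capacity limit, so nothing stops it from growing.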
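The mechanism can be illustrated with a stripped-down re-implementation. This is a sketch of the idea only, not the JDK source: the names MiniCompletionService and QueueingTask are hypothetical, and details such as Runnable submission and poll() are left out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified illustration of the completion-queue idea; NOT the JDK implementation.
class MiniCompletionService<V> {
    private final Executor executor;
    // Unbounded queue holding futures of tasks that have already finished.
    private final BlockingQueue<Future<V>> completionQueue = new LinkedBlockingQueue<>();

    MiniCompletionService(Executor executor) {
        this.executor = executor;
    }

    // FutureTask invokes done() once the task has completed or been cancelled.
    private final class QueueingTask extends FutureTask<V> {
        QueueingTask(Callable<V> callable) {
            super(callable);
        }

        @Override
        protected void done() {
            completionQueue.add(this); // stays queued until someone calls take()
        }
    }

    Future<V> submit(Callable<V> task) {
        QueueingTask future = new QueueingTask(task);
        executor.execute(future);
        return future;
    }

    // Blocks until some task has finished, then removes and returns its future.
    Future<V> take() throws InterruptedException {
        return completionQueue.take();
    }

    public static void main(String[] args) throws Exception {
        // Runnable::run is a same-thread executor, used here to keep the demo deterministic.
        MiniCompletionService<String> service = new MiniCompletionService<>(Runnable::run);
        service.submit(() -> "hello");
        System.out.println(service.take().get());
    }
}
```

The only place a finished future leaves the queue is take(). If no caller ever invokes it, every completed task remains reachable through completionQueue.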
Key takeaways:
When using ExecutorService.submit, you must track and handle each Future yourself.
CompletionService automatically tracks completed futures, allowing you to process results as soon as they finish.
In gateway RPC scenarios, using CompletionService prevents a single slow downstream call from blocking the entire request flow.
If you do not consume completed tasks (by calling take() or poll()), the internal queue retains references to their Futures and results, leading to memory leaks and eventually OOM.
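The gateway scenario and the consumption rule from the takeaways above can be combined in one sketch. It assumes a hypothetical fan-out helper, GatewayFanOut.collect, with a caller-supplied thread pool and a time budget in milliseconds; real RPC calls are replaced by plain Callables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class GatewayFanOut {
    // Runs all calls on `pool` and returns whatever finished within `budgetMillis`.
    static List<String> collect(ExecutorService pool, List<Callable<String>> calls, long budgetMillis)
            throws InterruptedException {
        // Created per request, so it is not shared across requests.
        CompletionService<String> service = new ExecutorCompletionService<>(pool);
        List<Future<String>> submitted = new ArrayList<>();
        List<String> results = new ArrayList<>();
        try {
            for (Callable<String> call : calls) {
                submitted.add(service.submit(call));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
            for (int i = 0; i < calls.size(); i++) {
                long remaining = deadline - System.nanoTime();
                Future<String> done = service.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    break; // time budget used up; stop waiting for stragglers
                }
                try {
                    results.add(done.get()); // already complete, so this does not block
                } catch (ExecutionException e) {
                    // This sketch simply skips calls that failed.
                }
            }
        } finally {
            for (Future<String> future : submitted) {
                future.cancel(true); // no effect on finished tasks; interrupts running ones
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> calls = new ArrayList<>();
            calls.add(() -> "fast");
            calls.add(() -> { Thread.sleep(5000); return "slow"; });
            System.out.println(collect(pool, calls, 500));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Using poll(timeout, unit) instead of take() bounds the total wait, so one slow downstream call cannot hold the request past its budget. Creating the completion service per request, rather than keeping one in a long-lived field, is a design choice of this sketch: leftover entries do not pile up in a queue that lives as long as the application.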
Summary and Recommendations
Before deployment, enforce strict code review, record rollback versions, and verify downgrade paths. After deployment, monitor memory growth, GC activity, thread counts, and latency percentiles (TP99/TP999). Ensure that any use of ExecutorCompletionService always removes completed tasks from the queue to avoid heap accumulation.
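One way to make heap accumulation visible is to supply the completion queue yourself. ExecutorCompletionService has a two-argument constructor that accepts a BlockingQueue, so the queue's size can be read directly. The sketch below is an assumption about how such a check could look; the class BacklogDemo and the method backlogAfter are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BacklogDemo {
    // Submits `tasks` trivial tasks and reports how many completed futures are still queued.
    static int backlogAfter(int tasks, boolean consume) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // Caller-owned queue: its size() can be exported as a monitoring metric.
        BlockingQueue<Future<String>> queue = new LinkedBlockingQueue<>();
        CompletionService<String> service = new ExecutorCompletionService<>(pool, queue);
        try {
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                service.submit(() -> "result-" + id);
            }
            if (consume) {
                for (int i = 0; i < tasks; i++) {
                    service.take().get(); // removes one completed future per call
                }
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS); // wait until every task has finished
            return queue.size();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without take(): " + backlogAfter(100, false));
        System.out.println("with take():    " + backlogAfter(100, true));
    }
}
```

Without consumption, the backlog equals the number of submitted tasks. With one take() per submit, it drops to zero. In a long-running service the same queue size could be reported alongside the memory and GC metrics mentioned above.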
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.