
Improving Java High‑Concurrency I/O Performance with Quasar Coroutines

This article explains how to replace complex Java NIO callback code with lightweight Quasar coroutines to boost throughput, simplify asynchronous programming, and reduce thread usage in high‑concurrency backend systems, while also covering integration with Netty, RPC, and handling of blocking operations.

Ctrip Technology

IO‑intensive systems under high concurrency often suffer from many blocked threads, leading to poor performance; Java NIO can alleviate this but typical callback‑based usage makes code hard to read and maintain. Recent languages such as Go and Kotlin demonstrate that coroutine‑based approaches can provide both high performance and readability.

The article shows how the open‑source Quasar coroutine framework can be used to refactor a Java NIO‑based service, addressing two main goals: (1) increase single‑machine task throughput and ensure scalability during traffic spikes, and (2) replace heavyweight asynchronous callbacks with lightweight coroutine code written in a synchronous style.

1. Java asynchronous programming and non‑blocking I/O

The original system processes front‑end HTTP requests and internal RPC calls, causing a sudden surge of threads during peak load. Netty (epoll‑based NIO) is already used, but its callback model leads to deep nesting and difficult maintenance.

Typical Java async tools (CompletableFuture, RxJava, Vert.x) still rely on chained callbacks. Consider a simple conditional flow, written first in plain blocking style:

HttpResponse a = getA();
HttpResponse b;
if (a.getBody().equals("1")) {
    b = getB1();
} else {
    b = getB2();
}
String ans = b.getBody();

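After converting the three request methods to return CompletableFuture, the same logic has to be expressed through thenCompose, with the branch buried inside a lambda. The flow becomes harder to follow, and the final get() still blocks the calling thread: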

String ans = getA()
    .thenCompose(a -> {
        if (a.getBody().equals("1")) {
            return getB1();
        } else {
            return getB2();
        }
    })
    .get()
    .getBody();

2. Coroutines

Coroutines are user‑level scheduled tasks; unlike kernel‑managed threads, many coroutines can share a few OS threads. When a coroutine suspends, its stack state is saved by the scheduler, allowing the underlying thread to execute other coroutines, dramatically reducing context‑switch overhead.
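
The idea that a waiting task does not need to hold a thread can be illustrated with the JDK alone. The sketch below is not a coroutine implementation: it assumes a 100 ms timer as a stand-in for an I/O wait and schedules each task's continuation on a two-thread pool instead of sleeping. The class and method names are made up for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FewThreadsManyTasks {
    // Runs `tasks` simulated I/O waits on two threads and returns the thread names used.
    static Set<String> run(int tasks) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        Set<String> threadsSeen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                // "Suspend" for 100 ms: the continuation is queued, no thread sleeps.
                scheduler.schedule(() -> {
                    threadsSeen.add(Thread.currentThread().getName());
                    done.countDown();
                }, 100, TimeUnit.MILLISECONDS);
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return threadsSeen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads used: " + run(10_000).size());
    }
}
```

With one blocked thread per task, the same workload would need thousands of threads; here the waits overlap on two.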

Quasar achieves this by instrumenting bytecode: it inserts a flag field and state‑machine logic so that a method can be paused and later resumed at the exact bytecode position.

Example instrumented method:

public String instrumentDemo(){
    initial();
    String ans = getFromNIO();
    return ans;
}

During the first execution the flag is 0, so initial() runs and the method reaches getFromNIO(). Quasar saves the current stack, suspends the coroutine, and returns the thread to the scheduler's pool. When the NIO operation completes, the scheduler resumes the method, which sees that the flag is 1, skips initial(), restores the saved stack, and continues at the return statement.
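
The flag-and-resume behaviour can be imitated by hand. The following sketch is only an illustration of the idea, not the bytecode Quasar actually generates: the class, its fields, and the trace strings are all invented for the example, and a null return value stands in for "suspended".

```java
class InstrumentSketch {
    private int flag = 0;   // resume point: 0 = fresh start, 1 = after getFromNIO()
    private String saved;   // stands in for the saved stack slot holding the I/O result
    final StringBuilder trace = new StringBuilder();

    // Returns null when the method "suspends", or the final result when it finishes.
    String run() {
        switch (flag) {
            case 0:
                trace.append("initial;");   // initial() runs only on the first entry
                flag = 1;                   // remember where to resume
                trace.append("suspend;");
                return null;                // give the thread back to the scheduler
            case 1:
                trace.append("resume;");    // skip initial(), continue after the I/O call
                return saved;
            default:
                throw new IllegalStateException("unknown resume point " + flag);
        }
    }

    // Called by the (simulated) scheduler when the I/O operation completes.
    void complete(String value) {
        saved = value;
    }

    public static void main(String[] args) {
        InstrumentSketch method = new InstrumentSketch();
        method.run();                 // first entry: runs initial(), then suspends
        method.complete("data");      // the I/O result arrives
        System.out.println(method.run() + " via " + method.trace);
    }
}
```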

3. System refactor with Quasar

After adding the Quasar Maven plugin, developers can create fibers (lightweight coroutines) similarly to threads:

new Fiber<Void>(() -> {
    // method body
}).start();

Integration with the Netty‑based async‑http‑client is straightforward: the future returned by the client can be converted to a CompletableFuture, which is then awaited inside a coroutine via AsyncCompletionStage.get(future):

AsyncHttpClient httpClient = Dsl.asyncHttpClient();
Request request = createRequest();
CompletableFuture<Response> future = httpClient.executeRequest(request).toCompletableFuture();
Response response = AsyncCompletionStage.get(future);
deal(response);

For RPC frameworks that expose only callback interfaces, a CompletableFuture can be created manually and completed inside the callback, then awaited in the same way:

CompletableFuture<Response> future = new CompletableFuture<>();
new RpcClient().helloAsync(request, (resp, e) -> {
    if (e == null) future.complete(resp);
    else future.completeExceptionally(e);
});
Response response = AsyncCompletionStage.get(future);

The same pattern can be extracted into a generic helper that wraps any asynchronous RPC call:

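@FunctionalInterface
private interface RpcAsyncCall<TReq, TRes> {
    void request(TReq req, Callback<TRes> cb);
}
public <TReq, TRes> TRes waitRpc(RpcAsyncCall<TReq, TRes> call, TReq req)
        throws SuspendExecution, ExecutionException, InterruptedException {
    CompletableFuture<TRes> future = new CompletableFuture<>();
    // Complete the future from the RPC callback, normally or with the error.
    call.request(req, (resp, e) -> {
        if (e == null) future.complete(resp);
        else future.completeExceptionally(e);
    });
    // Suspend the coroutine (not the thread) until the future completes.
    return AsyncCompletionStage.get(future);
}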
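
The callback-to-future bridge itself does not depend on Quasar, so it can be tried with the JDK alone. In the sketch below the Callback interface, the helloAsync stub, and the toFuture name are assumptions made for the example; the helper returns the future instead of awaiting it, and main uses join() where a fiber would call AsyncCompletionStage.get.

```java
import java.util.concurrent.CompletableFuture;

class RpcBridgeDemo {
    // Hypothetical callback shape: exactly one of result/error is non-null.
    @FunctionalInterface
    interface Callback<T> {
        void done(T result, Throwable error);
    }

    @FunctionalInterface
    interface RpcAsyncCall<TReq, TRes> {
        void request(TReq req, Callback<TRes> cb);
    }

    // Same bridge as waitRpc, minus the coroutine await at the end.
    static <TReq, TRes> CompletableFuture<TRes> toFuture(RpcAsyncCall<TReq, TRes> call, TReq req) {
        CompletableFuture<TRes> future = new CompletableFuture<>();
        call.request(req, (resp, e) -> {
            if (e == null) future.complete(resp);
            else future.completeExceptionally(e);
        });
        return future;
    }

    // Hypothetical RPC stub that answers from another thread.
    static void helloAsync(String name, Callback<String> cb) {
        new Thread(() -> {
            if (name.isEmpty()) cb.done(null, new IllegalArgumentException("empty name"));
            else cb.done("hello " + name, null);
        }).start();
    }

    public static void main(String[] args) {
        System.out.println(toFuture(RpcBridgeDemo::helloAsync, "quasar").join());
    }
}
```

Because the error branch calls completeExceptionally, a failed RPC surfaces as an exception at the await point rather than leaving the caller waiting indefinitely.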

4. Handling blocking operations

Because the coroutine scheduler’s thread pool is fixed, blocking I/O (e.g., database, message queue) should be off‑loaded to a separate scalable thread pool. A helper method submits the blocking call to an ExecutorService, completes a CompletableFuture when the call returns or fails, and the coroutine awaits that future with Quasar:

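public void waitBlocking() throws SuspendExecution, ExecutionException, InterruptedException {
    String ans = waitBlocking(this::selectFromDB);
}
private ExecutorService threadPool = Executors.newCachedThreadPool();
private <T> T waitBlocking(Supplier<T> supplier)
        throws SuspendExecution, ExecutionException, InterruptedException {
    CompletableFuture<T> future = new CompletableFuture<>();
    threadPool.submit(() -> {
        try {
            // Run the blocking call on the separate pool, not on a scheduler thread.
            future.complete(supplier.get());
        } catch (Throwable t) {
            // Propagate failures so the waiting coroutine is not suspended forever.
            future.completeExceptionally(t);
        }
    });
    return AsyncCompletionStage.get(future);
}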
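
A compact JDK-only variant of the same off-loading idea uses CompletableFuture.supplyAsync with an explicit executor, which completes the future exceptionally if the supplier throws. The pool name, the offload helper, and the selectFromDB stub below are assumptions for illustration; join() again stands in for the coroutine await.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class BlockingOffloadDemo {
    // Separate, growable pool for blocking work; daemon threads let the JVM exit.
    static final ExecutorService BLOCKING_POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true);
        return t;
    });

    // Runs the blocking call on BLOCKING_POOL and exposes the outcome as a future.
    static <T> CompletableFuture<T> offload(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, BLOCKING_POOL);
    }

    // Hypothetical blocking query, simulated with a short sleep.
    static String selectFromDB() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-1";
    }

    public static void main(String[] args) {
        System.out.println(offload(BlockingOffloadDemo::selectFromDB).join());
    }
}
```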

5. Concurrency primitives

Coroutines cannot be suspended inside a synchronized block; otherwise lock state may become inconsistent when the coroutine resumes on a different thread. Therefore, shared state touched by coroutines should be protected with non‑blocking primitives (atomic variables, concurrent collections) rather than with synchronized blocks that span a suspension point.
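
As a small illustration of that advice, the sketch below keeps shared counters with AtomicLong and ConcurrentHashMap instead of a synchronized block. The class and its methods are hypothetical, and plain threads stand in for coroutines so that the example runs on the JDK alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class LockFreeCounters {
    private final AtomicLong total = new AtomicLong();
    private final ConcurrentHashMap<String, Long> perKey = new ConcurrentHashMap<>();

    // No synchronized block: each update is a single atomic operation.
    void record(String key) {
        total.incrementAndGet();
        perKey.merge(key, 1L, Long::sum);
    }

    long total() {
        return total.get();
    }

    long count(String key) {
        return perKey.getOrDefault(key, 0L);
    }

    // Updates one shared instance from several threads and returns it.
    static LockFreeCounters hammer(int threads, int perThread) {
        LockFreeCounters counters = new LockFreeCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counters.record("shared");
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 1000).total());
    }
}
```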

6. Results and limitations

After the Quasar‑based refactor, the thread count on a 4‑core host stayed stable even when traffic surged to dozens of times its normal level, while CPU utilization rose from under 5% to 10‑60%. The overall CPU core requirement dropped to one‑fifth of the original deployment.

Limitations include the need to declare SuspendExecution manually on every method that may suspend, potential bytecode‑instrumentation bugs, and the lack of native coroutine support in the Java language itself. Looking ahead, Project Loom aims to provide lightweight virtual threads directly in the JVM.
