Backend Development · 20 min read

Diagnosing Slow Asynchronous Log Printing in Java: From Mapping Overhead to Disruptor and JNI Analysis

This article investigates why a large number of quality‑check mappings cause severe latency in a Java service, traces the slowdown to excessive asynchronous log printing, explains the Disruptor‑based async logger internals, evaluates stack‑trace location handling and JNI overhead, and proposes practical configuration and code‑level fixes.

Top Architect

1 Problem Background

Our business integrates multiple channels for online recycling, each with its own quality‑check standards. We unified external standards into internal ones via a mapping feature, but after importing a large mapping set the interface became extremely slow, eventually traced to log printing.

2 Problem Verification

In a test environment we maintained over 15,000 channel‑to‑internal mapping entries, and the template conversion interface took about 4 seconds. Reducing the mapping to around 100 entries dropped the latency to roughly 100 ms, confirming a strong correlation between mapping size and execution time.

3 Initial Diagnosis

Using Alibaba's Arthas trace command we found the method’s execution time was dominated by log printing. The mapping process generates massive logs, and the more mappings, the more logs are produced. Commenting out the log‑printing code dramatically improved response speed.

4 Initial Thoughts

4.1 Could the log configuration be synchronous?

Inspection of Log4j2 configuration showed that logging is asynchronous, not synchronous.
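For context, such a configuration has roughly the following shape (the logger and file names here are illustrative, not our actual config). In Log4j2, `<AsyncLogger>` elements route events through a Disruptor ring buffer instead of appending in the caller's thread:

```xml
<Configuration status="WARN">
  <Appenders>
    <File name="file" fileName="app.log">
      <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    </File>
  </Appenders>
  <Loggers>
    <!-- Events from this logger are enqueued to the Disruptor, not written inline -->
    <AsyncLogger name="com.example" level="info" includeLocation="false">
      <AppenderRef ref="file"/>
    </AsyncLogger>
    <Root level="info">
      <AppenderRef ref="file"/>
    </Root>
  </Loggers>
</Configuration>
```

The includeLocation attribute turns out to matter later in this story.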

4.2 Could multithreaded resource contention cause the slowdown?

A single‑thread test printing 20,000 logs took about 600 ms, while submitting the same number of tasks to a thread pool completed in 20‑30 ms, indicating that multithreading is not the culprit.

@Test
public void test4() { // JUnit will not run a private test method
    long start = System.currentTimeMillis();
    for (int i = 0; i < 20000; i++) {
        log.info(String.valueOf(i));
    }
    System.out.println("Print log time: " + (System.currentTimeMillis() - start));
}
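The thread-pool comparison can be reproduced with a self-contained sketch; a counter stands in for log.info so it runs without a logging dependency, and the point is the same: submit() returns almost immediately, while the actual work drains in the background.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPoolSubmitTest {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger executed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 20_000; i++) {
            // In the original test this was log.info(String.valueOf(i));
            // a counter keeps the sketch dependency-free.
            pool.submit(executed::incrementAndGet);
        }
        // Measures only submission: the caller is not blocked by the work itself.
        System.out.println("Submit time: " + (System.currentTimeMillis() - start) + " ms");
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("Tasks executed: " + executed.get());
    }
}
```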

We also considered whether the async logger’s internal queue could discard tasks when full, similar to a thread‑pool rejection policy.

5 Asynchronous Log Printing Principle

5.1 Disruptor Principle

The async logger uses the LMAX Disruptor framework. Producers write events into a ring buffer, consumers read from it, and sequences coordinate production and consumption.

public synchronized void start() {
    // ...
    disruptor.handleEventsWith(handlers);
    // ...
}
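The sequence-based coordination can be illustrated with a toy single-producer, single-consumer ring buffer. This is a teaching sketch of the idea, not the real Disruptor API, which adds lock-free multi-producer publishing, cache-line padding, and configurable wait strategies:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ToyRingBuffer {
    private final String[] slots;
    private final int mask; // size must be a power of two so (seq & mask) wraps
    private final AtomicLong producerSeq = new AtomicLong(-1); // last published slot
    private final AtomicLong consumerSeq = new AtomicLong(-1); // last consumed slot

    public ToyRingBuffer(int sizePowerOfTwo) {
        slots = new String[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    /** Like tryEnqueue: returns false instead of blocking when the buffer is full. */
    public boolean tryPublish(String event) {
        long next = producerSeq.get() + 1;
        if (next - consumerSeq.get() > slots.length) {
            return false; // publishing would overwrite an unconsumed slot
        }
        slots[(int) (next & mask)] = event;
        producerSeq.set(next); // publish: the consumer may now read this slot
        return true;
    }

    /** Consumer side: returns null once it has caught up with the producer. */
    public String poll() {
        long next = consumerSeq.get() + 1;
        if (next > producerSeq.get()) {
            return null; // nothing published yet
        }
        String event = slots[(int) (next & mask)];
        consumerSeq.set(next);
        return event;
    }

    public static void main(String[] args) {
        ToyRingBuffer rb = new ToyRingBuffer(4);
        while (rb.tryPublish("event")) { } // fill until the producer is refused
        System.out.println("consumed: " + rb.poll());
    }
}
```

The producer only advances its sequence after writing the slot, and the consumer only reads slots at or below the producer sequence; that ordering is what lets the real Disruptor avoid locks on the hot path.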

5.2 Process of Putting Log Events into the Disruptor Ring Buffer

private void logToAsyncDelegate(final LogEvent event) {
    if (!isFiltered(event)) {
        populateLazilyInitializedFields(event);
        if (!delegate.tryEnqueue(event, this)) {
            handleQueueFull(event);
        }
    }
}

The tryEnqueue method attempts to publish the event; if the ring buffer is full it returns false and a queue‑full handling strategy (ENQUEUE, SYNCHRONOUS, DISCARD) is applied.
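The strategy is tunable. Assuming a reasonably recent Log4j2, the queue-full policy can be selected via system properties (property names per the Log4j2 manual; verify against your version):

```properties
# Default behavior routes events synchronously in the calling thread when the
# ring buffer is full; "Discard" drops low-level events instead:
log4j2.asyncQueueFullPolicy=Discard
# With the Discard policy, events below this level are dropped when full:
log4j2.discardThreshold=INFO
```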

5.3 Using the Test Code to Debug

Debugging showed that the consumer processes events quickly and handleQueueFull is never invoked, so the slowdown is not caused by the Disruptor’s discard policy.

6 Locating the Slow Code

Step‑by‑step commenting identified the getLocation method as the bottleneck. This method retrieves class, method, line, and file information for each log entry, which involves building a StackTraceElement array.

private StackTraceElement getLocation(String fqcn) {
    return requiresLocation() ? StackLocatorUtil.calcLocation(fqcn) : null;
}

Further analysis showed that StackLocatorUtil.calcLocation creates a new Throwable and iterates over its stack trace, which triggers native methods and incurs significant overhead.
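The cost is easy to reproduce without Log4j2 at all: capturing a stack trace with a plain Throwable is essentially what calcLocation triggers per event. A minimal sketch (absolute numbers vary by JVM and stack depth):

```java
public class StackTraceCost {
    public static void main(String[] args) {
        int iterations = 20_000;
        long start = System.nanoTime();
        long frames = 0;
        for (int i = 0; i < iterations; i++) {
            // fillInStackTrace() runs natively in the Throwable constructor;
            // getStackTrace() then materializes StackTraceElement objects.
            frames += new Throwable().getStackTrace().length;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(iterations + " stack captures, " + frames
                + " frames, " + elapsedMs + " ms");
    }
}
```

Note that the cost scales with stack depth, so a deep framework call stack (Spring, servlet filters, and so on) makes each capture considerably more expensive than this flat main method suggests.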

6.1 JNI Overhead Hypothesis

Because stack‑trace generation ultimately calls native C++ code via JNI, we hypothesized that the JNI transition could be the performance killer.

public native int helloFromCpp(int i);

static {
    // Load the compiled native library from the working directory
    System.load(System.getProperty("user.dir") + "/libnative.dylib");
}

Benchmarking a native call loop showed negligible cost (≈50 ms for 20,000 × 32 calls), disproving the JNI hypothesis.

6.2 Official Data

Documentation indicates that enabling location information can make logging 30‑100× slower compared to disabling it.

6.3 JNI Principles (Brief)

JNI bridges Java and native code, handling class/method lookup, data type conversion, exception handling, and thread attachment.

Java code calls native methods.

Header files generated by javac -h define the native signatures.

Native implementation provides the actual logic.

7 Conclusion

Avoid excessive unnecessary logs; set observation logs to DEBUG level.

Production environments should log at INFO level.

If location information ( %C, %F, %l, %L, %M ) is not needed, remove those patterns from the log format.
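For example, a pattern built only from non-location converters avoids the per-event stack walk entirely:

```xml
<!-- No %C/%F/%l/%L/%M: the logger never needs to capture a stack trace -->
<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
```

For async loggers, also leave includeLocation at its default of false; setting it to true forces location capture even when the layout is later changed.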

8 Extension (C‑level Implementation)

For readers interested in the C implementation, refer to the OpenJDK source files:

jdk/src/share/native/java/lang/Throwable.c

hotspot/src/share/vm/prims/jvm.cpp

hotspot/src/share/vm/classfile/javaClasses.cpp


Tags: Java, performance optimization, Disruptor, JNI, asynchronous logging