Optimizing Log4j2 Asynchronous Logging: Configuration, Diagnosis, and Load‑Testing

This article presents a detailed case study of a severe service outage caused by Log4j2 asynchronous logging bottlenecks, explains step‑by‑step diagnostics of JVM, disk, and RingBuffer metrics, and demonstrates how adjusting log4j2.asyncQueueFullPolicy and log4j2.discardThreshold dramatically improves recovery time during load testing.

JD Tech Talk

Dec 19, 2024

The incident began on 2023‑12‑15, when a critical RPC service’s availability dropped from 100% to 0.72% after a sudden surge of error logs. Investigation traced the problem to the failure of a dependent system and the logging overload that followed.

Initial checks of GC activity showed increased Young GC counts but no Full GC, while CPU and thread usage remained within acceptable limits. Disk I/O metrics appeared normal, though brief spikes in usage were observed.

Deep analysis of the client’s JVM memory dump revealed many threads blocked in the Log4j2 enqueue method and a RingBuffer of about 1.61 GB filled with WARN and ERROR events. Once the RingBuffer was full, the AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull behavior made the logging threads contend for a single lock when enqueuing, turning parallel processing into a serial bottleneck.

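Two Log4j2 settings turned out to be key. log4j2.asyncQueueFullPolicy=Discard discards log events when the queue is full instead of blocking the calling thread. log4j2.discardThreshold=ERROR sets the cut-off for that policy: while the queue is full, events at ERROR level or less severe (WARN, INFO, DEBUG, TRACE) are discarded, and only more severe events still wait for space.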
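As a minimal sketch of how these two settings might be applied programmatically, the class below sets them as JVM system properties. The class name AsyncLoggingBootstrap and its apply method are hypothetical; only the two property names and their values come from the article. It assumes the call happens before any Log4j2 logger is created, since properties set later may not be picked up.

```java
/** Hypothetical bootstrap helper; not part of Log4j2. */
final class AsyncLoggingBootstrap {

    // Property names as given in the article.
    static final String QUEUE_FULL_POLICY = "log4j2.asyncQueueFullPolicy";
    static final String DISCARD_THRESHOLD = "log4j2.discardThreshold";

    /** Sets both properties on the running JVM. */
    static void apply(String policy, String threshold) {
        System.setProperty(QUEUE_FULL_POLICY, policy);
        System.setProperty(DISCARD_THRESHOLD, threshold);
    }

    public static void main(String[] args) {
        // Assumption: this runs before the first Logger is obtained.
        apply("Discard", "ERROR");
        System.out.println(QUEUE_FULL_POLICY + "=" + System.getProperty(QUEUE_FULL_POLICY));
        System.out.println(DISCARD_THRESHOLD + "=" + System.getProperty(DISCARD_THRESHOLD));
    }
}
```

The same values can be passed on the command line as -Dlog4j2.asyncQueueFullPolicy=Discard -Dlog4j2.discardThreshold=ERROR, which avoids any dependence on initialization order.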

Load‑testing verified the impact of different log4j2.discardThreshold values:

# Example configuration
log4j2.discardThreshold=INFO

With INFO, the client required a restart to recover availability. Setting the threshold to WARN reduced recovery time to about 8 minutes, while ERROR or FATAL restored full availability within 2 minutes without manual intervention.
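One way to see why the threshold matters during an error storm is a toy model of the queue-full decision. The sketch below is not Log4j2's implementation: the class DiscardPolicyModel, its Route enum, and the use of an ArrayBlockingQueue in place of the Disruptor RingBuffer are all illustrative assumptions. It only shows that, with a full queue, an ERROR event would still wait when the threshold is INFO but is dropped when the threshold is ERROR.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy model of a discard-on-full policy; not Log4j2's actual code. */
final class DiscardPolicyModel {

    /** Levels ordered from most to least severe. */
    enum Level { FATAL, ERROR, WARN, INFO, DEBUG, TRACE }

    /** What happens to a single event in this model. */
    enum Route { ENQUEUED, DISCARDED, WOULD_BLOCK }

    private final BlockingQueue<String> queue;
    private final Level threshold;

    DiscardPolicyModel(int capacity, Level threshold) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.threshold = threshold;
    }

    /** Decides the route for one event given the current queue state. */
    Route log(Level level, String message) {
        if (queue.offer(message)) {
            return Route.ENQUEUED; // space available: normal asynchronous path
        }
        // Queue is full: drop events at the threshold or less severe.
        if (level.ordinal() >= threshold.ordinal()) {
            return Route.DISCARDED;
        }
        // More severe than the threshold: a real producer thread would wait here.
        return Route.WOULD_BLOCK;
    }

    public static void main(String[] args) {
        DiscardPolicyModel withInfo = new DiscardPolicyModel(1, Level.INFO);
        withInfo.log(Level.INFO, "fills the queue");
        System.out.println(withInfo.log(Level.ERROR, "rpc failed")); // WOULD_BLOCK

        DiscardPolicyModel withError = new DiscardPolicyModel(1, Level.ERROR);
        withError.log(Level.INFO, "fills the queue");
        System.out.println(withError.log(Level.ERROR, "rpc failed")); // DISCARDED
    }
}
```

In this model, a surge of WARN and ERROR events against a full queue keeps producers waiting under an INFO threshold, which is consistent with the slower recovery reported above for the lower thresholds.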

The original article also provides a concise overview of Log4j2’s asynchronous logging architecture, which is built on the Disruptor’s RingBuffer, and a table of recommended configuration parameters (e.g., RingBuffer size, wait strategies, retry counts).

In conclusion, combining log4j2.asyncQueueFullPolicy=Discard with log4j2.discardThreshold=ERROR, avoiding blocking appenders such as KafkaAppender in production, and disabling immediate flush for batch writes ensure that logging does not become a system bottleneck under extreme load.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
