How to Implement Cross-Thread Log Sampling in High-Throughput Java Services
Log volume can cripple high-throughput Java systems, especially during traffic spikes. This article explores practical strategies for balancing observability with performance: request-level log sampling, cross-thread propagation of the sampling flag, and component-based APIs, with code examples and implementation considerations.
1. Background
System logs are essential for tracing user actions and diagnosing issues, but excessive logging in high-throughput JD services can severely impact performance and disk I/O, especially during large promotions.
Because there is no unified logging framework across the group, many teams wrap log4j or logback and add dynamic downgrade features via the configuration center, which reduces log volume but also loses operation traceability.
We therefore propose sampling requests so that log output tracks system load: lower the sampling rate when throughput is high, and raise it when load is low.
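The idea of tying the sampling rate to load can be sketched as a simple interpolation. This is only an illustration: the class name `LoadBasedRate`, the QPS thresholds, and the linear mapping are all assumptions, not something described in the original system.

```java
// Hypothetical helper: maps current throughput to a sampling percentage.
final class LoadBasedRate {
    private LoadBasedRate() {}

    /**
     * Returns maxPercent at or below lowQps, minPercent at or above highQps,
     * and a linearly interpolated value in between.
     * The thresholds and the linear shape are illustrative assumptions.
     */
    static int percentFor(long qps, long lowQps, long highQps,
                          int minPercent, int maxPercent) {
        if (qps <= lowQps) {
            return maxPercent;
        }
        if (qps >= highQps) {
            return minPercent;
        }
        double fraction = (double) (qps - lowQps) / (highQps - lowQps);
        return (int) Math.round(maxPercent - fraction * (maxPercent - minPercent));
    }
}
```

In practice the thresholds would likely come from the configuration center so they can be adjusted without a redeploy.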
2. Main Content
Ⅰ
At the request entry point, a sampling algorithm decides whether to sample the request, and Java's ThreadLocal can store the sampling flag for the current thread. However, logs often originate from child threads, thread-pool threads, or nested threads, making simple ThreadLocal insufficient.
Key challenge: propagate the sampling identifier across threads to ensure consistent request-level sampling.
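To make the problem concrete, the sketch below stores the sampling flag in a plain ThreadLocal and then checks what a thread-pool thread sees. The class names `SamplingContext` and `ThreadLocalLossDemo` are hypothetical and used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for the per-request sampling decision.
final class SamplingContext {
    // A plain ThreadLocal: each thread sees only the value it set itself.
    private static final ThreadLocal<Boolean> SAMPLED =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private SamplingContext() {}

    static void set(boolean sampled) { SAMPLED.set(sampled); }
    static boolean isSampled() { return SAMPLED.get(); }
    static void clear() { SAMPLED.remove(); }
}

final class ThreadLocalLossDemo {
    private ThreadLocalLossDemo() {}

    // Marks the calling thread as sampled, then asks a pool thread what it sees.
    static boolean flagSeenByPoolThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        SamplingContext.set(true);
        try {
            Callable<Boolean> probe = SamplingContext::isSampled;
            return pool.submit(probe).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            SamplingContext.clear();
            pool.shutdown();
        }
    }
}
```

Here `flagSeenByPoolThread()` returns false even though the request thread set the flag to true, because the pool thread reads its own initial value. Logs written by that task would therefore ignore the request's sampling decision.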
Possible approaches:
Wrap thread-pool or task classes (Runnable, Callable), similar to Transmittable Thread Local, as JD utilities such as pfinder or jade do.
Explicitly pass the sampling flag in code wherever asynchronous execution occurs.
Encapsulate request-thread control logic into a reusable component that exposes an API; business systems can decide whether to use the API for asynchronous threads.
Without a group-wide API, the third approach is pragmatic: integrate a component for request-thread log sampling and extend its API to coordinate sampling across asynchronous threads, modifying code only where necessary.
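A minimal sketch of what such an extension API might look like, assuming the component keeps the flag in a ThreadLocal. The names `SampleFlag` and `SamplingTasks` are hypothetical; the actual component's API is not shown in this article.

```java
import java.util.concurrent.Callable;

// Hypothetical per-thread flag owned by the sampling component.
final class SampleFlag {
    private static final ThreadLocal<Boolean> SAMPLED =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private SampleFlag() {}

    static void set(boolean sampled) { SAMPLED.set(sampled); }
    static boolean get() { return SAMPLED.get(); }
}

// Hypothetical extension API: business code wraps tasks only where it needs to.
final class SamplingTasks {
    private SamplingTasks() {}

    static Runnable wrap(Runnable task) {
        // Captured on the submitting (request) thread, at wrap time.
        final boolean captured = SampleFlag.get();
        return () -> {
            // Runs on the executing thread: install the flag, then restore.
            boolean previous = SampleFlag.get();
            SampleFlag.set(captured);
            try {
                task.run();
            } finally {
                SampleFlag.set(previous);
            }
        };
    }

    static <T> Callable<T> wrapCallable(Callable<T> task) {
        final boolean captured = SampleFlag.get();
        return () -> {
            boolean previous = SampleFlag.get();
            SampleFlag.set(captured);
            try {
                return task.call();
            } finally {
                SampleFlag.set(previous);
            }
        };
    }
}
```

Capturing the flag at wrap time rather than at run time is the key design choice: the task carries the decision of the request that created it, no matter which pool thread eventually runs it. Restoring the previous value in `finally` keeps a reused pool thread from leaking one request's flag into the next task.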
Guidelines:
If a system uses few asynchronous threads, simply integrate the component for request-thread sampling.
If a system heavily relies on async threads, adopt the component and extend the API at critical points to maintain sampling consistency.
AOP could be considered for new projects, but retrofitting existing code may be costly.
Ⅱ
Additional considerations include:
Choice of sampling algorithm (random vs. traceId modulo).
Scenario-based sampling tied to request parameters.
API design that minimizes business impact.
TraceId consistency across asynchronous logs.
Granularity of sampling rates and hierarchical sampling settings.
Error-only vs. info-plus-error logging strategies.
Mechanisms to throttle massive error logs and to link the log rate to disk I/O.
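The first consideration, random versus traceId-modulo sampling, can be compared with a short sketch. The class `SamplingDecider`, the percent-based rate, and the use of `String.hashCode` to bucket the traceId are assumptions made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical decision helper; percent is expected to be in [0, 100].
final class SamplingDecider {
    private SamplingDecider() {}

    // Random sampling: each call is independent, so two services handling
    // the same request may reach different decisions.
    static boolean sampleRandomly(int percent) {
        return ThreadLocalRandom.current().nextInt(100) < percent;
    }

    // TraceId-modulo sampling: the same traceId always falls in the same
    // bucket, so every caller using the same percent reaches the same decision.
    static boolean sampleByTraceId(String traceId, int percent) {
        int bucket = Math.floorMod(traceId.hashCode(), 100);
        return bucket < percent;
    }
}
```

Random sampling needs no traceId but gives no cross-service consistency. The modulo variant is deterministic, which helps keep a sampled request's logs complete along the whole call chain, provided all participants apply the same rate.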
3. Practice
In the promotional transaction system, we implemented request-thread log sampling on top of the existing JD JSF filter component and exposed an extension API for business services, achieving system-wide request-level sampling with minimal code changes. The diagram below outlines the implementation flow.
When asynchronous threads are involved, the extension API simplifies the transformation:
// Before refactor: the task is submitted to the thread pool directly
threadPoolExecutor.execute(() -> { /* your business logic */ });
// After refactor: wrap the task so the sampling flag propagates to the pool thread
threadPoolExecutor.execute(XxxUtils.wrap(() -> { /* your business logic */ }));
Feel free to comment for further discussion of implementation details.
JD Cloud Developers