Design and Implementation of a Log‑Based Service Pre‑warming Component for Java Applications
To mitigate startup latency spikes in Java-based query services caused by class loading, JIT warm‑up, and lazy resource loading, this article presents a generic, low‑cost pre‑warming component that parses local Dubbo and HTTP logs, then filters, samples, and replays the recorded traffic. It details the component's design, implementation, and performance optimizations.
When Java‑based query services start, class loading, HotSpot JIT compilation, and lazy resource initialization can cause noticeable response‑time jitter, especially under high QPS, degrading user experience. The proposed solution is to pre‑warm the service by automatically replaying representative traffic before the service is exposed.
The component is built on local log analysis: it extracts Dubbo and HTTP logs according to naming conventions, parses each line into a string list, and then applies a series of filters and sampling to obtain a small, representative set of requests for replay.
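The article only says each line is parsed "into a string list"; a minimal sketch of that step might look as follows, assuming a hypothetical "|"-separated field layout (the delimiter and fields are illustrative, not the component's actual format):

```java
import java.util.Arrays;
import java.util.List;

/** Minimal sketch: split one access-log line into a field list.
 *  The "|" delimiter and the field order are illustrative assumptions. */
public class LogLineParser {
    public static List<String> parse(String line) {
        // Trim surrounding whitespace, then split into raw string fields;
        // later filters and samplers operate on this list.
        return Arrays.asList(line.trim().split("\\|"));
    }
}
```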
The pre‑warming workflow consists of six steps: log collection, log analysis, primary filtering (simple patterns and custom regex), deep filtering (e.g., removing ultra‑fast responses), sampling (selecting a limited number of logs per method or path), and finally executing the replayed HTTP/Dubbo calls while checking convergence criteria.
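The middle steps of that workflow can be sketched as one chain over raw log lines; every stage condition here (the healthcheck marker, the "fast" suffix, the sample cap) is a stand-in for the component's configurable logic, not its real rules:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative glue for steps 2-5 of the workflow: parse, primary filter,
 *  deep filter, and sampling. Replay then runs over the returned list. */
public class WarmupFlow {
    public static List<String> selectForReplay(Stream<String> rawLogLines, int maxSamples) {
        return rawLogLines
                .map(String::trim)                              // log analysis (parse)
                .filter(line -> !line.contains("healthcheck"))  // primary filter (pattern)
                .filter(line -> !line.endsWith("|fast"))        // deep filter (stub marker)
                .limit(maxSamples)                              // sampling
                .collect(Collectors.toList());                  // replay consumes this list
    }
}
```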
Key technical challenges include replaying Dubbo calls using the GenericService API, parsing customizable HTTP access logs via Tomcat's AccessLogValve pattern, and ensuring the service is only exposed after successful pre‑warming, which is coordinated through a healthcheck.html hook that the deployment system polls.
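Of these pieces, the healthcheck hook is the simplest to sketch with the standard library: the file is written only after pre‑warming succeeds, so the deployment system's poller starts routing traffic only then. The file name matches the article; the path handling and "ok" payload are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: create healthcheck.html only after warm-up succeeds, so the
 *  deployment system's poll sees the service as ready only then. */
public class HealthcheckGate {
    public static boolean markReady(Path healthcheckFile) {
        try {
            // Writing the file is the "service is ready" signal the deploy system polls for.
            Files.write(healthcheckFile, "ok".getBytes());
            return true;
        } catch (IOException e) {
            return false; // leave the service unexposed if the signal cannot be written
        }
    }
}
```

The Dubbo side of replay goes through generic invocation (`GenericService.$invoke`), which lets the component call any interface by name without compiling against its API; that part is omitted here because it needs the Dubbo dependency.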
Performance optimizations focus on reducing the amount of log data processed (sampling based on method name, limiting to N samples per method) and parallelizing replay calls to cut execution time dramatically.
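The parallel-replay part can be sketched with a bounded thread pool; the pool size and the 60-second cap are illustrative stand-ins for the component's configurable concurrency and duration limits:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: replay the sampled requests on a fixed-size thread pool
 *  instead of serially, bounding total warm-up time. */
public class ParallelReplayer {
    public static int replayAll(List<Runnable> calls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (Runnable call : calls) {
            pool.submit(() -> { call.run(); done.incrementAndGet(); });
        }
        pool.shutdown();
        try {
            // Bound total warm-up time; 60 s stands in for the configurable maximum.
            pool.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get(); // number of replay calls that completed
    }
}
```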
The component aims for zero‑configuration adoption but also supports flexible settings such as whitelist/blacklist of interfaces, maximum pre‑warm duration, and concurrency limits, allowing teams to tailor the behavior to their specific needs.
Implementation follows a plugin architecture: core functionality is abstracted into interchangeable modules (simple vs. precise implementations), which are combined to achieve the final behavior, including configurable checkers for convergence.
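The plugin seams might look like the following set of interfaces, where "simple" and "precise" variants are just alternative implementations plugged into the same slots; all names here are illustrative, not the component's actual API:

```java
import java.util.List;

/** Sketch of the plugin architecture: each stage is an interface with
 *  interchangeable implementations (names are assumptions). */
public class WarmupPlugins {
    /** Decides whether one parsed log entry survives filtering. */
    public interface LogFilter { boolean accept(List<String> fields); }
    /** Reduces the filtered logs to a replayable sample. */
    public interface Sampler { List<List<String>> sample(List<List<String>> logs); }
    /** Decides when pre-warming can stop, given observed response times. */
    public interface ConvergenceChecker { boolean converged(List<Long> responseTimesMillis); }

    // A "simple" checker: converged once a minimum number of calls succeeded.
    public static ConvergenceChecker minCalls(int n) {
        return times -> times.size() >= n;
    }
}
```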
Three built‑in filters are provided: pattern‑based filtering (e.g., excluding healthcheck URLs), error‑code filtering (dropping 4xx/5xx responses), and response‑time filtering (ignoring requests faster than a configurable threshold, such as 10 ms).
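The three built-in filters can be expressed as predicates over a minimal log record; the threshold values come from the article, while the record shape is an assumption for the sketch:

```java
import java.util.function.Predicate;

/** The three built-in filters as composable predicates. */
public class BuiltInFilters {
    /** Minimal log record; the real component parses richer fields. */
    public static class LogRecord {
        public final String path; public final int status; public final long millis;
        public LogRecord(String path, int status, long millis) {
            this.path = path; this.status = status; this.millis = millis;
        }
    }

    // Pattern-based filtering: exclude healthcheck URLs.
    public static final Predicate<LogRecord> NOT_HEALTHCHECK =
            r -> !r.path.contains("healthcheck");

    // Error-code filtering: drop 4xx/5xx responses.
    public static final Predicate<LogRecord> NO_ERRORS = r -> r.status < 400;

    // Response-time filtering: ignore requests faster than the threshold (e.g. 10 ms).
    public static Predicate<LogRecord> slowerThan(long thresholdMillis) {
        return r -> r.millis >= thresholdMillis;
    }
}
```

Predicates compose with `and(...)`, so the full chain is one combined filter applied per record.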
Sampling logic can be configured to limit the total number of log lines (e.g., 500) or to sample a fixed number of requests per method/path (e.g., 20), ensuring a manageable workload while preserving coverage.
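The per-method sampling can be sketched with a counter map keyed by method or path; the limits of 20 per key and 500 total come from the article's examples, while the string-key representation is an assumption:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: keep at most perKey entries for each method/path key,
 *  capped by an overall total, preserving log order. */
public class PerKeySampler {
    public static List<String> sample(List<String> keysInLogOrder, int perKey, int total) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keysInLogOrder) {
            int count = seen.getOrDefault(key, 0);
            if (count < perKey && out.size() < total) {
                seen.put(key, count + 1);
                out.add(key);
            }
        }
        return out;
    }
}
```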
Convergence is determined by multiple criteria: a minimum number of successful calls, absolute difference between the last two response times below a threshold, slope‑based comparison, and deviation from an expected response time.
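Two of those criteria are easy to sketch together: a minimum number of successful calls, and the absolute difference between the last two response times falling under a threshold (the slope and expected-time checks would slot in the same way):

```java
import java.util.List;

/** Sketch of two convergence criteria combined: enough successful calls,
 *  and the last two response times within maxDeltaMillis of each other. */
public class ConvergenceCheck {
    public static boolean converged(List<Long> timesMillis, int minCalls, long maxDeltaMillis) {
        // Require at least minCalls successful calls first (assume minCalls >= 2).
        if (timesMillis.size() < minCalls) return false;
        int n = timesMillis.size();
        // Warm-up is done when consecutive response times have stabilized.
        long delta = Math.abs(timesMillis.get(n - 1) - timesMillis.get(n - 2));
        return delta <= maxDeltaMillis;
    }
}
```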
Future directions include moving the log source to a cloud‑based log pool, providing a centralized pre‑warm service that can be extended with additional features, and handling the overhead of increasingly complex configurations.