
Optimizing Query Performance in WeChat's Multi‑Dimensional Monitoring Platform with Druid and Redis

WeChat's multi‑dimensional metric monitoring platform handles billions of data points per minute. This article describes how the team cut average query latency from over 1000 ms to around 140 ms and reached a cache hit rate above 85% by analyzing query behavior, redesigning the data layer architecture, splitting queries into sub‑queries, adding Redis caching, and introducing sub‑dimension tables.

FunTester

WeChat's multi‑dimensional metric monitoring platform processes billions of data points per minute. Before optimization, average query latency exceeded 1000 ms and the query failure rate was high.

Analysis of user query behavior showed that over 99 % of requests are time‑series queries. Many of them span several days, which causes heavy segment I/O in Apache Druid and overwhelms the brokers.

The platform’s data layer uses Druid’s Master, Real‑time, and Historical nodes; large segment sizes and complex dimension combinations further degrade performance.

To address these issues, the team designed three optimization steps: (1) split large queries into finer‑grained sub‑queries distributed across multiple brokers; (2) introduce a Redis cache for sub‑query results, reducing segment I/O; (3) create sub‑dimension tables to shrink segment size for high‑cardinality dimensions.
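Step (1) can be illustrated with a minimal sketch. It assumes sub-queries are cut at calendar-day boundaries and handed to brokers round-robin; the function names, the broker names, and the round-robin policy are hypothetical and not taken from the platform's code. The time range reuses the values from the sample request at the end of this article.

```python
from datetime import datetime, timedelta


def split_by_day(start, end):
    """Split [start, end) into sub-ranges that never cross midnight."""
    pieces = []
    cursor = start
    while cursor < end:
        # Midnight that follows the cursor's calendar day.
        next_midnight = datetime(cursor.year, cursor.month, cursor.day) + timedelta(days=1)
        piece_end = min(next_midnight, end)
        pieces.append((cursor, piece_end))
        cursor = piece_end
    return pieces


def assign_brokers(pieces, brokers):
    """Round-robin assignment of sub-queries to brokers (assumed policy)."""
    return [(brokers[i % len(brokers)], s, e) for i, (s, e) in enumerate(pieces)]


FMT = "%Y-%m-%d %H:%M"
start = datetime.strptime("2020-04-15 13:23", FMT)
end = datetime.strptime("2020-04-17 12:00", FMT)

pieces = split_by_day(start, end)
plan = assign_brokers(pieces, ["broker-a", "broker-b"])  # hypothetical broker names

for broker, s, e in plan:
    print(broker, s.strftime(FMT), "->", e.strftime(FMT))
```

Because every sub-range is aligned to a day, two user queries that overlap on the same full day produce the same sub-query for that day, which is what makes the results reusable across requests.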

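Sub‑queries are generated per day (or per hour for dimensions) and their results are cached. The overall cache hit rate exceeds 85 %: 86 % of queries are served entirely from cache (full hit), and 98.8 % hit the cache for at least part of their time range (partial hit). Data older than one day is served entirely from cache, while recent hot data uses a short‑term cache window.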
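The caching behavior can be sketched as a cache-aside lookup over sub-query keys. This is an illustration under stated assumptions, not the platform's implementation: FakeCache is an in-memory stand-in for Redis so the example runs without a server, and run_query, is_recent, short_ttl, and the 60-second window are hypothetical names and values.

```python
import time


class FakeCache:
    """In-memory stand-in for Redis with optional per-key expiry (hypothetical)."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)


def run_query(sub_keys, cache, fetch_from_druid, is_recent, short_ttl=60):
    """Serve each sub-query from cache, falling back to Druid on a miss."""
    results, misses = {}, []
    for key in sub_keys:
        cached = cache.get(key)  # cached values are assumed never to be None
        if cached is None:
            misses.append(key)
        else:
            results[key] = cached
    for key in misses:
        value = fetch_from_druid(key)
        # Recent data may still change, so it only gets a short cache window.
        cache.set(key, value, ttl=short_ttl if is_recent(key) else None)
        results[key] = value
    if not misses:
        status = "full"
    elif len(misses) < len(sub_keys):
        status = "partial"
    else:
        status = "miss"
    return results, status


calls = []


def fetch(key):
    """Placeholder for a real Druid sub-query; records which keys reach Druid."""
    calls.append(key)
    return f"rows-for-{key}"


def is_recent(key):
    return key == "2020-04-17"  # pretend this is the current day


cache = FakeCache()
_, first = run_query(["2020-04-15", "2020-04-16"], cache, fetch, is_recent)
_, second = run_query(["2020-04-15", "2020-04-16", "2020-04-17"], cache, fetch, is_recent)
_, third = run_query(["2020-04-15", "2020-04-16"], cache, fetch, is_recent)
print(first, second, third, calls)
```

In this run only three sub-queries reach the placeholder Druid function even though eight were requested, which mirrors how caching reduces segment I/O.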

After implementation, average query latency dropped from >1000 ms to ~140 ms (P95 ≈ 220 ms) and Druid access frequency fell to about 10 % of the original volume.

{
    "biz_id": 1,
    "formula": "avg_cost_time",
    "keys": [
        {"field": "xxx_id", "relation": "eq", "value": "3"}
    ],
    "start_time": "2020-04-15 13:23",
    "end_time": "2020-04-17 12:00"
}
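The JSON above appears to be a sample query request. For cached sub-query results to be shared between requests, the cache key has to depend on what is being computed (business, formula, filters) and on the sub-query's day, but not on the request's own start and end times. The sketch below shows one way to derive such a key; the key layout, the "subq:" prefix, and the function name are assumptions for illustration only.

```python
import hashlib
import json

request = {
    "biz_id": 1,
    "formula": "avg_cost_time",
    "keys": [
        {"field": "xxx_id", "relation": "eq", "value": "3"}
    ],
    "start_time": "2020-04-15 13:23",
    "end_time": "2020-04-17 12:00",
}


def sub_query_cache_key(req, day):
    """Build a deterministic key for one day-level sub-query (assumed layout)."""
    identity = {
        "biz_id": req["biz_id"],
        "formula": req["formula"],
        # Sort the filters so their order in the request does not change the key.
        "keys": sorted(req["keys"], key=lambda k: (k["field"], k["relation"], k["value"])),
        "day": day,
    }
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return "subq:" + hashlib.sha1(canonical.encode("utf-8")).hexdigest()


key_a = sub_query_cache_key(request, "2020-04-16")

# A different request window that covers the same day maps to the same key.
other = dict(request, start_time="2020-04-16 00:00", end_time="2020-04-18 00:00")
key_b = sub_query_cache_key(other, "2020-04-16")

print(key_a == key_b)
```

Hashing a canonical JSON form keeps keys short and of fixed length, at the cost of not being human-readable; a readable delimiter-joined key would work equally well for small filter sets.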