Evolution of 37 Mobile Games' Multi-Dimensional Analysis Platform: From MySQL to StarRocks
This article describes how 37 Mobile Games built and continuously evolved its multi-dimensional analytics platform. It covers the business background, the data challenges, the migration from MySQL through Druid, Impala, and ClickHouse to StarRocks, the self-service data tools, monitoring, and the future roadmap, and it highlights the technical decisions and lessons learned along the way.
Business Background : 37 Mobile Games operates over 2,000 games with ~30 million monthly active users, facing diverse data sources, complex ad‑media integrations, and high‑frequency updates.
Data Analysis Challenges : Real‑time requirements, high dimensionality (exploding ad‑plan combinations), and massive data volumes strain traditional databases.
OLAP Platform Evolution : The team initially used MySQL for aggregated reports, introduced Druid for behavior analysis, and later adopted Impala for ad-hoc queries. It then migrated to ClickHouse for high-performance reporting and automated ad placement, evaluated commercial tools (Alibaba ADB, Hologres), and finally integrated StarRocks for its versatile data models and lower operational overhead.
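Impala Read/Write Flow : The client submits a request to an ImpalaDaemon, which acts as the coordinator for that query → the Query Planner parses the SQL and builds a distributed plan → the Coordinator distributes tasks to Executors on the ImpalaDaemons, relying on the cluster membership tracked by the StateStore → the Coordinator aggregates the partial results and returns them to the client.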
Impala Advantages & Disadvantages : Benefits include MPP architecture, Hive compatibility, and efficient query execution; drawbacks are single‑point components (Catalog, StateStore), coarse resource isolation, metadata refresh limitations, memory‑overflow risk, and limited concurrency.
Why ClickHouse Is Fast : Multi‑core parallelism, SIMD support, diverse table engines, columnar storage with compression, vectorized execution, and distributed processing across shards.
ClickHouse in Advertising Automation : Uses ReplicatedMergeTree tables for local joins, handles high-frequency updates by appending new rows instead of modifying existing ones, and applies FINAL (merging at query time) cautiously because of its performance cost.
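The append-only update pattern can be illustrated with a small Python sketch. It does not talk to ClickHouse; it only simulates what a FINAL-style read resolves to, assuming a Replacing-style table in which the highest version per key wins (the source does not name the exact engine variant). The row layout and the names `plan_id`, `version`, and `budget` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (plan_id, version, budget).
Row = Tuple[str, int, float]


def resolve_latest(rows: Iterable[Row]) -> List[Row]:
    """Keep only the highest-version row per plan_id.

    This mimics the result of a FINAL-style read on a Replacing-style
    table: older versions of a key are collapsed at read time.
    """
    latest: Dict[str, Row] = {}
    for row in rows:
        plan_id, version, _ = row
        current = latest.get(plan_id)
        if current is None or version > current[1]:
            latest[plan_id] = row
    return sorted(latest.values())


# Updates are written as new rows, never as in-place modifications.
appended_rows: List[Row] = [
    ("plan_a", 1, 100.0),
    ("plan_b", 1, 50.0),
    ("plan_a", 2, 120.0),  # "update" to plan_a, appended as a new version
]

print(resolve_latest(appended_rows))
```

The collapse step is extra work on every read, which is one way to see why FINAL is applied cautiously.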
StarRocks Features : Four data models (detail, aggregation, update, and primary-key, which the StarRocks documentation calls Duplicate Key, Aggregate, Unique Key, and Primary Key tables); the primary-key model suits user-profile scenarios; bitmap operations support audience selection; built-in monitoring dashboards work without external dependencies.
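To make the bitmap-based audience selection concrete, here is a minimal Python sketch that uses plain integers as bitmaps. It is not StarRocks code; it only shows the set algebra (intersection, exclusion, cardinality) that bitmap functions perform inside the database. The tag names and user IDs are invented for illustration.

```python
from typing import Iterable, List


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Encode a set of non-negative user IDs as bits in a Python int."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def from_bitmap(bitmap: int) -> List[int]:
    """Decode a bitmap back into a sorted list of user IDs."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical tag bitmaps: one bitmap per audience tag.
paying_users = to_bitmap([1, 3, 5, 8])
active_last_7d = to_bitmap([3, 4, 5, 9])
churn_risk = to_bitmap([5, 9])

# Audience = paying AND active, excluding churn-risk users.
audience = (paying_users & active_last_7d) & ~churn_risk

print(from_bitmap(audience))      # members of the audience
print(bin(audience).count("1"))   # audience size
```

Because each tag is a single compact bitmap, combining tags is a bitwise operation rather than a join over user rows.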
Self‑Service Data Platform : Enables business users to select dimensions, metrics, and granularity, automatically generating SQL tasks executed on Impala, reducing data‑engineer workload by >80%.
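As a rough sketch of how a selection of dimensions, metrics, and granularity might be turned into SQL, consider the Python function below. The table name `ad_report`, the columns, and the whitelists are all hypothetical; the article does not describe the platform's actual generator.

```python
from typing import List

# Hypothetical metadata; a real platform would load this from a catalog.
TABLE = "ad_report"
GRANULARITY_COLUMN = {"day": "stat_date", "hour": "stat_hour"}
ALLOWED_DIMENSIONS = {"game_id", "channel", "ad_plan_id"}
ALLOWED_METRICS = {
    "cost": "SUM(cost)",
    "installs": "SUM(installs)",
    "active_users": "COUNT(DISTINCT user_id)",
}


def build_report_sql(dimensions: List[str], metrics: List[str], granularity: str) -> str:
    """Build a GROUP BY query from user-selected fields.

    Every input is checked against a whitelist so that free-form user
    input never reaches the generated SQL.
    """
    if granularity not in GRANULARITY_COLUMN:
        raise ValueError(f"unsupported granularity: {granularity}")
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    unknown += [m for m in metrics if m not in ALLOWED_METRICS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if not metrics:
        raise ValueError("at least one metric is required")

    group_cols = [GRANULARITY_COLUMN[granularity]] + list(dimensions)
    select_parts = group_cols + [f"{ALLOWED_METRICS[m]} AS {m}" for m in metrics]
    return (
        "SELECT " + ", ".join(select_parts)
        + f" FROM {TABLE}"
        + " GROUP BY " + ", ".join(group_cols)
    )


print(build_report_sql(["channel"], ["cost"], "day"))
```

The generated statement would then be submitted as a scheduled task to the query engine, which is Impala in the platform described here.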
Platform Service Health Monitoring : Collects logs and performance metrics, visualizes via Prometheus + Grafana, and triggers alerts based on thresholds.
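Threshold-based alerting of this kind can be sketched in a few lines of Python. In the platform itself this role is played by Prometheus and Grafana; the rule structure, metric names, and limits below are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    metric: str       # hypothetical metric name
    max_value: float  # alert when the sampled value exceeds this limit


def evaluate(rules: List[ThresholdRule], samples: Dict[str, float]) -> List[str]:
    """Return one alert message per rule whose metric exceeds its limit."""
    alerts = []
    for rule in rules:
        value = samples.get(rule.metric)
        # Metrics missing from the sample are skipped, not treated as alerts.
        if value is not None and value > rule.max_value:
            alerts.append(f"{rule.metric}={value} exceeds {rule.max_value}")
    return alerts


rules = [
    ThresholdRule("query_latency_p99_seconds", 5.0),
    ThresholdRule("memory_usage_ratio", 0.9),
]
samples = {"query_latency_p99_seconds": 2.4, "memory_usage_ratio": 0.93}

for message in evaluate(rules, samples):
    print(message)
```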
Data Quality Monitoring & Alerting : Implements a four‑stage pipeline (scheduling, backend service, execution engine, alert service) to detect anomalies, with notifications via SMS, WeChat, or phone.
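The four stages can be pictured with the following Python sketch, in which each stage is a small function. The check definition, the deviation rule, and the in-memory outbox are hypothetical stand-ins; the article does not specify how each stage is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class QualityCheck:
    name: str
    query: Callable[[], float]  # stand-in for SQL run by the execution engine
    baseline: float             # expected value, assumed to be non-zero
    max_deviation: float        # allowed relative deviation from the baseline
    channel: str                # "sms", "wechat", or "phone"


def schedule(checks: List[QualityCheck]) -> List[QualityCheck]:
    """Stage 1, scheduling: pick the checks that are due (here, all of them)."""
    return list(checks)


def execute(check: QualityCheck) -> Optional[str]:
    """Stage 3, execution engine: run the check and report an anomaly, if any."""
    observed = check.query()
    deviation = abs(observed - check.baseline) / check.baseline
    if deviation > check.max_deviation:
        return f"{check.name}: observed {observed}, baseline {check.baseline}"
    return None


def send_alert(channel: str, message: str, outbox: List[Tuple[str, str]]) -> None:
    """Stage 4, alert service: record the notification instead of sending it."""
    outbox.append((channel, message))


def run_pipeline(checks: List[QualityCheck]) -> List[Tuple[str, str]]:
    """Stage 2, backend service: coordinate the other three stages."""
    outbox: List[Tuple[str, str]] = []
    for check in schedule(checks):
        anomaly = execute(check)
        if anomaly is not None:
            send_alert(check.channel, anomaly, outbox)
    return outbox


checks = [
    QualityCheck("daily_install_rows", lambda: 400.0, 1000.0, 0.3, "wechat"),
    QualityCheck("daily_cost_rows", lambda: 980.0, 1000.0, 0.3, "sms"),
]
print(run_pipeline(checks))
```

Only the first check deviates from its baseline by more than 30%, so it is the only one that produces a notification.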
Future Plans : Consolidate components to reduce operational complexity, adopt SaaS solutions (e.g., Hologres) for tighter integration, and explore ELT workflows to shorten data pipelines.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.