Big Data OLAP Applications and Practices: Insights from Xiaomi and 58.com
The article reviews the 2018 58 Group technology salon on big‑data OLAP. It summarizes Xiaomi's one‑stop OLAP solution built around UnionSQL, 58.com's OLAP challenges and its solutions based on Kylin and Druid, and the practical implementations and optimizations that illustrate modern OLAP practice.
Background
On November 13, 2018, the 58 Group Technology Salon (Session 3) titled “Big Data OLAP Application and Practice” was held at the Beijing headquarters, organized jointly by the 58 Group Technical Engineering Platform and the HR Magic Academy. Speakers from Xiaomi’s AI & Cloud Platform big‑data team, 58 TEG Data Intelligence team, and related business‑line developers shared their OLAP experiences.
Key Takeaways
1. Xiaomi One‑Stop OLAP Solution
1.1 Xiaomi Big‑Data Architecture Overview
Xiaomi's company‑level big‑data R&D team organizes internal business data into a layered "data pyramid", which forms the core of its big‑data platform architecture.
(Figure provided by Xiaomi)
The raw data layer aggregates original business data, which is then cleaned and transformed into a data warehouse (middle layer). Aggregated wide tables (summary layer) are built per business domain, and finally data is loaded into various engines for application use. Unified data management and job scheduling platforms handle data governance and task orchestration.
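The flow from raw layer to warehouse to summary layer can be illustrated with a small Python sketch. The event schema, the cleaning rule, and the city‑level wide table are all hypothetical, and the final step of loading results into query engines is omitted. The sketch only shows how each layer derives from the one below it.

```python
from collections import defaultdict

# Hypothetical raw-layer events: (user_id, city, action, amount_as_string)
RAW = [
    ("u1", "Beijing", "order", "12.5"),
    ("u2", "beijing ", "order", "7.5"),
    ("u3", "Shanghai", "order", "bad"),
    ("u1", "Beijing", "view", "0"),
]

def to_warehouse(raw_rows):
    """Middle layer: clean and normalize raw rows, dropping malformed ones."""
    cleaned = []
    for user, city, action, amount in raw_rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        cleaned.append({"user": user, "city": city.strip().title(),
                        "action": action, "amount": value})
    return cleaned

def to_summary(rows):
    """Summary layer: aggregate a per-domain wide table keyed by city."""
    table = defaultdict(lambda: {"orders": 0, "gmv": 0.0, "users": set()})
    for r in rows:
        entry = table[r["city"]]
        entry["users"].add(r["user"])
        if r["action"] == "order":
            entry["orders"] += 1
            entry["gmv"] += r["amount"]
    return {city: {"orders": e["orders"], "gmv": e["gmv"], "users": len(e["users"])}
            for city, e in table.items()}

summary = to_summary(to_warehouse(RAW))
print(summary)  # {'Beijing': {'orders': 2, 'gmv': 20.0, 'users': 2}}
```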
On top of the data pyramid, Xiaomi provides a unified data service layer covering OLAP queries, point queries, behavior analysis, and model services. It offers a consistent interface, authentication, compliance auditing, quality monitoring, caching, and queries that run transparently across engines and data centers.
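A minimal sketch of such a service layer is a single entry point that authenticates the caller, records an audit entry, checks a result cache, and routes the query to an engine chosen by query type. The class name `DataService`, the permission model, and the fake engine are hypothetical illustrations, not Xiaomi's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class DataService:
    """Hypothetical unified entry point in front of several query engines."""
    engines: Dict[str, Callable[[str], list]]   # query type -> engine client
    allowed: Dict[str, set]                      # user -> permitted query types
    cache: Dict[Tuple[str, str], list] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def query(self, user: str, query_type: str, sql: str) -> list:
        if query_type not in self.allowed.get(user, set()):   # authentication
            raise PermissionError(f"{user} may not run {query_type} queries")
        self.audit_log.append((user, query_type, sql))         # compliance auditing
        key = (query_type, sql)
        if key not in self.cache:                               # result caching
            self.cache[key] = self.engines[query_type](sql)     # route to engine
        return self.cache[key]

calls = []

def fake_olap(sql):
    """Stand-in for a real OLAP engine client; records each call."""
    calls.append(sql)
    return [("Beijing", 2)]

svc = DataService(engines={"olap": fake_olap, "point": lambda sql: []},
                  allowed={"alice": {"olap"}})
svc.query("alice", "olap", "SELECT city, COUNT(*) FROM t GROUP BY city")
svc.query("alice", "olap", "SELECT city, COUNT(*) FROM t GROUP BY city")
print(len(calls))  # 1: the second call is served from the cache
```

Keeping authentication, auditing, and caching in one layer means each engine behind it does not have to reimplement them.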
1.2 One‑Stop OLAP Solution
The solution consists of data governance tools and a query engine called UnionSQL.
(Figure provided by Xiaomi)
UnionSQL's main features are:
Unified SQL interface
Self‑developed Query Router that parses and splits SQL, merges results, and supports Lambda queries and cross‑datacenter access
Batch layer powered by Apache Kylin and speed layer powered by Elasticsearch/Kudu, together providing Lambda capabilities and real‑time analysis
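The Lambda behavior of a query router can be sketched as splitting a query's time range at the point where the batch layer's pre‑computed data ends, sending the older part to the batch engine and the recent part to the speed engine, then merging the partial aggregates. This is a hypothetical illustration, not UnionSQL's code: the function names and the `batch_ready_until` boundary are assumptions, and summing partial results only works for additive metrics such as counts and sums (distinct counts, for example, cannot be merged this way).

```python
from collections import Counter
from datetime import date, timedelta

def split_range(start, end, batch_ready_until):
    """Split [start, end) into a batch part and a speed part.

    batch_ready_until: first day NOT yet covered by the batch (cube) layer.
    """
    cut = min(max(start, batch_ready_until), end)
    batch = (start, cut) if start < cut else None
    speed = (cut, end) if cut < end else None
    return batch, speed

def run_lambda_query(start, end, batch_ready_until, batch_engine, speed_engine):
    batch, speed = split_range(start, end, batch_ready_until)
    merged = Counter()
    if batch:
        merged.update(batch_engine(*batch))   # e.g. a Kylin cube query
    if speed:
        merged.update(speed_engine(*speed))   # e.g. an Elasticsearch/Kudu query
    return dict(merged)                       # additive merge of partial results

# Hypothetical per-day counts used by both fake engines
EVENTS = {date(2018, 11, d): {"Beijing": 10, "Shanghai": 5} for d in range(1, 8)}

def fake_engine(lo, hi):
    out = Counter()
    day = lo
    while day < hi:
        out.update(EVENTS.get(day, {}))
        day += timedelta(days=1)
    return out

result = run_lambda_query(date(2018, 11, 1), date(2018, 11, 8), date(2018, 11, 6),
                          fake_engine, fake_engine)
print(result)  # {'Beijing': 70, 'Shanghai': 35}
```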
1.3 OLAP Application Case: Xiaomi Intelligent Data Analysis Decision System
Step 1: Built a company‑wide BI system that aggregates key data and, together with the user‑profile platform, provides natural‑language data query and visualization.
Step 2: Implemented automatic dimension breakdown on top of pre‑computation engines such as Apache Kylin, so users can drill down to the dimensions that contribute most to a change in a metric.
Step 3: Integrated anomaly detection and other mathematical models to proactively discover issues, not just answer queries.
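The source does not describe how Xiaomi ranks contributing dimensions. One common, simple approach is delta attribution: compare a metric per dimension value across two periods and rank values by how much of the total change they explain. The sketch below assumes the per‑value metrics have already been fetched, for instance from two pre‑aggregated cube queries, and the sample data is invented.

```python
def rank_contributions(before, after, top=3):
    """Rank dimension values by how much they explain the total metric change.

    before / after: {dimension_value: metric} for two periods (hypothetical input).
    Returns (value, delta, share_of_total_change) tuples, largest |delta| first.
    """
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(deltas.values())
    if total == 0:
        return []  # no net change to attribute
    ranked = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(k, d, d / total) for k, d in ranked[:top]]

before = {"Android": 1000, "iOS": 500, "Web": 200}
after = {"Android": 750, "iOS": 460, "Web": 190}
result = rank_contributions(before, after)
print(result)  # Android explains most of the drop
```

In practice the same ranking would be repeated for each candidate dimension (platform, region, channel, and so on), which is where pre‑computed cubes keep the per‑dimension queries fast.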
2. 58.com OLAP Technology Application and Practice
58.com runs a large number of OLAP scenarios across its business lines (real estate, recruitment, classifieds, automotive, etc.), generating hundreds of terabytes of new data daily and demanding high‑performance analytical queries.
Challenges:
Data scale: billions of new rows per day, with the number of dimensions growing rapidly (from dozens to hundreds).
Development efficiency and cost: traditional MapReduce/Hive/Spark pipelines require many manual steps to build cubes, leading to high maintenance overhead.
Query speed: most interactive queries must return results within 10 seconds.
Real‑time processing: A/B testing, new‑business tracking, advertising effectiveness, etc., increasingly rely on real‑time OLAP.
Ad‑hoc queries: users need flexible, self‑service query capabilities.
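One reason growing dimension counts strain cube building: a full cube pre‑computes an aggregate for every combination of dimensions (each combination is called a cuboid in Kylin), so the count is 2^n for n dimensions. The sketch below enumerates cuboids and shows one simple pruning rule, a "mandatory" dimension that must appear in every cuboid. Kylin offers pruning mechanisms along these lines, but this code only illustrates the combinatorics and is not Kylin's implementation.

```python
from itertools import combinations

def cuboids(dimensions, mandatory=()):
    """Enumerate every group-by combination a full cube would pre-compute.

    Dimensions listed in `mandatory` must appear in every cuboid, which
    halves the lattice for each mandatory dimension.
    """
    optional = [d for d in dimensions if d not in mandatory]
    result = []
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            result.append(tuple(mandatory) + combo)
    return result

dims = ["date", "city", "category"]
print(len(cuboids(dims)))                       # 8 = 2**3
print(len(cuboids(dims, mandatory=("date",))))  # 4 = 2**2
print([2 ** n for n in (10, 20, 30)])           # [1024, 1048576, 1073741824]
```

At a few dozen dimensions, a full cube is already infeasible without pruning, which is consistent with the challenges listed above.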
To address these, 58.com combines Kylin, Druid, SQL‑on‑Hadoop, and a self‑developed real‑time processing engine, delivering services through the “Cloud Window” self‑service analytics platform, which reduces OLAP adoption barriers and cuts repetitive development costs.
For the WMDA user‑behavior analysis platform, which involves hundreds of dimensions, long time spans, and diverse query patterns, a Druid‑based OLAP solution is employed.
These implementations illustrate the evolution of OLAP practice at 58.com and highlight the ongoing challenges of building a modern, virtualized, and democratized data warehouse.
3. Druid Technical Practice in 58.com
3.1 Druid Introduction
Druid is a high‑performance, low‑latency real‑time OLAP engine. 58.com uses community version 0.9.2 and has made extensive functional optimizations and platform‑level enhancements, substantially improving performance and stability for multiple departments.
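Part of Druid's low query latency comes from optional rollup at ingestion time: timestamps are truncated to the configured query granularity, and rows with identical truncated timestamps and dimension values are combined into one row with aggregated metrics. The following Python sketch mimics that idea at hourly granularity. It illustrates the concept only and is not Druid code; the row schema and metric name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def rollup(rows, dims, metric):
    """Mimic Druid-style ingestion rollup at hourly query granularity."""
    out = defaultdict(lambda: {"count": 0, metric: 0})
    for row in rows:
        ts = row["ts"].replace(minute=0, second=0, microsecond=0)  # truncate to hour
        key = (ts,) + tuple(row[d] for d in dims)
        out[key]["count"] += 1          # number of raw rows folded into this one
        out[key][metric] += row[metric]  # summed metric
    return dict(out)

rows = [
    {"ts": datetime(2018, 11, 13, 10, 5), "city": "Beijing", "page": "home", "clicks": 1},
    {"ts": datetime(2018, 11, 13, 10, 40), "city": "Beijing", "page": "home", "clicks": 2},
    {"ts": datetime(2018, 11, 13, 11, 2), "city": "Beijing", "page": "home", "clicks": 1},
]
table = rollup(rows, ["city", "page"], "clicks")
print(len(table))  # 2 rows stored instead of 3
```

The trade‑off is that events finer than the query granularity can no longer be distinguished after rollup.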
3.2 Functional Optimizations
Multi‑tenant isolation of building (ingestion), storage, and querying to ensure the stability of critical services.
Replaced the default cache with CaffeineCache for better read/write performance.
Developed segment merge functionality, saving about 60% storage space and boosting query speed.
Added SQL query capability and fixed several bugs (memory leaks, OOM errors, and tasks left pending by slow queries).
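The source does not describe how 58.com's segment merge works. As a hypothetical illustration of the planning step, the sketch below greedily groups time‑adjacent small segments until a group reaches a target size; each resulting group would then be rewritten as a single larger segment. The `Segment` representation and the 400 MB target are assumptions.

```python
from typing import List, NamedTuple

class Segment(NamedTuple):
    start: int      # interval start (e.g. an hour index); hypothetical representation
    end: int        # interval end, exclusive
    size_mb: float

def plan_merges(segments: List[Segment], target_mb: float) -> List[List[Segment]]:
    """Greedily group time-adjacent segments until a group would exceed target_mb."""
    groups, current, current_size = [], [], 0.0
    for seg in sorted(segments, key=lambda s: s.start):
        contiguous = not current or current[-1].end == seg.start
        if current and (not contiguous or current_size + seg.size_mb > target_mb):
            groups.append(current)          # close the current group
            current, current_size = [], 0.0
        current.append(seg)
        current_size += seg.size_mb
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]  # single segments need no merge

segs = [Segment(0, 1, 100), Segment(1, 2, 120), Segment(2, 3, 150),
        Segment(3, 4, 400), Segment(5, 6, 50), Segment(6, 7, 60)]
plan = plan_merges(segs, target_mb=400)
print([[s.start for s in g] for g in plan])  # [[0, 1, 2], [5, 6]]
```

Requiring contiguity keeps each merged segment covering one continuous time interval, which Druid needs because segments are organized by time.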
3.3 Platformization
Simplified onboarding: users can onboard through a web form, monitor storage usage and task status, and manage the lifecycle of each business data source.
Built comprehensive monitoring and alerting, added metrics, and enhanced daily statistics and task analysis.
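A minimal sketch of threshold‑based alerting over platform metrics is shown below. The metric names, thresholds, and messages are hypothetical. The source only says that monitoring, alerting, and additional metrics were built, not which ones.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Rule:
    metric: str       # hypothetical metric name
    threshold: float  # alert when the metric value exceeds this
    message: str

def evaluate(metrics: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Return alert messages for every rule whose metric exceeds its threshold."""
    return [r.message for r in rules
            if metrics.get(r.metric, 0.0) > r.threshold]

rules = [
    Rule("task_failure_rate", 0.05, "build-task failure rate above 5%"),
    Rule("pending_tasks", 100, "more than 100 pending build tasks"),
    Rule("storage_used_ratio", 0.9, "data-source storage above 90% of quota"),
]
snapshot = {"task_failure_rate": 0.08, "pending_tasks": 12, "storage_used_ratio": 0.95}
alerts = evaluate(snapshot, rules)
print(alerts)  # ['build-task failure rate above 5%', 'data-source storage above 90% of quota']
```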
3.4 Typical Cases
Druid now serves WMDA, Lego, Sundial, DSP effect evaluation, whole‑site real‑time multi‑dimensional analysis, etc., handling over 650 data sources, more than 25,000 daily build tasks, ingesting over 600 billion raw records, and supporting up to 110 dimensions.
4. Summary
OLAP technology is a vital component of the big‑data ecosystem and is widely used at both 58 Group and Xiaomi. The salon highlighted concrete use cases, current challenges, and one‑stop solutions, fostering knowledge exchange and encouraging further innovation in OLAP practice.
Future salons will continue to explore breakthroughs and integration of underlying technologies.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.