58.com Big Data Application Practice: Architecture, Challenges, and Solutions

This article presents 58.com’s large‑scale big data platform, detailing its business scope, the WMDA one‑stop analytics system, the Wanxiang user‑portrait service, the technical challenges of massive daily data ingestion, multi‑dimensional analysis, OLAP engine selection (Kylin, Druid), bitmap‑based user‑group processing, scheduling, and overall data service architecture.

DataFunSummit

Dec 11, 2020

58.com, the leading classified‑information site in China, handles tens of billions of daily events, requiring a robust big‑data infrastructure to support strategic, investment, and operational decisions.

Business Scope

Multiple verticals such as real‑estate, recruitment, classifieds, social, second‑hand goods, pets, vehicles, finance, and community.

Daily traffic reaches tens of millions of unique visitors (UV), and the platform ingests hundreds of billions of new records per day.

WMDA – One‑Stop User Behavior Analysis Platform

Provides intelligent data collection (zero‑code, manual, cross‑platform) and scenario‑driven analysis (real‑time flow, multi‑dimensional reports, retention, conversion, ad monitoring, channel operations, behavior trace).

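Supports both approximate and precise UV counting. Sketch structures (count‑min sketch, HyperLogLog) back the approximate path, trading a small error for bounded memory; HyperLogLog in particular estimates the number of distinct users.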
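To make the approximate path concrete, here is a minimal HyperLogLog sketch in plain Python. It is an illustration of the general technique, not WMDA's or Druid's implementation; the class name, the SHA-1 based hash, and the default precision `p=12` are assumptions chosen for readability.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog for approximate distinct counting (illustrative only)."""

    def __init__(self, p: int = 12):
        # p index bits give m = 2**p registers; the alpha formula below assumes m >= 128.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (an assumption; any
        # well-mixed 64-bit hash would do).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit within those remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLog()
for i in range(50_000):
    sketch.add(f"user-{i % 20_000}")   # 20,000 distinct visitors, each seen repeatedly
print(round(sketch.count()))           # an estimate near 20000, not an exact count
```

Because merging is a register-wise maximum, per-hour or per-segment sketches can be combined into a daily UV estimate without rescanning raw events, which is the property that makes sketches attractive on pre-aggregated data.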

Druid was adopted as the core OLAP engine after evaluating Kylin, Pinot, and ClickHouse. It offers roll‑up, pre‑aggregation, columnar storage, and sub‑second multi‑dimensional queries.
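The roll‑up idea mentioned above can be sketched in a few lines of Python: raw events are collapsed at ingestion time into one row per (time bucket, dimension values), so later queries scan far fewer rows. This is a simplified model of the concept, not Druid's ingestion code; the field names (`ts`, `city`, `platform`, `duration`) and the sample events are hypothetical.

```python
from collections import defaultdict


def rollup(events, granularity_s=3600):
    """Pre-aggregate raw events into one row per (time bucket, city, platform)."""
    table = defaultdict(lambda: {"count": 0, "duration_sum": 0})
    for e in events:
        bucket = e["ts"] - e["ts"] % granularity_s   # truncate timestamp to the bucket start
        row = table[(bucket, e["city"], e["platform"])]
        row["count"] += 1
        row["duration_sum"] += e["duration"]
    return dict(table)


def page_views_by_city(table):
    """Answer a multi-dimensional query from the rolled-up rows only."""
    totals = defaultdict(int)
    for (_, city, _), row in table.items():
        totals[city] += row["count"]
    return dict(totals)


# Hypothetical raw events; ts is in epoch seconds.
events = [
    {"ts": 10,   "city": "beijing",  "platform": "app", "duration": 5},
    {"ts": 20,   "city": "beijing",  "platform": "app", "duration": 7},
    {"ts": 3700, "city": "beijing",  "platform": "app", "duration": 1},
    {"ts": 30,   "city": "shanghai", "platform": "web", "duration": 2},
]
rolled = rollup(events)
print(len(events), "raw events ->", len(rolled), "rolled-up rows")
print(page_views_by_city(rolled))
```

The trade-off is that individual events are no longer recoverable after roll‑up, so additive metrics (counts, sums) are stored directly while distinct counts need a mergeable structure kept per row.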

Technical Challenges and Optimizations

Massive daily data volume (hundreds of billions of records) and hundreds of analysis dimensions.

Real‑time and offline data consistency.

Cube construction time and storage overhead in Kylin, leading to a shift toward Druid.

Bitmap‑based user‑group processing using RoaringBitmap and count‑sketch to enable fast set operations and ID reverse lookup.
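A rough sketch of the bitmap approach follows. It uses a plain Python integer as an uncompressed bitset in place of RoaringBitmap, which keeps the example dependency-free but loses Roaring's compression. The dictionary class, tag names, and user IDs are hypothetical; the point is that each user ID is mapped to a dense integer position, set operations become bitwise operations, and the reverse lookup maps positions back to IDs.

```python
class UserIdDictionary:
    """Maps arbitrary user IDs to dense bit positions and back (reverse lookup)."""

    def __init__(self):
        self._to_index = {}
        self._to_id = []

    def encode(self, user_id: str) -> int:
        idx = self._to_index.get(user_id)
        if idx is None:
            idx = len(self._to_id)
            self._to_index[user_id] = idx
            self._to_id.append(user_id)
        return idx

    def decode(self, idx: int) -> str:
        return self._to_id[idx]


def build_bitmap(dictionary, user_ids) -> int:
    bits = 0
    for uid in user_ids:
        bits |= 1 << dictionary.encode(uid)   # set the bit at this user's position
    return bits


def cardinality(bits: int) -> int:
    return bin(bits).count("1")


def members(dictionary, bits: int):
    """Reverse lookup: turn set bits back into the original user IDs."""
    out, idx = [], 0
    while bits:
        if bits & 1:
            out.append(dictionary.decode(idx))
        bits >>= 1
        idx += 1
    return out


ids = UserIdDictionary()
in_beijing = build_bitmap(ids, ["u1", "u2", "u3"])
viewed_cars = build_bitmap(ids, ["u2", "u3", "u4"])
purchased = build_bitmap(ids, ["u3"])

# "In Beijing AND viewed cars AND NOT purchased" as pure bitwise operations.
target = in_beijing & viewed_cars & ~purchased
print(cardinality(target), members(ids, target))
```

In a production setting the integer bitset would be swapped for a compressed bitmap, since user-ID spaces at this scale are large and sparse.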

Segment merging, cache tuning, hot‑cold data separation, and file‑size control to improve query performance.
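As an illustration of segment merging with file‑size control, the sketch below plans merges greedily: it walks segments in time order and packs adjacent ones together until a target size would be exceeded. The function name, the 500 MB target, and the sample sizes are assumptions for the example; the source does not describe the actual merge policy.

```python
def plan_merges(segment_sizes_mb, target_mb=500):
    """Group adjacent segments (by index) so each merged group stays within target_mb.

    A segment that is already larger than the target ends up in a group of its own.
    """
    plans, current, current_size = [], [], 0
    for i, size in enumerate(segment_sizes_mb):
        if current and current_size + size > target_mb:
            plans.append(current)            # close the current group before it overflows
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        plans.append(current)
    return plans


# Hypothetical segment sizes in MB, ordered by time interval.
sizes = [120, 80, 300, 450, 60, 40, 700]
print(plan_merges(sizes))
```

Only adjacent segments are merged so that each merged file still covers a contiguous time range, which keeps time-based pruning effective at query time.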

Wanxiang – Intelligent User‑Portrait Platform

Offers DMP + UDP capabilities: tagging, analysis, insight, and outreach, with APIs for online and offline usage.

Architecture includes data ingestion, computation, and service layers, supporting multi‑tenant isolation.

Handles high‑throughput user‑group extraction via bitmap, Elasticsearch, and Parquet + Spark engines.

Provides scheduling (periodic and trigger‑based) using 58DP, Kettle, and TaskServer.
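The difference between periodic and trigger‑based scheduling can be shown with a toy simulation. It does not model 58DP, Kettle, or TaskServer; the `Task` fields, the tick-based clock, and the task names are all hypothetical. Periodic tasks fire every `interval` ticks, while triggered tasks fire as soon as all of their upstream tasks have completed within the same tick.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Task:
    name: str
    interval: Optional[int] = None     # periodic: run every `interval` ticks
    upstream: Tuple[str, ...] = ()     # trigger-based: run after these tasks finish


def simulate(tasks, ticks):
    """Return a log of (tick, task name) in execution order."""
    log = []
    for t in range(ticks):
        done = set()
        # Periodic tasks fire purely on the clock.
        for task in tasks:
            if task.interval and t % task.interval == 0:
                log.append((t, task.name))
                done.add(task.name)
        # Triggered tasks fire once every upstream dependency has completed this tick.
        progressed = True
        while progressed:
            progressed = False
            for task in tasks:
                ready = task.upstream and all(u in done for u in task.upstream)
                if ready and task.name not in done:
                    log.append((t, task.name))
                    done.add(task.name)
                    progressed = True
    return log


tasks = [
    Task("ingest_logs", interval=1),
    Task("build_tags", interval=2),
    Task("refresh_user_group", upstream=("build_tags",)),
    Task("export_group", upstream=("refresh_user_group", "ingest_logs")),
]
for tick, name in simulate(tasks, ticks=3):
    print(tick, name)
```

Trigger-based tasks avoid guessing how long upstream jobs take: a user-group refresh starts when tag computation finishes, not at a fixed clock time that might be too early.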

Data Service Layer

Defines hundreds of unified APIs for detail queries, distribution analysis, file download, and batch traversal, ensuring accurate, timely, and performant data delivery.

Conclusion

The evolution from monolithic to modular big‑data architectures at 58.com demonstrates the necessity of scalable storage, real‑time‑offline hybrid processing, and flexible service interfaces to meet diverse analytical needs.
