Designing Efficient and Agile Real-Time Big Data Analytics Platforms for Enterprises
The article explains how enterprises can build a comprehensive big data analytics platform, covering the data collection, storage, computation, and decision layers, by clarifying business scenarios, choosing between on‑premises and cloud deployment, selecting a suitable architecture such as Lambda or Kappa, making sound component choices, and tracking emerging technical trends.
Big data analytics platforms are enterprise‑level solutions used for analysis and decision‑making in big data environments.
In terms of technical architecture, they consist of three layers: data collection and storage, data computation, and data analysis and decision. Compared with a data middle platform, they emphasize analytical and decision‑making capabilities over data governance. Modern platforms also add deep‑learning techniques on top of OLAP, which deepens the analysis while lowering the barrier to entry for business users.
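To make the three‑layer split concrete, here is a minimal in‑memory sketch, assuming a toy purchase‑event feed. The class and function names (`StorageLayer`, `compute_layer`, `decision_layer`) and the revenue threshold are hypothetical and only illustrate how responsibilities divide between layers; a real platform would back each layer with separate systems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    action: str
    amount: float


class StorageLayer:
    """Collection & storage layer (hypothetical): an append-only raw event store."""

    def __init__(self):
        self._events = []

    def ingest(self, event: Event) -> None:
        self._events.append(event)

    def scan(self) -> list:
        return list(self._events)


def compute_layer(events) -> dict:
    """Computation layer: aggregate purchase revenue per user."""
    totals = defaultdict(float)
    for e in events:
        if e.action == "purchase":
            totals[e.user_id] += e.amount
    return dict(totals)


def decision_layer(revenue_by_user: dict, threshold: float) -> list:
    """Analysis & decision layer: pick high-value users, e.g. for a campaign."""
    return sorted(u for u, total in revenue_by_user.items() if total >= threshold)


store = StorageLayer()
for ev in [
    Event("u1", "purchase", 120.0),
    Event("u2", "view", 0.0),
    Event("u2", "purchase", 30.0),
    Event("u1", "purchase", 15.0),
]:
    store.ingest(ev)

revenue = compute_layer(store.scan())
print(revenue)                                    # {'u1': 135.0, 'u2': 30.0}
print(decision_layer(revenue, threshold=100.0))   # ['u1']
```

Keeping the layers behind narrow interfaces like this is what later allows the storage or compute engine to be swapped without touching the decision logic.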
Enterprises build such platforms to aggregate data from all business systems, enabling real‑time wide‑table analysis, BI reporting, user‑behavior analysis, self‑service analytics, and AI‑driven insights.
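"Wide‑table analysis" usually means pre‑joining a fact table with its dimension tables so that analysts can group by any attribute without writing joins. The sketch below assumes hypothetical `orders`, `users`, and `products` tables held as Python structures; in a real platform the same denormalization would run in the compute engine and the result would be stored for the OLAP layer.

```python
# Hypothetical sample data: one fact table and two dimension lookups.
orders = [
    {"order_id": 1, "user_id": "u1", "product_id": "p1", "qty": 2},
    {"order_id": 2, "user_id": "u2", "product_id": "p2", "qty": 1},
    {"order_id": 3, "user_id": "u1", "product_id": "p2", "qty": 3},
]
users = {"u1": {"city": "Beijing"}, "u2": {"city": "Shanghai"}}
products = {
    "p1": {"category": "books", "price": 10.0},
    "p2": {"category": "games", "price": 50.0},
}


def build_wide_table(orders, users, products):
    """Denormalize each order with its user and product attributes."""
    wide = []
    for o in orders:
        row = dict(o)
        row.update(users.get(o["user_id"], {}))
        row.update(products.get(o["product_id"], {}))
        row["revenue"] = row["qty"] * row.get("price", 0.0)
        wide.append(row)
    return wide


def group_sum(rows, key, metric):
    """Ad hoc aggregation on any column of the wide table, with no joins."""
    out = {}
    for r in rows:
        out[r[key]] = out.get(r[key], 0.0) + r[metric]
    return out


wide = build_wide_table(orders, users, products)
print(group_sum(wide, "city", "revenue"))      # {'Beijing': 170.0, 'Shanghai': 50.0}
print(group_sum(wide, "category", "revenue"))  # {'books': 20.0, 'games': 200.0}
```

The trade‑off is storage and refresh cost: every dimension change must be propagated into the wide table, which is why real‑time platforms pair this pattern with streaming updates.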
Overall approach: first clarify business‑scenario requirements, then select the platform framework and components based on data volume.
Before construction, users must define their data volume and business scenarios, determine the information they need, identify data sources, decide on thematic analyses, and specify desired functions.
After defining required functions, the appropriate big‑data processing framework and tools are chosen and integrated to mine and analyze massive data.
Implementation typically proceeds in three stages: establishing a foundational data center with unified storage and modeling; building a data processing center that decentralizes processing and provides unified monitoring for stability; and creating a data application center to deliver unified data services that meet business needs.
Deployment can be on‑premises or cloud‑based. On‑premises solutions are split between Hadoop‑ecosystem platforms and “database + AP engine” data‑warehouse approaches, each with its own advantages and drawbacks.
Cloud deployment leverages public‑cloud “low‑cost storage + elastic compute” data‑lake solutions, offering one‑stop PaaS capabilities, rapid data ingestion, and eliminating the need to manage compatibility, security, performance tuning, or operations.
From an architectural perspective, platforms can be abstracted into Lambda or Kappa architectures, and further divided into offline, online, and real‑time analysis layers, with differences mainly arising from the chosen compute engine.
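The difference between the two architectures can be sketched with a simple page‑view count, assuming a hypothetical in‑memory event log and a hypothetical batch cut‑off offset. In Lambda, a batch layer recomputes history, a speed layer covers events the batch has not yet absorbed, and a serving step merges both. In Kappa, a single streaming job maintains the result, and reprocessing means replaying the log from the beginning.

```python
from collections import Counter

# Hypothetical append-only event log of (page, count) pairs.
log = [("page_a", 1), ("page_b", 1), ("page_a", 1), ("page_c", 1), ("page_a", 1)]
BATCH_CUTOFF = 3  # hypothetical: offsets < 3 were covered by the last batch run


def batch_layer(events):
    """Lambda batch layer: full recomputation over historical data."""
    view = Counter()
    for page, n in events:
        view[page] += n
    return view


def speed_layer(events):
    """Lambda speed layer: incremental counts for events not yet batched.
    In practice this runs on a different engine, so the logic is maintained twice."""
    view = Counter()
    for page, n in events:
        view[page] += n
    return view


def lambda_query(log, cutoff):
    # Serving layer: merge the batch view with the real-time view.
    return batch_layer(log[:cutoff]) + speed_layer(log[cutoff:])


class KappaProcessor:
    """Kappa: one streaming job; reprocessing = replaying the log from offset 0."""

    def __init__(self):
        self.state = Counter()
        self.offset = 0

    def consume(self, log):
        for page, n in log[self.offset:]:
            self.state[page] += n
        self.offset = len(log)
        return self.state


lambda_result = lambda_query(log, BATCH_CUTOFF)
kappa_result = KappaProcessor().consume(log)
print(lambda_result == kappa_result)  # True
```

Both paths yield the same answer here; the architectural choice is about how many code paths and engines the team must keep consistent, which is why the text notes that differences mainly come from the compute engine.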
The data‑analysis layer is the most complex but also the most innovative. It features cloud‑native data lakes with compute‑storage separation and elastic scaling, while on‑premises deployments use Docker‑based decoupling of components to achieve resource elasticity.
Component selection for on‑premises users depends on data volume: enterprises with annual data growth of around 100 TB often adopt open‑source or commercial Hadoop ecosystems because of bandwidth and compliance constraints, while smaller firms with data volumes below the terabyte scale prefer simpler "database + AP engine" warehouses.
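The sizing rule above can be written as a small decision helper. This is only an illustration: the cut‑offs (100 TB per year and 1 TB) are hypothetical readings of "around 100 TB" and "below the terabyte scale", and the text gives no rule for the range in between, so the sketch returns a "review" outcome there rather than inventing one.

```python
from enum import Enum


class Stack(Enum):
    HADOOP = "Hadoop ecosystem (open-source or commercial distribution)"
    DB_AP = "database + AP engine data warehouse"
    REVIEW = "no clear default; weigh bandwidth, compliance, and team skills"


# Hypothetical cut-offs loosely based on the figures in the text.
HADOOP_THRESHOLD_TB = 100.0
DB_AP_THRESHOLD_TB = 1.0


def choose_on_prem_stack(annual_growth_tb: float) -> Stack:
    """Suggest an on-premises stack from annual data growth in TB (illustrative only)."""
    if annual_growth_tb < 0:
        raise ValueError("annual growth cannot be negative")
    if annual_growth_tb >= HADOOP_THRESHOLD_TB:
        return Stack.HADOOP
    if annual_growth_tb < DB_AP_THRESHOLD_TB:
        return Stack.DB_AP
    return Stack.REVIEW


for growth in (0.5, 20, 150):
    print(growth, "->", choose_on_prem_stack(growth).name)
```

In practice data volume is only one input; concurrency, latency targets, and in‑house skills usually shift the boundary.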
Technical trends aim to break traditional heterogeneous architectures, unifying data capabilities to enhance business value.
Traditional Hadoop and MPP‑based warehouses struggle to fully adapt to cloud environments: Hadoop couples storage and compute within the same cluster, and MPP databases inherently bind compute to storage.
Running a separate data lake and data warehouse creates data silos because of heterogeneous technologies, limited cluster sizes, and concurrency constraints. Integrated lakehouse architectures address these issues by unifying the storage and query layers, which improves real‑time performance and scalability while reducing operational costs.
In China, evolution focuses on Hadoop‑based solutions enhanced with transactional storage layers like Hudi and Iceberg, while multi‑cloud warehouses such as Snowflake offer more forward‑looking compute‑storage separation.
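Transactional storage layers such as Hudi and Iceberg add commits and consistent reads on top of plain files in a lake. The toy below is not the API of either project; it is a simplified, in‑memory sketch of the snapshot‑based model that Iceberg in particular uses: data files are immutable, each commit creates a new snapshot listing the visible files, the table's metadata pointer is swapped atomically, and a writer whose base snapshot is stale gets a conflict. `ToyTable`, `Snapshot`, and `CommitConflict` are hypothetical names.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # names of the immutable files visible in this snapshot


class CommitConflict(Exception):
    """Raised when another writer committed after our base snapshot."""


class ToyTable:
    """Hypothetical in-memory stand-in for a transactional table format.
    Everything runs under one lock for simplicity; real writers upload files
    without a global lock and only make the pointer swap atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}  # simulated object storage: file name -> rows
        self._snapshots = {0: Snapshot(0, ())}
        self._current = 0  # the "metadata pointer" readers follow

    def current_snapshot(self) -> Snapshot:
        with self._lock:
            return self._snapshots[self._current]

    def append(self, rows, base: Snapshot) -> Snapshot:
        with self._lock:
            # 1) Write a new immutable data file.
            file_name = f"data-{len(self._files)}"
            self._files[file_name] = list(rows)
            # 2) Optimistic concurrency check against the base snapshot.
            if base.snapshot_id != self._current:
                # The file is left as an orphan, to be cleaned up later.
                raise CommitConflict(
                    f"base {base.snapshot_id} is stale; current is {self._current}"
                )
            new = Snapshot(len(self._snapshots), base.data_files + (file_name,))
            self._snapshots[new.snapshot_id] = new
            self._current = new.snapshot_id  # 3) atomic pointer swap
            return new

    def read(self, snapshot: Snapshot) -> list:
        rows = []
        for name in snapshot.data_files:
            rows.extend(self._files[name])
        return rows


table = ToyTable()
s1 = table.append([{"id": 1}], base=table.current_snapshot())
s2 = table.append([{"id": 2}], base=s1)
print(table.read(s1))  # [{'id': 1}]  (older snapshot stays readable)
print(table.read(s2))  # [{'id': 1}, {'id': 2}]

conflict_raised = False
try:
    table.append([{"id": 3}], base=s1)  # stale base: s2 is already current
except CommitConflict as exc:
    conflict_raised = True
    print("conflict:", exc)
```

Because readers pin a snapshot, queries see a consistent view while writers commit, which is the property that lets a lake serve warehouse‑style workloads.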
The content is excerpted from the "2022 China Big Data Analytics Platform Industry Research Report" and invites readers to download the full report.
How to build a personalized, high‑efficiency, accurate, and agile real‑time big‑data analytics platform has become a challenge for technical teams.
The e‑book "Big Data Analytics Platform" compiles design ideas, architectural evolution, and practical applications from major companies such as QuTouTiao, NetEase, Ant Group, iQIYI, Tencent Games, and 37Mobile.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.