NetEase Cloud Music Real-Time Data Warehouse Architecture and Low-Code Platform Practices

This article presents NetEase Cloud Music's real-time data warehouse architecture: its streaming and batch scenarios, layered design (ODS, CDM, ADS), technology stack, consistency mechanisms, the FastX low-code platform, and future plans.

DataFunTalk

The presentation has four parts: an overview of the architecture, how the stream and batch models are kept consistent, results from the low-code platform, and future plans.

Real-time scenarios include feature generation for recommendation and search, monitoring of resource allocation and A/B tests, and data products such as real-time dashboards that support personalized content delivery.

The architecture follows dimensional modeling and has four layers: ODS, which collects server logs and database binlogs; CDM, which cleans the data and holds the intermediate traffic and business detail tables; ADS, the application layer served by engines such as ClickHouse, Impala, and Kudu; and a business decision layer that feeds recommendation, risk control, and targeted song promotion.
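The layering described above can be illustrated with a minimal in-memory sketch. It is not the production pipeline: the event fields (user_id, action, song_id), the cleaning rules, and the function names to_cdm and to_ads are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]

# Hypothetical raw ODS records, as they might arrive from server logs.
ODS_EVENTS = [
    {"user_id": "u1", "action": "play", "song_id": "s1"},
    {"user_id": "u2", "action": "play", "song_id": "s1"},
    {"user_id": "", "action": "play", "song_id": "s2"},     # missing user id
    {"user_id": "u1", "action": "PLAY ", "song_id": "s2"},  # needs normalizing
    {"user_id": "u3", "action": "like", "song_id": "s1"},
]


def to_cdm(events: Iterable[Event]) -> Iterator[Event]:
    """CDM step (assumed rules): drop malformed rows, normalize the action."""
    for event in events:
        if not event.get("user_id") or not event.get("song_id"):
            continue
        yield {**event, "action": event["action"].strip().lower()}


def to_ads(cdm_rows: Iterable[Event]) -> Dict[str, int]:
    """ADS step (assumed metric): play count per song, ready for a dashboard."""
    counts = Counter(r["song_id"] for r in cdm_rows if r["action"] == "play")
    return dict(counts)


play_counts = to_ads(to_cdm(ODS_EVENTS))
print(play_counts)
```

In a real deployment each step would be a separate job writing to its own storage, so that several ADS tables can reuse the same cleaned CDM data.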

On the technology side, Flink and Kafka handle streaming from ODS to CDM, Spark and Hive handle offline processing, and ClickHouse, Impala, and Kudu serve low-latency queries. Together they cover three freshness levels: second-level real-time needs, minute-level near-real-time analytics, and hour-level batch processing.
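One way to picture the three freshness levels is as a lookup from a latency requirement to a stack. The sketch below is illustrative only: the numeric thresholds (one minute, one hour) and the pairing of each tier with a particular stack are assumptions, not figures from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    max_latency_s: float  # exclusive upper bound, assumed for illustration
    stack: str


# Ordered from the strictest freshness requirement to the loosest.
TIERS: List[Tier] = [
    Tier("real-time", 60, "Flink + Kafka"),
    Tier("near-real-time", 3600, "ClickHouse / Impala / Kudu"),
    Tier("batch", float("inf"), "Spark + Hive"),
]


def pick_tier(required_freshness_s: float) -> Tier:
    """Return the first tier whose latency bound covers the requirement."""
    if required_freshness_s < 0:
        raise ValueError("freshness must be non-negative")
    for tier in TIERS:
        if required_freshness_s < tier.max_latency_s:
            return tier
    return TIERS[-1]


for seconds in (5, 300, 7200):
    tier = pick_tier(seconds)
    print(f"{seconds}s -> {tier.name} ({tier.stack})")
```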

To keep the stream and batch models consistent, the system uses a Lambda architecture together with a partition flow table, which maps partition fields to Kafka topics. Partitions can be managed statically or dynamically, coordinated through ZooKeeper and metadata services.
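The mapping from partition field to Kafka topic can be sketched as a small routing table. Everything here is hypothetical: the class name PartitionedStreamTable, the table and topic names, and the use of an in-memory dict in place of ZooKeeper and the metadata service. "Dynamic" is interpreted as registering a partition at runtime, which is an assumption about the talk's meaning.

```python
from typing import Dict, Iterable, List


class PartitionedStreamTable:
    """Routes records to Kafka topic names by the value of one field."""

    def __init__(self, name: str, partition_field: str,
                 static_topics: Dict[str, str], default_topic: str) -> None:
        self.name = name
        self.partition_field = partition_field
        self.default_topic = default_topic
        # A real system would keep this mapping in a metadata service.
        self._topics: Dict[str, str] = dict(static_topics)

    def register(self, value: str, topic: str) -> None:
        """Dynamic partition: add a mapping at runtime."""
        self._topics[value] = topic

    def topic_for(self, record: Dict[str, str]) -> str:
        """Write path: choose the topic for a single record."""
        value = record.get(self.partition_field, "")
        return self._topics.get(value, self.default_topic)

    def topics_for_read(self, values: Iterable[str]) -> List[str]:
        """Read path: subscribe only to the partitions a job needs."""
        return [self._topics.get(v, self.default_topic) for v in values]


table = PartitionedStreamTable(
    name="dwd_traffic_log",
    partition_field="action",
    static_topics={"play": "traffic_play", "search": "traffic_search"},
    default_topic="traffic_other",
)
print(table.topic_for({"action": "play"}))
print(table.topic_for({"action": "share"}))
table.register("share", "traffic_share")
print(table.topic_for({"action": "share"}))
```

The appeal of this design is that a downstream job interested only in play events reads one small topic instead of filtering the full stream, much as a batch query prunes Hive partitions.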

The FastX low-code platform addresses computation consistency through model-centric, component-based, configurable, and service-oriented development. It abstracts tasks into views, AB models, and relational models, and it supports drag-and-drop pipeline creation, automated binlog parsing, a simplified ClickHouse sink, and lineage tracking.
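The component-based, configurable idea can be sketched as a pipeline assembled from a declarative config rather than hand-written job code. This is not FastX's actual API, which the source does not show; the registry, the component names "filter" and "project", and the config layout are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Component = Callable[[Iterable[Row]], Iterable[Row]]

# Hypothetical registry of reusable components, keyed by type name.
REGISTRY: Dict[str, Callable[..., Component]] = {}


def component(name: str):
    """Register a component factory under a type name."""
    def decorator(factory: Callable[..., Component]):
        REGISTRY[name] = factory
        return factory
    return decorator


@component("filter")
def make_filter(field: str, equals: Any) -> Component:
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        return (r for r in rows if r.get(field) == equals)
    return run


@component("project")
def make_project(fields: List[str]) -> Component:
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        return ({f: r.get(f) for f in fields} for r in rows)
    return run


def build_pipeline(config: Dict[str, Any]) -> Callable[[Iterable[Row]], List[Row]]:
    """Turn a declarative config into a callable that runs each step in order."""
    steps = [REGISTRY[s["type"]](**s.get("params", {})) for s in config["steps"]]

    def run(rows: Iterable[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return list(rows)
    return run


# What a drag-and-drop editor might save (assumed format).
PIPELINE_CONFIG = {
    "name": "play_events_view",
    "steps": [
        {"type": "filter", "params": {"field": "action", "equals": "play"}},
        {"type": "project", "params": {"fields": ["user_id", "song_id"]}},
    ],
}

run_pipeline = build_pipeline(PIPELINE_CONFIG)
result = run_pipeline([
    {"user_id": "u1", "action": "play", "song_id": "s1", "ts": 1},
    {"user_id": "u2", "action": "like", "song_id": "s1", "ts": 2},
])
print(result)
```

Because the pipeline exists as data, a platform can read the same config to derive lineage (which fields flow from which source) without parsing job code.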

Future plans focus on strengthening infrastructure, integrating low‑code with stream‑batch processing, and exploring lake‑warehouse solutions, while leveraging FastX's data model and lineage for asset governance and evaluation.

The session concludes with a Q&A covering metric definition, high availability, binlog usage, data consistency checks, and handling late‑arriving data.
