An Intelligent Decision Engine for User-Segmented, Data-Driven Operations: Data Development Practice at NetEase Cloud Music
This article details the design, challenges, and implementation of NetEase Cloud Music's user‑segmentation data‑driven operation decision engine, covering project background, product architecture, data‑warehouse responsibilities, development workflow, optimization strategies, and the resulting performance and future outlook.
01 Project Background
The "User Segmentation Data‑Driven Operations Intelligent Decision Engine" (code‑named "Nuo‑ren") was launched in late August to drive user growth for NetEase Cloud Music, focusing on three goals: new‑user acquisition, existing‑user activation, and churn recovery.
2. Product Flow
The product flow decomposes into user layering, business-status assessment, gap analysis, and strategy clustering. Strategies combine user activity, consumption, and production data to define targeted actions, such as content or benefit outreach, that precisely match users, items, and contexts.
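The user-layering step above can be sketched as a rule-based classifier. This is a hypothetical illustration only: the field names (`active_days_30d`, `paid_amount_30d`, `works_published_30d`) and thresholds are assumptions, not Nuo-ren's actual profile schema.

```python
# Hypothetical sketch of rule-based user layering over daily profile data.
# Field names and thresholds are illustrative assumptions.
def layer_user(profile: dict) -> str:
    """Assign a user to a coarse segment from recent behavior signals."""
    if profile.get("active_days_30d", 0) == 0:
        return "churned"          # churn-recovery target
    if profile.get("is_new_user", False):
        return "new"              # new-user acquisition funnel
    if profile.get("works_published_30d", 0) > 0:
        return "creator"          # production-side segment
    if profile.get("paid_amount_30d", 0) > 0:
        return "paying_consumer"  # consumption-side segment
    return "low_activity"         # activation target

# Example: a user with no recent activity falls into the churn-recovery group.
print(layer_user({"active_days_30d": 0}))  # churned
```

In practice each segment then feeds a different goal (acquisition, activation, or recovery), which is what the gap-analysis and strategy-clustering steps refine further.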
3. Product Architecture
The architecture consists of a data input layer, an intelligent decision layer, a delivery channel layer (currently push‑based), and a data feedback layer. Multiple teams (data product, data development, front‑end/back‑end, testing) collaborate on the system.
4. Data‑Warehouse Responsibilities
The data warehouse serves as the lowest-level data input layer, providing stable, fast data streams to the Nuo-ren platform for continuous strategy iteration.
02 Project Challenges
1. Data‑Warehouse Role
Daily strategy data (user profiles, content matching, scenario data) are generated and fed to Nuo‑ren for user group selection, strategy matching, and one‑click delivery. The warehouse also handles full‑link effect analysis and feedback for strategy monitoring and optimization.
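The group-selection and strategy-matching steps described above can be sketched as a simple filter-then-map pipeline. All names here (`segment`, the copy strings) are hypothetical placeholders, not Nuo-ren's actual rules or delivery API.

```python
# Minimal sketch of daily user-group selection and strategy matching.
# Segment labels and strategy copy are hypothetical.
def select_group(users, predicate):
    """Filter the daily profile table down to one target user group."""
    return [u for u in users if predicate(u)]

def match_strategy(user):
    """Map a selected user to a delivery plan (channel + copy)."""
    if user["segment"] == "churned":
        return {"channel": "push", "copy": "win-back benefit"}
    return {"channel": "push", "copy": "content recommendation"}

users = [{"uid": 1, "segment": "churned"}, {"uid": 2, "segment": "active"}]
group = select_group(users, lambda u: u["segment"] == "churned")
plans = [match_strategy(u) for u in group]
```

The "one-click delivery" step would then hand `plans` to the push channel, and the feedback layer would log outcomes for the full-link effect analysis mentioned above.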
2. Challenges
Business complexity: multiple user identities (consumers, creators, artists, fans), diverse content (songs, playlists, videos, podcasts), and varied behavior (social interaction, playback) across many time windows.
Strategy complexity: need for both statistical/predictive tags and fine‑grained scenario data to support diverse copy.
3. Data‑Warehouse Challenges
Functional: clear metric definitions, data quality (validity, consistency, completeness), and strict interface standards.
Non‑functional: stable, scalable architecture; strict timeliness (daily scheduled pushes); resource‑cost management.
03 Project Solution
1. Prerequisites – Data Middle‑Platform & Standard System
All development revolves around NetEase Cloud Music's full‑link data middle‑platform, which includes a proprietary big‑data storage and compute platform, standardized data construction, cost‑effective data‑product tools, CI/CD pipelines, and user‑facing OLAP/Easyfetch tools.
We adopt a dimensional‑modeling approach, building independent layers for content and user domains, aggregating lightweight metrics at the DWS layer, and creating wide tables for downstream analysis.
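The DWS-layer light aggregation described above rolls detailed behavior events up to one row per user per day. A minimal sketch, assuming hypothetical event fields (`user_id`, `dt`, `duration`) rather than the actual DWD schema:

```python
from collections import defaultdict

# Sketch of DWS-layer light aggregation: roll DWD-level playback events
# up to per-(user, day) lightweight metrics. Field names are assumptions.
def build_dws_user_day(dwd_events):
    agg = defaultdict(lambda: {"play_cnt": 0, "play_secs": 0})
    for e in dwd_events:
        key = (e["user_id"], e["dt"])          # grain: one row per user-day
        agg[key]["play_cnt"] += 1
        agg[key]["play_secs"] += e["duration"]
    return dict(agg)

events = [
    {"user_id": 1, "dt": "2023-08-30", "duration": 180},
    {"user_id": 1, "dt": "2023-08-30", "duration": 240},
]
dws = build_dws_user_day(events)
# dws[(1, "2023-08-30")] == {"play_cnt": 2, "play_secs": 420}
```

Wide tables for downstream analysis would then join several such lightweight aggregates (activity, consumption, production) on the same user-day grain.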
2. Data‑Development Process & Mechanism
From requirement analysis to CDM layer delivery, the workflow includes data research, bus matrix design, model review, testing, quality monitoring, scheduling, and operation, leveraging CI/CD tools for quality assurance and efficiency.
3. Data‑Warehouse Optimization & Assurance
Cost reduction: packaging strategies as plug-ins, and releasing compute and storage resources for strategies whose measured effect no longer justifies them.
Task optimization: dependency reduction, schedule adjustment, node‑level independence, SQL and engine tuning.
Model optimization: partitioning large tables, decoupling heavy models for faster output.
Non‑functional ops: baseline‑level operation, intelligent alerting, acceleration pools, visual monitoring for end‑to‑end stability.
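The model-decoupling idea above can be sketched as follows: instead of one monolithic wide-table job, metric groups run as independent jobs, so a slow or failing group no longer blocks publishing the fast ones. Job names and the sequential runner are hypothetical simplifications (a real scheduler runs these concurrently).

```python
# Sketch of decoupling a heavy model into independent metric jobs.
# Job names are hypothetical; a production scheduler runs them in parallel.
def run_independently(jobs):
    """Execute each metric job on its own; a failure in one job is
    isolated instead of blocking the others' output."""
    results = {}
    for name, job in jobs:
        try:
            results[name] = job()
        except Exception as exc:
            results[name] = f"failed: {exc}"  # isolate, alert, retry later
    return results

jobs = [
    ("activity_metrics", lambda: "published"),  # fast; feeds the daily push
    ("lifetime_metrics", lambda: "published"),  # slow; off the critical path
]
results = run_independently(jobs)
```

The same isolation is what makes baseline-level operation tractable: each decoupled job gets its own deadline, alert, and retry policy.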
04 Project Outcomes
1. Overall Data Flow Architecture
The final pipeline shows clear separation of user‑profile metrics, dimension layers, and fact tables, forming a stable, well‑structured data model.
2. Production Timeliness
Daily data delivery is kept within defined windows; early-stage fluctuations were mitigated through CDM- and data-mart-layer optimizations, and baseline ops further stabilized timing.
3. Delivery Effectiveness
Push performance ranks: social interaction > asset change > platform reminder > content recommendation, achieving up to 3% click‑through rates with delivery volumes ranging from tens of thousands to millions.
4. Summary of Achievements
Standardized data system guiding development.
Rigorous end‑to‑end R&D process.
Extensive compute optimizations (dependency, engine, SQL).
Baseline operation ensuring reliable production.
DataOps tooling improving quality and efficiency.
5. Future Outlook
Strengthen baseline operations.
Iterate product features to broaden strategy coverage.
Enhance data service capabilities and asset governance.
Thank you for listening.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.