Building an Attribution System for NetEase Cloud Music Data Warehouse: Challenges and Solutions

This article presents the problems faced by NetEase Cloud Music's data warehouse attribution system and details a comprehensive solution that includes upgrading the event‑tracking framework, redesigning the attribution model, and launching a unified management platform to improve stability, accuracy, and scalability.

DataFunSummit

The presentation introduces the attribution system built for NetEase Cloud Music in three parts: the problems encountered, the proposed solutions, and future planning.

Problems: The existing data pipeline integrates client logs, server logs, algorithm tags, and business data into a dimension‑DWD‑DWS architecture. Attribution built on this pipeline suffers from three issues: high latency caused by sorting massive logs, unclear attribution results, and limited extensibility to new business scenarios.

Attribution Background: User behavior attribution traces a conversion action (play, like, comment, purchase) back to its cause across complex app flows. The focus is on the last touchpoint that added the content to the playback list, rather than on the immediate UI navigation path.
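To make the last-touchpoint idea concrete, here is a minimal Python sketch. The event fields (`ts`, `action`, `source`, `track_id`) and the action names are hypothetical, chosen only for illustration; they are not the schema used by NetEase Cloud Music. Each play is attributed to the most recent earlier event that added the track to the playback list, and intermediate navigation events are ignored.

```python
def attribute_last_touch(events: list[dict]) -> list[dict]:
    """Attribute each play to the latest earlier event that added the track to the playlist."""
    last_add: dict[str, str] = {}  # track_id -> source of the most recent add
    results = []
    # Events must be processed in time order for "last touch" to be meaningful.
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["action"] == "add_to_playlist":
            last_add[ev["track_id"]] = ev["source"]
        elif ev["action"] == "play":
            results.append({
                "track_id": ev["track_id"],
                "ts": ev["ts"],
                # None when no add event was seen for this track.
                "attributed_to": last_add.get(ev["track_id"]),
            })
    return results


# Hypothetical sample data for one user session.
events = [
    {"ts": 1, "action": "add_to_playlist", "source": "daily_recommend", "track_id": "t1"},
    {"ts": 2, "action": "page_view", "source": "player_page", "track_id": "t1"},
    {"ts": 3, "action": "play", "source": "player_page", "track_id": "t1"},
    {"ts": 4, "action": "add_to_playlist", "source": "search_result", "track_id": "t1"},
    {"ts": 5, "action": "play", "source": "mini_player", "track_id": "t1"},
]
attributed = attribute_last_touch(events)
```

In this sample the first play is credited to `daily_recommend` and the second to `search_result`, even though both plays were triggered from player UI elements. The sort in this sketch is cheap for one session but becomes the expensive step when applied to full log volumes.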

Previous Implementation Issues: The prior ETL‑based approach required costly sorting, produced ambiguous results, and could not handle cross‑day or multi‑level attribution, leading to stability, accuracy, and scalability challenges.

Solution Overview:

1. Event‑tracking framework upgrade: The data warehouse and front‑end teams jointly developed a standardized SDK that records object IDs, positions, content types, and distribution strategies, eliminating large‑scale log joins.

2. Attribution model upgrade: The new model defines three attribution parameters (PS refer, Multi refer, add refer) and distinguishes between accompanying and non‑accompanying states, capturing richer context for playback, comment, and share events.

3. Management platform: The platform provides unified configuration for requirements, objects, and events, and generates the dimension tables used by downstream attribution analyses.
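One way to see why recording context in the SDK removes the need for log joins is the sketch below. It assumes the client attaches an encoded source string to the conversion event itself, so the warehouse can read the source directly. The class `TrackedObject`, its field names, and the `|` and `:` encoding are all hypothetical; the source does not describe the actual format of PS refer, Multi refer, or add refer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedObject:
    """One tracked UI object, as a standardized SDK might record it (hypothetical fields)."""
    object_id: str      # which UI object was touched
    position: int       # slot index within its container
    content_type: str   # e.g. "song" or "playlist"
    content_id: str     # identifier of the content shown
    strategy: str       # distribution strategy that surfaced the content

    def to_refer(self) -> str:
        # Hypothetical encoding; assumes no field contains "|" or ":".
        return (f"{self.object_id}:{self.position}"
                f"|{self.content_type}:{self.content_id}"
                f"|{self.strategy}")


def parse_refer(refer: str) -> TrackedObject:
    """Decode a refer string produced by TrackedObject.to_refer()."""
    obj, content, strategy = refer.split("|")
    object_id, position = obj.split(":")
    content_type, content_id = content.split(":")
    return TrackedObject(object_id, int(position), content_type, content_id, strategy)


# The client stamps the refer onto the play event when the play happens.
card = TrackedObject("mod_daily_recommend", 3, "song", "t1", "algo_v2")
play_event = {"action": "play", "track_id": "t1", "refer": card.to_refer()}

# Downstream, attribution reads the source from the event itself: no sort, no join.
source = parse_refer(play_event["refer"])
```

The design trade-off in this sketch is that attribution logic moves to the client, so the encoding has to be standardized across teams, which is the role the article assigns to the shared SDK.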

The upgraded model supports multi‑dimensional analysis across page positions, media content, business scenarios, and traffic classifications, enabling more precise and extensible attribution.
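As a rough illustration of multi‑dimensional analysis, the Python sketch below turns a platform-style object configuration into a dimension table and counts attributed plays along any configured dimension. The configuration keys (`object_id`, `page`, `scenario`) and the sample rows are assumptions made for this example, not the platform's real schema.

```python
from collections import Counter

# Hypothetical object configuration, as a management platform might export it.
object_config = [
    {"object_id": "mod_daily", "page": "home", "scenario": "recommendation"},
    {"object_id": "mod_search", "page": "search", "scenario": "search"},
]


def build_dimension_table(config: list[dict]) -> dict[str, dict]:
    """Index configuration rows by object_id so events can look up their dimensions."""
    return {row["object_id"]: row for row in config}


def count_by(events: list[dict], dim_table: dict[str, dict], dimension: str) -> dict[str, int]:
    """Count attributed events grouped by one dimension of the source object."""
    counts: Counter = Counter()
    for ev in events:
        dim = dim_table.get(ev["object_id"])
        # Objects missing from the configuration are grouped under "unknown".
        counts[dim[dimension] if dim else "unknown"] += 1
    return dict(counts)


dim_table = build_dimension_table(object_config)
# Hypothetical attributed plays, each already carrying its source object.
plays = [
    {"track_id": "t1", "object_id": "mod_daily"},
    {"track_id": "t2", "object_id": "mod_daily"},
    {"track_id": "t3", "object_id": "mod_search"},
    {"track_id": "t4", "object_id": "mod_legacy"},
]
by_page = count_by(plays, dim_table, "page")
by_scenario = count_by(plays, dim_table, "scenario")
```

Because the dimensions live in configuration rather than in the events, adding a new way to slice the data in this sketch means adding a column to the configuration, not reprocessing logs.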

Future Planning: Expand coverage of attribution tracking across client types, introduce first‑touch attribution for entry analysis, standardize SPM definitions across business lines, and consolidate all configuration rules within the management platform to further boost efficiency and reliability.

Overall, the new system improves attribution data stability, accuracy, and extensibility, supporting offline, near‑real‑time, and real‑time analytics for NetEase Cloud Music.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
