
How NetEase Cloud Music Cut Storage Costs by 30% Through Data Governance

This article details NetEase Cloud Music's year‑long data governance initiative, covering data background, governance strategy, project plan, practical actions, results, and future outlook, and shows how metadata‑driven management reduced storage by over 30% while improving reliability and efficiency.

Data Thinking Notes

Data Background

NetEase Cloud Music operates nine independent products (six domestic, three overseas) and manages data at large scale: over 20,000 online scheduled tasks, more than 50,000 tables, and 12 data projects serving over 600 users, including algorithm engineers, analysts, data product teams, and business services. Storage costs exceed 190,000 CNY per day and compute costs exceed 270,000 CNY per day.

Quality issues include unstable core tasks and reports, coarse‑grained queue resource usage, and high operational costs. Efficiency suffers because many tasks still run on Hive and Spark 2, generating small files and consuming excessive resources, especially when downstream jobs directly read ODS tables.

The environment spans five domestic clusters and overseas clusters on Alibaba Cloud and AWS.

Data background diagram

Governance Approach

Problems were identified from a technical perspective across four layers: HDFS files, database tables, model design, and task scheduling/execution engines.

Key issues:

HDFS files lacking management, leading to “orphaned” files that waste resources.

Unrestricted database creation resulting in many unused or poorly named tables.

Model design with low CDM layer reuse, many idle tasks, and heavy reliance on ODS data.

Tasks still running on Hive and Spark 2 with resource‑intensive small‑file problems.

Solutions focused on metadata: collecting HDFS, Hive, and task metadata to enable analysis and monitoring.

Metadata analysis diagram

Project Plan

The plan began with acquiring complete metadata from NetEase’s data‑fabric team, covering table‑level, task‑level, and HDFS‑level information.

Metadata modeling produced wide CDM tables and dimension tables. These support multi‑dimensional views of platform data, with usage broken down by team, domain, individual, and table.
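The multi‑dimensional views described above amount to grouping one wide metadata table by different keys. The following is a minimal sketch under stated assumptions: the record layout, the field names (`team`, `domain`, `owner`, `table`, `storage_tb`), and the sample values are hypothetical and are not taken from NetEase's actual model.

```python
from collections import defaultdict

# Hypothetical rows of a wide metadata table; names and values are illustrative only.
ROWS = [
    {"team": "algo", "domain": "rec", "owner": "alice", "table": "dwd_play_log", "storage_tb": 120.0},
    {"team": "algo", "domain": "search", "owner": "bob", "table": "dws_query_stats", "storage_tb": 30.0},
    {"team": "analytics", "domain": "rec", "owner": "carol", "table": "ads_daily_report", "storage_tb": 5.5},
]


def storage_by(rows, dimension):
    """Sum storage (TB) per value of one dimension, largest consumer first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["storage_tb"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for dim in ("team", "domain", "owner"):
        print(dim, storage_by(ROWS, dim))
```

The same function serves every view because the wide table already carries all dimensions on each row, which is the usual reason for modeling metadata this way.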

Metadata model diagram

The governance framework follows five principles: evidence‑based governance, clear ownership, sustainable mechanisms, measurable outcomes, and reusable methods.

Four action categories support governance: monitoring, standards, tooling, and actual governance execution.

Governance framework

Governance Practices

Ownership

All data, tasks, and tables are assigned owners. ODS dump tasks are configured centrally; responsibilities are linked to developers. Issues with orphaned tables from departed staff or project accounts were addressed through ODS governance, ownership reassignment, and batch‑ownership tools.

Sustainable Mechanisms

A unified mechanism and set of principles for driving governance work forward were established to handle cross‑departmental collaboration and resource constraints.

Sustainable mechanism diagram

HDFS Orphan File Governance

By correlating HDFS and Hive metadata with access logs, orphan files were identified and cleared, releasing over 7 PB of logical storage and removing more than 4.5 million files and directories.
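The correlation step above can be sketched as a set comparison. This is only an illustration under assumptions the article does not state: here an orphan is taken to be an HDFS path that no Hive table location covers and that has not been accessed within a 90‑day window. The paths, the window, and the function names are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical inputs; in practice these would come from HDFS metadata,
# Hive metastore table locations, and access logs.
HDFS_PATHS = [
    "/warehouse/music.db/play_log/dt=2023-01-01",
    "/warehouse/music.db/play_log_bak",
    "/tmp/export/old_dump",
    "/tmp/export/fresh_dump",
]
HIVE_LOCATIONS = ["/warehouse/music.db/play_log"]
LAST_ACCESS = {"/tmp/export/fresh_dump": date(2023, 5, 30)}


def is_under(path, root):
    """True if `path` equals `root` or lies inside it (not a raw prefix match)."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def find_orphans(hdfs_paths, hive_locations, last_access, today, idle_days=90):
    """Return paths not covered by any Hive location and idle for `idle_days`.

    Paths with no access record at all are treated as idle (an assumption).
    """
    cutoff = today - timedelta(days=idle_days)
    orphans = []
    for path in hdfs_paths:
        if any(is_under(path, loc) for loc in hive_locations):
            continue  # still referenced by Hive metadata
        accessed = last_access.get(path)
        if accessed is None or accessed < cutoff:
            orphans.append(path)
    return sorted(orphans)


if __name__ == "__main__":
    print(find_orphans(HDFS_PATHS, HIVE_LOCATIONS, LAST_ACCESS, today=date(2023, 6, 1)))
```

Matching on whole path components matters: a raw prefix test would wrongly treat `play_log_bak` as part of the `play_log` table and hide it from cleanup.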

Orphan file cleanup

Database Governance

Of more than 70 databases, 27 unused ones were decommissioned. Usage standards were then defined, and the remaining active databases were consolidated to 22.

Database governance

Table Governance

Four major initiatives:

Temporary table cleanup, reducing both stock and growth.

Lifecycle management, improving coverage.

Large‑table (cost > 100 CNY/day) optimization, targeting 163 tables that accounted for 80% of storage.

AB‑test task optimization, migrating to a new system and retiring legacy tasks.
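The large‑table initiative in the list above relies on a simple selection rule: pick tables whose daily cost exceeds a threshold, then check how much of total storage they cover. A minimal sketch follows; the 100 CNY/day threshold comes from the article, while the table names, sizes, and costs are made up for illustration.

```python
# Hypothetical per-table metadata; names and numbers are illustrative only.
TABLES = [
    {"name": "dwd_play_log", "storage_tb": 500.0, "daily_cost_cny": 900.0},
    {"name": "dwd_user_action", "storage_tb": 300.0, "daily_cost_cny": 700.0},
    {"name": "dws_song_stats", "storage_tb": 120.0, "daily_cost_cny": 90.0},
    {"name": "tmp_export_2021", "storage_tb": 80.0, "daily_cost_cny": 50.0},
]


def large_tables(tables, cost_threshold_cny=100.0):
    """Return (table names above the daily cost threshold, their share of total storage).

    Names are ordered from most to least expensive.
    """
    total = sum(t["storage_tb"] for t in tables)
    selected = [t for t in tables if t["daily_cost_cny"] > cost_threshold_cny]
    selected.sort(key=lambda t: t["daily_cost_cny"], reverse=True)
    share = sum(t["storage_tb"] for t in selected) / total if total else 0.0
    return [t["name"] for t in selected], share


if __name__ == "__main__":
    names, share = large_tables(TABLES)
    print(names, f"{share:.0%}")
```

When a small set of tables covers most of the storage, as the 163 tables did here, optimizing them one by one is cheaper than applying blanket rules to all 50,000.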

Table governance

Model Design – “Three‑Degree” Metrics

The team introduced metrics for progress, health, and value. CDM table reuse improved from 30% to 60%, and the penetration rate fell from 20% to 10%.
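The article does not define how reuse and penetration are computed, so the sketch below rests on assumed definitions: the penetration rate is taken as the share of downstream tasks that read at least one ODS table directly, and the reuse rate as the share of CDM tables read by at least two tasks. The table‑name prefixes used to infer layers (`ods_`, `dwd_`, `dws_`, `dim_`) are a hypothetical naming convention.

```python
# Hypothetical lineage: downstream task -> tables it reads.
TASK_INPUTS = {
    "report_a": ["dws_user_play", "dim_song"],
    "report_b": ["dws_user_play", "ods_play_log"],
    "report_c": ["dim_song"],
    "report_d": ["dwd_comment"],
}
CDM_TABLES = ["dws_user_play", "dim_song", "dwd_comment", "dwd_share"]


def layer_of(table):
    """Infer the warehouse layer from an assumed table-name prefix convention."""
    prefix = table.split("_", 1)[0]
    if prefix == "ods":
        return "ODS"
    if prefix in {"dwd", "dws", "dim"}:
        return "CDM"
    return "OTHER"


def penetration_rate(task_inputs):
    """Share of tasks reading at least one ODS table directly (assumed definition)."""
    if not task_inputs:
        return 0.0
    hits = sum(
        1 for inputs in task_inputs.values()
        if any(layer_of(t) == "ODS" for t in inputs)
    )
    return hits / len(task_inputs)


def cdm_reuse_rate(cdm_tables, task_inputs, min_readers=2):
    """Share of CDM tables read by at least `min_readers` tasks (assumed definition)."""
    if not cdm_tables:
        return 0.0
    readers = {t: 0 for t in cdm_tables}
    for inputs in task_inputs.values():
        for t in set(inputs):  # count each task once per table
            if t in readers:
                readers[t] += 1
    return sum(1 for n in readers.values() if n >= min_readers) / len(cdm_tables)


if __name__ == "__main__":
    print(penetration_rate(TASK_INPUTS), cdm_reuse_rate(CDM_TABLES, TASK_INPUTS))
```

Under these definitions the two metrics pull in opposite directions: moving a task off ODS and onto a shared CDM table lowers penetration and raises reuse at the same time.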

Three‑Degree metrics

Compute Governance

Tasks were migrated from Hive and Spark 2 to Spark 3, with adaptive query execution (AQE), Z‑order, and ZSTD compression enabled. This yielded significant resource savings across multiple migration projects.
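The article does not list the settings used, so the following is a sketch of how AQE and ZSTD might be switched on for a Spark 3 job. The keys are standard Spark SQL configuration properties; the chosen values and the helper name `to_submit_args` are assumptions. Z‑ordering is typically provided by a table format or SQL extension rather than by a core Spark setting, so it is left out here.

```python
# Standard Spark 3 configuration keys; the values are assumed, not taken from the article.
SPARK3_CONF = {
    "spark.sql.adaptive.enabled": "true",                     # turn on AQE
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small shuffle partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # split skewed join partitions
    "spark.sql.parquet.compression.codec": "zstd",            # write Parquet with ZSTD
}


def to_submit_args(conf):
    """Render a config dict as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(SPARK3_CONF) + ["job.py"]))
```

Coalescing shuffle partitions under AQE is one way to reduce the number of small output files, which the article names as a main cost of the older Hive and Spark 2 jobs.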

Spark3 migration

Project Outcomes

Cost & Benefit

Storage: 30% of total storage decommissioned; daily storage growth slowed from 170 TB to 55 TB.

Compute: Core and high‑cost tasks saved >30% of compute resources; cluster stability improved; core task delivery advanced from 9:00 am to 5:30 am.
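To put the reported figures in relative terms, the short calculation below works out the cut in daily storage growth and how much earlier core tasks are delivered. It uses only numbers stated above; the function name is illustrative.

```python
def pct_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Daily storage growth fell from 170 TB to 55 TB.
growth_cut_pct = pct_reduction(170, 55)

# Core task delivery moved from 9:00 am to 5:30 am (minutes since midnight).
hours_earlier = (9 * 60 - (5 * 60 + 30)) / 60

if __name__ == "__main__":
    print(f"storage growth cut by {growth_cut_pct:.1f}%")   # about 67.6%
    print(f"core tasks delivered {hours_earlier} hours earlier")  # 3.5
```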

Governance Assets

The project produced visual dashboards (data‑asset sandbox, three‑degree overview, cost‑storage sandbox, governance effect board) and monitoring tools for orphan files, tasks, and more.

Dashboard overview

Standardization

The team established comprehensive data development standards covering database usage, temporary table creation, node naming, queue usage, task release, and data‑governance decommission processes.

Future Outlook

Data governance will continue evolving from fragmented to centralized, from reactive to proactive, and from experience‑based to intelligent. The three‑stage management (pre‑, mid‑, post‑governance) emphasizes preventive actions, enriched monitoring, and automated, intelligent solutions.

Future governance roadmap