
Music FeatureBox: A Custom Feature Store for Machine Learning at NetEase Cloud Music

Music FeatureBox is NetEase Cloud Music’s custom feature store that centralizes metadata, unifies offline and online feature storage across multiple engines, provides a cross‑language DSL for extraction, ensures training‑inference data consistency, and offers built‑in monitoring, thereby streamlining feature engineering and accelerating the platform’s machine‑learning lifecycle.

NetEase Cloud Music Tech Team

In the full lifecycle of machine learning, a Feature Store acts as the bridge between data and models. By storing and managing datasets and data pipelines used in ML processes, it reduces duplicate feature engineering work, enabling efficient feature data development and shortening model iteration cycles.

The shift from ML-Ops to Feature-Ops follows the three-part structure of a standard ML system: "data", "model", and "code", which correspond to feature engineering, model training, and model deployment. As AI platforms matured, model training and deployment became more efficient, but feature engineering still relied on traditional data development processes. To meet the data needs specific to ML, the industry began exploring data solutions tailored to ML scenarios, which gave rise to Feature-Ops and the Feature Store concept.

The Feature Store concept was first formalized by Uber in 2017 as part of its Michelangelo platform. Its primary goals are to promote feature registration, discovery, and reuse while keeping feature data consistent between offline batch processing and online application reads. A Feature Store provides high-performance, low-latency online serving together with high-throughput, large-capacity offline storage for training and batch prediction.

NetEase Cloud Music built its own Feature Store, called Music FeatureBox, to solve specific challenges in its recommendation and search scenarios. The system addresses: (1) feature discovery, governance, and reuse through centralized metadata management; (2) high-performance feature storage and serving through multiple storage engines (MDB, RDB, FDB, TDB) routed via a logical storage layer; (3) consistency between the feature data used for training and for online inference via a unified data access layer and automated synchronization; (4) feature extraction and operator reuse via a cross-language, cross-platform operator library and DSL that lets a single codebase run in different environments; (5) training sample production and management through standardized APIs that support custom data pipelines; and (6) feature quality monitoring and analysis using statistical metrics to detect data-related model errors.

The core of Music FeatureBox is the Datahub module, which abstracts feature metadata and encapsulates the various physical storage APIs, presenting all reads and writes as feature operations. Datahub also handles serialization, compression, and monitoring. For schema and serialization, the team evaluated JSON and Protobuf and ultimately adopted a hybrid approach: com.google.protobuf.DynamicMessage and com.google.protobuf.Descriptors.Descriptor are used together with the open-source Protostuff library to compile .proto files dynamically, which gives Protobuf-like efficiency with JSON-style, map-based manipulation. An example table is music_alg:fm_dsin_user_static_ftr_dpb, whose value is described by a Protobuf schema and whose primary and sort keys are added as explicit columns.
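The idea of presenting every read and write as a feature operation can be illustrated with a minimal Python sketch. It is not the Datahub implementation: JSON plus zlib stands in for the Protobuf-based serialization described above, a dictionary stands in for a physical storage engine, and the names FeatureTable, InMemoryEngine, the table demo:user_static, and its fields are all hypothetical.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class FeatureTable:
    """Hypothetical metadata record for one logical feature table."""
    name: str
    key_columns: tuple   # primary/sort key columns, in order
    value_fields: dict   # field name -> Python type (stand-in for a Protobuf schema)


class InMemoryEngine:
    """Stand-in for a physical storage engine; it only sees raw bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)


class Datahub:
    """Exposes feature-level reads/writes and hides encoding and storage."""

    def __init__(self):
        self._tables = {}

    def register(self, table: FeatureTable, engine) -> None:
        self._tables[table.name] = (table, engine)

    @staticmethod
    def _encode_key(table: FeatureTable, keys: dict) -> bytes:
        return "|".join(str(keys[c]) for c in table.key_columns).encode("utf-8")

    def write(self, table_name: str, keys: dict, features: dict) -> None:
        table, engine = self._tables[table_name]
        # Validate against the registered schema before anything is stored.
        for name, value in features.items():
            if name not in table.value_fields:
                raise KeyError(f"unknown feature: {name}")
            if not isinstance(value, table.value_fields[name]):
                raise TypeError(f"wrong type for feature: {name}")
        raw = json.dumps(features, sort_keys=True).encode("utf-8")
        engine.put(self._encode_key(table, keys), zlib.compress(raw))

    def read(self, table_name: str, keys: dict):
        table, engine = self._tables[table_name]
        payload = engine.get(self._encode_key(table, keys))
        if payload is None:
            return None
        return json.loads(zlib.decompress(payload).decode("utf-8"))


hub = Datahub()
table = FeatureTable("demo:user_static", ("user_id",),
                     {"age_bucket": int, "fav_genres": list})
hub.register(table, InMemoryEngine())
hub.write("demo:user_static", {"user_id": 42},
          {"age_bucket": 3, "fav_genres": ["rock", "jazz"]})
row = hub.read("demo:user_static", {"user_id": 42})
```

Callers in this sketch only ever handle feature names and values; the key encoding, serialization format, and engine could each be swapped without changing the calling code.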

The Transform module manages the end-to-end process from feature retrieval to model input. The process is expressed in a custom DSL (MFDL) that describes feature extraction logic without binding it to a specific execution engine. Transform supports three scenarios: offline training (producing TFRecord files from Hive/HDFS), online prediction (returning vectors from Redis/Tair), and, as a planned addition, real-time features (yielding dynamic Protobuf objects from Kafka/Nydus).
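A small Python sketch may help show why an engine-independent description of extraction logic supports training-inference consistency. It does not reproduce MFDL syntax, which the article does not show; the spec format, the operators bucketize and hash, and the feature names are all hypothetical. The point is only that the offline and online paths evaluate the same declarative spec with the same operator code.

```python
import hashlib


def op_bucketize(value, boundaries):
    """Return the index of the first boundary that is greater than value."""
    for i, bound in enumerate(boundaries):
        if value < bound:
            return i
    return len(boundaries)


def op_hash(value, buckets):
    """Map a value to a bucket id with a hash that is stable across processes."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


# Shared operator library: both execution paths look operators up here.
OPERATORS = {"bucketize": op_bucketize, "hash": op_hash}

# Declarative extraction spec (hypothetical format, standing in for a DSL program).
SPEC = [
    {"output": "age_bucket", "input": "age", "op": "bucketize",
     "args": {"boundaries": [18, 25, 35, 50]}},
    {"output": "city_id", "input": "city", "op": "hash",
     "args": {"buckets": 1000}},
]


def transform(raw, spec=SPEC):
    """Apply every rule in the spec to one raw record."""
    out = {}
    for rule in spec:
        op = OPERATORS[rule["op"]]
        out[rule["output"]] = op(raw[rule["input"]], **rule["args"])
    return out


def offline_batch(rows):
    """Offline path: transform a whole batch of training rows."""
    return [transform(row) for row in rows]


def online_request(row):
    """Online path: transform a single request at serving time."""
    return transform(row)


record = {"age": 30, "city": "Hangzhou"}
same = offline_batch([record])[0] == online_request(record)
```

The sketch uses hashlib rather than Python's built-in hash(), because the built-in string hash is randomized per process and would break consistency between a training job and a serving process.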

Monitor provides observability into the feature system, collecting three metric groups: basic feature statistics (coverage, capacity, freshness, distribution), service‑level indicators (QPS, latency, error rates), and feature/model drift detectors. It integrates TFX’s Data Validation component for visualizing static dataset statistics, schema‑based validation, and drift detection via two‑sample comparison.
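Drift detection by two-sample comparison can be sketched in plain Python. The example below computes the L-infinity distance between the value distributions of a categorical feature in a baseline sample and a current sample, a comparator of the kind TFX's Data Validation component offers for categorical features. It does not call that library, and the 0.1 threshold and the sample data are assumptions chosen for illustration.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each distinct value."""
    if not values:
        raise ValueError("cannot build a distribution from an empty sample")
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def linf_distance(baseline, current):
    """Largest absolute difference in frequency over all observed values."""
    p = distribution(baseline)
    q = distribution(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def has_drifted(baseline, current, threshold=0.1):
    """Flag drift when the distance exceeds the (assumed) threshold."""
    return linf_distance(baseline, current) > threshold


# Hypothetical samples of a categorical feature, e.g. a user's favorite genre.
training_sample = ["pop"] * 50 + ["rock"] * 50
serving_sample = ["pop"] * 80 + ["rock"] * 20
drifted = has_drifted(training_sample, serving_sample)
```

The same comparison can be run between training data and serving logs (skew) or between consecutive days of one dataset (drift); only the choice of the two samples changes.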

Storage separates offline (Hive/HDFS for TB‑scale batch workloads) and online layers. In the online layer, Cloud Music customized Tair‑based engines: MDB (in‑memory hash table for low‑latency small features), RDB (RocksDB‑based for larger features with bulk‑load support), FDB (FIFO‑compaction RocksDB for log‑like snapshot data), and TDB (custom time‑series engine for aggregated temporal features). Datahub/DataService act as routing proxies, shielding users from low‑level storage APIs while enabling schema definition and engine selection via the Web Console.
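The division of labor among the four online engines can be summarized as a rule of thumb in Python. In the system described here, engine selection is made through the Web Console, so the function below is not part of FeatureBox; it only encodes the engine descriptions above. The TableProfile fields and the 1 KB size limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableProfile:
    """Hypothetical description of a feature table's workload."""
    avg_value_bytes: int
    append_only_log: bool = False        # log-like snapshot data
    time_series_aggregate: bool = False  # aggregated temporal features


def choose_engine(profile: TableProfile, small_value_limit: int = 1024) -> str:
    """Suggest an online engine name from the workload profile."""
    if profile.time_series_aggregate:
        return "TDB"   # time-series engine
    if profile.append_only_log:
        return "FDB"   # FIFO-compaction RocksDB
    if profile.avg_value_bytes <= small_value_limit:
        return "MDB"   # in-memory hash table, low latency, small features
    return "RDB"       # RocksDB-based, larger features, bulk load


suggestion = choose_engine(TableProfile(avg_value_bytes=200))
```

A routing layer would keep the chosen engine as part of the table's metadata, so that callers address a logical table name and never the engine directly.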

In summary, Music FeatureBox is a purpose‑built data system for Feature‑Ops that centralizes metadata, manages feature extraction pipelines, and delivers consistent, high‑performance feature services for model training and inference, thereby accelerating the machine learning workflow at NetEase Cloud Music.

Tags: machine learning, mlops, Protobuf, Storage, Feature Store, transform, Datahub, Monitor