NetEase Big Data Platform: HDFS Optimization and Practice
This article presents NetEase's big data platform architecture, detailing multi‑layer storage and compute design, HDFS deployment challenges, NameNode and NameSpace performance optimizations, cluster scaling strategies, data tiering, hardware upgrades, and real‑world business use cases, illustrating practical large‑scale big data engineering.
NetEase has operated Hadoop for over ten years, building a comprehensive big data platform that spans six logical layers: application development, application scenarios, data computation, data management, data storage, and data sources, with cross‑cloud deployment and a focus on open‑source principles.
The platform provides visual development tools for SQL‑based data queries and job execution; supports structured, semi‑structured (e.g., JSON), and unstructured (audio/video) data; and integrates auxiliary services such as Azkaban for job scheduling, Kerberos for authentication, and unified metadata management.
HDFS serves as the core distributed storage, with optimizations for high‑throughput workloads (e.g., backing HBase) and a robust architecture featuring active/standby NameNodes, multiple DataNodes, and JournalNodes for high availability.
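The active/standby NameNode layout described above corresponds to the standard HDFS HA-with-QJM deployment. A minimal illustrative `hdfs-site.xml` sketch follows; the nameservice ID and hostnames are placeholders, not NetEase's actual configuration:

```xml
<!-- hdfs-site.xml: illustrative HA sketch; ns1/nn1/nn2/jn* are placeholder names -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- Edits are written to a quorum of JournalNodes -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/ns1</value>
</property>
<property>
  <!-- ZKFC-driven automatic failover between active and standby -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```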
Key challenges addressed include rapid cluster scaling to thousands of nodes, managing massive metadata growth, and reducing NameNode startup latency. Optimizations involve parallel loading of FSImage and INODE structures, concurrent validation, multithreaded metadata parsing, and configurable lease handling to minimize RPC overhead during DataNode full‑report cycles.
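Upstream Hadoop (3.3+) ships a comparable parallel FSImage load feature (HDFS-14617); the fragment below illustrates that mechanism and is not necessarily NetEase's exact implementation. Values shown are the upstream defaults:

```xml
<!-- hdfs-site.xml: parallel FSImage loading (Hadoop 3.3+, HDFS-14617) -->
<property>
  <!-- Write the image in multiple sub-sections so it can be loaded in parallel -->
  <name>dfs.image.parallel.load</name>
  <value>true</value>
</property>
<property>
  <!-- Target number of sub-sections per INode section in the image -->
  <name>dfs.image.parallel.target.sections</name>
  <value>12</value>
</property>
<property>
  <!-- Threads used to load image sections concurrently at startup -->
  <name>dfs.image.parallel.threads</name>
  <value>4</value>
</property>
```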
Performance improvements also encompass namespace isolation via Router‑Based Federation (RBF), RPC priority queues, and resource‑aware scheduling to protect critical workloads, achieving up to 80% faster NameNode restarts and 20%+ RPC throughput gains.
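The RPC priority queues mentioned above are typically realized with Hadoop's FairCallQueue, which deprioritizes heavy callers so critical workloads keep getting served. A hedged sketch, assuming the NameNode RPC port is 8020 (substitute your own port in the property names):

```xml
<!-- core-site.xml: FairCallQueue on the NameNode's 8020 RPC port (port is an assumption) -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <!-- DecayRpcScheduler ranks callers by recent request volume -->
  <name>ipc.8020.scheduler.impl</name>
  <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
</property>
<property>
  <!-- Push back on abusive clients instead of queueing their calls -->
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>
```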
Data management strategies feature tiered storage (hot vs. cold clusters), storage‑compute separation, and the adoption of high‑density hardware (NVMe) to boost I/O performance while reducing costs, with RS(6,3) erasure coding (six data blocks plus three parity blocks) applied to cold data for space efficiency.
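The space efficiency of RS(6,3) over plain 3x replication follows directly from the block arithmetic: replication stores every byte three times, while RS(6,3) stores nine blocks for every six blocks of data. A small illustrative calculation:

```python
def storage_footprint(logical_bytes, data_blocks, parity_blocks):
    """Raw bytes needed to store `logical_bytes` under an RS(data, parity)
    erasure-coding scheme; 3x replication is the special case RS(1, 2)."""
    return logical_bytes * (data_blocks + parity_blocks) / data_blocks

pb = 10**15                                  # 1 PB of cold data
replicated = storage_footprint(pb, 1, 2)     # 3x replication -> 3 PB raw
erasure    = storage_footprint(pb, 6, 3)     # RS(6,3)        -> 1.5 PB raw
print(replicated / pb)               # 3.0
print(erasure / pb)                  # 1.5
print(1 - erasure / replicated)      # 0.5 -> 50% raw-capacity savings
```

Both schemes here tolerate the loss of any two replicas/blocks (RS(6,3) tolerates three), which is why trading replication for erasure coding on cold data cuts raw capacity roughly in half without sacrificing durability.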
Operational monitoring leverages Matrix metrics and custom scripts to track RPC latency, DataNode heartbeats, and system resource usage, enabling rapid detection of anomalies and ensuring round‑the‑clock service availability.
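One common way such scripts obtain RPC latency is the NameNode's built-in `/jmx` endpoint (e.g., `http://<namenode>:9870/jmx`), which exposes `RpcQueueTimeAvgTime` per RPC port. A minimal parsing sketch, assuming port 8020 and exercising it on a fabricated sample in the same shape the endpoint returns:

```python
import json

def rpc_queue_time_avg(jmx_json: str, port: int = 8020) -> float:
    """Extract average RPC queue time (ms) for a NameNode RPC port
    from the JSON body served by the /jmx endpoint."""
    beans = json.loads(jmx_json)["beans"]
    name = f"Hadoop:service=NameNode,name=RpcActivityForPort{port}"
    for bean in beans:
        if bean.get("name") == name:
            return bean["RpcQueueTimeAvgTime"]
    raise KeyError(name)

# Fabricated sample payload for illustration only:
sample = json.dumps({"beans": [
    {"name": "Hadoop:service=NameNode,name=RpcActivityForPort8020",
     "RpcQueueTimeAvgTime": 1.7}
]})
print(rpc_queue_time_avg(sample))  # 1.7
```

In practice the JSON would be fetched over HTTP and the extracted value compared against an alert threshold; a rising queue-time average is an early signal that the NameNode's RPC handlers are saturating.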
Real‑world use cases span NetEase's consumer services (Music, News, Yanxuan) and enterprise offerings (supply chain, finance, media), demonstrating the platform's ability to handle petabyte‑scale workloads with high reliability.
The article concludes with a call for community engagement and highlights upcoming resources such as a downloadable big‑data PPT collection.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.