JD Unified Storage Practice: Cross‑Region and Tiered Storage on HDFS

This article details JD's large‑scale HDFS unified storage implementation, covering cross‑region storage challenges, topology design, asynchronous block replication, flow‑control mechanisms, tiered storage strategies, automatic hot‑cold data migration, and the resulting performance and cost improvements for big‑data workloads.

JD Retail Technology

Oct 29, 2024

Overview

With the rise of big‑data workloads, JD Retail built a robust HDFS‑based offline storage platform that stores petabyte‑scale data, serves millions of jobs per day, and provides visual management tools for day‑to‑day operations.

Cross‑Region Storage

Traditional single‑datacenter deployments cannot keep pace with JD's expansion across multiple datacenters, which leads to limited disaster recovery, inconsistent metadata, redundant data, and uncontrolled traffic on inter‑datacenter links.

JD adopted a full‑copy plus full‑mesh topology in which DataNodes in every datacenter report to a common NameNode. This provides unified metadata management and consistent data placement, and makes migration cheaper: JD reports a 350% improvement in migration efficiency.
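
With a full-mesh topology, a single NameNode sees DataNodes from every datacenter, so it needs a notion of distance that spans datacenters as well as racks. The sketch below is a minimal illustration, assuming three-level `/datacenter/rack/node` location paths (the paths and function names are hypothetical). It counts hops from each node up to their closest common ancestor, which mirrors the convention of Hadoop's default `NetworkTopology`; it is not JD's implementation.

```python
def network_distance(a: str, b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor."""
    pa = [part for part in a.split("/") if part]
    pb = [part for part in b.split("/") if part]
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_replicas(reader: str, replicas: list[str]) -> list[str]:
    """Order replica locations from nearest to farthest for a given reader."""
    return sorted(replicas, key=lambda r: network_distance(reader, r))


# Hypothetical locations in the form /datacenter/rack/node
reader = "/dc1/rack1/dn1"
print(network_distance(reader, "/dc1/rack1/dn2"))  # 2: same rack
print(network_distance(reader, "/dc1/rack2/dn3"))  # 4: same datacenter
print(network_distance(reader, "/dc2/rack1/dn7"))  # 6: other datacenter
```

Because a cross-datacenter hop always costs more than any in-datacenter hop, sorting replicas by this distance naturally keeps reads local whenever a local copy exists.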

The challenges include rapid cluster growth, keeping DataNode heartbeats stable over long‑distance links, and complex topology control, all of which require sophisticated monitoring and traffic shaping.

Topology and Data Storage

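Cross‑domain tags are stored as extended attributes (XATTRs) and persisted in the EditLog and fsimage, so they can be set at the directory level (the same mechanism supports hot‑cold labeling). A new file takes the tag of its nearest tagged ancestor directory, which determines where its data is placed.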
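
A minimal sketch of the nearest-tag rule, assuming tags are available as an in-memory map from directory path to tag value (in HDFS itself they would live in XATTRs persisted through the EditLog and fsimage). The function name `resolve_tag` and the `dc1,dc2` tag format are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_tag(path: str, dir_tags: dict[str, str]) -> Optional[str]:
    """Return the tag of the path itself or of its nearest tagged ancestor."""
    p = PurePosixPath(path)
    # Walk from the path itself up to the root: /a/b/c, /a/b, /a, /
    for candidate in (p, *p.parents):
        tag = dir_tags.get(str(candidate))
        if tag is not None:
            return tag
    return None  # untagged: the caller applies its default placement


# Hypothetical tags: the value lists the datacenters that must hold a copy
tags = {"/warehouse": "dc1", "/warehouse/sales": "dc1,dc2"}
print(resolve_tag("/warehouse/sales/dt=2024-10-29/part-0000", tags))  # dc1,dc2
print(resolve_tag("/warehouse/logs/app.log", tags))                   # dc1
```

Tagging directories rather than individual files keeps the number of stored tags small, while the upward walk lets a more specific directory override its parent.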

Cross‑Domain Data Flow

When a client writes data, the block’s cross‑domain tag determines the target datacenter; a CR‑check module then issues asynchronous block‑copy tasks to achieve consistency and redundancy.
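
The CR-check step can be pictured as a set difference between the datacenters a block's tag requires and the datacenters that already hold a replica. The sketch below assumes that simple model; `CopyTask`, `check_block`, and the tag format are hypothetical, and a real module would hand the resulting tasks to an asynchronous executor rather than return them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyTask:
    block_id: int
    source_dc: str
    target_dc: str


def check_block(block_id: int, required_dcs: set[str],
                replica_dcs: list[str]) -> list[CopyTask]:
    """Emit one copy task per required datacenter that has no replica yet."""
    present = set(replica_dcs)
    if not present:
        # Nothing to copy from: a missing block needs separate handling.
        return []
    source = sorted(present)[0]  # any datacenter holding the block will do
    missing = sorted(required_dcs - present)
    return [CopyTask(block_id, source, dc) for dc in missing]


# Block 42 was written only in dc1, but its tag asks for dc1 and dc2.
print(check_block(42, {"dc1", "dc2"}, ["dc1", "dc1", "dc1"]))
# [CopyTask(block_id=42, source_dc='dc1', target_dc='dc2')]
```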

Cross‑Region Block Supplement

The CR‑check module replaces distcp for block replication, offering higher concurrency and better node selection, while also handling existing data through asynchronous updaters that prioritize high‑priority tables.
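
One way to picture "better node selection" for block-level supplement (as opposed to distcp, which copies whole files through MapReduce jobs) is to pick, in the destination datacenter, the DataNode with the fewest in-flight copies while capping per-node concurrency. This is a hypothetical sketch: the article does not describe JD's actual selection criteria, and `pick_target`, `max_inflight`, and the free-space tiebreak are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataNode:
    name: str
    dc: str
    inflight: int = 0      # copies currently being received
    free_gb: float = 0.0   # remaining capacity


def pick_target(nodes: list[DataNode], target_dc: str,
                max_inflight: int = 4) -> Optional[DataNode]:
    """Pick the least-busy DataNode in target_dc; prefer more free space on ties."""
    candidates = [n for n in nodes
                  if n.dc == target_dc and n.inflight < max_inflight]
    if not candidates:
        return None  # destination saturated: retry on a later scheduling round
    best = min(candidates, key=lambda n: (n.inflight, -n.free_gb))
    best.inflight += 1  # the caller decrements this when the copy finishes
    return best


nodes = [DataNode("dn1", "dc2", 2, 500.0), DataNode("dn2", "dc2", 0, 100.0),
         DataNode("dn3", "dc2", 0, 800.0), DataNode("dn4", "dc1", 0, 900.0)]
print([pick_target(nodes, "dc2").name for _ in range(3)])  # ['dn3', 'dn2', 'dn3']
```

Spreading copies by in-flight count lets many blocks be supplemented concurrently without piling them onto one receiving node.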

Asynchronous Updater

This component processes bulk updates to existing data through a priority queue and polls tasks across directories in turn, so that a single large table cannot block the others and migration stays responsive.
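
A minimal sketch of the polling idea, assuming each directory (table) keeps its own pending queue and the updater takes a bounded batch from one directory at a time, serving higher priorities first and rotating round-robin within a priority. `AsyncUpdater`, `batch_size`, and the integer priority scale are hypothetical.

```python
from collections import deque
from typing import Optional


class AsyncUpdater:
    """Hypothetical updater: highest priority first, round-robin within a priority."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size
        self.levels = {}    # priority -> deque of directories awaiting a turn
        self.pending = {}   # directory -> deque of files still to update

    def submit(self, directory: str, paths: list[str], priority: int) -> None:
        if directory not in self.pending:
            self.pending[directory] = deque()
            self.levels.setdefault(priority, deque()).append(directory)
        # A directory that is already queued keeps its original priority here.
        self.pending[directory].extend(paths)

    def next_batch(self) -> Optional[tuple[str, list[str]]]:
        for priority in sorted(self.levels, reverse=True):
            dirs = self.levels[priority]
            if not dirs:
                continue
            directory = dirs.popleft()
            queue = self.pending[directory]
            batch = [queue.popleft() for _ in range(min(self.batch_size, len(queue)))]
            if queue:
                dirs.append(directory)       # back of the line: others get a turn
            else:
                del self.pending[directory]  # this directory is finished
            return directory, batch
        return None


u = AsyncUpdater(batch_size=2)
u.submit("/big_table", ["b1", "b2", "b3", "b4", "b5"], priority=1)
u.submit("/small_table", ["s1"], priority=1)
u.submit("/vip_table", ["v1", "v2", "v3"], priority=5)
while (item := u.next_batch()) is not None:
    print(item)
```

Because a directory goes to the back of its priority level after each batch, `/small_table` is served after the first batch of `/big_table` instead of waiting for all five of its files.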

Cross‑Domain Flow Control

Separate queues and rate limiters for each datacenter keep the narrow inter‑datacenter links from becoming bottlenecks, and RPC requests carry datacenter information so that reads and writes are routed to the appropriate datacenter.
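
Per-datacenter queues with independent rate limiters can be sketched with one token bucket per destination. The rates, burst sizes, units (for example MB and MB/s), and class names below are hypothetical; the point is that a backlog for a narrow link drains only its own queue and never stalls traffic to other datacenters.

```python
import time
from collections import deque
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def try_acquire(self, amount: float) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


class CrossDcScheduler:
    """One FIFO queue and one token bucket per destination datacenter."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.queues = {dc: deque() for dc in limits}
        self.buckets = {dc: TokenBucket(rate, burst, clock)
                        for dc, (rate, burst) in limits.items()}

    def submit(self, dst_dc: str, task_id: str, size: float) -> None:
        # size must not exceed the bucket's burst, or the task can never start
        self.queues[dst_dc].append((task_id, size))

    def dispatch(self) -> list[str]:
        started = []
        for dc, queue in self.queues.items():
            # A throttled datacenter stops here without holding up the others.
            while queue and self.buckets[dc].try_acquire(queue[0][1]):
                started.append(queue.popleft()[0])
        return started


now = [0.0]  # fake clock so the example is deterministic
sched = CrossDcScheduler({"dc1": (100.0, 100.0), "dc2": (10.0, 10.0)},
                         clock=lambda: now[0])
for task, dc, size in [("a", "dc1", 60), ("b", "dc1", 60), ("c", "dc2", 10), ("d", "dc2", 10)]:
    sched.submit(dc, task, size)
print(sched.dispatch())  # ['a', 'c']
now[0] = 1.0
print(sched.dispatch())  # ['b', 'd']
```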

Tiered Storage

To address hot, warm, and cold data, JD classifies storage machines into SSD, HDD, and high‑density HDD, assigning data based on access patterns and using XATTRs for labeling.
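
The article labels tiers with XATTRs; stock HDFS exposes a related mechanism through its built-in storage policies. As an assumption for illustration, the sketch maps hot data to `ALL_SSD`, warm data to `HOT` (all replicas on `DISK`), and cold data to `COLD` (all replicas on `ARCHIVE`, which here would be the high-density HDD nodes). The thresholds in `classify` and the helper names are hypothetical.

```python
# Hypothetical mapping from access tiers to stock HDFS storage policies.
TIER_TO_POLICY = {
    "hot": "ALL_SSD",   # all replicas on SSD
    "warm": "HOT",      # HDFS's built-in HOT policy: all replicas on DISK
    "cold": "COLD",     # all replicas on ARCHIVE (high-density HDD nodes)
}


def classify(reads_last_7d: int, days_since_access: float,
             hot_reads: int = 100, cold_after_days: float = 90) -> str:
    """Assign a tier from two access statistics; thresholds are illustrative."""
    if days_since_access >= cold_after_days:
        return "cold"
    if reads_last_7d >= hot_reads:
        return "hot"
    return "warm"


def set_policy_command(path: str, tier: str) -> list[str]:
    """Build the stock CLI call that sets a storage policy on a path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", TIER_TO_POLICY[tier]]


print(" ".join(set_policy_command("/warehouse/logs/2023", classify(0, 400))))
# hdfs storagepolicies -setStoragePolicy -path /warehouse/logs/2023 -policy COLD
```

In stock HDFS, setting a storage policy only governs where new blocks are placed; existing blocks are relocated by the `hdfs mover` tool, which is the kind of work an automatic conversion module takes over.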

An automatic conversion module in the NameNode migrates data between tiers, employing TTL and erasure coding for cold data to improve durability and reduce cost.
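
A sketch of the TTL rule, assuming a file becomes a conversion candidate once it has gone unread for `ttl_days` and is not already erasure-coded. The 180-day TTL, the minimum size, and the choice of the built-in `RS-6-3-1024k` policy are assumptions. The overhead function shows why EC cuts cost: RS(6,3) stores 9 units for every 6 units of data, 1.5x raw, versus 3x for three-way replication.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: float
    is_ec: bool = False


def storage_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per logical byte for an RS(data, parity) layout."""
    return (data_units + parity_units) / data_units


def plan_conversions(files: list[FileInfo], ttl_days: float = 180,
                     min_size_bytes: int = 6 * MIB,
                     ec_policy: str = "RS-6-3-1024k") -> list[tuple[str, str, str]]:
    """Return (path, storage_policy, ec_policy) for files whose TTL has expired."""
    plans = []
    for f in files:
        expired = f.days_since_access >= ttl_days
        # Small files can cost more under EC than under replication, so skip
        # anything below one full data stripe (6 x 1 MiB for RS-6-3-1024k).
        big_enough = f.size_bytes >= min_size_bytes
        if expired and big_enough and not f.is_ec:
            plans.append((f.path, "COLD", ec_policy))
    return plans


print(storage_overhead(6, 3))  # 1.5, versus 3.0 for three-way replication
```

In stock HDFS, an EC policy set on a directory with `hdfs ec -setPolicy` applies only to files written afterwards, so converting existing cold files means rewriting them.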

Key modules include a data‑access monitor (LRU‑based), a tier‑management module that creates conversion tasks, and a distributed task scheduler that dispatches operations to DataNodes.
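
The three modules can be sketched together, assuming the monitor is a fixed-capacity LRU over file paths whose evicted entries become cooling candidates. `AccessMonitor`, `create_tasks`, and the round-robin `dispatch` are hypothetical simplifications; a real scheduler would also account for DataNode load and task failures.

```python
from collections import OrderedDict
from itertools import cycle


class AccessMonitor:
    """Fixed-capacity LRU of recently read paths; evictions become cooling candidates."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.recent = OrderedDict()  # least recently used first
        self.cooling = {}            # insertion-ordered, no duplicates

    def record(self, path: str) -> None:
        if path in self.recent:
            self.recent.move_to_end(path)  # most recently used goes last
            return
        self.recent[path] = None
        if len(self.recent) > self.capacity:
            evicted, _ = self.recent.popitem(last=False)  # drop the LRU entry
            self.cooling[evicted] = None

    def is_hot(self, path: str) -> bool:
        return path in self.recent


def create_tasks(monitor: AccessMonitor, target_tier: str = "cold") -> list[tuple[str, str]]:
    """Tier manager: turn candidates that are not hot again into conversion tasks."""
    tasks = [(p, target_tier) for p in monitor.cooling if not monitor.is_hot(p)]
    monitor.cooling.clear()
    return tasks


def dispatch(tasks: list, datanodes: list[str]) -> dict[str, list]:
    """Scheduler: spread tasks over DataNodes round-robin (a simplification)."""
    if not datanodes:
        raise ValueError("no DataNodes available")
    assignment = {dn: [] for dn in datanodes}
    for task, dn in zip(tasks, cycle(datanodes)):
        assignment[dn].append(task)
    return assignment


m = AccessMonitor(capacity=2)
for p in ["/t/a", "/t/b", "/t/a", "/t/c", "/t/d"]:
    m.record(p)
tasks = create_tasks(m)
print(tasks)  # [('/t/b', 'cold'), ('/t/a', 'cold')]
print(dispatch(tasks, ["dn1", "dn2"]))
```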

These optimizations yielded a 10% overall performance gain, a 30% increase in EC coverage, and a 90% reduction in cold‑data storage cost.

Practical Applications

Two use cases illustrate the benefits. In cross‑region lifecycle management, stale data is moved to a single datacenter on high‑density storage with EC, which reduces redundancy. In data scheduling, hot‑data detection is used to rebalance workloads across clusters, which improves task latency.
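
The lifecycle rule in the first use case can be sketched as a placement transition: once cross-region data has gone stale, keep one datacenter, move it to the `COLD` policy, and erasure-code it. The staleness threshold, the `Placement` type, the `home_dc` parameter, and the policy names are assumptions; `raw_factor` only works out the raw-storage multiple of each placement in this model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    datacenters: tuple[str, ...]
    storage_policy: str
    ec_policy: Optional[str] = None   # None means plain replication


def lifecycle_plan(current: Placement, days_since_access: float, home_dc: str,
                   stale_after_days: float = 90) -> Placement:
    """Collapse stale cross-region data to one datacenter, cold tier, and EC."""
    if days_since_access < stale_after_days:
        return current  # still active: keep every datacenter's copy
    if home_dc not in current.datacenters:
        raise ValueError("home datacenter holds no copy to keep")
    return Placement((home_dc,), "COLD", "RS-6-3-1024k")


def raw_factor(p: Placement, replicas: int = 3) -> float:
    """Raw bytes per logical byte across all datacenters in the placement."""
    if p.ec_policy is None:
        per_dc = float(replicas)
    else:
        # Names like "RS-6-3-1024k" end with <data>-<parity>-<cell size>.
        parts = p.ec_policy.split("-")
        data, parity = int(parts[-3]), int(parts[-2])
        per_dc = (data + parity) / data
    return len(p.datacenters) * per_dc


before = Placement(("dc1", "dc2"), "HOT")
after = lifecycle_plan(before, days_since_access=200, home_dc="dc1")
print(raw_factor(before), raw_factor(after))  # 6.0 1.5
```

In this model, a table replicated three ways in two datacenters drops from 6x to 1.5x raw storage after the transition.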

Conclusion

JD's unified storage solution combines cross‑region replication and tiered storage to achieve both performance enhancements and significant cost savings for large‑scale distributed storage systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
