JD Big Data Platform: Cross‑Region and Tiered Storage Architecture and Practices
This article presents JD's large‑scale big‑data platform, detailing its overall architecture, the challenges of cross‑region storage, the design of a unified cross‑domain data synchronization mechanism, and the implementation of tiered storage to improve performance, cost efficiency, and data reliability across multi‑datacenter clusters.
Guest Speaker: Wu Weiwei, JD Architecture Engineer
Editor: Chen Feijun, Shenzhen University
Platform: DataFunTalk
Overview
With business adjustments and the need to consolidate cluster resources, data migration in large‑scale big‑data systems has become complex and hard to manage. This article uses JD's big‑data platform as a case study to illustrate JD's exploration and practice in cross‑region and tiered storage over the past year.
1. JD Data Platform Architecture Overview
The platform consists of six layers, with the storage layer serving as the foundation for compute engines, tools, services, and applications. The storage subsystem handles several exabytes of data across tens of thousands of nodes in three geographically distributed data centers, processing hundreds of petabytes daily. Visual management and monitoring enable rapid issue localization, ensuring high availability.
2. Cross‑Region Storage
Problems with traditional Distcp‑based synchronization:
Metadata consistency relies on business teams, leading to high cost and long migration times.
Inter‑datacenter traffic is uncontrolled, so synchronization tasks and other workloads interfere with each other.
Redundant data copies increase storage and sharing costs.
No disaster recovery across data centers.
Architecture: JD introduced a cross‑region data synchronization feature at the storage layer, providing strong consistency, automatic data sharing, and business‑transparent migration. The design follows a “full‑storage + global topology” model to deliver disaster recovery and unified storage across data centers.
Cross‑Region Data Flow:
Asynchronous flow: Data is first written locally, then the NameNode automatically synchronizes it across regions, offering write performance comparable to non‑cross‑region scenarios and lower latency than Distcp.
Synchronous flow: A pipeline connects all DataNodes across sites, synchronizing data in a single pass for workloads requiring strong consistency.
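The two flows above trade latency for consistency. The sketch below illustrates that trade‑off with a hypothetical `Datacenter` class and invented latency numbers; real HDFS replication works at the block level through DataNode pipelines coordinated by the NameNode.

```python
# Sketch of the two cross-region write flows (illustrative, not JD's code).

class Datacenter:
    def __init__(self, name, latency_ms):
        self.name = name
        self.latency_ms = latency_ms
        self.blocks = []

def write_async(data, local, remotes, pending):
    """Asynchronous flow: write locally, queue remote replication.

    The client sees only the local latency; the NameNode later drains
    `pending` to replicate the block to the other regions.
    """
    local.blocks.append(data)
    pending.extend((data, r) for r in remotes)  # replicated in background
    return local.latency_ms

def write_sync(data, local, remotes):
    """Synchronous flow: one pipeline through every site's DataNodes.

    The client waits for all sites, so latency grows with each hop,
    but the data is strongly consistent when the call returns.
    """
    for dc in [local] + remotes:
        dc.blocks.append(data)
    return local.latency_ms + sum(r.latency_ms for r in remotes)
```

With a 2 ms local site and a 30 ms remote site, the asynchronous path returns after 2 ms while the synchronous pipeline pays for both hops, mirroring the latency comparison described above.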
Topology & Datacenter Awareness: A topology management module adds a datacenter dimension to node placement, allowing the system to control block distribution and client traffic based on datacenter locality.
Cross‑Region Identifier: Metadata tags describe replication factors per datacenter (e.g., A:3,B:2,C:2) and include status, lifecycle periods, and start/end timestamps. Erasure Coding (EC) blocks also carry cross‑region attributes to reduce synchronization traffic.
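A tag such as `A:3,B:2,C:2` is essentially a per‑datacenter replica map. A minimal parser for that textual form is sketched below; the talk does not specify the actual on‑disk metadata encoding, so this is only an illustration.

```python
def parse_replication_tag(tag):
    """Parse a per-datacenter replication tag like 'A:3,B:2,C:2'
    into a dict such as {'A': 3, 'B': 2, 'C': 2}."""
    policy = {}
    for part in tag.split(","):
        dc, _, count = part.strip().partition(":")
        policy[dc] = int(count)
    return policy

def total_replicas(policy):
    """Total number of physical copies implied by the tag."""
    return sum(policy.values())
```

For `A:3,B:2,C:2` this yields seven physical copies spread over three data centers, which is the quantity the synchronization and flow‑control layers must ultimately maintain.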
Cross‑Region Flow Control:
Bandwidth throttling for cross‑region block replication.
Client‑side read/write preference for local DataNodes.
Back‑pressure mechanisms on both new and legacy clients to protect core network bandwidth.
In‑datacenter load balancing.
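Bandwidth throttling of the kind listed above is commonly implemented as a token bucket; the version below is a generic illustrative sketch, not JD's actual throttler.

```python
import time

class TokenBucket:
    """Token-bucket throttle sketch: permits `rate_bytes` of cross-region
    replication traffic per second, with bursts up to `capacity` bytes."""

    def __init__(self, rate_bytes, capacity, now=time.monotonic):
        self.rate = rate_bytes
        self.capacity = capacity
        self.now = now  # injectable clock, useful for testing
        self.tokens = capacity
        self.last = now()

    def try_send(self, nbytes):
        """Return True if `nbytes` may be sent now, else False (back off)."""
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A replication worker that gets `False` simply retries later, which is also how back‑pressure toward clients can be realized: refusals propagate upstream as slower acknowledgements.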
3. Tiered Storage
Problems:
Cold and hot data are not distinguished, preventing targeted acceleration.
Diverse hardware types are treated uniformly, under‑utilizing specialized storage.
Data governance requires extensive manual effort.
Solution: Implement a tiered storage framework that tags data as Hot, Warm, or Cold and classifies hardware as SSD, HDD, or high‑density. Hot data is placed on high‑performance SSDs, while cold data resides on high‑density nodes, reducing cost.
Use Cases:
Storage acceleration: Move hot data to high‑performance nodes during peak periods.
Cold data archiving: Store cold data on high‑density nodes and optionally convert to EC for further cost savings.
Logical sub‑clusters: Partition data by business or directory for isolation and targeted performance tuning.
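The tier‑to‑hardware mapping above can be expressed as a small placement policy. Tier names and node classes follow the article; the table and function themselves are a hypothetical sketch.

```python
# Placement policy sketch: map a data-temperature tag to the node class
# described in the article (hot -> SSD, warm -> HDD, cold -> high-density,
# with optional EC conversion for cold data).

PLACEMENT = {
    "HOT":  {"storage": "SSD",          "ec": False},
    "WARM": {"storage": "HDD",          "ec": False},
    "COLD": {"storage": "HIGH_DENSITY", "ec": True},  # EC for extra savings
}

def place(tag):
    """Return the target storage class and EC decision for a temperature tag."""
    return PLACEMENT[tag.upper()]
```

Keeping the policy in one table makes the logical‑sub‑cluster case easy too: a business directory just pins its own entry instead of the default one.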
Architecture: The tiered storage system is built within the NameNode and includes:
Tiered policy configuration (external API and internal settings).
Policy distribution API for offline analysis and business‑side rule injection.
Built‑in LRU‑based tiering strategy.
Tag manager for directory and node tags.
Data distribution validator to enforce tag‑based placement.
Existing data satisfier to scan and migrate legacy data according to policies.
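The built‑in LRU‑based strategy can be pictured as classifying data by how recently it was accessed. The thresholds below are invented for illustration; JD's production values are not given in the talk.

```python
def classify_by_access(last_access_days, hot_days=7, warm_days=30):
    """LRU-style tiering sketch: data touched within `hot_days` is HOT,
    within `warm_days` is WARM, otherwise COLD. Thresholds are
    illustrative defaults, not JD's production configuration."""
    if last_access_days <= hot_days:
        return "HOT"
    if last_access_days <= warm_days:
        return "WARM"
    return "COLD"
```

The existing‑data satisfier would run a rule like this over scan results and hand mismatched files to the migration machinery.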
Core Design: Two modules – metadata tag management for hot/cold classification and a virtual multi‑topology tree for node segregation – enable efficient node selection. Incremental writes consult tags to choose target nodes, while background processes scan existing data and migrate it according to the virtual topology.
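Node selection against the virtual topology amounts to filtering candidate DataNodes by tag before the usual placement logic runs. The sketch below uses hypothetical `(name, tag)` node records; in the real system the tag manager and the virtual multi‑topology tree supply this view.

```python
def select_targets(nodes, directory_tag, replicas):
    """Pick `replicas` DataNodes whose tag matches the directory's tag.

    `nodes` is a list of (name, tag) pairs. Raises if the matching
    sub-topology cannot satisfy the requested replica count.
    """
    candidates = [name for name, tag in nodes if tag == directory_tag]
    if len(candidates) < replicas:
        raise RuntimeError(
            f"only {len(candidates)} '{directory_tag}' nodes available")
    return candidates[:replicas]
```

Incremental writes would call this on the sub‑tree matching the directory's tag, while the background satisfier uses the same selection to compute migration targets for legacy data.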
4. Q&A
Q1: How is data migrated to high‑density clusters?
A1: Using tiered storage rules, data is classified into cold, warm, and hot. Warm data is moved via a balancer‑like mechanism to high‑density nodes; cold data is handled by an internal scheduler that scans, tags, and triggers EC conversion.
Q2: Does JD consider HDFS billing?
A2: Yes. Write operations are weighted ten times higher than reads due to NameNode pressure. Billing data is aggregated at the HDFS Router and fed back to the NameNode for throttling excess usage.
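The tenfold write weighting translates into a simple cost formula. The sketch below only illustrates the aggregation arithmetic; the actual accounting lives in the HDFS Router, as the answer notes.

```python
WRITE_WEIGHT = 10  # writes stress the NameNode roughly 10x more than reads

def billing_units(reads, writes):
    """Weighted operation count used for billing."""
    return reads + WRITE_WEIGHT * writes

def over_quota(reads, writes, quota):
    """True when a tenant's weighted usage exceeds its quota,
    signalling the NameNode to throttle the excess usage."""
    return billing_units(reads, writes) > quota
```

So 100 reads plus 10 writes cost 200 units: the writes dominate even though they are a tenth of the operation count.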
Q3: How is NameNode pressure mitigated?
A3: New modules monitor lock time on the NameNode; if a module exceeds a threshold, its lock usage is dynamically reduced to protect overall throughput.
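The lock‑time guard in A3 can be sketched as per‑module accounting: once a module's accumulated lock time in the current window crosses a threshold, it is told to back off. All names and the windowing scheme here are illustrative assumptions.

```python
class LockBudget:
    """Per-module NameNode lock-time budget (illustrative sketch).

    Each module records how long it held the global lock; once its total
    in the current window exceeds `threshold_ms`, `should_throttle`
    tells it to reduce its lock usage until the window resets.
    """

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms
        self.used_ms = {}

    def record(self, module, held_ms):
        self.used_ms[module] = self.used_ms.get(module, 0) + held_ms

    def should_throttle(self, module):
        return self.used_ms.get(module, 0) > self.threshold_ms

    def reset_window(self):
        self.used_ms.clear()
```

Background modules such as a balancer or the tiering satisfier would check `should_throttle` before taking the lock again, protecting foreground RPC throughput.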
Thank you for attending the session.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.