
How Tencent Scaled Its TDW to 8,800 Nodes and Mastered Cross-City Data Migration

Tencent’s senior engineer explains how TDW (Tencent Distributed Data Warehouse) grew from a few hundred to thousands of nodes, the challenges of cross‑city migration, and the relationship‑chain migration model, dual‑write tables, and platform strategies the team built to ensure seamless, low‑impact data and task migration.


Introduction

The article describes the operation and migration experience of Tencent's large‑scale data warehouse, TDW, which is the biggest offline processing platform inside Tencent and one of the largest Hadoop clusters in China.

1. Tencent Large‑Scale TDW Cluster

TDW (Tencent Distributed Data Warehouse) is a massive data storage and computation platform. The cluster grew from about 400 nodes a few years ago to 8,800 nodes, supporting a daily scan volume of 20 PB and providing 200 PB of storage capacity. By the end of 2017 the size is expected to reach 20,000 nodes.

Single cluster: 8,800 nodes

Daily scan: 20 PB

Storage capacity: 200 PB

When the cluster reached 8,800 nodes, simple capacity expansion could no longer solve the problem because the existing data center and network architecture could not support further growth, prompting a cross‑city migration.

1.1 Overall Architecture of Tencent Big Data Platform

The TDW platform sits within a five‑layer big data architecture:

Data storage layer (HDFS, HBase, Ceph, PGXZ)

Resource scheduling layer

Compute engine layer (MapReduce, Spark, GPU)

Compute framework layer (Hermes, Hive, Pig, SparkSQL, GraphX)

Service layer providing analytics and machine‑learning capabilities

The migration covers HDFS, GAIA, MapReduce, Spark, Hive, Pig and SparkSQL.

Tencent big data platform architecture

2. Migration Model

2.1 Why Cross‑City Data Migration Is Hard

First, the operational workload is huge: hundreds of petabytes of data, hundreds of thousands of tasks, and tens of thousands of machines must be moved. Second, the migration must be invisible to the business; the system must stay stable and no data may be lost. Third, computation results must stay accurate and job runtimes must not fluctuate noticeably.

The most critical issue is network congestion caused by data crossing between two cities, which can affect all systems sharing the dedicated line.

2.2 Dual‑Cluster Solution

Two completely independent clusters are deployed in two cities. Data and computation are duplicated, so no cross‑city traffic occurs during migration.

Dual‑cluster solution

The advantage is zero impact on business; the drawback is the need for a large amount of redundant hardware.

2.3 Single‑Cluster Solution

Only one active cluster exists; data is stored in a single location (either city A or city B). Migration moves a portion of machines and workloads gradually.

Single‑cluster solution

The risk is that tasks in one city may read data that has already been moved to the other city, causing heavy cross‑city traffic.

2.4 Relationship‑Chain Based Migration Model

A relationship chain captures the data‑to‑task dependencies. By analyzing these chains, the team can decide where data should reside so that computation follows the data, minimizing cross‑city traffic.

Example of a relationship chain
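The idea can be sketched with a small undirected graph over tables and tasks: everything reachable from a table belongs to the same relationship chain and should be migrated together. All table and task names below are illustrative, not from TDW.

```python
from collections import defaultdict

# Hypothetical access records collected from task metadata:
# (task, table, mode) triples.
accesses = [
    ("task_etl", "db.logs", "write"),
    ("task_agg", "db.logs", "read"),
    ("task_agg", "db.daily", "write"),
    ("task_rpt", "db.daily", "read"),
]

# Build the relationship chain: tables and tasks are nodes,
# each access is an undirected edge.
graph = defaultdict(set)
for task, table, _mode in accesses:
    graph[task].add(table)
    graph[table].add(task)

def chain_of(node):
    """Everything reachable from `node` is part of the same chain."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph[n])
    return seen

print(sorted(chain_of("db.logs")))
# ['db.daily', 'db.logs', 'task_agg', 'task_etl', 'task_rpt']
```

Once chains are known, each whole chain (data plus the tasks that consume it) can be assigned to one city, so computation follows its data.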

2.5 Generating Relationship Chains

The platform uses a tool called hadoopdoctor to collect task metadata every five minutes, recording data paths, task IDs, and read/write flags. The collected fine‑grained paths are normalized to table‑level paths, producing atomic data‑access relationships that are aggregated into relationship chains.

hadoopdoctor architecture
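A minimal sketch of the normalization step, assuming a hypothetical warehouse layout of /user/tdw/warehouse/&lt;db&gt;/&lt;table&gt;/&lt;partitions&gt;; the real TDW path layout is not given in the article.

```python
import re

# Assumed layout: /user/tdw/warehouse/<db>/<table>/<partition dirs>/<files>
# The prefix and regex are illustrative assumptions.
TABLE_RE = re.compile(r"^(/user/tdw/warehouse/[^/]+/[^/]+)")

def to_table_path(path):
    """Collapse a fine-grained partition/file path to its table-level path."""
    m = TABLE_RE.match(path)
    return m.group(1) if m else path

raw = "/user/tdw/warehouse/db_app/t_login/ds=20170101/hour=03/part-00000"
print(to_table_path(raw))
# /user/tdw/warehouse/db_app/t_login
```

Aggregating these table-level paths per task ID and read/write flag yields the atomic data‑access relationships that feed the chains.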

2.6 Splitting Large Relationship Chains

Large chains (over 100 k nodes) are split by identifying key nodes that act as cut points. Because finding a single optimal cut is hard, multiple key nodes are used to break the chain into manageable pieces.

Splitting a large relationship chain
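A simple version of the split, using only the standard library: remove the chosen key nodes, then take the remaining connected components as the smaller chains. (Choosing good cut points on a 100k-node chain would need a proper articulation-point algorithm; this sketch only shows the splitting step, with toy table names.)

```python
from collections import defaultdict

def split_chain(edges, key_nodes):
    """Remove key (cut) nodes, then return the connected components left."""
    graph = defaultdict(set)
    for a, b in edges:
        if a not in key_nodes and b not in key_nodes:
            graph[a].add(b)
            graph[b].add(a)
    seen, parts = set(), []
    for node in list(graph):
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n])
        seen |= comp
        parts.append(comp)
    return parts

# Toy chain where table T3 is the key node joining two sub-chains.
edges = [("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("T4", "T5")]
print(split_chain(edges, {"T3"}))
```

Each resulting component can then be migrated as an independent unit, with the key node handled separately as a dual‑write table.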

2.7 Introducing Hive Dual‑Write Tables

When a key node is identified, the corresponding Hive table is turned into a dual‑write table with two locations (city A and city B). Writes go to both locations, and reads are directed to the nearest location, reducing cross‑city traffic.

Hive dual‑write tables and sync task
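A minimal sketch of the routing logic, assuming a hypothetical metadata record with one HDFS location per city (the paths and namenode addresses are made up):

```python
# Dual-write table metadata: one storage location per city.
DUAL_WRITE = {
    "db.daily": {
        "city_a": "hdfs://nn-a/user/tdw/warehouse/db/daily",
        "city_b": "hdfs://nn-b/user/tdw/warehouse/db/daily",
    },
}

def write_paths(table):
    """Writes go to every location of a dual-write table."""
    return list(DUAL_WRITE[table].values())

def read_path(table, city):
    """Reads are served from the location in the caller's own city."""
    return DUAL_WRITE[table][city]

print(read_path("db.daily", "city_b"))
# hdfs://nn-b/user/tdw/warehouse/db/daily
```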

2.8 Ensuring Data Consistency

A synchronization task copies data from city A to city B. Downstream tasks are made dependent on the sync task, guaranteeing that they only read data after it has been fully synchronized.

Task dependency for data consistency
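The dependency rule can be expressed as a tiny scheduling check; the task names here are illustrative:

```python
# A downstream task in city B depends on the sync task, which in turn
# depends on the producing write task in city A.
deps = {
    "sync_db_daily": {"write_db_daily"},   # copies city A -> city B
    "report_city_b": {"sync_db_daily"},    # reads the synced copy in city B
}

def runnable(task, finished):
    """A task may start only once all of its dependencies have finished."""
    return deps.get(task, set()) <= finished

assert not runnable("report_city_b", {"write_db_daily"})   # sync still pending
assert runnable("report_city_b", {"write_db_daily", "sync_db_daily"})
```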

3. Cross‑City Migration Platform

3.1 Relationship‑Chain Migration Module

The module first processes dual‑write tables, then migrates the remaining data by expanding partitions and copying them with distcp. Before migration, users are notified; during migration, write tasks are frozen to guarantee consistency, and data differences between the two cities are compared continuously until they match.

Relationship chain migration state
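The article does not show the exact invocation, but a plausible sketch of how the module might assemble a rate-limited copy job using standard DistCp options (-m for map count, -bandwidth for MB/s allowed per map, -update to skip files already identical at the target); all paths are hypothetical:

```python
def distcp_cmd(src, dst, maps=20, mb_per_map=50):
    """Assemble a hadoop distcp command with per-map bandwidth capping."""
    return [
        "hadoop", "distcp",
        "-m", str(maps),                # number of parallel copy maps
        "-bandwidth", str(mb_per_map),  # MB/s allowed per map
        "-update",                      # skip files already identical at dst
        src, dst,
    ]

cmd = distcp_cmd("hdfs://nn-a/user/tdw/warehouse/db/t1",
                 "hdfs://nn-b/user/tdw/warehouse/db/t1")
print(" ".join(cmd))
```

After the copy, the module would compare source and target (e.g., file counts and sizes) and re-run with -update until the two sides match.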

3.2 Platform Assurance Module

Two aspects are covered:

Basic assurance: data validation after migration, plus sampled replay of a vertical path in the relationship chain to verify task correctness.

Monitoring assurance: watching data-volume changes, task health in the new city, and abnormal traffic spikes caused by unintended cross‑city data access.

4. Migration Strategies

4.1 Independent Deployment of Migration Cluster

A dedicated migration cluster runs distcp jobs, consuming mainly network bandwidth while keeping CPU usage low. With high‑speed network cards, a small number of machines (e.g., 40) can support up to 1 PB of migration traffic.
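As a rough sanity check on that sizing (assuming the 1 PB figure means roughly 1 PB per day and 10 Gbit/s NICs, neither of which the article states explicitly):

```python
# Back-of-the-envelope sizing for the dedicated migration cluster.
# Assumptions (not from the article): 40 machines, 10 Gbit/s NICs,
# 1 PB (decimal) moved per 24 hours.
machines, nic_gbit = 40, 10
pb_bytes = 1e15

needed_gbit = pb_bytes * 8 / 86400 / 1e9   # sustained rate for 1 PB/day
per_machine_gbit = needed_gbit / machines

print(round(needed_gbit, 1), round(per_machine_gbit, 2))  # 92.6 2.31
```

Each machine only needs to sustain about a quarter of a 10 Gbit/s link, which is why a small, network-heavy cluster suffices while CPU stays mostly idle.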

4.2 Traffic Control During Migration

Traffic control architecture

The migration cluster’s bandwidth is limited to avoid saturating the source or target clusters. Because Hadoop writes multiple replicas, target‑cluster traffic can be twice the migration traffic, so a resource‑pool limit is applied.
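The 2x figure follows directly from HDFS replication: with replication factor 3, one replica arrives over the dedicated line and the write pipeline then creates two more copies inside the target cluster. A trivial illustration (the 20 Gbit/s rate is made up):

```python
# Intra-target write traffic generated by HDFS replication during migration.
replication = 3
migration_gbit = 20                                  # illustrative cross-city rate
intra_target_gbit = migration_gbit * (replication - 1)
print(intra_target_gbit)  # 40: twice the migration traffic
```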

4.3 Synchronization Tasks

Sync tasks run in the opposite direction (city B → city A) and have minimal impact on traffic. They are placed in a separate resource pool to finish quickly without affecting other workloads.

4.4 HDFS Cluster Scaling Strategies

During scaling‑down, the whole cluster is taken offline after cleaning data and merging small files. During migration, machines are moved in batches (e.g., 200 nodes per round). New nodes are initially excluded from computation to avoid network saturation caused by Hadoop’s balance mechanism.
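The batching of moves can be sketched as follows (node names and the 8,800-node total are illustrative; the 200-per-round figure is the article's example):

```python
def plan_batches(nodes, per_round=200):
    """Split the node list into migration rounds of at most per_round machines."""
    return [nodes[i:i + per_round] for i in range(0, len(nodes), per_round)]

nodes = [f"dn{i:04d}" for i in range(8800)]   # hypothetical datanode names
rounds = plan_batches(nodes)
print(len(rounds))  # 44 rounds of up to 200 nodes
```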

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career.
