
Cloud-Native Intelligent Data Management Architecture for Baidu Search Platform

A cloud-native redesign of Baidu's search middle platform introduces partition, shard, replica, and addressing controllers that enable elastic scaling, on-demand resource allocation, precise fan-out, and localized computation, reducing capacity adjustment time from weeks to hours, cutting costs by 30-80%, raising availability above 99.9%, and halving query latency.

Baidu Geek Talk

Background: Baidu's search middle platform powers many scenarios such as Aladdin special results, vertical-domain search, and in-app search, handling hundreds of retrieval use cases and petabyte-scale content. The traditional architecture required manual capacity planning, data-size estimation, and static deployment, which led to long delivery cycles, high operational costs, and stability risks whenever data grew rapidly.

Problems of the previous architecture:

Efficiency: manual capacity adjustments took weeks or months and could not keep up with frequent scaling needs.

Cost: static resource allocation could not match the highly variable access patterns of different business scenarios.

Stability: as data volume grew from billions to tens of billions, full‑fan‑out across hundreds of shards dramatically reduced overall availability.

Performance: join‑like operations required multiple distributed queries, leading to latency spikes.

To address these issues, a cloud‑native redesign was introduced, featuring elastic scaling, on‑demand resource allocation, and data‑driven scheduling strategies.

Core architectural components:

Partition controller: decides how to split data into logical partitions (slots) based on business characteristics (e.g., hot‑cold, scenario‑based).

Shard controller: creates and adjusts shard services according to data volume and partition strategy.

Replica controller: selects container resource packages and replica counts according to QPS and latency requirements.

Addressing controller: provides dynamic service discovery and routing based on partition and shard information.

Overall workflow:

Evaluate business search scenarios and data features; generate control policies (partition, shard, replica, addressing).

Write partitioned data to the data center via the partition controller.

Initialize shard services (number of shards, assigned partitions) via the shard controller.

Allocate instances and register services according to replica policies via the replica controller.

Register and discover services through the addressing controller.

Business queries are routed to the appropriate content data via the addressing controller.
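The control flow above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the class and method names are invented for this sketch (the article does not publish Baidu's API), the replica controller is omitted, and partitioning is reduced to a hash over a fixed slot space.

```python
# Minimal sketch of the partition -> shard -> addressing flow described
# above. All class and method names are illustrative, not Baidu's API.
import hashlib
from dataclasses import dataclass

@dataclass
class PartitionController:
    """Maps each record key to a logical partition (slot)."""
    num_slots: int = 8

    def slot_for(self, key: str) -> int:
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % self.num_slots

@dataclass
class ShardController:
    """Assigns contiguous slot intervals to shard services."""
    num_shards: int = 2
    num_slots: int = 8

    def shard_for(self, slot: int) -> int:
        slots_per_shard = self.num_slots // self.num_shards
        return slot // slots_per_shard

@dataclass
class AddressingController:
    """Routes a query key to the shard that owns its slot."""
    partitions: PartitionController
    shards: ShardController

    def route(self, key: str) -> int:
        return self.shards.shard_for(self.partitions.slot_for(key))

router = AddressingController(PartitionController(), ShardController())
print(router.route("doc-42"))  # a shard id in [0, 2)
```

The point of the split is that each controller owns one decision: where data lives (partition), which service holds it (shard), and how queries find it (addressing).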

Elastic scaling design:

Both vertical (increase container resources, replace SSD) and horizontal (adjust replica count) scaling are considered. Horizontal scaling is preferred for search workloads.

Horizontal scaling for traffic spikes:

Detect traffic increase or receive manual scaling command (e.g., increase shards from 2 to 4).

Shard controller computes new shard requirements and creates new data intervals (e.g., shard1=[2,4), shard3=[6,8)).

Addressing controller updates routing rules after new shards become ready.

Old shard data is cleaned up and intervals are renumbered (e.g., shard0=[0,4) shrinks to shard0=[0,2); shard1=[4,8) becomes shard2=[4,6)).

Result: capacity adjustment time reduced from weeks to hours.
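The interval arithmetic behind the 2-to-4 split above can be sketched directly. This is illustrative only: data movement, readiness checks, and cleanup are omitted, and the function name is invented.

```python
# Sketch of shard doubling: each shard's slot interval [lo, hi) is split
# in half, matching the 2 -> 4 shard example above. Interval math only;
# actual data migration and routing updates are omitted.
def split_shards(intervals):
    """Double the shard count by halving each [lo, hi) slot interval."""
    new_intervals = []
    for lo, hi in intervals:
        mid = (lo + hi) // 2
        new_intervals.append((lo, mid))
        new_intervals.append((mid, hi))
    return new_intervals

old = [(0, 4), (4, 8)]    # 2 shards covering 8 slots
print(split_shards(old))  # [(0, 2), (2, 4), (4, 6), (6, 8)]
```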

On-demand resource allocation:

Cold‑hot data separation is used to match resource packages to workload characteristics. Example scenarios:

| Scenario | Data Scale | QPS | Latency Requirement (99th percentile) |
|---|---|---|---|
| Scenario A | Millions | 10k+ | Low (≈10 ms) |
| Scenario B | Tens of millions | 1k+ | Low (≈100 ms) |
| Scenario C | Hundreds of millions | 2k+ | Higher tolerance (≈200 ms) |

Resource packages (high-performance SSD, high-capacity HDD, etc.) are chosen per scenario, achieving up to 80% cost reduction in typical cases.
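A selection rule like the one implied by the table might look as follows. The thresholds and package names here are assumptions for illustration; the article does not specify Baidu's actual policy.

```python
# Illustrative resource-package selection based on the scenario table
# above. Thresholds and package names are assumed, not Baidu's real ones.
def choose_package(qps: int, p99_latency_ms: int) -> str:
    if p99_latency_ms <= 10 or qps >= 10_000:
        return "high-performance-ssd"  # hot data, strict latency
    if p99_latency_ms <= 100:
        return "standard-ssd"
    return "high-capacity-hdd"         # cold data, latency-tolerant

print(choose_package(10_000, 10))  # high-performance-ssd (Scenario A)
print(choose_package(1_000, 100))  # standard-ssd         (Scenario B)
print(choose_package(2_000, 200))  # high-capacity-hdd    (Scenario C)
```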

Precise fan-out data scheduling:

Full‑fan‑out across hundreds of shards reduces overall availability (e.g., 99.99% ^ 100 ≈ 99%). The precise fan‑out strategy groups data by common attributes (e.g., user ID, shop ID) to limit the fan‑out depth.
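The availability decay is simple compounding: a query that must touch n shards succeeds only if all n respond, so availability falls as p to the power n.

```python
# Compounded availability under fan-out: touching n shards at per-shard
# availability p gives overall availability p ** n, per the text above.
per_shard = 0.9999
print(round(per_shard ** 100, 4))  # full fan-out over 100 shards: ~0.99
print(round(per_shard ** 2, 6))    # precise fan-out over 2 shards: ~0.9998
```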

Example for shop‑level search:

Partition controller writes data using a key such as sliceKey=uid-123 to determine the group.

Shard controller allocates a small number of shards per group (e.g., 2 shards for a small group, 32 for a large group).

Addressing controller routes queries based on the same sliceKey.

Effect: fan-out width reduced from hundreds of shards to as few as 2-32 per query.
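The grouping idea can be sketched as follows. The group table, shard-id scheme, and sizes are invented for illustration; only the principle (a query fans out within its own group's small shard set) comes from the text.

```python
# Sketch of precise fan-out: each sliceKey group owns a small dedicated
# shard set, so a query fans out only within its group. The group table
# and shard-id scheme below are illustrative assumptions.
import hashlib

GROUP_SHARDS = {"uid-123": 2, "uid-999": 32}  # small vs. large shop

def shards_for_query(slice_key: str) -> list:
    """Return the shard ids a query with this sliceKey must touch."""
    n = GROUP_SHARDS.get(slice_key, 2)
    base = int(hashlib.md5(slice_key.encode()).hexdigest(), 16) % 1000
    return [base * 100 + i for i in range(n)]  # group-local shard ids

print(len(shards_for_query("uid-123")))  # 2 shards instead of hundreds
print(len(shards_for_query("uid-999")))  # 32
```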

Localized computation data scheduling (live‑broadcast e‑commerce example):

Join‑like queries (search A then search B) cause latency spikes. By co‑locating related data in the same partition and shard, computation becomes local.

Key steps:

The build side tags its data with a common identifier, e.g., sliceKey=sid-1.

Partition controller ensures related data share the same partition.

Shard controller places the partitioned data into the same shard service.

Search side uses the same sliceKey=sid-1 to locate the shard via the addressing controller.

Result: average latency reduced by 50% in the live-broadcast scenario.
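The co-location steps above can be sketched in a toy form: tag data and item data that share a sliceKey land in the same shard, so the join runs inside one shard instead of as two distributed queries. All names, the shard count, and the in-memory store are illustrative assumptions.

```python
# Toy sketch of localized computation: records sharing sliceKey=sid-1
# are written to the same shard, so a tag -> item join is answered by
# one shard locally. Names and storage are illustrative.
from collections import defaultdict

shard_store = defaultdict(lambda: {"tags": [], "items": []})

def shard_of(slice_key: str) -> int:
    return hash(slice_key) % 4  # partition + shard controller, collapsed

def write(kind: str, slice_key: str, record: dict) -> None:
    shard_store[shard_of(slice_key)][kind].append(record)

def local_join(slice_key: str, tag: str) -> list:
    """One shard answers the whole query: filter tags, then items."""
    shard = shard_store[shard_of(slice_key)]
    ids = {t["item_id"] for t in shard["tags"] if t["tag"] == tag}
    return [i for i in shard["items"] if i["id"] in ids]

write("tags", "sid-1", {"item_id": 7, "tag": "hot"})
write("items", "sid-1", {"id": 7, "title": "sample item"})
print(local_join("sid-1", "hot"))  # [{'id': 7, 'title': 'sample item'}]
```

Because both sides of the join use the same sliceKey, the addressing controller sends the query to exactly the shard that holds all the data it needs.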

Actual effects:

Business data flow and capacity changes now scale automatically within hours.

Average cost savings of 30% across all workloads, up to 80% in specific scenarios.

Stability improved: overall availability increased from ~99% to >99.9% for large-scale fan-out cases.

Latency reduced by up to 50% for association-heavy queries.

Conclusion & Outlook:

The cloud‑native redesign of Baidu's search middle platform solves efficiency, cost, stability and performance challenges by introducing elastic scaling, on‑demand resource allocation, precise fan‑out and localized computation strategies. Future work will automate hot‑cold data detection and incorporate multi‑objective optimization to further maximize resource efficiency.
