Inside Baidu’s 8‑Year Evolution of Hadoop and Distributed Computing
This article chronicles Baidu’s eight‑year journey from early Hadoop adoption to advanced MPI, DAG engines, and real‑time streaming platforms, detailing architectural milestones, performance optimizations, and practical lessons for large‑scale offline and online data processing.
Guest Introduction
Zhu Guanyin, a 2008 master’s graduate of Beijing University of Posts and Telecommunications, is a senior technical manager in Baidu’s Infrastructure Department and one of the first Hadoop engineers in China, leading large‑scale offline model training and real‑time computing projects.
Topic Overview
Baidu introduced Hadoop in 2007 and now operates the world’s largest Hadoop clusters (single cluster over 13,000 nodes, total over 100,000 nodes) with daily CPU utilization exceeding 80%.
Typical Offline Computing Scenario
Offline jobs that can tolerate latencies above five minutes are handled by the Hadoop and MPI platforms.
MapReduce Development Timeline
Early 2000s: Publication of GFS, MapReduce, Bigtable papers.
2004: MapReduce paper published; 2006: Doug Cutting created the Hadoop project.
October 2007: Hadoop 0.15.1 released.
2007 – Hadoop Journey Begins
First Hadoop trial in November 2007 with a 28‑node cluster built from idle servers; initial workloads included large‑scale search PV/UV analysis.
Key improvements: LZMA compression and a binary streaming interface (bistreaming) to support non‑text data such as web indexing.
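A binary streaming interface has to frame records explicitly, because keys and values may contain arbitrary bytes (including newlines, which break text streaming). A minimal sketch of length-prefixed framing of the kind such an interface implies; the function names and wire format here are illustrative assumptions, not Baidu's actual bistreaming protocol:

```python
import struct

def write_record(buf, key: bytes, value: bytes) -> None:
    """Append one record as <key_len><key><value_len><value>."""
    buf.write(struct.pack(">I", len(key)))   # 4-byte big-endian length
    buf.write(key)
    buf.write(struct.pack(">I", len(value)))
    buf.write(value)

def read_records(buf):
    """Yield (key, value) pairs until the stream is exhausted."""
    while True:
        header = buf.read(4)
        if not header:
            return
        klen = struct.unpack(">I", header)[0]
        key = buf.read(klen)
        vlen = struct.unpack(">I", buf.read(4))[0]
        value = buf.read(vlen)
        yield key, value
```

Because lengths are carried out of band, records with embedded `\n` or `\t` bytes (e.g. serialized web-index entries) round-trip safely, which text-based Hadoop streaming cannot guarantee.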
2009 – MPI Journey Begins
MPI was introduced to address Hadoop’s limitations for iterative machine‑learning workloads: a single All‑Reduce call accomplishes what would otherwise require an entire MapReduce job.
MPI’s All‑Reduce dramatically reduces job startup overhead and improves iteration efficiency.
Optimizing PLSA on Hadoop and then migrating to MPI yielded an order‑of‑magnitude speedup.
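The advantage of All‑Reduce for iterative training can be seen from its semantics: after the call, every rank holds the element‑wise reduction of all ranks' vectors. The pure‑Python simulation below shows only that result (in real MPI this is one collective, `MPI_Allreduce`, with no job startup, shuffle, or HDFS write per iteration):

```python
def allreduce_sum(vectors):
    """Simulate All-Reduce: return the summed vector every rank receives."""
    total = [0.0] * len(vectors[0])
    for vec in vectors:                 # reduce phase: sum across ranks
        for i, x in enumerate(vec):
            total[i] += x
    return total                        # broadcast phase: all ranks get `total`

# One training iteration = compute local gradients + one all-reduce,
# instead of launching a full MapReduce job per iteration.
local_gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 ranks
print(allreduce_sum(local_gradients))  # each rank sees [9.0, 12.0]
```

This is why migrating iterative algorithms like PLSA from MapReduce to MPI yields order‑of‑magnitude speedups: the per‑iteration cost drops from a whole job lifecycle to one network collective.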
2010 – Infrastructure Department Formation
Consolidation of infrastructure teams and integration of the Pyramid system with Hadoop.
Initial MPI scheduling was manual, later replaced by Torque and then Maui for more robust scheduling.
Development of Hadoop C++ Extension (hce) and extensive bug fixes and feature additions.
2011–2015 Milestones
2011: Single‑cluster MapReduce scaled to 5,000 nodes.
2012: Baidu’s Hadoop 2.0 cluster launched, a year ahead of the open‑source version.
2013: World’s largest Hadoop cluster (13,000+ nodes) with millions of concurrent jobs; introduced transparent LZMA compression based on file hotness.
2014: Native C++ DAG engine deployed, merging multiple MapReduce jobs into a single DAG to cut redundant I/O.
2015: In‑memory streaming shuffle implemented, pushing map output to reducers proactively.
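The push model of a streaming shuffle can be sketched as follows: mappers hash‑partition each record and push it straight into an in‑memory queue per reducer, instead of spilling to local disk for reducers to pull later. Queue sizing, partitioning, and flow control are all simplified assumptions here, not the production design:

```python
from collections import defaultdict
from queue import Queue

NUM_REDUCERS = 2
reducer_queues = [Queue() for _ in range(NUM_REDUCERS)]

def map_and_push(records):
    """Mapper side: partition each record and push it to its reducer."""
    for key, value in records:
        part = hash(key) % NUM_REDUCERS          # choose target reducer
        reducer_queues[part].put((key, value))   # proactive in-memory push

def drain_reducer(part):
    """Reducer side: group the pushed records by key (the merge step)."""
    grouped = defaultdict(list)
    q = reducer_queues[part]
    while not q.empty():
        key, value = q.get()
        grouped[key].append(value)
    return dict(grouped)
```

Compared with the classic pull‑based shuffle, the push model removes the map‑side spill/merge round trip through disk, at the cost of needing memory and back‑pressure management on the reduce side.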
Real‑Time and Streaming Platforms
Baidu’s DStream platform achieves millisecond‑level latency, predating Storm.
TaskManager provides exactly‑once processing with 30 s–5 min latency, using a queue‑worker model and HDFS for durable storage.
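An exactly‑once effect on top of an at‑least‑once queue usually comes from deduplication: workers record the IDs of batches they have already committed and skip redeliveries. A minimal in‑memory sketch of that logic; in the real system the committed set and outputs would live in durable storage such as HDFS, and all names here are illustrative:

```python
class ExactlyOnceWorker:
    def __init__(self):
        self.committed = set()   # would be durable (e.g. on HDFS) in production
        self.output = []

    def process(self, batch_id, records):
        """Apply a batch once; drop duplicate deliveries of the same batch."""
        if batch_id in self.committed:
            return False                  # already applied: redelivery, skip
        self.output.extend(records)
        self.committed.add(batch_id)      # commit marker, atomic with output
        return True
```

The key invariant is that the output write and the commit marker move together; if the worker crashes between them, the queue redelivers the batch and the dedup check decides whether it was applied.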
Q&A Highlights
Performance vs. Spark
Spark is used selectively; Baidu prefers SparkSQL for ad‑hoc queries but relies on its own platforms for large‑scale DAG jobs.
TaskManager Guarantees
Ensures no data loss or duplication through queue‑worker decoupling and HDFS‑backed streams.
Shuffle Reliability
Map side pushes data to HDFS; reducers read from HDFS with acknowledgment mechanisms to handle failures.
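The read‑with‑acknowledgment pattern can be sketched as a retry loop: the reducer retries until the map output is visible in shared storage, then records an ack so the writer knows the data was consumed. Storage, timing, and the ack channel are simplified assumptions, not Baidu's actual protocol:

```python
import time

def fetch_with_retry(store, path, acks, retries=3, delay=0.0):
    """Read map output from a shared store, acking on success."""
    for _ in range(retries):
        if path in store:        # data has become visible
            acks.add(path)       # acknowledge the successful read
            return store[path]
        time.sleep(delay)        # back off before retrying
    raise IOError(f"map output {path} not available after {retries} tries")
```

Putting the intermediate data on HDFS rather than local disk means a lost map node does not force re‑running the map task; the reducer simply retries the read against replicated storage.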
Security and Auditing
Data access requires owner approval; comprehensive logging enables traceability.
Compression Strategy
Transparent LZMA compression applied to cold files during idle periods, balancing CPU cost with storage savings.
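A hotness‑based policy reduces, in essence, to a last‑access threshold: files untouched for longer than some cutoff become compression candidates, and the work is scheduled into idle windows. The threshold and file model below are illustrative assumptions; the real system applied LZMA transparently inside the storage layer:

```python
import time

COLD_AFTER = 30 * 24 * 3600  # 30 days; an illustrative cutoff, not Baidu's

def pick_cold_files(files, now=None):
    """files: list of (path, last_access_ts). Return paths cold enough to compress."""
    now = now if now is not None else time.time()
    return [path for path, atime in files if now - atime > COLD_AFTER]
```

Because LZMA trades high CPU cost for a high compression ratio, restricting it to cold files in idle periods spends otherwise‑wasted cycles on the data least likely to be decompressed soon.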
Resource Management
Monitoring dashboards track job counts, throughput, queue times, and cluster utilization.
Scheduling
Baidu’s self‑developed Normandy scheduler complements YARN, offering per‑queue concurrency limits and priority controls.
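Per‑queue concurrency limits with priority ordering, of the kind attributed to Normandy here, can be sketched with a priority heap gated by per‑queue counters. Everything in this sketch (class name, defaults, tie‑breaking by submission order) is an illustrative simplification, not Normandy's design:

```python
import heapq

class QueueScheduler:
    def __init__(self, limits):
        self.limits = limits                  # queue name -> max concurrent jobs
        self.running = {q: 0 for q in limits}
        self.pending = []                     # heap of (priority, seq, queue, job)
        self._seq = 0                         # FIFO tie-breaker

    def submit(self, queue, job, priority=10):
        """Lower priority number = runs first."""
        heapq.heappush(self.pending, (priority, self._seq, queue, job))
        self._seq += 1

    def dispatch(self):
        """Launch every pending job whose queue still has free slots."""
        launched, deferred = [], []
        while self.pending:
            prio, seq, queue, job = heapq.heappop(self.pending)
            if self.running[queue] < self.limits[queue]:
                self.running[queue] += 1      # consume a slot in that queue
                launched.append(job)
            else:
                deferred.append((prio, seq, queue, job))
        for item in deferred:                 # over-limit jobs wait for slots
            heapq.heappush(self.pending, item)
        return launched
```

The per‑queue cap keeps one team's burst from starving the cluster, while the priority heap decides ordering among jobs that are admissible.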