
Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real-time and offline big-data platform on Watt, Bigpipe, Fengge, AFS and Palo, following Lambda/Kappa patterns and a fast-slow parallel rollout. The platform cut OLAP query latency from 18 minutes to under 15 seconds, enabled self-service analytics, and standardized metrics across 15 agile teams.

Baidu Geek Talk

Nov 24, 2021

This article describes how Baidu Aifanfan's data team built real-time and offline big data infrastructure platforms that deliver valuable data products and services to the business. It traces how the team addressed challenges across three dimensions: business, technology, and organization.

Key Terminology:

Watt: Open data-flow platform that ingests MySQL and BaikalDB binlogs and supports multi-table joins and UDF extensions.

Fengge Platform: Data asset center and R&D governance platform built by Baidu's Commercial Platform R&D department.

Bigpipe (BP): Distributed data transmission system for real-time messaging and log transfer that decouples data producers from consumers.

AFS: Big data file storage system similar to HDFS.

Palo: MPP data warehouse based on Apache Doris, supporting high-concurrency, low-latency queries over petabyte-scale datasets.
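Watt's own interfaces and message format are not described in the source. As a minimal, hypothetical sketch of what "ingesting binlogs with multi-table joins" involves, the code below applies row-level change events (INSERT/UPDATE/DELETE after-images) to per-table keyed state and produces an inner-joined view. `BinlogEvent`, `JoinedView`, and the `account`/`contact` tables are illustrative names, not Watt APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class BinlogEvent:
    """Hypothetical change event; not Watt's real wire format."""
    table: str                            # source table, e.g. "account" or "contact"
    op: str                               # "INSERT", "UPDATE" or "DELETE"
    pk: int                               # primary key of the changed row
    row: Optional[Dict[str, Any]] = None  # after-image; None for DELETE


class JoinedView:
    """Keeps the latest row per primary key for two tables and joins them."""

    def __init__(self, left: str, right: str, join_key: str):
        self.left, self.right, self.join_key = left, right, join_key
        self.state: Dict[str, Dict[int, Dict[str, Any]]] = {left: {}, right: {}}

    def apply(self, ev: BinlogEvent) -> None:
        table_state = self.state[ev.table]
        if ev.op == "DELETE":
            table_state.pop(ev.pk, None)
        else:
            # INSERT and UPDATE both upsert the full after-image.
            table_state[ev.pk] = dict(ev.row)

    def joined(self) -> List[Dict[str, Any]]:
        # Inner join: a right-side row matches the left row whose primary key
        # equals the right row's join_key column.
        out = []
        for _, r_row in sorted(self.state[self.right].items()):
            l_row = self.state[self.left].get(r_row[self.join_key])
            if l_row is not None:
                out.append({**l_row, **r_row})
        return out
```

Because each event carries a full after-image, replaying the same binlog always rebuilds the same state. Deleting the `account` row makes its `contact` rows drop out of the joined view on the next read.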

Challenges Faced:

Business challenges center on delivering data products that are both timely and rich. Technical challenges involve handling large-scale business data spread across sharded databases, scattered metadata, redundant data storage, and limited development resources, while keeping hundreds of scheduled tasks stable. Organizational challenges include supporting 15 agile teams with differing OKRs and inconsistent metric definitions.

Architecture Evolution:

The team adopted the Lambda and Kappa architectures as foundational patterns and rolled out the platform with a "fast and slow in parallel" approach: Version 1.0 quickly addressed urgent business needs by using Watt for change data capture (CDC), while Version 2.0 integrated with the Fengge platform for comprehensive data warehouse management, metadata governance, and self-service queries.
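The article does not say how the team split work between the two patterns. Going by the textbook definitions (not the team's code), a Lambda deployment has a batch layer that recomputes results over history up to a cutoff, a speed layer that covers only events after that cutoff, and a serving layer that merges the two. A Kappa deployment runs a single streaming job, and reprocessing means replaying the log. The minimal sketch below counts events per customer both ways. The event shape and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # hypothetical shape: (event_time_seconds, customer_id)


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Lambda batch layer: full recomputation over history before the cutoff."""
    return Counter(cid for ts, cid in events if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Lambda speed layer: incremental counts for events the batch run has not covered."""
    counts: Counter = Counter()
    for ts, cid in events:
        if ts >= cutoff:
            counts[cid] += 1
    return counts


def serve(batch: Counter, speed: Counter) -> Counter:
    """Lambda serving layer: a query sees the batch result plus the real-time delta."""
    return batch + speed


def kappa_view(events: Iterable[Event]) -> Counter:
    """Kappa: one streaming computation over the whole replayable log."""
    counts: Counter = Counter()
    for _, cid in events:
        counts[cid] += 1
    return counts
```

Both approaches should agree on the answer. The trade-off is operational: Lambda maintains two code paths that must stay consistent, while Kappa keeps one code path but depends on being able to replay the log.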

Key Technical Implementations:

1) Real-time processing improvements using Spark Streaming/Flink with Bigpipe for disaster recovery

2) Data warehouse modeling using Kimball dimensional methodology with star schema design

3) Data governance covering asset management, quality monitoring, lineage tracking, and permission control

4) Marketing effectiveness analysis migrated from Impala+Kudu to Palo (Doris) for better performance

5) Real-time capabilities enhanced with Flink-to-Palo and Kafka-to-Palo pipelines, loading data through Stream Load and Routine Load respectively
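To make item 2 concrete, here is a minimal illustration of a Kimball-style star schema. It uses Python's built-in `sqlite3` in place of Palo: a central fact table holds numeric measures plus foreign keys, the dimension tables around it hold descriptive attributes, and queries join and group on those attributes. The table names (`fact_lead`, `dim_date`, `dim_channel`) and the rows are invented for illustration and do not come from the team's warehouse.

```python
import sqlite3


def build_demo_star(conn: sqlite3.Connection) -> None:
    """Creates a tiny hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
        CREATE TABLE dim_channel (channel_key INTEGER PRIMARY KEY, channel_name TEXT);
        -- Fact grain: one row per (day, channel) with additive measures.
        CREATE TABLE fact_lead (
            date_key    INTEGER REFERENCES dim_date(date_key),
            channel_key INTEGER REFERENCES dim_channel(channel_key),
            lead_count  INTEGER,
            cost        REAL
        );
        INSERT INTO dim_date VALUES (20211101, '2021-11-01', '2021-11'),
                                    (20211102, '2021-11-02', '2021-11');
        INSERT INTO dim_channel VALUES (1, 'search'), (2, 'feed');
        INSERT INTO fact_lead VALUES (20211101, 1, 30, 120.0),
                                     (20211101, 2, 10,  80.0),
                                     (20211102, 1, 20, 100.0);
    """)


# Typical star-join: filter on one dimension, group by another, sum the facts.
MONTHLY_LEADS_BY_CHANNEL = """
    SELECT c.channel_name, SUM(f.lead_count) AS leads, SUM(f.cost) AS cost
    FROM fact_lead f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_channel c ON f.channel_key = c.channel_key
    WHERE d.month = '2021-11'
    GROUP BY c.channel_name
    ORDER BY c.channel_name
"""
```

Running `MONTHLY_LEADS_BY_CHANNEL` against the demo data returns one row per channel, with November's leads and cost summed across days. Because every metric is a sum over the same fact table, each team that queries it shares one metric definition.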
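One common use of the lineage tracking in item 3 is impact analysis. When an upstream table breaks or changes, you need to know which downstream tables and reports are affected. The sketch below assumes lineage is stored as a simple adjacency map from each table to the tables built directly from it and runs a breadth-first search over it. The table names and graph are hypothetical. The article does not describe how Fengge actually stores lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: table -> tables built directly from it.
LINEAGE: Dict[str, List[str]] = {
    "ods_account_binlog": ["dwd_account"],
    "ods_lead_binlog": ["dwd_lead"],
    "dwd_account": ["dws_account_daily", "ads_customer_profile"],
    "dwd_lead": ["dws_account_daily"],
    "dws_account_daily": ["ads_marketing_report"],
}


def downstream(table: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search returning every table that directly or
    transitively depends on `table` (the table itself is excluded)."""
    seen: Set[str] = set()
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(lineage.get(current, []))
    return seen
```

The `seen` set ensures each table is visited once, even when several paths converge on it (here both `dwd_account` and `dwd_lead` feed `dws_account_daily`). It also keeps the search from looping forever if the graph accidentally contains a cycle.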
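The two load methods named in item 5 work differently in upstream Apache Doris. Stream Load is a synchronous HTTP PUT that a client such as a Flink sink pushes to the cluster. Routine Load is a job defined in SQL that makes Doris itself consume continuously from a Kafka topic. The sketch below only builds a Stream Load request and a `CREATE ROUTINE LOAD` statement, following the upstream Doris documentation. Baidu's internal Palo may differ. The hosts, database, table, label, and topic names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List


def build_stream_load_request(fe_host: str, db: str, table: str,
                              rows: List[Dict[str, Any]], label: str,
                              user: str, password: str,
                              http_port: int = 8030) -> urllib.request.Request:
    """Builds (but does not send) a Doris-style Stream Load PUT with JSON rows.

    Sending it through the FE involves a 307 redirect to a BE node with the
    credentials preserved (what `curl --location-trusted` does), so this
    sketch stops at constructing the request.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,               # Doris rejects a reused label, so retries are safe
        "format": "json",
        "strip_outer_array": "true",  # body is a JSON array of row objects
    }
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def build_routine_load_sql(db: str, job: str, table: str,
                           brokers: str, topic: str) -> str:
    """Returns a Doris-style statement that continuously loads JSON messages
    from a Kafka topic into `table`."""
    return (
        f"CREATE ROUTINE LOAD {db}.{job} ON {table}\n"
        "PROPERTIES (\n"
        '    "format" = "json",\n'
        '    "max_batch_interval" = "10"\n'
        ")\n"
        "FROM KAFKA (\n"
        f'    "kafka_broker_list" = "{brokers}",\n'
        f'    "kafka_topic" = "{topic}"\n'
        ");"
    )
```

The design choice between them is mostly about who drives ingestion. With Stream Load, the writer (for example a Flink job) controls batching and retries through labels. With Routine Load, Doris schedules Kafka consumption and tracks offsets itself, so no external job is needed.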

Business Impact:

The architecture reduced OLAP query latency from 18 minutes to 10-15 seconds, enabled self-service data queries for product and operations teams, unified metric definitions across the organization, and significantly improved data product value for customers.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.