TDS Platform Overview: Architecture, Modules, and Features of Baidu MEG's Turing 3.0 Data Ecosystem
The TDS platform is the core of Baidu MEG's Turing 3.0 ecosystem. It unifies data development, warehouse management, monitoring, and resource control through the Spark‑based TDE, a visual studio, and AI‑enhanced tools such as Smart Diagnosis and Text2SQL. Together these provide standardized workflows and scalable scheduling for over 30 k tasks and more than 400 k task instances per day.
Baidu MEG's previous generation of big‑data products suffered from platform fragmentation, uneven quality, and poor usability, leading to low development efficiency, high learning costs, and slow business response.
To address these issues, Baidu MEG internally developed the Turing 3.0 ecosystem, which covers the entire data lifecycle and consists of three core components:
TDE (Turing Data Engine): the computation engine, built on Spark and ClickHouse.
TDS (Turing Data Studio): a one‑stop data development and governance platform.
TDA (Turing Data Analysis): the next‑generation visual BI product.
The TDS platform is the core of Turing 3.0, focusing on data development and governance. Its architecture spans from infrastructure to user‑facing features and includes modules for data development, data‑warehouse management, monitoring & operations, and resource management. It supports efficient task scheduling, resource management, and data‑lineage analysis.
Key functional modules:
1. Data Development: creation, management, and inspection of data‑processing tasks, with dependency detection, computation, and import capabilities.
2. Data‑Warehouse Management: table construction, ClickHouse (CK) table management, and integration with the data‑security platform for permission control.
3. Monitoring & Operations: task latency and failure alerts, data‑lineage queries, and task/operator statistics.
4. Resource Management: front‑end configuration for creating, editing, and deleting development groups, storage, and queues; back‑end metadata handling; and unified data‑source configuration (AFS, UGI, FTP, DRDS, Doris, etc.).
The data development workflow is standardized and visualized, covering three phases: task development and testing, task release, and task operation. The platform also provides intelligent tools, Smart Diagnosis and Text2SQL, that lower the technical barrier for users.
Smart Diagnosis automatically analyzes failed task logs and erroneous SQL using LLM‑based models, quickly pinpointing problems and offering remediation suggestions.
Text2SQL converts natural‑language queries into SQL statements. The first stage generates SQL based on user‑selected tables; the second stage employs a fine‑tuned large model with context optimization to handle more complex queries and improve accuracy.
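The first stage can be pictured as assembling a prompt from the schemas of the tables the user selected. The sketch below is only an illustration under assumptions: the `Column` and `Table` types, the prompt wording, and the function names are hypothetical and are not taken from TDS, and the model call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    comment: str = ""


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple  # tuple of Column


def render_schema(table):
    """Render one table as a compact, DDL-like text block."""
    lines = [f"TABLE {table.name} ("]
    for col in table.columns:
        note = f"  -- {col.comment}" if col.comment else ""
        lines.append(f"  {col.name} {col.dtype},{note}")
    lines.append(")")
    return "\n".join(lines)


def build_prompt(question, selected_tables):
    """Combine the user's question with the schemas of the selected tables."""
    if not selected_tables:
        raise ValueError("at least one table must be selected")
    schema_block = "\n\n".join(render_schema(t) for t in selected_tables)
    return (
        "Write one SQL query that answers the question.\n"
        "Use only the tables and columns listed below.\n\n"
        f"{schema_block}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical table, used only to show the shape of the prompt.
clicks = Table(
    name="ads_click_daily",
    columns=(
        Column("event_day", "STRING", "partition, yyyymmdd"),
        Column("user_id", "BIGINT"),
        Column("clicks", "BIGINT"),
    ),
)
prompt = build_prompt("How many users clicked yesterday?", [clicks])
```

Restricting the prompt to the selected tables keeps the context small, which is one plausible reason for asking the user to pick tables first.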
The task scheduling architecture consists of a Scheduler and Executors connected through Baidu's internal TM message queue. The Scheduler runs three core threads (timer, concurrency‑control, and worker) that respectively trigger tasks, control concurrency, and advance task states. Executors run in Elastic Container Instances (ECI) and can be scaled horizontally.
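The division of labor among the three Scheduler threads can be sketched with standard-library queues and threads. This is a toy model under assumptions, not the TDS implementation: `queue.Queue` stands in for the TM message queue, a single thread stands in for the Executor pool, and the state names and the `max_running` limit are hypothetical.

```python
import queue
import threading
import time

STOP = None  # sentinel that tells each loop to shut down


class Scheduler:
    def __init__(self, max_running, mq):
        self.mq = mq                       # stand-in for the queue to Executors
        self.triggered = queue.Queue()     # timer -> concurrency control
        self.admitted = queue.Queue()      # concurrency control -> worker
        self.slots = threading.Semaphore(max_running)
        self.lock = threading.Lock()
        self.state = {}
        self.running = 0
        self.peak_running = 0

    def _set_state(self, task_id, state):
        with self.lock:
            self.state[task_id] = state

    def timer_loop(self, schedule):
        """Trigger each task once its delay (seconds from start) has elapsed."""
        started = time.monotonic()
        for delay, task_id in sorted(schedule):
            wait = delay - (time.monotonic() - started)
            if wait > 0:
                time.sleep(wait)
            self._set_state(task_id, "TRIGGERED")
            self.triggered.put(task_id)
        self.triggered.put(STOP)

    def concurrency_loop(self):
        """Admit triggered tasks only while a run slot is free."""
        while (task_id := self.triggered.get()) is not STOP:
            self.slots.acquire()           # blocks when max_running is reached
            with self.lock:
                self.running += 1
                self.peak_running = max(self.peak_running, self.running)
                self.state[task_id] = "ADMITTED"
            self.admitted.put(task_id)
        self.admitted.put(STOP)

    def worker_loop(self):
        """Advance admitted tasks and hand them to the Executors."""
        while (task_id := self.admitted.get()) is not STOP:
            self._set_state(task_id, "DISPATCHED")
            self.mq.put(task_id)
        self.mq.put(STOP)

    def on_finished(self, task_id, ok):
        """Record an Executor result and free the run slot."""
        with self.lock:
            self.running -= 1
            self.state[task_id] = "SUCCESS" if ok else "FAILED"
        self.slots.release()


def executor_loop(mq, scheduler):
    """Toy Executor: pretends every task succeeds immediately."""
    while (task_id := mq.get()) is not STOP:
        scheduler.on_finished(task_id, ok=True)


def run_demo():
    mq = queue.Queue()
    scheduler = Scheduler(max_running=2, mq=mq)
    schedule = [(0.00, "a"), (0.01, "b"), (0.02, "c")]
    threads = [
        threading.Thread(target=scheduler.timer_loop, args=(schedule,)),
        threading.Thread(target=scheduler.concurrency_loop),
        threading.Thread(target=scheduler.worker_loop),
        threading.Thread(target=executor_loop, args=(mq, scheduler)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scheduler
```

Because the Scheduler and the Executor only share a queue, more Executor consumers could be attached to the same queue without changing the Scheduler, which mirrors the horizontal scaling described above.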
The DAG for each task is built as an adjacency list with virtual start and end nodes, which guarantees that every task has a single entry point and a single exit point.
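A minimal sketch of that construction follows. The node names `__start__` and `__end__`, the function names, and the example operators are hypothetical; the article does not describe the actual data structures beyond the adjacency list and the two virtual nodes.

```python
START, END = "__start__", "__end__"  # hypothetical names for the virtual nodes


def build_dag(nodes, edges):
    """Build an adjacency list and attach virtual start and end nodes."""
    adj = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
        indegree[dst] += 1

    roots = [n for n in nodes if indegree[n] == 0]   # no upstream dependency
    leaves = [n for n in nodes if not adj[n]]        # no downstream dependency
    adj[START] = roots                               # single entry point
    for leaf in leaves:
        adj[leaf].append(END)                        # single exit point
    adj[END] = []
    return adj


def topological_order(adj):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree = {n: 0 for n in adj}
    for targets in adj.values():
        for t in targets:
            indegree[t] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for t in adj[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(order) != len(adj):
        raise ValueError("cycle detected")
    return order


nodes = ["extract", "clean", "join", "load"]
edges = [("extract", "clean"), ("extract", "join"),
         ("clean", "load"), ("join", "load")]
dag = build_dag(nodes, edges)
order = topological_order(dag)
```

With the virtual nodes in place, a traversal can always begin at one node and detect completion at one node, even when the underlying graph has several roots or several leaves.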
Data‑warehouse management covers the schema lifecycle (creation, modification, handover, deprecation), approval workflows for operations, permission management through the data‑security platform, and comprehensive table statistics (usage, storage, partition details).
Summary & Outlook: TDS has become a critical backbone for Baidu MEG, handling over 30 k tasks and more than 400 k daily task instances. Future directions include deeper AI‑driven assistance, broader data‑source integration, enhanced scalability, and stronger security & compliance.
Baidu Geek Talk