Titan 2.0 Big Data Processing Platform: Architecture Evolution and Practice
The article describes the evolution of 360's Titan big‑data processing platform through three architectural stages, details its functional modules, explains the DITTO component framework, context and rule‑engine abstractions, and shares practical case studies and personal insights on building a flexible, self‑service data platform.
Titan 2.0 is a comprehensive big‑data processing platform developed by 360’s Data Center, offering data integration, synchronization, computation, analysis, and streaming capabilities built on a third‑generation compute engine.
Background: Modern big‑data ecosystems have shifted from early Hadoop to third‑generation engines like Spark and Flink, with diverse storage options (MPP relational, distributed, time‑series). Traditional script‑based development limits business participation, leads to opaque data flows, and causes resource waste.
Platform Evolution:
1. Pre‑Titan – Distributed computing emerged, moving from single‑node to cluster processing with script templates, improving efficiency but still labor‑intensive.
2. Titan 1.0 – Introduced a template library that exposed data development to business users, supporting richer data sources and multiple product lines, yet lacked real‑time ingestion, flexible customization, and fully self‑service operations.
3. Titan 2.0 – Adopted a third‑generation engine with DAG support and real‑time processing, built the DITTO component framework and rule engine, provided drag‑and‑drop visual development, self‑service scheduling, monitoring, permission management, and multi‑source connectivity.
Functional Modules:
Primary modules include data source management (unified handling of heterogeneous sources, security, and quality), task management (graph‑based configuration with visual and instance‑level operations), a scheduling engine (instant, periodic, historical backfill, and real‑time), and permission management (role‑, operation‑, menu‑, and data‑source‑level controls).
Secondary modules emphasize multi‑source support, ease of use through no‑code operations, and self‑service flexibility for users to configure, monitor, and execute tasks without developer involvement.
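To make the scheduling and self‑service task configuration concrete, here is a minimal sketch in plain Python. The class and field names (`ScheduleMode`, `TaskConfig`, `source`, `cron`) are hypothetical illustrations, not Titan's actual API; only the four scheduling modes come from the article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScheduleMode(Enum):
    """The four scheduling modes described for the Titan engine."""
    INSTANT = "instant"        # run once, immediately
    PERIODIC = "periodic"      # cron-style recurring runs
    HISTORICAL = "historical"  # backfill over a past time range
    REALTIME = "realtime"      # long-running streaming task


@dataclass
class TaskConfig:
    """Minimal self-service task configuration (hypothetical field names)."""
    name: str
    source: str                  # identifier of a registered data source
    mode: ScheduleMode
    cron: Optional[str] = None   # required only for PERIODIC tasks

    def validate(self) -> None:
        if self.mode is ScheduleMode.PERIODIC and not self.cron:
            raise ValueError(f"periodic task {self.name!r} needs a cron expression")


# A business user could express a nightly sync purely as configuration:
cfg = TaskConfig(name="daily_etl", source="mysql_orders",
                 mode=ScheduleMode.PERIODIC, cron="0 2 * * *")
cfg.validate()
```

The point of this shape is that everything a user drags together in the visual editor reduces to declarative configuration the platform can validate and schedule, with no developer involvement.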
Practice Cases:
The DITTO component framework addresses multi‑source, multi‑storage, and varied compute scenarios by providing a unified entry, supporting offline/real‑time computation, machine learning, and interactive queries on top of Spark/Flink.
DITTO’s three‑layer task structure (Application → Job → Task) enables hierarchical initialization and event submission, with DAG orchestration at the Job level.
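The three‑layer structure can be sketched as follows. This is a minimal plain‑Python illustration of the Application → Job → Task hierarchy with DAG ordering at the Job level; the class names mirror the article's terms, but the fields and the topological‑sort method are assumptions, not DITTO's real implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Leaf unit of work (e.g. one read, transform, or write step)."""
    name: str


@dataclass
class Job:
    """A DAG of tasks; edges map each task to its downstream tasks."""
    name: str
    tasks: Dict[str, Task] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def topo_order(self) -> List[str]:
        """Return task names in dependency order (Kahn's algorithm)."""
        indegree = {t: 0 for t in self.tasks}
        for downs in self.edges.values():
            for d in downs:
                indegree[d] += 1
        ready = [t for t, deg in indegree.items() if deg == 0]
        order: List[str] = []
        while ready:
            t = ready.pop()
            order.append(t)
            for d in self.edges.get(t, []):
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        if len(order) != len(self.tasks):
            raise ValueError("cycle detected in job DAG")
        return order


@dataclass
class Application:
    """Top-level submission unit: initializes context, then runs its jobs."""
    name: str
    jobs: List[Job] = field(default_factory=list)


job = Job(name="etl")
for t in ("read", "transform", "write"):
    job.tasks[t] = Task(t)
job.edges = {"read": ["transform"], "transform": ["write"]}
app = Application(name="demo", jobs=[job])
```

Hierarchical initialization then falls out naturally: the Application sets up shared context once, each Job resolves its DAG, and each Task executes as an event submitted in topological order.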
Component abstraction defines a lifecycle, data ingestion, metadata (type and fields), and dependency management, while the context layer handles initialization of components, engines, time, environment, and scheduler, facilitating data flow between components.
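A rough sketch of that component/context split, assuming hypothetical method and class names (`Context`, `Component`, `CsvSource`): the context carries shared runtime state and intermediate datasets, while each component implements a lifecycle and exposes its metadata.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Context:
    """Shared runtime state: engine session, time window, environment,
    scheduler handles. It also carries intermediate datasets so the output
    of one component can flow into the next."""
    def __init__(self, env: Dict[str, Any]):
        self.env = env
        self.datasets: Dict[str, Any] = {}


class Component(ABC):
    """Unified component contract: lifecycle + metadata + dependencies."""
    name: str = "component"
    depends_on: List[str] = []          # names of upstream components

    def init(self, ctx: Context) -> None:    # lifecycle: setup
        pass

    @abstractmethod
    def run(self, ctx: Context) -> None:     # lifecycle: main work
        ...

    def close(self, ctx: Context) -> None:   # lifecycle: cleanup
        pass

    def schema(self) -> Dict[str, str]:      # metadata: field name -> type
        return {}


class CsvSource(Component):
    """Hypothetical ingestion component; a real one would load via Spark/Flink."""
    name = "csv_source"

    def run(self, ctx: Context) -> None:
        ctx.datasets[self.name] = [{"uid": 1}, {"uid": 2}]  # stubbed rows

    def schema(self) -> Dict[str, str]:
        return {"uid": "long"}


ctx = Context(env={"mode": "offline"})
src = CsvSource()
src.init(ctx)
src.run(ctx)
src.close(ctx)
```

Because every component reads inputs from and writes outputs to the shared context, the framework can wire arbitrary component graphs without the components knowing about each other directly.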
The rule engine separates business logic from code, offering logical, built‑in, arithmetic, and text operations, initially as a DSL and later as an independent service with rule and function libraries.
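As a minimal sketch of how such a rule engine can separate logic from code: rules are data (nested lists), evaluated against a record by an operator table covering the four categories the article names. The expression format and operator names here are illustrative assumptions, not Titan's actual DSL.

```python
import operator
import re
from typing import Any, Callable, Dict

# Operator table spanning the four categories: logical, built-in comparison,
# arithmetic, and text operations.
OPS: Dict[str, Callable[..., Any]] = {
    "and": lambda *xs: all(xs),            # logical
    "or": lambda *xs: any(xs),
    "not": operator.not_,
    "eq": operator.eq,                     # built-in comparison
    "gt": operator.gt,
    "add": operator.add,                   # arithmetic
    "mul": operator.mul,
    "contains": lambda s, sub: sub in s,   # text
    "matches": lambda s, pat: re.fullmatch(pat, s) is not None,
}


def evaluate(rule: Any, record: Dict[str, Any]) -> Any:
    """Evaluate a rule written as nested lists: ["op", arg1, arg2, ...].
    Strings starting with "$" are field references into the record;
    everything else is a literal."""
    if isinstance(rule, list):
        op, *args = rule
        return OPS[op](*(evaluate(a, record) for a in args))
    if isinstance(rule, str) and rule.startswith("$"):
        return record[rule[1:]]
    return rule


# Business logic lives in data, not code: this rule can be stored,
# versioned, and changed without redeploying anything.
rule = ["and", ["gt", "$age", 18], ["contains", "$city", "Beijing"]]
```

Growing this into an independent service is then mostly a matter of persisting the rule and function libraries and exposing `evaluate` behind an API.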
Additional optimizations include handling data skew, preventing memory overflow, and caching strategies.
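For the data‑skew point, a common mitigation (not necessarily the one Titan uses) is key salting: a hot key is split across N sub‑keys so no single partition or reducer receives all of its traffic, and the partial aggregates are merged afterwards. A plain‑Python sketch:

```python
import random
from collections import Counter


def salted_key(key: str, hot_keys: set, num_salts: int = 8) -> str:
    """Spread a skewed ('hot') key across num_salts sub-keys.
    Aggregate per salted key first, then merge the partials."""
    if key in hot_keys:
        return f"{key}#{random.randrange(num_salts)}"
    return key


# 1000 records share one hot key; 10 records use a normal key.
records = ["hot"] * 1000 + ["cold"] * 10
counts = Counter(salted_key(k, {"hot"}) for k in records)

# Stage 2: merge the per-salt partial counts back into one total.
hot_total = sum(v for k, v in counts.items() if k.startswith("hot#"))
```

In a Spark or Flink job the same idea applies at shuffle time: salt before the wide aggregation, aggregate, then strip the salt and aggregate once more.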
Personal Insights: The author emphasizes simplifying design, avoiding over‑engineering, grounding architecture in business scenarios, and maintaining platform stability as the foundation for performance optimization.
360 Tech Engineering
The official tech channel of 360, aggregating the brand's engineering and technology content.