Waggle Dance Based Metadata Solution at Tongcheng Travel: Architecture, Migration Strategies, and Future Outlook
This article presents Tongcheng Travel's metadata solution built on the open‑source Waggle Dance project, detailing the three‑layer architecture, challenges of a monolithic Hive Metastore, evaluated migration plans, federation implementation, migration workflow, and future directions for unified metadata governance.
The talk introduces Tongcheng Travel's metadata solution that leverages the open‑source Waggle Dance project to unify Hive Metastore access across multiple environments.
Background: the existing data platform consists of three layers—data products, a cloud platform & data middle‑platform (including Hive Metastore), and diverse data sources. The Hive Metastore acts as a critical bridge between compute engines and metadata.
Problems with the first‑generation architecture: a single Postgres‑backed Metastore had limited storage capacity, its query performance degraded as metadata accumulated, and it became hard to scale once daily data volume reached the petabyte range. The result was request timeouts and stability issues.
Three migration plans were evaluated: Plan A (sharding by tenant using separate Metastore instances, requiring extensive Hive Metastore API changes), Plan B (migrating to a distributed TiDB cluster for horizontal scaling, with migration risk due to complex Hive schema), and Plan C (adopting the open‑source Waggle Dance as a federation layer that proxies multiple Metastore clusters with low integration cost).
The chosen solution uses Waggle Dance, which provides a routing service that maps Hive databases to specific Metastore instances, supports manual and prefix‑based routing configurations, and offers fine‑grained read/write permissions.
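To make the two routing modes concrete, here is a minimal Python sketch of how a federation layer might resolve a database name to a Metastore. It is an illustration only, not Waggle Dance's actual code: the `Metastore` class, the `resolve_manual` and `resolve_prefixed` functions, and all names and URIs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    name: str                # hypothetical label for a Metastore cluster
    uri: str                 # placeholder Thrift URI, not a real endpoint
    prefix: str = ""         # consulted only in prefix mode
    mapped_databases: set = field(default_factory=set)  # consulted only in manual mode


def resolve_manual(db, primary, federated):
    """Manual mode: look the database up in each Metastore's explicit list."""
    for ms in federated:
        if db in ms.mapped_databases:
            return ms, db
    return primary, db  # anything unmapped falls through to the primary


def resolve_prefixed(db, primary, federated):
    """Prefix mode: the database name carries the target Metastore's prefix."""
    # Try the longest prefix first so a more specific prefix wins.
    for ms in sorted(federated, key=lambda m: len(m.prefix), reverse=True):
        if ms.prefix and db.startswith(ms.prefix):
            return ms, db[len(ms.prefix):]  # strip the prefix before forwarding
    return primary, db


primary = Metastore("primary", "thrift://primary-hms:9083")
federated = [
    Metastore("orders", "thrift://orders-hms:9083",
              prefix="orders_", mapped_databases={"dw_orders"}),
    Metastore("logs", "thrift://logs-hms:9083",
              prefix="logs_", mapped_databases={"ods_logs"}),
]

target, db = resolve_manual("dw_orders", primary, federated)
print(target.name, db)
target, db = resolve_prefixed("logs_clickstream", primary, federated)
print(target.name, db)
```

In this sketch, manual mode keeps database names unchanged for clients, while prefix mode needs no per-database configuration but exposes the prefix in the names clients use.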
In federation practice, Waggle Dance routes metadata requests from Flink, Kafka, JDBC, and other systems, using a routing table and white‑list configuration to enable seamless migration without service disruption.
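The interplay between a routing table and a write white‑list can be sketched as follows. This is a simplified model under stated assumptions, not the talk's implementation: the `Access` enum, the `Route` class, the `authorize` function, and the database and Metastore names are all hypothetical, and the white‑list entries are assumed to be regular expressions.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ_ONLY = 1
    READ_AND_WRITE_ON_WHITELIST = 2


@dataclass
class Route:
    metastore: str                     # hypothetical Metastore label
    access: Access
    writable_whitelist: list = field(default_factory=list)  # regex patterns

    def can_write(self, db):
        if self.access is Access.READ_ONLY:
            return False
        return any(re.fullmatch(p, db) for p in self.writable_whitelist)


# Hypothetical routing table: database name -> route.
routing_table = {
    "dw_orders": Route("new-hms", Access.READ_AND_WRITE_ON_WHITELIST, ["dw_.*"]),
    "ods_logs": Route("old-hms", Access.READ_ONLY),
}


def authorize(db, is_write):
    """Return the Metastore that should serve the request, or raise."""
    route = routing_table.get(db)
    if route is None:
        raise KeyError(f"no route for database {db!r}")
    if is_write and not route.can_write(db):
        raise PermissionError(f"{db!r} is read-only via {route.metastore}")
    return route.metastore


print(authorize("dw_orders", is_write=True))
print(authorize("ods_logs", is_write=False))
```

Because clients only ever address the federation layer, changing a route or a white‑list entry in this table is invisible to them, which is what makes migration without service disruption plausible.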
The migration workflow has four steps: extract metadata from the old Metastore via its API, write it to the new federated Metastore, update the routing table, and run thorough consistency checks, with the ability to roll back instantly if a check fails. Synchronizing metadata ahead of the cutover and running the migration during off‑peak hours reduced downtime from over 30 minutes to about 1 minute.
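The copy, verify, switch, and roll‑back steps can be sketched with in‑memory dictionaries standing in for the two Metastores. This is a toy model, not the team's tooling: `migrate_database`, `rollback`, the `copy_fn` parameter, and the sample data are hypothetical, and this sketch verifies the copy before switching the route so that a failed check never touches routing.

```python
import copy


def migrate_database(db, old, new, routing, copy_fn=copy.deepcopy):
    """Copy one database's metadata, verify it, then switch its route.

    Returns True if the route now points at the new Metastore.
    """
    new[db] = copy_fn(old[db])   # extract from the old store, write to the new one
    if new[db] != old[db]:       # consistency check before cutover
        del new[db]              # discard the bad copy; the route is untouched
        return False
    routing[db] = "new"          # cutover is a single routing-table update
    return True


def rollback(db, routing):
    """Point the database back at the old Metastore, which was left intact."""
    routing[db] = "old"


# Toy metadata: database -> table -> list of partitions.
old_hms = {"dw_orders": {"orders": ["dt=2024-01-01", "dt=2024-01-02"]}}
new_hms = {}
routing = {"dw_orders": "old"}

ok = migrate_database("dw_orders", old_hms, new_hms, routing)
print(ok, routing)
```

Since the old Metastore is never modified, rolling back in this model is just one more routing update, which is consistent with the instant rollback described above.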
Future work focuses on building a unified metadata platform on top of Hive Metastore that supports visual, configurable ETL development, and on tighter integration with data governance tools.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.