Real-Time Data Warehouse Architecture and Practice at Ctrip Hotel
The article explains why enterprises need real-time data warehouses, compares Lambda and Kappa architectures, describes Ctrip Hotel's Lambda‑plus‑OLAP variant built with Flink and StarRocks, and details practical solutions for ordering, wide‑table generation, and data validation that enable billion‑row, low‑latency analytics.
Authors: Qiu Shi, a Ctrip data‑warehouse specialist; Jiu Hao, a Ctrip data‑technology expert; Kui Wei, senior data engineer focusing on real‑time and batch big‑data products.
1. Why real‑time data warehouses?
Enterprises increasingly require sub‑second data freshness, which traditional offline warehouses (T+1 latency, daily batch schedules) cannot satisfy. Real‑time warehouses support OLAP analysis, dashboards, and live monitoring of business metrics.
2. Real‑time warehouse architectures
2.1 Lambda architecture splits processing into a real‑time path and a batch path. Stream engines (Storm, Flink, Spark Streaming) handle the former and batch engines (Hive, Spark) the latter, and each path writes its results to its own storage layer.
2.2 Kappa architecture converts all sources to streams and processes everything with a single stream engine, simplifying the pipeline by removing the batch layer; Kafka serves both as a message queue and a long‑term log.
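The difference between the two architectures can be illustrated with a minimal Python sketch. It is not taken from the article: the event shape, the `day` field, and all function names are hypothetical, and plain lists stand in for Kafka and the batch store. It only shows that Lambda merges two separately computed views at query time, while Kappa derives one view by replaying a single log.

```python
from collections import Counter

# Hypothetical order events; "day" separates data already covered by the
# nightly batch job (day 1) from data that has only reached the stream (day 2).
EVENTS = [
    {"hotel_id": "h1", "day": 1},
    {"hotel_id": "h2", "day": 1},
    {"hotel_id": "h1", "day": 2},
]

def count_orders(events):
    """Shared business logic: number of orders per hotel."""
    return Counter(e["hotel_id"] for e in events)

def lambda_query(events, batch_cutoff_day):
    """Lambda: two pipelines produce two views that are merged at query time."""
    batch_view = count_orders(e for e in events if e["day"] <= batch_cutoff_day)
    speed_view = count_orders(e for e in events if e["day"] > batch_cutoff_day)
    return batch_view + speed_view

def kappa_query(event_log):
    """Kappa: one streaming code path; reprocessing means replaying the log."""
    return count_orders(event_log)

print(lambda_query(EVENTS, batch_cutoff_day=1))
print(kappa_query(EVENTS))
```

Both calls yield the same counts here. In practice the trade-off lies in maintaining two code paths (Lambda) versus retaining and replaying a long log (Kappa).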
3. Ctrip Hotel real‑time warehouse design
3.1 Architecture selection – Chose a Lambda + OLAP variant: Lambda provides flexibility, fault‑tolerance, and low migration cost; the OLAP variant moves aggregation from the stream layer to an OLAP engine, reducing stream‑side load while meeting analysts’ self‑service needs.
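As a rough illustration of what moving aggregation into the OLAP engine means, the sketch below keeps only detail rows and aggregates them at query time over whichever dimensions the analyst picks. It is an assumption-laden toy: the rows, column names, and the `aggregate` helper are hypothetical, and a Python list stands in for the OLAP table.

```python
from collections import defaultdict

# Hypothetical detail rows as the stream layer would write them: no
# pre-aggregation, one row per order.
ORDERS = [
    {"city": "Shanghai", "star": 5, "amount": 300.0},
    {"city": "Shanghai", "star": 4, "amount": 200.0},
    {"city": "Beijing", "star": 5, "amount": 150.0},
]

def aggregate(rows, dimensions, metric):
    """Sum `metric` grouped by an arbitrary tuple of dimensions, at query time."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[metric]
    return dict(totals)

# The same detail table answers different questions without a new stream job.
by_city = aggregate(ORDERS, ("city",), "amount")
by_city_and_star = aggregate(ORDERS, ("city", "star"), "amount")
```

Because the stream layer never commits to a fixed set of group-by keys, a new dashboard dimension needs only a new query rather than a new streaming job, which is the self-service benefit described above.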
3.2 Real‑time compute engine – Selected Flink for its exactly‑once guarantees, lightweight checkpointing, low latency, high throughput, and ease of use. Spark Streaming, by contrast, processes data in micro‑batches.
3.3 OLAP engine – Adopted StarRocks, which can handle tens of thousands of queries per hour, for the following reasons:
• It uses an MPP distributed execution framework, which gives strong cluster‑wide query performance.
• It excels at high‑concurrency, multi‑table join analytical workloads, and it outperformed ClickHouse at Ctrip Hotel’s query volume.
• It provides four storage models that adapt to the varied hotel business scenarios.
4. Real‑time hotel order case study
4.1 Data source – MySQL binlog captured by Ctrip’s self‑developed Muise platform and streamed into Kafka.
4.2 ETL challenges and solutions
• Message ordering : Ensured binlog order in Kafka, assigned each partition to a single Flink task, and used a logical primary‑key as shuffle key to maintain internal ordering; updates to StarRocks are performed by primary‑key upserts to guarantee end‑to‑end consistency.
• Generating a real‑time wide table : Hotel orders have long lifecycles and arrive on multiple data streams. The costly multi‑join pipelines were replaced with a Union‑All + Group‑By approach. This avoids the state explosion of the join approach (no 9× increase in storage) and makes adding a new stream nearly free.
• Historical state handling : Because the union‑all job retains state for only 30 minutes, older order states are stored in Redis (initially loaded from offline data, then kept in sync with real‑time updates).
• Data validation : Each night, the T‑1 real‑time data is compared with the offline data; discrepancies trigger a correction of StarRocks from the offline data and a root‑cause analysis.
4.3 Impact – The real‑time order table now holds billions of rows with millions of dimension rows and serves dozens of dashboards and reports. About 99 % of typical queries (30+ dimensions, ~10 metrics) complete in about 3 seconds.
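The Union‑All + Group‑By idea from the list above can be sketched in plain Python. This is an illustrative model, not the Flink job from the article: the column names, the three sample streams, and the helpers `pad`, `build_wide_table`, and `upsert` are hypothetical, and a dict stands in for a StarRocks primary‑key table. Each stream is padded to the wide schema, all rows are concatenated, and one pass grouped by the primary key keeps the latest non‑null value of every column.

```python
from itertools import chain

# Hypothetical column set of the wide table; order_id is the primary key.
COLUMNS = ("order_id", "status", "pay_amount", "refund_amount")

def pad(row):
    """Project a source row onto the wide schema, filling unknown columns with None."""
    return {col: row.get(col) for col in COLUMNS}

def build_wide_table(*streams):
    """UNION ALL the padded streams, then GROUP BY order_id, keeping the
    latest non-null value of each column (rows are read in arrival order)."""
    wide = {}
    for row in map(pad, chain(*streams)):
        merged = wide.setdefault(row["order_id"], dict.fromkeys(COLUMNS))
        for col, value in row.items():
            if value is not None:
                merged[col] = value
    return wide

def upsert(table, wide_rows):
    """Primary-key upsert: the newest wide row replaces the stored one."""
    for key, row in wide_rows.items():
        table[key] = row

# Three hypothetical source streams that each know only part of an order.
orders = [
    {"order_id": 1, "status": "created"},
    {"order_id": 1, "status": "confirmed"},
    {"order_id": 2, "status": "created"},
]
payments = [{"order_id": 1, "pay_amount": 300.0}]
refunds = [{"order_id": 2, "refund_amount": 50.0}]

table = {}
upsert(table, build_wide_table(orders, payments, refunds))
```

Adding a fourth stream only requires new entries in `COLUMNS` and one more argument, with no additional join state. Note that "latest value wins" is correct only if rows for the same key arrive in order, which is why the per‑key ordering guarantee in the first bullet matters.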
4.4 Summary – Ctrip Hotel’s real‑time data exhibits large scale, long lifecycle, and complex business flows. By adopting a Lambda + OLAP variant architecture and leveraging StarRocks’ high‑performance computing, development costs were reduced while achieving real‑time multi‑dimensional analytics, monitoring, and rapid response to incidents.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.