
Design and Evolution of Ctrip Flight Ticket Log Tracking System

This article describes how Ctrip's flight ticket team built a massive log‑tracking platform using Elasticsearch, Kafka, and Spark, evaluated storage options such as Cassandra and HBase, introduced secondary indexing and hot‑cold data separation, and continuously evolved the architecture to balance resource usage and query performance.

Ctrip Technology

1. Initial Architecture

The flight ticket business generates an enormous amount of log data—about 50‑100 billion records per day, reaching over a trillion records for a 15‑day window. Each of the roughly 60 log types has its own format, making efficient storage and fast querying a critical challenge.

To meet the requirements of high write throughput (over 50,000 records/s), a flexible schema, sub‑5 s query latency, and cheap bulk deletion of expired data, three storage solutions were evaluated: Cassandra, HBase, and Elasticsearch.

1.1 Elasticsearch

Elasticsearch was chosen because it supports schemaless data structures, horizontal scaling for massive writes, flexible and fast queries (average response < 1 s), and easy expiration of data via index aliases and daily index rotation.
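The daily-rotation scheme can be sketched as pure naming logic: each log type writes to a date-suffixed index, and expiring data means dropping whole indices rather than deleting individual documents. The naming pattern and helper functions below are illustrative, not Ctrip's actual code; only the 15-day retention window comes from the article.

```python
from datetime import date, datetime, timedelta

def daily_index_name(log_type: str, day: date) -> str:
    """Index name for one log type on one day, e.g. 'booking-log-2024.01.15'.
    The 'booking-log' type name is a hypothetical example."""
    return f"{log_type}-{day:%Y.%m.%d}"

def is_expired(index_name: str, today: date, retention_days: int = 15) -> bool:
    """True when the index's date suffix falls outside the retention window.
    Dropping a whole expired index is far cheaper than bulk-deleting its
    documents, which is why daily rotation was adopted."""
    day = datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()
    return day <= today - timedelta(days=retention_days)
```

A nightly job would list the cluster's indices, filter them through `is_expired`, and delete the matches; an index alias per log type keeps query code unaware of the daily suffixes.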

1.2 Kafka

Kafka serves as the message queue, decoupling log producers from downstream processing and providing reliable log ingestion.

1.3 ETL

Spark is used to perform near‑real‑time ETL, pulling logs from Kafka and indexing them into Elasticsearch with a latency of less than 5 seconds.
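The per-record transformation inside such an ETL job can be sketched as a plain function: parse the raw Kafka payload, stamp it with an ingestion time, and route it to that day's index. The field names `log_type`, `transaction_id`, and `ingested_at` are assumptions for illustration; the real system carries roughly 60 distinct log schemas.

```python
import json
from datetime import datetime, timezone

def to_es_action(raw: bytes) -> dict:
    """Turn one raw Kafka record into an Elasticsearch bulk-index action.
    Field names here are illustrative, not the production schema."""
    doc = json.loads(raw)
    now = datetime.now(timezone.utc)
    doc["ingested_at"] = now.isoformat()  # lets latency (< 5 s target) be measured
    return {
        # Route to the daily index for this log type, e.g. 'booking-log-2024.01.15'.
        "_index": f"{doc['log_type']}-{now:%Y.%m.%d}",
        "_source": doc,
    }
```

In the real pipeline a Spark job would apply this transformation to each micro-batch pulled from Kafka and hand the resulting actions to Elasticsearch's bulk API.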

1.4 Global Transaction ID

Each user request is assigned a unique TransactionID that propagates through all modules, enabling end‑to‑end traceability of logs across the system.
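A minimal sketch of the idea: mint a globally unique ID at the edge and copy it into the outgoing headers of every downstream call, so each module logs under the same ID. The UUID scheme and the `X-Transaction-ID` header name are assumptions; the article does not specify the ID format.

```python
import uuid

def new_transaction_id() -> str:
    """Mint a globally unique TransactionID when a user request first arrives."""
    return uuid.uuid4().hex

def propagate(headers: dict, txn_id: str) -> dict:
    """Attach the ID to an outgoing call's headers so the next module can log
    under it too. The header name is illustrative."""
    return {**headers, "X-Transaction-ID": txn_id}
```

Because every log line in every module carries this ID, a single query on it reassembles the full request trace end to end.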

2. Architecture Evolution

The first‑generation system, based solely on Elasticsearch, performed well for single‑log queries but became unstable when more than 50 log types were queried in parallel, causing cluster health degradation.

2.1 Adding a Secondary Index

A secondary index maps each TransactionID to the specific Elasticsearch indices that contain its logs, eliminating unnecessary queries across all 60+ log types and reducing query time from 20‑30 seconds to under 5 seconds.
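The routing logic can be sketched with an in-memory mapping standing in for the real secondary-index store: the write path records which index received each TransactionID's logs, and the query path searches only those indices instead of fanning out across all 60+ log types.

```python
from collections import defaultdict

class SecondaryIndex:
    """Maps a TransactionID to the Elasticsearch indices that actually hold
    its logs. A plain dict stands in for the real store in this sketch."""

    def __init__(self):
        self._idx = defaultdict(set)

    def record(self, txn_id: str, es_index: str) -> None:
        # Called from the ETL path whenever a log line is indexed.
        self._idx[txn_id].add(es_index)

    def indices_for(self, txn_id: str) -> list[str]:
        # The query side searches only these indices, not all 60+ log types.
        return sorted(self._idx[txn_id])
```

Pruning the search space this way is what brought trace queries from 20-30 seconds down to under 5.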

2.2 Hot‑Cold Data Separation

Because the secondary index grew to billions of entries, a hot‑cold strategy was introduced: today’s secondary index data is stored in Redis (sub‑5 ms lookup), while historical secondary index data is migrated to Elasticsearch.
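The lookup path for that split can be sketched as a two-tier read: consult the hot tier first (Redis in production, holding today's mappings), and fall back to the cold tier (historical mappings in Elasticsearch) on a miss. Plain dicts stand in for both stores here.

```python
class HotColdSecondaryIndex:
    """Two-tier secondary-index lookup. In production the hot tier is Redis
    and the cold tier is Elasticsearch; dicts stand in for both in this sketch."""

    def __init__(self, hot: dict, cold: dict):
        self.hot = hot    # today's TransactionID -> indices mappings
        self.cold = cold  # historical mappings

    def lookup(self, txn_id: str) -> list[str]:
        if txn_id in self.hot:  # sub-5 ms path for today's traffic
            return self.hot[txn_id]
        return self.cold.get(txn_id, [])  # fall back to historical data
```

Keeping only one day of mappings in memory bounds the Redis footprint while the billions of historical entries stay in cheaper Elasticsearch storage.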

3. Summary

The log‑tracking system continues to evolve—recently moving cold secondary‑index data to a Codis cluster and adopting faster bulk‑load methods for ETL. Ongoing improvements in big‑data technologies ensure the architecture remains scalable, performant, and adaptable to future data growth.

Tags: architecture, big data, Elasticsearch, Kafka, ETL, log analytics