Evolution of the Ctrip Travel Product Log System: Architecture, Challenges, and Solutions
This article describes the development trajectory of Ctrip's travel product log system, detailing its three major phases—from a single‑table DB approach to a platform‑based solution and finally an empowered version—while discussing technical challenges, design decisions, and the implementation of HBase, Elasticsearch, and related components to handle billions of log entries efficiently.
The Ctrip travel product line handles extremely complex product structures backed by thousands of underlying tables, generating over 600 million data-change records daily. These logs, which have accumulated to roughly 170 billion entries, are crucial for both supplier and internal troubleshooting and serve as a data source for BI analysis.
The article outlines the evolution of this product log system, covering its development trajectory, evolution process, and concluding remarks.
1. Development Trajectory
Three stages are identified:
Before 2019 – Single-table logs: No unified logging system existed; changes were recorded in an unstructured DB table (id, LogContent) covering only basic product information.
2020-2022 – Platformization: A unified logging platform was built, allowing configurable data-change logging across the entire product maintenance workflow.
2023-2024 – Open Enablement: The platform, proven at hundred-billion-record scale, offered flexible configuration and low integration cost, and began opening to other business lines such as tickets and car services.
2. Evolution Process
2.1 V1.0 – DB Single‑Table Storage
Logs were stored as plain text in a single table, leading to massive table size (over 1 billion rows, ~370 GB), poor query performance, low readability, and limited extensibility due to tight coupling with business code.
2.2 V2.0 – Platformization
2.2.1 Technology Selection: Candidate solutions for massive log storage and query were evaluated, including ES+HBase, MongoDB, and ClickHouse. ES+HBase was chosen for its combined search and storage capabilities, despite higher architectural complexity.
2.2.2 Overall Architecture: HBase handles storage and Elasticsearch handles search, with the ES DocID linked to the HBase RowKey. Logs are ingested via an API and queued through MQ for asynchronous processing, decoupling write operations from business logic.
2.2.3 RowKey Design: The RowKey consists of five parts (an MD5-derived prefix, a padded tableId, the pk with a random suffix, the log type, and a timestamp), ensuring uniqueness, even distribution, ordering, compactness, and readability.
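As a rough illustration, the five-part RowKey could be assembled as follows. Only the five parts come from the article; the prefix length, zero-padding width, separator, and the `salt` parameter standing in for the random suffix are all assumptions for the sketch.

```python
import hashlib

def build_rowkey(table_id: int, pk: str, log_type: str, ts_ms: int, salt: str = "0") -> str:
    """Illustrative five-part RowKey; the real byte layout is not given in the article."""
    # 1. Short MD5-derived prefix of the pk, spreading rows evenly across regions
    prefix = hashlib.md5(pk.encode("utf-8")).hexdigest()[:4]
    # 2. Zero-padded table id, so keys for the same table sort together
    padded_table = str(table_id).zfill(6)
    # 3. Primary key plus a random suffix (here a caller-supplied salt) to avoid collisions
    pk_part = f"{pk}_{salt}"
    # 4. Log type and 5. timestamp keep sibling logs distinguishable and time-ordered
    return "|".join([prefix, padded_table, pk_part, log_type, str(ts_ms)])
```

The MD5 prefix addresses HBase region hotspotting: sequential ids would otherwise pile writes onto one region server.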
2.2.4 Extensibility: A unified data-write service and logging API were abstracted, allowing any module (e.g., entry, direct-connect) to write logs uniformly, with configuration centralized in a logging center.
Write Process: Clients call the logging API; the service pushes messages to MQ; consumers generate the RowKey, write the log content to HBase, then index it in Elasticsearch. Failures trigger compensation via Redis.
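The consumer side of that write path can be sketched as below, with the HBase, ES, and Redis clients reduced to plain callables; all names here are illustrative stand-ins, not the platform's actual interfaces.

```python
def consume(message, hbase_put, es_index, redis_enqueue, make_rowkey):
    """Sketch of one MQ consumer step: store the body, index it, or park it for retry."""
    rowkey = make_rowkey(message)
    try:
        hbase_put(rowkey, message["content"])        # full log body into HBase first
        es_index(rowkey, message["index_fields"])    # searchable fields into ES, DocID = RowKey
    except Exception:
        # On any failure, park the message in Redis; a compensation job retries it later
        redis_enqueue(message)
        return False
    return True
```

Writing HBase before ES means a crash between the two steps leaves an un-indexed (hence invisible) log rather than a dangling ES hit, which the compensation retry can then repair.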
Query Process: Clients call a query API; the service translates the parameters into a paginated ES query, retrieves the matching RowKeys, then fetches the full log content from HBase in a single batch.
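The two-step read path might look like the following sketch, again with the ES and HBase clients stubbed as callables (the signatures are assumptions, not the real client APIs):

```python
def query_logs(params, es_search, hbase_batch_get, page=1, size=20):
    """Sketch of the read path: paginated ES search for RowKeys, then one HBase batch get."""
    hits = es_search(params, (page - 1) * size, size)   # ES returns only matching DocIDs
    rowkeys = [hit["_id"] for hit in hits]              # ES DocID doubles as the HBase RowKey
    return hbase_batch_get(rowkeys)                     # one round trip for the whole page
```

Keeping only index fields in ES and the bulky log bodies in HBase is what makes this split pay off: the search index stays small while storage scales independently.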
The platform provides a query UI for developers, achieving comprehensive, configurable, and structured log storage and retrieval at massive scale, though it remains technical and developer‑centric.
2.3 V3.0 – Empowerment
2.3.1 Business Empowerment
To address performance degradation from growing log volume, ES and HBase were horizontally sharded and expanded. A routing rule in the logging center directs different business‑line logs to appropriate clusters, supporting both shared and dedicated deployments.
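Conceptually, the routing rule reduces to a lookup that falls back to the shared deployment; everything in this sketch, including the cluster names, is invented for illustration:

```python
def route(business_line, routing_table, shared=("es-shared", "hbase-shared")):
    """Sketch of the logging-center routing rule: map a business line to its
    (ES cluster, HBase cluster) pair; lines without a dedicated deployment
    fall back to the shared clusters. All cluster names are made up."""
    return routing_table.get(business_line, shared)
```

In the real system the `routing_table` equivalent would live in the logging center's configuration, so moving a business line onto dedicated clusters is a config change rather than a code change.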
Search Empowerment
Index fields were expanded to ten configurable attributes (numeric, string, date) to cover diverse query scenarios. Index partitioning by time (weekly indices retained for one year) improves query speed and storage management.
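Weekly partitioning with a one-year retention window could be expressed as below; only "weekly indices retained for one year" comes from the article, while the index-naming convention and both helpers are assumptions:

```python
from datetime import date

def weekly_index(d: date, prefix: str = "product-log") -> str:
    """Name of the weekly ES index a log written on day `d` lands in (ISO year-week)."""
    iso = d.isocalendar()
    return f"{prefix}-{iso[0]}w{iso[1]:02d}"

def expired(index_name: str, today: date, retain_days: int = 365) -> bool:
    """Whether a weekly index has aged past the one-year retention window."""
    year, week = index_name.rsplit("-", 1)[1].split("w")
    monday = date.fromisocalendar(int(year), int(week), 1)  # first day of that ISO week
    return (today - monday).days > retain_days
```

Time-partitioned indices let queries with a date range touch only a few small indices, and retention becomes a cheap "drop whole index" operation instead of a document-by-document delete.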
2.3.2 Supplier Empowerment
A B‑side log query page was built, transforming raw log data into business‑friendly formats (e.g., key‑value conversion, data association, enumeration mapping, bit‑field decoding, field aggregation, external API enrichment, and diff comparison). Seven presentation patterns were defined to quickly adapt new log types for supplier and business user consumption.
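Two of those transformations, enumeration mapping and field-level diff, can be combined in a small sketch; the function shape and field names are illustrative, not the platform's actual presentation API:

```python
def render_change(before: dict, after: dict, enum_maps: dict) -> list:
    """Sketch of enumeration mapping plus diff comparison: turn raw before/after
    rows into readable change lines, mapping coded values to business labels."""
    def show(field, value):
        # Replace an enum code with its label when a mapping exists, else show the raw value
        return enum_maps.get(field, {}).get(value, value)

    lines = []
    for field in sorted(set(before) | set(after)):
        old, new = before.get(field), after.get(field)
        if old != new:  # only changed fields are worth showing to a supplier
            lines.append(f"{field}: {show(field, old)} -> {show(field, new)}")
    return lines
```

The remaining patterns (data association, bit-field decoding, aggregation, external API enrichment) would slot in as further per-field transforms of the same shape.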
3. Conclusion
The article details the log platform's evolution, the technical challenges of storing and searching massive log data, and the solutions implemented to achieve sub-500 ms query latency on hundred-billion-level records, while opening the system to suppliers and business users to reduce troubleshooting effort and support multi-business-line expansion.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.