Evolution of Ctrip Vacation Pricing Engine: Architecture, Challenges, and Optimizations
Ctrip’s vacation pricing engine evolved from a synchronous, MySQL‑based task queue to a Kafka‑driven, Spark‑parallelized architecture backed by HBase. The redesign cut task generation from 5 hours to 1.5 hours and raised price accuracy above 90 %, while handling billions of calculations and working within external API constraints.
The article introduces Ctrip's vacation pricing engine, a system that monitors and calculates prices for travel packages whose components (flights, hotels, etc.) change frequently. Capturing price changes quickly requires a highly responsive architecture, which puts pressure on downstream services and hardware.
Background : Unlike ordinary e‑commerce where each SKU has a fixed price and inventory, travel packages consist of multiple SKUs. Any change in a resource’s price or stock affects the whole package, making pricing complex.
Challenges include complex price composition, intricate calculation rules, massive computation volume (over 8 billion task units, QPS up to 1 million), poor controllability due to external resources, high selling risk, and the concept of “live packages” that depend on third‑party APIs.
Business Scope & Metrics : The engine computes both product‑level and schedule‑level base prices. The core metric is price‑accuracy rate, measured by the deviation between offline‑computed prices and real‑time query results.
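The metric can be illustrated with a minimal sketch. The source does not give the exact formula, so the relative-deviation definition, the `tolerance` parameter, and the names `PriceSample` and `accuracy_rate` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSample:
    schedule_id: str
    offline_price: float   # price precomputed offline by the engine
    realtime_price: float  # price returned by a real-time query


def accuracy_rate(samples, tolerance=0.0):
    """Share of samples whose relative deviation is within `tolerance`.

    Assumption: deviation = |offline - realtime| / realtime.
    """
    if not samples:
        return 0.0
    hits = 0
    for s in samples:
        if s.realtime_price <= 0:
            continue  # unusable live price: counted as a miss
        deviation = abs(s.offline_price - s.realtime_price) / s.realtime_price
        if deviation <= tolerance:
            hits += 1
    return hits / len(samples)


samples = [
    PriceSample("s1", 1000.0, 1000.0),  # exact match
    PriceSample("s2", 1990.0, 2000.0),  # 0.5 % off
    PriceSample("s3", 1500.0, 2000.0),  # 25 % off
    PriceSample("s4", 800.0, 800.0),    # exact match
]
exact = accuracy_rate(samples)                        # only exact matches count
within_1pct = accuracy_rate(samples, tolerance=0.01)  # allow 1 % deviation
print(exact, within_1pct)
```

A zero tolerance counts only exact matches, while a small positive tolerance gives a looser accuracy figure; which threshold the engine actually reports is not stated in the source.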
Engine 1.0 : Used MySQL as a task queue and processed each task synchronously (flight → hotel → optional resources). Its main problems were low throughput and single points of failure.
Engine 2.0 : Replaced MySQL queues with Redis, then with a message‑queue middleware (Kafka). Introduced distributed Spark for parallel computation, and switched storage from MySQL to HBase for write‑heavy, read‑light offline workloads. Added route aggregation to reduce duplicate flight calls, achieving several‑fold performance gains.
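Route aggregation can be sketched as follows: tasks that share the same origin, destination, and date are grouped so that the flight service is called once per route rather than once per product. This is a hedged illustration, not the engine's actual code; `Task`, `price_with_route_aggregation`, and `fake_fetch` are hypothetical names, and the grouping key is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Task(NamedTuple):
    product_id: str
    origin: str
    destination: str
    date: str


def price_with_route_aggregation(tasks, fetch_flight_price):
    """Call `fetch_flight_price` once per distinct route and share the result."""
    groups = defaultdict(list)
    for t in tasks:
        # Assumed route key: (origin, destination, date)
        groups[(t.origin, t.destination, t.date)].append(t)

    prices = {}
    for (origin, destination, date), members in groups.items():
        price = fetch_flight_price(origin, destination, date)  # one call per route
        for t in members:
            prices[t.product_id] = price
    return prices


calls = []


def fake_fetch(origin, destination, date):
    """Stand-in for the external flight API; records every call."""
    calls.append((origin, destination, date))
    return 1200.0 if destination == "PEK" else 900.0


tasks = [
    Task("p1", "SHA", "PEK", "2024-05-01"),
    Task("p2", "SHA", "PEK", "2024-05-01"),
    Task("p3", "SHA", "CAN", "2024-05-01"),
    Task("p4", "SHA", "PEK", "2024-05-01"),
]
prices = price_with_route_aggregation(tasks, fake_fetch)
print(len(calls), prices)
```

In this toy input four products need only two flight calls, which is the kind of saving the aggregation step is meant to deliver on downstream services.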
Engine 3.0 : Further reduced message‑distribution nodes, fully adopted HBase, and optimized hot‑task strategies. Hot tasks are defined as schedule‑level units with high price‑accuracy deviation over the past three days.
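The hot-task definition can be sketched as a simple filter over recent deviation records. The source specifies only the three-day window; the averaging rule, the 5 % threshold, and the record layout below are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def select_hot_tasks(records, today, window_days=3, threshold=0.05):
    """Return schedule IDs whose mean deviation over the window exceeds `threshold`.

    `records` is an iterable of (schedule_id, day, deviation) tuples.
    The threshold and the use of a mean are assumptions, not from the source.
    """
    cutoff = today - timedelta(days=window_days)
    totals = {}
    for schedule_id, day, deviation in records:
        if cutoff <= day < today:  # keep only the past `window_days` days
            total, count = totals.get(schedule_id, (0.0, 0))
            totals[schedule_id] = (total + deviation, count + 1)
    hot = [sid for sid, (total, count) in totals.items() if total / count > threshold]
    return sorted(hot)


today = date(2024, 5, 4)
records = [
    ("A", date(2024, 5, 1), 0.10),
    ("A", date(2024, 5, 2), 0.08),
    ("A", date(2024, 5, 3), 0.12),
    ("B", date(2024, 5, 1), 0.01),
    ("B", date(2024, 5, 3), 0.02),
    ("C", date(2024, 4, 20), 0.50),  # outside the window, ignored
]
hot = select_hot_tasks(records, today)
print(hot)
```

Schedules selected this way could then be recomputed more often than the rest, concentrating limited API quota where offline prices drift most.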
Optimization Results : Task generation time dropped from 5 hours to 1.5 hours, and the overall technical cycle shrank from two weeks to 1.5 days. Price accuracy rose from 60 % initially to 71 % (v1.0), 81 % (v2.0), and 87 % (v3.0); the current average is above 90 %, with about 60 % of prices fully consistent.
Key Takeaways : Continuous architectural evolution is driven by business needs. Optimizations are interdependent; improving speed often increases load on downstream services. Offline computation, data‑driven analysis, and gradual migration (e.g., gray‑release) are essential for stable upgrades.
Q&A Highlights :
Transition from .NET to Java was motivated by ecosystem maturity.
Message‑queue failures are rare due to multi‑node deployment and retry mechanisms.
High‑availability is achieved by distributing results across data centers.
MySQL to HBase migration involved dual‑writes to ensure consistency.
Spark is used for large‑scale data analysis and offline computation.
Real‑time price updates are not fully achievable due to external API rate limits; the system relies on near‑real‑time polling.
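The dual-write approach mentioned in the Q&A can be sketched in a few lines. This is a minimal illustration under stated assumptions: plain dictionaries stand in for MySQL and HBase, and `DualWriteStore`, `failed_keys`, and `find_mismatches` are hypothetical names rather than anything from the original system.

```python
class DualWriteStore:
    """Write to both stores during migration; keep reading from the old one."""

    def __init__(self, primary, secondary):
        self.primary = primary      # old store (stand-in for MySQL)
        self.secondary = secondary  # new store (stand-in for HBase)
        self.failed_keys = []       # keys whose secondary write failed

    def put(self, key, value):
        self.primary[key] = value   # the old store remains the source of truth
        try:
            self.secondary[key] = value
        except Exception:
            self.failed_keys.append(key)  # remember for a later repair pass

    def get(self, key):
        return self.primary[key]

    def find_mismatches(self):
        """Keys whose values differ between the two stores."""
        return sorted(
            k for k, v in self.primary.items() if self.secondary.get(k) != v
        )


old_store, new_store = {}, {}
store = DualWriteStore(old_store, new_store)
store.put("schedule:1", 1299.0)
store.put("schedule:2", 899.0)
clean = store.find_mismatches()

new_store["schedule:2"] = 0.0  # simulate a divergent write in the new store
dirty = store.find_mismatches()
print(clean, dirty)
```

Once a comparison like `find_mismatches` stays empty over a sufficient period, reads can be switched to the new store and the old write path retired.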
Tencent Cloud Developer