
Optimizing the TTS Pricing Engine: From Monolithic V1 to Scalable V2 Architecture

This article details the evolution of Qunar's TTS pricing engine from a monolithic V1 implementation with dual-level caching and blocking HTTP interfaces to a scalable V2 architecture featuring service separation, asynchronous processing, CQRS-based rule storage, canal synchronization, Solr and Redis indexing, and Kafka-driven data pipelines.

Qunar Tech Salon

Author: Zhu Shizhi, senior architect and international flight-ticket technology director at Qunar. He joined Qunar in 2013 and has contributed to public services and the international ticket search system, building a real-time search and pricing system with high availability, performance, and scalability.

In the previous article we presented the overall search architecture; this piece dives into the core pricing engine TTS and its optimization process.

Pricing Engine Optimization

Pricing Engine TTS V1

In the first version the search and transaction code lived in a single project, with databases sharded per agency and run as MySQL MMM (multi-master replication) instances. To reduce MySQL pressure, two-level caching was applied:

Fine‑grained cache: agency, departure, arrival, date, version

Coarse‑grained cache: agency, arrival, version
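The two cache levels above can be sketched as a lookup that tries the fine-grained key first, falls back to filtering a coarse-grained hit, and only then goes to MySQL. This is an illustrative sketch: the class and the `load_from_db` callback are hypothetical, and only the key fields come from the article.

```python
# Hypothetical sketch of the V1 two-level cache lookup. Key fields follow the
# article; PriceCache, the rule dicts, and load_from_db are illustrative.

class PriceCache:
    def __init__(self):
        self.fine = {}    # (agency, departure, arrival, date, version) -> rules
        self.coarse = {}  # (agency, arrival, version) -> rules

    def get(self, agency, departure, arrival, date, version, load_from_db):
        fine_key = (agency, departure, arrival, date, version)
        if fine_key in self.fine:
            return self.fine[fine_key]
        coarse_key = (agency, arrival, version)
        if coarse_key in self.coarse:
            # Coarse hit: narrow the broader rule set down to this query
            # and promote the result into the fine-grained cache.
            rules = [r for r in self.coarse[coarse_key]
                     if r["departure"] == departure and r["date"] == date]
            self.fine[fine_key] = rules
            return rules
        # Both levels miss: fall through to MySQL.
        rules = load_from_db(agency, departure, arrival, date, version)
        self.fine[fine_key] = rules
        return rules
```

A coarse-grained hit avoids the database entirely, which is why both key shapes are kept even though the fine key subsumes the coarse one.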

The engine operated asynchronously internally but exposed a blocking HTTP interface, leading to wasteful thread usage. Over time the system could no longer keep up with business growth, causing frequent failures due to increased traffic, complex logic, MySQL load, and synchronous HTTP waits.

Pricing Engine TTS V2

Based on the identified problems, a new architecture was designed:

Separate codebases for search and transaction

Service‑oriented and asynchronous design to boost throughput

Introduce CQRS for pricing‑rule storage with heterogeneous data dumping

Use Canal as the data‑sync channel

Split scheduling and computation modules; computation keeps a local rule cache, hashing requests by agency‑route dimensions

Physically isolate public and private business modules to improve availability

Prune inventory queries using rule transit‑point information
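Hashing requests by agency-route dimensions, as listed above, means the same computation worker keeps serving a given slice of traffic, so its local rule cache stays warm. A minimal sketch of that routing decision, assuming a fixed worker pool (the function name and key layout are illustrative):

```python
# Route a compute request to a worker by hashing its agency-route key.
# Worker count and the "|"-joined key layout are assumptions for illustration.
import hashlib

def pick_worker(agency: str, departure: str, arrival: str, n_workers: int) -> int:
    key = f"{agency}|{departure}|{arrival}".encode()
    # Use a stable digest: Python's builtin hash() is salted per process,
    # so different schedulers would disagree on the mapping.
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % n_workers
```

In a real deployment, consistent hashing would soften rebalancing when workers join or leave; a plain modulo keeps the sketch readable.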

CQRS (Command Query Responsibility Segregation) separates read and write workloads, allowing targeted optimizations for each. Sharding by agency is unfriendly to search, which needs route‑date indexes, so a route‑dimension index is introduced.

Storage and Dump – Details

Data dumping solves search challenges but imposes requirements:

High write volume (tens of thousands of QPS at peak)

Sync latency under 60 seconds

Sequential consistency

Eventual consistency

High availability of the sync system

Technical solutions include:

Agency-sharded source DB to keep per-instance binlog volume manageable (Canal subscribes at table level, so agency sharding bounds the granularity of each subscription)

Route‑level queues in the sync system to guarantee order and reduce contention

Periodic full‑copy comparisons to ensure eventual consistency

MySQL MMM mechanism, Canal, and sync services coordinated via ZooKeeper for HA
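The route-level queues mentioned above can be sketched as a dispatcher that keys each change event to a per-route FIFO: changes for the same route apply strictly in arrival order, while different routes drain independently. The event shape and class name are illustrative assumptions.

```python
# Sketch of per-route ordered queues in the sync system. Event fields and
# the RouteQueues name are illustrative, not from the article.
from collections import defaultdict, deque

class RouteQueues:
    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, event):
        # Key the change by its route so ordering is preserved per route.
        route = (event["departure"], event["arrival"])
        self.queues[route].append(event)

    def drain(self, route):
        # Apply changes for one route strictly in FIFO order.
        applied = []
        q = self.queues[route]
        while q:
            applied.append(q.popleft())
        return applied
```

Because contention is per route rather than global, many routes can be drained in parallel without violating the ordering guarantee.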

Local Cache Refresh

To reduce DB pressure, a local cache of pricing rules is kept in the computation module. Each request checks the rule’s last‑update timestamp; this tiny SQL still becomes a bottleneck under high concurrency. An event‑driven cache invalidation approach—using change events from the sync system—can proactively refresh the cache.

Common cache‑update strategies:

Passive expiration without validity checks

Passive expiration with validity checks

Event‑aware active invalidation and active reload

Event‑aware active invalidation with passive reload

Business‑aware segmentation (hot rules active reload, cold rules passive)
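Of the strategies above, "event-aware active invalidation with passive reload" can be sketched as a cache that evicts an entry the moment the sync channel reports a change, leaving the next request to pay the reload cost. The class and loader names are hypothetical.

```python
# Sketch of event-aware active invalidation with passive reload.
# RuleCache and the loader callback are illustrative names.

class RuleCache:
    def __init__(self, loader):
        self.loader = loader   # fetches a rule from the rule store
        self.entries = {}
        self.loads = 0         # instrumentation: count store round-trips

    def get(self, rule_id):
        if rule_id not in self.entries:
            # Passive reload: fetch only when a request actually needs it.
            self.entries[rule_id] = self.loader(rule_id)
            self.loads += 1
        return self.entries[rule_id]

    def on_change_event(self, rule_id):
        # Active invalidation driven by the sync system's change events:
        # evict now, reload lazily on the next access.
        self.entries.pop(rule_id, None)
```

This removes the per-request timestamp query against the database: staleness is bounded by event-delivery latency instead of a polling interval.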

Rule Splitting Pressure

Pricing rules may involve many departure and arrival airports; cross‑product expansion can explode record counts (e.g., 100 × 100 = 10,000 rows), stressing the index DB. Introducing a full‑text search engine (Solr) eliminates the need for exhaustive splitting, accelerating data sync from minutes to seconds.

However, Solr struggles with large I/O queries when many agencies request many rules, leading to response times of several hundred milliseconds—unacceptable for real‑time computation.

To mitigate this, the heavy I/O is offloaded from Solr to Redis: Solr is kept as a pure index that returns only matching rule IDs, while Redis serves the rule data itself. This hybrid approach reduces query latency to around 100 ms.

Inverted Index

Further analysis showed that the search path needs only an inverted index, not full-text scoring; rebuilding it as a custom Redis-based inverted index plus KV store cuts rule-lookup time to 20-30 ms.
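The Redis-backed inverted index idea can be sketched as: each route term maps to a set of rule IDs, a query intersects the relevant sets, and rule bodies are then bulk-fetched from a KV store. Plain Python dicts and sets stand in here for Redis structures (SADD/SINTER per term, MGET for bodies); the field and term names are illustrative.

```python
# Sketch of a custom inverted index + KV store for pricing rules.
# Dicts/sets simulate Redis sets and strings; names are illustrative.

class RuleIndex:
    def __init__(self):
        self.postings = {}  # term -> set of rule ids  (one Redis set per term)
        self.kv = {}        # rule id -> rule body     (one Redis value per id)

    def add_rule(self, rule_id, rule):
        self.kv[rule_id] = rule
        # Index the rule under each of its dimension terms.
        for term in (f"agency:{rule['agency']}",
                     f"dep:{rule['departure']}",
                     f"arr:{rule['arrival']}"):
            self.postings.setdefault(term, set()).add(rule_id)

    def query(self, agency, departure, arrival):
        # Intersect posting sets (SINTER), then fetch bodies (MGET).
        terms = [f"agency:{agency}", f"dep:{departure}", f"arr:{arrival}"]
        sets = [self.postings.get(t, set()) for t in terms]
        ids = set.intersection(*sets)
        return [self.kv[i] for i in sorted(ids)]
```

Because no rows are ever split per airport pair, a rule covering 100 x 100 airports costs one KV entry plus a handful of set memberships instead of 10,000 rows.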

Sync Channel

Canal’s table‑level subscription cannot efficiently handle massive updates from a single agency, leading to huge binlogs. Switching to a message‑queue based sync channel (e.g., Kafka) dramatically reduces latency—from ten minutes for millions of changes to a few seconds.
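The shift from binlog tailing to a message-queue channel can be illustrated as: the write side publishes change events directly, and the index side applies them as they arrive, instead of Canal reconstructing changes from binlogs after the fact. A deque stands in for the topic below; in a real Kafka deployment this would be a partitioned topic keyed by route to preserve per-route ordering. All names here are illustrative.

```python
# Toy illustration of a message-queue sync channel. A deque stands in for
# a Kafka topic; publish_change / consume_and_apply are hypothetical names.
from collections import deque

topic = deque()  # stand-in for a partitioned, route-keyed Kafka topic

def publish_change(rule_id, payload):
    # The write side emits the change itself, no binlog parsing needed.
    topic.append({"key": rule_id, "value": payload})

def consume_and_apply(index):
    # The index side applies messages in order; last write wins per key.
    while topic:
        msg = topic.popleft()
        index[msg["key"]] = msg["value"]
    return index
```

The latency win comes from skipping binlog generation and parsing entirely: a burst of millions of changes becomes a stream of compact events the consumer can apply continuously.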

Due to space limits, this article ends here; the next part will explore computation load and response‑time optimizations and summarize reusable patterns.

Read the original "Qunar Flight Ticket Search Architecture Practice (Part 1): Overall Architecture" for more context.

Tags: microservices, redis, caching, kafka, data synchronization, CQRS, pricing engine, solr
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
