Optimizing Hotel Pricing Service: Removing the Stateful Architecture of the Sort Interface
This article presents a detailed case study of a hotel pricing system’s stateful Sort interface, analyzes its drawbacks, and describes a comprehensive plan to replace it with a stateless, memory‑based architecture that improves performance, reduces server count, and lowers operational complexity.
Background
In August 2019 the author, a new member of the hotel pricing team, encountered a persistent Full GC (FGC) issue on several machines of the Sort service, which could not be resolved by simple restarts. After initial mitigation by removing unused fields, a deeper analysis revealed that the Sort interface relied on a distributed lazy‑loading, stateful architecture that caused memory imbalance and high operational overhead.
Core Interface Overview
The Sort interface performs coarse‑ranking of hotels based on search parameters and returns basic pricing information. Downstream services then call a Render interface for fine‑grained price calculations. The optimization target is the Sort interface.
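To make the coarse-ranking/fine-ranking split concrete, here is a minimal sketch of what a Sort-style coarse-ranking step looks like. The class, record, and field names (`SortSketch`, `HotelQuote`, `basePriceCents`) are illustrative, not the actual Qunar API; the real interface takes richer search parameters.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortSketch {
    // Basic pricing info returned by a Sort-style interface; a Render-style
    // interface would later compute the exact, fully adjusted price.
    record HotelQuote(String hotelId, int basePriceCents) {}

    // Coarse ranking: order candidate hotels by their cached base price
    // and keep only the top N for downstream fine-grained rendering.
    static List<HotelQuote> coarseRank(List<HotelQuote> candidates, int topN) {
        return candidates.stream()
                .sorted(Comparator.comparingInt(HotelQuote::basePriceCents))
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<HotelQuote> quotes = List.of(
                new HotelQuote("h1", 32000),
                new HotelQuote("h2", 18000),
                new HotelQuote("h3", 25000));
        System.out.println(coarseRank(quotes, 2));
    }
}
```

The key property is that coarse ranking only needs cheap, cached base prices, which is why its data can live entirely in memory.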
Stateful Architecture Description
Routing logic in Nginx/OpenResty distributes requests across roughly 90 clusters using consistent hashing, with each cluster maintaining its own in‑memory cache shard. This design requires a large fleet (≈340 machines) and brings several drawbacks: uneven load, high memory consumption, and difficulty in scaling or testing.
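The routing idea can be sketched with a standard consistent-hash ring. This is illustrative only: the article does not disclose the actual hash function or ring layout used in the Nginx/OpenResty layer, and the virtual-node count here is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRouter {
    // Hash ring: position -> cluster name.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Virtual nodes per cluster smooth out the load distribution (assumed value).
    private static final int VIRTUAL_NODES = 100;

    public ConsistentHashRouter(Iterable<String> clusters) {
        for (String cluster : clusters) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(cluster + "#" + i), cluster);
            }
        }
    }

    // Route a request key (e.g. a city id) to its owning cluster: the first
    // ring position at or after the key's hash, wrapping around at the end.
    public String route(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because each key deterministically maps to one cluster, each cluster only caches its own shard of the data, which is exactly what makes the service stateful and the load uneven when some keys (hot cities) are far more popular than others.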
Pre‑Adjustment Architecture Pros and Cons
Advantages
Most requests hit memory quickly; hot‑city clusters are stable; failures in non‑hot clusters have limited impact.
Disadvantages
The stateful design increases system complexity and operational cost, makes horizontal scaling difficult, causes uneven request distribution and memory imbalance across clusters, caps the usable cache size, and hampers performance testing.
Plan to Remove the Stateful Architecture
Core Conflict
The main conflict is between response latency and data storage. To achieve low latency the data must live in memory, but the full dataset exceeds any single machine's memory, so it was sharded across many machines, which is the root of the current stateful design.
Research and Solution
After researching industry practices, the team decided to shift coarse‑ranking data to the upstream search‑ranking service, keep fine‑grained calculations in the Render service, and eliminate the stateful Sort service. Data compression, full‑memory caching on large‑memory servers, and reliable Redis/message‑queue pipelines are used to keep data fresh.
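One of the levers mentioned above is data compression, which shrinks pricing blobs enough to fit a full copy in memory on large-memory servers. The article does not name a codec; the GZIP round-trip below is a stand-in sketch, and `PriceBlobCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class PriceBlobCodec {
    // Compress a serialized pricing payload before caching it in memory.
    static byte[] compress(String payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Decompress on read; repetitive pricing records usually compress well.
    static String decompress(byte[] blob) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The trade-off is CPU for memory: decompression adds a small per-read cost, but keeping the whole dataset resident removes the need for sharded, stateful caches.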
Final Architecture
The new architecture removes the Sort service and its routing layer; the search‑ranking service loads all necessary pricing data into memory at startup and updates it via messages. Redis stores daily price changes, and the system no longer depends on the stateful Sort interface.
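The load-at-startup, update-by-message pattern above can be sketched as a concurrent in-memory store. The class and method names are illustrative, not the actual Qunar code, and the message-queue and Redis plumbing that would invoke `applyChange` is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryPriceStore {
    private final ConcurrentHashMap<String, Integer> priceByHotel = new ConcurrentHashMap<>();

    // Startup path: bulk-load the full pricing snapshot before serving traffic.
    public void loadSnapshot(Map<String, Integer> snapshot) {
        priceByHotel.putAll(snapshot);
    }

    // Message path: apply one price-change event; in the article's design
    // these arrive via the Redis/message-queue pipeline.
    public void applyChange(String hotelId, int newPriceCents) {
        priceByHotel.put(hotelId, newPriceCents);
    }

    // Query path: coarse ranking reads directly from local memory, no RPC.
    public Integer price(String hotelId) {
        return priceByHotel.get(hotelId);
    }
}
```

Because every search-ranking instance holds the full dataset, any instance can serve any request, so the consistent-hash routing layer and per-cluster cache shards become unnecessary.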
Expected Benefits
After migration, the Sort service and its ~340 machines can be decommissioned, reducing resource usage and operational overhead. The search‑ranking service handles coarse‑ranking directly from memory, improving latency and user experience, while overall system complexity and maintenance costs are significantly lowered.
Future Plans
Further work will continue to eliminate remaining stateful components and simplify the overall architecture, aiming for additional cost reductions and efficiency gains.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.