
Building an ElasticSearch-based Search Platform for Ride-Hailing: Architecture, Data Synchronization, and Performance Optimization

Hello Mobility unified its fragmented ElasticSearch clusters into a single real‑time search platform. Using Kafka‑driven CDC, Flink stream processing, custom ES plugins, and extensive performance tuning, the platform delivers scalable matching, recommendation, and voice services, and raised completed orders by 49.8% and the number of drivers accepting orders by 37%.

HelloTech

On November 5, 2021, at the DAMS China Data Intelligence Management Summit, Ren Tianbing, head of the search and recommendation platform at Hello Mobility, presented “Application of an ElasticSearch-based Search Platform in Hello Mobility”. The talk covered the motivation, architecture, technical challenges, and outcomes of building a unified search platform for various mobility services.

Background: Hello Mobility offers multiple services (shared bikes, e‑bikes, carpool, taxi, train tickets, hotels, etc.). Each business previously built its own ElasticSearch cluster, leading to duplicated effort and operational overhead. In 2019, the company had nearly 30 separate ES clusters.

Goal: Consolidate search capabilities into a single platform, provide algorithmic support, and reduce development and operational costs.

Key challenges identified:

Business teams’ lack of trust in the central platform.

Redundant development resources within business teams.

Technical issues such as real‑time data synchronization, algorithm integration, and system stability.

Solution architecture: The platform integrates a matching engine, recommendation engine, voice recognition, and deep‑learning ranking services on top of a shared data middle layer (storage, feature, model, and compute platforms). The original matching service was built on PostgreSQL (PG) and relied on offline computation and caching, which caused latency and scalability problems.

Migration to ElasticSearch: The matching logic was re‑implemented using ES as the core search engine, enabling near‑real‑time queries and horizontal scalability.

Data synchronization: Business data resides in relational databases. For real‑time sync, binlog changes are captured into Kafka, and Flink jobs write the results into ES, handling both the batch load from PG and the CDC stream from Kafka. Flink was chosen for its high availability, exactly‑once semantics, and SQL‑based API.
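The last hop of this pipeline, turning change events into ES writes, can be sketched as follows. This is a minimal illustration, not the team's Flink job: the event shape (`op`, `pk`, `row`) and the function names are assumptions made for the example, while the output follows the NDJSON layout of the Elasticsearch Bulk API.

```python
import json


def cdc_to_bulk_lines(event, index):
    """Translate one change event into Elasticsearch Bulk API NDJSON lines.

    `event` uses a hypothetical shape:
    {"op": "insert" | "update" | "delete", "pk": <primary key>, "row": {...}}
    """
    meta = {"_index": index, "_id": str(event["pk"])}
    op = event["op"]
    if op == "insert":
        # Full document write; replaying the same event overwrites the same _id.
        return [json.dumps({"index": meta}), json.dumps(event["row"])]
    if op == "update":
        # Partial update; doc_as_upsert creates the document if it does not exist yet.
        return [
            json.dumps({"update": meta}),
            json.dumps({"doc": event["row"], "doc_as_upsert": True}),
        ]
    if op == "delete":
        return [json.dumps({"delete": meta})]
    raise ValueError(f"unknown op: {op!r}")


def build_bulk_body(events, index):
    """Concatenate the action lines for a batch of events into one bulk request body."""
    lines = []
    for event in events:
        lines.extend(cdc_to_bulk_lines(event, index))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline
```

Using the primary key as the document `_id` keeps writes idempotent, so an event that is delivered twice leaves the index in the same state.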

Flink dual‑stream join challenges: Joining two streams over long windows with high update volumes kept large state in memory and risked Cartesian products. The team resolved this by separating insert and update streams, using ES partial updates, and limiting state size.
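One way to see why partial updates relieve the join is that each stream can write only its own fields to the same document, so nothing has to wait in Flink state for the other side to arrive. The sketch below simulates that idea with a plain dictionary standing in for an ES index; the stream names and fields are invented for illustration, and the merge is top‑level only.

```python
def partial_upsert(index, doc_id, fields):
    """Merge `fields` into the stored document, creating it if absent.

    Mimics an ES partial update with doc_as_upsert, at top-level field granularity.
    """
    index.setdefault(doc_id, {}).update(fields)


def apply_streams(events):
    """Apply (doc_id, fields) events in arrival order and return the resulting index."""
    index = {}
    for doc_id, fields in events:
        partial_upsert(index, doc_id, fields)
    return index


# Two hypothetical streams that touch disjoint fields of the same order document.
order_events = [("o1", {"origin": "A", "dest": "B"})]
status_events = [("o1", {"status": "matched"})]

# Arrival order does not matter when the streams own disjoint fields.
merged = apply_streams(order_events + status_events)
assert merged == apply_streams(status_events + order_events)
```

This only holds while the streams write disjoint fields. If two events can write the same field, ordering matters again and some form of versioning is needed.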

Custom ES plugins: Business‑specific scoring (e.g., route similarity) was encapsulated in ES plugins, so the computation runs distributed across the cluster's nodes. The team also built a hot‑deployment mechanism so that plugins can be updated without restarting the cluster or incurring downtime.
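The source does not describe the route‑similarity formula, and an ES plugin would be written in Java. Purely to illustrate the kind of per‑document score such a plugin might compute, here is a hypothetical measure in Python: the fraction of a passenger's route points that lie within a radius of some point on the driver's route. The function names and the 500 m default are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def route_similarity(passenger_route, driver_route, radius_m=500.0):
    """Share of passenger points within `radius_m` of any driver point, in [0, 1]."""
    if not passenger_route or not driver_route:
        return 0.0
    covered = sum(
        1
        for p in passenger_route
        if any(haversine_m(p, d) <= radius_m for d in driver_route)
    )
    return covered / len(passenger_route)
```

The brute‑force comparison is quadratic in route length, which is one reason to keep the number of stored points per route small.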

Performance tuning: After the initial rollout, CPU utilization stayed low yet queries occasionally timed out. Investigation traced this to disk I/O bottlenecks caused by massive point‑set data (GPS trajectories). Optimizations included using mmap, thinning trajectory points, and compression, which increased throughput fourfold and cut latency by about 50%.
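The source does not say which thinning algorithm was used. As one common option, the sketch below applies a Ramer‑Douglas‑Peucker style simplification: it keeps the endpoints and only those intermediate points that deviate from the simplified path by more than a tolerance. It assumes planar `(x, y)` coordinates (for example, already projected to metres); the function names are illustrative.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all as planar (x, y) tuples."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def thin_trajectory(points, tolerance):
    """Return a subset of `points` that stays within `tolerance` of the original path."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack avoids deep recursion on long tracks
    while stack:
        lo, hi = stack.pop()
        worst, worst_d = None, tolerance
        for i in range(lo + 1, hi):
            d = _point_segment_distance(points[i], points[lo], points[hi])
            if d > worst_d:
                worst, worst_d = i, d
        if worst is not None:
            # The farthest point exceeds the tolerance: keep it and refine both halves.
            keep[worst] = True
            stack.append((lo, worst))
            stack.append((worst, hi))
    return [p for p, k in zip(points, keep) if k]
```

Fewer points per trajectory means smaller documents, which reduces the bytes read from disk on each query.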

Stability measures: The team emphasized monitoring, alerting, and run‑books. They adopted a “four‑blade” approach (monitoring, alerting, contingency plans, and the human factor) and implemented multi‑region active‑active deployment, index rebuilding, and rapid failover mechanisms.

Results: The platform boosted completed order volume by 49.8% and increased the number of drivers accepting orders by 37%. The solution continues to evolve with plans for more algorithmic components (intent detection, spelling correction) and multi‑active disaster‑recovery.
