How Didi Scales Online Search with Elasticsearch: Architecture, Performance, and Stability
The article details Didi's comprehensive use of Elasticsearch across all online retrieval scenarios, covering its physical‑machine architecture, gateway and control layers, data synchronization methods, cross‑datacenter replication, JDK17 + ZGC performance upgrades, cost‑saving ZSTD compression, multi‑tenant isolation, custom security, and ongoing stability practices leading to a planned upgrade to Elasticsearch 8.13.
Elasticsearch Overview and Didi Adoption
Elasticsearch is an open‑source, distributed, RESTful full‑text search engine built on Lucene, capable of indexing every field and scaling horizontally to hundreds of servers handling terabytes of data. Didi upgraded its ES cluster from 2.x to 7.6.0 in 2020 and now uses it as the unified engine for virtually all online retrieval workloads, becoming a benchmark for stability within the big‑data architecture team.
System Architecture
The overall product architecture consists of ES services deployed on physical machines, with the Gateway and management components running in containers. Physical‑machine deployment was chosen over Kubernetes for higher stability in online text‑search scenarios.
The control layer provides intelligent segment merge, index lifecycle management, pre‑creation of indices, and tenant governance to prevent segment bloat and OOM issues.
The Gateway handles read/write forwarding, query optimization (e.g., rewriting BKD queries to numeric range queries), three‑level rate limiting (AppID, index‑template, DSL), tenant authentication, and SQL capabilities built on the open‑source NLPChina ES‑SQL project. Only the Gateway interface is exposed externally.
User Console
Application management: businesses request an AppID to obtain read/write permissions.
Index management: create indices, request read/write rights, define or modify mappings, and perform index cleanup or decommissioning.
Search interfaces: Kibana, DSL, and SQL are supported.
Monitoring: users can view index metadata (document count, size) and read/write rates for operational insight.
Business Scenarios
Online full‑text search such as map POI queries.
Real‑time MySQL snapshots for order lookups.
One‑stop log retrieval via Kibana (e.g., trace logs).
Time‑series analysis for security monitoring.
Simple OLAP for internal dashboards.
Vector search for customer‑service RAG.
Deployment and Access Modes
Clusters are deployed on physical machines with a maximum size of about 100 nodes. Users create indices through the console and select the target cluster. Queries go through the Gateway, which resolves the appropriate ES cluster based on the AppID and provides a VIP address for HTTP access.
Data Synchronization
Two synchronization pathways are supported:
Real‑time: Log → ES, MQ (Kafka/Pulsar) → ES, MySQL binlog → ES via a unified DSINK tool built on Flink.
Offline: Hive → ES using batch load; MapReduce generates Lucene files stored in HDFS, which are then imported into ES via a custom AppendLucene plugin. The workflow includes (1) MR job to create Lucene files, (2) store files in HDFS, (3) pull files to DataNode, and (4) import into ES.
Engine Iterations – Fine‑Grained Tiered Protection
Cluster‑level isolation with four protection tiers (log cluster, public cluster, independent cluster, active‑active cluster). Mis‑assigned clusters can be migrated transparently using DCDR.
Clientnode isolation: read/write separation to limit the impact of OOM or slow writes to client nodes only.
Datanode region isolation: problematic indices can be labeled and migrated to specific nodes to avoid affecting other workloads.
Active‑Active Construction – DCDR
Didi’s self‑developed Didi Cross‑Datacenter Replication (DCDR) pushes data from a leader index to follower indices across clusters, ensuring strong consistency. Compared with Elasticsearch CCR (pull‑based and paid), DCDR uses a push‑based model, adds checkpoints to avoid full data copy, introduces sequence numbers for update consistency, and employs a write queue to prevent OOM during massive replication.
Performance Optimization – JDK 17 + ZGC
JDK 11‑G1 caused GC pauses exceeding P99 latency requirements (e.g., POI queries 180 ms vs. 60 ms target). Testing showed ZGC can keep pauses under 10 ms, and JDK 17‑G1 improved GC performance by 15 %. Consequently, Didi upgraded ES to JDK 17 and migrated critical GC algorithms to ZGC, also updating Groovy syntax, plugins, and dependent JARs. After the upgrade, payment‑cluster P99 latency dropped from 800 ms to 30 ms (96 % reduction) and average query latency fell from 25 ms to 6 ms (75 % reduction). Write throughput increased 15‑20 %.
Cost Optimization
Index mapping optimization: disable unnecessary inverted and stored fields.
Introduce ZSTD compression, reducing CPU usage by ~15 %.
Integrate with a big‑data asset management platform to identify and retire unused partitions and indices.
Multi‑Tenant Resource Isolation
Inspired by Presto’s resource groups, Didi split the search thread pool into multiple sub‑pools per AppID, configuring thread counts and queue sizes based on tenant priority. This prevents a single tenant’s bursty queries from saturating CPU and affecting other services.
Data Security
Gateway‑level authentication based on AppID.
ES‑level authentication (Clientnode, Datanode, MasterNode) using a custom security plugin.
The custom plugin is lightweight (string checks in HTTP handling), supports rolling restarts, offers a one‑click enable/disable switch, and avoids the stability risks of X‑Pack’s upgrade‑blocking design.
Stability Governance
Pre‑emptive measures: annual stability “mine‑sweeping” program resolved 61 issues (e.g., Gateway Full GC, ThreadLocal leaks).
Monitoring: hardware metrics, shard counts, master pending tasks, MQ lag, and Grafana dashboards for rapid fault localization (e.g., CPU spikes within 5 minutes).
Self‑healing: automated recovery for disk failures, query spikes, and write queue back‑pressure; active‑active traffic shifting and rate limiting per AppID, index template, and DSL.
Summary and Outlook
Running on ES 7.6, Didi plans to upgrade to ES 8.13 to gain a more efficient master node, automatic disk balancing, reduced segment memory footprint, and new features such as ANN vector search. Future work includes further write‑performance tuning, improved merge strategies, and exploring machine‑learning capabilities in newer ES versions.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
