Backend Development 16 min read

How Didi Scales Online Search with Elasticsearch: Architecture, Performance, and Stability

The article details Didi's comprehensive use of Elasticsearch across all online retrieval scenarios, covering its physical‑machine architecture, gateway and control layers, data synchronization methods, cross‑datacenter replication, JDK17 + ZGC performance upgrades, cost‑saving ZSTD compression, multi‑tenant isolation, custom security, and ongoing stability practices leading to a planned upgrade to Elasticsearch 8.13.

Past Memory Big Data

Sep 13, 2024

How Didi Scales Online Search with Elasticsearch: Architecture, Performance, and Stability

Elasticsearch Overview and Didi Adoption

Elasticsearch is an open‑source, distributed, RESTful full‑text search engine built on Lucene, capable of indexing every field and scaling horizontally to hundreds of servers handling terabytes of data. Didi upgraded its ES cluster from 2.x to 7.6.0 in 2020 and now uses it as the unified engine for virtually all online retrieval workloads, becoming a benchmark for stability within the big‑data architecture team.

System Architecture

The overall product architecture consists of ES services deployed on physical machines, with the Gateway and management components running in containers. Physical‑machine deployment was chosen over Kubernetes for higher stability in online text‑search scenarios.

The control layer provides intelligent segment merge, index lifecycle management, pre‑creation of indices, and tenant governance to prevent segment bloat and OOM issues.

The Gateway handles read/write forwarding, query optimization (e.g., rewriting BKD queries to numeric range queries), three‑level rate limiting (AppID, index‑template, DSL), tenant authentication, and SQL capabilities built on the open‑source NLPChina ES‑SQL project. Only the Gateway interface is exposed externally.

User Console

Application management: businesses request an AppID to obtain read/write permissions.

Index management: create indices, request read/write rights, define or modify mappings, and perform index cleanup or decommissioning.

Search interfaces: Kibana, DSL, and SQL are supported.

Monitoring: users can view index metadata (document count, size) and read/write rates for operational insight.

Business Scenarios

Online full‑text search such as map POI queries.

Real‑time MySQL snapshots for order lookups.

One‑stop log retrieval via Kibana (e.g., trace logs).

Time‑series analysis for security monitoring.

Simple OLAP for internal dashboards.

Vector search for customer‑service RAG.

Deployment and Access Modes

Clusters are deployed on physical machines with a maximum size of about 100 nodes. Users create indices through the console and select the target cluster. Queries go through the Gateway, which resolves the appropriate ES cluster based on the AppID and provides a VIP address for HTTP access.

Data Synchronization

Two synchronization pathways are supported:

Real‑time: Log → ES, MQ (Kafka/Pulsar) → ES, MySQL binlog → ES via a unified DSINK tool built on Flink.

Offline: Hive → ES using batch load; MapReduce generates Lucene files stored in HDFS, which are then imported into ES via a custom AppendLucene plugin. The workflow includes (1) MR job to create Lucene files, (2) store files in HDFS, (3) pull files to DataNode, and (4) import into ES.

Engine Iterations – Fine‑Grained Tiered Protection

Cluster‑level isolation with four protection tiers (log cluster, public cluster, independent cluster, active‑active cluster). Mis‑assigned clusters can be migrated transparently using DCDR.

Clientnode isolation: read/write separation to limit the impact of OOM or slow writes to client nodes only.

Datanode region isolation: problematic indices can be labeled and migrated to specific nodes to avoid affecting other workloads.

Active‑Active Construction – DCDR

Didi’s self‑developed Didi Cross‑Datacenter Replication (DCDR) pushes data from a leader index to follower indices across clusters, ensuring strong consistency. Compared with Elasticsearch CCR (pull‑based and paid), DCDR uses a push‑based model, adds checkpoints to avoid full data copy, introduces sequence numbers for update consistency, and employs a write queue to prevent OOM during massive replication.

Performance Optimization – JDK 17 + ZGC

JDK 11‑G1 caused GC pauses exceeding P99 latency requirements (e.g., POI queries 180 ms vs. 60 ms target). Testing showed ZGC can keep pauses under 10 ms, and JDK 17‑G1 improved GC performance by 15 %. Consequently, Didi upgraded ES to JDK 17 and migrated critical GC algorithms to ZGC, also updating Groovy syntax, plugins, and dependent JARs. After the upgrade, payment‑cluster P99 latency dropped from 800 ms to 30 ms (96 % reduction) and average query latency fell from 25 ms to 6 ms (75 % reduction). Write throughput increased 15‑20 %.

Cost Optimization

Index mapping optimization: disable unnecessary inverted and stored fields.

Introduce ZSTD compression, reducing CPU usage by ~15 %.

Integrate with a big‑data asset management platform to identify and retire unused partitions and indices.

Multi‑Tenant Resource Isolation

Inspired by Presto’s resource groups, Didi split the search thread pool into multiple sub‑pools per AppID, configuring thread counts and queue sizes based on tenant priority. This prevents a single tenant’s bursty queries from saturating CPU and affecting other services.

Data Security

Gateway‑level authentication based on AppID.

ES‑level authentication (Clientnode, Datanode, MasterNode) using a custom security plugin.

The custom plugin is lightweight (string checks in HTTP handling), supports rolling restarts, offers a one‑click enable/disable switch, and avoids the stability risks of X‑Pack’s upgrade‑blocking design.

Stability Governance

Pre‑emptive measures: annual stability “mine‑sweeping” program resolved 61 issues (e.g., Gateway Full GC, ThreadLocal leaks).

Monitoring: hardware metrics, shard counts, master pending tasks, MQ lag, and Grafana dashboards for rapid fault localization (e.g., CPU spikes within 5 minutes).

Self‑healing: automated recovery for disk failures, query spikes, and write queue back‑pressure; active‑active traffic shifting and rate limiting per AppID, index template, and DSL.

Summary and Outlook

Running on ES 7.6, Didi plans to upgrade to ES 8.13 to gain a more efficient master node, automatic disk balancing, reduced segment memory footprint, and new features such as ANN vector search. Future work includes further write‑performance tuning, improved merge strategies, and exploring machine‑learning capabilities in newer ES versions.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Elasticsearch Stability Didi Search Multi‑Tenant Isolation Cross‑Datacenter Replication ZSTD Compression

Written by

Past Memory Big Data

A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Elasticsearch Overview and Didi Adoption

System Architecture

User Console

Business Scenarios

Deployment and Access Modes

Data Synchronization

Engine Iterations – Fine‑Grained Tiered Protection

Active‑Active Construction – DCDR

Performance Optimization – JDK 17 + ZGC

Cost Optimization

Multi‑Tenant Resource Isolation

Data Security

Stability Governance

Summary and Outlook

Past Memory Big Data

How this landed with the community

Was this worth your time?

0 Comments

Performance Optimization – JDK 17 + ZGC