
Elasticsearch Version Upgrade: Architecture, Challenges, and Performance Optimization at Didi

Over seven months, Didi's Elasticsearch team upgraded more than 30 clusters, 2,000 nodes, and 4 PB of data from version 2.3.3 to 6.6.1. The team worked around protocol and mapping incompatibilities with a multi-version Arius Gateway, a custom Java SDK, an ElasticSearch Cluster Manager (ECM), and the Arius MetaData Service (AMS). The upgrade saved 1 PB of storage, decommissioned 400 machines, raised query speed by 40% and write throughput by 30%, and cut CPU use by 10%, for an estimated cost reduction of 80w (800,000) per month.

Didi Tech

This article describes how Didi's Elasticsearch team upgraded 30+ clusters, 2,000+ nodes, and 4 PB of data from version 2.3.3 to 6.6.1 over seven months. The team had to handle protocol incompatibility between the two versions, mapping differences, and resource constraints, all without affecting user queries.
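One well-known mapping difference between these versions is that the 2.x `string` field type was replaced by `text` and `keyword` from Elasticsearch 5.0 onward. The sketch below shows how a 2.x mapping could be rewritten for 6.x. It is an illustration only, not the team's actual tooling: the function name is hypothetical, and mapping a non-indexed string to a `keyword` field with `index: false` is an assumed choice.

```python
from typing import Any, Dict


def convert_mapping_2x_to_6x(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a 2.x mapping node with `string` fields rewritten."""
    converted = dict(node)
    if converted.get("type") == "string":
        # In 2.x, `index` on a string field was "analyzed" (the default),
        # "not_analyzed", or "no".
        index_option = converted.pop("index", "analyzed")
        if index_option == "analyzed":
            converted["type"] = "text"
        else:
            converted["type"] = "keyword"
            if index_option == "no":
                # Assumption: keep the field stored but unsearchable.
                converted["index"] = False
    # Recurse into object properties and multi-fields.
    for key in ("properties", "fields"):
        if key in converted:
            converted[key] = {
                name: convert_mapping_2x_to_6x(child)
                for name, child in converted[key].items()
            }
    return converted
```

The function copies each node instead of mutating it, so the original 2.x mapping stays available for comparison or rollback.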

The team upgraded the architecture in three ways: Arius Gateway gained multi-version support, a custom ES Java SDK absorbed the TCP/HTTP differences between versions, and an ElasticSearch Cluster Manager (ECM) made cluster operations more efficient. They also built AMS (Arius MetaData Service) for monitoring and analysis.
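To make the gateway idea concrete, here is a minimal sketch of version-aware routing. It is not Arius Gateway's implementation: the routing table, cluster names, URLs, and the `route_search` function are all hypothetical. The one version-specific rule it applies is real: Elasticsearch 6.0 and later reject REST requests with a body unless a `Content-Type` header is present, whereas 2.x did not require one.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cluster:
    name: str
    base_url: str
    major_version: int


# Hypothetical routing table: which cluster currently serves each index.
ROUTES: Dict[str, Cluster] = {
    "orders_log": Cluster("legacy", "http://es2.example.internal:9200", 2),
    "driver_trace": Cluster("upgraded", "http://es6.example.internal:9200", 6),
}


def route_search(index: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Pick the backend for an index and adapt headers to its version."""
    cluster = ROUTES[index]
    out_headers = dict(headers)
    if cluster.major_version >= 6:
        # 6.x enforces strict content-type checking on requests with a body.
        out_headers.setdefault("Content-Type", "application/json")
    url = f"{cluster.base_url}/{index}/_search"
    return url, out_headers
```

With a table like this, an index can be moved to an upgraded cluster by changing one entry, while clients keep calling the same gateway endpoint.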

Resource optimization came from data-driven storage tiering, mapping optimization, and the introduction of FastIndex for offline data import. Together these saved 1 PB of storage and returned 400+ physical machines. Performance also improved: query performance rose by 40%, write throughput rose by 30%, and CPU usage fell by 10%.
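As a rough illustration of data-driven tiering, the sketch below picks a tier from how recently an index was queried and builds the matching index settings. The 30-day threshold, the tier names, and the `box_type` node attribute are assumptions for the example; `index.routing.allocation.require.<attribute>` is Elasticsearch's standard shard allocation filtering setting. The source does not describe the team's actual tiering rules.

```python
def tier_for_index(days_since_last_query: int, cold_after_days: int = 30) -> str:
    """Classify an index as hot or cold from its access statistics."""
    return "cold" if days_since_last_query >= cold_after_days else "hot"


def allocation_settings(tier: str, attribute: str = "box_type") -> dict:
    """Build the index settings that pin shards to nodes tagged with the tier."""
    return {f"index.routing.allocation.require.{attribute}": tier}
```

The resulting dictionary would be sent as the body of an update-settings request for the index, so that its shards relocate to nodes started with the matching attribute value.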

Testing was extensive and included a query traffic replay and comparison system to verify data consistency and performance parity between versions. The migration illustrates practices for large-scale Elasticsearch upgrades that preserve service stability, and it cut costs by an estimated 80w (800,000) per month.
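A replay comparison of this kind can be sketched as follows. This is a simplified illustration with hypothetical function names, not the team's system: the two `send_*` callables stand in for whatever sends a recorded query to the old and new clusters. It compares the total hit count and the set of returned document IDs, which suits filtered or explicitly sorted queries. Relevance-ranked queries may legitimately differ because the default similarity changed to BM25 in Elasticsearch 5.0.

```python
from typing import Any, Callable, Dict

Response = Dict[str, Any]


def summarize(response: Response) -> tuple:
    """Reduce a search response to the parts that should match across versions."""
    hits = response.get("hits", {})
    # In both 2.x and 6.x, hits.total is a plain integer.
    ids = sorted(hit["_id"] for hit in hits.get("hits", []))
    return hits.get("total"), ids


def compare_replay(
    query: Dict[str, Any],
    send_old: Callable[[Dict[str, Any]], Response],
    send_new: Callable[[Dict[str, Any]], Response],
) -> Dict[str, Any]:
    """Replay one recorded query against both clusters and report differences."""
    old, new = send_old(query), send_new(query)
    return {
        "consistent": summarize(old) == summarize(new),
        "took_old_ms": old.get("took"),
        "took_new_ms": new.get("took"),
    }
```

Passing the senders in as parameters keeps the comparison logic testable without a live cluster.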

Tags: Performance Optimization, Architecture, Big Data, Elasticsearch, Storage Optimization, Version Upgrade, Cluster Management