
Elasticsearch 8.x Performance Boosts, New Features, and Migration Guide

This article explains how upgrading from Elasticsearch 5.x/2.x to 8.x improves search, aggregation, and write performance while reducing storage costs. It introduces vector KNN search, synthetic _source, time-series data streams (TSDS), searchable snapshots, and security enhancements, and includes migration examples and code snippets for enterprise search platforms.

ZhongAn Tech Team

Jun 4, 2024

Elasticsearch 8.x (8.13 is the latest release at the time of writing) delivers significant performance improvements over 5.x/2.x: 30-50% faster search, 60-90% faster aggregations, 20-30% faster writes, and roughly 20% lower storage costs. It also adds new capabilities such as vector KNN search, Reciprocal Rank Fusion (RRF) ranking, the Elasticsearch Relevance Engine (ESRE), searchable snapshots, and built-in time-series storage through time-series data streams (TSDS).

The middleware search team upgraded the XSearch platform to support ES8 clusters, enabling seamless one‑click upgrades from older versions and reducing operational overhead.

Background: The company still runs many ES 5.x clusters and a few ES 2.x clusters, which suffer from slow queries and timeouts and incur rising costs from constant scaling. Growing demand for vector storage and retrieval, driven by large language models, further motivates the upgrade.

Performance Gains:

Search performance: +30‑50%

Aggregation performance: +60‑90% (some queries 2‑10× faster)

Write performance: +20‑30%

Storage cost: ~20% reduction

Range Query: esrally benchmarks show that ES8 reduces range-query latency by about 30% compared with ES5.

Wildcard Query: ES 7.9 added a dedicated wildcard field type, which makes wildcard (pattern) matching on string fields more efficient. Example mapping and query:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_wildcard": {
        "type": "wildcard"
      }
    }
  }
}

GET my-index-000001/_search
{
  "query": {
    "wildcard": {
      "my_wildcard": "*quite*lengthy"
    }
  }
}
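To make the pattern semantics concrete, here is a minimal Python sketch of how a pattern such as *quite*lengthy is matched against a whole field value: * stands for any run of characters and ? for exactly one. The helper name wildcard_matches is hypothetical, and the regex translation only illustrates the semantics; it is not how Elasticsearch evaluates the query internally.

```python
import re


def wildcard_matches(pattern: str, value: str) -> bool:
    """Return True if `value` matches `pattern` in full.

    Illustrative only: `*` matches zero or more characters and `?`
    matches exactly one character; everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets `*` and `?` span newlines inside the value.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(wildcard_matches("*quite*lengthy", "a quite long and lengthy"))  # True
print(wildcard_matches("*quite*lengthy", "quite lengthy text"))  # False: value must end with "lengthy"
```

Because the pattern has no trailing *, only values that end in "lengthy" match.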

Vector KNN Retrieval: ES8 uses HNSW (Hierarchical Navigable Small World graphs) for approximate nearest-neighbor search. Sample query:

POST image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  }
}
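An approximate search trades a little accuracy for speed: HNSW aims to return nearly the same neighbors as an exhaustive scan while visiting far fewer vectors, and a larger num_candidates generally improves accuracy at the cost of latency. As a reference point, the sketch below shows the exact brute-force version of the same lookup in Python. The function name exact_knn, the sample vectors, and the choice of Euclidean distance are all assumptions for illustration; the article does not state which similarity the image-vector field uses.

```python
import heapq
import math


def exact_knn(query, vectors, k):
    """Return the k (doc_id, distance) pairs closest to `query`.

    Assumption: Euclidean (L2) distance; the article does not say which
    similarity the `image-vector` field uses.
    """
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional vectors, matching the query's dimensionality.
docs = {
    "img-1": [-5.0, 9.0, -12.0],
    "img-2": [-4.0, 8.0, -11.0],
    "img-3": [10.0, 0.0, 3.0],
}
# Prints the two nearest documents, closest first.
print(exact_knn([-5, 9, -12], docs, k=2))
```

A brute-force scan like this costs one distance computation per stored vector, which is why a graph index becomes attractive as the collection grows.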

ES8 also integrates Elastic Learned Sparse Encoder (ELSER) for sparse vector retrieval, usable via text_expansion queries:

GET my_index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "Sample"
      }
    }
  }
}
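Conceptually, sparse retrieval expands both the query and each document into weighted tokens and rewards documents whose tokens overlap with the query's. The Python sketch below illustrates that idea with a plain dot product. The token weights and the helper name sparse_score are hypothetical: real weights come from the ELSER model, and the exact scoring Elasticsearch applies may differ from this simplification.

```python
def sparse_score(query_tokens, doc_tokens):
    """Dot product over the tokens that both sparse vectors share."""
    return sum(
        weight * doc_tokens[token]
        for token, weight in query_tokens.items()
        if token in doc_tokens
    )


# Hypothetical token weights; a real model such as ELSER produces these.
query = {"sample": 1.5, "example": 0.5}
doc_a = {"sample": 1.0, "test": 0.5}
doc_b = {"unrelated": 2.0}

print(sparse_score(query, doc_a))  # 1.5
print(sparse_score(query, doc_b))  # 0
```

Only shared tokens contribute, so a document with no overlapping tokens scores zero regardless of its length.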

Hybrid ranking combines BM25 relevance scores with KNN similarity scores using either linear weighting or Reciprocal Rank Fusion (RRF). Example of linear fusion for image search:

POST image-index/_search
{
  "query": {
    "multi_match": {
      "query": "flower",
      "fields": ["title", "description"],
      "boost": 0.6
    }
  },
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 5,
    "num_candidates": 100,
    "boost": 0.4
  }
}
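With boosts of 0.6 and 0.4, a document's final score is the weighted sum 0.6 × (text score) + 0.4 × (vector score), and a document found by only one of the two searches receives only that part. The Python sketch below reproduces this arithmetic outside Elasticsearch. The function name linear_fusion and the raw scores are hypothetical, chosen only to show how the weights interact.

```python
def linear_fusion(bm25_scores, knn_scores, bm25_boost=0.6, knn_boost=0.4):
    """Combine two score maps as boost-weighted sums, best first.

    A document missing from one result set contributes 0 for that part.
    """
    doc_ids = set(bm25_scores) | set(knn_scores)
    combined = {
        doc_id: bm25_boost * bm25_scores.get(doc_id, 0.0)
        + knn_boost * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from each retriever.
bm25 = {"doc-a": 4.0, "doc-b": 2.0}
knn = {"doc-b": 0.9, "doc-c": 0.8}
for doc_id, score in linear_fusion(bm25, knn):
    print(doc_id, round(score, 2))
```

In this made-up data the text scores are several times larger than the vector scores, so the text side dominates even with similar weights. That scale mismatch is why linear fusion usually needs its weights tuned per use case.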

RRF requires no weight tuning: it merges the result lists from BM25 and vector retrieval based on each document's rank position rather than its raw score.
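The fusion step itself is small enough to sketch in Python. Each result list contributes 1 / (rank_constant + rank) for every document it contains, with ranks starting at 1, and the contributions are summed per document. The function name reciprocal_rank_fusion and the sample rankings are hypothetical; the default rank_constant of 60 mirrors the commonly used value, and this is a sketch of the formula rather than the Elasticsearch implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse ranked lists of doc ids; return (doc_id, score) pairs, best first.

    Each list adds 1 / (rank_constant + rank) for every document it
    contains, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a BM25 query and a kNN search.
bm25_ranking = ["doc-a", "doc-b", "doc-c"]
knn_ranking = ["doc-b", "doc-d", "doc-a"]
for doc_id, score in reciprocal_rank_fusion([bm25_ranking, knn_ranking]):
    print(doc_id, round(score, 5))
```

Here doc-b comes out on top because it ranks well in both lists, while documents that appear in only one list fall behind. Since only positions matter, the differing score scales of BM25 and vector similarity never need to be reconciled.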

Time-Series Data Streams (TSDS): ES8 adds distributed time-series storage built on time-series data streams, reducing storage by roughly 70% and supporting native time-based queries. Example index template:

{
  "index_patterns": ["metrics-weather_sensors-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["sensor_id", "location"]
    },
    "mappings": {
      "properties": {
        "sensor_id": {"type": "keyword", "time_series_dimension": true},
        "location": {"type": "keyword", "time_series_dimension": true},
        "temperature": {"type": "half_float", "time_series_metric": "gauge"},
        "humidity": {"type": "half_float", "time_series_metric": "gauge"}
      }
    }
  }
}
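In this template the dimension fields (sensor_id and location) together identify a time series, while the metric fields hold the measured values. The Python sketch below shows the grouping idea on a few made-up readings: documents that share the same dimension values belong to the same series. The helper name group_by_series and the sample data are hypothetical, and the sketch simplifies what Elasticsearch does when it routes and stores TSDS documents.

```python
from collections import defaultdict

# Dimension fields taken from the template's index.routing_path.
DIMENSIONS = ("sensor_id", "location")


def group_by_series(readings):
    """Group readings by their dimension values, one list per series."""
    series = defaultdict(list)
    for reading in readings:
        key = tuple(reading[name] for name in DIMENSIONS)
        series[key].append(reading)
    return dict(series)


# Hypothetical sensor readings.
readings = [
    {"@timestamp": "2024-06-04T10:00:00Z", "sensor_id": "s-1", "location": "roof", "temperature": 21.5},
    {"@timestamp": "2024-06-04T10:01:00Z", "sensor_id": "s-1", "location": "roof", "temperature": 21.7},
    {"@timestamp": "2024-06-04T10:00:00Z", "sensor_id": "s-2", "location": "lab", "temperature": 19.0},
]
for key, points in group_by_series(readings).items():
    print(key, len(points))
```

Keeping the points of one series together and ordered by time is what makes them compress well, since neighboring values tend to be similar.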

Searchable Snapshots: Indices archived in snapshots can be queried directly without first being restored, which makes large volumes of backup data more efficient to work with.

Security: The free distribution now ships with built-in authentication, enabled by default since 8.0, so third-party security plugins are no longer needed.

Application Scenarios: AIGC knowledge-base services, core insurance policy search, and core auto-insurance search have already migrated to ES8 and use vector KNN, multi-path recall, and hybrid ranking.

Future Plans: The company-wide rollout aims to replace dozens of ES 5.x/2.x clusters, saving ~30% of ECS resources annually. The XSearch platform will also be upgraded to expose ES8 features such as TSDS, snapshots, OLAP, ML/NLP models, and security to all business lines.
