Elasticsearch Search Tuning Guide: Part 2 - Index Optimization, Mapping, Scripts, and Segment Merging
The second part of the Elasticsearch search‑tuning series explains how to pre‑index data, choose appropriate keyword or text mappings, minimize script use by preferring Painless or Lucene expressions, and efficiently force‑merge read‑only indices into single segments for better performance.
This article is the second part of a three-part series on Elasticsearch search tuning, focusing on Elasticsearch 5.0 and above. It was written by Adam Vanderbush and translated by Yang Zhentao.
1. Pre-indexing Data
Use the patterns in your queries to optimize how data is indexed. For example, if all documents have a price field and most queries run range aggregations over a fixed list of ranges, you can speed up those aggregations by pre-indexing the range into each document and using a terms aggregation instead. At index time, add a price_range field mapped as keyword that holds the bucket each document's price falls into. Search requests can then aggregate on this pre-computed field rather than running a range aggregation on the numeric price field.
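The idea can be sketched in Python with request bodies built as plain dictionaries. The bucket boundaries (10 and 100), the label format, and the helper names price_range_label and enrich are illustrative assumptions, not part of the original article:

```python
# Hypothetical fixed buckets; use the boundaries your queries actually need.
PRICE_BUCKETS = [(None, 10), (10, 100), (100, None)]


def price_range_label(price):
    """Return the label of the bucket whose [low, high) interval holds price."""
    for low, high in PRICE_BUCKETS:
        if (low is None or price >= low) and (high is None or price < high):
            return f"{'*' if low is None else low}-{'*' if high is None else high}"
    raise ValueError(f"no bucket for price {price!r}")


def enrich(doc):
    """Copy the document and add the pre-computed price_range field."""
    enriched = dict(doc)
    enriched["price_range"] = price_range_label(doc["price"])
    return enriched


# Mapping fragment: price_range is a keyword so a terms aggregation can use it.
# In Elasticsearch 5.x this fragment would sit under a mapping type.
properties = {
    "price": {"type": "integer"},
    "price_range": {"type": "keyword"},
}

# Search body: a terms aggregation on the pre-computed field replaces
# a range aggregation on the numeric price field.
search_body = {
    "size": 0,
    "aggs": {"price_ranges": {"terms": {"field": "price_range"}}},
}
```

Each document would pass through enrich before being indexed, so the bucketing cost is paid once at ingestion instead of on every search.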
2. Mapping
Not all numeric data should be mapped as a numeric field. Identifiers such as an ISBN, or any number that identifies a record in another database, are usually better mapped as keyword than as integer or long, since they are looked up by exact value and rarely used in range queries. The keyword type is meant for structured content such as email addresses, hostnames, status codes, zip codes, or tags, which is typically used for filtering, sorting, and aggregations. Keyword fields can only be searched by their exact value. For full-text content such as email bodies or product descriptions, use text fields instead. Indices imported from version 2.x do not support the keyword type; they attempt to downgrade keyword to string instead.
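A minimal sketch of such a mapping, again as Python dictionaries. The field names isbn, status_code, and description and the two helper functions are hypothetical and chosen only for illustration:

```python
# Mapping fragment: identifiers and codes as keyword, free text as text.
properties = {
    "isbn": {"type": "keyword"},         # numeric-looking identifier
    "status_code": {"type": "keyword"},  # filtered and aggregated, never ranged
    "description": {"type": "text"},     # analyzed full-text content
}


def exact_match_query(field, value):
    """Build a term query, which matches a keyword field by its exact value."""
    return {"query": {"term": {field: value}}}


def full_text_query(field, text):
    """Build a match query, which analyzes the text before searching."""
    return {"query": {"match": {field: text}}}


isbn_search = exact_match_query("isbn", "9780134685991")
description_search = full_text_query("description", "search tuning guide")
```

The term query suits the keyword fields, while the match query suits the text field, mirroring the split described above.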
3. Avoiding Scripts
In general, avoid scripts. When a script is unavoidable, prefer the Painless and expression engines. Painless is a simple, secure scripting language designed specifically for Elasticsearch. It is the default scripting language and can safely be used for inline and stored scripts.
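For the cases where a script cannot be avoided, a Painless script in a search request might look like the following sketch. The field name price, the parameter factor, and the helper painless_script are assumptions made for illustration. The "inline" key is what Elasticsearch 5.x releases use for the script source; later releases name it "source", so the key is left configurable:

```python
def painless_script(source, params=None, source_key="inline"):
    """Build a script object for the Painless engine."""
    script = {"lang": "painless", source_key: source}
    if params:
        script["params"] = params
    return script


# Painless reads script parameters through the params map.
search_body = {
    "query": {"match_all": {}},
    "script_fields": {
        "discounted_price": {
            "script": painless_script(
                "doc['price'].value * params.factor", {"factor": 0.9}
            )
        }
    },
}
```

Passing values through params rather than hard-coding them in the source keeps the script text identical across requests.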
Lucene Expression Language
Lucene expressions compile a JavaScript expression to bytecode. They are designed for high-performance custom ranking and sorting functions and can be used as inline or stored scripts. Expressions were designed to perform competitively with custom Lucene code, thanks to a low per-document overhead compared with other scripting engines. The syntax is a subset of JavaScript limited to a single expression. Variables accessible in expression scripts include document fields (e.g., doc['myfield'].value), variables and methods that fields support (e.g., doc['myfield'].empty), parameters passed to the script (e.g., mymodifier), and the current document score _score (only valid in script_score). Expression scripts can be used in script_score, script_fields, sort scripts, and numeric aggregation scripts.
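As a hedged sketch, an expression script used for custom scoring could be assembled like this. The field names title and popularity and the helper expression_script are hypothetical; mymodifier follows the parameter name used above. As with Painless, Elasticsearch 5.x names the source key "inline" and later releases use "source":

```python
def expression_script(expression, params=None, source_key="inline"):
    """Build a script object for the Lucene expression engine."""
    script = {"lang": "expression", source_key: expression}
    if params:
        script["params"] = params
    return script


# In expressions, parameters are referenced by their bare name (mymodifier),
# and _score is available because the script runs inside script_score.
search_body = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "elasticsearch"}},
            "script_score": {
                "script": expression_script(
                    "_score * doc['popularity'].value * mymodifier",
                    {"mymodifier": 2},
                )
            },
        }
    }
}
```

The whole script is one arithmetic expression, which is exactly the restriction that keeps the per-document overhead low.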
4. Force Merge Read-Only Indices
Read-only indices benefit from being merged down to a single segment. A typical case is time-based indices: only the index for the current time window receives new documents, while older indices are effectively read-only. The Force Merge API forces a merge of one or more indices, reducing the number of Lucene segments in each shard. The call blocks until the merge completes. If the HTTP connection is lost, the request continues in the background, and any new force merge request blocks until the previous force merge completes.
The Force Merge API accepts these parameters:
max_num_segments - The number of segments to merge down to. Set it to 1 to fully merge the index. By default, the API simply checks whether a merge is needed and runs one if so.
only_expunge_deletes - Whether the merge should only expunge segments that contain deletes. In Lucene, a deleted document is not removed from its segment; it is only marked as deleted. When segments are merged, the new segment is written without those deleted documents. Setting this flag merges only the segments that contain deletes. Defaults to false.
flush - Whether to flush after force merge, defaulting to true.
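The parameters above are passed as query-string arguments on a POST to the _forcemerge endpoint. The sketch below only builds the request URL and sends nothing; the host http://localhost:9200, the index names, and the helper force_merge_url are assumptions made for illustration:

```python
from urllib.parse import urlencode


def force_merge_url(base_url, indices, max_num_segments=None,
                    only_expunge_deletes=False, flush=True):
    """Build the URL for a POST <indices>/_forcemerge request.

    Only parameters that differ from their defaults are added.
    """
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if not flush:
        params["flush"] = "false"
    url = f"{base_url.rstrip('/')}/{','.join(indices)}/_forcemerge"
    return f"{url}?{urlencode(params)}" if params else url


# Fully merge two older, read-only daily indices (hypothetical names).
full_merge = force_merge_url(
    "http://localhost:9200",
    ["logs-2017.01.01", "logs-2017.01.02"],
    max_num_segments=1,
)
```

Because the call blocks until the merge finishes, a client sending this request should use a generous timeout or expect the merge to carry on in the background after a disconnect.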