How to Achieve MySQL‑LIKE Style Fuzzy Search in Elasticsearch 8.x
This article walks through the challenge of implementing MySQL‑LIKE style front‑and‑back wildcard searches in Elasticsearch. It compares match, match_phrase, n‑gram, legacy wildcard queries, and the wildcard field type introduced in ES 7.9, with code samples, performance figures, and practical recommendations for choosing a solution.
Introduction
When a product manager asked for MySQL-style LIKE '%keyword%' fuzzy search, the Elasticsearch team found that the request hid a deeper technical challenge: Elasticsearch matches on tokens, not on arbitrary substrings.
Tokenization Basics
Understanding Elasticsearch’s core concept of tokenization is essential for fuzzy search.
Tokenization Example
Original text: "苹果手机真香" ("Apple phones are really great")
Tokens: ["苹果", "手机", "真", "香"] ("Apple", "phone", "really", "great")
Match Query and Its Limitation
GET /products/_search
{
  "query": {
    "match": {
      "name": "苹果手机"
    }
  }
}
Problem: The default match query combines tokens with the or operator, so it also returns unrelated results such as "苹果电脑" (Apple computer) or "华为手机" (Huawei phone).
Match with Operator "and"
GET /products/_search
{
  "query": {
    "match": {
      "name": {
        "query": "苹果手机",
        "operator": "and"
      }
    }
  }
}
Result: Only documents containing both "苹果" and "手机" are returned, but the two tokens may appear in any order and need not be adjacent.
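The difference between the two operators can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch code: the DOCS dictionary and its hand-written token lists are assumptions standing in for real analyzer output, and scoring is ignored.

```python
# Toy model of match-query semantics over already-tokenized documents.
# The token lists are hand-written assumptions, not real analyzer output.
DOCS = {
    "苹果手机": ["苹果", "手机"],
    "苹果电脑": ["苹果", "电脑"],
    "华为手机": ["华为", "手机"],
}


def match(query_tokens, operator="or"):
    """Return names of docs containing any (or) / all (and) query tokens."""
    combine = all if operator == "and" else any
    return [
        name
        for name, tokens in DOCS.items()
        if combine(t in tokens for t in query_tokens)
    ]


or_hits = match(["苹果", "手机"])                   # any token is enough
and_hits = match(["苹果", "手机"], operator="and")  # every token is required
print(or_hits)
print(and_hits)
```

With or, all three products match because each shares at least one token with the query; with and, only "苹果手机" remains.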
Match_phrase
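GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "苹果手机"
    }
  }
}
Result: The tokens must be adjacent and in the same order, so the phrase matches exactly. There is still no fuzzy capability: a query that does not line up with token boundaries, such as "果手", matches nothing.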
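Phrase matching can be pictured as looking for the query tokens at consecutive positions in the document's token list. The sketch below is a simplified illustration under that assumption (no slop, no scoring); phrase_match is a hypothetical helper, not an Elasticsearch API.

```python
def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens consecutively and in order."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = ["苹果", "手机", "真", "香"]  # token list from the tokenization example

in_order = phrase_match(doc, ["苹果", "手机"])        # adjacent, same order
reversed_order = phrase_match(doc, ["手机", "苹果"])  # same tokens, wrong order
substring = phrase_match(doc, ["果手"])               # straddles two tokens
print(in_order, reversed_order, substring)
```

The last call is the LIKE '%果手%' case: "果手" is a substring of the text but not a token, so token-based matching cannot find it.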
n‑gram + match_phrase (Pre‑7.9)
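PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}
GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "果手"
    }
  }
}
Result: Successfully matches "苹果手机" via the substring "果手". Leave search_analyzer unset so the query string goes through the same n‑gram analyzer and becomes the single gram "果手", which is in the index. The standard analyzer would split the query into the single characters "果" and "手", which an index with min_gram 2 never contains.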
✅ Supports substring matching anywhere.
❌ Index size grows roughly 3×.
❌ Query performance degrades.
❌ Requires careful n‑gram tuning.
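To see where the extra index size comes from, it helps to generate the grams by hand. This is a minimal sketch of character n-gram generation using the same min_gram and max_gram values as the mapping above; the ngrams function is illustrative and does not claim to reproduce the tokenizer's exact output order or position handling.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Slide over text and collect every substring of length min_gram..max_gram."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


grams = ngrams("苹果手机")
print(grams)           # every 2- and 3-character substring
print("果手" in grams)  # the mid-word substring is now an indexed term
```

A four-character value that a word-level analyzer would index as two tokens becomes five terms here, and the count grows with both the text length and the gap between min_gram and max_gram.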
Wildcard Queries (Pre‑7.9)
Legacy wildcard queries can be used on keyword fields but are risky.
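GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*iPhone*",
        "case_insensitive": true
      }
    }
  }
}
A leading wildcard (*) forces Elasticsearch to enumerate every term in the field, causing high CPU and memory usage. Note that the case_insensitive parameter requires Elasticsearch 7.10 or later.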
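The cost of a leading wildcard can be illustrated with a sorted list standing in for the term dictionary. This is a simplified sketch under stated assumptions: TERMS is made-up sample data, and the real engine uses more sophisticated structures than a Python list, but the asymmetry is the same. A prefix gives a place to seek to; a leading * does not.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical keyword terms, kept sorted like a term dictionary.
TERMS = sorted([
    "galaxy s24", "iphone 15", "iphone 15 pro",
    "ipad air", "pixel 9", "new iphone case",
])


def prefix_lookup(prefix):
    """Seek to the prefix, then read terms until one stops matching."""
    visited, hits = 0, []
    i = bisect_left(TERMS, prefix)
    while i < len(TERMS):
        visited += 1
        if not TERMS[i].startswith(prefix):
            break
        hits.append(TERMS[i])
        i += 1
    return hits, visited


def leading_wildcard_lookup(pattern):
    """A leading * offers no seek point, so every term is tested."""
    hits = [t for t in TERMS if fnmatchcase(t, pattern)]
    return hits, len(TERMS)


prefix_hits, prefix_visited = prefix_lookup("iphone")
wild_hits, wild_visited = leading_wildcard_lookup("*iphone*")
print(prefix_hits, prefix_visited)
print(wild_hits, wild_visited)
```

With six terms the difference is trivial, but the second function always visits the whole dictionary, which is what hurts on fields with millions of distinct values.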
Wildcard Field Type (ES 7.9+)
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "wildcard"
      }
    }
  }
}
GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*果手*"
      }
    }
  }
}
Performance: ~25 ms latency, index size ~1.4×, low impact on the cluster.
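The general idea behind the wildcard field type is to index n-grams of the whole value as a rough filter and then verify the surviving candidates against the stored full string. The sketch below illustrates that two-step approach only; the class name, the gram size of 3, and the sample values are assumptions for illustration, not Elasticsearch internals.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

N = 3  # gram size chosen for illustration (an assumption)


def grams(s):
    """All substrings of length N in s."""
    return {s[i:i + N] for i in range(len(s) - N + 1)}


class WildcardFieldSketch:
    def __init__(self):
        self.values = []                  # full strings, kept for verification
        self.postings = defaultdict(set)  # gram -> ids of values containing it

    def add(self, value):
        doc_id = len(self.values)
        self.values.append(value)
        for g in grams(value):
            self.postings[g].add(doc_id)

    def search(self, pattern):
        # Step 1: rough filter using grams of the literal parts of the pattern.
        literals = [p for p in pattern.replace("?", "*").split("*") if len(p) >= N]
        candidates = set(range(len(self.values)))
        for lit in literals:
            for g in grams(lit):
                candidates &= self.postings.get(g, set())
        # Step 2: verify each candidate against the full stored value.
        hits = [self.values[i] for i in sorted(candidates)
                if fnmatchcase(self.values[i], pattern)]
        return hits, len(candidates)


idx = WildcardFieldSketch()
for v in ["iphone 15 pro", "galaxy s24", "new iphone case", "ipad air", "苹果手机"]:
    idx.add(v)

phone_hits, phone_checked = idx.search("*phone*")
short_hits, short_checked = idx.search("*果手*")
print(phone_hits, phone_checked)
print(short_hits, short_checked)
```

In this sketch a pattern whose literal part is shorter than the gram size, such as "*果手*", gets no help from the filter and falls back to verifying every value, so the result is still correct but the shortcut is lost.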
Comparison Summary
match: Simple, low precision.
match + operator "and": Better relevance, order‑independent.
match_phrase: Exact phrase, order‑sensitive.
n‑gram + match_phrase: Full fuzzy capability, high index cost.
Legacy wildcard: Easy to use but terrible performance.
Wildcard field type: Best for front‑and‑back fuzzy matching with good performance.
Final Recommendation
Deploy an Elasticsearch 8.x cluster.
Use the wildcard field type for fuzzy matching requirements.
Keep traditional searches with match_phrase or other mature queries.
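When migrating existing LIKE searches, the user's pattern has to be translated into wildcard-query syntax. The sketch below is one possible way to do that, assuming % maps to *, _ maps to ?, and that literal *, ? and backslash in the input should be escaped with a backslash. The function names are hypothetical, and SQL escape sequences such as \% are deliberately not handled.

```python
import json


def like_to_wildcard(like_pattern):
    """Translate SQL LIKE wildcards (% and _) into wildcard-query syntax (* and ?)."""
    out = []
    for ch in like_pattern:
        if ch == "%":
            out.append("*")
        elif ch == "_":
            out.append("?")
        elif ch in "*?\\":
            out.append("\\" + ch)  # keep characters special to wildcard queries literal
        else:
            out.append(ch)
    return "".join(out)


def wildcard_query(field, like_pattern):
    """Build a search body for a wildcard query on the given field."""
    return {"query": {"wildcard": {field: {"value": like_to_wildcard(like_pattern)}}}}


body = wildcard_query("name", "%果手%")
print(json.dumps(body, ensure_ascii=False))
```

The resulting body has the same shape as the wildcard query shown earlier and can be sent to the _search endpoint with any HTTP client.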
Tip: If a product manager asks for deep pagination, remind them that even large platforms limit pages for usability.
Rare Earth Juejin Tech Community
