Elasticsearch Fundamentals: Architecture, Indexing, Queries, Docker Setup, and Chinese Tokenization
This tutorial introduces Elasticsearch's core concepts, installation via Docker, index and document operations, query DSL, aggregations, and Chinese tokenization using the IK analyzer with custom dictionaries, providing step‑by‑step code examples for building a searchable log analysis stack.
1. Introduction
Many projects use Kibana to search logs in test or production environments. Kibana is one part of the ELK stack (Elasticsearch, Logstash, Kibana). This article explains the principles of Elasticsearch, outlines its architecture, and shows how to set up the environment.
1.1 What is Elasticsearch?
Elasticsearch is a distributed open‑source search and analytics engine built on Lucene, capable of handling text, numeric, geospatial, structured, and unstructured data.
1.2 Use Cases
Typical uses include product catalog search for e‑commerce, log collection and analysis with Logstash, and real‑time analytics.
2. Basic Concepts
2.1 Index
An Elasticsearch index is loosely comparable to a database. Each document in an index is stored as JSON, and its fields are indexed with an inverted index for fast full-text search.
2.2 Inverted Index
The inverted index maps each term to the documents that contain it, so a query looks up its terms directly instead of scanning every document. The matching documents are then scored for relevance.
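The term-to-document mapping can be illustrated with a small Python sketch. This is a toy model under simplifying assumptions (lowercasing and whitespace splitting instead of a real analyzer, no scoring), not how Lucene stores its index. The sample documents and the function names `build_inverted_index` and `search` are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase, whitespace-separated term to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, not taken from a real index.
docs = {
    1: "red sea travel guide",
    2: "red wine tasting",
    3: "sea food market",
}
index = build_inverted_index(docs)
print(sorted(index["red"]))              # [1, 2]
print(sorted(search(index, "red sea")))  # [1]
```

Looking up a term is a single dictionary access, which is why the cost of a search depends on the number of query terms and matching documents rather than on the total number of documents.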
2.3 Logstash
Logstash (the L in ELK) collects, parses, and enriches data before sending it to Elasticsearch.
2.4 Kibana
Kibana (the K in ELK) visualizes data stored in Elasticsearch, providing histograms, line charts, and dashboards.
3. Docker Environment Setup
3.1 Deploy Elasticsearch
# Pull the image and prepare host directories for config and data
docker pull elasticsearch:7.4.2
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
chmod -R 777 /mydata/elasticsearch
# Allow connections from any host
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
# Start a single-node instance; 9200 is the HTTP port, 9300 the transport port.
# The heap limits are deliberately small for a test VM.
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -d elasticsearch:7.4.2
3.2 Deploy Kibana
docker pull kibana:7.4.2
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2
4. Basic CRUD Operations
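The operations in this section share one URL layout: index, then type, then document ID, with `_update` appended for partial updates. As a rough overview, the Python sketch below builds the method, URL, and body for each call without sending anything. The helper `crud_request` is hypothetical and exists only for illustration. The host is the one used in this article.

```python
import json

BASE_URL = "http://192.168.56.10:9200"  # host used throughout this article


def crud_request(action, index, doc_type=None, doc_id=None, body=None):
    """Return (method, url, json_body) for one of the CRUD calls in this section.

    Hypothetical helper: it only builds the request and does not send it.
    """
    methods = {"index": "PUT", "get": "GET", "update": "POST", "delete": "DELETE"}
    parts = [str(p) for p in (index, doc_type, doc_id) if p is not None]
    url = "/".join([BASE_URL] + parts)
    if action == "update":
        url += "/_update"
        body = {"doc": body}  # partial updates are wrapped in a "doc" object
    payload = json.dumps(body) if body is not None else None
    return methods[action], url, payload


print(crud_request("index", "member", "external", 1, {"name": "jay huang"}))
print(crud_request("get", "member", "external", 2))
print(crud_request("update", "member", "external", 2, {"name": "jay huang"}))
print(crud_request("delete", "member"))
```

The returned tuple can be handed to any HTTP client. Keeping request construction separate from sending makes the URL pattern easy to inspect.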
4.1 Index a Document
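Here member is the index, external is the mapping type, and 1 is the document ID. Mapping types are deprecated in Elasticsearch 7.x, where the typeless form is PUT member/_doc/1.
PUT member/external/1
{
  "name": "jay huang"
}
4.2 Get a Document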
GET 192.168.56.10:9200/member/external/2
4.3 Update a Document
POST 192.168.56.10:9200/member/external/2/_update
{
  "doc": {"name": "jay huang"}
}
4.4 Delete a Document or Index
DELETE 192.168.56.10:9200/member/external/2
DELETE 192.168.56.10:9200/member
5. Advanced Search
5.1 Query DSL Examples
Match all with sorting:
GET bank/_search
{
  "query": {"match_all": {}},
  "sort": [{"account_number": "asc"}]
}
The Query DSL also provides match_phrase, multi_match, and term queries, as well as bool queries that combine several clauses, including non-scoring filter clauses.
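As one illustration of combining clauses, the Python sketch below assembles a bool query body with a scoring `must` clause and a non-scoring `filter` clause. The helper `bool_query` is hypothetical, and the age range of 18 to 30 is an arbitrary value chosen for the example. The field names come from the bank index used in this section.

```python
import json


def bool_query(must=None, must_not=None, should=None, filters=None):
    """Assemble a bool query body, omitting empty clauses. Hypothetical helper."""
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filters}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"address": "mill"}}],                # contributes to the score
    filters=[{"range": {"age": {"gte": 18, "lte": 30}}}],  # restricts results, no scoring
)
body["sort"] = [{"account_number": "asc"}]

# The printed JSON can be pasted as the body of GET bank/_search.
print(json.dumps(body, indent=2))
```

Building the body as a dictionary keeps clauses composable: a caller can add or drop a filter without editing JSON text by hand.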
5.2 Aggregations
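Conceptually, a terms aggregation groups documents by the value of a field and counts each group, while an avg aggregation averages a numeric field over the matched documents. The Python sketch below mimics both over a few in-memory records. It is a toy model, not Elasticsearch's implementation, and the sample records and the function names `terms_agg` and `avg_agg` are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical records standing in for documents matched by a query.
accounts = [
    {"age": 38, "balance": 45000},
    {"age": 28, "balance": 19000},
    {"age": 38, "balance": 9000},
    {"age": 32, "balance": 25000},
]


def terms_agg(docs, field, size=10):
    """Count documents per distinct field value, largest buckets first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


def avg_agg(docs, field):
    """Average a numeric field across all documents."""
    return mean(doc[field] for doc in docs)


print(terms_agg(accounts, "age"))
print(avg_agg(accounts, "age"), avg_agg(accounts, "balance"))
```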
Example: for accounts whose address matches "mill", compute the age distribution, the average age, and the average balance:
GET bank/_search
{
  "query": {"match": {"address": "mill"}},
  "aggs": {
    "ageAggr": {"terms": {"field": "age", "size": 10}},
    "ageAvg": {"avg": {"field": "age"}},
    "balanceAvg": {"avg": {"field": "balance"}}
  },
  "size": 0
}
6. Chinese Tokenization
6.1 IK Analyzer Installation
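Download the plugin release that matches your Elasticsearch version (7.4.2 here), unzip it into the Elasticsearch plugins directory inside the container, and then restart the container. The container started in section 3.1 is named elasticsearch.
docker exec -it <container-name-or-id> /bin/bash
# Work inside the plugins directory of the official image
cd /usr/share/elasticsearch/plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip elasticsearch-analysis-ik-7.4.2.zip -d ./ik
# Remove the archive so that only plugin directories remain under plugins/
rm elasticsearch-analysis-ik-7.4.2.zip
chmod -R 777 ik/
exit
docker restart elasticsearch
6.2 Using IK Analyzer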
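Smart mode (ik_smart) produces the coarsest-grained split:
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "一颗小星星"
}
Max-word mode (ik_max_word) produces the finest-grained split, emitting every word it can find:
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "一颗小星星"
}
6.3 Custom Dictionary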
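Edit /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml and set the remote_ext_dict entry to the URL of a remote dictionary, e.g., http://192.168.56.10/ik/ik.txt, then restart Elasticsearch. This host path assumes the container's plugins directory is mounted at /mydata/elasticsearch/plugins. The dictionary is a UTF-8 text file with one word per line.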
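To see why a dictionary entry changes tokenization, consider the toy Python sketch below. It uses greedy forward maximum matching, which is a deliberately simple stand-in and not IK's actual segmentation algorithm. Both word sets are hypothetical, and the output says nothing about what IK itself returns for this text.

```python
def segment(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


base_words = {"一颗", "星星"}            # hypothetical built-in dictionary
custom_words = base_words | {"小星星"}   # hypothetical custom entry added

print(segment("一颗小星星", base_words))    # ['一颗', '小', '星星']
print(segment("一颗小星星", custom_words))  # ['一颗', '小星星']
```

Once the dictionary contains 小星星, the segmenter keeps it as one token instead of splitting it, which is the effect a custom dictionary is meant to achieve for domain-specific terms.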
7. Conclusion
The article covered Elasticsearch fundamentals, Docker deployment, CRUD operations, advanced query DSL, aggregations, and Chinese tokenization with IK Analyzer, providing a complete guide for building a searchable log analysis platform.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.