
Elasticsearch Fundamentals: Architecture, Indexing, Queries, Docker Setup, and Chinese Tokenization

This tutorial introduces Elasticsearch's core concepts, installation via Docker, index and document operations, query DSL, aggregations, and Chinese tokenization using the IK analyzer with custom dictionaries, providing step‑by‑step code examples for building a searchable log analysis stack.

Wukong Talks Architecture

Oct 9, 2020

1. Introduction

Many projects use Kibana to search logs in test or production environments. Kibana is one part of the ELK stack (Elasticsearch, Logstash, Kibana). This article explains what Elasticsearch is, how it is structured, and how to set up an environment with Docker.

1.1 What is Elasticsearch?

Elasticsearch is a distributed open‑source search and analytics engine built on Lucene, capable of handling text, numeric, geospatial, structured, and unstructured data.

1.2 Use Cases

Typical uses include product catalog search for e‑commerce, log collection and analysis with Logstash, and real‑time analytics.

2. Basic Concepts

2.1 Index

An Elasticsearch index is a logical collection of documents, loosely comparable to a database in a relational system. Each document is stored as JSON, and its text fields are indexed in an inverted index for fast full‑text search.

2.2 Inverted Index

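The inverted index maps each term to the documents that contain it, so a query looks terms up directly instead of scanning every document. The term statistics stored alongside also feed relevance scoring.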
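To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy, not how Lucene stores data: tokenization is a simple lowercase split on whitespace, and the three sample documents are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

# Made-up sample documents, keyed by document id
docs = {
    1: "red apple pie",
    2: "green apple",
    3: "red wine",
}
index = build_inverted_index(docs)
print(index["apple"])              # [1, 2]
print(search(index, "red apple"))  # [1]
```

Answering a query only touches the posting lists of the query terms, which is why lookups stay fast as the number of documents grows.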

2.3 Logstash

Logstash (the L in ELK) collects, parses, and enriches data before sending it to Elasticsearch.

2.4 Kibana

Kibana (the K in ELK) visualizes data stored in Elasticsearch, providing histograms, line charts, and dashboards.

3. Docker Environment Setup

3.1 Deploy Elasticsearch

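# Pull the Elasticsearch 7.4.2 image
docker pull elasticsearch:7.4.2

# Create host directories for config, data, and plugins.
# Wide-open permissions are acceptable only in a test environment.
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
mkdir -p /mydata/elasticsearch/plugins
chmod -R 777 /mydata/elasticsearch
# Accept HTTP connections on all network interfaces
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml

# 9200 serves the REST API; 9300 is the transport port used between nodes.
# The small heap (64m to 128m) only suits a test machine.
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2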
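
Once the container is up, it helps to confirm that the node answers before moving on. The sketch below is one way to do that from Python using only the standard library. It assumes the host address 192.168.56.10 used elsewhere in this article; the function names are illustrative, not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://192.168.56.10:9200"  # assumed host address; adjust to your machine

def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def is_usable(health):
    """Treat green and yellow as usable.

    A single-node cluster commonly reports yellow because replica shards
    have no second node to be assigned to.
    """
    return health.get("status") in ("green", "yellow")

def check_cluster(base_url=ES_URL):
    """Query the cluster health endpoint and report whether the node is usable."""
    health = fetch_json(base_url + "/_cluster/health")
    return health.get("status"), is_usable(health)
```

Calling `check_cluster()` after `docker run` returns a pair such as the status string and a boolean; a connection error usually means the container is still starting or the port mapping is wrong.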

3.2 Deploy Kibana

docker pull kibana:7.4.2

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2

4. Basic CRUD Operations

4.1 Index a Document

PUT member/external/1
{
  "name": "jay huang"
}

4.2 Get a Document

GET http://192.168.56.10:9200/member/external/1

4.3 Update a Document

POST http://192.168.56.10:9200/member/external/1/_update
{
  "doc": {"name": "jay huang"}
}

4.4 Delete a Document or Index

DELETE http://192.168.56.10:9200/member/external/1

DELETE http://192.168.56.10:9200/member
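
The same four operations can be issued from a script. The following is a minimal sketch using Python's standard library, assuming the host address from this article and the document with id 1 indexed in section 4.1. The helper names `es_request` and `send` are hypothetical. The paths keep the `external` mapping type used above; Elasticsearch 7.x still accepts typed paths but marks them as deprecated in favor of `_doc`.

```python
import json
import urllib.request

ES_URL = "http://192.168.56.10:9200"  # assumed host address; adjust to your machine

def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an Elasticsearch REST endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = f"{ES_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

def send(request, timeout=5):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)

# One request per CRUD operation from this section
index_req = es_request("PUT", "member/external/1", {"name": "jay huang"})
get_req = es_request("GET", "member/external/1")
update_req = es_request("POST", "member/external/1/_update", {"doc": {"name": "jay huang"}})
delete_req = es_request("DELETE", "member/external/1")

for req in (index_req, get_req, update_req, delete_req):
    print(req.get_method(), req.full_url)
```

Building the requests separately from sending them keeps the example inspectable without a running cluster; pass any of them to `send()` once Elasticsearch is reachable.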

5. Advanced Search

5.1 Query DSL Examples

Match all with sorting:

GET bank/_search
{
  "query": {"match_all": {}},
  "sort": [{"account_number": "asc"}]
}

The same request body accepts other query types, including match_phrase, multi_match, bool queries with filter clauses, and term queries.
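
As a rough illustration of those query types, the sketch below builds one request body for each as a Python dictionary and prints it as JSON that can be pasted after `GET bank/_search`. The `address`, `age`, and `balance` fields appear in this article; the `city` field and all search values are assumptions made for the example.

```python
import json

# match_phrase: the words must appear together, in this order
match_phrase = {"query": {"match_phrase": {"address": "mill road"}}}

# multi_match: run the same text against several fields ("city" is an assumed field)
multi_match = {"query": {"multi_match": {"query": "mill", "fields": ["address", "city"]}}}

# bool: combine clauses; filter clauses restrict results without affecting the score
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"address": "mill"}}],
            "must_not": [{"range": {"balance": {"lt": 1000}}}],
            "filter": [{"range": {"age": {"gte": 18, "lte": 30}}}],
        }
    }
}

# term: exact-value lookup, best suited to numbers and keyword fields
term_query = {"query": {"term": {"age": 28}}}

examples = {
    "match_phrase": match_phrase,
    "multi_match": multi_match,
    "bool": bool_query,
    "term": term_query,
}
for name, body in examples.items():
    print(f"# {name}")
    print("GET bank/_search")
    print(json.dumps(body, indent=2))
```

A term query is shown against the numeric `age` field because term lookups skip analysis, so they tend to miss on analyzed text fields such as `address`.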

5.2 Aggregations

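The following request finds accounts whose address contains "mill", then returns the age distribution, the average age, and the average balance. Setting "size" to 0 suppresses the matching documents so that only the aggregation results come back: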

GET bank/_search
{
  "query": {"match": {"address": "mill"}},
  "aggs": {
    "ageAggr": {"terms": {"field": "age", "size": 10}},
    "ageAvg": {"avg": {"field": "age"}},
    "balanceAvg": {"avg": {"field": "balance"}}
  },
  "size": 0
}
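
To show what these three aggregations compute, here is a small sketch that reproduces them locally in Python. The three sample documents are made up, and the result is a simplified version of the `aggregations` section of a real response (bookkeeping fields are omitted).

```python
from collections import Counter

# Made-up documents standing in for the accounts that matched "mill"
hits = [
    {"age": 28, "balance": 1000},
    {"age": 28, "balance": 3000},
    {"age": 34, "balance": 2000},
]

def terms_agg(docs, field, size=10):
    """Count documents per distinct value, most frequent first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

def avg_agg(docs, field):
    """Average a numeric field over all documents."""
    values = [doc[field] for doc in docs]
    return {"value": sum(values) / len(values)}

result = {
    "ageAggr": {"buckets": terms_agg(hits, "age")},
    "ageAvg": avg_agg(hits, "age"),
    "balanceAvg": avg_agg(hits, "balance"),
}
print(result["ageAggr"]["buckets"])  # [{'key': 28, 'doc_count': 2}, {'key': 34, 'doc_count': 1}]
print(result["ageAvg"]["value"])     # 30.0
print(result["balanceAvg"]["value"]) # 2000.0
```

The terms aggregation groups documents into buckets, while each avg aggregation reduces the whole result set to a single number; all three run over the same set of matched documents.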

6. Chinese Tokenization

6.1 IK Analyzer Installation

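Download the IK release that matches your Elasticsearch version exactly (7.4.2 here), unzip it into the Elasticsearch plugins directory, then restart the container so the plugin is loaded:

# Open a shell in the container (its working directory is /usr/share/elasticsearch)
docker exec -it elasticsearch /bin/bash
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
# Extract into plugins/ik so Elasticsearch picks the plugin up on restart
unzip elasticsearch-analysis-ik-7.4.2.zip -d plugins/ik
chmod -R 777 plugins/ik/
exit
docker restart elasticsearch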

6.2 Using IK Analyzer

Smart mode (ik_smart) produces the coarsest segmentation, splitting the text into as few tokens as possible:

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "一颗小星星"
}

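Max‑word mode (ik_max_word) produces the finest segmentation, emitting every word the dictionary can find in the text: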

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "一颗小星星"
}
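
The difference between a coarse and an exhaustive segmentation can be illustrated with a toy dictionary-based segmenter. This is only a sketch of the idea and not IK's actual algorithm; the six-word dictionary is hypothetical, and real IK output depends on its own dictionaries and disambiguation rules.

```python
TOY_DICT = {"一颗", "一", "颗", "小星星", "小星", "星星"}  # hypothetical mini dictionary

def coarse_segment(text, dictionary):
    """Greedy longest match from left to right; unknown characters become single tokens."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

def exhaustive_segment(text, dictionary):
    """Every dictionary word found at every position, longest first per position."""
    max_len = max(map(len, dictionary))
    tokens = []
    for i in range(len(text)):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary:
                tokens.append(piece)
    return tokens

text = "一颗小星星"
print(coarse_segment(text, TOY_DICT))      # ['一颗', '小星星']
print(exhaustive_segment(text, TOY_DICT))  # ['一颗', '一', '颗', '小星星', '小星', '星星']
```

Fewer, longer tokens tend to suit query-time analysis, while the exhaustive token set gives more ways to match at index time; compare the toy output with what the two `_analyze` requests return on your own cluster.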

6.3 Custom Dictionary

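To add custom words, edit /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml and set the remote_ext_dict entry to the URL of a remote dictionary, e.g., http://192.168.56.10/ik/ik.txt, then restart Elasticsearch. This host path assumes the container's plugins directory is mounted at /mydata/elasticsearch/plugins.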
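
The remote dictionary itself is a plain text file. As a sketch, the helper below writes one in the layout the IK README describes: UTF-8, one word per line. The function name, the sample words, and the temporary output location are all assumptions for illustration; in practice the file would be placed wherever your web server serves /ik/ik.txt.

```python
import tempfile
from pathlib import Path

def write_ik_dictionary(path, words):
    """Write words one per line as UTF-8, skipping blanks and duplicates in order."""
    seen, lines = set(), []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            lines.append(word)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines

# Hypothetical custom words; written to a temporary directory for this example
out = Path(tempfile.mkdtemp()) / "ik.txt"
written = write_ik_dictionary(out, ["悟空聊架构", "小星星 ", "", "悟空聊架构"])
print(written)  # ['悟空聊架构', '小星星']
```

After the file is published and Elasticsearch restarted, rerunning the `_analyze` requests from section 6.2 with a text containing one of the new words is a quick way to check that the dictionary was loaded.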

7. Conclusion

This article covered Elasticsearch fundamentals, Docker deployment, CRUD operations, the query DSL, aggregations, and Chinese tokenization with the IK analyzer. Together these pieces are enough to stand up a small searchable log analysis environment for testing.


