
How to Install and Use the IK Chinese Analyzer Plugin in Elasticsearch

This article explains why Elasticsearch's built‑in tokenizers struggle with Chinese text, introduces the IK analyzer plugin, provides step‑by‑step Docker and file‑based installation methods, shows how to configure custom dictionaries via Nginx, and demonstrates smart and max‑word tokenization queries.


Elasticsearch's built‑in tokenizers do not handle Chinese well, so searching Chinese terms such as “悟空哥” fails.

1. Tokenizer principles in Elasticsearch

1.1 Tokenizer concept

A tokenizer receives a character stream and splits it into tokens, which can be combined into custom analyzers.
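For example, a custom analyzer combining the standard tokenizer with a lowercase token filter can be declared in the index settings (the index and analyzer names here are illustrative, not from this article):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}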

1.2 Standard tokenizer

The standard tokenizer splits text on word boundaries (per the Unicode Text Segmentation algorithm) rather than only on whitespace, discards most punctuation, and records each token's position and offsets, which is useful for phrase queries and highlighting.

1.3 English and punctuation example

POST _analyze
{
  "analyzer": "standard",
  "text": "Do you know why I want to study ELK? 2 3 33..."
}

Result:

do, you, know, why, i, want, to, study, elk, 2, 3, 33

1.4 Chinese tokenization example

POST _analyze
{
  "analyzer": "standard",
  "text": "悟空聊架构"
}

The standard tokenizer splits each Chinese character separately, producing 悟, 空, 聊, 架, 构 instead of the desired words.

2. Installing the IK Chinese analyzer plugin

2.1 Plugin source

https://github.com/medcl/elasticsearch-analysis-ik/releases

Match the plugin version to your Elasticsearch version (7.4.2 in this article). You can check the version by querying the cluster root endpoint (GET /):

{
  "name" : "8448ec5f3312",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "xC72O3nKSjWavYZ-EPt9Gw",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

2.2 Installation methods

2.2.1 Inside the Elasticsearch container

Enter the container:

docker exec -it elasticsearch /bin/bash

Download the plugin zip:

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip

Unzip it into the plugins directory (/usr/share/elasticsearch/plugins inside the container) and clean up:

unzip elasticsearch-analysis-ik-7.4.2.zip -d ./ik
chmod -R 777 ik/
rm -rf *.zip

2.2.2 Via a mapping directory

Copy the zip to the plugins folder:

cd /mydata/elasticsearch/plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip elasticsearch-analysis-ik-7.4.2.zip -d ./ik
rm -rf *.zip
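This method assumes the host directory was bind-mounted to the container's plugin path when Elasticsearch was started; a typical run command for such a setup (host paths are illustrative) looks like:

docker run -p 9200:9200 -p 9300:9300 --name elasticsearch \
  -e "discovery.type=single-node" \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -d elasticsearch:7.4.2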

2.2.3 Upload with Xftp

Use XShell/Xftp to copy the zip into the container, then unzip as above.

3. Verifying the installation

docker exec -it elasticsearch /bin/bash
elasticsearch-plugin list

The command should list ik, confirming the plugin is installed. Restart the container to load it:

exit
docker restart elasticsearch

4. Using the IK analyzer

The plugin provides two analyzers: ik_smart (coarse-grained, intelligent segmentation) and ik_max_word (fine-grained, exhaustive segmentation).

Smart mode example

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "一颗小星星"
}

Result: “一颗”, “小星星”.

Max‑word mode example

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "一颗小星星"
}

Result: “一颗”, “一”, “颗”, “小星星”, “小星”, “星星”.
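A common pattern when applying IK to an index (the index and field names here are illustrative, not from this article) is to index text with ik_max_word for maximum recall and analyze queries with ik_smart:

PUT /article
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}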

5. Custom dictionary

To keep terms like “悟空哥” intact, add them to a custom dictionary and reference it in IKAnalyzer.cfg.xml (path: /usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml ).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- local extension stop-word dictionary -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- remote extension dictionary -->
    <entry key="remote_ext_dict">location</entry>
    <!-- remote extension stop-word dictionary -->
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
Place a file (e.g., ik.txt) on a remote Nginx server and set remote_ext_dict to its URL.

Deploying Nginx for remote dictionary

docker run -p 80:80 --name nginx -d nginx:1.10
docker container cp nginx:/etc/nginx ./conf
mkdir nginx
mv conf nginx/
docker stop nginx
docker rm nginx
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10
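With Nginx running, the dictionary file can be created under the mounted html directory and verified over HTTP (a sketch; the file must be saved as UTF-8, and the IP is the example host used below):

mkdir -p /mydata/nginx/html/ik
echo "悟空哥" > /mydata/nginx/html/ik/ik.txt
curl http://192.168.56.10/ik/ik.txt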

Create ik.txt containing “悟空哥” and make it accessible at http://192.168.56.10/ik/ik.txt. After updating IKAnalyzer.cfg.xml to point to this URL, restart Elasticsearch:

docker restart elasticsearch
docker update elasticsearch --restart=always

Now a query for “悟空哥聊架构” yields the three tokens “悟空哥”, “聊”, “架构”.
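This can be checked with the same _analyze API used earlier, for example with ik_smart:

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "悟空哥聊架构"
}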

- END -

Tags: docker, Elasticsearch, nginx, Chinese Tokenization, Custom Dictionary, IK Analyzer
Written by Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
