Understanding X‑Pack Machine Learning in Elasticsearch: Features, Architecture, and Implementation

This article explains the machine‑learning capabilities of Elasticsearch X‑Pack. It covers supervised and unsupervised learning concepts, data preparation, job types, architecture components, data flow, and result indices, and includes examples of configuring and running ML jobs.

Tencent Database Technology

Sep 26, 2019

Elasticsearch Cloud Service (CES) now includes commercial features such as RBAC, machine learning (ML), cross‑cluster replication, and APM. To help users leverage these capabilities, the Tencent Cloud ES kernel team introduces the X‑Pack ML feature from functional, architectural, and source‑code perspectives.

Basic Concepts

Machine learning is divided into supervised learning (training on labeled data) and unsupervised learning (discovering patterns without labels). X‑Pack ML primarily uses unsupervised techniques for time‑series anomaly detection and forecasting, which avoids the need for pre‑labeling data.

Feature Overview

The ML module provides an animated demo that shows a single‑metric task analyzing a time‑series dataset. The original data line is shown in blue, the predicted normal range as a shaded area, and anomalies highlighted in yellow/red.

Data Preparation

Example data (server‑metrics) contains about 90 000 records (~80 MB). It can be downloaded from the official Elastic site. A snippet of the JSON document looks like:

{
  "_index": "server-metrics",
  "_type": "metric",
  "_id": "272961",
  "_score": 1.0,
  "_source": {
    "@timestamp": "2017-04-01T12:42:00",
    "accept": 11922,
    "deny": 1923,
    "host": "server_1",
    "response": 2.4755486712,
    "service": "app_4",
    "total": 13845
  }
}
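To load documents like this into a test cluster, the `_bulk` endpoint takes newline-delimited JSON with one action line followed by one source line per document. The following is a minimal sketch that converts the hit above into that shape. The helper name `to_bulk_lines` is hypothetical, and leaving out `_type` is an assumption that suits 7.x clusters, where mapping types are deprecated.

```python
import json

# Search hit copied from the snippet above.
hit = {
    "_index": "server-metrics",
    "_type": "metric",
    "_id": "272961",
    "_score": 1.0,
    "_source": {
        "@timestamp": "2017-04-01T12:42:00",
        "accept": 11922,
        "deny": 1923,
        "host": "server_1",
        "response": 2.4755486712,
        "service": "app_4",
        "total": 13845,
    },
}


def to_bulk_lines(hit):
    """Return the two NDJSON lines (action, source) for one document."""
    # Assumption: _type is omitted, as on 7.x clusters.
    action = {"index": {"_index": hit["_index"], "_id": hit["_id"]}}
    return [json.dumps(action), json.dumps(hit["_source"])]


# The _bulk endpoint expects newline-delimited JSON that ends with a newline.
bulk_body = "\n".join(to_bulk_lines(hit)) + "\n"
print(bulk_body)
```

The resulting string can be sent as the body of a `POST /_bulk` request with the `application/x-ndjson` content type.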

Creating ML Jobs

In Kibana, navigate to *Job Management* → *Create New Job*. Choose the index‑pattern and select a job type: Single metric, Multi‑metric, Population, or Advanced. Each type differs in the number of metrics and the ability to add influencers (dimensions such as host or application).

For a single‑metric job analyzing average response time, you must specify:

Aggregation (e.g., mean)

Field (e.g., response)

Bucket span (time window, typically 5 min to 1 h)

After configuring the job ID, description, and optional output index, click *Create Job* to start the analysis.
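The same single‑metric job can also be expressed as a request body for the ML job API rather than through the Kibana wizard. The sketch below only builds and prints the request. The job ID and the 15‑minute bucket span are illustrative assumptions, and the `/_ml/...` path assumes a 7.x cluster (6.x releases used the `/_xpack/ml/...` prefix).

```python
import json

JOB_ID = "response-time-demo"  # hypothetical job ID

# Mirrors the wizard fields: aggregation, field, and bucket span.
job_config = {
    "description": "Mean response time on server-metrics",
    "analysis_config": {
        "bucket_span": "15m",  # assumed value within the 5 min to 1 h range
        "detectors": [
            {"function": "mean", "field_name": "response"}
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

path = f"/_ml/anomaly_detectors/{JOB_ID}"
print("PUT", path)
print(json.dumps(job_config, indent=2))
```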

Viewing Results

The *Single Metric Viewer* shows the model plot with normal bounds and detected anomalies. The *Anomaly Explorer* provides a table sorted by severity, showing actual vs. typical values, probability, and top influencers. Forecasting can be performed by clicking *Forecast* and specifying a future horizon.
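The severity‑sorted table can be imitated with a few lines of code. This is a sketch only: the three records are invented for illustration, and the 25/50/75 score bands are an assumption about how Kibana colors anomalies by default, not something stated in the original article.

```python
def severity(score):
    """Map a 0-100 anomaly score to a severity label (assumed bands)."""
    if score >= 75:
        return "critical"
    if score >= 50:
        return "major"
    if score >= 25:
        return "minor"
    return "warning"


# Hypothetical anomaly records, not real job output.
records = [
    {"timestamp": "2017-04-01T12:45:00", "record_score": 12.3, "actual": 2.9, "typical": 2.5},
    {"timestamp": "2017-04-02T03:15:00", "record_score": 91.0, "actual": 9.8, "typical": 2.4},
    {"timestamp": "2017-04-02T18:30:00", "record_score": 54.6, "actual": 5.1, "typical": 2.6},
]

# Highest score first, as in the Anomaly Explorer table.
table = sorted(records, key=lambda r: r["record_score"], reverse=True)
for r in table:
    print(f'{r["timestamp"]}  {severity(r["record_score"]):8}  '
          f'actual={r["actual"]}  typical={r["typical"]}')
```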

Architecture Analysis

Key components of the ML module include:

Jobs : Store configuration and metadata in the .ml-config index; have lifecycle states (OPENING, OPENED, CLOSING, CLOSED, FAILED).

Datafeeds : Pull data from Elasticsearch indices or external sources and feed it to detectors via named pipes.

Detectors : C++ processes that perform real‑time modeling; communicate with ES through pipes for input and output.
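To make the relationship between jobs and datafeeds concrete, the sketch below lists the REST calls that would attach a datafeed to a job and start it. It prints the requests instead of sending them. The IDs are hypothetical, and the `/_ml/...` paths assume a 7.x cluster.

```python
import json

JOB_ID = "response-time-demo"       # hypothetical, must match an existing job
DATAFEED_ID = f"datafeed-{JOB_ID}"  # hypothetical naming convention

# A datafeed names its job, the source indices, and a query to select documents.
datafeed_config = {
    "job_id": JOB_ID,
    "indices": ["server-metrics"],
    "query": {"match_all": {}},
}

# The job has to be opened before its datafeed can be started.
requests_to_send = [
    ("PUT", f"/_ml/datafeeds/{DATAFEED_ID}", datafeed_config),
    ("POST", f"/_ml/anomaly_detectors/{JOB_ID}/_open", None),
    ("POST", f"/_ml/datafeeds/{DATAFEED_ID}/_start", None),
]

for method, path, body in requests_to_send:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```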

Data flow consists of a *data‑send* stream (Datafeeds → detector) and a *result‑output* stream (detector → result indices). The detector processes are launched with a command similar to:

./autodetect --jobid=test1 --bucketspan=900 --summarycountfield=doc_count \
  --lengthEncodedInput --maxAnomalyRecords=500 --timefield=@timestamp \
  --persistInterval=10000 --maxQuantileInterval=20000 \
  --limitconfig=C:\Users\DANIEL~1\AppData\Local\Temp\limitconfig.conf \
  --modelplotconfig=C:\Users\DANIEL~1\AppData\Local\Temp\modelplotconfig.conf \
  --fieldconfig=C:\Users\DANIEL~1\AppData\Local\Temp\fieldconfig.conf \
  --logPipe=.\pipe\autodetect_test1_log --input=.\pipe\autodetect_test1_input \
  --inputIsPipe --output=.\pipe\autodetect_test1_output --outputIsPipe \
  --persist=.\pipe\autodetect_test1_persist --persistIsPipe
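The `--bucketspan=900` flag in the command above can be read as a 15‑minute bucket, assuming the value is in seconds. The sketch below shows how a timestamp would fall into such a bucket. The function name `bucket_start` is hypothetical, and treating the timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone

BUCKET_SPAN_SECONDS = 900  # value of --bucketspan, assumed to be in seconds


def bucket_start(ts_iso, span=BUCKET_SPAN_SECONDS):
    """Return the start of the bucket containing ts_iso, as a UTC datetime."""
    ts = datetime.fromisoformat(ts_iso).replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    # Round down to the nearest multiple of the bucket span.
    return datetime.fromtimestamp(epoch - epoch % span, tz=timezone.utc)


# The sample document's timestamp lands in the 12:30 bucket.
print(bucket_start("2017-04-01T12:42:00").isoformat())
```

All documents whose timestamps round down to the same bucket start are analyzed together, which is why the bucket span controls the granularity of the results.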

Result indices store different aspects of the analysis:

.ml-config: job configuration metadata.

.ml-anomalies-*: model bounds and anomaly scores per bucket.

.ml-state: compressed model snapshots for reuse.

.ml-notifications: lifecycle events and logs.

.ml-annotations-*: user‑added notes on anomalies.
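Because anomaly results live in ordinary indices, they can be read with a regular search. The sketch below builds a query body for the highest‑scoring anomaly records of one job. The job ID, score threshold, and function name `records_query` are illustrative assumptions; the field names `job_id`, `result_type`, and `record_score` should be checked against the mapping on your cluster version.

```python
import json

JOB_ID = "response-time-demo"  # hypothetical job ID


def records_query(job_id, min_score=75.0, size=10):
    """Build a search body for the top anomaly records of a job."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": job_id}},
                    {"term": {"result_type": "record"}},
                    {"range": {"record_score": {"gte": min_score}}},
                ]
            }
        },
        # Most severe anomalies first.
        "sort": [{"record_score": {"order": "desc"}}],
    }


body = records_query(JOB_ID)
print("GET /.ml-anomalies-*/_search")
print(json.dumps(body, indent=2))
```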

Source‑Code Insight

The article briefly outlines the flow of data through the C++ detector processes, emphasizing the continuous loop that reads from the input pipe, performs modeling, and writes results back to ES via the output pipe.
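The read, model, write loop can be illustrated with a toy stand‑in. The sketch below is not the X‑Pack algorithm: it uses a running mean and standard deviation (Welford's method) with a simple z‑score threshold, and in‑memory streams take the place of the named pipes. All names and values are hypothetical.

```python
import io
import json
import math


def run_detector(input_stream, output_stream, threshold=3.0):
    """Read JSON lines, score each value, and write one result per record."""
    n, mean, m2 = 0, 0.0, 0.0
    for line in input_stream:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line)["response"]
        # Score against the model built from earlier records only.
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        z = abs(value - mean) / std if std > 0 else 0.0
        result = {"value": value, "anomaly": z > threshold}
        output_stream.write(json.dumps(result) + "\n")
        # Welford's online update of mean and sum of squared deviations.
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)


# Invented response times; the last one is an obvious outlier.
values = [2.4, 2.5, 2.6, 2.5, 2.4, 9.8]
inp = io.StringIO("".join(json.dumps({"response": v}) + "\n" for v in values))
out = io.StringIO()
run_detector(inp, out)
results = [json.loads(l) for l in out.getvalue().splitlines()]
print(results[-1])
```

The point of the sketch is the shape of the loop: each record is scored against the current model and then folded into it, so the model keeps adapting as data streams in.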

Conclusion

By covering functional demos, architectural components, and source‑code snippets, this guide provides a comprehensive overview of Elasticsearch X‑Pack’s machine‑learning feature, helping users understand how to configure, run, and interpret ML jobs for time‑series anomaly detection.
