Boosting Log Anomaly Detection with NLP and Deep Learning

This article presents a log anomaly detection approach that leverages NLP techniques such as Part‑of‑Speech tagging and Named Entity Recognition combined with deep neural networks, detailing a six‑step model, experimental validation on three datasets, and superior performance compared with existing DeepLog and LogClass methods.

JD Cloud Developers

Feb 8, 2023

1. Introduction

Log files are ubiquitous in the IT industry, and detecting anomalies in them is crucial for monitoring system health. Traditional rule‑based, supervised methods require complex rules and substantial manual effort. We propose a log anomaly detection model based on natural language processing (NLP).

2. Log Anomaly Detection

To improve the quality of log template vectors, we enhance feature extraction with Part‑of‑Speech (PoS) tagging and Named Entity Recognition (NER), which reduces the dependence on hand‑written rules. The PoS attribute of each word is analyzed to lower manual labeling costs and improve weight allocation, and NER‑derived weight vectors are then used to adjust the template vectors. A deep neural network (DNN) performs detection on the corrected template vectors. Experiments on three datasets show higher accuracy than two state‑of‑the‑art models, DeepLog and LogClass.

3. Log Anomaly Detection Model

Our model consists of six steps: template parsing, PoS analysis, initial vector construction, NER‑based weight calculation, composite vector generation, and final anomaly detection (see Figure 1).

Step 1: Template Parsing

Raw logs are semi‑structured text containing variable parts. We use FT‑Tree to extract constant templates while discarding irrelevant tokens, avoiding complex rule‑based filters.
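To make the idea of separating constant template words from variable parts concrete, here is a minimal Python sketch. It is not FT‑Tree itself: it only applies a simple frequency filter, keeping tokens that appear in a large share of the log lines, and the function name `extract_templates` and the `min_support` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter

def extract_templates(log_lines, min_support=0.5):
    """Keep, per line, only tokens that occur in at least min_support of all lines.

    Simplified stand-in for template parsing (an assumption, not FT-Tree).
    """
    tokenized = [line.split() for line in log_lines]
    # Document frequency: in how many lines does each token appear?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    threshold = min_support * len(tokenized)
    return [
        " ".join(tok for tok in toks if doc_freq[tok] >= threshold)
        for toks in tokenized
    ]

# Toy log lines invented for this example.
logs = [
    "Received block blk_1 of size 512 from 10.0.0.1",
    "Received block blk_2 of size 1024 from 10.0.0.2",
    "Received block blk_3 of size 2048 from 10.0.0.3",
]
templates = extract_templates(logs)
print(templates[0])
```

Block IDs, sizes, and IP addresses each occur only once, so they fall below the threshold and are dropped, while the constant words survive. A single global threshold would not be enough for a mixed stream of many log types; it is used here only to keep the sketch short.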

Step 2: PoS Analysis

Parsed templates contain words with PoS tags (e.g., VB, NN). Unimportant PoS tags are removed, keeping only words that help the model understand the template.
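The filtering in this step can be sketched in a few lines of Python. The example assumes the words have already been tagged (the tags below are written by hand, not produced by a tagger), and the set of tags to keep, `KEEP_PREFIXES`, is a hypothetical choice for illustration rather than the list used in the paper.

```python
# Hypothetical set of Penn Treebank tag prefixes to keep (an assumption).
KEEP_PREFIXES = ("NN", "VB", "JJ")

def filter_by_pos(tagged_words, keep_prefixes=KEEP_PREFIXES):
    """Return the words whose PoS tag starts with one of keep_prefixes."""
    return [word for word, tag in tagged_words if tag.startswith(keep_prefixes)]

# Hand-written (word, tag) pairs; in practice these would come from a PoS tagger.
tagged = [
    ("Received", "VBD"),
    ("block", "NN"),
    ("of", "IN"),
    ("size", "NN"),
    ("from", "IN"),
]
kept = filter_by_pos(tagged)
print(kept)
```

Matching on tag prefixes keeps all variants of a class (for example NN, NNS, and NNP for nouns) without listing each one.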

Step 3: Initial Vector Construction

Word2vec encodes each template into an initial numeric vector, preserving semantic information.

Step 4: Weight Analysis

The PoS‑filtered words are then weighted using NER. A CRF model identifies the important entities, which receive a higher weight (e.g., 2.0); the resulting per‑word weights form the weight vector W.
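As a rough illustration of how a weight vector W could be assembled, the sketch below maps per‑word entity decisions to weights. The entity flags are written by hand as a stand‑in for the CRF output, and the function name `build_weight_vector` and the default weight of 1.0 for non‑entity words are assumptions; only the example weight of 2.0 comes from the text.

```python
def build_weight_vector(words, entity_flags, entity_weight=2.0, default_weight=1.0):
    """Assign entity_weight to words flagged as entities, default_weight otherwise."""
    if len(words) != len(entity_flags):
        raise ValueError("words and entity_flags must have the same length")
    return [entity_weight if flag else default_weight for flag in entity_flags]

words = ["Received", "block", "size"]
# Hand-written flags standing in for the output of an NER model (hypothetical).
entity_flags = [False, True, True]
W = build_weight_vector(words, entity_flags)
print(W)
```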

Step 5: Composite Vector

The initial vector V′ is multiplied by the weight vector W to obtain a composite optimized vector V, emphasizing important template words.
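One possible reading of this multiplication is sketched below in Python: V′ is treated as a list of per‑word vectors, each word vector is scaled by its weight from W, and the scaled vectors are summed into a single template vector. The text does not spell out the exact form, so this interpretation, the function name `composite_vector`, and the toy two‑dimensional vectors are all assumptions.

```python
def composite_vector(word_vectors, weights):
    """Weighted sum of per-word vectors: V[j] = sum_i weights[i] * word_vectors[i][j]."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector, and at least one word")
    dim = len(word_vectors[0])
    composite = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for j in range(dim):
            composite[j] += w * vec[j]
    return composite

# Toy 2-dimensional word vectors; real ones would come from a trained word2vec model.
v_prime = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, 2.0, 2.0]
v = composite_vector(v_prime, w)
print(v)
```

Words with a weight of 2.0 contribute twice as much to the result as words with a weight of 1.0, which is how the important words end up emphasized.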

Step 6: Anomaly Detection

The composite vector V is fed into a fully connected layer that outputs 0 (normal) or 1 (anomalous).
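The forward pass of such a classifier can be sketched in plain Python as a single fully connected layer with a sigmoid activation and a decision threshold. The weights and bias below are picked by hand and not trained, and the names `dense_sigmoid` and `predict` and the 0.5 threshold are assumptions made for the example; the DNN described in the article may be deeper.

```python
import math

def dense_sigmoid(vector, weights, bias):
    """One fully connected unit: sigmoid(weights . vector + bias)."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict(vector, weights, bias, threshold=0.5):
    """Return 1 (anomalous) if the sigmoid output reaches threshold, else 0 (normal)."""
    return 1 if dense_sigmoid(vector, weights, bias) >= threshold else 0

# Hand-picked, untrained parameters for illustration only.
layer_weights = [1.0, -1.0]
layer_bias = 0.0

label_a = predict([3.0, 4.0], layer_weights, layer_bias)  # z = -1.0
label_b = predict([4.0, 1.0], layer_weights, layer_bias)  # z = 3.0
print(label_a, label_b)
```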

4. Model Evaluation

We evaluated the model on three datasets (HDFS, BGL, and an internal dataset A) and compared it with DeepLog and LogClass. Our model achieved the highest F1 scores (0.981 on HDFS, 0.986 on BGL, and best performance on dataset A) and the best recall, indicating lower uncertainty.
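For readers less familiar with the metrics, the following Python sketch shows how precision, recall, and F1 are computed from binary labels, with 1 meaning anomalous. The label lists are toy values invented for the example and are unrelated to the reported results.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (anomalous = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Recall measures the share of true anomalies that were caught, so a higher recall means fewer missed anomalies; F1 is the harmonic mean of precision and recall.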

References

Natural Language Processing‑based Model for Log Anomaly Detection. SEAI.

IEEE Xplore: https://ieeexplore.ieee.org/abstract/document/9680175
