5 Correlation Analysis Models Every Security Engineer Should Know
This article surveys five primary correlation analysis models—rule‑based, statistical, threat‑intelligence‑based, context‑based, and big‑data‑driven—covering their principles and typical rule types (single‑log alerts, event‑count thresholds, multi‑value detections, and temporal sequences), and explains why accurate log parsing underpins effective security analytics.
Introduction
In many security‑analysis products—log analysis, SOC, situational awareness, risk control—correlation analysis is a core capability. Yet different customers often mean different things when they ask for "correlation analysis," and vendors rarely disclose implementation details.
Overview
Products often advertise numerous built‑in rules (e.g., host password guessing, database password guessing, network device password guessing), which are essentially models. Evaluating a product should consider not only the number of built‑in rules but also the support for correlation rule models. Accurate and comprehensive log parsing is essential for effective analysis.
The five major categories of correlation analysis models are:
Rule‑based correlation analysis
Statistical correlation analysis
Threat‑intelligence‑based correlation analysis
Context‑based correlation analysis
Big‑data‑driven correlation analysis
1. Rule‑Based Correlation Analysis
Rule‑based analysis uses pre‑defined or user‑defined rules to match normalized security events. When events match a rule within a time window, an alert is generated. This approach models attacker behavior by combining relevant log fields.
1.1 Single‑Log Rule
A simple example is a Linux login log:
<code>May 22 17:13:01 10-9-83-151 sshd[17422]: Accepted password for secisland from 129.74.226.122 port 64485 ssh2</code>
From this log we can extract the time, hostname, process, event type, user, source IP, and port, and derive asset, account, and geographic information. Various alerts can be defined, such as:
Non‑working‑hour login
Detect logins outside normal business hours.
Non‑working‑location login
Detect logins from unusual locations.
Bastion‑host bypass login
Detect logins that did not go through the bastion host.
Privilege‑escalation login
Detect logins where the account is not authorized for the target.
Foreign login
Detect logins from non‑domestic IP addresses.
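The alerts above all start from the same parsed fields. As a minimal sketch, the following parses the sshd log sample into named fields and checks one of the rules; the field names and the 09:00–18:00 business window are illustrative assumptions, not a product's actual schema.

```python
import re

# Regex for an sshd "Accepted password" line; field names are assumptions.
LOGIN_RE = re.compile(
    r"(?P<month>\w{3}) +(?P<day>\d+) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) sshd\[\d+\]: Accepted password for (?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<port>\d+)"
)

def parse_login(line):
    """Return a dict of extracted fields, or None if the line doesn't match."""
    m = LOGIN_RE.search(line)
    return m.groupdict() if m else None

def is_non_working_hour(event, start=9, end=18):
    """Flag logins outside an assumed 09:00-18:00 business window."""
    hour = int(event["time"].split(":")[0])
    return not (start <= hour < end)

line = ("May 22 17:13:01 10-9-83-151 sshd[17422]: Accepted password "
        "for secisland from 129.74.226.122 port 64485 ssh2")
event = parse_login(line)
```

The other single‑log rules follow the same shape: each is a predicate over the parsed fields plus enrichment data (asset owner, geolocation, bastion‑host list).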
1.2 Event‑Count Alert
Some attacks only become visible across multiple events in a short period: a password‑guessing alert, for example, fires when the number of failed logins exceeds a threshold within a few minutes. The model therefore has three dimensions: time window, threshold, and matching condition.
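A minimal sliding‑window sketch of such a rule, with illustrative parameters (5 failures in 60 seconds) rather than any product's defaults:

```python
from collections import deque

class EventCountRule:
    """Fire when more than `threshold` matching events arrive from one
    source within `window_seconds` (parameters are assumptions)."""

    def __init__(self, threshold=5, window_seconds=60):
        self.threshold = threshold
        self.window = window_seconds
        self.times = {}          # source IP -> deque of event timestamps

    def feed(self, src_ip, ts):
        """Record one failed login; return True if the rule fires."""
        q = self.times.setdefault(src_ip, deque())
        q.append(ts)
        while q and ts - q[0] > self.window:
            q.popleft()          # drop events that fell out of the window
        return len(q) > self.threshold
```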
1.3 Multi‑Value Event‑Count Alert
For scenarios like port scanning, what matters is the number of distinct values—here, ports—accessed by one source. The model adds a "distinct value" dimension to the time‑window, threshold, and condition dimensions.
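The distinct‑value dimension changes only the counting step: instead of counting events, the rule counts unique field values in the window. A sketch with assumed parameters:

```python
class DistinctValueRule:
    """Fire when one source touches more than `distinct_threshold`
    *different* ports within the window (parameters are assumptions)."""

    def __init__(self, distinct_threshold=100, window_seconds=10):
        self.threshold = distinct_threshold
        self.window = window_seconds
        self.events = {}         # source IP -> list of (timestamp, port)

    def feed(self, src_ip, ts, port):
        evs = self.events.setdefault(src_ip, [])
        evs.append((ts, port))
        # keep only events still inside the time window
        evs = [(t, p) for t, p in evs if ts - t <= self.window]
        self.events[src_ip] = evs
        # count distinct ports, not raw events
        return len({p for _, p in evs}) > self.threshold
```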
1.4 Temporal Alert
Complex attacks may involve a sequence of events (e.g., uploading a script then downloading a sensitive file). Temporal models capture ordered event chains similar to a kill‑chain.
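An ordered chain can be checked by walking a time‑sorted event stream and advancing through the expected sequence. This is a deliberately simplified sketch (event‑type names are assumptions, and it does not restart a broken chain):

```python
def sequence_matched(events, pattern, window_seconds):
    """Return True if the event types in `pattern` occur in order within
    `window_seconds`. `events` is a time-sorted list of (timestamp, type)."""
    idx, start = 0, None
    for ts, etype in events:
        if etype != pattern[idx]:
            continue
        if idx == 0:
            start = ts           # chain begins at the first matching event
        elif ts - start > window_seconds:
            continue             # match arrived too late; sketch ignores it
        idx += 1
        if idx == len(pattern):
            return True
    return False
```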
2. Statistical Correlation Analysis
Statistical models compute dynamic baselines from historical data and flag deviations, such as traffic spikes or DDoS patterns. They are widely used for anomaly detection in user behavior, access patterns, and downloads.
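The simplest baseline of this kind is a mean plus standard‑deviation band. A minimal sketch, assuming a three‑sigma cutoff (the choice of k is illustrative):

```python
import statistics

def is_anomalous(history, value, k=3.0):
    """Flag `value` if it deviates from the historical mean by more than
    k standard deviations. `history` is a list of past observations."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean     # flat history: any change is a deviation
    return abs(value - mean) > k * stdev
```

Production systems typically use seasonally aware baselines (per hour of day, per day of week) rather than a single global mean, but the deviation test is the same idea.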
3. Threat‑Intelligence‑Based Correlation Analysis
Integrating threat‑intelligence feeds (IP reputation, domain reputation, URL reputation, file reputation, C&C reputation, etc.) enhances detection accuracy by correlating local alerts with external intelligence, filtering noise, and enabling rapid response and attribution.
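At its core this correlation is a lookup of local indicators against a reputation feed. The feed contents and field names below are illustrative assumptions; a real deployment would query a commercial or open threat‑intelligence service:

```python
# Hypothetical in-memory reputation feed keyed by IP.
IP_REPUTATION = {
    "203.0.113.7": {"category": "C&C", "confidence": 90},
}

def enrich_alert(alert, feed=IP_REPUTATION):
    """Attach intelligence to an alert and raise its priority on a hit."""
    intel = feed.get(alert.get("src_ip"))
    if intel:
        return {**alert, "intel": intel, "priority": "high"}
    return {**alert, "intel": None, "priority": alert.get("priority", "low")}
```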
4. Context‑Based Correlation Analysis
This approach enriches events with asset, vulnerability, and topology information, linking alerts to the actual environment. It builds on existing models but adds dynamic context, increasing analysis difficulty when contextual data is missing.
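One common form of this enrichment is scoring an alert against the target asset's inventory: an exploit attempt matters far more when the asset is actually vulnerable. The asset records and CVE identifiers here are illustrative assumptions:

```python
# Hypothetical asset inventory keyed by IP.
ASSETS = {
    "10.9.83.151": {"os": "linux", "vulns": {"CVE-2021-4034"}, "critical": True},
}

def score_alert(alert, assets=ASSETS):
    """Rate an exploit alert using asset and vulnerability context."""
    asset = assets.get(alert["dst_ip"])
    if asset is None:
        return "unknown-asset"   # missing context limits the analysis
    if alert.get("cve") in asset["vulns"]:
        return "critical" if asset["critical"] else "high"
    return "low"                 # exploit does not apply to this asset
```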
5. Big‑Data‑Driven Correlation Analysis
Leveraging big‑data platforms enables storage, retrieval, and aggregation of massive security event streams, allowing analyses that were previously infeasible due to volume. Traditional models are applied on top of big‑data infrastructure.
Conclusion
The discussed models provide a framework for evaluating the flexibility of correlation analysis in security products. Accurate log parsing remains a prerequisite. Emerging models such as predictive or machine‑learning‑based correlation are mentioned but not covered in depth.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.