
Master ELK Log Processing: Tips, Multiline Tricks & Performance Hacks

This guide shares hard‑won ELK insights, covering codec conversion, log line removal, Grok pattern handling, multiline aggregation in both Filebeat and Logstash, date filtering, log type classification, performance tuning, JVM and Elasticsearch settings, and practical recommendations for efficient log processing pipelines.

1. ELK Practical Knowledge Points

1) Encoding conversion

Chinese garbled characters can be fixed by converting GB2312 to UTF‑8.

<code>codec => plain {
  charset => "GB2312"
}</code>
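For context, the codec belongs on a Logstash input; a minimal sketch (the file path is hypothetical):

<code>input {
  file {
    path => ["/var/log/app/performanceTrace.txt"]  # hypothetical path
    codec => plain {
      charset => "GB2312"  # decode GB2312 bytes to UTF-8 internally
    }
  }
}</code>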

Filebeat can also perform the conversion (recommended):

<code>filebeat.prospectors:
- input_type: log
  paths:
    - C:\Users\Administrator\Desktop\performanceTrace.txt
  encoding: GB2312</code>

2) Delete extra log lines

Use the drop filter in Logstash to remove unwanted lines.

<code>if [message] =~ "^20.*- task request,.*,start time.*" {
  # drop the matched line
  drop {}
}</code>

Example log snippet:

<code>2018-03-20 10:44:01,523 [33]DEBUG Debug - task request,task Id:1cbb72f1-a5ea-4e73-957c-6d20e9e12a7a,start time:2018-03-20 10:43:59 #need to delete line
-- Request String : {"UserName":"15046699923","Pwd":"ZYjyh727",...}
-- Response String : {"ErrorCode":0,"Success":true,...}</code>

3) Grok handling for multiple log lines

Use a separate match pattern for each line type.

<code>match => {
  "message" => [
    "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}",
    "^-- Request String : {\"UserName\":%{NUMBER:UserName:int},...} -- End",
    "^-- Response String : {\"ErrorCode\":%{NUMBER:ErrorCode:int},...} -- End"
  ]
}</code>
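In a full pipeline these patterns sit inside a grok filter, which tries each entry in turn until one matches; a sketch (patterns abridged as in the original):

<code>filter {
  grok {
    match => {
      "message" => [
        "^20.*- task request,.*,start time:%{TIMESTAMP_ISO8601:RequestTime}"
      ]
    }
  }
}</code>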

4) Multiline aggregation (key point)

Filebeat multiline configuration (recommended):

<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline.pattern: '.*"WaitInterval":.*-- End'
  multiline.negate: true
  multiline.match: before</code>

Filebeat multiline syntax before version 5.5 (match: after example):

<code>filebeat.prospectors:
- input_type: log
  paths:
    - /root/performanceTrace*
  multiline:
    pattern: '^20.*'
    negate: true
    match: after</code>

Logstash input multiline (when Filebeat is not used):

<code>input {
  file {
    path => ["/root/logs/log2"]
    start_position => "beginning"
    codec => multiline {
      pattern => "^20.*"
      negate => true
      what => "previous"
    }
  }
}</code>

5) Date filter usage

<code>date {
  match => ["InsertTime", "YYYY-MM-dd HH:mm:ss"]
  remove_field => ["InsertTime"]
}</code>

For standard formats you can also use match => ["timestamp", "ISO8601"].
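If the source timestamps are not in UTC, setting the timezone avoids a fixed offset in Kibana; a sketch (the timezone value is an assumption to adjust for your servers):

<code>date {
  match => ["timestamp", "ISO8601"]
  timezone => "Asia/Shanghai"   # assumption: set to the log server's zone
  target => "@timestamp"        # default target, shown for clarity
}</code>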

2. Overall ELK Performance Optimization

Hardware assumptions: 1 CPU, 4 GB RAM, average log size 250 B.

Logstash on Linux processes roughly 500 logs/s with a full filter chain; dropping the ruby filter raises this to ~660 logs/s, and dropping grok as well reaches ~1000 logs/s.

Filebeat on Linux processes 2500‑3500 logs/s, ~64 GB per day per machine.

Logstash bottleneck is reading from Redis and writing to ES; one instance ~6000 logs/s, two instances ~10000 logs/s (CPU near full).
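The Redis-to-Elasticsearch leg described above can be sketched as a Logstash pipeline (the host addresses and queue key are hypothetical):

<code>input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "filebeat"        # hypothetical queue key
  }
}
output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}</code>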

Log collection choice: both Filebeat and Logstash can ship logs; Filebeat is lightweight and preferred for collection, Logstash for heavy processing.

Logstash configuration tuning:

pipeline.workers: set to number of CPU cores (or a multiple).

pipeline.output.workers: do not exceed pipeline workers.

pipeline.batch.size: increase from default 125 to ~1000.

pipeline.batch.delay: adjust from the default 5 ms to ~10 ms.

JVM settings for Logstash: Xms256m, Xmx1g (adjust according to RAM).
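The tuning knobs above live in logstash.yml; a sketch with values illustrative for a 4-core host:

<code># logstash.yml -- illustrative values for a 4-core host
pipeline.workers: 4        # match CPU cores
pipeline.batch.size: 1000  # raised from the default 125
pipeline.batch.delay: 10   # milliseconds</code>

The JVM heap (Xms256m / Xmx1g) is set separately in Logstash's jvm.options file.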

3. Redis as a Buffer

Bind to 0.0.0.0 so remote shippers can reach it, and set a strong password with requirepass.

Disable persistence (save "" and appendonly no) for pure queue usage.

Set maxmemory 0 to remove memory limits.
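Put together, a minimal redis.conf for the pure-queue role sketched above (the password is a placeholder):

<code># redis.conf excerpt for buffer-only usage
bind 0.0.0.0
requirepass your_strong_password   # placeholder -- use a real secret
save ""            # disable RDB snapshots
appendonly no      # disable AOF persistence
maxmemory 0        # no memory cap</code>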

4. Elasticsearch Tuning

System parameters (sysctl.conf):

<code>vm.swappiness = 1
net.core.somaxconn = 65535
vm.max_map_count = 262144
fs.file-max = 518144</code>

limits.conf for the elasticsearch user:

<code>elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited</code>

JVM heap: set Xms and Xmx equal, not exceeding 50% of physical RAM and staying below 32 GB (so the JVM can keep using compressed object pointers).
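In Elasticsearch's jvm.options this looks like the following, assuming a host with 64 GB of RAM:

<code># jvm.options -- equal min/max heap, kept under 32 GB
-Xms31g
-Xmx31g</code>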

elasticsearch.yml optimizations (selected excerpts; note that the threadpool.* and indices.cache.* setting names here date from older Elasticsearch releases):

<code>bootstrap.memory_lock: true
indices.fielddata.cache.size: 40%
indices.cache.filter.size: 30%
indices.cache.filter.terms.size: 1024mb
threadpool.search.type: cached
threadpool.search.size: 100
threadpool.search.queue_size: 2000</code>

5. Monitoring and Diagnostics

CPU: use top, check Logstash worker settings.

Memory: monitor JVM heap, avoid swap.

Disk I/O: watch for saturation from Logstash file outputs or error logs.

Network I/O: monitor when using many remote inputs/outputs.

JVM heap: ensure sufficient size, avoid excessive GC pauses.
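A few commands that cover these checks (the endpoints assume a local node on the default port):

<code># CPU and per-thread view of the Logstash process
top -H -p $(pgrep -f logstash)

# Elasticsearch JVM heap and GC stats
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'

# Disk I/O saturation
iostat -x 1</code>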

Overall recommendation: Deploy Filebeat on each log‑producing server for lightweight collection, forward to Logstash for parsing, filtering, and enrichment, then send to Elasticsearch. Adjust pipeline workers, batch size, and JVM settings based on hardware to achieve optimal throughput.

Tags: Performance, ELK, Logstash, Filebeat, Multiline
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
