
How to Build a Billion-Scale ELK Log Platform with Filebeat, Kafka, and Elasticsearch

Learn step‑by‑step how to design and deploy a billion‑scale log collection and analysis platform using the ELK stack—Filebeat, Kafka, Logstash, Elasticsearch, and Kibana—covering architecture, configuration, installation, and best practices for high‑availability and performance.

Efficient Ops

Overall Architecture

The pipeline consists of four modules, Filebeat, Kafka, Logstash, and Elasticsearch, each with a specific role; Kibana sits on top for visualization.

Filebeat: a lightweight data collector that replaces Logstash-forwarder.

Kafka: a message queue for buffering and decoupling, absorbing traffic spikes and keeping the pipeline scalable.

Logstash: a data processing engine that ingests, filters, enriches, and formats logs before storage.

Elasticsearch: a distributed search engine supporting full-text, structured, and analytical queries.

<code>Filebeat: 6.2.4</code>
<code>Kafka: 2.11-1.0.0</code>
<code>Logstash: 6.2.4</code>
<code>Elasticsearch: 6.2.4</code>
<code>Kibana: 6.2.4</code>

Specific Implementation (Nginx JSON logs)

Nginx is configured to write its access log in JSON, so downstream parsing needs no Grok patterns. An example entry:

<code>{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}</code>
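An entry like the one above can be produced directly by Nginx. A reduced sketch of such a log_format, covering only a few of the sample's fields (the escape=json parameter assumes nginx 1.11.8 or later):

```nginx
# nginx.conf - sketch of a JSON access-log format matching the sample entry
http {
    log_format json escape=json '{"@timestamp":"$time_iso8601",'
        '"host":"$server_addr",'
        '"clientip":"$remote_addr",'
        '"size":$body_bytes_sent,'
        '"responsetime":$request_time,'
        '"url":"$uri",'
        '"status":"$status"}';
    access_log /opt/logs/server/nginx.log json;
}
```

Emitting valid JSON from the start keeps the Logstash filter stage almost empty, which matters at billion-entry volumes.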

Filebeat

Filebeat is used instead of Logstash‑forwarder because it consumes fewer resources; it runs as a Go‑based lightweight agent deployed on each application server, often installed via Salt.

Download

<code>$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-linux-x86_64.tar.gz</code>

Extract

<code>tar -zxvf filebeat-6.2.4-linux-x86_64.tar.gz
mv filebeat-6.2.4-linux-x86_64 filebeat
cd filebeat</code>

Configuration

<code>$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: 'nginx'</code>

Start Filebeat:

<code>$ ./filebeat -e -c filebeat.yml</code>

Kafka

Deploy a three-node Kafka cluster together with a three-node Zookeeper ensemble; an odd node count (2N+1) lets the ensemble keep quorum through N node failures.

Download

<code>$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz</code>

Extract

<code>tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
cd kafka</code>

Zookeeper configuration

<code>$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888</code>

Create /opt/zookeeper/myid on each node, containing that node's id (1, 2, or 3), then start every Zookeeper node with the command below.
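Seeding the id file on node 1 might look like this (use 2 and 3 on the other nodes):

```shell
# write this node's Zookeeper id into the dataDir configured above
mkdir -p /opt/zookeeper
echo 1 > /opt/zookeeper/myid
cat /opt/zookeeper/myid
```

The id must match the server.N entries in zookeeper.properties, or the ensemble will not form.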

<code>$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties</code>

Kafka broker configuration

<code>$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true</code>

Start each broker:

<code>$ ./bin/kafka-server-start.sh -daemon ./config/server.properties</code>
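With auto.create.topics.enable at its default, the nginx topic is created on first write; to control partition and replica counts explicitly, it can also be created by hand (a replication factor of 2 is an assumed choice, not from the original setup):

```shell
$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic nginx --partitions 3 --replication-factor 2
```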

Verify topic creation:

<code>$ bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx</code>
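Listing the topic only shows that it exists; reading one message back confirms that Filebeat is actually delivering events (assumes a broker listening on localhost:9092):

```shell
$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nginx --from-beginning --max-messages 1
```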

Monitor with Kafka‑Manager (open‑source tool from Yahoo).

Logstash

Logstash processes events in three stages: input, filter, and output. When logs are not already structured, a Grok debugger helps develop the filter patterns.

Download

<code>$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz</code>

Extract

<code>tar -zxvf logstash-6.2.4.tar.gz
mv logstash-6.2.4 logstash</code>

Configuration (nginx.conf)

<code>input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181"
    topics => "nginx"
    group_id => "logstash"
    consumer_threads => 2
  }
}
output {
  elasticsearch {
    host => ["192.168.0.1","192.168.0.2","192.168.0.3"]
    port => "9300"
    index => "nginx-%{+YYYY.MM.dd}"
  }
}</code>
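An optional filter block can sit between input and output to normalize fields before indexing; a sketch assuming the field names from the sample log:

```
filter {
  date {
    # use the log's own ISO8601 @timestamp as the event time
    match => ["@timestamp", "ISO8601"]
  }
  mutate {
    # make sure responsetime is numeric so Kibana can aggregate on it
    convert => { "responsetime" => "float" }
  }
}
```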

Start Logstash:

<code>$ ./bin/logstash -f nginx.conf</code>

Elasticsearch

Download, extract, and configure the cluster.

Download

<code>$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz</code>

Extract

<code>tar -zxvf elasticsearch-6.2.4.tar.gz
mv elasticsearch-6.2.4 elasticsearch</code>

Configuration (elasticsearch.yml)

<code>cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1","192.168.0.2","192.168.0.3"]
# (master-eligible nodes / 2) + 1 = 2 for a three-node cluster
discovery.zen.minimum_master_nodes: 2</code>

Start in background:

<code>$ ./bin/elasticsearch -d</code>

Verify by opening http://192.168.0.1:9200/ and checking the JSON response for the cluster name and version.

Key operational notes:

Separate master and data nodes, and keep data-node heap size at or below 31 GB so the JVM can still use compressed object pointers.

Set discovery.zen.minimum_master_nodes to (master-eligible nodes / 2) + 1 to avoid split-brain.

Do not expose Elasticsearch to the public internet; enable X-Pack for authentication and encryption.
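The quorum formula is plain integer arithmetic; for three master-eligible nodes it yields 2:

```shell
# minimum_master_nodes = (master-eligible nodes / 2) + 1
total=3
quorum=$(( total / 2 + 1 ))
echo "$quorum"   # prints 2
```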

Kibana

Download, extract, configure, and launch Kibana for visualization.

Download

<code>$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-linux-x86_64.tar.gz</code>

Extract

<code>tar -zxvf kibana-6.2.4-linux-x86_64.tar.gz
mv kibana-6.2.4-linux-x86_64 kibana</code>

Configuration (kibana.yml)

<code>server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"</code>

Start Kibana:

<code>$ nohup ./bin/kibana &</code>

Create an index pattern in Management → Index Patterns using the nginx-* prefix.

Conclusion

With the components above in place, you have a complete pipeline: collection (Filebeat), buffering (Kafka), processing (Logstash), indexing (Elasticsearch), and visualization (Kibana). By horizontally scaling the Kafka and Elasticsearch tiers, the platform can process billions of log entries per day in near real time.

Big Data · Elasticsearch · Kafka · ELK · Logstash · Kibana · Filebeat · log aggregation
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on the evolving operations field and aim to accompany you throughout your operations career.
