How to Build a Billion-Scale ELK Log Platform with Filebeat, Kafka, and Elasticsearch
Learn step‑by‑step how to design and deploy a billion‑scale log collection and analysis platform using the ELK stack—Filebeat, Kafka, Logstash, Elasticsearch, and Kibana—covering architecture, configuration, installation, and best practices for high‑availability and performance.
Overall Architecture
The platform consists of four modules: Filebeat, Kafka, Logstash, and Elasticsearch, each providing specific functions.
Filebeat: lightweight data collector, a replacement for Logstash-forwarder.
Kafka: message queue for buffering and decoupling, ensuring scalability and absorbing traffic spikes.
Logstash: data processing engine that ingests, filters, enriches, and formats logs before storage.
Elasticsearch: distributed search engine for full-text, structured, and analytical queries.
Versions used in this guide:
<code>Filebeat: 6.2.4</code>
<code>Kafka: 2.11-1</code>
<code>Logstash: 6.2.4</code>
<code>Elasticsearch: 6.2.4</code>
<code>Kibana: 6.2.4</code>
Specific Implementation (Nginx JSON logs)
Example Nginx log entries in JSON format are shown.
<code>{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}</code>
Filebeat
Filebeat is used instead of Logstash‑forwarder because it consumes fewer resources; it runs as a Go‑based lightweight agent deployed on each application server, often installed via Salt.
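For reference, JSON entries like the sample above are typically emitted by nginx itself. A hypothetical log_format along these lines could produce them; the field-to-variable mapping is an assumption, not taken from the original setup, and escape=json requires nginx 1.11.8 or later:

```nginx
# Hypothetical log_format; adjust the variable mapping to your own setup.
log_format json_log escape=json
  '{"@timestamp":"$time_iso8601","host":"$server_addr",'
  '"clientip":"$remote_addr","size":$body_bytes_sent,'
  '"responsetime":$request_time,"upstreamtime":"$upstream_response_time",'
  '"upstreamhost":"$upstream_addr","http_host":"$http_host",'
  '"url":"$uri","domain":"$host","xff":"$http_x_forwarded_for",'
  '"referer":"$http_referer","status":"$status"}';

access_log /opt/logs/server/nginx.log json_log;
```

Emitting JSON at the source is what lets the rest of the pipeline skip heavy parsing.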
Download
<code>$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-linux-x86_64.tar.gz</code>
Extract
<code>tar -zxvf filebeat-6.2.4-linux-x86_64.tar.gz
mv filebeat-6.2.4-linux-x86_64 filebeat
cd filebeat</code>
Configuration
<code>$ vim filebeat.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: 'nginx'</code>
Start Filebeat:
<code>$ ./filebeat -e -c filebeat.yml</code>
Kafka
Deploy a three-node Kafka cluster backed by a Zookeeper ensemble; Zookeeper follows the 2N+1 rule, needing an odd number of nodes to maintain quorum.
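The 2N+1 rule comes from majority quorum: an ensemble of 2N+1 nodes keeps serving while up to N nodes are down. A quick arithmetic sketch for the three-node ensemble used here:

```shell
# Majority quorum: a 2N+1-node ensemble tolerates N failures.
NODES=3                          # ensemble size used in this guide
QUORUM=$(( NODES / 2 + 1 ))      # votes needed to elect a leader / commit
TOLERATED=$(( NODES - QUORUM ))  # node failures the ensemble survives
echo "quorum=$QUORUM tolerated=$TOLERATED"   # → quorum=2 tolerated=1
```

So three Zookeeper nodes tolerate one failure; five would tolerate two.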
Download
<code>$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz</code>
Extract
<code>tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
cd kafka</code>
Zookeeper configuration
<code>$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888</code>
Create /opt/zookeeper/myid with the node id (1, 2, or 3) on each host, then start each Zookeeper node:
<code>$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties</code>
Kafka broker configuration (set a unique broker.id and the node's own host.name on each broker)
<code>$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true</code>
Start each broker:
<code>$ ./bin/kafka-server-start.sh -daemon ./config/server.properties</code>
Verify topic creation:
<code>$ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx</code>
Monitor the cluster with Kafka-Manager, an open-source tool from Yahoo.
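One sizing note on num.partitions=3: within a consumer group, each Kafka partition is read by at most one consumer, so consumers beyond the partition count sit idle. A quick check (the instance and thread counts below are hypothetical, not from the original deployment):

```shell
# Consumers in one group beyond the partition count receive nothing.
PARTITIONS=3     # num.partitions from server.properties
INSTANCES=2      # hypothetical number of Logstash instances
THREADS=2        # hypothetical consumer threads per instance
TOTAL=$(( INSTANCES * THREADS ))
if [ "$TOTAL" -gt "$PARTITIONS" ]; then
  IDLE=$(( TOTAL - PARTITIONS ))
else
  IDLE=0
fi
echo "active=$(( TOTAL - IDLE )) idle=$IDLE"   # → active=3 idle=1
```

To scale consumption further, raise the topic's partition count before adding consumers.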
Logstash
Logstash pipelines have INPUT, FILTER, and OUTPUT stages; for non-JSON logs, the Grok debugger helps build parsing patterns.
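Because the nginx events already arrive as JSON, this pipeline needs no FILTER stage at all. If one is wanted, a minimal hypothetical example is a date filter that pins the event time to the log's own ISO8601 timestamp:

```conf
filter {
  # Hypothetical: use the log's own @timestamp as the event time
  # instead of the time Logstash received the event.
  date {
    match => ["@timestamp", "ISO8601"]
  }
}
```

Heavier parsing (grok, mutate, geoip) would also go in this stage.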
Download
<code>$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz</code>
Extract
<code>tar -zxvf logstash-6.2.4.tar.gz
mv logstash-6.2.4 logstash</code>
Configuration (nginx.conf)
<code>input {
  kafka {
    type => "kafka"
    # Point at the Kafka brokers (9092), not Zookeeper (2181):
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
  }
}
output {
  elasticsearch {
    # The elasticsearch output speaks HTTP on 9200, not transport 9300:
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}</code>
Start Logstash:
<code>$ ./bin/logstash -f nginx.conf</code>
Elasticsearch
Download, extract, and configure the cluster.
Download
<code>$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz</code>
Extract
<code>tar -zxvf elasticsearch-6.2.4.tar.gz
mv elasticsearch-6.2.4 elasticsearch</code>
Configuration (elasticsearch.yml)
<code>cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1"]
discovery.zen.minimum_master_nodes: 1</code>Start in background:
<code>$ ./bin/elasticsearch -d</code>
Verify by opening http://192.168.0.1:9200/ and checking the JSON response.
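Because the Logstash output writes to nginx-%{+YYYY.MM.dd}, a fresh index is created each day (Logstash rolls indices on UTC time by default). The name for a given day resolves like:

```shell
# Mirror of the "nginx-%{+YYYY.MM.dd}" sprintf pattern;
# -u matches Logstash's default UTC-based rollover.
INDEX="nginx-$(date -u +%Y.%m.%d)"
echo "$INDEX"
```

Daily indices keep retention simple: expiring an old day is a single index deletion rather than a costly delete-by-query.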
Key operational notes:
Separate master and data nodes; keep each data node's JVM heap at or below 31 GB (so the JVM retains compressed object pointers).
Set discovery.zen.minimum_master_nodes to (total/2)+1 to avoid split-brain.
Do not expose Elasticsearch to the public internet; enable X‑Pack for security.
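The split-brain formula is worth sanity-checking against the cluster size. For three master-eligible nodes:

```shell
# minimum_master_nodes = (master_eligible_nodes / 2) + 1
MASTER_ELIGIBLE=3
MIN_MASTERS=$(( MASTER_ELIGIBLE / 2 + 1 ))
echo "$MIN_MASTERS"   # → 2
```

With the result at 2, a single node cut off from the cluster can never elect itself master, which is exactly the split-brain scenario the setting prevents.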
Kibana
Download, extract, configure, and launch Kibana for visualization.
Download
<code>$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-linux-x86_64.tar.gz</code>
Extract
<code>tar -zxvf kibana-6.2.4-linux-x86_64.tar.gz
mv kibana-6.2.4-linux-x86_64 kibana</code>
Configuration (kibana.yml)
<code>server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"</code>
Start Kibana:
<code>$ nohup ./bin/kibana &</code>
Create index patterns in Management → Index Patterns using the nginx-* pattern.
Conclusion
By following the commands above you can deploy a complete ELK pipeline that handles log collection, filtering, indexing, and visualization, and by horizontally scaling Kafka and Elasticsearch you can achieve daily processing of billions of log entries in real time.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.