
How to Build a Billion-Scale ELK Log Platform with Filebeat, Kafka, and Elasticsearch

Learn step‑by‑step how to design and deploy a billion‑scale log collection and analysis platform using the ELK stack—Filebeat, Kafka, Logstash, Elasticsearch, and Kibana—covering architecture, configuration, installation, and best practices for high‑availability and performance.

Efficient Ops

Overall Architecture

The pipeline consists of four modules, Filebeat, Kafka, Logstash, and Elasticsearch, each with a specific role; Kibana sits on top for visualization.

Filebeat: a lightweight data collector that replaces Logstash-forwarder.

Kafka: a message queue for buffering and decoupling, absorbing traffic spikes and keeping the pipeline scalable.

Logstash: a data processing engine that ingests, filters, enriches, and formats logs before storage.

Elasticsearch: a distributed search engine supporting full-text, structured, and analytical queries.

<code>Filebeat: 6.2.4</code>
<code>Kafka: 2.11-1.0.0</code>
<code>Logstash: 6.2.4</code>
<code>Elasticsearch: 6.2.4</code>
<code>Kibana: 6.2.4</code>

Specific Implementation (Nginx JSON logs)

Nginx is configured to write its access log in JSON, so downstream parsing needs no Grok patterns. An example entry:

<code>{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}</code>
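An entry like the one above can be produced directly by Nginx. A reduced sketch of such a log_format, covering only a few of the sample's fields (the escape=json parameter assumes nginx 1.11.8 or later):

```nginx
# nginx.conf - sketch of a JSON access-log format matching the sample entry
http {
    log_format json escape=json '{"@timestamp":"$time_iso8601",'
        '"host":"$server_addr",'
        '"clientip":"$remote_addr",'
        '"size":$body_bytes_sent,'
        '"responsetime":$request_time,'
        '"url":"$uri",'
        '"status":"$status"}';
    access_log /opt/logs/server/nginx.log json;
}
```

Emitting valid JSON from the start keeps the Logstash filter stage almost empty, which matters at billion-entry volumes.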

Filebeat

Filebeat is used instead of Logstash‑forwarder because it consumes fewer resources; it runs as a Go‑based lightweight agent deployed on each application server, often installed via Salt.

Download

<code>$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-linux-x86_64.tar.gz</code>

Extract

<code>tar -zxvf filebeat-6.2.4-linux-x86_64.tar.gz
mv filebeat-6.2.4-linux-x86_64 filebeat
cd filebeat</code>

Configuration

<code>$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092","192.168.0.2:9092","192.168.0.3:9092"]
  topic: 'nginx'</code>

Start Filebeat:

<code>$ ./filebeat -e -c filebeat.yml</code>

Kafka

Deploy a three-node Kafka cluster together with a three-node Zookeeper ensemble; an odd node count (2N+1) lets the ensemble keep quorum through N node failures.

Download

<code>$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz</code>

Extract

<code>tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
cd kafka</code>

Zookeeper configuration

<code>$ vim ./config/zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888</code>

Create /opt/zookeeper/myid on each node, containing that node's id (1, 2, or 3), then start every Zookeeper node with the command below.
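Seeding the id file on node 1 might look like this (use 2 and 3 on the other nodes):

```shell
# write this node's Zookeeper id into the dataDir configured above
mkdir -p /opt/zookeeper
echo 1 > /opt/zookeeper/myid
cat /opt/zookeeper/myid
```

The id must match the server.N entries in zookeeper.properties, or the ensemble will not form.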

<code>$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties</code>

Kafka broker configuration

<code>$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true</code>

Start each broker:

<code>$ ./bin/kafka-server-start.sh -daemon ./config/server.properties</code>
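With auto.create.topics.enable at its default, the nginx topic is created on first write; to control partition and replica counts explicitly, it can also be created by hand (a replication factor of 2 is an assumed choice, not from the original setup):

```shell
$ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic nginx --partitions 3 --replication-factor 2
```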

Verify topic creation:

<code>$ bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx</code>
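Listing the topic only shows that it exists; reading one message back confirms that Filebeat is actually delivering events (assumes a broker listening on localhost:9092):

```shell
$ ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nginx --from-beginning --max-messages 1
```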

Monitor with Kafka‑Manager (open‑source tool from Yahoo).

Logstash

Logstash processes events in three stages: input, filter, and output. When logs are not already structured, a Grok debugger helps develop the filter patterns.

Download

<code>$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz</code>

Extract

<code>tar -zxvf logstash-6.2.4.tar.gz
mv logstash-6.2.4 logstash</code>

Configuration (nginx.conf)

<code>input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181"
    topics => "nginx"
    group_id => "logstash"
    consumer_threads => 2
  }
}
output {
  elasticsearch {
    host => ["192.168.0.1","192.168.0.2","192.168.0.3"]
    port => "9300"
    index => "nginx-%{+YYYY.MM.dd}"
  }
}</code>
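An optional filter block can sit between input and output to normalize fields before indexing; a sketch assuming the field names from the sample log:

```
filter {
  date {
    # use the log's own ISO8601 @timestamp as the event time
    match => ["@timestamp", "ISO8601"]
  }
  mutate {
    # make sure responsetime is numeric so Kibana can aggregate on it
    convert => { "responsetime" => "float" }
  }
}
```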

Start Logstash:

<code>$ ./bin/logstash -f nginx.conf</code>

Elasticsearch

Download, extract, and configure the cluster.

Download

<code>$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz</code>

Extract

<code>tar -zxvf elasticsearch-6.2.4.tar.gz
mv elasticsearch-6.2.4 elasticsearch</code>

Configuration (elasticsearch.yml)

<code>cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1","192.168.0.2","192.168.0.3"]
# (master-eligible nodes / 2) + 1 = 2 for a three-node cluster
discovery.zen.minimum_master_nodes: 2</code>

Start in background:

<code>$ ./bin/elasticsearch -d</code>

Verify by opening http://192.168.0.1:9200/ and checking the JSON response for the cluster name and version.

Key operational notes:

Separate master and data nodes, and keep data-node heap size at or below 31 GB so the JVM can still use compressed object pointers.

Set discovery.zen.minimum_master_nodes to (master-eligible nodes / 2) + 1 to avoid split-brain.

Do not expose Elasticsearch to the public internet; enable X-Pack for authentication and encryption.
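The quorum formula is plain integer arithmetic; for three master-eligible nodes it yields 2:

```shell
# minimum_master_nodes = (master-eligible nodes / 2) + 1
total=3
quorum=$(( total / 2 + 1 ))
echo "$quorum"   # prints 2
```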

Kibana

Download, extract, configure, and launch Kibana for visualization.

Download

<code>$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-linux-x86_64.tar.gz</code>

Extract

<code>tar -zxvf kibana-6.2.4-linux-x86_64.tar.gz
mv kibana-6.2.4-linux-x86_64 kibana</code>

Configuration (kibana.yml)

<code>server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"</code>

Start Kibana:

<code>$ nohup ./bin/kibana &</code>

Create an index pattern in Management → Index Patterns using the nginx-* prefix.

Conclusion

With the components above in place, you have a complete pipeline: collection (Filebeat), buffering (Kafka), processing (Logstash), indexing (Elasticsearch), and visualization (Kibana). By horizontally scaling the Kafka and Elasticsearch tiers, the platform can process billions of log entries per day in near real time.

Big Data · Elasticsearch · Kafka · ELK · Logstash · Kibana · Filebeat · log aggregation
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on the evolving operations field and aim to accompany you throughout your operations career.
