Mastering Filebeat: How to Collect and Ship Container Logs to Kafka
This article introduces Filebeat as a lightweight log shipper, explains its core components and processing flow, and provides step‑by‑step configuration examples for gathering container logs and forwarding them to Kafka or Elasticsearch in cloud‑native environments.
Recently, due to the need for cloud‑native log collection, we decided to use Filebeat as a container log collector and extend it; this article outlines Filebeat’s basic usage and principles as an introductory guide.
1. Introduction
There are many open‑source log collectors, and we chose Filebeat for several reasons:
It meets our functional needs: collect disk log files and send them to a Kafka cluster; supports multiline collection and custom fields.
Performance is clearly better than JVM‑based Logstash or Flume.
Filebeat is built on the Go stack, which matches our existing technical expertise for further development.
Deployment is simple with no third‑party dependencies.
2. What Filebeat Can Do
In short, Filebeat is a data mover that can also perform lightweight processing to add business value.
Filebeat accepts data from various inputs; the most common is the log input, which reads log files.
It can process collected data, e.g., multiline merging, adding custom fields, JSON encoding.
Processed data is sent to an output; we mainly use Elasticsearch and Kafka.
Filebeat implements an ACK feedback mechanism, allowing it to resume from the last position after a restart.
If sending to an output fails, a retry mechanism ensures at‑least‑once delivery semantics.
When the output is blocked, the upstream input throttles collection to match the downstream state.
In one line, the flow is: input → processors → internal queue → output, with ACK feedback flowing back to the input.
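The lightweight processing described above is driven purely by configuration. Below is a sketch of a log input that uses multiline merging and custom fields; the path, pattern, and field values are illustrative, not taken from the article's setup:

```yaml
# Illustrative log input configuration.
- type: log
  paths:
    - /var/log/myapp/*.log
  # Multiline merge: lines that do not start with a date are
  # appended to the previous event (typical for stack traces).
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'
  multiline.negate: true
  multiline.match: after
  # Custom fields attached to every event from this input.
  fields:
    app: myapp
```

The `multiline.*` options are evaluated per input, so different file sets can use different merge rules.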
3. The “Core” Behind Filebeat
Filebeat is one member of the Beats family; the core library that powers all Beats is libbeat.
libbeat provides a publisher component that inputs connect to.
Collected data passes through processors for filtering, field addition, multiline merging, etc.
The input component pushes data to the publisher’s internal queue.
libbeat implements various outputs and sends processed data downstream.
It also encapsulates retry logic.
ACK feedback is propagated back to the input.
Thus, most of the work is handled by libbeat.
Inputs only need to:
Collect data from a source and hand it to libbeat.
Persist ACK feedback received from libbeat.
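This division of labor can be sketched in Go. The following is a toy model, not libbeat's actual API: the input hands events to a publisher, an "output" goroutine drains the queue and ACKs offsets, and the input persists each ACKed offset as its resume point.

```go
package main

import "fmt"

// Event is one collected log line plus the file offset it ends at.
type Event struct {
	Line   string
	Offset int64
}

// Publisher is a stand-in for libbeat's pipeline: an internal
// queue drained by an "output" goroutine that ACKs each event.
type Publisher struct {
	queue chan Event
	acks  chan int64
}

func NewPublisher(size int) *Publisher {
	p := &Publisher{queue: make(chan Event, size), acks: make(chan int64, size)}
	go func() { // the output side: deliver, then ACK the offset
		for e := range p.queue {
			p.acks <- e.Offset
		}
	}()
	return p
}

func (p *Publisher) Publish(e Event) { p.queue <- e }

// Input has only two jobs: hand data to the publisher and
// persist the offsets that come back as ACKs.
type Input struct {
	persisted int64 // last ACKed offset; the resume point
}

func (in *Input) Run(p *Publisher, lines []string) {
	var off int64
	for _, l := range lines {
		off += int64(len(l)) + 1 // +1 for the trailing newline
		p.Publish(Event{Line: l, Offset: off})
		in.persisted = <-p.acks
	}
}

func main() {
	p := NewPublisher(8)
	in := &Input{}
	in.Run(p, []string{"hello", "world"})
	fmt.Println("resume offset:", in.persisted) // 12: both 6-byte lines ACKed
}
```

After a restart, a real input would seek to the persisted offset instead of re-reading the whole file.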
4. Simple Filebeat Usage Example
Filebeat’s usage is straightforward: write the appropriate input and output configurations. Below is an example that collects disk log files and forwards them to a Kafka cluster.
1. Configure the inputs.d directory in filebeat.yml so each collection path can be placed in a separate file.
<code>filebeat.config.inputs:
  enabled: true
  path: inputs.d/*.yml
</code>
2. Create test1.yml under inputs.d:
<code>- type: log
  enabled: true
  paths:
    - /home/lw/test/filebeat/*.log
  fields:
    log_topic: lw_filebeat_t_2
</code>
This configuration collects all files matching /home/lw/test/filebeat/*.log and adds a custom field log_topic: lw_filebeat_t_2.
3. Configure the Kafka output in filebeat.yml:
<code>output.kafka:
  hosts: ["xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092"]
  version: 0.9.0.1
  topic: '%{[fields.log_topic]}'
  partition.round_robin:
    reachable_only: true
  compression: none
  required_acks: 1
  max_message_bytes: 1000000
  codec.format:
    string: '%{[host.name]}-%{[message]}'
</code>
Key points:
hosts defines the Kafka broker list.
topic uses the custom field defined in the input configuration.
codec.format prefixes each log line with the host name.
Start Filebeat with ./filebeat run in the same directory as filebeat.yml and inputs.d.
5. How the Log Input Collects Files
Processors are created from the configuration to handle data enrichment.
An Acker persists the collection progress reported by libbeat.
A producer created via Pipeline.queue.Producer pushes processed file content into libbeat’s internal queue.
File collection details:
The input polls configured paths, checking for new, expired, deleted, or moved files.
For each file, a Harvester reads the file line by line.
File content is packaged and sent to the internal queue via the producer.
File identity is tracked using device ID + inode; progress is persisted for resume.
If a file is truncated, the Harvester restarts from the beginning.
Deletion or renaming behavior depends on the CloseRemoved and CloseRenamed settings.
6. How Logs Are Sent
After the input writes logs to the internal queue, libbeat consumes them.
It creates a consumer that batches events into Batch objects.
Each batch has an ACK channel for feedback.
Batches are placed into a workQueue channel.
For the Kafka output, an outputs.Group spawns multiple Kafka clients and goroutines.
Each goroutine reads batches from workQueue and sends them via the Kafka client.
Successful sends write to the ACK channel, which propagates back to the input.
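The batch-and-ACK handoff can be modeled with channels. Again a toy model, not libbeat's actual types: each batch carries its own ACK channel, so whichever worker goroutine delivers it can signal completion back upstream.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch groups events and carries a channel for ACK feedback,
// analogous to libbeat's per-batch ACK channel.
type Batch struct {
	Events []string
	Ack    chan int // receives the number of delivered events
}

// deliverAll pushes batches through a workQueue drained by several
// workers (like the Kafka output's goroutines) and sums the ACKs.
func deliverAll(batches [][]string, workers int) int {
	workQueue := make(chan *Batch)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range workQueue {
				b.Ack <- len(b.Events) // "send" succeeds: ACK upstream
			}
		}()
	}
	delivered := 0
	for _, evs := range batches {
		b := &Batch{Events: evs, Ack: make(chan int, 1)}
		workQueue <- b
		delivered += <-b.Ack // producer side waits for feedback
	}
	close(workQueue)
	wg.Wait()
	return delivered
}

func main() {
	fmt.Println("delivered:", deliverAll([][]string{{"a", "b"}, {"c"}}, 3)) // 3
}
```

Because the ACK channel travels with the batch, the feedback path works no matter which worker happens to pick the batch up.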
Retry mechanism (Kafka example):
Failed messages are read from ch <-chan *sarama.ProducerError.
Errors like ErrInvalidMessage, ErrMessageSizeTooLarge, and ErrInvalidMessageSize are not retried.
Failed events are re-queued into the workQueue for another send attempt.
A maximum retry count can be set, but the current implementation may retry indefinitely, prioritizing events marked as Guaranteed when retries are exhausted.
If the retry queue grows large, it temporarily blocks normal sending to prioritize retries.
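The retry decision boils down to an error classifier plus re-queueing. The three non-retriable error names below mirror the sarama errors the article mentions; the error values themselves and the transient example are stand-ins, not sarama's definitions:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for sarama's non-retriable producer errors, plus one
// illustrative transient error.
var (
	ErrInvalidMessage      = errors.New("invalid message")
	ErrMessageSizeTooLarge = errors.New("message size too large")
	ErrInvalidMessageSize  = errors.New("invalid message size")
	errTransientNetwork    = errors.New("broker temporarily unreachable")
)

// retriable mirrors the output's classification: malformed or
// oversized messages are dropped, everything else is re-queued
// into the workQueue for another attempt.
func retriable(err error) bool {
	switch err {
	case ErrInvalidMessage, ErrMessageSizeTooLarge, ErrInvalidMessageSize:
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(retriable(ErrMessageSizeTooLarge)) // false: drop the event
	fmt.Println(retriable(errTransientNetwork))    // true: re-queue it
}
```

Dropping only the errors that can never succeed (a message that is malformed now will be malformed on every retry) is what keeps the retry loop from spinning forever on hopeless events.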
7. Closing Remarks
This article does not dive into source‑code details; some implementation nuances were omitted for clarity. Future posts will analyze Filebeat at the code level.
Source: Original article on 360 Cloud Computing
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles focused on operations transformation.