Mastering Filebeat: How to Collect and Ship Container Logs to Kafka
This article introduces Filebeat as a lightweight log shipper, explains its core components and processing flow, and provides step‑by‑step configuration examples for gathering container logs and forwarding them to Kafka or Elasticsearch in cloud‑native environments.
Recently, due to the need for cloud‑native log collection, we decided to use Filebeat as a container log collector and extend it; this article outlines Filebeat’s basic usage and principles as an introductory guide.
1. Introduction
There are many open‑source log collectors, and we chose Filebeat for several reasons:
It meets our functional needs: collect disk log files and send them to a Kafka cluster; supports multiline collection and custom fields.
Performance is clearly better than JVM‑based Logstash or Flume.
Filebeat is built on the Go stack, which matches our existing technical expertise for further development.
Deployment is simple with no third‑party dependencies.
2. What Filebeat Can Do
In short, Filebeat is a data mover that can also perform lightweight processing to add business value.
Filebeat accepts data from various inputs; the most common is the log input, which reads log files.
It can process collected data, e.g., multiline merging, adding custom fields, JSON encoding.
Processed data is sent to an output; we mainly use Elasticsearch and Kafka.
Filebeat implements an ACK feedback mechanism, allowing it to resume from the last position after a restart.
If sending to an output fails, a retry mechanism ensures at‑least‑once delivery semantics.
When the output is blocked, the upstream input throttles collection to match the downstream state.
In one line, the flow is: input → processors → internal queue → output, with ACK feedback flowing back to the input.
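The lightweight processing described above is driven purely by configuration. Below is a sketch of a log input that uses multiline merging and custom fields; the path, pattern, and field values are illustrative, not taken from the article's setup:

```yaml
# Illustrative log input configuration.
- type: log
  paths:
    - /var/log/myapp/*.log
  # Multiline merge: lines that do not start with a date are
  # appended to the previous event (typical for stack traces).
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'
  multiline.negate: true
  multiline.match: after
  # Custom fields attached to every event from this input.
  fields:
    app: myapp
```

The `multiline.*` options are evaluated per input, so different file sets can use different merge rules.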
3. The “Core” Behind Filebeat
Filebeat is one member of the Beats family; the core library that powers all Beats is libbeat.
libbeat provides a publisher component that inputs connect to.
Collected data passes through processors for filtering, field addition, multiline merging, etc.
The input component pushes data to the publisher’s internal queue.
libbeat implements various outputs and sends processed data downstream.
It also encapsulates retry logic.
ACK feedback is propagated back to the input.
Thus, most of the work is handled by libbeat.
Inputs only need to:
Collect data from a source and hand it to libbeat.
Persist ACK feedback received from libbeat.
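This division of labor can be sketched in Go. The following is a toy model, not libbeat's actual API: the input hands events to a publisher, an "output" goroutine drains the queue and ACKs offsets, and the input persists each ACKed offset as its resume point.

```go
package main

import "fmt"

// Event is one collected log line plus the file offset it ends at.
type Event struct {
	Line   string
	Offset int64
}

// Publisher is a stand-in for libbeat's pipeline: an internal
// queue drained by an "output" goroutine that ACKs each event.
type Publisher struct {
	queue chan Event
	acks  chan int64
}

func NewPublisher(size int) *Publisher {
	p := &Publisher{queue: make(chan Event, size), acks: make(chan int64, size)}
	go func() { // the output side: deliver, then ACK the offset
		for e := range p.queue {
			p.acks <- e.Offset
		}
	}()
	return p
}

func (p *Publisher) Publish(e Event) { p.queue <- e }

// Input has only two jobs: hand data to the publisher and
// persist the offsets that come back as ACKs.
type Input struct {
	persisted int64 // last ACKed offset; the resume point
}

func (in *Input) Run(p *Publisher, lines []string) {
	var off int64
	for _, l := range lines {
		off += int64(len(l)) + 1 // +1 for the trailing newline
		p.Publish(Event{Line: l, Offset: off})
		in.persisted = <-p.acks
	}
}

func main() {
	p := NewPublisher(8)
	in := &Input{}
	in.Run(p, []string{"hello", "world"})
	fmt.Println("resume offset:", in.persisted) // 12: both 6-byte lines ACKed
}
```

After a restart, a real input would seek to the persisted offset instead of re-reading the whole file.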
4. Simple Filebeat Usage Example
Filebeat’s usage is straightforward: write the appropriate input and output configurations. Below is an example that collects disk log files and forwards them to a Kafka cluster.
1. Configure the inputs.d directory in filebeat.yml so each collection path can be placed in a separate file.
<code>filebeat.config.inputs:
  enabled: true
  path: inputs.d/*.yml
</code>
2. Create test1.yml under inputs.d:
<code>- type: log
  enabled: true
  paths:
    - /home/lw/test/filebeat/*.log
  fields:
    log_topic: lw_filebeat_t_2
</code>
This configuration collects all files matching /home/lw/test/filebeat/*.log and adds a custom field log_topic: lw_filebeat_t_2.
3. Configure the Kafka output in filebeat.yml:
<code>output.kafka:
  hosts: ["xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092"]
  version: 0.9.0.1
  topic: '%{[fields.log_topic]}'
  partition.round_robin:
    reachable_only: true
  compression: none
  required_acks: 1
  max_message_bytes: 1000000
  codec.format:
    string: '%{[host.name]}-%{[message]}'
</code>
Key points:
hosts defines the Kafka broker list.
topic uses the custom field defined in the input configuration.
codec.format prefixes each log line with the host name.
Start Filebeat with ./filebeat run in the same directory as filebeat.yml and inputs.d.
5. How the Log Input Collects Files
Processors are created from the configuration to handle data enrichment.
An Acker persists the collection progress reported by libbeat.
A producer created via Pipeline.queue.Producer pushes processed file content into libbeat’s internal queue.
File collection details:
The input polls configured paths, checking for new, expired, deleted, or moved files.
For each file, a Harvester reads the file line by line.
File content is packaged and sent to the internal queue via the producer.
File identity is tracked using device ID + inode; progress is persisted for resume.
If a file is truncated, the Harvester restarts from the beginning.
Deletion or renaming behavior depends on the CloseRemoved and CloseRenamed settings.
6. How Logs Are Sent
After the input writes logs to the internal queue, libbeat consumes them.
It creates a consumer that batches events into Batch objects.
Each batch has an ACK channel for feedback.
Batches are placed into a workQueue channel.
For the Kafka output, an outputs.Group spawns multiple Kafka clients and goroutines.
Each goroutine reads batches from workQueue and sends them via the Kafka client.
Successful sends write to the ACK channel, which propagates back to the input.
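The batch-and-ACK handoff can be modeled with channels. Again a toy model, not libbeat's actual types: each batch carries its own ACK channel, so whichever worker goroutine delivers it can signal completion back upstream.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch groups events and carries a channel for ACK feedback,
// analogous to libbeat's per-batch ACK channel.
type Batch struct {
	Events []string
	Ack    chan int // receives the number of delivered events
}

// deliverAll pushes batches through a workQueue drained by several
// workers (like the Kafka output's goroutines) and sums the ACKs.
func deliverAll(batches [][]string, workers int) int {
	workQueue := make(chan *Batch)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range workQueue {
				b.Ack <- len(b.Events) // "send" succeeds: ACK upstream
			}
		}()
	}
	delivered := 0
	for _, evs := range batches {
		b := &Batch{Events: evs, Ack: make(chan int, 1)}
		workQueue <- b
		delivered += <-b.Ack // producer side waits for feedback
	}
	close(workQueue)
	wg.Wait()
	return delivered
}

func main() {
	fmt.Println("delivered:", deliverAll([][]string{{"a", "b"}, {"c"}}, 3)) // 3
}
```

Because the ACK channel travels with the batch, the feedback path works no matter which worker happens to pick the batch up.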
Retry mechanism (Kafka example):
Failed messages are read from ch <-chan *sarama.ProducerError.
Errors like ErrInvalidMessage, ErrMessageSizeTooLarge, and ErrInvalidMessageSize are not retried.
Failed events are re-queued into the workQueue for another send attempt.
A maximum retry count can be set, but the current implementation may retry indefinitely, prioritizing events marked as Guaranteed when retries are exhausted.
If the retry queue grows large, it temporarily blocks normal sending to prioritize retries.
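The retry decision boils down to an error classifier plus re-queueing. The three non-retriable error names below mirror the sarama errors the article mentions; the error values themselves and the transient example are stand-ins, not sarama's definitions:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for sarama's non-retriable producer errors, plus one
// illustrative transient error.
var (
	ErrInvalidMessage      = errors.New("invalid message")
	ErrMessageSizeTooLarge = errors.New("message size too large")
	ErrInvalidMessageSize  = errors.New("invalid message size")
	errTransientNetwork    = errors.New("broker temporarily unreachable")
)

// retriable mirrors the output's classification: malformed or
// oversized messages are dropped, everything else is re-queued
// into the workQueue for another attempt.
func retriable(err error) bool {
	switch err {
	case ErrInvalidMessage, ErrMessageSizeTooLarge, ErrInvalidMessageSize:
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(retriable(ErrMessageSizeTooLarge)) // false: drop the event
	fmt.Println(retriable(errTransientNetwork))    // true: re-queue it
}
```

Dropping only the errors that can never succeed (a message that is malformed now will be malformed on every retry) is what keeps the retry loop from spinning forever on hopeless events.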
7. Closing Remarks
This article does not dive into source‑code details; some implementation nuances were omitted for clarity. Future posts will analyze Filebeat at the code level.
Source: Original article on 360 Cloud Computing
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles focused on operations transformation.