Understanding Filebeat: Architecture, Features, and Simple Usage for Log Collection
This article introduces Filebeat as a container log collector, explains why it was chosen, outlines its core architecture and processing flow, and provides a practical configuration example for sending logs to Kafka, offering a solid foundation for further development and deeper source‑code analysis.
Recently, due to the need for cloud‑native log collection, the author chose Filebeat as a container log collector and plans to extend it; this article gives an introductory overview of Filebeat’s basic usage and principles.
Reasons for selecting Filebeat include: it meets the functional requirements (collecting on-disk log files, sending to a Kafka cluster, supporting multiline merging and custom fields); it performs better than the JVM-based Logstash or Flume; its Go-based codebase makes secondary development and extension straightforward; and it deploys easily with no third-party dependencies.
Filebeat acts as a data mover: it can ingest from many inputs (most commonly the log input), process data (multiline merging, adding custom fields, JSON encoding), and output to downstream destinations such as Elasticsearch or Kafka, providing ACK feedback for at‑least‑once delivery and adapting to downstream back‑pressure.
Filebeat is one member of the Beats family; other beats include Metricbeat, Packetbeat, Auditbeat, Heartbeat, etc. All share the core library libbeat, which handles publishing, processing, queueing, output, retry, and ACK mechanisms.
Libbeat’s responsibilities are to provide a publisher component for inputs, run processors for filtering and field addition, push events to an internal queue, implement various outputs, manage retry logic, and propagate ACKs back to inputs.
The log input workflow creates processors and an Acker, uses Pipeline.queue.Producer to push events, creates a Harvester per file to read lines, tracks file progress using device‑inode identifiers, and handles file rotation, truncation, deletion, and renaming according to configuration.
When sending logs, events from inputs are consumed by libbeat, batched, and sent via output plugins (e.g., Kafka). Each batch has an ACK channel; workers read from a workQueue and send batches concurrently. Successful sends write ACKs back; failures trigger the retry mechanism.
The Kafka output's retry mechanism reads failed messages from a chan *sarama.ProducerError, skips non-retryable errors (e.g., ErrInvalidMessage), re-queues the remaining events via Batch.RetryEvents, and sends them again. Although a max-retries setting exists, the current code retries indefinitely, prioritizing events marked as guaranteed.
Simple usage example. First, enable external input configuration in filebeat.yml so inputs are loaded from inputs.d/*.yml:

```yaml
filebeat.config.inputs:
  enabled: true
  path: inputs.d/*.yml
```

Then create inputs.d/test1.yml:

```yaml
- type: log
  enabled: true
  paths:
    - /home/lw/test/filebeat/*.log
  fields:
    log_topic: lw_filebeat_t_2
```

Configure the Kafka output in filebeat.yml:

```yaml
output.kafka:
  hosts: ["xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092", "xxx.xxx.xxx.xxx:9092"]
  version: 0.9.0.1
  topic: '%{[fields.log_topic]}'
  partition.round_robin:
    reachable_only: true
  compression: none
  required_acks: 1
  max_message_bytes: 1000000
  codec.format:
    string: '%{[host.name]}-%{[message]}'
```

Finally, start Filebeat with ./filebeat run. Note that many additional global, per-input, and per-output settings affect memory usage and reliability.
Conclusion: The article provides a high‑level overview of Filebeat’s architecture and a basic configuration example, laying the groundwork for deeper source‑code analysis in future posts.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.