Collecting Docker Container Logs with Flume: Strategies and Implementation
This article explains how to capture Docker container logs, discusses the challenges of multi‑line log correlation, and presents two approaches—client‑side parsing and server‑side parsing—along with a concrete Flume customization using a DockerLog Java bean.
I recently added Docker container log support to my log collection feature. This article covers the strategy choices involved and how the logs are handled.
When a component runs inside Docker, its logs can be viewed with docker logs ${containerName} or read on the host file system, by default at /var/lib/docker/containers/${fullContainerId}/${fullContainerId}-json.log. The full container ID can be obtained with docker ps --no-trunc.
Docker stores logs in JSON format, one log entry per line. The outer Docker wrapper is fixed (the fields log, stream, and time), while the inner log content varies per component. Multi‑line logs such as stack traces are split across multiple Docker log entries, one per physical line, which makes correlating them difficult.
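To make the wrapper concrete, here is a minimal sketch of three json-file entries that together hold one Java stack trace, and of pulling the inner text out of each. The sample entries, the class name DockerJsonLogDemo, and the hand-rolled extractLog helper are illustrative assumptions. extractLog is a standard-library stand-in that assumes Docker's compact key layout; a real agent would use a proper JSON parser such as Gson.

```java
import java.util.ArrayList;
import java.util.List;

final class DockerJsonLogDemo {

    // Hypothetical sample: one exception as Docker would record it, one entry per physical line.
    static List<String> sampleEntries() {
        List<String> lines = new ArrayList<>();
        lines.add("{\"log\":\"java.lang.IllegalStateException: boom\\n\",\"stream\":\"stderr\",\"time\":\"2017-01-01T00:00:00.000000001Z\"}");
        lines.add("{\"log\":\"\\tat com.example.Foo.bar(Foo.java:42)\\n\",\"stream\":\"stderr\",\"time\":\"2017-01-01T00:00:00.000000002Z\"}");
        lines.add("{\"log\":\"\\tat com.example.Main.main(Main.java:7)\\n\",\"stream\":\"stderr\",\"time\":\"2017-01-01T00:00:00.000000003Z\"}");
        return lines;
    }

    // Stand-in for a JSON parser: returns the decoded value of the "log" field,
    // or null if the field is missing or its string is unterminated.
    static String extractLog(String jsonLine) {
        String key = "\"log\":\"";
        int start = jsonLine.indexOf(key);
        if (start < 0) return null;
        StringBuilder out = new StringBuilder();
        int i = start + key.length();
        while (i < jsonLine.length()) {
            char c = jsonLine.charAt(i++);
            if (c == '"') return out.toString();        // closing quote of the value
            if (c != '\\') { out.append(c); continue; }
            if (i >= jsonLine.length()) return null;     // dangling backslash
            char esc = jsonLine.charAt(i++);
            switch (esc) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':                                // four hex digits follow
                    if (i + 4 > jsonLine.length()) return null;
                    out.append((char) Integer.parseInt(jsonLine.substring(i, i + 4), 16));
                    i += 4;
                    break;
                default: out.append(esc);                // covers \" \\ and \/
            }
        }
        return null;                                     // no closing quote found
    }

    public static void main(String[] args) {
        // Each extracted value already ends with its own newline.
        for (String entry : sampleEntries()) {
            System.out.print(extractLog(entry));
        }
    }
}
```

Each extracted value keeps its trailing newline, and the three entries still have to be stitched back into one event by whatever component handles multi-line correlation.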
Two main processing approaches are considered:
Client does not parse: the agent only collects raw logs, leaving parsing to the server (e.g., Storm). This requires preserving order, either via queue‑based ordering or by adding sequence numbers, each with trade‑offs.
Client parses Docker format: the agent extracts the original log message from the Docker JSON wrapper, then downstream parsers can handle multi‑line correlation as usual. This requires customizing the log collector.
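For the first approach, the sequence-number option can be sketched as a small reorder buffer on the server side. This is an illustration under assumptions, not part of the Flume customization: the class SequenceReorderBuffer, the Numbered event type, and the idea that the agent numbers lines per log file starting from a known value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class SequenceReorderBuffer {

    // One raw line plus the sequence number the agent attached when it read the line.
    static final class Numbered {
        final long seq;
        final String line;
        Numbered(long seq, String line) { this.seq = seq; this.line = line; }
    }

    // Events that arrived early, ordered by sequence number.
    private final PriorityQueue<Numbered> pending =
            new PriorityQueue<>((a, b) -> Long.compare(a.seq, b.seq));
    private long nextExpected;

    SequenceReorderBuffer(long firstSeq) { this.nextExpected = firstSeq; }

    // Accepts an event in arrival order and returns the lines that can now be
    // released in sequence order (possibly none).
    List<String> offer(Numbered event) {
        List<String> ready = new ArrayList<>();
        if (event.seq < nextExpected) return ready;      // already released: drop duplicate
        pending.add(event);
        while (!pending.isEmpty() && pending.peek().seq <= nextExpected) {
            Numbered head = pending.poll();
            if (head.seq == nextExpected) {
                ready.add(head.line);
                nextExpected++;
            }
            // head.seq < nextExpected: a buffered duplicate of a released line, discarded
        }
        return ready;
    }

    public static void main(String[] args) {
        SequenceReorderBuffer buffer = new SequenceReorderBuffer(0);
        System.out.println(buffer.offer(new Numbered(1, "second line")));
        System.out.println(buffer.offer(new Numbered(0, "first line")));
        System.out.println(buffer.offer(new Numbered(2, "third line")));
    }
}
```

One trade-off is visible here: if a numbered line is lost, everything behind it waits in the buffer, so a real deployment would need a timeout or a size bound before giving up on the gap.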
For Flume, the EventDeserializer can be customized. Using a MultiLineDeserializer based on LineDeserializer, we add a configuration flag wrappedByDocker = true and define a Java bean:
public static class DockerLog {
    private String log;
    private String stream;
    private String time;
    // getters and setters
}
During reading, if wrappedByDocker is true, the line is deserialized with Gson and the log field is extracted:
readBeforeOffset = in.tell();
String preReadLine = readSingleLine();
if (preReadLine == null) return null;
if (wrappedByDocker) {
    DockerLog dockerLog = GSON.fromJson(preReadLine, DockerLog.class);
    preReadLine = dockerLog.getLog();
}
This ensures the agent forwards the original log text, allowing consistent downstream processing. The full Flume customization is available on GitHub.
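Once the agent emits unwrapped text, multi-line correlation no longer needs to know anything about Docker. Below is a minimal sketch of such a step, assuming (hypothetically) that every new event begins with a yyyy-MM-dd date and that any other line continues the previous event. The class MultiLineMerger and its rule are illustrative and are not taken from the MultiLineDeserializer mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class MultiLineMerger {

    // Hypothetical rule: a line that opens with a date starts a new event.
    private static final Pattern EVENT_START = Pattern.compile("^\\d{4}-\\d{2}-\\d{2} ");

    static boolean startsNewEvent(String line) {
        return EVENT_START.matcher(line).find();
    }

    // The unwrapped Docker "log" value normally ends with a newline; drop it before merging.
    static String stripTrailingNewline(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == '\n' || s.charAt(end - 1) == '\r')) end--;
        return s.substring(0, end);
    }

    // Groups already-unwrapped lines into events, joining continuation lines with '\n'.
    static List<String> merge(List<String> unwrappedLines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : unwrappedLines) {
            String line = stripTrailingNewline(raw);
            if (current == null || startsNewEvent(line)) {
                if (current != null) events.add(current.toString());
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) events.add(current.toString());
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2017-01-01 10:00:01 ERROR request failed\n",
                "java.lang.IllegalStateException: boom\n",
                "\tat com.example.Foo.bar(Foo.java:42)\n",
                "2017-01-01 10:00:02 INFO done\n");
        for (String event : merge(lines)) {
            System.out.println("--- event ---");
            System.out.println(event);
        }
    }
}
```

Because the rule only looks at the inner text, the same merger would work unchanged for logs read from a plain file outside Docker.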