
Collecting Docker Container Logs with Flume: Strategies and Implementation

This article explains how to capture Docker container logs, discusses the challenges of multi‑line log correlation, and presents two approaches—client‑side parsing and server‑side parsing—along with a concrete Flume customization using a DockerLog Java bean.

Architect

Feb 18, 2016


I recently extended my log collection feature to support Docker container logs. This article discusses the strategy choices involved and how each one is handled.

When a component runs inside Docker, its logs can be viewed with docker logs ${containerName}, or read directly on the host at /var/lib/docker/containers/${fullContainerId}/${fullContainerId}-json.log (the default location used by the json-file logging driver). The full container ID can be obtained with docker ps --no-trunc.
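As a small illustration, the following sketch builds that path from a full container ID. It assumes the Docker daemon uses its default data root, /var/lib/docker; a daemon started with a different data root (for example via --data-root) stores its logs elsewhere. DockerLogPath and jsonLogPath are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DockerLogPath {
    // Assumption: the daemon's default data root; adjust if the daemon was configured differently.
    static final String DOCKER_ROOT = "/var/lib/docker";

    // Returns the json-file log path for a container, given the full ID from `docker ps --no-trunc`.
    static Path jsonLogPath(String fullContainerId) {
        // Full IDs are 64 lowercase hex characters; the short IDs printed by plain `docker ps` are rejected.
        if (!fullContainerId.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException("expected a full 64-character container ID: " + fullContainerId);
        }
        return Paths.get(DOCKER_ROOT, "containers", fullContainerId, fullContainerId + "-json.log");
    }
}
```

Rejecting short IDs makes the collector fail fast instead of silently watching a path that does not exist.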

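With the default json-file driver, Docker stores logs as JSON, one entry per line. The outer Docker wrapper is fixed (the fields log, stream, and time), while the content of the log field varies per component. Because Docker writes a separate entry for every line of output, a multi-line log such as a Java stack trace is split across several consecutive entries, which makes it hard to correlate them back into a single event.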
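To make the splitting concrete, the sketch below imitates, in simplified form, how the json-file driver wraps a container's output: each line, including its newline, becomes its own JSON object. The class and method names are hypothetical, and the escaping is simplified (Docker's real encoder may, for example, also escape characters such as < and >).

```java
import java.util.ArrayList;
import java.util.List;

final class DockerJsonFileDemo {
    // Encodes a Java string as a JSON string literal, surrounding quotes included.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters use the generic four-hex-digit escape.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Splits output into lines and wraps each one (newline kept inside "log") as a separate entry.
    static List<String> wrap(String output, String stream, String time) {
        List<String> entries = new ArrayList<>();
        int start = 0;
        while (start < output.length()) {
            int nl = output.indexOf('\n', start);
            int end = (nl < 0) ? output.length() : nl + 1; // include the '\n' in this entry
            String line = output.substring(start, end);
            entries.add("{\"log\":" + jsonString(line)
                    + ",\"stream\":" + jsonString(stream)
                    + ",\"time\":" + jsonString(time) + "}");
            start = end;
        }
        return entries;
    }
}
```

Feeding a three-line stack trace through wrap yields three independent entries, which is exactly why a collector that reads the file line by line sees the trace in pieces.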

Two main processing approaches are considered:

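Client does not parse: the agent only collects raw log lines and leaves all parsing, including multi-line correlation, to the server side (e.g., Storm). Because correlation depends on line order, order must be preserved end to end, either through queue-based ordering (all lines from one file travel through a single ordered queue, which limits parallelism) or by attaching sequence numbers (lines can travel in parallel, but the server must buffer and re-sort them).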

Client parses Docker format: the agent extracts the original log message from the Docker JSON wrapper, so downstream parsers can handle multi-line correlation exactly as they would for logs from a non-containerized deployment. This requires customizing the log collector.
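For the first approach, the sequence-number variant can be sketched as a per-file reorder buffer on the server. This is a hypothetical illustration, not the author's code: SequencedLine and ReorderBuffer are made-up names, and the sketch assumes the agent numbers the lines of each log file consecutively starting from 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical message shape: one raw Docker JSON line tagged with its per-file sequence number.
final class SequencedLine {
    final long seq;
    final String raw;
    SequencedLine(long seq, String raw) { this.seq = seq; this.raw = raw; }
}

// Server-side buffer for ONE log file that releases lines strictly in sequence order.
final class ReorderBuffer {
    private final PriorityQueue<SequencedLine> pending =
            new PriorityQueue<>((a, b) -> Long.compare(a.seq, b.seq));
    private long nextSeq = 0; // the sequence number we are waiting for

    // Buffers the line and returns every line that can now be delivered in order.
    List<SequencedLine> offer(SequencedLine line) {
        if (line.seq >= nextSeq) pending.add(line); // ignore lines that were already delivered
        List<SequencedLine> ready = new ArrayList<>();
        while (!pending.isEmpty()) {
            long head = pending.peek().seq;
            if (head < nextSeq) { pending.poll(); continue; } // stale duplicate
            if (head != nextSeq) break;                      // gap: wait for the missing line
            ready.add(pending.poll());
            nextSeq++;
        }
        return ready;
    }

    int buffered() { return pending.size(); }
}
```

The buffer makes the trade-off visible: a lost line stalls everything behind it, so a real implementation would also need a timeout or size limit, none of which is shown here.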

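For Flume, the customization point is the EventDeserializer, the component that turns the bytes read by a source (for example, the Spooling Directory Source) into events. Starting from a MultiLineDeserializer based on Flume's built-in LineDeserializer, we add a configuration flag, wrappedByDocker = true, and define a Java bean that mirrors Docker's JSON wrapper: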

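public static class DockerLog {
    private String log;    // one line of container output, including its trailing "\n"
    private String stream; // "stdout" or "stderr"
    private String time;   // timestamp written by Docker (RFC 3339, nanosecond precision)

    // Gson fills the fields via reflection, so only getters are needed here.
    public String getLog() { return log; }
    public String getStream() { return stream; }
    public String getTime() { return time; }
}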

During reading, if wrappedByDocker is true, the line is deserialized with Gson and the log field is extracted:

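readBeforeOffset = in.tell();          // offset of the line about to be read
String preReadLine = readSingleLine(); // raw JSON line from the *-json.log file
if (preReadLine == null) return null;  // end of input
if (wrappedByDocker) {
    // GSON is a shared constant: private static final Gson GSON = new Gson();
    DockerLog dockerLog = GSON.fromJson(preReadLine, DockerLog.class);
    preReadLine = dockerLog.getLog();
    // Docker keeps the line's trailing newline inside "log"; drop it so the
    // unwrapped text looks like a line read by the plain LineDeserializer.
    if (preReadLine != null && preReadLine.endsWith("\n")) {
        preReadLine = preReadLine.substring(0, preReadLine.length() - 1);
    }
}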

This ensures the agent forwards the original log text, allowing consistent downstream processing. The full Flume customization is available on GitHub.
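To see what the downstream parser receives under the second approach, here is a dependency-free sketch of unwrapping followed by multi-line correlation. It swaps Gson for a hand-written extractor that only understands Docker's flat wrapper (it is not a general JSON parser), and it uses a simple heuristic for correlation: a line that starts with whitespace or "Caused by:" continues the previous event. Both choices, and the names DockerLineUnwrapper, unwrap, and correlate, are assumptions for illustration rather than the author's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class DockerLineUnwrapper {
    // Returns the "log" field of one json-file entry without its trailing newline,
    // or null if the field is missing or malformed.
    static String unwrap(String jsonLine) {
        int key = jsonLine.indexOf("\"log\"");
        if (key < 0) return null;
        int i = jsonLine.indexOf(':', key + 5);
        if (i < 0) return null;
        i++;
        while (i < jsonLine.length() && Character.isWhitespace(jsonLine.charAt(i))) i++;
        if (i >= jsonLine.length() || jsonLine.charAt(i) != '"') return null;
        StringBuilder sb = new StringBuilder();
        for (i = i + 1; i < jsonLine.length(); i++) {
            char c = jsonLine.charAt(i);
            if (c == '"') { // closing quote: strip Docker's trailing newline and return
                String log = sb.toString();
                return log.endsWith("\n") ? log.substring(0, log.length() - 1) : log;
            }
            if (c != '\\') { sb.append(c); continue; }
            if (++i >= jsonLine.length()) return null; // dangling backslash
            char e = jsonLine.charAt(i);
            switch (e) {
                case 'n': sb.append('\n'); break;
                case 't': sb.append('\t'); break;
                case 'r': sb.append('\r'); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'u': // four hex digits follow
                    if (i + 4 >= jsonLine.length()) return null;
                    sb.append((char) Integer.parseInt(jsonLine.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: sb.append(e); // quote, backslash and slash escapes
            }
        }
        return null; // unterminated string
    }

    // Heuristic correlation: continuation lines are appended to the previous event.
    static List<String> correlate(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            boolean continuation = !events.isEmpty()
                    && (line.startsWith(" ") || line.startsWith("\t") || line.startsWith("Caused by:"));
            if (continuation) {
                int last = events.size() - 1;
                events.set(last, events.get(last) + "\n" + line);
            } else {
                events.add(line);
            }
        }
        return events;
    }
}
```

Because unwrapping happens first, the correlation step is the same one that would run on logs written outside Docker, which is the main benefit of parsing on the client.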


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
