How to Seamlessly Pipe Kubernetes Container Logs into Loki with Fluentd
This article walks through a cloud-native approach to collect Kubernetes container logs using Fluentd and forward them to Loki, covering why Fluentd is chosen, required plugins, Docker log handling, multi-worker configuration, metadata filtering, label customization, and Loki output settings.
Why Fluentd
Fluentd is a CNCF‑graduated unified log collector that can gather, process, and forward logs from many sources to files or databases, providing a consistent data collection layer for infrastructure. It offers a rich plugin ecosystem, production‑grade guidance, and backing from major cloud providers, making it a solid choice for Kubernetes log collection.
Loki Plugin
Grafana provides the fluent-plugin-grafana-loki output plugin for Fluentd, allowing collected logs to be sent to a Loki instance. To use it, add the following gems to your Fluentd Docker image.
<code># Required Loki output and Kubernetes metadata plugins
gem install fluent-plugin-grafana-loki
gem install fluent-plugin-kubernetes_metadata_filter
# Recommended plugins for Prometheus metrics and tag manipulation
gem install fluent-plugin-prometheus
gem install fluent-plugin-record-modifier
gem install fluent-plugin-rewrite-tag-filter
</code>
Collection Process
General recommendations for logging in Kubernetes state that applications should write to stdout/stderr, a DaemonSet should collect these streams, and a log backend such as Loki should be prepared for processing and storage.
Applications should write logs to stdout and stderr instead of local files. Use a DaemonSet (not a sidecar) to collect container stdout/stderr. Provide a log processing backend such as Elasticsearch or Loki, with a retention plan. Include log-level switches so the same image can emit different verbosity per environment.
The Fluentd collection workflow on Kubernetes proceeds in four stages: a pre-input stage where Docker writes log files to the node, an input stage where Fluentd tails them, a filter stage that enriches records with Kubernetes metadata, and an output stage that ships them to Loki.
Pre‑Input Stage
Docker redirects container stdout/stderr to /var/lib/docker/containers in JSON format, e.g.:
<code>{
"log":"xxxxxxxxxxx",
"stream":"stdout",
"time":"2020-09-15T23:09:04.902156725Z"
}
</code>
When deploying Fluentd on a node, map this directory into the Fluentd container. Also map /var/log from the host, since the kubelet creates symlinks under /var/log/containers that the in_tail source below reads.
<code># Volume mappings for the host log directories
...
volumeMounts:
- mountPath: /var/log
  name: varlog
- mountPath: /var/lib/docker/containers
  name: varlibdockercontainers
  readOnly: true
volumes:
- name: varlog
  hostPath:
    path: /var/log
- name: varlibdockercontainers
  hostPath:
    path: /var/lib/docker/containers
...
</code>
Input Stage
Use Fluentd’s in_tail plugin to tail the Docker log files:
<code><worker 0>
<source>
@type tail
@id input.containers.out
path /var/log/containers/*.log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
</code>
For higher container density, create two workers that partition the log files by the final hex digit of the file name (this requires setting workers 2 in the &lt;system&gt; directive):
<code><worker 0>
<source>
@type tail
@id input.containers.out.0
path /var/log/containers/*[0-7].log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos.0
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
<worker 1>
<source>
@type tail
@id input.containers.out.1
path /var/log/containers/*[8-f].log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos.1
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
</code>
Docker does not cap log storage by default; in production, limit json-file log size with the max-size and max-file log options (for example, --log-opt max-size=10g). If a log file rotates while collection is stalled, the rotated-out logs may be lost.
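With the json-file logging driver, the cap is more commonly set node-wide in /etc/docker/daemon.json; a sketch (the sizes shown are illustrative, not a recommendation):
<code>{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
</code>
Restart the Docker daemon for the change to take effect; it applies only to newly created containers.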
Filter Stage
Two plugins are used to enrich logs with Kubernetes metadata and to modify fields:
fluent-plugin-kubernetes_metadata_filter and fluent-plugin-record-modifier. The metadata filter extracts information from the log tag, queries the Kubernetes API for pod and namespace labels, and adds them to the log JSON:
<code><filter kubernetes.var.log.containers.**>
@type kubernetes_metadata
@id kubernetes_metadata_container_out
skip_container_metadata true
skip_master_url true
cache_size 3000
cache_ttl 1800
</filter>
</code>
The plugin caches metadata; tune cache_size and cache_ttl to your cluster size. If records from newly created pods are missing labels, wait for the cache to refresh or restart Fluentd.
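After this filter runs, each record carries the metadata that the record_modifier step below reads via record.dig. An illustrative (not literal) record shape, with placeholder values:
<code>{
  "log": "xxxxxxxxxxx",
  "stream": "stdout",
  "docker": { "container_id": "4ce8..." },
  "kubernetes": {
    "host": "node-1",
    "namespace_name": "default",
    "pod_name": "myapp-5c9f...",
    "container_name": "myapp",
    "labels": { "app": "myapp", "log-collect": "true" }
  }
}
</code>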
The record_modifier plugin creates custom fields and removes the original Docker/Kubernetes metadata:
<code><match kubernetes.var.log.containers.**>
@type record_modifier
@id label.container.out
tag ${record.dig('k8s_std_collect') ? 'loki.kubernetes.var.log.containers' : 'dropped.var.log.containers'}
<record>
k8s_container_id ${record.dig("docker","container_id")}
k8s_cloud_cluster "${ENV['CLOUD_CLUSTER'] || 'default'}"
k8s_node ${record.dig('kubernetes','host')}
k8s_container_name ${record.dig('kubernetes','container_name')}
k8s_app_name ${record.dig('kubernetes','labels','app_kubernetes_io/name')}
k8s_svc_name ${record.dig('kubernetes','labels','app')}
k8s_pod_name ${record.dig('kubernetes','pod_name')}
k8s_namespace_name ${record.dig('kubernetes','namespace_name')}
k8s_image_version ${record.dig('kubernetes','labels','app_image_version')}
k8s_std_collect ${record.dig("kubernetes","labels","log-collect") or false}
formated_time "${Time.at(time).to_datetime.iso8601(9)}"
fluentd_worker "${worker_id}"
</record>
remove_keys docker,kubernetes
</match>
</code>
A typical label convention might look like:
<code>metadata:
  labels:
    app: <component_name>
    app.kubernetes.io/name: <app_name>
    app.kubernetes.io/version: <app_release>
spec:
  template:
    metadata:
      labels:
        app: <component_name>
        app.image.version: <component_image_tag>
        app.kubernetes.io/name: <app_name>
        log-collect: "true"
</code>
The log-collect label toggles log collection per pod: the tag expression in the record_modifier block routes records to either the loki.* or dropped.* tag (route dropped.* to a null output). If every container should be collected unconditionally, use a static tag instead: <code>tag loki.kubernetes.var.log.containers</code>
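The fluent-plugin-rewrite-tag-filter gem installed earlier offers an alternative to the inline ternary: a separate routing stage based on the pod label. A sketch, assuming the record shape produced by the metadata filter (key path and pattern are illustrative):
<code><match kubernetes.var.log.containers.**>
  @type rewrite_tag_filter
  <rule>
    key     $.kubernetes.labels.log-collect
    pattern /^true$/
    tag     loki.${tag}
  </rule>
</match>
</code>
Records whose label does not match the rule are dropped by the plugin, so no separate null sink is needed in this variant.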
Output Stage
The final step sends enriched logs to Loki using the fluent-plugin-grafana-loki output plugin:
<code><match loki.**>
@type loki
@id loki.output
url "http://loki:3100"
remove_keys topic,k8s_std_collect,formated_time,k8s_container_id
drop_single_key true
<label>
stream
k8s_cloud_cluster
k8s_container_name
k8s_node
k8s_app_name
k8s_svc_name
k8s_pod_name
k8s_image_version
k8s_namespace_name
</label>
<buffer label>
@type file
path /var/log/fluentd-buffers/loki.buffer
flush_mode interval
flush_thread_count 4
flush_interval 3s
retry_type exponential_backoff
retry_wait 2s
retry_max_interval 60s
retry_timeout 12h
chunk_limit_size 8M
total_limit_size 5G
queued_chunks_limit_size 64
overflow_action drop_oldest_chunk
</buffer>
</match>
</code>
Adjust buffer settings to mitigate log loss when Docker log files rotate or when Loki is temporarily unreachable.
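A rough way to size the buffer: divide total_limit_size by the node's log ingest rate to estimate how long an outage the buffer can absorb before overflow_action drop_oldest_chunk starts discarding data. A sketch, where the 1 MiB/s per-node rate is an assumption you should replace with your own measurement:

```python
def buffer_headroom_seconds(total_limit_bytes: int, ingest_bytes_per_sec: float) -> float:
    """Seconds of backlog the file buffer can hold at a given ingest rate."""
    return total_limit_bytes / ingest_bytes_per_sec

# 5 GiB buffer (total_limit_size 5G above), assumed 1 MiB/s of logs on the node:
headroom = buffer_headroom_seconds(5 * 1024**3, 1 * 1024**2)
print(round(headroom / 3600, 1))  # ≈ 1.4 hours of headroom
```

If the result is shorter than your longest expected Loki outage or maintenance window, raise total_limit_size or provision faster flushing.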
With these configurations, you have a cloud‑native logging pipeline that collects Kubernetes container logs via Fluentd, enriches them with metadata, and stores them in Loki, ready for Grafana visualization and multi‑tenant querying.
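With the labels configured in the output stage, a LogQL query in Grafana might look like the following (the namespace and app values are placeholders):
<code>{k8s_namespace_name="default", k8s_app_name="myapp"} |= "error"
</code>
Keeping the label set small and low-cardinality, as done above by excluding per-record fields like the container ID, keeps Loki's index compact and queries fast.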
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.