How to Seamlessly Pipe Kubernetes Container Logs into Loki with Fluentd
This article walks through a cloud-native approach to collect Kubernetes container logs using Fluentd and forward them to Loki, covering why Fluentd is chosen, required plugins, Docker log handling, multi-worker configuration, metadata filtering, label customization, and Loki output settings.
Why Fluentd
Fluentd is a CNCF‑graduated unified log collector that can gather, process, and forward logs from many sources to files or databases, providing a consistent data collection layer for infrastructure. It offers a rich plugin ecosystem, production‑grade guidance, and backing from major cloud providers, making it a solid choice for Kubernetes log collection.
Loki Plugin
Grafana provides the fluent-plugin-grafana-loki output plugin for Fluentd, allowing collected logs to be sent to a Loki instance. To use it, add the following gems to your Fluentd Docker image.
<code># Required Loki output and Kubernetes metadata plugins
gem install fluent-plugin-grafana-loki
gem install fluent-plugin-kubernetes_metadata_filter
# Recommended plugins for Prometheus metrics and tag manipulation
gem install fluent-plugin-prometheus
gem install fluent-plugin-record-modifier
gem install fluent-plugin-rewrite-tag-filter
</code>
Collection Process
General recommendations for logging in Kubernetes state that applications should write to stdout/stderr, a DaemonSet should collect these streams, and a log backend such as Loki should be prepared for processing and storage.
Applications should write logs to stdout and stderr instead of local files. Use a DaemonSet (not a sidecar) to collect container stdout/stderr. Provide a log processing backend such as Elasticsearch or Loki, with a retention plan. Include log-level switches so the same image can emit different verbosity per environment.
The Fluentd collection workflow on Kubernetes proceeds in four stages: a pre-input stage where Docker writes log files to the node, an input stage where Fluentd tails them, a filter stage that enriches records with Kubernetes metadata, and an output stage that ships them to Loki.
Pre‑Input Stage
Docker redirects container stdout/stderr to /var/lib/docker/containers in JSON format, e.g.:
<code>{
"log":"xxxxxxxxxxx",
"stream":"stdout",
"time":"2020-09-15T23:09:04.902156725Z"
}
</code>
When deploying Fluentd on a node, map this directory into the Fluentd container. Also map /var/log from the host, since the kubelet creates symlinks under /var/log/containers that the in_tail source below reads.
<code># Volume mappings for the host log directories
...
volumeMounts:
- mountPath: /var/log
  name: varlog
- mountPath: /var/lib/docker/containers
  name: varlibdockercontainers
  readOnly: true
volumes:
- name: varlog
  hostPath:
    path: /var/log
- name: varlibdockercontainers
  hostPath:
    path: /var/lib/docker/containers
...
</code>
Input Stage
Use Fluentd’s in_tail plugin to tail the Docker log files:
<code><worker 0>
<source>
@type tail
@id input.containers.out
path /var/log/containers/*.log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
</code>
For higher container density, create two workers that partition the log files by the final hex digit of the file name (this requires setting workers 2 in the &lt;system&gt; directive):
<code><worker 0>
<source>
@type tail
@id input.containers.out.0
path /var/log/containers/*[0-7].log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos.0
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
<worker 1>
<source>
@type tail
@id input.containers.out.1
path /var/log/containers/*[8-f].log
exclude_path ["/var/log/containers/*fluentd*.log"]
pos_file /var/log/fluentd/container.out.pos.1
limit_recently_modified 86400
read_from_head true
tag kubernetes.*
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
utc true
</parse>
</source>
</worker>
</code>
Docker does not cap log storage by default; in production, limit json-file log size with the max-size and max-file log options (for example, --log-opt max-size=10g). If a log file rotates while collection is stalled, the rotated-out logs may be lost.
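With the json-file logging driver, the cap is more commonly set node-wide in /etc/docker/daemon.json; a sketch (the sizes shown are illustrative, not a recommendation):
<code>{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
</code>
Restart the Docker daemon for the change to take effect; it applies only to newly created containers.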
Filter Stage
Two plugins are used to enrich logs with Kubernetes metadata and to modify fields:
fluent-plugin-kubernetes_metadata_filter and fluent-plugin-record-modifier. The metadata filter extracts information from the log tag, queries the Kubernetes API for pod and namespace labels, and adds them to the log JSON:
<code><filter kubernetes.var.log.containers.**>
@type kubernetes_metadata
@id kubernetes_metadata_container_out
skip_container_metadata true
skip_master_url true
cache_size 3000
cache_ttl 1800
</filter>
</code>
The plugin caches metadata; tune cache_size and cache_ttl to your cluster size. If records from newly created pods are missing labels, wait for the cache to refresh or restart Fluentd.
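After this filter runs, each record carries the metadata that the record_modifier step below reads via record.dig. An illustrative (not literal) record shape, with placeholder values:
<code>{
  "log": "xxxxxxxxxxx",
  "stream": "stdout",
  "docker": { "container_id": "4ce8..." },
  "kubernetes": {
    "host": "node-1",
    "namespace_name": "default",
    "pod_name": "myapp-5c9f...",
    "container_name": "myapp",
    "labels": { "app": "myapp", "log-collect": "true" }
  }
}
</code>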
The record_modifier plugin creates custom fields and removes the original Docker/Kubernetes metadata:
<code><match kubernetes.var.log.containers.**>
@type record_modifier
@id label.container.out
tag ${record.dig('k8s_std_collect') ? 'loki.kubernetes.var.log.containers' : 'dropped.var.log.containers'}
<record>
k8s_container_id ${record.dig("docker","container_id")}
k8s_cloud_cluster "${ENV['CLOUD_CLUSTER'] || 'default'}"
k8s_node ${record.dig('kubernetes','host')}
k8s_container_name ${record.dig('kubernetes','container_name')}
k8s_app_name ${record.dig('kubernetes','labels','app_kubernetes_io/name')}
k8s_svc_name ${record.dig('kubernetes','labels','app')}
k8s_pod_name ${record.dig('kubernetes','pod_name')}
k8s_namespace_name ${record.dig('kubernetes','namespace_name')}
k8s_image_version ${record.dig('kubernetes','labels','app_image_version')}
k8s_std_collect ${record.dig("kubernetes","labels","log-collect") or false}
formated_time "${Time.at(time).to_datetime.iso8601(9)}"
fluentd_worker "${worker_id}"
</record>
remove_keys docker,kubernetes
</match>
</code>
A typical label convention might look like:
<code>metadata:
  labels:
    app: <component_name>
    app.kubernetes.io/name: <app_name>
    app.kubernetes.io/version: <app_release>
spec:
  template:
    metadata:
      labels:
        app: <component_name>
        app.image.version: <component_image_tag>
        app.kubernetes.io/name: <app_name>
        log-collect: "true"
</code>
The log-collect label toggles log collection per pod: the tag expression in the record_modifier block routes records to either the loki.* or dropped.* tag (route dropped.* to a null output). If every container should be collected unconditionally, use a static tag instead: <code>tag loki.kubernetes.var.log.containers</code>
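The fluent-plugin-rewrite-tag-filter gem installed earlier offers an alternative to the inline ternary: a separate routing stage based on the pod label. A sketch, assuming the record shape produced by the metadata filter (key path and pattern are illustrative):
<code><match kubernetes.var.log.containers.**>
  @type rewrite_tag_filter
  <rule>
    key     $.kubernetes.labels.log-collect
    pattern /^true$/
    tag     loki.${tag}
  </rule>
</match>
</code>
Records whose label does not match the rule are dropped by the plugin, so no separate null sink is needed in this variant.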
Output Stage
The final step sends enriched logs to Loki using the fluent-plugin-grafana-loki output plugin:
<code><match loki.**>
@type loki
@id loki.output
url "http://loki:3100"
remove_keys topic,k8s_std_collect,formated_time,k8s_container_id
drop_single_key true
<label>
stream
k8s_cloud_cluster
k8s_container_name
k8s_node
k8s_app_name
k8s_svc_name
k8s_pod_name
k8s_image_version
k8s_namespace_name
</label>
<buffer label>
@type file
path /var/log/fluentd-buffers/loki.buffer
flush_mode interval
flush_thread_count 4
flush_interval 3s
retry_type exponential_backoff
retry_wait 2s
retry_max_interval 60s
retry_timeout 12h
chunk_limit_size 8M
total_limit_size 5G
queued_chunks_limit_size 64
overflow_action drop_oldest_chunk
</buffer>
</match>
</code>
Adjust buffer settings to mitigate log loss when Docker log files rotate or when Loki is temporarily unreachable.
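A rough way to size the buffer: divide total_limit_size by the node's log ingest rate to estimate how long an outage the buffer can absorb before overflow_action drop_oldest_chunk starts discarding data. A sketch, where the 1 MiB/s per-node rate is an assumption you should replace with your own measurement:

```python
def buffer_headroom_seconds(total_limit_bytes: int, ingest_bytes_per_sec: float) -> float:
    """Seconds of backlog the file buffer can hold at a given ingest rate."""
    return total_limit_bytes / ingest_bytes_per_sec

# 5 GiB buffer (total_limit_size 5G above), assumed 1 MiB/s of logs on the node:
headroom = buffer_headroom_seconds(5 * 1024**3, 1 * 1024**2)
print(round(headroom / 3600, 1))  # ≈ 1.4 hours of headroom
```

If the result is shorter than your longest expected Loki outage or maintenance window, raise total_limit_size or provision faster flushing.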
With these configurations, you have a cloud‑native logging pipeline that collects Kubernetes container logs via Fluentd, enriches them with metadata, and stores them in Loki, ready for Grafana visualization and multi‑tenant querying.
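With the labels configured in the output stage, a LogQL query in Grafana might look like the following (the namespace and app values are placeholders):
<code>{k8s_namespace_name="default", k8s_app_name="myapp"} |= "error"
</code>
Keeping the label set small and low-cardinality, as done above by excluding per-record fields like the container ID, keeps Loki's index compact and queries fast.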
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.