How Kelemetry Transforms Kubernetes Observability with Object‑Centric Tracing
Kelemetry is an open‑source tracing system from ByteDance that links Kubernetes control‑plane components by treating each object as a span. It aggregates audit logs and events into unified traces, visualized as trees or timelines, and supports multi‑cluster monitoring and custom conversion pipelines.
Background
Kelemetry is a tracing system developed by ByteDance for the Kubernetes control plane. It connects the behavior of multiple Kubernetes components from a global perspective, tracking the full lifecycle of individual objects and the interactions between different objects.
Traditional distributed tracing follows a request‑centric model where a root span is created at the start of a user request and child spans are generated for each internal RPC. Kubernetes, however, operates asynchronously and declaratively: components update the desired state in the apiserver and other components continuously reconcile toward that state, making the classic span model unsuitable.
Because each component reconciles independently, direct causal relationships are hard to observe. Existing component‑specific traces only capture isolated reconciliations, leaving observability islands.
Design
1. Object as Span
Kelemetry creates a span for each Kubernetes object instead of for each operation. Events occurring on the object become child spans. Objects are linked through ownership, so child‑object spans become sub‑spans of the parent‑object span. This yields two dimensions: a tree hierarchy representing object relationships and a timeline showing event order, usually matching causality.
For example, the interaction among the deployment controller, the ReplicaSet controller, and the kubelet for a single‑pod Deployment can be displayed as one trace built from audit logs and events.
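The object-as-span model can be sketched as a small data structure. This is only an illustration under assumed names (`ObjectSpan`, `EventSpan`, and the sample object names are hypothetical, not Kelemetry types): each object holds its own events plus the spans of the objects it owns, and the same tree can be read as a hierarchy or flattened into a timeline.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EventSpan:
    """A point-in-time occurrence on an object (audit log entry or Event)."""
    timestamp: float  # seconds since the start of the trace
    source: str       # "audit" or "event"
    summary: str


@dataclass
class ObjectSpan:
    """One span per Kubernetes object; owned objects nest underneath it."""
    kind: str
    name: str
    events: List[EventSpan] = field(default_factory=list)
    children: List["ObjectSpan"] = field(default_factory=list)

    def timeline(self) -> List[Tuple[float, str, str]]:
        """Flatten the tree into (timestamp, object, summary) rows in time order."""
        rows = [(e.timestamp, f"{self.kind}/{self.name}", e.summary) for e in self.events]
        for child in self.children:
            rows.extend(child.timeline())
        return sorted(rows)


# Hypothetical single-pod Deployment: Deployment owns ReplicaSet owns Pod.
pod = ObjectSpan("Pod", "web-abc-xyz", [EventSpan(2.0, "audit", "kubelet patch status")])
rs = ObjectSpan("ReplicaSet", "web-abc", [EventSpan(1.0, "event", "SuccessfulCreate")], [pod])
deploy = ObjectSpan("Deployment", "web", [EventSpan(0.0, "audit", "deployment-controller update")], [rs])

for row in deploy.timeline():
    print(row)
```

The nesting (`children`) gives the tree dimension, while `timeline()` gives the time dimension over the same data.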
2. Audit Log Collection
Kelemetry’s primary data source is the apiserver audit log, which records detailed information about each controller operation, including the client, involved objects, and precise duration. Audit logs are exposed either as log files or via a webhook. Kelemetry provides an audit webhook to receive native audit events and a plugin API for consuming logs from vendor‑specific message queues.
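As a rough illustration of what an audit event offers, the sketch below extracts the client, the object, and the request duration from one event. The field names follow the Kubernetes `audit.k8s.io/v1` Event schema; the `summarize` helper and the sample values are assumptions for illustration and do not reflect how Kelemetry itself processes audit events.

```python
import json
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp with microseconds, as used in audit events."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def summarize(audit_event: dict) -> dict:
    """Reduce an audit event to the fields a trace span would need."""
    ref = audit_event["objectRef"]
    received = parse_ts(audit_event["requestReceivedTimestamp"])
    finished = parse_ts(audit_event["stageTimestamp"])
    return {
        "client": audit_event["user"]["username"],
        "verb": audit_event["verb"],
        "object": f'{ref["resource"]}/{ref.get("namespace", "")}/{ref["name"]}',
        "duration_ms": (finished - received) / timedelta(milliseconds=1),
    }


# Hypothetical sample event, trimmed to the fields used above.
sample = json.loads("""
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "stage": "ResponseComplete",
  "verb": "update",
  "user": {"username": "system:kube-controller-manager"},
  "objectRef": {"resource": "replicasets", "namespace": "default", "name": "web-abc"},
  "requestReceivedTimestamp": "2023-01-01T00:00:00.000000Z",
  "stageTimestamp": "2023-01-01T00:00:00.012500Z"
}
""")

summary = summarize(sample)
print(summary)
```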
3. Event Collection
Controllers emit Kubernetes Event objects during reconciliation. Kelemetry watches the event API to retrieve the latest version of each event. To avoid duplicate spans, it applies heuristics: it persists the timestamp of the last processed event and ignores earlier events after a restart, and it checks whether an event's resourceVersion has changed.
Event messages can carry considerable detail, for example when a pod cannot be scheduled:
0/4022 nodes are available to run pod xxxxx: 1072 Insufficient memory, 1819 Insufficient cpu, 1930 node(s) didn't match node selector, 71 node(s) had taint {xxxxx}, that the pod didn't tolerate.
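The two deduplication heuristics described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `EventDeduplicator` class and an in-memory map; the source does not describe Kelemetry's actual data structures or where the timestamp is persisted.

```python
from typing import Dict


class EventDeduplicator:
    """Skip events already handled before a restart or seen at the same resourceVersion."""

    def __init__(self, checkpoint: float = 0.0):
        # Timestamp persisted by the previous run; events at or before it are skipped.
        self.checkpoint = checkpoint
        # Newest timestamp processed so far; this is the value to persist for the next restart.
        self.latest = checkpoint
        # Event UID -> last resourceVersion that produced a span.
        self.seen: Dict[str, str] = {}

    def should_process(self, uid: str, resource_version: str, timestamp: float) -> bool:
        if timestamp <= self.checkpoint:
            return False  # already handled before the restart
        if self.seen.get(uid) == resource_version:
            return False  # no change since this event was last processed
        self.seen[uid] = resource_version
        self.latest = max(self.latest, timestamp)
        return True


dedup = EventDeduplicator(checkpoint=100.0)
print(dedup.should_process("event-a", "1", 90.0))   # older than the checkpoint
print(dedup.should_process("event-a", "2", 110.0))  # new version after the checkpoint
print(dedup.should_process("event-a", "2", 110.0))  # same version delivered again
```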
4. Linking Object State with Audit Logs
Kelemetry runs a controller that watches object create, update, and delete events. When an audit event arrives, it links the audit span to the corresponding object span using the object's resourceVersion. Diffs and snapshots of each resourceVersion are cached in a distributed KV store, allowing later correlation of audit logs with the fields changed by a controller. This also helps identify 409 conflicts by grouping audit logs that share the same old resourceVersion.
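The conflict-grouping idea can be illustrated with a short sketch. The record fields (`object`, `old_rv`, `client`, `code`) and the `find_conflicts` helper are hypothetical simplifications; real audit events would first need to be matched to the resourceVersion each request was based on.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_conflicts(audit_logs: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group writes by (object, old resourceVersion) and keep groups containing a 409."""
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for log in audit_logs:
        groups[(log["object"], log["old_rv"])].append(log)
    return {
        key: logs
        for key, logs in groups.items()
        if len(logs) > 1 and any(log["code"] == 409 for log in logs)
    }


# Hypothetical records: two clients wrote against resourceVersion 100, one lost.
logs = [
    {"object": "pods/default/web-0", "old_rv": "100", "client": "kubelet", "code": 200},
    {"object": "pods/default/web-0", "old_rv": "100", "client": "scheduler", "code": 409},
    {"object": "pods/default/web-0", "old_rv": "101", "client": "kubelet", "code": 200},
]
conflicts = find_conflicts(logs)
for key, group in conflicts.items():
    print(key, [entry["client"] for entry in group])
```

Grouping by the old version shows which writers raced against the same state, and therefore which request caused another to fail.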
5. Front‑end Trace Conversion
Because traditional tracing protocols (e.g., OTLP) do not allow a span to be modified after it ends, Kelemetry intercepts results between the Jaeger query front‑end and the storage back‑end and applies custom conversion pipelines. Four pipelines are supported:
tree – a simplified trace tree with shortened service/operation names.
timeline – flattens nested pseudo‑spans, placing all event spans under the root.
tracing – expands non‑object spans into logs attached to related object spans.
grouping – creates a new pseudo‑span for each data source (audit/event) and merges spans from different components.
Users select a pipeline via the “service name” filter; the intermediate storage plugin generates a new CacheID that maps to the actual TraceID and the chosen pipeline.
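To make the idea of a conversion pipeline concrete, here is a sketch of a timeline-style transformation: nested pseudo-spans (object spans) are removed and every event span is re-attached directly under the root in time order. The dict layout and the `to_timeline` function are assumptions for illustration, not Kelemetry's implementation or Jaeger's span model.

```python
def to_timeline(root: dict) -> dict:
    """Return a copy of the root span with all event spans as direct children."""
    events = []

    def collect(span: dict) -> None:
        for child in span.get("children", []):
            if child.get("pseudo"):
                collect(child)  # descend into nested object spans
            else:
                events.append({**child, "children": []})

    collect(root)
    events.sort(key=lambda span: span["start"])
    return {**root, "children": events}


# Hypothetical stored trace: object spans are marked pseudo, events are not.
trace = {
    "name": "deployment/web", "pseudo": True, "children": [
        {"name": "update", "pseudo": False, "start": 0},
        {"name": "replicaset/web-abc", "pseudo": True, "children": [
            {"name": "SuccessfulCreate", "pseudo": False, "start": 1},
            {"name": "pod/web-abc-xyz", "pseudo": True, "children": [
                {"name": "Pulling", "pseudo": False, "start": 3},
            ]},
        ]},
        {"name": "status update", "pseudo": False, "start": 2},
    ],
}
flat = to_timeline(trace)
print([span["name"] for span in flat["children"]])
```

Because the transformation runs at query time on a copy, the stored trace stays unchanged and several views can be derived from the same data.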
6. Breaking Duration Limits
Traces are limited to 30‑minute windows to avoid storage overload. Kelemetry’s storage plugin merges spans that share the same object label across windows, presenting a continuous story as a single trace while eliminating duplicate object spans.
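A minimal sketch of the windowing and merging logic, under assumed data shapes: timestamps are bucketed into 30-minute windows, and spans from different windows that carry the same object label are merged into one. `window_start`, `merge_windows`, and the trace layout are hypothetical names chosen for illustration.

```python
WINDOW_SECONDS = 30 * 60


def window_start(timestamp: int) -> int:
    """Return the start of the 30-minute window containing a Unix timestamp."""
    return timestamp - timestamp % WINDOW_SECONDS


def merge_windows(traces: list) -> dict:
    """Merge per-window traces into one event list per object label, in window order."""
    merged = {}
    for trace in sorted(traces, key=lambda t: t["window"]):
        for span in trace["spans"]:
            # Spans with the same object label collapse into a single entry.
            merged.setdefault(span["object"], []).extend(span["events"])
    return merged


# Hypothetical traces for two consecutive windows, supplied out of order.
traces = [
    {"window": 1800, "spans": [{"object": "deployment/web", "events": ["scale"]}]},
    {"window": 0, "spans": [
        {"object": "deployment/web", "events": ["create"]},
        {"object": "pod/web-0", "events": ["Pulling"]},
    ]},
]
merged = merge_windows(traces)
print(merged)
```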
7. Multi‑Cluster Support
Kelemetry can be deployed to monitor events from multiple clusters. At ByteDance, it creates about 8 billion spans per day (excluding pseudo‑spans) using a multi‑Raft cache backend. Objects can be linked across clusters, enabling cross‑cluster component tracing.
Future Enhancements
1. Custom Trace Sources
Kelemetry will ingest traces from existing components and integrate them into its unified view, extending observability beyond audit logs and events.
2. Batch Analysis
Aggregated traces will enable queries such as “how long does a deployment upgrade take from start to first image pull?” Future work includes periodic analysis of half‑hourly trace outputs to identify patterns and correlate scenarios.
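The example query could look roughly like the sketch below once traces are available as flat rows. Batch analysis is described only as future work, so everything here is hypothetical: the row format, the `time_to_first_pull` helper, and the use of the kubelet's "Pulling" event reason as the marker for an image pull.

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[float, str, str]  # (timestamp in seconds, object label, summary)


def time_to_first_pull(rows: Iterable[Row]) -> Optional[float]:
    """Seconds from the first Deployment activity to the first image pull, if both exist."""
    rows = list(rows)
    start = min((ts for ts, obj, _ in rows if obj.startswith("Deployment/")), default=None)
    if start is None:
        return None
    pulls = [ts for ts, _, summary in rows if summary == "Pulling" and ts >= start]
    return min(pulls) - start if pulls else None


# Hypothetical timeline rows for one rollout.
rows = [
    (10.0, "Deployment/web", "update"),
    (11.0, "ReplicaSet/web-abc", "SuccessfulCreate"),
    (14.5, "Pod/web-abc-xyz", "Pulling"),
    (20.0, "Pod/web-abc-xyz", "Started"),
]
print(time_to_first_pull(rows))
```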
Use Cases
1. ReplicaSet Controller Anomaly
A deployment repeatedly created new Pods. The Kelemetry trace revealed that the ReplicaSet controller emitted a SuccessfulCreate event for each Pod creation but never updated the ReplicaSet status or observed the Pods, indicating a possible informer consistency issue.
2. Floating minReadySeconds
A deployment’s rolling update was unusually slow because the minReadySeconds field had been temporarily set to 3600 by the federation component. The Kelemetry trace pinpointed the change, allowing a quick rollback to the intended value of 10.
Try Kelemetry
Kelemetry is open‑source on GitHub: github.com/kubewharf/kelemetry. Follow the docs/QUICK_START.md guide to get started, or view an online preview at kubewharf.io/kelemetry/trace-deployment/. Contributions and issue reports are welcome.
Volcano Engine Cloud‑Native Team
The team builds PaaS‑class products for Volcano Engine’s public and private cloud, leveraging ByteDance’s cloud‑native expertise to accelerate digital transformation with services such as container platforms, image registries, serverless, service mesh, continuous delivery, and observability.