Observability Concepts and OpenTelemetry Architecture Overview
Observability turns a black‑box application into an observable system by gathering logs, metrics, and traces, using alerts to spot anomalies, then linking trace IDs to logs; OpenTelemetry standardizes this with instrumented client agents, a Collector (receivers, processors, exporters), and backend storage, while Java agents, span propagation, exemplars, eBPF, and bundles like SigNoz or OpenObserve let teams choose between a custom OTel stack or an integrated solution.
Without active observation, a running software system is a black box. By collecting logs, metrics, and traces, engineers can move from passive fault handling to proactive fault detection, especially in micro‑service environments.
The three fundamental pillars are:
Logs – discrete log entries.
Metrics – aggregated numeric indicators.
Traces – request‑level call‑chain tracking.
A typical troubleshooting workflow starts with metrics alerts (e.g., CPU or memory usage > 80% or an instance down). An example Prometheus alert rule is shown below:
groups:
  - name: AllInstances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        annotations:
          title: 'Instance {{ $labels.instance }} down'
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
        labels:
          severity: 'critical'

After detecting an anomaly, trace information is used to locate the faulty node, and the associated trace_id is used to retrieve the relevant logs.
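The final step of that workflow, going from a trace to its logs, can be sketched in a few lines. This is only an illustration, assuming logs are stored as structured records that carry a trace_id field; the LogRecord layout, the sample data, and the logs_for_trace helper are hypothetical and not part of any OTel API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical structured log entry that carries its trace_id."""
    timestamp: float
    service: str
    trace_id: str
    message: str


def logs_for_trace(records: Iterable[LogRecord], trace_id: str) -> List[LogRecord]:
    """Return the records that belong to one trace, oldest first."""
    matching = [r for r in records if r.trace_id == trace_id]
    return sorted(matching, key=lambda r: r.timestamp)


# Made-up sample data: two requests interleaved across two services.
SAMPLE_LOGS = [
    LogRecord(3.0, "payment", "aaa111", "charge failed: timeout"),
    LogRecord(1.0, "gateway", "aaa111", "POST /checkout"),
    LogRecord(2.0, "gateway", "bbb222", "GET /health"),
]

if __name__ == "__main__":
    for rec in logs_for_trace(SAMPLE_LOGS, "aaa111"):
        print(rec.service, rec.message)
```

In a real deployment the filter would typically be a query against the log backend rather than an in-memory scan, but the join key is the same trace_id.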
OpenTelemetry (OTel) unifies observability standards. Its history includes key milestones such as Google’s Dapper paper (2010), the rise of Kubernetes, Jaeger, the merger of OpenTracing and OpenCensus (2019), and the first GA release in 2021.
OTel’s architecture consists of three layers:
Client agents that instrument applications.
The Collector service that receives, processes, and exports data.
Back‑ends that store metrics, logs, and traces (e.g., VictoriaMetrics, StarRocks, Elasticsearch).
The OpenTelemetry Collector is the central component, composed of Receivers, Processors, and Exporters, supporting a wide variety of protocols and destinations.
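The receiver, processor, exporter flow can be modeled as a toy pipeline. The sketch below is not the Collector's implementation or configuration format; it only mirrors the data flow under simplified assumptions, and every name in it (static_receiver, drop_health_checks, add_environment, MemoryExporter, run_pipeline) is hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def static_receiver() -> List[Record]:
    """Stand-in for a receiver: returns telemetry that has 'arrived'."""
    return [
        {"name": "GET /health", "duration_ms": 2},
        {"name": "POST /checkout", "duration_ms": 950},
    ]


def drop_health_checks(records: List[Record]) -> List[Record]:
    """Processor: filter out noisy health-check spans."""
    return [r for r in records if r["name"] != "GET /health"]


def add_environment(records: List[Record]) -> List[Record]:
    """Processor: enrich every record with a static attribute."""
    return [{**r, "env": "staging"} for r in records]


class MemoryExporter:
    """Stand-in for an exporter: keeps what it was asked to send."""

    def __init__(self) -> None:
        self.sent: List[Record] = []

    def export(self, records: List[Record]) -> None:
        self.sent.extend(records)


def run_pipeline(
    receiver: Callable[[], List[Record]],
    processors: List[Callable[[List[Record]], List[Record]]],
    exporters: List[MemoryExporter],
) -> None:
    """Receive once, apply processors in order, fan out to all exporters."""
    records = receiver()
    for process in processors:
        records = process(records)
    for exporter in exporters:
        exporter.export(records)


if __name__ == "__main__":
    backend = MemoryExporter()
    run_pipeline(static_receiver, [drop_health_checks, add_environment], [backend])
    print(backend.sent)
```

Processor order matters in this model: filtering before enrichment avoids doing work on records that will be dropped anyway.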
For Java applications, the opentelemetry-java-instrumentation library provides a javaagent that automatically instruments most frameworks. Example usage:
# Java example
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -jar myapp.jar

Tracing concepts include Spans (with fields such as SpanName, ParentID, SpanID, TraceID, start/end timestamps) and Span Kind (Client, Server, Internal, Producer, Consumer). Context propagation is essential for passing the trace_id across process boundaries via HTTP headers, gRPC metadata, or message properties, using Inject and Extract APIs.
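To make Inject and Extract concrete, here is a minimal sketch of propagation over HTTP headers using the W3C Trace Context traceparent header format (version-traceid-spanid-flags). It is hand-rolled for illustration and is not the OpenTelemetry SDK's propagator API; SpanContext, inject, extract, and start_child are hypothetical names, and the carrier is assumed to be a plain dict with lowercase keys.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


# Version 00 of the traceparent header: 00-<trace-id>-<parent-id>-<flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    """Write the current span context into outgoing headers."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: Dict[str, str]) -> Optional[SpanContext]:
    """Read a span context from incoming headers, or None if absent/invalid."""
    match = _TRACEPARENT.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the Trace Context format.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_child(parent: SpanContext) -> SpanContext:
    """New span in the same trace: keep trace_id, mint a fresh span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


if __name__ == "__main__":
    headers: Dict[str, str] = {}
    inject(SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True), headers)
    incoming = extract(headers)
    if incoming is not None:
        print(start_child(incoming))
```

The caller injects before sending a request, and the callee extracts on receipt and starts its server span as a child, so both sides report under the same trace_id.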
Metrics in OTel follow standardized naming conventions and support Exemplars, which link a metric sample back to a trace, enabling fast navigation from a metric anomaly to the corresponding trace and logs.
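The exemplar idea can be sketched with a small histogram that remembers one sample trace per bucket. This is a simplified model rather than the OTel metrics SDK: LatencyHistogram, Exemplar, and slowest_exemplar are hypothetical names, and keeping only the most recent exemplar per bucket is an assumption made for brevity.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Exemplar:
    value: float
    trace_id: str


@dataclass
class LatencyHistogram:
    bounds: List[float]  # inclusive upper bounds in ms, ascending
    counts: List[int] = field(init=False)
    exemplars: Dict[int, Exemplar] = field(init=False)

    def __post_init__(self) -> None:
        # One extra bucket catches values above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = {}

    def record(self, value: float, trace_id: Optional[str] = None) -> None:
        """Count the value and, if traced, keep it as the bucket's exemplar."""
        idx = bisect_left(self.bounds, value)
        self.counts[idx] += 1
        if trace_id is not None:
            self.exemplars[idx] = Exemplar(value, trace_id)

    def slowest_exemplar(self) -> Optional[Exemplar]:
        """Exemplar from the highest bucket that has one, if any."""
        if not self.exemplars:
            return None
        return self.exemplars[max(self.exemplars)]


if __name__ == "__main__":
    hist = LatencyHistogram([100.0, 500.0, 1000.0])
    hist.record(12.0, trace_id="aaa111")
    hist.record(950.0, trace_id="bbb222")
    hist.record(40.0)  # untraced request: counted, no exemplar
    print(hist.counts, hist.slowest_exemplar())
```

When a latency alert fires, the trace_id on the exemplar of the slow bucket is the starting point for opening the trace and then its logs.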
Beyond OTel, the article introduces eBPF – a kernel‑level virtual machine that provides non‑intrusive, high‑performance, and safe instrumentation for Linux systems. While powerful, eBPF is better suited to platform‑level monitoring than to fine‑grained business metrics.
Integrated platforms such as SigNoz and OpenObserve bundle the OTel Collector and storage, offering a unified UI and storage for logs, traces, and metrics, which simplifies operations for small‑to‑medium teams.
In summary, organizations can either build a custom OTel stack or adopt integrated solutions like SigNoz or OpenObserve, depending on their operational requirements and resource constraints.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.