Observability in Serverless Environments: Monitoring, Logging, Distributed Tracing, and Best Practices
In this talk, Gal Bashan explains how serverless architectures complicate observability. He argues that metrics, logs, and especially distributed tracing (with tools like OpenTelemetry, Jaeger, or commercial platforms) are essential for end-to-end visibility, automated instrumentation, and reliable, business-focused services across cloud providers.
Serverless environments introduce new complexity and observability challenges for DevOps and development teams. Bashan, Director of Engineering at Epsagon, reviews the fundamentals of observability (metrics, logs, traces, and alerts) and explains how to implement them in microservice and serverless architectures.
He begins with a historical view of traditional monitoring, describing how early systems relied on agents on physical servers to collect CPU, memory, and request metrics. While metrics can indicate that a problem exists, they do not explain why. Bashan emphasizes the need for logs to provide contextual information about requests, database interactions, and application state, enabling root‑cause analysis.
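To make the contrast between metrics and logs concrete, here is a minimal sketch of a structured log line that carries request context. It is not taken from the talk; the `format_event` helper, the field names, and the `orders` logger name are assumptions chosen for illustration.

```python
import json
import logging
import time

logger = logging.getLogger("orders")  # hypothetical service name


def format_event(request_id, message, **fields):
    """Build one JSON log line that ties a message to a specific request."""
    record = {"ts": time.time(), "request_id": request_id, "message": message}
    record.update(fields)  # extra context: table name, error, user id, ...
    return json.dumps(record, sort_keys=True)


def handle_request(request_id, user_id):
    # A metric would only count this request; the log lines say what happened.
    logger.info(format_event(request_id, "request received", user_id=user_id))
    logger.error(
        format_event(request_id, "db write failed", table="orders", error="timeout")
    )
```

Because every line carries the same `request_id`, the lines for one failing request can be filtered out of the full log stream during root-cause analysis.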
The discussion then shifts to modern cloud‑native applications. Today, workloads run on managed services such as AWS Lambda, containers, Kubernetes, ECS, or Fargate, which offload infrastructure management but also obscure visibility. Because a single user request may traverse dozens of services across multiple cloud providers, isolated metrics or logs from individual components are insufficient. Full‑stack observability must consider cross‑service interactions, third‑party APIs (e.g., Stripe), and data pipelines (Kafka, Kinesis, SQS, Pub/Sub).
To address these challenges, Bashan introduces distributed tracing. Open-source projects like Jaeger and Zipkin, as well as the CNCF's OpenTelemetry (the convergence of OpenTracing and OpenCensus), allow developers to instrument code, generate trace data, and visualize end-to-end request flows. He explains how trace data is emitted from instrumented functions (e.g., a Lambda handling an HTTP request and writing to DynamoDB) and sent to a tracing backend, which correlates the data and displays it in a UI.
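The Lambda example can be sketched with a toy tracer that records one span for the HTTP request and a child span for the DynamoDB write. This is a simplified stand-in written with the standard library only, not the OpenTelemetry or Jaeger API; the `Span` and `Tracer` classes, the span names, and the in-memory `finished` list (standing in for export to a tracing backend) are all assumptions for illustration.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # links a child operation to its caller
    name: str
    start: float
    end: Optional[float] = None


class Tracer:
    def __init__(self):
        self.finished: List[Span] = []  # stand-in for sending to a backend
        self._stack: List[Span] = []    # currently open spans

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.time(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.time()
            self._stack.pop()
            self.finished.append(current)


def handler(event, tracer):
    """Hypothetical Lambda-style handler: HTTP request, then a database write."""
    with tracer.span("http POST /orders"):
        with tracer.span("dynamodb PutItem"):
            pass  # a real function would call the DynamoDB client here
    return {"statusCode": 200}
```

A backend can rebuild the request flow from these records alone: spans with the same `trace_id` belong to one request, and `parent_id` gives the call hierarchy that a timeline or flame graph displays.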
Finally, Bashan shares best practices for monitoring in serverless contexts. Although SDKs such as OpenTelemetry simplify the injection and extraction of trace IDs across service boundaries, significant manual effort remains. Effective solutions should automatically detect frameworks, generate most trace data, provide powerful visualizations (flame graphs, timelines), and integrate alerting. At scale, organizations may prefer commercial tracing platforms to avoid the operational overhead of building and maintaining their own observability stack.
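ID injection and extraction can be illustrated with the W3C Trace Context `traceparent` header, which has the form `00-<trace id>-<parent span id>-<flags>`. The sketch below is hand-rolled with the standard library to show the manual work involved; it is not the OpenTelemetry propagator API, and the `new_ids`, `inject`, and `extract` helpers are hypothetical names.

```python
import re
import secrets

# version "00", 32 hex chars of trace id, 16 hex chars of span id, 2 hex chars of flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_ids():
    """Generate a random trace id (16 bytes) and span id (8 bytes) as hex."""
    return secrets.token_hex(16), secrets.token_hex(8)


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Callee side: read the trace context, or return None if absent or malformed."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": int(flags, 16) & 1 == 1,  # lowest flag bit marks sampling
    }
```

The calling service runs `inject` before each outgoing request, and the receiving service runs `extract` and starts its own span with the returned `trace_id` and `parent_span_id`. Repeating this for every HTTP client, queue, and stream is the manual effort that automatic instrumentation aims to remove.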
The talk concludes with a reminder that the primary goal of serverless adoption is to focus on delivering business value, letting cloud providers handle infrastructure while observability tools ensure confidence in system reliability.
Tencent Cloud Developer