How AutoTagging and MultistageCodec Transform Cloud‑Native Observability
This article explores the challenges of building a unified observability data platform for hybrid‑cloud microservices, examines six common data‑island scenarios, and presents DeepFlow's AutoTagging and MultistageCodec techniques that dramatically reduce tagging overhead and storage costs while enabling seamless cross‑data correlation.
1. Challenges of an Observability Data Platform
Hybrid cloud and containers are becoming the main infrastructure for micro‑service applications, but monitoring cloud‑native workloads faces difficulties such as complex diagnosis, large scale, high elasticity, and strong volatility, making observability a key concern for operations developers.
2. Six Common Data‑Island Scenarios
Metrics, Tracing, and Logging are the three data types identified by Peter Bourgon. Their separation creates data islands: coarse granularity, lack of correlation, and high resource consumption. The six scenarios include Trace‑Metrics association, Metrics‑Metrics correlation, Metrics‑Log association, Log‑Log correlation, Log‑Trace association outside request scope, and cross‑layer Trace association.
3. Solving Data Islands: AutoTagging
DeepFlow’s AutoTagging automatically injects hundreds of resource and custom labels (cluster, namespace, service, deployment, pod, version, environment, owner, CI/CD stage, commit ID, etc.) into all observability data without requiring developers to add code, thus breaking the isolation between metrics, traces, and logs.
4. Reducing Resource Overhead: MultistageCodec
AutoTagging introduces many tags, which can increase resource consumption. MultistageCodec encodes tags in three stages—collection, storage, and query—using integer representations and selective enrichment, dramatically lowering CPU, memory, bandwidth, and ClickHouse storage costs.
5. Practical Results: Resource Consumption Below 1%
In a production environment monitoring 600 nodes (8000 pods) with millions of rows per second, the backend resource usage stayed under 1% of the monitored workload, demonstrating the efficiency of the combined AutoTagging and MultistageCodec approach.
Conclusion
AutoTagging injects unified query tags across diverse data sources, while MultistageCodec compresses those tags to reduce storage overhead by up to tenfold; together they enable seamless cross‑data correlation, powerful slicing and dicing, and minimal backend resource consumption for cloud‑native observability.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.