Prometheus Architecture and Design Principles: A Deep Dive into Cloud-Native Monitoring

Prometheus, a CNCF-graduated, cloud-native monitoring system, combines pull-based metric collection with target discovery, a label-rich time-series data model, and four core metric types (gauge, counter, histogram, and summary). It provides near-real-time visibility and short-term retention, with alerting via Alertmanager and integration with Grafana and remote storage for scalable observability.

vivo Internet Technology

Apr 29, 2020

Prometheus was the second open-source project to graduate from CNCF, after Kubernetes, and its design was inspired by Google's Borgmon. This article explores the architecture principles, target discovery mechanisms, metric models, and aggregation queries of Prometheus from the perspective of monitoring fundamentals.

A monitoring system is a productized solution for quantifying and managing technology and business services. It addresses two core problems: (1) Technology - digitizing and visualizing system functions and states to ensure stability and security; (2) Business - digitizing and visualizing business performance for analysis and timely intervention.

Basic Monitoring Principles:

Pre-emptive Monitoring: Monitoring must be considered during architecture design, not after deployment

What to Monitor: Global perspective, top-down from business; focus on user-facing elements first

User-Friendly: Easy-to-use monitoring services with automated integration

Visualization: Clear data display through various charts

Alerting: Define what issues need notification, who to notify, how to notify, frequency, and escalation procedures

Prometheus Architecture:

Prometheus is a near-real-time monitoring system with a built-in time-series database. It prioritizes recent data over historical data, citing research that about 85% of time-series queries cover the most recent 26 hours. It primarily collects metrics in pull mode by scraping HTTP endpoints that targets expose; Pushgateway is available for cases where pulling is impractical, such as short-lived jobs with small data volumes.
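To make the pull model concrete, here is a minimal sketch of a target that exposes a /metrics endpoint in the Prometheus text exposition format, using only the Python standard library. The metric names, values, and port are hypothetical and chosen for illustration; a real service would normally use an official client library rather than hand-rendering the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric values; a real application would update these at runtime.
METRICS = {
    "demo_requests_total": ("counter", "Total requests handled.", 42),
    "demo_memory_bytes": ("gauge", "Resident memory in bytes.", 1048576),
}


def render_metrics(metrics):
    """Render metrics as HELP/TYPE/sample lines in the text exposition format."""
    lines = []
    for name, (mtype, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values whenever a scraper requests /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve /metrics so that a Prometheus server can scrape it."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The target never contacts the monitoring server: Prometheus decides when to scrape, which is the essence of pull mode.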

Target Discovery Methods:

Static Configuration: Manual configuration in prometheus.yml with target lists

File-based Service Discovery: Loading configuration from files that are monitored for changes

API-based Service Discovery: Integration with service registries and platform APIs such as Consul, Kubernetes, Amazon EC2, and Azure

DNS-based Discovery: Querying DNS records for target lists
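File-based discovery is the easiest of these methods to illustrate. The sketch below writes a target file in the JSON layout that Prometheus reads through a file_sd_configs entry (a list of objects, each with "targets" and "labels"). The function name, addresses, and label values are hypothetical.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Atomically write target groups in the file-based discovery JSON layout.

    `groups` is an iterable of (targets, labels) pairs.
    """
    payload = [
        {"targets": list(targets), "labels": dict(labels)}
        for targets, labels in groups
    ]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it into place, so a reader
    # watching the file never sees a half-written document.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_path, path)
    return payload
```

An external process (a deployment script, for example) can regenerate this file whenever the set of hosts changes. Because Prometheus watches the file for changes, no restart is needed.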

Metric Types:

Gauge: Numeric values that can increase or decrease (e.g., memory usage)

Counter: Monotonically increasing values that never decrease and return to zero only when the process restarts (e.g., HTTP request count)

Histogram: Counts observations in configurable buckets to expose their distribution (important for understanding latency percentiles)

Summary: Similar to a histogram, but computes quantiles on the client side; suitable for metrics that do not need to be aggregated across instances, such as GC data
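Because a counter only goes up, queries are usually interested in how much it grew over a window, and a drop between two samples signals a restart. The following sketch shows that idea in simplified form; the function name is hypothetical, and the real query functions also extrapolate to the window boundaries, which is omitted here.

```python
def counter_increase(samples):
    """Total increase over consecutive raw counter samples, tolerating resets.

    A drop between two samples is treated as a reset to zero, so the whole
    post-reset value is counted as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # the counter restarted from zero
    return total


# The counter climbs 10 -> 20, restarts, then climbs to 8.
print(counter_increase([10, 15, 20, 3, 8]))  # 18.0
```

A gauge needs no such handling, since its raw value is meaningful on its own.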

Three rules of thumb for choosing between Histogram and Summary: use Histogram when aggregating across multiple collection nodes; use Histogram when observing data distribution; use Summary for non-cluster metrics requiring accurate percentiles.
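The first rule follows from how histograms are stored: cumulative bucket counts from different nodes can simply be added, and a quantile can then be estimated from the merged buckets. Precomputed summary quantiles cannot be combined this way. The sketch below assumes each histogram is a dict mapping a bucket upper bound to a cumulative count, with math.inf as the final bucket; both function names are hypothetical, and the interpolation is a simplified version of what a server-side quantile estimate does.

```python
import math


def merge_histograms(histograms):
    """Sum cumulative bucket counts from several instances, bucket by bucket."""
    merged = {}
    for buckets in histograms:
        for upper_bound, count in buckets.items():
            merged[upper_bound] = merged.get(upper_bound, 0) + count
    return merged


def estimate_quantile(q, buckets):
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound in bounds:
        count = buckets[upper_bound]
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf; fall back to the last finite bound.
                return lower_bound
            span = count - lower_count
            fraction = (rank - lower_count) / span if span else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


node_a = {0.1: 5, 0.5: 8, math.inf: 10}
node_b = {0.1: 15, 0.5: 32, math.inf: 40}
merged = merge_histograms([node_a, node_b])
median = estimate_quantile(0.5, merged)  # approximately 0.2 seconds
```

The estimate is only as precise as the bucket boundaries, which is why the third rule points to Summary when accurate percentiles matter more than aggregation.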

Data Model:

Prometheus uses the metric name plus its set of labels as the unique identifier of a time series. Each sample in a series consists of a timestamp and a value. Labels define different dimensions of the same metric; changing any label value creates a new time series.
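A toy in-memory store illustrates this identity rule. This is only a sketch of the concept, not how the Prometheus storage engine is implemented; the class and method names are hypothetical.

```python
class SeriesStore:
    """Toy store in which a series is keyed by metric name plus full label set."""

    def __init__(self):
        self._series = {}

    @staticmethod
    def series_key(name, labels):
        # Label order must not matter, so sort the pairs into a tuple.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp, value):
        """Add one (timestamp, value) sample to the matching series."""
        key = self.series_key(name, labels)
        self._series.setdefault(key, []).append((timestamp, value))

    def series_count(self):
        return len(self._series)

    def samples(self, name, labels):
        return list(self._series.get(self.series_key(name, labels), []))


store = SeriesStore()
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1, 10)
store.append("http_requests_total", {"code": "200", "method": "GET"}, 2, 12)
store.append("http_requests_total", {"method": "GET", "code": "500"}, 2, 1)
print(store.series_count())  # 2
```

The first two appends land in the same series because their label sets are identical; the third differs in one label value and therefore starts a new series. This is also why labels with many distinct values multiply the number of series.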

Retention: Prometheus is designed for short-term monitoring and alerting, defaulting to 15 days of retention. For longer storage, consider remote storage solutions like InfluxDB.

Ecosystem: Includes Alertmanager for alerting, Pushgateway for push-mode data, Grafana for visualization, remote storage adapters for long-term storage, mtail for log-to-metric conversion, and various exporters for monitoring applications, machines, databases, and message queues. Client libraries are available for Go, Java, Python, and other languages.
