Building a Cloud‑Native Large‑Scale Distributed Monitoring System with Prometheus
This article explains how to design and implement a cloud‑native, large‑scale distributed monitoring system using Prometheus, covering its limitations, service‑level sharding, centralized storage, federation, and high‑availability strategies to overcome scaling challenges in Kubernetes environments.
In this series, the author from Tencent Cloud Container Service (TKE) introduces practical methods for building a cloud‑native large‑scale distributed monitoring system based on Prometheus.
1. Overview – Prometheus has become the de‑facto standard for monitoring, offering an efficient time‑series database, powerful PromQL, and a pull‑based metric collection model, making it the primary choice for monitoring systems.
2. Pain points of Prometheus in large‑scale scenarios – Prometheus is a single‑node solution without built‑in clustering, limiting storage to a single disk and making high availability and horizontal scaling difficult. Large data volumes force trade‑offs such as dropping less important metrics, scraping less frequently (longer intervals), or shortening data retention.
3. Splitting Prometheus by service dimension – The recommended approach is to run multiple Prometheus instances, each responsible for a subset of services, thereby achieving horizontal scaling.
In most cases this satisfies the demand, but extremely large services may still exceed a single instance’s capacity.
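The split needs no special tooling; it can be expressed entirely in each instance's scrape configuration. A minimal sketch, assuming two hypothetical services (`order-service` and `payment-service`), each owned by its own Prometheus instance:

```yaml
# prometheus-a.yaml – instance A scrapes only the order service
scrape_configs:
  - job_name: 'order-service'
    static_configs:
      - targets: ['order-service:8080']

# prometheus-b.yaml – instance B scrapes only the payment service
scrape_configs:
  - job_name: 'payment-service'
    static_configs:
      - targets: ['payment-service:8080']
```

Because each instance holds a disjoint set of jobs, adding capacity is simply a matter of moving jobs to a new instance.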
4. Sharding ultra‑large services – For clusters with thousands of nodes, metrics from node‑level services (e.g., kubelet cAdvisor, node‑exporter) become massive. Sharding splits a service into multiple groups, each scraped by a dedicated Prometheus instance.
Two sharding solutions are presented:
• Custom sharding with Consul service discovery – Assign nodes to groups, register them in Consul, and configure each Prometheus instance with consul_sd_config to scrape only its group.
```yaml
- job_name: 'cadvisor-1'
  consul_sd_configs:
    - server: '10.0.0.3:8500'
      services:
        - cadvisor-1  # this instance scrapes only the second shard
```

• Kubernetes node discovery with Prometheus relabel hashmod – Use kubernetes_sd_configs to discover nodes and apply a hashmod relabel rule to divide them into a configurable number of groups.
```yaml
- job_name: 'cadvisor-1'
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    # Hash each node's address into one of 4 buckets
    - source_labels: [__address__]
      modulus: 4
      target_label: __tmp_hash
      action: hashmod
    # Keep only the targets in this instance's bucket
    - source_labels: [__tmp_hash]
      regex: '^1$'
      action: keep
```

Here hashmod splits the node pool into four groups; each of the four Prometheus instances uses the same configuration except for the regex (^0$ through ^3$), so every node is scraped by exactly one instance.

5. New issues introduced by splitting – Deploying many Prometheus instances increases the number of data sources in Grafana, making global queries and dashboards more complex.
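To make the proliferation concrete: every shard becomes its own Grafana data source. A provisioning sketch (shard names and URLs are hypothetical), which grows by one entry for each new shard:

```yaml
# Grafana data source provisioning file
apiVersion: 1
datasources:
  - name: prometheus-shard-0
    type: prometheus
    url: http://prometheus-0:9090
  - name: prometheus-shard-1
    type: prometheus
    url: http://prometheus-1:9090
  # ... one entry per shard
```

A dashboard that needs a global view must then either query each data source separately or rely on one of the aggregation approaches below.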
Two ways to address this:
• Centralized storage – Let Prometheus only scrape data and forward it via remote_write to a clustered time‑series database such as OpenTSDB or InfluxDB.
```yaml
remote_write:
  - url: http://10.0.0.2:8888/write
```

Grafana then queries the external TSDB directly.
Note that this sacrifices PromQL in favor of the storage engine’s query language.
6. Prometheus federation – An alternative is to federate multiple Prometheus instances, letting a higher‑level Prometheus aggregate selected metrics from the lower‑level ones.
Federation is useful when only a subset of data needs to be combined, such as aggregating cAdvisor metrics from many nodes.
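The higher‑level instance pulls from the lower‑level ones via their /federate endpoint, selecting only the series it needs. A sketch, assuming two shard instances at hypothetical addresses and a match[] selector on the cadvisor job:

```yaml
- job_name: 'federate'
  # Preserve the original job/instance labels from the shards
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
      - '{job="cadvisor"}'   # pull only cAdvisor series
  static_configs:
    - targets:
        - '10.0.0.4:9090'
        - '10.0.0.5:9090'
```

Keeping the match[] selector narrow is important: federating everything would recreate the original single‑node bottleneck in the top‑level instance.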
7. High availability for Prometheus – Deploy each Prometheus instance in two identical replicas sharing the same data volume. Place a load balancer (Nginx, HAProxy, or Kubernetes Service) in front so queries are routed to any healthy replica.
This provides HA for query and scrape operations.
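In Kubernetes, the load balancer can simply be a Service whose selector matches both replicas; a minimal sketch (names and labels are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus   # matches both identical replicas
  ports:
    - port: 9090
      targetPort: 9090
```

If one replica goes down, its endpoint is removed from the Service and queries continue against the surviving replica.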
8. Summary – The discussed techniques mitigate single‑node Prometheus limitations in large‑scale environments, but they increase operational complexity and do not solve long‑term storage. For cold, infrequently accessed data, storing it in cheap object storage via Thanos offers unlimited retention while preserving the Prometheus API.
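With Thanos, the object storage backend is described in a small bucket configuration; a sketch of the S3‑compatible variant (bucket name and endpoint are placeholders):

```yaml
type: S3
config:
  bucket: prometheus-cold-metrics
  endpoint: s3.example.com
  access_key: <access-key>
  secret_key: <secret-key>
```

The Thanos sidecar uploads completed TSDB blocks to this bucket, and the store gateway serves them back through the standard Prometheus query API.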
Future articles will dive deeper into Thanos architecture.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.