Building a Cloud‑Native Large‑Scale Distributed Monitoring System with Prometheus
This article explains how to design and implement a cloud‑native, large‑scale distributed monitoring system using Prometheus, covering its limitations, service‑level sharding, centralized storage, federation, and high‑availability strategies to overcome scaling challenges in Kubernetes environments.
In this series, the author from Tencent Cloud Container Service (TKE) introduces practical methods for building a cloud‑native large‑scale distributed monitoring system based on Prometheus.
1. Overview – Prometheus has become the de‑facto standard for monitoring, offering an efficient time‑series database, powerful PromQL, and a pull‑based metric collection model, making it the primary choice for monitoring systems.
2. Pain points of Prometheus in large‑scale scenarios – Prometheus is a single‑node solution without built‑in clustering, limiting storage to a single disk and making high availability and horizontal scaling difficult. Large data volumes force trade‑offs such as dropping less important metrics, scraping less frequently (longer intervals), or shortening data retention.
3. Splitting Prometheus by service dimension – The recommended approach is to run multiple Prometheus instances, each responsible for a subset of services, thereby achieving horizontal scaling.
In most cases this satisfies the demand, but extremely large services may still exceed a single instance’s capacity.
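The split needs no special tooling; it can be expressed entirely in each instance's scrape configuration. A minimal sketch, assuming two hypothetical services (`order-service` and `payment-service`), each owned by its own Prometheus instance:

```yaml
# prometheus-a.yaml – instance A scrapes only the order service
scrape_configs:
  - job_name: 'order-service'
    static_configs:
      - targets: ['order-service:8080']

# prometheus-b.yaml – instance B scrapes only the payment service
scrape_configs:
  - job_name: 'payment-service'
    static_configs:
      - targets: ['payment-service:8080']
```

Because each instance holds a disjoint set of jobs, adding capacity is simply a matter of moving jobs to a new instance.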
4. Sharding ultra‑large services – For clusters with thousands of nodes, metrics from node‑level services (e.g., kubelet cAdvisor, node‑exporter) become massive. Sharding splits a service into multiple groups, each scraped by a dedicated Prometheus instance.
Two sharding solutions are presented:
• Custom sharding with Consul service discovery – Assign nodes to groups, register them in Consul, and configure each Prometheus instance with consul_sd_config to scrape only its group.
```yaml
- job_name: 'cadvisor-1'
  consul_sd_configs:
    - server: '10.0.0.3:8500'
      services:
        - cadvisor-1  # this instance scrapes only the second shard
```

• Kubernetes node discovery with Prometheus relabel hashmod – Use kubernetes_sd_configs to discover nodes and apply a hashmod relabel rule to divide them into a configurable number of groups.
```yaml
- job_name: 'cadvisor-1'
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    # Hash each node's address into one of 4 buckets
    - source_labels: [__address__]
      modulus: 4
      target_label: __tmp_hash
      action: hashmod
    # Keep only the targets in this instance's bucket
    - source_labels: [__tmp_hash]
      regex: '^1$'
      action: keep
```

Here hashmod splits the node pool into four groups; each of the four Prometheus instances uses the same configuration except for the regex (^0$ through ^3$), so every node is scraped by exactly one instance.

5. New issues introduced by splitting – Deploying many Prometheus instances increases the number of data sources in Grafana, making global queries and dashboards more complex.
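To make the proliferation concrete: every shard becomes its own Grafana data source. A provisioning sketch (shard names and URLs are hypothetical), which grows by one entry for each new shard:

```yaml
# Grafana data source provisioning file
apiVersion: 1
datasources:
  - name: prometheus-shard-0
    type: prometheus
    url: http://prometheus-0:9090
  - name: prometheus-shard-1
    type: prometheus
    url: http://prometheus-1:9090
  # ... one entry per shard
```

A dashboard that needs a global view must then either query each data source separately or rely on one of the aggregation approaches below.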
Two ways to address this:
• Centralized storage – Let Prometheus only scrape data and forward it via remote_write to a clustered time‑series database such as OpenTSDB or InfluxDB.
```yaml
remote_write:
  - url: http://10.0.0.2:8888/write
```

Grafana then queries the external TSDB directly.
Note that this sacrifices PromQL in favor of the storage engine’s query language.
6. Prometheus federation – An alternative is to federate multiple Prometheus instances, letting a higher‑level Prometheus aggregate selected metrics from the lower‑level ones.
Federation is useful when only a subset of data needs to be combined, such as aggregating cAdvisor metrics from many nodes.
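The higher‑level instance pulls from the lower‑level ones via their /federate endpoint, selecting only the series it needs. A sketch, assuming two shard instances at hypothetical addresses and a match[] selector on the cadvisor job:

```yaml
- job_name: 'federate'
  # Preserve the original job/instance labels from the shards
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
      - '{job="cadvisor"}'   # pull only cAdvisor series
  static_configs:
    - targets:
        - '10.0.0.4:9090'
        - '10.0.0.5:9090'
```

Keeping the match[] selector narrow is important: federating everything would recreate the original single‑node bottleneck in the top‑level instance.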
7. High availability for Prometheus – Deploy each Prometheus instance in two identical replicas sharing the same data volume. Place a load balancer (Nginx, HAProxy, or Kubernetes Service) in front so queries are routed to any healthy replica.
This provides HA for query and scrape operations.
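In Kubernetes, the load balancer can simply be a Service whose selector matches both replicas; a minimal sketch (names and labels are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus   # matches both identical replicas
  ports:
    - port: 9090
      targetPort: 9090
```

If one replica goes down, its endpoint is removed from the Service and queries continue against the surviving replica.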
8. Summary – The discussed techniques mitigate single‑node Prometheus limitations in large‑scale environments, but they increase operational complexity and do not solve long‑term storage. For cold, infrequently accessed data, storing it in cheap object storage via Thanos offers unlimited retention while preserving the Prometheus API.
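With Thanos, the object storage backend is described in a small bucket configuration; a sketch of the S3‑compatible variant (bucket name and endpoint are placeholders):

```yaml
type: S3
config:
  bucket: prometheus-cold-metrics
  endpoint: s3.example.com
  access_key: <access-key>
  secret_key: <secret-key>
```

The Thanos sidecar uploads completed TSDB blocks to this bucket, and the store gateway serves them back through the standard Prometheus query API.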
Future articles will dive deeper into Thanos architecture.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.