How to Enable Ceph Enterprise Monitoring with Prometheus & Grafana
Learn step‑by‑step how to activate Ceph’s monitoring modules, configure Prometheus to collect Ceph metrics, verify data collection, and integrate Grafana dashboards, including tips on required dependencies and troubleshooting, to ensure reliable, secure storage management in enterprise cloud‑native environments.
In today’s data‑driven world, ensuring efficient and secure storage systems is critical; Ceph provides an open‑source distributed storage solution with a powerful monitoring panel for cluster health and performance.
Prevention is better than cure – monitor first.
1. Enable Ceph monitoring
<code>$ ceph mgr module enable prometheus</code>
Tip: If the mgr host lacks the cherrypy module, the command will fail.
Solution:
<code>pip3 install cherrypy -i https://mirrors.aliyun.com/pypi/simple/
sudo systemctl restart ceph-mgr.target</code>
2. Enable RBD monitoring
<code>$ ceph config set mgr mgr/prometheus/rbd_stats_pools "kubernetes,cephfs-data,cephfs-metadata"</code>
Tip: To monitor all RBD pools, set the value to "*".
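A quick way to confirm that the module is exporting data is to fetch the metrics page from a mgr host and count the Ceph samples. The helper below is a minimal sketch, not part of Ceph: `count_samples` is a hypothetical name, the `/metrics` path and the `ceph_rbd_` prefix are assumptions about the exporter's defaults, and port 9283 is the one used in the scrape targets in this article.

```shell
# count_samples [PREFIX]
# Reads Prometheus text-format output on stdin and prints how many sample
# lines start with PREFIX (default: ceph_). HELP/TYPE comment lines begin
# with '#', so they are never counted.
count_samples() {
  grep -c "^${1:-ceph_}" || true   # grep exits 1 on zero matches; still print 0
}

# Hypothetical usage (needs network access to a mgr host):
#   curl -s http://172.139.20.20:9283/metrics | count_samples
#   curl -s http://172.139.20.20:9283/metrics | count_samples ceph_rbd_
```

A count of 0 does not always indicate a fault: a standby mgr may return an empty page, and RBD samples may take a short while to appear after `rbd_stats_pools` is changed.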
Prometheus collection of Ceph metrics
Edit the Prometheus ConfigMap to add a job named “ceph” with the target nodes:
<code>$ kubectl -n kube-system edit cm prometheus
- job_name: 'ceph'
  static_configs:
    - targets:
        - "172.139.20.20:9283"
        - "172.139.20.208:9283"
        - "172.139.20.94:9283"</code>
Verify collection success with curl:
<code>$ curl -s $(kubectl -n kube-system get svc prometheus -ojsonpath='{.spec.clusterIP}:{.spec.ports[0].port}')/prometheus/api/v1/query --data-urlencode 'query=up{job=~"ceph.*"}' | jq '.data.result[] | {job: .metric.job, instance: .metric.instance, status: .value[1]}'</code>
Sample output shows each Ceph instance returning status “1”, meaning Prometheus scraped that target successfully.
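For use in a script or a scheduled check, the same result can be turned into an exit code. The function below is a sketch under stated assumptions: `all_targets_up` is a hypothetical helper, and it expects the pretty-printed jq objects produced by the command above on stdin.

```shell
# all_targets_up
# Reads the jq output of the `up` query on stdin. Succeeds (exit 0) only if
# at least one "status" line is present and every one of them is "1".
all_targets_up() {
  awk '
    /"status":/ {
      seen = 1
      if ($0 !~ /"status": *"1"/) bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Hypothetical usage: append the function to the curl | jq pipeline above.
#   ... | jq '...' | all_targets_up || echo "at least one Ceph target is down"
```

Treating empty input as a failure is a deliberate choice here: if the query returns no series at all, the job is probably missing from the Prometheus configuration.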
Grafana: add Ceph monitoring dashboards
Download the official Ceph dashboard files from the Ceph repository:
https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin/dashboards_out
Tip: The dashboards rely on node‑exporter metrics.
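Each dashboard can be fetched as raw JSON and then imported through Grafana's import dialog. The sketch below only builds the download URL: `dashboard_raw_url` is a hypothetical helper, the `raw.githubusercontent.com` form is GitHub's usual raw-file address for the directory linked above, and `ceph-cluster.json` is an assumed file name that should be checked against the directory listing.

```shell
# dashboard_raw_url FILE
# Prints the raw-download URL for FILE in the ceph-mixin dashboards_out
# directory on the main branch of the Ceph repository.
dashboard_raw_url() {
  printf '%s/%s\n' \
    "https://raw.githubusercontent.com/ceph/ceph/main/monitoring/ceph-mixin/dashboards_out" \
    "$1"
}

# Hypothetical usage (file name is an assumption; needs network access):
#   curl -fsSLO "$(dashboard_raw_url ceph-cluster.json)"
```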
Conclusion
Ceph’s enterprise‑grade monitoring panel is essential for managing large‑scale distributed storage, improving reliability and efficiency while safeguarding data assets; proper configuration and continuous optimization enable stable operation and support business growth in digital transformation.
Linux Ops Smart Journey
The operations journey never stops—pursuing excellence endlessly.