Persisting Prometheus Alertmanager Alerts with Alertsnitch, MySQL, and Grafana
This article explains how Prometheus stores alerts only as time‑series data and why that limits historical queries, then presents a complete open‑source solution, built from Alertmanager, Alertsnitch, MySQL, and Grafana, for persisting, querying, and visualizing alerts in production environments.
1. Alert Triggering Mechanism
Prometheus evaluates alert rules against host‑level metrics such as memory, CPU, disk, I/O, and network, re‑checking the collected time‑series at a configured evaluation interval. When a rule’s expression crosses its threshold, an alert is generated and sent to the configured receiver.
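The evaluation cadence and the rule files Prometheus loads are set in prometheus.yml; a minimal sketch, assuming the rule file path, might look like:

```yaml
global:
  evaluation_interval: 15s                 # how often alert rules are re-evaluated
rule_files:
  - /etc/prometheus/rules/host_alert.yml   # assumed path; the host_alert rule group goes here
```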
1.1 Alert Rules
Example of a memory‑usage alert:
<code>groups:
- name: host_alert
  rules:
  - alert: HighMemoryUsage
    expr: 100 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
    for: 5m
    labels:
      severity: general
      type: server
    annotations:
      summary: '{{ $labels.instance }} memory usage is too high!'
      description: '{{ $labels.instance }} memory usage is above 85% (current: {{ printf "%.2f" $value }}%)'
</code>The rule consists of:
alert: the name displayed in the alert title.
expr: the PromQL expression defining the condition.
for: optional hold period; the alert fires only after the condition has held for this duration.
labels: custom key‑value pairs used for routing or severity.
annotations: additional information such as summary and description.
1.2 Alert States
An alert starts as Inactive. When the expression exceeds the threshold it becomes Pending; after the for period has elapsed it transitions to Firing and is sent to the receiver. When the expression falls back below the threshold, the alert returns to Inactive, completing its lifecycle.
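This lifecycle can be sketched as a small state machine. The code below is a simplified illustrative model, not Prometheus's actual implementation; `for_seconds` stands in for the rule's `for:` clause, and `observe()` is called once per evaluation cycle with the expression result:

```python
from enum import Enum

class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"

class AlertRule:
    """Toy model of the Inactive -> Pending -> Firing lifecycle."""

    def __init__(self, for_seconds: int):
        self.for_seconds = for_seconds
        self.state = State.INACTIVE
        self.breach_started = None  # timestamp when the threshold was first exceeded

    def observe(self, now: int, breached: bool) -> State:
        if not breached:
            # Condition cleared: back to Inactive, lifecycle complete.
            self.state, self.breach_started = State.INACTIVE, None
        elif self.breach_started is None:
            # First breach: enter Pending and start the `for` timer.
            self.state, self.breach_started = State.PENDING, now
        elif now - self.breach_started >= self.for_seconds:
            # Breach has held for the full `for` period: fire.
            self.state = State.FIRING
        return self.state

rule = AlertRule(for_seconds=300)   # for: 5m
print(rule.observe(0, True))        # State.PENDING
print(rule.observe(300, True))      # State.FIRING
print(rule.observe(600, False))     # State.INACTIVE
```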
1.3 Convergence Rules
Inhibition : suppresses related low‑severity alerts (e.g., process or database down) when a higher‑severity host‑down alert is active, preventing alert storms.
Silences : temporarily mute alerts during maintenance windows or deployments.
Routing : each alert passes through the top‑level route tree; the deepest matching route determines the final receiver. Matching can be based on literal strings or regular expressions.
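Inhibition is configured in Alertmanager with inhibit_rules. A minimal sketch, with assumed alert names, that mutes process and database alerts while a host‑down alert is firing on the same instance:

```yaml
inhibit_rules:
- source_match:
    alertname: HostDown               # assumed name of the higher-severity alert
  target_match_re:
    alertname: (ProcessDown|DatabaseDown)
  equal: ['instance']                 # only inhibit alerts from the same host
```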
<code>route:
  group_by: ['alertname', 'instance', 'project']
  group_wait: 1m
  group_interval: 10m
  repeat_interval: 4h
  receiver: default
  routes:
  - receiver: 'mysql'
    match_re:
      project: mysql
      severity: critical
receivers:
- name: 'whole'
  webhook_configs:
  - url: http://192.168.0.2:31102/webhook
    send_resolved: true
</code>2. Alert Persistence
Tech stack: Prometheus + Alertmanager + Alertsnitch + MySQL + Grafana.
Alertsnitch provides a webhook that receives Alertmanager alerts and writes them into MySQL (or PostgreSQL) for long‑term storage, which can then be visualized in Grafana.
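Alertmanager delivers each alert group to the webhook as one JSON document. The sketch below flattens such a payload into rows shaped like the AlertGroup and GroupLabel tables in 2.1; the payload fields follow Alertmanager's webhook format, while the sample values and the `flatten` helper are illustrative assumptions:

```python
# Minimal Alertmanager webhook payload (structure per the webhook contract;
# values are made up for illustration).
payload = {
    "receiver": "whole",
    "status": "firing",
    "externalURL": "http://alertmanager.example:9093",
    "groupKey": '{}:{alertname="HighMemoryUsage"}',
    "groupLabels": {"alertname": "HighMemoryUsage", "instance": "192.168.0.10:9100"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighMemoryUsage", "severity": "general"},
            "annotations": {"summary": "memory usage is too high"},
            "startsAt": "2024-05-01T10:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }
    ],
}

def flatten(payload: dict, group_id: int):
    """Turn one webhook payload into an AlertGroup row plus GroupLabel rows."""
    group_row = (group_id, payload["receiver"], payload["status"],
                 payload["externalURL"], payload["groupKey"])
    label_rows = [(group_id, k, v) for k, v in payload["groupLabels"].items()]
    return group_row, label_rows

group_row, label_rows = flatten(payload, group_id=1)
print(group_row[2])   # firing
print(label_rows)
```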
2.1 Database Schema
<code>DROP PROCEDURE IF EXISTS bootstrap;
DELIMITER //
CREATE PROCEDURE bootstrap()
BEGIN
SET @exists := (SELECT 1 FROM information_schema.tables I WHERE I.table_name = "Model" AND I.table_schema = database());
IF @exists IS NULL THEN
CREATE TABLE `Model` (
`ID` enum('1') NOT NULL,
`version` VARCHAR(20) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `Model` (`version`) VALUES ("0.0.1");
ELSE
SIGNAL SQLSTATE '42000' SET MESSAGE_TEXT='Model Table Exists, quitting...';
END IF;
END;//
DELIMITER ;
CALL bootstrap();
DROP PROCEDURE bootstrap;
CREATE TABLE `AlertGroup` (
`ID` INT NOT NULL AUTO_INCREMENT,
`time` TIMESTAMP NOT NULL,
`receiver` VARCHAR(100) NOT NULL,
`status` VARCHAR(50) NOT NULL,
`externalURL` TEXT NOT NULL,
`groupKey` VARCHAR(255) NOT NULL,
KEY `idx_time` (`time`) USING BTREE,
KEY `idx_status_ts` (`status`, `time`) USING BTREE,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `GroupLabel` (
`ID` INT NOT NULL AUTO_INCREMENT,
`AlertGroupID` INT NOT NULL,
`GroupLabel` VARCHAR(100) NOT NULL,
`Value` VARCHAR(1000) NOT NULL,
FOREIGN KEY (AlertGroupID) REFERENCES AlertGroup (ID) ON DELETE CASCADE,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
</code>2.2 Deploying Alertsnitch in Kubernetes
<code>apiVersion: v1
kind: Service
metadata:
  name: mysql-dev
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - port: 3306
    targetPort: 3306
    protocol: TCP
    nodePort: 31102
</code>The Service is exposed via NodePort 31102, allowing the alert ingestion endpoint to be reached through any cluster node.
2.3 Configuring Alertmanager to Send to Alertsnitch
<code>route:
  receiver: whole
  routes:
  - receiver: whole
    continue: true
receivers:
- name: 'whole'
  webhook_configs:
  - url: http://192.168.0.2:31102/webhook
    send_resolved: true
</code>After reloading Alertmanager, alerts are pushed to Alertsnitch and stored in MySQL.
2.4 Visualization with Grafana
Grafana dashboards query the MySQL tables to display total alerts, resolved alerts, trend charts, severity distribution, and detailed alert information (name, start/end time, duration, labels, etc.).
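A trend panel of this kind boils down to a GROUP BY over AlertGroup.time. The sketch below uses SQLite purely for illustration; in Grafana the equivalent query would run against MySQL (with DATE_FORMAT instead of strftime), and the column names follow the schema in 2.1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AlertGroup (ID INTEGER PRIMARY KEY, time TEXT, receiver TEXT, status TEXT)"
)
rows = [
    ("2024-05-01 10:05:00", "whole", "firing"),
    ("2024-05-01 10:40:00", "whole", "resolved"),
    ("2024-05-01 11:10:00", "whole", "firing"),
]
conn.executemany("INSERT INTO AlertGroup (time, receiver, status) VALUES (?, ?, ?)", rows)

# Alerts per hour, split by status -- the shape of a Grafana trend-panel query.
trend = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:00', time) AS bucket, status, COUNT(*) AS n
    FROM AlertGroup
    GROUP BY bucket, status
    ORDER BY bucket
""").fetchall()
for bucket, status, n in trend:
    print(bucket, status, n)
```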
3. Conclusion
The article dissects the Prometheus alerting ecosystem, explains Alertmanager’s mechanisms, and proposes a practical persistence solution, Alertmanager + Alertsnitch + MySQL + Grafana, that supports historical alert queries and statistics for production‑grade monitoring.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you, and grow with you, throughout your operations career.