
Building a Simple Cloud‑Native Alert Platform: Features, Architecture & Roadmap

This article describes the design and implementation of a lightweight cloud‑native alert platform, outlining its core features, future enhancements, system architecture, and demo screenshots, offering practical insights for SREs and operations teams handling growing monitoring workloads.

Ops Development Stories

Introduction

Hello, I’m Joker, an operations engineer and cloud‑native enthusiast.

Why a Custom Alert Platform?

As more teams and clusters feed in monitoring data, alerts become chaotic, historical queries get harder, and it becomes difficult to track how an alert is being handled. Existing SaaS alert systems such as 快猫 (Flashcat) and 睿象云 cannot be used because of network restrictions, so we built a simple in‑house alert platform to meet day‑to‑day business needs.

Current Platform Features

Alert grouping: adopts the collaboration‑space concept from 快猫 (Flashcat) for grouping alerts.

Configurable notification templates: allows teams to customize templates for different business requirements.

Multiple notification channels: currently supports Enterprise WeChat, with plans to add SMS, email, and phone.

Selectable notification strategies: supports single‑channel and multi‑channel strategies to route alerts of different severity to appropriate recipients.

Alert silencing.

Alert claiming.

On‑call scheduling management.
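As an illustration of the configurable notification templates listed above, here is a minimal sketch that renders a team-defined template with Go's standard `text/template` package. The `Alert` struct, its field names, and the template text are assumptions made for this example; the article does not show the platform's real schema or template syntax.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert is a hypothetical, simplified alert record; the platform's real
// schema is not shown in the article.
type Alert struct {
	Name     string
	Severity string
	Space    string // collaboration space the alert is grouped into
	Summary  string
}

// renderNotification fills a team-defined template with one alert's fields.
func renderNotification(tmplText string, a Alert) (string, error) {
	tmpl, err := template.New("notification").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, a); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Each team could store its own template text; this one is illustrative.
	const teamTemplate = "[{{.Severity}}] {{.Name}} in {{.Space}}: {{.Summary}}"
	msg, err := renderNotification(teamTemplate, Alert{
		Name:     "HighCPU",
		Severity: "critical",
		Space:    "payments",
		Summary:  "CPU above 90% for 5m",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

Storing the template as plain text per team keeps customization in configuration rather than code, and a parse error surfaces a broken template before any alert is sent with it.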

Planned Enhancements

Root‑cause inference: provide basic initial diagnosis for each alert to help SREs locate issues quickly.

Automatic remediation: automate handling of alerts that can be resolved without human intervention.

Additional alert sources: beyond Prometheus, integrate Zabbix, Alibaba Cloud, Tencent Cloud, etc.

Advanced dispatch strategies: enable routing based on labels, time windows, and other criteria, not just alert level.

Architecture Overview (V1.0)

[Figure: Architecture diagram]

The system consists of:

Management console: enables SREs to configure integrations, handle alerts, and query history.

Mobile H5 client: allows users to view, claim, and silence alerts on smartphones.

The management console’s front end and back end are built with the gin-vue-admin framework.

Demo Screenshots

Dashboard:

[Screenshot: Dashboard]

Collaboration Space:

[Screenshot: Collaboration Space]

Fault List:

[Screenshot: Fault List]

Notification Templates:

[Screenshot: Notification Templates]

Notification Channels:

[Screenshot: Notification Channels]

Notification Strategies:

[Screenshot: Notification Strategies]

On‑call Scheduling:

[Screenshot: On‑call Scheduling]

Alert details within a collaboration space:

[Screenshot: Space Alert Details]

Current notification strategy (by alert level):

[Screenshot: Level‑Based Notification]
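A level-based strategy like the one above can be thought of as a lookup from alert level to one or more channels. The sketch below assumes hypothetical level names (`critical`, `warning`) and channel identifiers (`wecom`, `phone`); per the feature list, only Enterprise WeChat is supported today, so the `phone` entry stands in for a planned channel.

```go
package main

import "fmt"

// Strategy maps an alert level to the channels that should be notified.
// Level names and channel names here are hypothetical.
type Strategy map[string][]string

// channelsFor returns the channels configured for a level, or def when the
// level has no entry or an empty channel list.
func (s Strategy) channelsFor(level string, def []string) []string {
	if chans, ok := s[level]; ok && len(chans) > 0 {
		return chans
	}
	return def
}

func main() {
	strategy := Strategy{
		"critical": {"wecom", "phone"}, // multi-channel: chat plus a planned phone channel
		"warning":  {"wecom"},          // single-channel
	}
	fallback := []string{"wecom"}
	for _, level := range []string{"critical", "warning", "info"} {
		fmt.Println(level, "->", strategy.channelsFor(level, fallback))
	}
}
```

Falling back to a default channel list means an alert with an unexpected level is still delivered somewhere instead of being silently dropped.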

Sample alert received by the notification endpoint:

[Screenshot: Alert Payload]

Clicking “Unresolved Alerts” opens the H5 page showing related alert information:

[Screenshot: H5 Alert View]

The platform currently implements the core features described above. Some features are still incomplete or missing, and feedback from the community is welcome.

Tags: monitoring, cloud-native, operations, alert management, incident response
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
