
Design and Implementation of the “Sentinel” Monitoring System for Enterprise Data Reporting

The article details the background, five‑layer architecture, core modules, data model, processing, storage, and alert strategies of the Sentinel monitoring system built on Nebula Graph and integrated with Enterprise WeChat, highlighting its real‑time monitoring, task tracing, and the resulting improvements in reporting timeliness and reliability.

58 Tech

The Sentinel monitoring system was created to address the complexity of the Group’s daily business reports, which involve over 2,300 upstream tasks and require strict consistency, completeness, and timeliness checks before publishing.

Background : Existing baseline monitoring on the Xinghe platform could trigger alerts when tasks missed their promised completion time, but manual confirmation required VPN access and the system lacked unified feedback on report delivery.

Architecture : The solution adopts a five‑layer design—Storage, Foundation, Monitoring, Interaction, and Configuration—built on Nebula Graph, a distributed native graph database that efficiently stores task nodes and dependency edges.
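As a rough illustration of how task nodes and dependency edges might be written to Nebula Graph, the sketch below builds nGQL INSERT VERTEX and INSERT EDGE statements as plain strings. The tag name task, the edge type depends_on, the property names, and the edge direction (downstream points to upstream) are all assumptions made for this example; the article does not give the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str     # used as the vertex ID (VID); hypothetical choice
    name: str
    std_start: str   # standard start time, "HH:MM"
    std_end: str     # standard end time, "HH:MM"


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def insert_task(task: Task) -> str:
    """Build an INSERT VERTEX statement for one task (assumed `task` tag)."""
    props = ", ".join(quote(v) for v in (task.name, task.std_start, task.std_end))
    return (
        "INSERT VERTEX task(name, std_start, std_end) "
        f"VALUES {quote(task.task_id)}:({props});"
    )


def insert_dependency(upstream_id: str, downstream_id: str) -> str:
    """Build an INSERT EDGE statement: downstream depends on upstream."""
    return (
        "INSERT EDGE depends_on() "
        f"VALUES {quote(downstream_id)}->{quote(upstream_id)}:();"
    )


if __name__ == "__main__":
    print(insert_task(Task("t1", "load_orders", "02:00", "02:30")))
    print(insert_dependency("t1", "t2"))
```

The statements are only generated here, not executed; in practice they would be sent through a Nebula Graph client session, and the tag and edge type would need to be created first.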

Core Modules :

Data Model: collects historical task run data and the current day's task status via APIs.

Data Processing: applies a box‑plot (Q1, Q3) model to the past 30 days of run data to compute each task's standard start and end times, filtering out outliers.

Data Storage: stores task metadata as tags and dependencies as edges in Nebula Graph, enabling fast graph queries.

Technology Selection: Nebula Graph provides Graph, Meta, and Storage services, supports nGQL queries, and integrates with Spark/Flink for data import.
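The box-plot step in Data Processing could look roughly like the sketch below. It assumes times are expressed as minutes after midnight, that outliers are values outside the usual Tukey fences (Q1 - 1.5 IQR, Q3 + 1.5 IQR), and that the standard time is the mean of the remaining values. The article only states that Q1 and Q3 are used to filter outliers, so the fence multiplier and the use of the mean are assumptions.

```python
from statistics import mean, quantiles


def standard_time(minutes: list[float], k: float = 1.5) -> float:
    """Estimate a standard time from historical values, ignoring outliers.

    `minutes` holds one value per day (e.g. start time in minutes after
    midnight). Values outside [Q1 - k*IQR, Q3 + k*IQR] are dropped.
    """
    if len(minutes) < 2:
        raise ValueError("need at least two historical values")
    q1, _, q3 = quantiles(minutes, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [m for m in minutes if low <= m <= high]
    return mean(kept)


if __name__ == "__main__":
    # Five normal days around 02:00 and one badly delayed day at 05:00.
    history = [120, 122, 118, 121, 119, 300]
    print(standard_time(history))
```

With this sample history the delayed day falls outside the upper fence and is excluded, so a single bad night does not shift the standard time.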

Real‑time Monitoring : A fixed‑time window scans tasks that should be running or completed, leveraging nGQL to traverse the graph and compare actual vs. standard runtimes. Alerts are generated for tasks exceeding 80% of standard time, delayed over 20 minutes, or stuck in waiting.
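The three alert conditions above can be read in more than one way, so the following is only one possible interpretation, written as a minimal sketch. It assumes times are minutes after midnight, that "exceeding 80% of standard time" means a running task whose elapsed time is above 80% of its standard duration, and that "delayed" means starting (or still not having started) more than 20 minutes after the standard start. The status values and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RUNTIME_RATIO = 0.8   # from the text: 80% of the standard time
DELAY_MINUTES = 20    # from the text: delayed over 20 minutes


@dataclass
class TaskState:
    status: str                        # "waiting", "running" or "done" (assumed values)
    std_start: int                     # standard start, minutes after midnight
    std_duration: int                  # standard runtime, minutes
    actual_start: Optional[int] = None


def check_task(state: TaskState, now: int) -> list[str]:
    """Return the alert rules that a task currently triggers."""
    if state.status == "done":
        return []
    alerts = []
    # Still waiting although the standard start time has passed.
    if state.status == "waiting" and now > state.std_start:
        alerts.append("waiting")
    # Started late, or not started yet and already late.
    reference = state.actual_start if state.actual_start is not None else now
    if reference - state.std_start > DELAY_MINUTES:
        alerts.append("delay")
    # Running longer than the allowed share of the standard runtime.
    if state.status == "running" and state.actual_start is not None:
        if now - state.actual_start > RUNTIME_RATIO * state.std_duration:
            alerts.append("runtime")
    return alerts


if __name__ == "__main__":
    print(check_task(TaskState("running", 120, 60, actual_start=125), now=180))
    print(check_task(TaskState("waiting", 120, 60), now=150))
```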

Alert Strategies : Alerts are categorized (task exception, delay, waiting) and sent via Enterprise WeChat and phone calls. Duplicate alerts for the same upstream task are merged to reduce noise.
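Merging duplicate alerts might be sketched as grouping by the upstream task and the alert category, so that one notification lists every affected report. The Alert fields and the grouping key are assumptions for illustration; the article does not describe the actual merge logic.

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    upstream_task: str     # task where the problem originates
    category: str          # "exception", "delay" or "waiting"
    affected_report: str   # downstream report impacted by the problem


def merge_alerts(alerts: list[Alert]) -> dict[tuple[str, str], list[str]]:
    """Group alerts by (upstream task, category), keeping first-seen order."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for alert in alerts:
        key = (alert.upstream_task, alert.category)
        if alert.affected_report not in grouped[key]:
            grouped[key].append(alert.affected_report)
    return dict(grouped)


if __name__ == "__main__":
    raw = [
        Alert("t1", "delay", "daily_sales"),
        Alert("t1", "delay", "daily_traffic"),
        Alert("t1", "delay", "daily_sales"),
        Alert("t7", "waiting", "daily_sales"),
    ]
    for (task, category), reports in merge_alerts(raw).items():
        print(f"{category} on {task}: affects {', '.join(reports)}")
```

In this example four raw alerts collapse into two notifications, one per upstream task.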

Interaction via Enterprise WeChat :

Task scheduling, querying, and termination directly from the mobile client.

One‑click confirmation of data‑validation failures without VPN.

Daily report push statistics and status overview.

Effect Evaluation : After deployment, the report‑generation delay rate dropped significantly, on‑time delivery improved by 9.6%, and night‑time alarm handling became more efficient.

Challenges & Future Plans :

Refining alert precision to avoid unnecessary notifications.

Stabilizing graph traversal performance by limiting hop counts.

Adding intelligent holiday handling and ML‑based task‑time anomaly detection.
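The idea of limiting hop counts can be illustrated outside the database with a bounded breadth-first search over an in-memory dependency map. This is a sketch of the general technique, not the system's implementation; the function name and the dictionary layout are hypothetical. In nGQL a comparable bound would typically be expressed through the step range of a GO statement.

```python
from collections import deque


def upstream_within(deps: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Return upstream tasks reachable from `start` within `max_hops`.

    `deps` maps a task to the tasks it directly depends on. The result maps
    each reachable upstream task to its hop distance from `start`.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        task = queue.popleft()
        if hops[task] == max_hops:
            continue  # do not expand beyond the hop limit
        for upstream in deps.get(task, []):
            if upstream not in hops:
                hops[upstream] = hops[task] + 1
                queue.append(upstream)
    del hops[start]
    return hops


if __name__ == "__main__":
    deps = {"report": ["a", "b"], "a": ["c"], "c": ["d"]}
    print(upstream_within(deps, "report", max_hops=2))
```

Capping the hop count keeps the amount of work per scan predictable, at the cost of not seeing problems that sit further upstream than the limit.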

Task tracing is invoked with the command delay taskID|taskName, which takes either a task ID or a task name as its argument.
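A handler for this command might parse the incoming chat message along the following lines. This is a sketch under stated assumptions: it treats the argument as a task ID when it consists only of ASCII digits and as a task name otherwise, which the article does not confirm, and the DelayCommand type is hypothetical.

```python
from typing import NamedTuple, Optional


class DelayCommand(NamedTuple):
    task_id: Optional[int]     # set when the argument looks like a numeric ID
    task_name: Optional[str]   # set otherwise


def parse_delay(message: str) -> DelayCommand:
    """Parse a 'delay <taskID|taskName>' message into its argument."""
    parts = message.strip().split(maxsplit=1)
    if len(parts) != 2 or parts[0] != "delay":
        raise ValueError("expected: delay taskID|taskName")
    arg = parts[1].strip()
    # Assumption: task IDs are purely numeric; anything else is a name.
    if arg.isascii() and arg.isdigit():
        return DelayCommand(task_id=int(arg), task_name=None)
    return DelayCommand(task_id=None, task_name=arg)


if __name__ == "__main__":
    print(parse_delay("delay 10482"))
    print(parse_delay("delay dw_daily_orders"))
```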

Tags: monitoring, data pipeline, operations, graph database, Enterprise WeChat, Nebula Graph, real-time alerts
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
