
Real-time Business Data Anomaly Attribution with TuGraph-Analytics at Huolala

This article describes how Huolala combined the open-source high-performance streaming graph engine TuGraph-Analytics with Flink to build a real-time business data anomaly detection and attribution system, covering the background, architectural evolution, technical choices, implementation details, benefits, and future plans.

DataFunSummit

Huolala struggled to detect and attribute business anomalies in a timely manner, owing to large data volumes, fragmented influencing factors, and slow batch processing. To address this, the team adopted big-data technologies for unified collection and computation, aiming for real-time alerting and rapid root-cause analysis.

The early solution combined streaming and batch processing: a streaming engine cleaned and aggregated data, which was then written to a Doris database; batch jobs queried Doris for anomaly detection and joined external factors such as weather and traffic. This approach suffered from engine inconsistency, batch latency, difficulty replaying traffic for rule verification, and cumbersome joins for complex relationships.

After evaluating alternatives, the team selected Flink as the streaming engine for its maturity and integration capabilities, and chose TuGraph-Analytics as the graph component because graphs naturally express complex multi-entity relationships, support incremental real-time computation, handle high-dimensional data, and enable elastic scaling with compute-storage separation.

The new architecture is fully streaming: Flink consumes business data, performs cleaning and statistics, and publishes results to a data‑warehouse topic. Anomaly detection now consists of two modules—dynamic threshold calculation (feeding MySQL) and real‑time anomaly judgment (publishing to Kafka for graph construction). Notification uses Flink SQL for aggregation and frequency reduction, providing stronger timeliness, higher development efficiency, and adaptive alert sensitivity.
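The two detection modules described above can be sketched in simplified form: a dynamic threshold derived from a rolling window of recent values, and a point-wise judgment of each new reading against it. This is an illustrative assumption of how such a detector might work; the window size and the `k` sensitivity multiplier are hypothetical, not Huolala's actual parameters.

```python
from collections import deque
from statistics import mean, stdev

class DynamicThresholdDetector:
    """Rolling mean +/- k*std detector (illustrative sketch)."""

    def __init__(self, window: int = 12, k: float = 3.0):
        self.history = deque(maxlen=window)  # recent metric values
        self.k = k                           # alert sensitivity multiplier

    def judge(self, value: float) -> bool:
        """Return True if `value` falls outside mean +/- k*std of the window."""
        is_anomaly = False
        if len(self.history) >= 3:
            mu, sigma = mean(self.history), stdev(self.history)
            is_anomaly = abs(value - mu) > self.k * sigma
        self.history.append(value)
        return is_anomaly

detector = DynamicThresholdDetector()
readings = [100, 102, 98, 101, 99, 100, 250]  # last point is a spike
flags = [detector.judge(v) for v in readings]  # only the spike is flagged
```

Because the threshold adapts to the recent window rather than being fixed, alert sensitivity follows the metric's own variability, which matches the "adaptive alert sensitivity" benefit noted above.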

For attribution, data streams—including metrics, alerts, task/service changes, weather, and traffic—are ingested into the graph. Incremental updates are merged with the full graph, and a graph computation produces a table of attribution results. The system aligns entity timestamps in 5‑minute slices, marks delayed data, and quantifies discrete external factors (e.g., weather) using numeric severity levels.
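The alignment step might look like the following sketch: entity timestamps are snapped down to 5-minute slices, events arriving after their slice has closed are marked as delayed, and discrete weather conditions are mapped to numeric severity levels. The severity scale and function names here are hypothetical, chosen only to illustrate the idea.

```python
from datetime import datetime

SLICE_SECONDS = 300  # 5-minute alignment window

# Illustrative mapping of discrete weather states to severity levels
WEATHER_SEVERITY = {"clear": 0, "rain": 1, "heavy_rain": 2, "storm": 3}

def align_to_slice(ts: datetime) -> datetime:
    """Snap a timestamp down to the start of its 5-minute slice."""
    epoch = ts.timestamp()
    return datetime.fromtimestamp(epoch - epoch % SLICE_SECONDS)

def ingest(event_ts: datetime, arrival_ts: datetime, weather: str) -> dict:
    slice_start = align_to_slice(event_ts)
    # An event is "delayed" if it arrives after its own slice has closed.
    delayed = arrival_ts.timestamp() >= slice_start.timestamp() + SLICE_SECONDS
    return {
        "slice": slice_start,
        "delayed": delayed,
        "weather_severity": WEATHER_SEVERITY.get(weather, 0),
    }

rec = ingest(datetime(2023, 5, 1, 10, 3, 20),   # event time
             datetime(2023, 5, 1, 10, 9, 0),    # late arrival
             "heavy_rain")
```

Quantifying discrete factors this way lets the downstream graph computation compare weather against numeric metrics on a common scale.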

The attribution logic first drills down from an alert to related metrics, calculates contribution rates, and then evaluates associated factors against abnormality coefficients to compute confidence scores, finally outputting the top‑k influencing factors.
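A minimal sketch of that pipeline, under stated assumptions: the drill-down yields per-sub-metric deviations, each sub-metric's contribution rate is its share of the total absolute deviation, and confidence weights that rate by an abnormality coefficient for the linked external factor. All names, coefficients, and the scoring formula are illustrative, not Huolala's actual implementation.

```python
def contribution_rates(deltas: dict) -> dict:
    """Share of the total absolute deviation owned by each sub-metric."""
    total = sum(abs(d) for d in deltas.values()) or 1.0
    return {name: abs(d) / total for name, d in deltas.items()}

def attribute(deltas: dict, factor_abnormality: dict, k: int = 3) -> list:
    """Rank candidate causes by a simple confidence score, keep top-k."""
    rates = contribution_rates(deltas)
    scores = {}
    for metric, rate in rates.items():
        # Confidence = contribution rate scaled by how abnormal the
        # factor linked to this metric currently is (0..1 coefficient).
        scores[metric] = rate * factor_abnormality.get(metric, 1.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Example: order volume dropped; city_A explains most of the drop and
# its linked weather factor is highly abnormal.
top = attribute(
    deltas={"city_A": -800, "city_B": -150, "city_C": -50},
    factor_abnormality={"city_A": 0.9, "city_B": 0.2, "city_C": 0.5},
    k=2,
)
```

The top-k output is what the system surfaces to operators alongside the alert, so investigation starts from the most likely causes rather than raw metric dumps.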

Benefits include reducing alert latency from one hour to five minutes, improving development speed via SQL/GQL, and achieving higher computation performance by replacing table joins with graph matches. The graph also expands capability to model complex relationships, enhances attribution accuracy, and lays groundwork for global traffic replay and broader graph‑OLAP scenarios.

Future plans focus on implementing a global traffic‑replay system, extending graph OLAP to task‑diagnosis and other use cases, and lowering GQL adoption barriers by generating queries from large‑language models.

The presentation concluded with a Q&A session, confirming that the graph is primarily used for incremental, real‑time computation and complex relationship queries, replacing cumbersome multi‑table joins.

Tags: Big Data, Flink, real-time analytics, graph database, Anomaly Detection, TuGraph-Analytics
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
