Tag

fault localization


TAL Education Technology
Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

CoT · Operations · RAG
14 min read
Xiaohongshu Tech REDtech
Oct 9, 2024 · Operations

AIOps Implementation at Xiaohongshu: Fault Localization and Intelligent Operations

Xiaohongshu’s AIOps initiative builds a four‑layer framework that combines machine‑learning‑driven anomaly detection, causal analysis, and trace‑based fault localization to automatically identify root‑cause services in microservice environments. It reports over 80% accuracy across 1,000 daily diagnoses, and the results guide future work on change correlation and automated remediation.

AIOps · Anomaly Detection · DevOps
28 min read
vivo Internet Technology
Jan 4, 2023 · Artificial Intelligence

Root Cause Localization Algorithm and Its Implementation for Service Fault Diagnosis

The article describes a root‑cause localization algorithm implemented in vivo’s monitoring platform. It automatically analyzes latency spikes by splitting service timelines, computing variance, clustering the results with K‑means, and recursively tracing downstream services. The algorithm achieves over 85% accuracy for dependency failures but still requires human verification; the article also outlines future AI‑driven enhancements.

AIOps · algorithm · fault localization
13 min read
Baidu Intelligent Testing
Dec 21, 2022 · Operations

Intelligent Test Localization Practices: Spectrum-Based Fault Localization, Error-Code Build System, Revenue‑Loss Decision, and UI Case Localization

This article presents a comprehensive overview of intelligent test localization techniques—including spectrum‑based fault localization, error‑code driven build‑system localization, commercial revenue‑loss decision making, and UI case‑level tracing—detailing their motivations, methodologies, algorithms, and practical applications within automated testing pipelines.

CI/CD · automation · error code
10 min read
NetEase Game Operations Platform
Sep 19, 2022 · Artificial Intelligence

Applying AIOps to Game Operations: Roadmap, Anomaly Detection, and Fault Localization

This article describes NetEase's AIOps journey for game operations, explaining the Gartner definition of intelligent operations, the implementation roadmap, detailed anomaly‑detection techniques for business, performance, and log data, and a comprehensive fault‑localization workflow that combines resource, code, and historical analysis.

AIOps · Anomaly Detection · Game Operations
12 min read
Xianyu Technology
Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that pinpoints server‑side issues in under five seconds with developer‑level accuracy. It aggregates real‑time metrics, applies a decision‑tree model enriched with expert knowledge and dynamic thresholds, and presents results through an integrated alert and visualization system. Planned work includes broader endpoint coverage and multi‑tenant support.

Big Data · Operations · automation
12 min read
Xianyu Technology
Jul 23, 2019 · Operations

Automated Service Fault Localization System Architecture

The automated service fault localization system ingests massive real‑time instrumentation data, builds call‑chain graphs, and instantly pinpoints the exact component causing timeouts or other errors, achieving developer‑level accuracy within seconds instead of minutes while remaining simple, fast, and fully automated.

Big Data · Operations · Real-time Analytics
8 min read
Efficient Ops
Dec 12, 2017 · Operations

Sogou’s AI‑Powered Ops: Smart Circuit Breaker, Fault Localization & Chatbot

This article examines the three major pain points faced by Sogou's operations engineers—worry cost, insufficient intelligence, and annoyance cost—and explains how the company applies AI through intelligent circuit breaking, fault localization, and a chatbot to streamline reliability and reduce manual effort.

AI Ops · Chatbot · Intelligent Monitoring
10 min read