
Application Monitoring Principles and Non‑Intrusive Data Collection at Huya

This article explains the fundamentals of distributed application monitoring, describes Huya's non‑intrusive data‑collection techniques using SDKs and plugins, outlines the design and correlation of observable metrics, and demonstrates practical results and troubleshooting scenarios for backend services.

DataFunSummit

Apr 29, 2023


The article begins with an overview of monitoring types, focusing on application monitoring and its importance for observing service calls, process execution, and business‑related metrics.

It then analyzes the principles of distributed application monitoring, focusing on the cross‑process challenge: the metrics that a single request produces in different services must be linked, otherwise each service holds only isolated data. Solving this requires correlating upstream and downstream request information.
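A common way to link upstream and downstream calls is for the caller to write a trace ID and its own span ID into the outgoing request headers, and for the callee to read them so it continues the same trace. The sketch below shows that idea. The header names (`X-Trace-Id`, `X-Parent-Span-Id`) and the `SpanContext` class are hypothetical; the article does not describe Huya's wire format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical identifiers for one hop in a distributed call chain.
final class SpanContext {
    // Hypothetical header names; not taken from the article.
    static final String TRACE_ID_HEADER = "X-Trace-Id";
    static final String PARENT_SPAN_HEADER = "X-Parent-Span-Id";

    private final String traceId;      // shared by every hop of one request
    private final String spanId;       // unique to this hop
    private final String parentSpanId; // caller's span ID, null at the entry service

    private SpanContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    String traceId() { return traceId; }
    String spanId() { return spanId; }
    String parentSpanId() { return parentSpanId; }

    // Entry service: there is no upstream caller, so start a new trace.
    static SpanContext newRoot() {
        return new SpanContext(newId(), newId(), null);
    }

    // Upstream side: copy the identifiers into the outgoing request's headers.
    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_ID_HEADER, traceId);
        headers.put(PARENT_SPAN_HEADER, spanId);
        return headers;
    }

    // Downstream side: continue the caller's trace, or start a new one
    // if the request carries no trace header.
    static SpanContext fromHeaders(Map<String, String> headers) {
        String incomingTrace = headers.get(TRACE_ID_HEADER);
        if (incomingTrace == null) {
            return newRoot();
        }
        return new SpanContext(incomingTrace, newId(), headers.get(PARENT_SPAN_HEADER));
    }

    private static String newId() {
        // 16 hex characters taken from a random UUID.
        return UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    }
}
```

Because every hop shares the trace ID and records its parent's span ID, the metrics collected in each service can later be joined into one request tree instead of remaining isolated.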

Next, it discusses non‑intrusive data‑collection approaches, evaluating options like log collection, port probing, network‑packet monitoring, and finally selecting SDK/plugin‑based instrumentation for its zero‑code‑change capability and extensibility.
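To illustrate instrumentation that needs no business-code changes, here is a minimal sketch that uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) to count calls, count errors and time latency around an interface. This is a swapped-in technique: Java agents usually rewrite bytecode instead, and the article does not say which mechanism Huya's plugins use. `MonitoringProxy`, `CallStats` and `GreetingService` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical business interface; its implementation never references monitoring code.
interface GreetingService {
    String greet(String name);
}

// Per-method counters filled in by the proxy.
final class CallStats {
    final LongAdder calls = new LongAdder();
    final LongAdder errors = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
}

final class MonitoringProxy {
    // Keyed by "Interface.method".
    static final ConcurrentHashMap<String, CallStats> STATS = new ConcurrentHashMap<>();

    private MonitoringProxy() {}

    // Returns a proxy that records call count, errors and latency, then delegates to target.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            CallStats stats = STATS.computeIfAbsent(
                    iface.getSimpleName() + "." + method.getName(), k -> new CallStats());
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                stats.errors.increment();
                throw e.getCause(); // surface the business exception unchanged
            } finally {
                stats.calls.increment();
                stats.totalNanos.add(System.nanoTime() - start);
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

The business implementation is passed in unchanged; only the object handed to callers is replaced, which is the property that makes plugin-style collection non-intrusive.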

The implementation details describe how plugins intercept common frameworks (e.g., Spring MVC, OkHttp) and keep the request context in a ThreadLocal. The context is carried over to another thread whenever work is handed off, so requests can be correlated across both synchronous and asynchronous execution without modifying business code.
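A minimal sketch of the ThreadLocal approach, assuming the context is just a trace ID: synchronous code reads it from a `ThreadLocal`, and for asynchronous work the submitting thread's value is captured when the task is created and restored inside the worker thread. `RequestContext` and its methods are hypothetical; in a plugin-based setup the wrapping would presumably be applied when the executor is intercepted, not by business code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical per-request context holding only a trace ID.
final class RequestContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    private RequestContext() {}

    static void set(String traceId) { TRACE_ID.set(traceId); }

    static String get() { return TRACE_ID.get(); }

    static void clear() { TRACE_ID.remove(); }

    // Captures the caller's trace ID now and installs it in whichever thread runs the task.
    static <V> Callable<V> wrap(Callable<V> task) {
        String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker's own value so pooled threads do not leak context.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    // Async hand-off: submit the task with the current context attached.
    static <V> Future<V> submitWithContext(ExecutorService pool, Callable<V> task) {
        return pool.submit(wrap(task));
    }
}
```

Without the wrap, a pooled worker thread sees no trace ID (or a stale one left by an earlier request), which is exactly the broken correlation the plugins need to avoid.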

Metric design is covered, distinguishing basic call metrics (QPS, latency, success rate) from process‑load indicators (thread pool capacity, active threads, waiting threads) and introducing a thread‑load‑rate calculation to reflect CPU usage per request.
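The basic call metrics reduce to simple arithmetic over a time window. The sketch below computes QPS, average latency and success rate for one window. The article does not give its thread‑load‑rate formula, so `threadLoadRate` uses an assumed definition: total request processing time divided by the thread time available in the window (maximum threads × window length). `WindowMetrics` is a hypothetical class and is not thread-safe.

```java
// Aggregates the requests of one fixed time window (single-threaded sketch).
final class WindowMetrics {
    private final long windowMillis;
    private long count;
    private long successes;
    private long totalLatencyMillis;

    WindowMetrics(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Record one finished request.
    void record(long latencyMillis, boolean success) {
        count++;
        totalLatencyMillis += latencyMillis;
        if (success) {
            successes++;
        }
    }

    // Requests per second over the window.
    double qps() {
        return count * 1000.0 / windowMillis;
    }

    double avgLatencyMillis() {
        return count == 0 ? 0.0 : (double) totalLatencyMillis / count;
    }

    // Policy choice: a window with no traffic is reported as having no failures.
    double successRate() {
        return count == 0 ? 1.0 : (double) successes / count;
    }

    // ASSUMED definition, not the article's: share of the pool's thread time
    // that requests consumed during this window.
    double threadLoadRate(int maxThreads) {
        return (double) totalLatencyMillis / ((double) maxThreads * windowMillis);
    }
}
```

For example, four requests in a 1 s window with latencies of 100, 100, 100 and 400 ms (the last one failing) give QPS 4, average latency 175 ms, success rate 0.75 and, with 10 threads, a thread load of 700 / 10 000 = 0.07.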

Alert aggregation and metric‑correlation techniques are presented: alerts raised by multiple instances are consolidated, and root causes are identified by linking the trends of related metrics.
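A minimal sketch of both ideas, under stated assumptions: alerts are grouped by service and metric name, so the same problem firing on many instances becomes one entry listing the affected instances, and trend correlation is approximated with the Pearson correlation coefficient between two equally sampled metric series. The article does not say which correlation method Huya uses; Pearson is a swapped-in example, and `Alert` and `AlertAggregator` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical alert raised by a single instance.
final class Alert {
    final String service;
    final String metric;
    final String instance;

    Alert(String service, String metric, String instance) {
        this.service = service;
        this.metric = metric;
        this.instance = instance;
    }
}

final class AlertAggregator {
    private AlertAggregator() {}

    // Groups alerts by "service/metric"; each group lists the distinct instances affected.
    static Map<String, Set<String>> aggregate(List<Alert> alerts) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (Alert alert : alerts) {
            grouped.computeIfAbsent(alert.service + "/" + alert.metric, k -> new LinkedHashSet<>())
                   .add(alert.instance);
        }
        return grouped;
    }

    // Pearson correlation of two equally sampled series: near +1 or -1 means the
    // trends move together, near 0 means no linear relationship.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have the same length (at least 2)");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            return 0.0; // a flat series has no trend to correlate
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```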

Practical results showcase dashboards displaying request metrics, thread‑load distribution, and examples of abnormal scenarios such as sudden traffic spikes, high‑latency requests, and database connection pool bottlenecks.

A Q&A section covers whether the SDK can be used for front‑end applications, how the approach compares with SkyWalking and Jaeger, and how thread‑pool data is obtained from containers such as Tomcat.
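The summary does not spell out how the SDK reaches Tomcat's pool. Assuming the monitoring code already holds a `java.util.concurrent.ThreadPoolExecutor`, the process‑load indicators described earlier (capacity, active threads, waiting threads) map onto standard getters, as in the sketch below. Tomcat uses its own executor class, and whether it is a subclass of the JDK one depends on the version, so treat this as an illustration only. `PoolSnapshot` is hypothetical, and the JDK documents `getActiveCount()` as approximate.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Point-in-time view of a thread pool's load indicators.
final class PoolSnapshot {
    final int maxThreads;     // capacity
    final int currentThreads; // threads created so far
    final int activeThreads;  // threads running a task (approximate)
    final int idleThreads;    // created but not running a task
    final int queuedTasks;    // tasks waiting for a free thread

    // "Waiting threads" could mean idle workers or queued requests; both are exposed.
    private PoolSnapshot(int maxThreads, int currentThreads, int activeThreads, int queuedTasks) {
        this.maxThreads = maxThreads;
        this.currentThreads = currentThreads;
        this.activeThreads = activeThreads;
        this.idleThreads = Math.max(0, currentThreads - activeThreads);
        this.queuedTasks = queuedTasks;
    }

    static PoolSnapshot of(ThreadPoolExecutor pool) {
        return new PoolSnapshot(
                pool.getMaximumPoolSize(),
                pool.getPoolSize(),
                pool.getActiveCount(),
                pool.getQueue().size());
    }

    // Share of maximum capacity that is busy right now.
    double busyRatio() {
        return maxThreads == 0 ? 0.0 : (double) activeThreads / maxThreads;
    }
}
```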


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Observability SRE distributed tracing application monitoring Metrics Design non‑intrusive data collection

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.

