Design and Implementation of the Log Reporting, Collection, and Distribution Pipeline in NetEase Cloud Music's Corona Front‑end Monitoring System
The article details NetEase Cloud Music’s Corona monitoring pipeline, explaining how SDKs report logs via an HTTP service, how a transmission layer normalizes and stores them, how a Flume‑like collector forwards logs to HBase and Kafka, and how Flink tasks shard and filter streams for various monitoring services while handling traffic spikes and offering an independent Node.js channel for other business units.
Corona is NetEase Cloud Music's large‑scale front‑end monitoring product. Its SDK captures various types of logs in applications, which are then reported, collected, and routed to Corona's consumption services. This pipeline directly affects data timeliness and system stability, so efficient and reliable log delivery is crucial.
The log reporting, collection, and distribution chain sits within NetEase Group's shared services infrastructure. It is designed with replaceable cloud‑music service nodes so that other business units can adopt the same capability, and the article provides a detailed exposition of this chain.
3.1 Log Reporting via Cloud Music Log Service – All SDKs use an HTTP interface provided by the Cloud Music log service. Android and iOS apps write logs to local files and upload them asynchronously, while web front‑ends use ordinary HTTP POST requests. The service employs business‑specific domain names that automatically carry cookies, enriching the log data.
Log Transmission Layer – After reception, logs are decompressed, decrypted (if needed), and normalized. The system parses cookies, extracts user IDs, resolves IP addresses, records the server‑side timestamp, and performs other preprocessing steps. Processed logs are stored temporarily on local disk, where they are sorted by log type, application type, and business information and written to separate log files.
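The enrichment and archiving steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cookie name `uid`, the record field names, and the `/data/logs` directory layout are hypothetical and do not come from the article.

```python
import time
from http.cookies import SimpleCookie


def normalize(raw: dict, cookie_header: str, client_ip: str) -> dict:
    """Enrich an already decoded log record with server-side fields."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    user_cookie = cookies.get("uid")  # hypothetical cookie name
    return {
        **raw,
        "user_id": user_cookie.value if user_cookie else None,
        "ip": client_ip,
        "server_ts": int(time.time() * 1000),  # server-side timestamp in ms
    }


def archive_path(record: dict, root: str = "/data/logs") -> str:
    """Choose the log file by business, application type, and log type."""
    return (
        f"{root}/{record['business']}/"
        f"{record['app_type']}/{record['log_type']}.log"
    )
```

Keeping one file per combination of business, application type, and log type is what lets a downstream collector pick up only the files it is configured to read.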
3.2 Log Collection Service – Corona leverages an internally developed log collection service (similar to Apache Flume). The core collector, called the Agent, runs on the application servers of the Cloud Music log service cluster and is managed via a Manager Service that distributes collection rules. The Agent reads designated log files, tracks read positions, and forwards logs to two destinations: (1) the DWD detail layer (an HBase database) for raw log backup and analyst access, and (2) Kafka message queues, which later flow into the ADS layer database for consumption by Corona services.
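As a rough illustration of how a collector can track its read position and fan out to two destinations, here is a minimal Python sketch. It is not the internal Agent: the class name `FileTailer` is hypothetical, the sinks are plain callables standing in for the HBase and Kafka writers, and the offset is kept only in memory, whereas a real agent would presumably persist it.

```python
class FileTailer:
    """Forward newly appended lines of one log file to every sink."""

    def __init__(self, path, sinks):
        self.path = path
        self.sinks = sinks  # callables, e.g. stand-ins for HBase and Kafka
        self.offset = 0     # byte position of the next unread line

    def poll(self) -> int:
        """Send complete lines added since the last poll; return the count."""
        sent = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file or a partially written line
                text = line.decode("utf-8").rstrip("\n")
                for sink in self.sinks:
                    sink(text)
                self.offset = f.tell()  # advance only past delivered lines
                sent += 1
        return sent
```

Advancing the offset only after a full line has been handed to every sink means a partially written line is retried on the next poll instead of being forwarded truncated.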
3.3 Data Consumption Layer and Log Sharding – Logs written to Kafka contain all monitoring log types for a business (e.g., device activity, exceptions, performance). To avoid resource waste and stability risks, Corona uses Flink real‑time tasks to split logs according to the needs of individual consumption services (exception monitoring, performance monitoring, real‑time traffic monitoring). Each Flink job queries the relevant logs from a streaming table and writes them into separate Kafka topics for downstream services.
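The splitting logic itself is simple, which the following plain-Python stand-in tries to show. It is not Flink code: the topic names and the `log_type` field are assumptions made for illustration, and in the real pipeline this selection is expressed as Flink jobs querying a streaming table.

```python
from collections import defaultdict

# Hypothetical mapping from log type to the topic of its consumer service.
ROUTES = {
    "exception": "corona_exception",
    "performance": "corona_performance",
    "device_activity": "corona_traffic",
}


def shard(records, routes=ROUTES):
    """Group records by destination topic; unrouted types are skipped."""
    topics = defaultdict(list)
    for record in records:
        topic = routes.get(record.get("log_type"))
        if topic is not None:
            topics[topic].append(record)
    return dict(topics)
```

With this shape, each consumption service reads only the topic that carries its own log types rather than filtering the full mixed stream itself.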
3.4 Handling Extreme Traffic – To catch a sudden surge in a particular log type, Corona adds traffic monitoring and alerts at the consumption service level. When an alert fires, the Flink tasks can be reconfigured with filter rules that drop the high‑volume logs, minimizing the impact on downstream services. Because the Flink logic is lightweight, the tasks can also be scaled rapidly without operator intervention, and since such spikes are rare, manual response is seldom needed.
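A small Python sketch can make the alert-then-filter idea concrete. Everything here is an assumption for illustration: the per-window threshold check, the rule format (a dict of field values that must all match), and the function names are hypothetical rather than Corona's actual configuration.

```python
from collections import Counter


def surging_types(records, threshold):
    """Return log types whose count in this window exceeds the threshold."""
    counts = Counter(r["log_type"] for r in records)
    return {log_type for log_type, n in counts.items() if n > threshold}


def apply_drop_rules(records, drop_rules):
    """Drop every record that matches all fields of any drop rule."""
    def matches(record, rule):
        return all(record.get(key) == value for key, value in rule.items())

    return [
        r for r in records
        if not any(matches(r, rule) for rule in drop_rules)
    ]
```

Scoping a rule to several fields, for example a log type plus an application, keeps the filter from discarding the same log type for applications that are behaving normally.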
3.5 Independent Log Reporting Channel – For other business units, Corona provides a standalone log receiver implemented in Node.js. This receiver does not carry business cookies; additional information (e.g., user ID) can be injected via SDK APIs. The receiver reassembles logs, adds IP data, and writes them to an independent Kafka topic that connects directly to the existing sharding pipeline.
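The article does not specify the payload format, so the following Python sketch of the reassembly step rests on an assumed shape: a JSON body with a `common` object (fields the SDK injects once, such as a user ID) and an `events` list. Those field names, and the function name `reassemble`, are hypothetical, and Python is used here only to show the logic, not the Node.js implementation.

```python
import json


def reassemble(body: str, client_ip: str) -> list:
    """Expand one batched report into standalone records with IP attached."""
    payload = json.loads(body)
    common = payload.get("common", {})
    return [
        {**common, **event, "ip": client_ip}
        for event in payload.get("events", [])
    ]
```

Each resulting record is self-contained, so it can be written to the independent Kafka topic and handled by the existing sharding jobs like any other log.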
The article concludes by summarizing the five sections, presenting a complete end‑to‑end log flow diagram, and inviting readers to discuss further. It also includes a recruitment notice from the NetEase Cloud Music technical team.
NetEase Cloud Music Tech Team