
Understanding White‑Box and Black‑Box Monitoring: Data Collection Methods and the Four Golden Metrics

This article explains the differences between white-box and black-box monitoring, outlines common data-collection techniques for both basic and business metrics, and details Google SRE's four golden metrics (error, latency, traffic, and saturation) to help engineers design effective monitoring solutions.

JD Tech

Many articles discuss white‑box and black‑box monitoring and the four golden metrics, but this piece focuses on how to collect monitoring data for a new system.

Monitoring Data Collection – When configuring monitoring, the first challenge is how to gather data. Metrics are divided into basic (system‑level) and business (application‑level) monitoring.

1. Basic Monitoring – Includes CPU, memory, disk, ports, processes, and other OS‑level information. Popular open‑source tools such as Prometheus and Zabbix already provide collectors for these metrics, but they alone cannot fully represent service health.
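As a rough illustration of what "system-level" collection means, here is a minimal basic-metrics collector using only the Python standard library. It is a sketch, not a replacement for Prometheus or Zabbix collectors, and it assumes a Unix-like host (os.getloadavg is unavailable on Windows):

```python
import os
import shutil

def collect_basic_metrics(path="/"):
    """Gather a few OS-level metrics via the standard library (Unix-like hosts)."""
    load1, load5, load15 = os.getloadavg()   # 1/5/15-minute load averages
    disk = shutil.disk_usage(path)           # total/used/free bytes for the filesystem
    return {
        "load_1m": load1,
        "disk_used_pct": 100.0 * disk.used / disk.total,
        "cpu_count": os.cpu_count(),
    }

metrics = collect_basic_metrics()
print(metrics)
```

Real collectors track far more (per-core CPU, memory, network, process state), but the shape is the same: periodically sample OS counters and ship them to the monitoring backend.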

2. Business Monitoring – Generated by the business system itself and reflects real‑world operation. Common collection methods are:

Logs: Use log collectors such as Rsyslog, Logstash, Filebeat, or Flume.

JMX: Java services expose metrics via JMX; tools such as jmxtrans or jmxcmd can read them.

REST: Services expose metrics through REST APIs (e.g., Hadoop, Elasticsearch).

OpenMetrics: The Prometheus-compatible exposition format, which is becoming an industry standard.

Command-line: Local commands that print metrics to standard output.

Push (active reporting): Services push metrics to the monitoring system (e.g., custom Metrics sinks).

Instrumentation (instrumentation points): In-code hooks that emit metrics directly.

Other methods: ZooKeeper four-letter commands, MySQL SHOW STATUS, etc.

If no ready‑made collector exists, developers may need to write custom scripts.

The Four Golden Metrics

Google SRE defines four essential indicators for any service:

Error – Rate of failed requests. Monitor both infrastructure failures (disk, process, network) and business‑level errors (core function failures, master node health, node availability).

Latency – Time taken to serve requests. Track both low-level IO/network latency and high-level business response times (e.g., ZooKeeper's zk_avg_latency, Elasticsearch query latency).

Traffic – Volume of requests or data (QPS, PV, UV, network I/O). Sudden spikes or drops can signal attacks or failures.

Saturation – Utilization of resources (CPU, memory, disk, network, message‑queue length). When utilization approaches capacity, errors and latency tend to increase.
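The four metrics above can all be derived from a window of request records. A minimal sketch, where the Request fields and the fixed capacity_qps are illustrative assumptions (real saturation usually comes from resource utilization, not a single capacity number):

```python
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP-style status code

def golden_metrics(requests, window_seconds, capacity_qps):
    """Derive error rate, p99 latency, traffic, and saturation from one window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    p99 = latencies[min(total - 1, int(total * 0.99))] if latencies else 0.0
    qps = total / window_seconds
    return {
        "error_rate": errors / total if total else 0.0,
        "latency_p99_ms": p99,
        "traffic_qps": qps,
        "saturation": qps / capacity_qps,
    }

window = [Request(12.0, 200), Request(250.0, 500), Request(30.0, 200), Request(18.0, 200)]
m = golden_metrics(window, window_seconds=2, capacity_qps=10)
print(m)
```

Note how the one failed request dominates both the error rate and the p99 latency: the four signals are correlated, which is exactly why alerting on them together is effective.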

Both white‑box (internal) and black‑box (end‑to‑end) monitoring should be employed: white‑box for detailed component metrics and black‑box for user‑visible performance.

Conclusion – The article summarizes common monitoring collection techniques and the four golden metrics, emphasizing that monitoring designs vary across systems and must be tailored to specific business requirements.

Tags: monitoring, operations, metrics, SRE, white-box, black-box, golden metrics
Written by JD Tech

Official JD technology sharing platform. All the cutting-edge JD tech, innovative insights, and open-source solutions you're looking for, all in one place.