
Common Open‑Source Monitoring Systems and Zabbix Monitoring Process

The article introduces common open‑source monitoring tools such as Zabbix and Nagios, explains why distributed systems need proactive health checks, compares features, and provides a detailed Zabbix monitoring workflow including data collection, storage, visualization, alerting, and specific metrics for servers, networks, JVM and MySQL.

Mike Chen's Internet Architecture

Discussions of an architect's core skills usually focus on distributed architecture design, but deploying and monitoring distributed clusters is equally important, in particular knowing which core data to monitor in real time and which conditions should trigger email alerts.

In high‑concurrency distributed environments, heavy‑traffic services and interfaces must be monitored closely so that site slowdowns and avalanche failures (for example, a cache avalanche) are caught early; proactively monitoring core metrics helps avoid these problems.

Common Open‑Source Monitoring Systems

1. Zabbix

Zabbix is an enterprise‑grade, open‑source operations platform with a web‑based interface that provides distributed system and network monitoring, and it is currently the most widely used monitoring software among Chinese internet companies.

It is easy to get started with, feature‑rich, and free and open source.

Zabbix is easy to manage and configure, can generate attractive graphs, and its auto‑discovery greatly reduces daily maintenance work. Rich data collection methods and API interfaces allow flexible data gathering, while its distributed architecture supports monitoring many devices.
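As a rough illustration of the API interface mentioned above, the sketch below builds the body of a JSON‑RPC 2.0 request for the Zabbix API method host.get. It only constructs the request and does not send it. Authentication is deliberately left out because the way the token is passed depends on the Zabbix version, and the function name build_host_get_request is made up for this example.

```python
import json


def build_host_get_request(request_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API method host.get.

    Hypothetical helper for illustration; authentication is not included.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {
            # Ask only for the host ID and the technical host name.
            "output": ["hostid", "host"],
        },
        "id": request_id,
    }


# Serialize the request; a real client would POST this JSON to the
# api_jsonrpc.php endpoint of the Zabbix frontend.
body = json.dumps(build_host_get_request())
print(body)
```

In practice the same request shape is reused for other methods by changing "method" and "params".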

2. Nagios

Nagios is an open‑source enterprise‑grade monitoring system that can monitor basic system parameters such as CPU, disk, and network, as well as services like SMTP, POP3, HTTP, and NNTP. By installing plugins and writing monitoring scripts, users can achieve application monitoring and build hierarchical monitoring architectures for large numbers of hosts.

Nagios's biggest strength is its powerful management console. The core product itself contains no monitoring code; all monitoring and alert functions are provided by plugins.

3. Open‑Source Monitoring Tools Comparison

4. Recommendation

It is recommended to start with Zabbix, the free open‑source monitoring solution. The following sections use Zabbix as an example to discuss the monitoring workflow and core monitoring indicators.

Zabbix Monitoring Process

Zabbix’s monitoring process can be simply described as:

Data collection → Data storage → Data analysis → Data display → Alerting

Data collection: Zabbix gathers data via SNMP, Agent, ICMP, SSH, IPMI, etc.

Data storage: Zabbix stores data in MySQL (or other databases).

Data display: data can be viewed through the web UI, a mobile app, or a custom web interface (for example, one written in Java or PHP).

Alerting: email, WeChat, and SMS notifications, with escalation mechanisms.

Zabbix’s configuration workflow:

Host groups → Hosts → Templates → Applications → Items → Graphs → Screens → Triggers → Events → Actions → Media types (alert escalation: 1. remote command 2. email) → User groups → Users → Media (alert email)

When a trigger reaches its threshold, an event is generated. An Action then processes the event, which includes two parts:

Sending messages to users.

Executing commands to attempt automatic fault recovery.

In production, Items, Triggers, and Graphs are usually managed through templates, allowing changes to be applied to all hosts that use the template.
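The trigger, event, and action flow described above can be sketched in plain Python. This is a simplified model rather than Zabbix's actual implementation: the classes Trigger and Event, the functions notify_users and run_recovery_command, the host name web-01, and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    host: str
    item: str
    value: float
    threshold: float


@dataclass
class Trigger:
    item: str
    threshold: float

    def evaluate(self, host: str, value: float) -> Optional[Event]:
        # An event is generated only when the value reaches the threshold.
        if value >= self.threshold:
            return Event(host, self.item, value, self.threshold)
        return None


def notify_users(event: Event) -> str:
    # First kind of operation: send a message to users.
    return (f"ALERT {event.host}: {event.item}={event.value} "
            f"(threshold {event.threshold})")


def run_recovery_command(event: Event) -> str:
    # Second kind of operation: attempt automatic recovery.
    # Placeholder only; no command is actually executed here.
    return f"recovery command requested for {event.host}"


def process(trigger: Trigger, host: str, value: float,
            operations: List[Callable[[Event], str]]) -> List[str]:
    """Evaluate the trigger and, if an event fires, run every operation."""
    event = trigger.evaluate(host, value)
    if event is None:
        return []
    return [operation(event) for operation in operations]


cpu_trigger = Trigger(item="system.cpu.load", threshold=5.0)
for line in process(cpu_trigger, "web-01", 7.5,
                    [notify_users, run_recovery_command]):
    print(line)
```

Keeping the operations as a list mirrors the idea that one action can both notify users and attempt recovery for the same event.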

Zabbix Monitoring Features

1. Monitoring Metrics

Host performance monitoring

Network device performance monitoring

Database performance monitoring

Multiple alert channels

Detailed reporting and charting

Zabbix agents for Linux, Windows, FreeBSD, etc.

SNMP (and optionally SSH) for network devices

2. Monitorable Objects

Devices: servers, routers, switches

Software: OS, network, applications

Host performance indicators

Fault monitoring: down hosts, unavailable services, unreachable hosts

3. Basic Monitoring Data

Main categories include:

CPU

Load

Memory

Disk

IO

Network related

Kernel parameters

ss statistics

Port collection

Process health of core services

Resource consumption of key business processes

NTP offset

DNS resolution

Anyone who fully understands these basic monitoring items has, in effect, mastered the principles of Linux operation and its advanced commands.
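To make a few of these categories concrete, here is a minimal sketch that reads load and disk usage using only the Python standard library. It is not how the Zabbix agent collects data; the function name collect_basic_metrics and the choice of fields are assumptions for illustration, and the load average is only available on Unix‑like systems.

```python
import os
import shutil


def collect_basic_metrics(path="/"):
    """Collect a small set of host metrics (hypothetical helper)."""
    try:
        # 1-, 5- and 15-minute load averages (Unix-like systems only).
        load1, load5, load15 = os.getloadavg()
    except (AttributeError, OSError):
        load1 = load5 = load15 = None

    disk = shutil.disk_usage(path)
    used_percent = disk.used / disk.total * 100 if disk.total else 0.0

    return {
        "cpu_count": os.cpu_count(),
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "disk_total_bytes": disk.total,
        "disk_used_percent": round(used_percent, 2),
    }


metrics = collect_basic_metrics()
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Comparing the load values against cpu_count is a common first check, since a load that stays well above the number of CPUs usually means work is queuing.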

4. JVM Monitoring

For companies whose main development language is Java, JVM monitoring is indispensable. Important JVM metrics include GC, class loading, memory, processes, and threads, all of which can be obtained via MXBeans.

5. MySQL Four Key Performance Indicators

Query throughput

Query execution performance

Connection status

Buffer pool usage
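One possible way to turn the four indicators above into numbers is to compare two snapshots of the counters returned by SHOW GLOBAL STATUS. The sketch below assumes the snapshots have already been fetched and converted to integers; the mapping of each indicator to specific status variables is this example's assumption, and the function name mysql_key_indicators is hypothetical.

```python
def mysql_key_indicators(prev, curr, interval_seconds):
    """Derive indicators from two SHOW GLOBAL STATUS snapshots.

    prev and curr are dicts mapping status variable names to ints,
    taken interval_seconds apart.
    """
    def delta(name):
        return curr[name] - prev[name]

    read_requests = delta("Innodb_buffer_pool_read_requests")
    disk_reads = delta("Innodb_buffer_pool_reads")
    # Share of logical reads served from the buffer pool without disk I/O.
    hit_ratio = 1 - disk_reads / read_requests if read_requests else None

    return {
        # Query throughput
        "queries_per_second": delta("Questions") / interval_seconds,
        # Query execution performance
        "slow_queries": delta("Slow_queries"),
        # Connection status
        "threads_connected": curr["Threads_connected"],
        "threads_running": curr["Threads_running"],
        # Buffer pool usage
        "buffer_pool_hit_ratio": hit_ratio,
    }


# Made-up sample values, one minute apart.
before = {"Questions": 1000, "Slow_queries": 2,
          "Threads_connected": 10, "Threads_running": 1,
          "Innodb_buffer_pool_read_requests": 10000,
          "Innodb_buffer_pool_reads": 100}
after = {"Questions": 1600, "Slow_queries": 5,
         "Threads_connected": 12, "Threads_running": 3,
         "Innodb_buffer_pool_read_requests": 20000,
         "Innodb_buffer_pool_reads": 200}

print(mysql_key_indicators(before, after, 60))
```

Working with differences between snapshots matters because most of these counters are cumulative since server start, so the raw values say little about current behavior.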

6. Business Application Monitoring

Monitoring business‑critical interfaces, such as response time, is also essential.
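A simple way to capture interface response time inside an application is to wrap the handler and record how long each call takes. The sketch below is one assumed approach, not a Zabbix feature: the decorator timed, the in‑memory store response_times_ms, and the create_order function are all hypothetical, and a real setup would export the recorded values to the monitoring system.

```python
import time
from functools import wraps

# Recorded durations in milliseconds, keyed by interface name.
response_times_ms = {}


def timed(name):
    """Record the wall-clock duration of every call under the given name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the call raises, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                response_times_ms.setdefault(name, []).append(elapsed_ms)
        return wrapper
    return decorator


@timed("create_order")
def create_order(order_id):
    time.sleep(0.01)  # stand-in for real business logic
    return {"order_id": order_id, "status": "created"}


create_order(42)
samples = response_times_ms["create_order"]
print(f"create_order calls: {len(samples)}, last: {samples[-1]:.1f} ms")
```

Recording the time in a finally block means slow calls that end in an exception still show up in the measurements.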
