
Mastering IaaS Monitoring: Strategies for Servers, Networks, and Traffic

This article, the second in a monitoring and alerting series, explains how to monitor the IaaS layer comprehensively, covering server health, network device performance, and traffic analysis. It classifies resources and applies four dimensions (status, performance, capacity, and quality) to reach a unified operational view.

Efficient Ops

Overview

This article is the second in a series on monitoring and alerting products, focusing on IaaS layer monitoring, including server status and performance, network device status and performance, and network traffic analysis.

IaaS

IaaS, PaaS, and SaaS are the three cloud-computing service layers. IaaS (Infrastructure-as-a-Service) provides the tangible resources, such as servers, network devices, and storage. It is the foundation of the stack, so poorly managed IaaS degrades the PaaS and SaaS layers built on top of it.

IaaS Monitoring

Monitoring IaaS means monitoring each resource object (physical servers, switches, dedicated lines, public IPs). Four dimensions are used: status, performance, capacity, and quality.

Status Monitoring: device alive, port status, power, fan.

Performance Monitoring: memory size, port traffic, CPU utilization.

Capacity Monitoring: load utilization of devices, bandwidth usage, etc.

Quality Monitoring: packet loss, error rate, latency.
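
As a minimal sketch of how the four dimensions could be used to classify metrics, the snippet below maps the example metrics listed above to a dimension and groups incoming metric names accordingly. The identifiers (`Dimension`, `METRIC_DIMENSIONS`, `group_by_dimension`) and the snake_case metric names are assumptions made for illustration, not part of any specific product.

```python
from enum import Enum


class Dimension(Enum):
    STATUS = "status"
    PERFORMANCE = "performance"
    CAPACITY = "capacity"
    QUALITY = "quality"


# Hypothetical metric names, classified using the examples given in the text.
METRIC_DIMENSIONS = {
    "device_alive": Dimension.STATUS,
    "port_status": Dimension.STATUS,
    "power": Dimension.STATUS,
    "fan": Dimension.STATUS,
    "memory_size": Dimension.PERFORMANCE,
    "port_traffic": Dimension.PERFORMANCE,
    "cpu_utilization": Dimension.PERFORMANCE,
    "load_utilization": Dimension.CAPACITY,
    "bandwidth_usage": Dimension.CAPACITY,
    "packet_loss": Dimension.QUALITY,
    "error_rate": Dimension.QUALITY,
    "latency": Dimension.QUALITY,
}


def group_by_dimension(metric_names):
    """Group metric names by dimension; unclassified names are returned separately."""
    grouped = {dimension: [] for dimension in Dimension}
    unknown = []
    for name in metric_names:
        dimension = METRIC_DIMENSIONS.get(name)
        if dimension is None:
            unknown.append(name)
        else:
            grouped[dimension].append(name)
    return grouped, unknown
```

Returning unclassified names separately, rather than dropping them, makes it easy to spot metrics that have not yet been assigned to a dimension.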

Monitoring Product Layer Structure

Most commercial or open‑source monitoring‑alerting products adopt a layered architecture: data collection at the bottom, then storage, analysis, visualization, alerting, and handling.

Data Collection

Enterprise monitoring systems support multiple collection methods (agent reporting, SNMP, IPMI, etc.) and objects (servers, OS metrics, network devices, sessions, lines, etc.). Different objects use appropriate collection methods.
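
One way to express "different objects use appropriate collection methods" is a small registry that dispatches on object type. The sketch below is only illustrative: the object-type names, the pairing of each type with agent, SNMP, or IPMI, and the placeholder collector functions are all assumptions, and no real network calls are made.

```python
from typing import Callable, Dict

Collector = Callable[[str], dict]

# Registry of collectors keyed by object type (hypothetical type names).
_COLLECTORS: Dict[str, Collector] = {}


def collector(object_type: str):
    """Decorator that registers a collector function for one object type."""
    def register(func: Collector) -> Collector:
        _COLLECTORS[object_type] = func
        return func
    return register


@collector("os_metrics")
def collect_via_agent(target: str) -> dict:
    # Placeholder: a real agent would report these values from the host.
    return {"target": target, "method": "agent"}


@collector("network_device")
def collect_via_snmp(target: str) -> dict:
    # Placeholder: a real collector would issue SNMP requests here.
    return {"target": target, "method": "snmp"}


@collector("server_hardware")
def collect_via_ipmi(target: str) -> dict:
    # Placeholder: a real collector would query the BMC over IPMI here.
    return {"target": target, "method": "ipmi"}


def collect(object_type: str, target: str) -> dict:
    """Run the collector registered for the given object type."""
    try:
        return _COLLECTORS[object_type](target)
    except KeyError:
        raise ValueError(f"no collector registered for {object_type!r}") from None
```

Adding a new object type then means registering one more function, without touching the dispatch logic.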

Basic Concepts

Monitoring and alerting cover the collection, storage, analysis, display, alerting, and handling of data about a concrete object. The guiding principle is to manage objects first, then monitor them.

Alarm (Monitoring) Object

A specific resource recorded in the CMDB, or a custom configuration item (CI), for example a physical server, a business tier, or a TDSQL instance.

Alarm (Monitoring) Metric

One or more raw feature IDs, or a value derived from them, for example CPU usage, memory usage, or success rate = (successful requests / total requests) * 100.
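
The derived success-rate metric above can be sketched as a small function. The function name and the choice to return `None` when there is no traffic are assumptions; a real system might instead skip the data point or report a sentinel value.

```python
from typing import Optional


def success_rate(successful_requests: int, total_requests: int) -> Optional[float]:
    """Return the success rate as a percentage, or None when there were no requests."""
    if total_requests <= 0:
        # Avoid division by zero; "no data" is different from "0% success".
        return None
    return successful_requests * 100 / total_requests
```

For example, `success_rate(995, 1000)` evaluates to 99.5, while `success_rate(0, 0)` returns `None` so that an idle service does not trigger a low-success-rate alarm.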

Alarm (Monitoring) Type

Specifies the algorithm applied to a set of metrics for an object, for example a single-machine performance alarm that covers CPU, memory, and similar metrics.

Alarm Rule

Combines an object, a metric, a condition, and notification convergence settings (threshold, count, duration) into a single policy, for example CPU usage > 80% on a switch.
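
To make the threshold, count, and duration settings concrete, here is a minimal sketch of one possible interpretation: the rule fires only when the metric has exceeded the threshold at least `count` times within the last `duration` seconds. This interpretation, the `AlarmRule` class, and its field names are assumptions for illustration; actual products may define convergence differently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AlarmRule:
    metric: str
    threshold: float
    count: int        # number of breaches required before the rule fires
    duration: float   # sliding window length in seconds
    _breaches: deque = field(default_factory=deque, repr=False)

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample and return True if the rule should fire."""
        if value > self.threshold:
            self._breaches.append(timestamp)
        # Drop breaches that have fallen out of the sliding window.
        while self._breaches and timestamp - self._breaches[0] > self.duration:
            self._breaches.popleft()
        return len(self._breaches) >= self.count
```

With `AlarmRule("cpu_usage", threshold=80.0, count=3, duration=300)`, a single spike above 80% does not fire the rule; three breaches inside five minutes do.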

Alarm Strategy

Groups multiple rules for one object and alarm type, so that rules can be managed together when there are many services.

Flow

Flow is a data-exchange method in which a network device caches the first IP packet of a flow and aggregates the subsequent packets of that flow into the same cache entry. A flow record contains fields such as source and destination IP, ports, protocol, ToS, and input interface.
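
The caching and aggregation behaviour described above can be sketched as a dictionary keyed by the listed fields. This is a simplified illustration, not how any particular device implements it: `FlowKey`, `FlowCache`, and the per-flow counters are hypothetical names, and real exporters also handle timeouts and export formats that are omitted here.

```python
from collections import namedtuple

# Key fields taken from the description above.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip src_port dst_port protocol tos input_if"
)


class FlowCache:
    def __init__(self):
        self._flows = {}

    def add_packet(self, key: FlowKey, size: int, timestamp: float) -> None:
        entry = self._flows.get(key)
        if entry is None:
            # The first packet of a flow creates the cache entry.
            self._flows[key] = {
                "packets": 1, "bytes": size, "first": timestamp, "last": timestamp,
            }
        else:
            # Subsequent packets with the same key are aggregated into it.
            entry["packets"] += 1
            entry["bytes"] += size
            entry["last"] = timestamp

    def export(self):
        """Return all cached flows as flat records and clear the cache."""
        records = [dict(key._asdict(), **stats) for key, stats in self._flows.items()]
        self._flows.clear()
        return records
```

Each exported record carries the key fields plus packet and byte counters, which is the shape the analysis questions in the next section operate on.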

Network Flow

Analyzing flow data helps answer questions about network usage, protocol mix, traffic direction, packet loss, and latency.
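
As a rough illustration of the usage and protocol questions, the sketch below totals bytes per field value over a list of flow records. The record layout (dictionaries with `src_ip`, `protocol`, and `bytes` keys) and the function names are assumptions; loss and latency analysis need additional measurements and are not covered here.

```python
from collections import Counter


def bytes_by(records, field_name):
    """Total bytes per distinct value of one flow-record field."""
    totals = Counter()
    for record in records:
        totals[record[field_name]] += record["bytes"]
    return totals


def top_talkers(records, n=3):
    """Return the n source IPs that sent the most bytes, largest first."""
    return bytes_by(records, "src_ip").most_common(n)
```

Calling `bytes_by(records, "protocol")` gives the protocol mix, and `top_talkers(records)` answers "who is using the link" from the same records.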

Network Device Monitoring

Network device monitoring covers device status, performance, quality, and capacity, including syslog monitoring, stack status, port traffic, and logical-port performance. Syslog alerting requires grouping devices by vendor, type, model, and purpose.
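
The grouping step for syslog alerts might look like the following sketch, which buckets devices by the four attributes named above so that one set of syslog rules can be attached per group. The device dictionaries, field names, and `group_devices` function are hypothetical.

```python
from collections import defaultdict

# Attributes used for grouping, as listed in the text.
GROUP_FIELDS = ("vendor", "type", "model", "purpose")


def group_devices(devices):
    """Map (vendor, type, model, purpose) to the names of matching devices."""
    groups = defaultdict(list)
    for device in devices:
        key = tuple(device[field_name] for field_name in GROUP_FIELDS)
        groups[key].append(device["name"])
    return dict(groups)
```

Devices that share all four attributes land in the same group and can share the same syslog matching rules.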

Server Monitoring

Server monitoring covers status (ping, agent timeout, power), performance (CPU, memory, traffic, packet counts), and capacity. The richer the collected data, the deeper the analysis it supports and the more ways downstream consumers can use it.
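
A simple way to combine the ping and agent-timeout signals into one server status is sketched below. The status labels, the 180-second default timeout, and the function name are assumptions chosen for illustration; power status is left out because it typically comes from a separate out-of-band source.

```python
def server_status(
    ping_ok: bool,
    last_agent_report: float,
    now: float,
    agent_timeout: float = 180.0,
) -> str:
    """Derive a coarse server status from ping reachability and agent freshness."""
    agent_ok = (now - last_agent_report) <= agent_timeout
    if ping_ok and agent_ok:
        return "ok"
    if ping_ok:
        # Host answers ping but the agent has stopped reporting.
        return "agent_timeout"
    if agent_ok:
        # Agent still reports, so the host is up; ping may be filtered or lossy.
        return "ping_failed"
    return "down"
```

Keeping the two signals separate until the final decision helps distinguish a dead agent from a dead host, which usually call for different responses.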

Summary

IaaS monitoring classifies resources and evaluates them across status, performance, capacity, and quality, providing a unified view for development and operations. Building a monitoring and alerting product is complex and requires consideration of technical capabilities, user and system perspectives, and permission models.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
