Mastering IaaS Monitoring: Strategies for Servers, Networks, and Traffic

This article, the second in a monitoring and alerting series, explains how to comprehensively monitor the IaaS layer—including server health, network device performance, and traffic analysis—by classifying resources and applying status, performance, capacity, and quality dimensions to achieve unified operational insight.

Efficient Ops

Oct 29, 2017

Overview

This article is the second in a series on monitoring and alerting products, focusing on IaaS layer monitoring, including server status and performance, network device status and performance, and network traffic analysis.

IaaS

IaaS, PaaS, and SaaS are the three service layers of cloud computing. IaaS (Infrastructure as a Service) provides tangible resources such as servers, network devices, and storage. Because it is the foundation, poor IaaS management degrades every layer built on top of it.

IaaS Monitoring

Monitoring IaaS means monitoring each resource object (physical servers, switches, dedicated lines, public IPs). Four dimensions are used: status, performance, capacity, and quality.

Status Monitoring: device alive, port status, power, fan.

Performance Monitoring: memory size, port traffic, CPU utilization.

Quality Monitoring: packet loss, error rate, latency.

Capacity Monitoring: load utilization of devices, bandwidth usage, etc.
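
As a minimal sketch of the classification above, the four dimensions can be modeled as an enumeration with each metric filed under exactly one of them. The metric names are hypothetical identifiers derived from the examples in the list; they are not taken from any particular product.

```python
from enum import Enum


class Dimension(Enum):
    STATUS = "status"
    PERFORMANCE = "performance"
    QUALITY = "quality"
    CAPACITY = "capacity"


# Hypothetical metric names, derived from the examples listed in the article.
METRIC_DIMENSIONS = {
    "device_alive": Dimension.STATUS,
    "port_status": Dimension.STATUS,
    "power": Dimension.STATUS,
    "fan": Dimension.STATUS,
    "memory_size": Dimension.PERFORMANCE,
    "port_traffic": Dimension.PERFORMANCE,
    "cpu_utilization": Dimension.PERFORMANCE,
    "packet_loss": Dimension.QUALITY,
    "error_rate": Dimension.QUALITY,
    "latency": Dimension.QUALITY,
    "load_utilization": Dimension.CAPACITY,
    "bandwidth_usage": Dimension.CAPACITY,
}


def metrics_for(dimension):
    """Return the sorted metric names filed under one dimension."""
    return sorted(
        name for name, dim in METRIC_DIMENSIONS.items() if dim is dimension
    )
```

Calling metrics_for(Dimension.QUALITY) lists the quality metrics, which is the kind of lookup a dashboard or alarm type could use to select metrics by dimension.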

Monitoring Product Layer Structure

Most commercial or open‑source monitoring‑alerting products adopt a layered architecture: data collection at the bottom, then storage, analysis, visualization, alerting, and handling.

Data Collection

Enterprise monitoring systems support multiple collection methods (agent reporting, SNMP, IPMI, etc.) and objects (servers, OS metrics, network devices, sessions, lines, etc.). Different objects use appropriate collection methods.
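
One way to express "different objects use appropriate collection methods" is a lookup table keyed by object type. The pairing below is an assumption for illustration: the article names the methods and the objects but does not state which method serves which object.

```python
# Assumed pairing of object types to collection methods. The article lists
# the methods (agent reporting, SNMP, IPMI) but not this exact mapping.
COLLECTION_METHODS = {
    "os_metrics": "agent",       # in-guest agent reports OS-level metrics
    "server_hardware": "ipmi",   # out-of-band hardware health (power, fan)
    "network_device": "snmp",    # switches and routers polled over SNMP
}


def pick_collection_method(object_type):
    """Return the configured collection method for an object type."""
    try:
        return COLLECTION_METHODS[object_type]
    except KeyError:
        raise ValueError(
            f"no collection method configured for {object_type!r}"
        ) from None
```

Failing loudly on an unknown object type keeps a newly added resource class from silently going unmonitored.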

Basic Concepts

Monitoring and alerting involve collection, storage, analysis, display, alerting, and handling of a concrete object. The principle is to manage objects first, then monitor them.

Alarm (Monitoring) Object

Defined as a specific resource in the CMDB (configuration management database) or a custom CI (configuration item), e.g., a physical server, a business tier, or a TDSQL instance.

Alarm (Monitoring) Metric

One or more feature IDs or derived calculations, e.g., CPU usage, memory usage, or success rate = (successful requests / total requests) * 100.
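
The derived success-rate metric above can be sketched as a small function. The handling of zero total requests (returning None rather than 0 or 100) is an assumption; the article does not say how an idle interval should be reported.

```python
def success_rate(successful_requests, total_requests):
    """Success rate as a percentage, or None when there were no requests.

    Returning None for an idle interval is an assumed convention; it avoids
    reporting a misleading 0% or 100% when there is nothing to measure.
    """
    if total_requests == 0:
        return None
    return successful_requests / total_requests * 100
```

For example, 750 successful requests out of 1000 yields 75.0.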

Alarm (Monitoring) Type

Specifies an algorithm applied to a set of metrics for an object, e.g., a single‑machine performance alarm covering CPU, memory, and similar metrics.

Alarm Rule

Combines an object, a metric, a trigger condition, and notification convergence settings (threshold, count, duration) into a policy, e.g., alert when CPU usage on a switch exceeds 80%.
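
A rule with a threshold, a count, and a duration might be evaluated as sketched below. The interpretation is an assumption: the rule fires when at least count samples inside the trailing duration window exceed the threshold. The AlarmRule class and should_alert function are hypothetical names, not part of any product described here.

```python
from dataclasses import dataclass


@dataclass
class AlarmRule:
    metric: str         # e.g. "cpu_usage"
    threshold: float    # a sample breaches when value > threshold
    count: int          # breaches required inside the window
    duration_s: float   # length of the trailing window, in seconds


def should_alert(rule, samples, now):
    """Decide whether the rule fires.

    samples is an iterable of (timestamp_s, value) pairs for rule.metric.
    Only samples inside [now - duration_s, now] are considered.
    """
    window_start = now - rule.duration_s
    breaches = sum(
        1
        for ts, value in samples
        if window_start <= ts <= now and value > rule.threshold
    )
    return breaches >= rule.count
```

Requiring several breaches inside a window, rather than alerting on a single sample, is one common way to keep a brief spike from producing a notification.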

Alarm Strategy

Groups multiple rules for an object and type; aims to simplify management for many services.

Flow

Flow is a data‑exchange method in which a network device caches the first IP packet of a flow and aggregates subsequent packets of the same flow into one record. Each record contains fields such as source and destination IP addresses, ports, protocol, ToS (type of service), and input interface.
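
The aggregation described above can be sketched as follows: the first packet with a given key creates a cache entry, and later packets with the same key only update its counters. The key fields follow the list in the paragraph; the packet dictionary layout and the byte and packet counters are assumptions for illustration.

```python
from collections import defaultdict, namedtuple

# Key fields as listed in the article; the field names are hypothetical.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip src_port dst_port protocol tos input_if"
)


def aggregate_flows(packets):
    """Aggregate packets into flow records keyed by FlowKey.

    packets is an iterable of dicts holding the FlowKey fields plus 'bytes'.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(*(pkt[field] for field in FlowKey._fields))
        # First packet creates the entry; later ones only bump the counters.
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["bytes"]
    return dict(flows)
```

Because only counters are kept per key, the cache grows with the number of distinct flows rather than with the number of packets.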

Network Flow

Analyzing flow data helps answer questions about bandwidth usage, protocol mix, traffic direction, packet loss, and latency.
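
As one example of a usage question, the sketch below ranks source addresses by total bytes sent. The (src_ip, byte_count) record shape and the top_talkers name are assumptions; real flow records carry more fields.

```python
from collections import Counter


def top_talkers(flow_records, n=3):
    """Return the n source IPs with the most bytes, largest first.

    flow_records is an iterable of (src_ip, byte_count) pairs.
    """
    totals = Counter()
    for src_ip, byte_count in flow_records:
        totals[src_ip] += byte_count
    return totals.most_common(n)
```

The same pattern, keyed by protocol or by input interface instead of source address, would address the protocol and direction questions.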

Network Device Monitoring

Focuses on device status, performance, quality, and capacity, including syslog monitoring, stack status, port traffic, and logical‑port performance. Syslog alerts require grouping devices by vendor, type, model, and purpose.
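
Grouping devices by vendor, type, model, and purpose, as required for syslog alerts, might look like the sketch below. The device dictionary layout and the group_devices name are hypothetical; in practice these attributes would likely come from the CMDB.

```python
from collections import defaultdict


def group_devices(devices):
    """Group device names by (vendor, type, model, purpose).

    devices is an iterable of dicts with the keys 'name', 'vendor', 'type',
    'model', and 'purpose'.
    """
    groups = defaultdict(list)
    for dev in devices:
        key = (dev["vendor"], dev["type"], dev["model"], dev["purpose"])
        groups[key].append(dev["name"])
    return dict(groups)
```

Each group can then carry its own set of syslog patterns, since message formats typically differ between vendors and models.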

Server Monitoring

Monitors status (ping, agent timeout, power), performance (CPU, memory, traffic, packets), and capacity. The richer the collected data, the deeper the insight it supports and the more ways it can be consumed.
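
The status checks named above (ping and agent timeout) can be combined into one status value, as in this minimal sketch. The 180-second default timeout, the status strings, and the precedence of a ping failure over an agent timeout are all assumptions.

```python
def server_status(ping_ok, last_agent_report_s, now_s, agent_timeout_s=180):
    """Classify a server as 'down', 'agent_timeout', or 'ok'.

    ping_ok: result of the most recent ping probe.
    last_agent_report_s: timestamp of the agent's last report, in seconds.
    agent_timeout_s: assumed default; tune it to the agent's report interval.
    """
    if not ping_ok:
        # An unreachable host explains a silent agent, so report it first.
        return "down"
    if now_s - last_agent_report_s > agent_timeout_s:
        return "agent_timeout"
    return "ok"
```

Checking ping first avoids raising two alarms for the same outage: a host that is down will also stop reporting through its agent.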

Summary

IaaS monitoring classifies resources and evaluates them across status, performance, capacity, and quality, providing a unified view for development and operations. Building a monitoring and alerting product is complex and requires consideration of technical capabilities, user and system perspectives, and permission models.

