
Process‑Level Resource Monitoring and Reporting System Using Python RPC

This article presents a complete solution for per‑process monitoring of CPU, memory, disk I/O, and network usage on Linux servers. It covers the shortcomings of existing tools, data‑collection methods, aggregation logic, storage in MySQL, visualization with Grafana, deployment steps, and operational best practices.


In many online services a single machine runs dozens of instances, making it difficult to pinpoint which process is exhausting system resources because traditional monitoring lacks per‑process granularity.

The open‑source process_exporter was evaluated but proved unsuitable: it requires pre‑configuration of each process, cannot monitor network usage, and handling many instances becomes cumbersome.

The goal is an automatic discovery mechanism that gathers CPU, memory, disk I/O and network metrics for every active process, including temporary ones, without manual configuration.
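Automatic discovery can be implemented by scanning `/proc` for numeric directory names, each of which corresponds to a live PID. A minimal sketch (the `proc_root` parameter is added here for testability, not part of the original design):

```python
import os

def discover_pids(proc_root="/proc"):
    """Return the PIDs of all live processes by listing numeric
    entries under /proc -- no per-process pre-configuration needed."""
    pids = []
    for name in os.listdir(proc_root):
        # Only purely numeric entries are process directories;
        # names like "self", "cpuinfo", "net" are skipped.
        if name.isdigit():
            pids.append(int(name))
    return sorted(pids)
```

Because the scan runs on every collection cycle, even short‑lived processes are picked up as long as they are alive when the collector fires.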

Initial attempts to read the /proc/&lt;pid&gt; files revealed that per‑process CPU utilization is not exposed as a ready‑made percentage, memory can be obtained from /proc/&lt;pid&gt;/status, I/O counters from /proc/&lt;pid&gt;/io, and the network files (/proc/&lt;pid&gt;/net/dev, /proc/&lt;pid&gt;/net/netstat) only report whole‑interface traffic, not per‑process data.

Consequently the implementation falls back to standard Linux utilities (top, ps, iotop, iftop) combined with custom scripts. Sample commands include:

# List /proc of PID 1
ls /proc/1
# Extract memory usage
grep "VmRSS:" /proc/3948/status
# Extract I/O counters
grep "bytes" /proc/3948/io
# Capture network stats (ineffective per‑process)
cat /proc/66218/net/dev /proc/67099/net/dev /proc/net/dev | grep eth0
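The memory and I/O reads above translate directly into Python. A minimal sketch of the two parsers (function names are illustrative, not from the original code; the `proc_root` parameter exists only to make the sketch testable):

```python
def read_vmrss_kb(pid, proc_root="/proc"):
    """Parse VmRSS (resident memory, in kB) from /proc/<pid>/status."""
    with open(f"{proc_root}/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])   # second field is the kB value
    return 0  # kernel threads have no VmRSS line

def read_io_bytes(pid, proc_root="/proc"):
    """Parse cumulative read_bytes / write_bytes from /proc/<pid>/io."""
    counters = {}
    with open(f"{proc_root}/{pid}/io") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("read_bytes", "write_bytes"):
                counters[key] = int(value)
    return counters
```

Note that the values in /proc/&lt;pid&gt;/io are cumulative since process start, so per‑interval rates require differencing two consecutive samples.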

For per‑process CPU and memory, the output of top -b -n 1 is filtered and combined with ps -ef to build two dictionaries, top_dic and ps_dic, keyed by PID. Full process command lines are hashed (MD5) and stored separately so long command strings are not duplicated in every sample.
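A minimal sketch of the top parsing and command‑line hashing (assuming top's default batch field order PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, ...; function names are illustrative):

```python
import hashlib

def collect_top(top_text):
    """Parse `top -b -n 1` batch output into a dict keyed by PID,
    keeping %CPU (column 9) and %MEM (column 10) as strings."""
    top_dic = {}
    for line in top_text.splitlines():
        cols = line.split()
        # Header and summary lines never start with a bare PID.
        if cols and cols[0].isdigit():
            top_dic[cols[0]] = {"cpu": cols[8], "mem": cols[9]}
    return top_dic

def cmdline_md5(cmdline):
    """Hash a full command line so process details can be stored once
    and referenced by digest, matching the md5 field in the payload."""
    return hashlib.md5(cmdline.encode()).hexdigest()
```

The MD5 digest then serves as the foreign key between the per‑sample metrics and the process‑detail table.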

# Example JSON payload per process
{
    "19991": {"cpu":"50.0","mem":"12.5","io_r":"145","io_w":"14012","md5":"2932fb739fbfed7175c196b42021877b","remarks":"/opt/soft/mysql57/bin/mysqld --defaults-file=//work/mysql23736/etc/my23736.cnf"},
    "58163": {"cpu":"38.9","mem":"13.1","io_r":"16510","io_w":"1245","md5":"c9e1804bcf8a9a2f7c4d5ef6a2ff1b62","remarks":"/opt/soft/mysql57/bin/mysqld --defaults-file=//work/mysql23758/etc/my23758.cnf"}
}

Network monitoring proved the most challenging part: per‑process traffic cannot be derived from /proc. Instead, iftop is used to capture traffic per IP:port pair, unit‑conversion logic normalizes KB/MB/GB values, and the results are stored in iftop_dic.

# Sample iftop extraction
iftop -t -n -B -P -s 2 -L 200 2>/dev/null | grep -P '(<=|=>)' | sed 'N;s/\n/,/g' | awk 'NF==13{...}'
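The unit‑conversion step mentioned above can be sketched as a small normalizer that turns iftop's human‑readable rates (iftop with -B reports bytes with B/KB/MB/GB suffixes) into plain byte counts (the function name is illustrative):

```python
import re

_UNIT_FACTOR = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def to_bytes(value):
    """Normalize an iftop-style value such as '1.5KB' or '2MB' to bytes."""
    m = re.fullmatch(r"([\d.]+)\s*([KMG]?B)", value.strip())
    if not m:
        raise ValueError(f"unrecognized rate: {value!r}")
    number, unit = m.groups()
    return float(number) * _UNIT_FACTOR[unit]
```

Normalizing to bytes at collection time keeps the MySQL columns numeric, so Grafana can apply its own unit formatting on display.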

All collected metrics are inserted into a MySQL database. The schema includes tables for host configuration, version control, alerts, and detailed process, disk, and network information. A version field enables automatic code upgrades: when the server version changes, clients pull the new code and restart.
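The version‑driven upgrade check described above reduces to comparing the client's local version against the one stored server‑side. A minimal sketch, assuming dotted numeric version strings (the function name is hypothetical):

```python
def needs_upgrade(local_version, server_version):
    """Return True when the server-side version is newer, signalling
    the client to pull fresh code and restart itself."""
    def parse(v):
        # Compare component-wise so "1.10.0" > "1.9.0" works correctly,
        # which naive string comparison would get wrong.
        return tuple(int(part) for part in v.split("."))
    return parse(server_version) > parse(local_version)
```

Comparing parsed tuples rather than raw strings avoids the classic pitfall where "1.10" sorts before "1.9" lexicographically.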

Visualization is handled by Grafana. A pre‑built dashboard JSON (imported via Grafana UI) reads from the MySQL tables and displays machine‑level and process‑level charts for CPU, memory, disk, and network. Recommendations such as using “null as zero” and setting appropriate units are provided.

Deployment steps are outlined: clone the repository, configure conf/config.ini, create the MySQL user and tables, add client entries to tb_monitor_host_config, ensure password‑less SSH between server and clients, and start the server via a cron job. Sample scripts for server startup, MySQL connection handling, and long‑running connection management are included.
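Long‑running connection management typically means surviving "MySQL server has gone away" drops by reopening the connection and retrying. A minimal sketch of that pattern, written against a generic DB‑API connection factory rather than the project's actual code (class and parameter names are hypothetical):

```python
import time

class ReconnectingConnection:
    """Wrap a DB-API connection factory so a long-lived collector
    survives dropped MySQL connections by reconnecting and retrying."""

    def __init__(self, factory, retries=3, delay=1.0):
        self.factory = factory      # callable returning a fresh connection
        self.retries = retries
        self.delay = delay
        self.conn = factory()

    def execute(self, sql, args=None):
        for attempt in range(self.retries):
            try:
                cur = self.conn.cursor()
                cur.execute(sql, args or ())
                return cur
            except Exception:
                if attempt == self.retries - 1:
                    raise               # give up after the final retry
                time.sleep(self.delay)
                self.conn = self.factory()  # reopen a dropped connection
```

With a real driver such as pymysql, `factory` would simply be a lambda that calls `pymysql.connect(...)` with the credentials from conf/config.ini.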

Operational notes cover SSH environment requirements, long‑lived MySQL connections, threshold‑based filtering to reduce metric volume, timeout handling for subprocesses (including the use of set -o pipefail), and alerting via Enterprise WeChat bots.
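The subprocess timeout handling combined with pipefail can be sketched as follows: the collector shells out through bash so that a failure in any stage of a pipeline surfaces in the exit code, and a hard timeout prevents a hung tool from stalling the whole collection cycle (the function name is illustrative):

```python
import subprocess

def run_with_timeout(cmd, timeout=10):
    """Run a shell pipeline under bash with `set -o pipefail` and a
    hard timeout; returns (returncode, stdout)."""
    try:
        result = subprocess.run(
            ["bash", "-c", f"set -o pipefail; {cmd}"],
            capture_output=True, text=True, timeout=timeout)
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        # Treat a hung collector command as a failure and move on.
        return -1, ""
```

Without pipefail, a pipeline like `iftop ... | grep ... | awk ...` would report only the exit status of the final `awk`, silently masking an iftop failure.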

Limitations include dependence on specific Linux tools, compatibility with particular OS and MySQL versions, and the fact that per‑process network statistics are approximated via IP:port analysis rather than true per‑process counters.

Tags: Python, RPC, Linux, MySQL, Grafana, resource usage, process monitoring
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
