Tagged articles

2195 articles

Sep 13, 2020 · Big Data

ClickHouse Deployment, Management, and Monitoring Practices in Production

This article explains ClickHouse's strengths as a high‑performance MPP database, details hardware selection, read/write separation, shard expansion steps, batch‑size tuning, and presents a three‑layer monitoring model, while also describing its practical application in Tencent's game analytics platform.

Big Data · ClickHouse · Data Warehouse

0 likes · 19 min read

DataFunTalk

Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation

0 likes · 11 min read

Java Backend Technology

Sep 12, 2020 · Databases

Why Redis Gets Slow: Common Latency Causes and How to Diagnose Them

This article explains the typical reasons Redis latency spikes—such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical steps to monitor, identify, and mitigate each issue.

Memory · Slowlog · monitoring

0 likes · 18 min read

ITPUB

Sep 11, 2020 · Blockchain

How Red Pulse Secured Its Blockchain Platform: Real‑World Attack Lessons

This article details Red Pulse's journey of integrating the NEO blockchain, the security vulnerabilities it faced—from token theft and credential‑stuffing attacks to sophisticated social‑engineering exploits—and the comprehensive technical measures, monitoring tools, and mitigation strategies it implemented to protect its platform and users.

Attack Mitigation · Blockchain · NEO

0 likes · 21 min read

Aikesheng Open Source Community

Sep 10, 2020 · Databases

Setting Up ClickHouse Monitoring with clickhouse-exporter, Prometheus, and Grafana

This guide walks through deploying clickhouse-exporter, configuring Prometheus to scrape its metrics, and importing a Grafana dashboard to monitor ClickHouse single‑node or cluster performance, providing a practical monitoring solution for the database.

ClickHouse · Exporter · Go

0 likes · 4 min read

Aikesheng Open Source Community

Sep 9, 2020 · Databases

How to Monitor MySQL Compressed Tables and Their Suitable Use Cases

This article explains the scenarios where MySQL compressed tables are appropriate, describes how to monitor their health using InnoDB CMP tables in information_schema, and provides practical examples of creation, performance comparison, and update/delete operations to illustrate best‑practice usage.

Compressed Table · Database · InnoDB

0 likes · 10 min read

Ops Development Stories

Sep 9, 2020 · Operations

How to Deploy MQTT with Mosquitto and Monitor It Using Zabbix Agent2

This guide explains the MQTT protocol, shows how to install and run a Mosquitto broker on CentOS, and demonstrates how to collect MQTT messages with a custom Zabbix Agent2 plugin for real‑time monitoring in Zabbix.

Agent2 · IoT · MQTT

0 likes · 7 min read

HaoDF Tech Team

Sep 7, 2020 · Operations

Analyzing Latency and Slow Interface Detection in a Full‑Chain Monitoring System

This article explains how latency is used as a key indicator for application risk identification, defines slow interfaces, describes why percentile‑based thresholds are preferred over averages, and outlines the architecture, task workflow, and practical optimization strategies for a full‑chain monitoring system in a microservice environment.

Latency · Risk Assessment · SRE

0 likes · 14 min read

New Oriental Technology

Sep 7, 2020 · Operations

Performance Optimization and Stability Enhancement of the Continuation Enrollment System

This article details the background, performance and stability requirements, strategic approach, and concrete initiatives (full‑chain load testing, chaos engineering, monitoring, and targeted optimization projects) undertaken to boost performance by more than 300% and improve the high availability of the continuation enrollment platform.

Stability · backend optimization · chaos testing

0 likes · 7 min read

dbaplus Community

Sep 6, 2020 · Operations

Building a High‑Performance Monitoring Alert System with Akka, Dubbo, and Ignite

The article outlines G Bank’s transition from a single‑threaded commercial monitoring solution to a self‑developed, open‑source based alert system that leverages Akka for parallel collection, Apache Dubbo for distributed processing, and Apache Ignite for in‑memory storage, achieving million‑level alert capacity, sub‑100 ms latency, and linear scalability.

Akka · Apache Dubbo · Apache Ignite

0 likes · 17 min read

MaGe Linux Operations

Sep 4, 2020 · Operations

Master Prometheus: From Basics to Full-Scale Monitoring Deployment

This guide walks through Prometheus fundamentals, architecture, components, service discovery, Docker-based deployment, exporter integration, Alertmanager configuration, Grafana visualization, PromQL queries, and Consul service discovery, providing a complete end‑to‑end monitoring solution for cloud‑native environments.

Alertmanager · Consul · Docker

0 likes · 32 min read

Suning Technology

Sep 4, 2020 · Big Data

How ClickHouse Powers Real-Time OLAP Monitoring at Suning Big Data Platform

This article explains how Suning's big‑data center leverages ClickHouse’s columnar OLAP engine and a full‑chain monitoring platform to achieve real‑time query tracing, slow‑query analysis, cluster health checks, and resource‑level alerts across diverse business scenarios.

ClickHouse · Cluster · OLAP

0 likes · 14 min read

Alibaba Cloud Native

Sep 1, 2020 · Cloud Native

CTrip’s CDubbo Journey: Scaling 10k Services with Registration, Monitoring, and Service Mesh

From early .Net ESB attempts to a Java‑based CDubbo framework, CTrip details its migration to Dubbo, covering registration, health checks, CAT monitoring, dynamic configuration, SOA compatibility, testing tools, thread‑less execution, performance gains, extensibility, ecosystem integration, and future service‑mesh standardization.

Registration · cloud-native · microservices

0 likes · 15 min read

Liangxu Linux

Aug 29, 2020 · Operations

Enforcing Clear Git Commit Messages with a Webhook‑Based Monitoring Service

This article explains why consistent Git commit messages matter, presents a detailed commit‑message format with type, scope and subject, shows how to enforce the standard using a webhook that validates messages, monitors large commits, and provides useful statistics for the development team.

code-quality · commit message · monitoring

0 likes · 11 min read

Amap Tech

Aug 28, 2020 · Fundamentals

Git Commit Message Standardization and Monitoring Service

The team introduced an Angular‑style Git commit‑message standard—type(scope): subject in Chinese—and built a webhook‑based monitoring service that validates pushes, alerts violations, tracks diff size and deletions, stores metrics, and visualizes compliance, improving traceability, readability, and automated changelog generation.

DevOps · best-practices · commit message

0 likes · 10 min read

Java Architecture Diary

Aug 27, 2020 · Operations

Visualizing Redis in Grafana: Quick Start with the Redis Data Source Plugin

Grafana’s new Redis Data Source plugin lets DevOps engineers and DBAs seamlessly connect to Redis instances—whether open‑source, Enterprise, or Cloud—visualize time‑series and core data types, run management commands, and build interactive dashboards using Grafana’s transformations and built‑in panels.

Data Source · DevOps · Grafana

0 likes · 7 min read

Java Architect Essentials

Aug 26, 2020 · Backend Development

A Comprehensive Guide to Evolving a Monolithic Online Store into a Robust Microservice Architecture

This article walks through the transformation of a simple online supermarket from a monolithic design to a fully fledged microservice system, explaining the motivations, architectural changes, component selection, common pitfalls, and best‑practice solutions such as service decomposition, database sharding, monitoring, tracing, service mesh, resilience patterns, and testing strategies.

Resilience · Tracing · architecture

0 likes · 22 min read

Architecture Digest

Aug 25, 2020 · Operations

Best Practices and Advanced Topics for Prometheus Monitoring in Kubernetes

This article provides a comprehensive guide on using Prometheus for Kubernetes monitoring, covering fundamental principles, exporter selection, Grafana dashboard creation, memory and storage optimization, high‑availability designs, query performance, cardinality management, and integration with alerting and logging systems.

Exporters · Grafana · Kubernetes

0 likes · 33 min read

dbaplus Community

Aug 24, 2020 · Operations

How Zhongtong Scaled Elasticsearch Monitoring with ESPaaS: Architecture, Alerts, and Diagnosis

Zhongtong built the ESPaaS platform to automate deployment, unify monitoring, and provide real‑time alerts and diagnostic capabilities for over 40 Elasticsearch clusters, handling petabytes of data with Prometheus, Grafana, and DingTalk integrations while sharing practical lessons learned.

Prometheus · alerting · diagnosis

0 likes · 9 min read

Aikesheng Open Source Community

Aug 24, 2020 · Operations

Prometheus Data Query Basics and Practical Usage Guide

This article introduces Prometheus' query language PromQL, explains instant and range vector selectors, label matching, offset handling, storage design, common functions and aggregation operators, and provides practical advice for efficient querying and avoiding performance issues.

Operations · PromQL · Prometheus

0 likes · 13 min read

58 Tech

Aug 19, 2020 · Backend Development

Design and Implementation of a Testing Quality System for the 58.com SSP Advertising Platform

The article details the architecture of 58.com’s SSP advertising platform, identifies three key reliability challenges—data consistency, interface regression, and storage synchronization—and presents a three‑layer testing quality system comprising web‑layer validation, service‑layer automated testing, and data‑layer monitoring with concrete tools and future improvement plans.

MySQL · Redis · SSP

0 likes · 14 min read

Open Source Linux

Aug 17, 2020 · Operations

Step-by-Step Guide to Install and Configure Zabbix on CentOS 7

This tutorial walks you through installing Zabbix on CentOS 7: disabling SELinux and the firewall as prerequisites, adding repositories, installing the server, web, and database components, editing configuration files, securing MariaDB, starting services, and completing the web‑based setup with language customization.

CentOS · Installation · Linux

0 likes · 7 min read

Full-Stack DevOps & Kubernetes

Aug 16, 2020 · Cloud Native

How to Configure Alertmanager, Add WeChat Alerts, and Enable Automatic Service Discovery in Kubernetes

This guide walks through modifying Alertmanager to use a NodePort service, decoding and editing its secret to add custom receivers and a WeChat template, recreating the secret, and extending Prometheus Operator with additional scrape configs for automatic service discovery, including RBAC adjustments and verification steps.

Kubernetes · RBAC · ServiceDiscovery

0 likes · 10 min read

Liangxu Linux

Aug 15, 2020 · Fundamentals

Why Does `free` Show More Used Memory Than `ps aux`? A Deep Dive into Linux Memory Accounting

This article explains why Linux's `free` command often reports higher used memory than the RSS values shown by `ps aux`, covering buffer/cache reclaimable memory, slab and page‑table consumption, and provides Bash scripts to accurately calculate total memory usage.

Bash · Free · Memory

0 likes · 10 min read

Tencent Cloud Developer

Aug 12, 2020 · Databases

How Autonomous Databases Evolve: From Stone Age to AI‑Driven Self‑Healing

This article traces the evolution of database autonomy from manual, knowledge‑driven operations through tool‑assisted and expert‑level stages to cloud‑native intelligent services, and details Tencent's DBbrain platform, its architecture, performance‑optimization, security, monitoring, cost‑based analysis, and future self‑healing capabilities.

AI Ops · Cloud Databases · DBbrain

0 likes · 29 min read

Java Architect Essentials

Aug 11, 2020 · Operations

Four Essential Linux Monitoring Tools for Operations Engineers

This article introduces four widely used Linux monitoring tools—iotop, htop, IPTraf, and Monit—explaining their features, usage scenarios, and how they help operations engineers diagnose performance issues without a GUI, including real‑time I/O tracking, visual CPU/memory graphs, network traffic analysis, and flexible alerting.

IPTraf · Linux · Monit

0 likes · 7 min read

IT Architects Alliance

Aug 10, 2020 · Operations

Step‑by‑Step Guide to Building a Filebeat‑Kafka‑ELK Logging Pipeline

This tutorial walks through installing and configuring Filebeat, Kafka, Logstash, Elasticsearch, and Kibana, detailing version requirements, file permissions, YAML settings, startup commands, topic verification, and how to ingest and visualize log data in Kibana.

ELK · Elasticsearch · Filebeat

0 likes · 13 min read

Programmer DD

Aug 9, 2020 · Backend Development

Why Did My Java Service’s Response Time Spike? A Deep Dive into QPS, GC, and CPU Load

An internal Java‑based HTTP service suddenly suffered high latency and timeouts, prompting a systematic investigation that uncovered excessive QPS, frequent ParNew GCs, CPU load spikes, and large response payloads, leading to concrete performance and design improvements.

java · monitoring

0 likes · 9 min read

MaGe Linux Operations

Aug 8, 2020 · Operations

Step-by-Step Guide to Installing and Configuring Zabbix on CentOS 7

This tutorial walks you through disabling SELinux and the firewall, adding Zabbix and EPEL repositories, installing Zabbix server, web, and database components, configuring files, securing MariaDB, starting services, and completing the web‑based setup to get a fully functional monitoring system.

CentOS · Installation · Open-source

0 likes · 7 min read

Big Data Technology & Architecture

Aug 8, 2020 · Big Data

Setting Up InfluxDB and Grafana for Flink Metrics Monitoring

This guide walks through installing InfluxDB and Grafana on CentOS, configuring InfluxDB for Flink metrics storage, creating databases and retention policies, integrating the Flink InfluxDB reporter, and building Grafana dashboards to visualize real‑time Flink job metrics.

Big Data · Flink · Grafana

0 likes · 8 min read

MaGe Linux Operations

Aug 7, 2020 · Operations

How to Diagnose Linux Server Issues in the First 60 Seconds with 10 Essential Commands

This article explains how Netflix's performance team uses ten standard Linux command‑line tools to quickly assess system health within the first minute, focusing on error detection, resource saturation, and utilization across CPU, memory, disk, and network to pinpoint performance problems.

System Administration · command line · monitoring

0 likes · 18 min read

dbaplus Community

Aug 3, 2020 · Operations

How iQIYI Built a Full‑Link Automated Monitoring Platform for Microservices

iQIYI’s tech product team designed a unified full‑link automated monitoring platform that integrates link, metric, and log collection with deep analysis, enhancing fault localization, performance insight, and scalability across microservices, while addressing limitations of existing tools like ELK, Prometheus, and Dapper.

Metrics · Observability · full‑link

0 likes · 15 min read

转转QA

Jul 31, 2020 · Operations

Design and Implementation of a Real-Time Log Collection and Query System for Distributed Deployment

The article describes the challenges of troubleshooting distributed deployments across many machines and presents a solution built on the ELK stack that centralizes logs from Java and Go services, enabling near‑real‑time search, visualization, and faster issue resolution.

Distributed Systems · Operations · log collection

0 likes · 5 min read

Xianyu Technology

Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that quickly (under five seconds) pinpoints server‑side issues with developer‑level accuracy by aggregating real‑time metrics, applying a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presenting results through an integrated alert and visualization system, while planning broader endpoint coverage and multi‑tenant support.

Big Data · Fault Localization · Operations

0 likes · 12 min read

Top Architect

Jul 27, 2020 · Operations

10 Practical Tips to Boost Web Application Performance Up to 10× with NGINX

This article presents ten actionable recommendations—including reverse‑proxy deployment, load balancing, caching, compression, SSL/TLS tuning, HTTP/2 adoption, software upgrades, Linux and web‑server tuning, and real‑time monitoring—to dramatically improve web application performance, often achieving tenfold speed gains.

Caching · Compression · Web Performance

0 likes · 22 min read

DevOps Cloud Academy

Jul 27, 2020 · Operations

Monitoring GitLab Runner and GitLab CI Pipelines with Prometheus

This guide details how to enable Prometheus metrics on GitLab Runner, configure Prometheus to scrape those metrics, and set up the gitlab-ci-pipelines-exporter with Grafana dashboards to monitor both runner performance and CI/CD pipeline health.

DevOps · GitLab Runner · Grafana

0 likes · 7 min read

WecTeam

Jul 23, 2020 · Backend Development

How We Reduced WebMonitor Latency from Minutes to Seconds – Architecture & Performance Secrets

This article chronicles the evolution of the WebMonitor front‑end monitoring system, detailing its three‑tier stack, data pipeline upgrades from raw disk sampling to HDFS and Elasticsearch, extensive collector‑side optimizations, Jetty thread and timeout tuning, and the resulting performance gains that lowered response times from minutes to sub‑second levels.

Jetty · data pipeline · java

0 likes · 15 min read

dbaplus Community

Jul 20, 2020 · Operations

How to Build Reliable Monitoring for Low‑Frequency Financial Services

After two years transitioning from e‑commerce to finance, the team shares practical monitoring strategies for low‑frequency financial services, contrasting e‑commerce traffic‑based methods with finance‑specific challenges, and detailing point‑based metrics, hourly success‑rate alerts, aspect‑oriented exception handling, white‑list filtering, and Sentinel‑based circuit breaking.

Aspect Oriented Programming · Circuit Breaking · Financial Services

0 likes · 16 min read

Liangxu Linux

Jul 19, 2020 · Operations

How to Diagnose Linux Performance Issues with Flame Graphs and System Tools

This guide explains how to systematically analyze Linux performance problems—including CPU, memory, disk I/O, network, and load—using 5W2H methodology, built‑in monitoring commands, perf, flame‑graph visualizations, and a real‑world Nginx case study to pinpoint and resolve bottlenecks.

Performance · Troubleshooting · flamegraph

0 likes · 19 min read

360 Tech Engineering

Jul 17, 2020 · Big Data

Qbus Service Overview: Architecture, Use Cases, and Implementation Details

This article introduces Qbus, a cloud‑based queue service built on Kafka, covering its architecture, core components such as log collection, SDKs, HDFS persistence, monitoring with Prometheus, business integration methods, use‑case scenarios, and future development directions.

Cloud Queue · HDFS · Kafka

0 likes · 6 min read

Qunhe Technology Quality Tech

Jul 17, 2020 · Operations

How We Built a Robust Monitoring System for Construction Drawing Production

This article describes how our team designed and implemented a comprehensive online monitoring system for construction drawing generation, covering business background, technical architecture analysis, metric definition, monitoring methods, and the resulting dashboards that improve quality, stability, and rapid issue resolution.

Metrics · Operations · construction drawing

0 likes · 10 min read

Full-Stack DevOps & Kubernetes

Jul 16, 2020 · Cloud Native

How to Install HAProxy and Exporter on Kubernetes and Monitor It with Prometheus

This guide walks through installing HAProxy on a Kubernetes master node, compiling and configuring it, adding the HAProxy exporter, creating a ServiceMonitor YAML for the Prometheus Operator, and verifying that metrics are correctly scraped and displayed in the Prometheus UI.

Exporter · HAProxy · Kubernetes

0 likes · 10 min read

Full-Stack Internet Architecture

Jul 12, 2020 · Operations

Monitoring Practices for Low‑Frequency Financial Services: Lessons from E‑commerce and Reliable Alerting Techniques

This article shares practical monitoring strategies for financial services with low‑frequency operations, contrasting e‑commerce monitoring methods, outlining the challenges of financial monitoring, and presenting reliable solutions such as success‑rate alerts, aspect‑oriented exception handling with whitelists, and circuit‑breaker degradation using Sentinel.

Aspect Oriented Programming · Circuit Breaker · Financial Services

0 likes · 14 min read

Big Data Technology & Architecture

Jul 11, 2020 · Operations

Prometheus Overview: Features, Architecture, Data Model, and Installation Guide with Grafana Integration

This article introduces Prometheus, covering its architecture, key features, data model, metric types, installation steps, integration with node_exporter and Grafana, and outlines suitable and unsuitable use cases for this open‑source monitoring system.

Grafana · Installation · Prometheus

0 likes · 8 min read

Full-Stack DevOps & Kubernetes

Jul 11, 2020 · Cloud Native

Managing Prometheus Alerts and Alertmanager with the Prometheus Operator

This guide walks through creating PrometheusRule resources, deploying Alertmanager via the Prometheus Operator, configuring custom alerting rules, exposing Alertmanager with a Service, and applying custom Prometheus configuration files using Kubernetes secrets and kubectl commands.

Kubernetes · Prometheus Operator · YAML

0 likes · 11 min read

Full-Stack DevOps & Kubernetes

Jul 9, 2020 · Cloud Native

Deploy and Manage Prometheus Operator on Kubernetes: A Step‑by‑Step Guide

This article explains what the Prometheus Operator is, how it extends Kubernetes with custom resources, lists the CRDs it provides, and walks through a complete deployment—including cloning the source, creating a monitoring namespace, applying RBAC, installing the operator, creating a Prometheus instance, configuring ServiceMonitor, and troubleshooting common permission errors—using concrete YAML manifests and kubectl commands.

Kubernetes · Prometheus Operator · RBAC

0 likes · 18 min read

HaoDF Tech Team

Jul 8, 2020 · Operations

How We Rebuilt Our Monitoring System into a Scalable Alert Service

After two months of intensive development, the team launched a new monitoring and alerting platform that transforms a legacy system into a service‑oriented solution, addressing pain points such as inflexible escalation, noisy alerts, and poor ownership while introducing phone alerts, automated escalation, Prometheus integration, and a unified rule engine.

DevOps · Prometheus · System Design

0 likes · 16 min read

Full-Stack DevOps & Kubernetes

Jul 8, 2020 · Cloud Native

How to Deploy a Redis Exporter on Kubernetes for Prometheus Monitoring

This guide shows how to configure a Redis exporter alongside a Redis pod in Kubernetes, add Prometheus scrape annotations, apply the deployment and service manifests, and visualize metrics in Grafana, providing step‑by‑step commands, YAML examples, and screenshots of the monitoring dashboard.

Exporter · Kubernetes · Redis

0 likes · 5 min read

ITPUB

Jul 7, 2020 · Operations

Top 2020 DevOps Tools: A Complete Guide to Building Your CI/CD Stack

This article categorizes the most popular 2020 DevOps tools across development, testing, deployment, runtime, and collaboration, explains why each tool leads its class, lists key advantages and competitors, and offers a practical checklist for assembling a full CI/CD pipeline.

Collaboration · DevOps · automation

0 likes · 24 min read

ITPUB

Jul 5, 2020 · Operations

2020’s Best DevOps Tools by Category – From CI/CD to Collaboration

This article categorises the most popular 2020 DevOps tools—development/build, automated testing, deployment, runtime, and collaboration—explains why each tool topped its class, lists key advantages, and compares notable competitors to help teams build a complete CI/CD pipeline.

Collaboration · automation · monitoring

0 likes · 27 min read

Architecture Digest

Jul 3, 2020 · Cloud Native

Understanding Loki: Architecture, Benefits, and Comparison with ELK

This article explains the motivations behind Loki, its architecture and components, how it reduces the cost and complexity of log and metric querying compared to ELK, and details its write‑read pipeline, scalability, and integration with Kubernetes and Prometheus.

Logging · Loki · Observability

0 likes · 7 min read

dbaplus Community

Jul 2, 2020 · Information Security

How 58 Daojia Secures Data in the DT Era: Threats, Practices, and Lessons

This article summarizes Liu Huan's presentation on data security in the DT era, covering the current security landscape, internal and external threats to enterprise data, and 58 Daojia's practical approaches to data discovery, classification, authentication, monitoring, and incident response.

DT era · Data Security · enterprise security

0 likes · 14 min read

Java High-Performance Architecture

Jul 2, 2020 · Operations

4 Essential Linux Monitoring Tools Every Sysadmin Should Master

Discover four high‑usage Linux monitoring utilities—iotop, htop, IPTraf, and Monit—that help you quickly diagnose I/O, CPU, memory, network, and process issues, with visual insights and flexible alerting to keep single or multiple servers running smoothly.

Linux · System Administration · htop

0 likes · 4 min read

Full-Stack DevOps & Kubernetes

Jul 1, 2020 · Cloud Native

How to Install and Configure mysql_exporter on a Kubernetes Master Node

This guide walks through downloading the mysql_exporter package, extracting it on a Kubernetes master, installing the binary, creating a dedicated MySQL user with proper permissions, configuring a password‑less client file, launching the exporter, and updating Prometheus via kubectl so MySQL metrics are exposed on port 9104.

DevOps · Kubernetes · cloud-native

0 likes · 4 min read

Taobao Frontend Technology

Jul 1, 2020 · Frontend Development

How Taobao’s Front‑End Team Delivered a Lightning‑Fast 618 Shopping Experience

This article explains how Taobao’s front‑end engineers tackled the massive traffic of the 618 promotion by optimizing resource requests, data fetching, module loading, monitoring, and fallback strategies, ultimately achieving smooth, high‑performance pages for billions of shoppers.

Caching · e‑commerce · monitoring

0 likes · 10 min read

Top Architect

Jul 1, 2020 · Backend Development

Understanding Microservices Architecture: Concepts, Benefits, and Key Components

Microservices, introduced in 2012 and popularized by Martin Fowler, decompose applications into small, independent services that communicate via lightweight protocols, enabling modular development, flexible technology choices, independent deployment, and improved scalability, while also introducing challenges such as distributed data consistency, testing complexity, and operational overhead.

Backend Architecture · Configuration Management · api-gateway

0 likes · 16 min read

dbaplus Community

Jun 28, 2020 · Databases

How to Build a Visual MongoDB Slow Query Dashboard with PHP

This guide explains how to set up a PHP‑based web platform that collects MongoDB slow‑query logs via remote profiling, stores them in MySQL, and visualizes the data, including installation of required PHP extensions, database preparation, configuration, cron scheduling, and enabling profiling on MongoDB.

MongoDB · PHP · monitoring

0 likes · 7 min read

MaGe Linux Operations

Jun 25, 2020 · Databases

How to Monitor Redis with Zabbix: Auto‑Discovery Scripts and Templates

This guide walks you through creating Zabbix auto‑discovery scripts to extract all Redis INFO parameters, configuring custom keys, setting permissions, and building a complete Redis monitoring template with step‑by‑step screenshots.

Auto-discovery · Database Monitoring · Redis

0 likes · 7 min read

Qunar Tech Salon

Jun 23, 2020 · Operations

A Simple Gray Release Solution for High‑Concurrency Flight Ticket Systems

This article presents a lightweight gray release approach for complex flight ticket services, comparing traditional hardware and soft‑routing isolation methods, describing the authors' traffic‑based gray identification, business‑focused monitoring, implementation details, and automated safeguards to enable safe incremental deployments.

Backend · Gray Release · Operations

0 likes · 8 min read

Aikesheng Open Source Community

Jun 22, 2020 · Operations

Introduction to the Prometheus Data Collection Process

This article explains the complete Prometheus data collection workflow, covering key concepts such as targets, samples, and meta labels, detailing the relabeling steps, configuration options, example use‑cases, and the final scrape and storage phases for effective monitoring.

Data Collection · Prometheus · configuration

0 likes · 8 min read

Ops Development Stories

Jun 18, 2020 · Operations

Forward Zabbix Alerts to WeChat via Kafka – Complete Step‑by‑Step Guide

This guide shows how to route Zabbix alarm messages through a Kafka cluster and then deliver them to Enterprise WeChat using Python scripts, covering host configuration, Kafka/Zookeeper startup, topic creation, alert‑sending scripts, and Zabbix action setup.

Enterprise WeChat · Kafka · Python

0 likes · 6 min read

JD Retail Technology

Jun 17, 2020 · Operations

How JD’s Data Platforms Scaled for the 618 Mega‑Sale: Operations, Stress‑Testing, and Dual‑Stream Architecture

The article details JD’s data product teams’ systematic preparation for the 618 shopping festival, covering pressure estimation, capacity expansion, stress testing, emergency downgrade strategies, dual‑data‑center isolation, high‑fidelity end‑to‑end testing, and continuous monitoring to ensure stable, real‑time data services during massive traffic spikes.

Big Data · Data Platform · JD.com

0 likes · 10 min read

Xianyu Technology

Jun 17, 2020 · Backend Development

Lottery System Risk Management and SDK Integration

Xianyu mitigated lottery‑related financial loss by centralizing rights management, decoupling UI from business logic, and providing a unified SDK with simple draw APIs, while adding real‑time log backflow and comprehensive accounting and monitoring, cutting configuration time by over 50% and eliminating UI‑only risk.

Backend · Lottery System · SDK

0 likes · 10 min read

Laravel Tech Community

Jun 16, 2020 · Mobile Development

Kuaishou’s APM Platform and Mobile Performance Optimization: Insights from Yang Kai

In a mobile‑first world where limited device resources and unstable networks threaten user retention, Kuaishou’s performance team built an APM monitoring platform and applied systematic memory, startup, and jank optimizations that cut startup time by 40%, reduced package size by 23 MB, and significantly improved key product metrics.

APM · Kuaishou · Performance Optimization

0 likes · 9 min read

dbaplus Community

Jun 15, 2020 · Cloud Native

Deploying Prometheus on Kubernetes with Operator, Grafana, and Alertmanager

This guide walks through setting up a complete Prometheus monitoring stack on a Kubernetes cluster, covering both traditional YAML deployments and the Prometheus Operator, configuring services, integrating Grafana dashboards, and enabling Alertmanager notifications including WeChat alerts.

Prometheus · monitoring

0 likes · 34 min read

Liangxu Linux

Jun 13, 2020 · Operations

Mastering Monitoring: From Basics to Advanced Zabbix Practices

This comprehensive guide explains why monitoring is essential for operations, outlines monitoring goals and methods, reviews a wide range of open‑source tools, details a Zabbix‑based workflow, enumerates key metrics across hardware, system, application, network, security and business layers, and offers practical alerting and interview tips.

Operations · alerting · log analysis

0 likes · 21 min read

JD Retail Technology

Jun 10, 2020 · Operations

Logistics R&D Preparation for the 618 Promotion: System Readiness, Stress Testing, and Real‑Time Monitoring

The logistics R&D team spent 62 days preparing for the 618 promotion by analyzing core processes, applying stress tests, implementing fault‑tolerant architectures, planning capacity, and deploying real‑time monitoring tools to ensure system stability and performance under peak traffic.

Operations · System Design · capacity planning

0 likes · 7 min read

Full-Stack DevOps & Kubernetes

Jun 9, 2020 · Operations

Configure Alertmanager to Send Alerts to Email, DingTalk, and WeChat

This guide walks you through modifying Alertmanager’s configuration to deliver alerts via QQ email, DingTalk chat‑bot webhooks, and Enterprise WeChat, including SMTP settings, webhook plugin installation, and the required wechat_configs parameters for seamless integration.

DevOps · DingTalk · Kubernetes

0 likes · 7 min read

Manbang Technology Team

Jun 8, 2020 · Cloud Native

Design and Implementation of a Zookeeper Operator for Kubernetes

This article outlines the design, functional requirements, CRD definition, architecture, deployment, scaling, monitoring, fault‑tolerance, and upgrade strategies of a Zookeeper operator on Kubernetes, including code examples, service configurations, and integration with Prometheus and OAM standards.

CRD · Kubernetes · Operator

0 likes · 18 min read

Efficient Ops

Jun 3, 2020 · Operations

Understanding Kubernetes vs VM Monitoring: CPU, Memory, Disk & Network

This article compares monitoring metrics for CPU, memory, disk, and network between traditional KVM-based servers and Kubernetes pods, explaining why their indicators differ, how resource isolation works, and what key metrics users should watch to diagnose performance bottlenecks.

CPU · Kubernetes · Memory

0 likes · 11 min read

Open Source Linux

Jun 1, 2020 · Operations

Why Inodes Fill Up Before Disk Space? Diagnose and Fix Linux Filesystem Limits

This article explains what inodes are, why they can become exhausted even when disk space remains, and provides step‑by‑step Linux commands and cleanup techniques to monitor and resolve inode exhaustion issues.

Filesystem · cleanup · cron

0 likes · 6 min read

iQIYI Technical Product Team

May 29, 2020 · Big Data

iQiyi's Full-Link Automated Monitoring Platform: Design and Implementation

iQiyi’s full‑link automated monitoring platform unifies tracing, metric, and log collection with deep offline and real‑time analysis. It delivers a DAG‑based call graph, near‑real‑time ingestion of tens of millions of logs, multi‑dimensional alerts, and rapid root‑cause diagnosis, cutting error‑lookup time by over 50%, and now serves as a core component of the company’s microservice reference architecture.

Big Data · Logging · Metrics

0 likes · 12 min read

FunTester

May 26, 2020 · Fundamentals

Understanding Load Testing: Key Strategies and Best Practices

This article clarifies common misconceptions about load testing, defines it within performance testing, and provides practical strategies for test volume, load generators, scripting, think time, ramp-up/down, monitoring, diagnosis, and data analysis to ensure reliable performance assessments.

Test Strategy · monitoring · software testing

0 likes · 11 min read

dbaplus Community

May 25, 2020 · Operations

Scaling CAT Monitoring at Ctrip: Thread Model, Client Computation & Memory Tweaks

This article details how Ctrip optimized the CAT monitoring system—covering its large‑scale deployment, thread‑model redesign, offloading calculations to clients, double‑buffered reporting, and string handling improvements—to dramatically cut CPU usage, GC pressure, and memory consumption while handling billions of messages daily.

Distributed Systems · Performance Optimization · Thread Model

0 likes · 25 min read

Aikesheng Open Source Community

May 25, 2020 · Operations

Understanding Prometheus Data Collection: Formats, Types, and Best Practices

This article explains Prometheus data collection by describing metric syntax, label usage, time‑series concepts, the four logical metric types (Counter, Gauge, Histogram, Summary), and provides practical naming, labeling, and selection guidelines for effective monitoring.

Best Practices · Counter · Gauge

0 likes · 7 min read

Programmer DD

May 22, 2020 · Operations

Grafana 7.0 Released: New UX, Plugin Platform, Transformations & CloudWatch Support

Grafana 7.0 introduces a revamped user experience, a unified data model, a new plugin platform, Jaeger tracing support, powerful data transformations, AWS CloudWatch Logs integration, and enterprise usage analytics, offering enhanced visualization and monitoring capabilities across major data sources.

Data visualization · Grafana · Observability

0 likes · 3 min read

Top Architect

May 21, 2020 · Backend Development

Comprehensive Guide to Java Application Performance Optimization and Diagnosis

This article provides an in‑depth overview of Java application performance optimization, covering a four‑layer model (application, database, framework, JVM), on‑site and post‑mortem analysis methods, OS and JVM diagnostic tools, common code and GC issues, database deadlock handling, and practical tuning recommendations.

Database Tuning · JVM · Performance Optimization

0 likes · 23 min read

Efficient Ops

May 20, 2020 · Operations

How to Build a Sustainable CMDB: Three Essential Phases for Reliable Operations

This article explains how to design, implement, and maintain a robust Configuration Management Database (CMDB) by focusing on simple modeling, establishing data closure loops, and efficiently handling existing inventory, while leveraging Kafka, Flink, Elasticsearch, and Neo4j for fast querying and topology visualization.

CMDB · Configuration Management · automation

0 likes · 19 min read

Efficient Ops

May 19, 2020 · Cloud Native

Mastering Prometheus on Kubernetes: Practical Tips, Exporter Guide, and Capacity Planning

This article explores the history and principles of Prometheus monitoring, offers guidance on version selection, highlights its limitations, details common Kubernetes exporters, shows Grafana dashboard setups, and provides in‑depth strategies for exporter aggregation, golden metrics, multi‑cluster scraping, GPU monitoring, timezone handling, memory optimization, capacity planning, and rate calculations.

Grafana · Kubernetes · Prometheus

0 likes · 19 min read

Ops Development Stories

May 14, 2020 · Operations

How to Set Up Zabbix VMware Monitoring: Step-by-Step Configuration Guide

Learn how to enable Zabbix’s VMware monitoring by configuring collectors, editing the server config, linking vCenter templates, adding CPU, memory, and disk usage items, and creating triggers, with detailed code snippets and screenshots to ensure comprehensive virtual machine performance tracking.

Performance · VMware · configuration

0 likes · 6 min read

HomeTech

May 14, 2020 · Cloud Native

Design and Implementation of the Next‑Generation Cloud‑Native Monitoring System at Autohome

The article describes Autohome's third‑generation cloud‑native monitoring platform, detailing its background, strategic goals for R&D efficiency, mobile‑first design, Prometheus‑based architecture with multi‑replica storage and InfluxDB remote storage, its operational impact, and future directions such as AI‑driven noise reduction.

Containers · cloud-native · monitoring

0 likes · 7 min read

Programmer DD

May 12, 2020 · Operations

Boost RabbitMQ Reliability: Proven Strategies for Producers, Consumers, and Ops

This comprehensive guide explains how to enhance RabbitMQ reliability by covering confirmation mechanisms, producer and consumer configurations, queue mirroring, alerting, monitoring metrics, and health‑check commands, providing actionable steps for developers and operations teams to ensure stable message delivery.

Message queue · Operations · RabbitMQ

0 likes · 22 min read

MaGe Linux Operations

May 10, 2020 · Databases

How to Build a Complete MySQL Monitoring Dashboard with Prometheus and Grafana

This guide walks through deploying mysqld_exporter, configuring Prometheus and Grafana, and monitoring essential MySQL metrics such as replication health, query throughput, slow‑query counts, connection usage, and InnoDB buffer‑pool statistics, while also showing how to set up alert rules for proactive database operations.

Exporters · Grafana · MySQL

0 likes · 15 min read

ITPUB

May 3, 2020 · Operations

Mastering IT Monitoring: Goals, Methods, Tools, and Best Practices

This comprehensive guide explains why monitoring is essential for reliable operations, outlines clear monitoring objectives, walks through practical monitoring methods, compares popular open‑source tools, details a Zabbix‑based workflow, and lists key hardware, system, application, network, security, API, performance, and business metrics to track.

IT infrastructure · Operations · monitoring

0 likes · 19 min read

Laravel Tech Community

May 2, 2020 · Operations

Comprehensive MySQL and Linux Operations Interview Guide

This guide compiles essential MySQL security steps, master‑slave replication principles, backup scripts, Linux boot overview, common port services, virus mitigation, monitoring tools, nginx optimization, InnoDB lock troubleshooting, replication lag reduction, high‑availability components, data migration utilities, and automation configuration management techniques for operations engineers.

Database · Linux · MySQL

0 likes · 13 min read

Top Architect

May 1, 2020 · Operations

Comprehensive Guide to Java Runtime Error Diagnosis: CPU, Memory, Disk, GC, and Network Troubleshooting

This article presents a systematic approach to diagnosing and resolving Java runtime problems by examining CPU usage, disk I/O, memory consumption, garbage‑collection behavior, and network anomalies, offering practical commands, analysis techniques, and visual aids to pinpoint root causes in production environments.

Operations · Performance · Troubleshooting

0 likes · 22 min read

Liangxu Linux

Apr 29, 2020 · Operations

How to Build a Complete Monitoring System: Goals, Methods, Tools & Best Practices

This guide explains why monitoring is essential for the entire operations lifecycle, outlines key monitoring objectives, describes practical methods and workflows, reviews a range of open‑source tools (including Zabbix, MRTG, Ganglia, Nagios, Smokeping, OpenTSDB), and details metric categories such as hardware, system, application, network, log, security, API, performance and business monitoring.

Metrics · alerting · monitoring

0 likes · 22 min read

vivo Internet Technology

Apr 29, 2020 · Cloud Native

Prometheus Architecture and Design Principles: A Deep Dive into Cloud-Native Monitoring

Prometheus, a CNCF‑graduated, cloud‑native monitoring system, combines pull‑based target discovery, a label‑rich time‑series data model, and four core metric types (gauge, counter, histogram, and summary) to provide near‑real‑time visibility, short‑term retention, alerting via Alertmanager, and integration with Grafana and remote storage for scalable observability.

Alertmanager · CNCF · DevOps

0 likes · 11 min read

Qunhe Technology Quality Tech

Apr 29, 2020 · Operations

How Our Team Built a Stable SIT Environment: Lessons in Test Environment Governance

This article documents the step‑by‑step practices of a six‑person test‑environment availability team that unified middleware, streamlined deployment pipelines, piloted business usage, introduced monitoring and recovery mechanisms, and created a comprehensive SIT environment handbook to improve integration testing stability and operational efficiency.

Operations · SIT · deployment

0 likes · 19 min read

UCloud Tech

Apr 28, 2020 · Cloud Native

How We Built a Highly Available Kubernetes Platform for Multi‑Cluster Deployments

This article explains why Kubernetes was chosen, describes the overall architecture, high‑availability master design, multi‑IDC cluster deployment, logging, monitoring, service exposure, image building, lifecycle hooks, CI/CD, multi‑cluster management, encountered challenges, and future plans for operators and automated scaling.

Kubernetes · Multi-Cluster · ci/cd

0 likes · 11 min read

Aikesheng Open Source Community

Apr 27, 2020 · Operations

Detailed Introduction to Prometheus: Architecture, Quick Deployment, Advantages and Drawbacks

This article provides a comprehensive overview of Prometheus, covering its origins, architecture, step‑by‑step deployment, configuration, web UI usage, as well as its key advantages and limitations for cloud‑native monitoring and operations.

Alertmanager · Operations · Prometheus

0 likes · 6 min read

DevOps Cloud Academy

Apr 23, 2020 · Operations

Step-by-Step Guide to Installing and Configuring Prometheus, Node Exporter, Alertmanager, and Grafana

This tutorial provides a beginner-friendly, step-by-step walkthrough for downloading, installing, configuring, and verifying Prometheus, Node Exporter, Alertmanager, and Grafana on a Linux server, including service setup, configuration files, and a simple alert test.

Alertmanager · Grafana · Installation

0 likes · 7 min read

dbaplus Community

Apr 22, 2020 · Operations

How 58 Daojia Built a Cloud‑Native Ops Platform to Streamline Migration and Cut Costs

This article recounts 58 Daojia’s four‑year journey from migrating its IDC infrastructure to public cloud, the challenges encountered, and how the team designed and evolved a multi‑generation operations platform that centralizes asset, cost, domain, and monitoring management, ultimately improving efficiency and reducing expenses.

Cost Management · asset management · cloud migration

0 likes · 14 min read

MaGe Linux Operations

Apr 22, 2020 · Operations

Why Kubernetes CPU Metrics Differ from Traditional VMs: A Deep Dive

This article compares CPU, memory, disk, and network monitoring metrics between traditional KVM servers and Kubernetes pods, explaining the underlying reasons for differences and offering guidance on interpreting the metrics for effective performance troubleshooting.

CPU · Kubernetes · monitoring

0 likes · 11 min read

Architects' Tech Alliance

Apr 19, 2020 · Operations

IO Performance Evaluation: Models, Tools, Metrics, and Optimization Strategies

This article explains common IO latency problems, introduces how to define and refine IO models, lists disk and network evaluation tools, describes key monitoring metrics, and provides practical tuning methods and case studies for improving storage and network performance.

Optimization · Performance tuning · monitoring

0 likes · 14 min read

21CTO

Apr 16, 2020 · Backend Development

How JD’s API Gateway Handles Tens of Millions of Concurrent Requests

This article explains how JD Retail built a high‑performance, secure, and observable API gateway that supports massive traffic, implements asynchronous processing for high concurrency, provides fine‑grained traffic control, gray‑release capabilities, and automated operations to serve native, web, and mini‑program clients.

Gray Release · Security · api-gateway

0 likes · 10 min read

FunTester

Apr 14, 2020 · Operations

Spot Performance Problems Without Writing a Single Line of Code

Experienced developers can often identify performance bottlenecks simply by reviewing code implementations, configuration settings such as timeouts, intervals, database and Redis parameters, as well as service monitoring data, container and JVM configurations, allowing them to avoid unnecessary test scripts and code changes.

DevOps · Operations · Optimization

0 likes · 2 min read

MaGe Linux Operations

Apr 13, 2020 · Operations

Step-by-Step Guide to Install and Configure Zabbix 4.4 on CentOS 8

This tutorial walks you through preparing a CentOS 8 system, installing Zabbix 4.4 and its dependencies, configuring the database and server, and completing the web‑based setup, providing all necessary commands and screenshots for a successful monitoring solution.

CentOS8 · Installation · monitoring

0 likes · 8 min read

Cloud Native Technology Community

Apr 8, 2020 · Operations

Decoding Thanos Architecture: From Query to Compact for Scalable Monitoring

This article provides a detailed analysis of Thanos' architecture, explaining each core component—Query, Sidecar, Store Gateway, Ruler, Compact, and the upcoming Receiver—how they enable global view, high availability, and long‑term storage for distributed Prometheus deployments, and discusses design trade‑offs and optimization strategies.

Long‑term Storage · Observability · Prometheus

0 likes · 12 min read

Ops Development Stories

Apr 8, 2020 · Operations

Deploy Zabbix Monitoring with Docker and Docker‑Compose on CentOS

This guide walks through preparing a CentOS 7 host, installing Docker, configuring a Zabbix server and MySQL containers, and optionally using docker‑compose to set up Zabbix components, including the web UI and agent, with detailed commands and volume mappings for persistent monitoring.

CentOS · Docker · docker-compose

0 likes · 18 min read
