Tagged articles
3287 articles
DevOps
Nov 23, 2021 · Operations

Zero‑Downtime Application Deployment: Strategies, Maturity Levels, and Required Technical Components

The article explains why traditional three‑step application releases cause service interruptions, introduces three maturity levels for zero‑downtime deployment, compares blue‑green, rolling, and canary release models, and provides concrete technical components, load‑balancer architectures, and Spring Boot/Eureka shutdown procedures to achieve uninterrupted service.

Operations · Zero Downtime · load balancing
0 likes · 22 min read
IT Architects Alliance
Nov 20, 2021 · Operations

Analysis and Optimization of Business System Performance

This article outlines a comprehensive approach to diagnosing and optimizing performance problems in production business systems, covering analysis processes, hardware, OS, database, middleware, JVM tuning, code inefficiencies, and monitoring techniques to identify root causes and improve system reliability.

Database Tuning · JVM tuning · Operations
0 likes · 16 min read
Efficient Ops
Nov 19, 2021 · Operations

How Shanghai Pudong Development Bank Achieved Top‑Tier DevOps Maturity Across 8 Projects

Shanghai Pudong Development Bank’s eight systems passed the third‑level DevOps continuous‑delivery assessment, showcasing how standardized processes, tool empowerment, and a unified maturity model can dramatically boost development efficiency, quality, and competitive advantage in the banking sector.

Continuous Delivery · DevOps · Maturity Assessment
0 likes · 13 min read
Efficient Ops
Nov 18, 2021 · Operations

Latest DevOps Maturity Assessment Results Reveal Top Companies and New 2+ Level Standards

The 2021 GOPS Global Operations Conference in Shanghai announced the latest DevOps capability maturity assessment results, detailing the enterprises that achieved continuous delivery level 3 and technical operation level 2+, explaining the new 2+ grading, and outlining the DevOps maturity model and its industry adoption.

Capability Maturity · Continuous Delivery · DevOps
0 likes · 6 min read
vivo Internet Technology
Nov 17, 2021 · Operations

Design and Architecture of a Unified Alert Convergence System for Monitoring

The paper presents a unified alert convergence system that centralizes metric calculation, detection, and alarm handling across monitoring subsystems, employing mechanisms such as convergence, claiming, silencing, escalation, and a Redis‑based delayed queue integrated via Kafka or REST to reduce alarm fatigue, improve MTTA/MTTR, and enable future AIOps.

MTTA · MTTR · Operations
0 likes · 18 min read
58UXD
Nov 17, 2021 · Operations

How 58 Daojia Scaled Service Center Design Across Hundreds of Stores

This article details the design principles, brand‑value strategies, quality control, and cost‑saving measures used to launch the first 58 Daojia premium service center and expand the concept to nearly a hundred physical stores nationwide.

Operations · Service Center · brand value
0 likes · 9 min read
Open Source Linux
Nov 16, 2021 · Databases

How to Stress Test Redis with redis-benchmark: A Quick Guide

This guide explains how to use Redis's built-in redis-benchmark tool to simulate concurrent client load, interpret key performance metrics such as request latency and throughput, and monitor server resource usage, helping operators prevent cache-related failures like penetration and avalanche after deployment.

Operations · Redis · benchmark
0 likes · 3 min read
DevOps
Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build large‑scale capabilities, foster agile culture, and continuously monitor progress, drawing on McKinsey research and real‑world examples to guide traditional firms toward sustainable digital growth.

Big Data · Enterprise Strategy · Operations
0 likes · 17 min read
58UXD
Nov 15, 2021 · Operations

How Strategic Visual Design Boosts E‑commerce Campaign Performance

This article examines how thoughtfully crafted main visuals influence user engagement and sales in e‑commerce campaigns, presenting four case studies from the “Super Welfare Day” series that illustrate design background, strategy, visual style, implementation, and measurable results such as an 85.2% GMV lift.

Design Thinking · Operations · campaign strategy
0 likes · 8 min read
Open Source Linux
Nov 14, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

Learn step-by-step Linux techniques—including df, du, find, and lsof commands—to pinpoint large directories or files, filter results, handle hidden space consumption, and adjust reserved filesystem space, ensuring you can efficiently resolve unexpected disk usage issues on your servers.

Operations · df · disk usage
0 likes · 4 min read
IT Architects Alliance
Nov 11, 2021 · Operations

Design and Implementation of a TB‑Scale Log Monitoring System Using the ELK Stack

This article explains how to build a terabyte‑level log monitoring platform for micro‑service environments by unifying log collection with Filebeat, enriching observability through Elastic APM, processing streams via Kafka Streams, and visualizing metrics with Grafana and Kibana, while addressing cost‑effective filtering and retention strategies.

ELK Stack · Grafana · Log Monitoring
0 likes · 8 min read
DevOps
Nov 8, 2021 · Operations

Digital Transformation vs. IT Transformation: Key Differences and How They Should Interact

The article explains that digital transformation is a customer‑driven, end‑to‑end business overhaul distinct from IT transformation, which focuses on technology, highlighting three major differences, the risks of conflating the two, and why digital transformation should ultimately drive IT transformation for lasting competitive advantage.

IT transformation · Operations · business strategy
0 likes · 9 min read
Liangxu Linux
Nov 7, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

This guide explains how to diagnose unexpected disk usage on Linux by using df, du, find, and lsof commands, demonstrates efficient ways to locate large directories or deleted files, and shows how to adjust reserved space with tune2fs to reclaim lost storage.

Operations · disk space · du
0 likes · 5 min read
ITFLY8 Architecture Home
Nov 4, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Systems Available

Service degradation involves strategically reducing or disabling non‑essential features during traffic spikes or failures to maintain core functionality, covering concepts like SLA levels, fallback data, rate‑limiting, timeout handling, circuit breaking, and front‑end and back‑end downgrade techniques for high‑availability systems.

Operations · Rate Limiting · SLA
0 likes · 14 min read
Alibaba Cloud Native
Nov 2, 2021 · Cloud Native

How to Spot Load‑Balancing, Scheduling, and Hotspot Issues with Kubernetes Monitoring

This article explains how to use Kubernetes monitoring features such as service details, topology maps, and pod metrics to quickly identify load‑balancing imbalances, cluster scheduling bottlenecks, and resource hotspot problems, providing practical steps and visual examples for improving system reliability and performance.

Operations · Resource Hotspots · Scheduling
0 likes · 10 min read
Efficient Ops
Nov 1, 2021 · Operations

How AIOps Is Empowering Enterprise Digital Transformation

The article explains how AIOps, built on DevOps principles and leveraging AI and big‑data analytics, helps enterprises overcome governance challenges, improve operational efficiency, and accelerate digital transformation, highlighting standards, real‑world evaluations, and key benefits such as real‑time analysis and noise reduction.

DevOps · IT Governance · Operations
0 likes · 7 min read
ITFLY8 Architecture Home
Nov 1, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Your System Available Under Load

Service degradation, a crucial reliability technique, involves selectively disabling non-essential features, applying rate limiting, timeout handling, fallback data, and tiered switches across front‑end, back‑end, and infrastructure layers to maintain core functionality during traffic spikes or component failures, ensuring high availability and meeting SLA targets.

Fallback · Operations · Rate Limiting
0 likes · 13 min read
58UXD
Oct 29, 2021 · Operations

How the CST Model Boosts User Conversion: A Design Case Study

This article examines how applying the CST design model, user segmentation, and psychological principles such as mental accounting and social proof can significantly improve conversion rates for a savings membership product.

AB testing · CST model · Operations
0 likes · 7 min read
Huolala Tech
Oct 29, 2021 · Operations

How Huolala Guarantees Cloud‑Native Stability at Scale

In this detailed account of Huolala's 2021 Cloud Operations Best Practices talk, the company shares its multi‑cloud architecture, service‑oriented governance, capacity‑testing, monitoring, and risk‑prediction techniques that together ensure high‑availability and efficient scaling for its diverse logistics services.

Operations · capacity testing · monitoring
0 likes · 17 min read
Alibaba Terminal Technology
Oct 25, 2021 · Cloud Native

How Alibaba’s AServer Gateway Evolved to a Cloud‑Native Architecture

Alibaba’s AServer access gateway, handling billions of users and millions of QPS, transitioned from a monolithic Tengine‑based system to a cloud‑native, containerized architecture with Kubernetes, Pilot, and Envoy, reducing operational complexity and improving dynamic routing, traffic isolation, and scalability for massive e‑commerce traffic.

Operations · Scalability · Service Mesh
0 likes · 17 min read
Efficient Ops
Oct 22, 2021 · Operations

How Zhengzhou Bank Boosted Delivery Speed 2.3× with DevOps Standard Assessment

Zhengzhou Bank’s new retail loan system passed the third‑level DevOps continuous‑delivery assessment, leading to a 2.31‑fold increase in annual delivery demand, an 18‑day reduction in cycle time, and a shift to automated, seconds‑level environment delivery, illustrating the transformative power of standardized DevOps practices.

Continuous Delivery · DevOps · Operations
0 likes · 12 min read
Efficient Ops
Oct 22, 2021 · Operations

What Do 42 Companies Reveal About DevOps Maturity in China?

The DevOps International Summit in Beijing announced that 42 enterprises covering 108 projects achieved level‑3 maturity in the CAICT DevOps Capability Model, highlighting the impact of standardized tools and processes on software delivery efficiency across finance, telecom and other sectors.

China · DevOps · Maturity Model
0 likes · 6 min read
ByteFE
Oct 20, 2021 · Operations

Troubleshooting DNS Resolution Failure of goofy.app in Singapore Office Due to DNSSEC Misconfiguration

After users in Singapore reported inability to resolve the internal domain goofy.app, a systematic investigation revealed that DNSSEC misconfiguration—specifically an incorrect DS record—caused DNS resolution failures globally, while Chinese DNS servers succeeded due to disabled DNSSEC validation, and removing the faulty key resolved the issue.

DNSSEC · Domain Resolution · Operations
0 likes · 8 min read
Baidu Geek Talk
Oct 20, 2021 · Operations

Practical Strategies for Building High‑Availability Systems

This article presents a comprehensive, step‑by‑step guide on improving system reliability through early fault detection, scope reduction, frequency reduction, and rapid incident handling, using real‑world practices from Baidu's commercial hosting platform.

Circuit Breaker · Log Standardization · Operations
0 likes · 20 min read
Zhongtong Tech
Oct 19, 2021 · Operations

Transforming Load Testing at ZTO: From Offline Pitfalls to Safe Full‑Chain Online Testing

This article details ZTO's evolution from traditional offline and online load‑testing approaches—highlighting their shortcomings—to a comprehensive full‑chain performance testing solution that uses JavaAgent probes, shadow resources, and a structured deployment and verification process to ensure safe, accurate production testing.

Operations · full-chain testing · load-testing
0 likes · 17 min read
360 Tech Engineering
Oct 15, 2021 · Operations

Log Collection Architecture Using Filebeat, Logstash, and Kafka

This article describes a lightweight, resource‑efficient log collection solution that combines Filebeat agents, optional Logstash aggregation, and Kafka transport, detailing configuration choices, meta‑persistence, back‑pressure mechanisms, monitoring setup, and deployment architecture for reliable at‑least‑once delivery.

Filebeat · Logstash · Operations
0 likes · 14 min read
DataFunTalk
Oct 15, 2021 · Artificial Intelligence

Risk Control and Operations for Existing Credit Customers: Models, Strategies, and Practices

This article examines how financial institutions can manage risk and improve operations for existing loan customers by analyzing client flow, regulatory impacts, accelerated deterioration, and layered segmentation, and by applying advanced models such as rule‑based alerts, B‑card scoring, LSTM, and survival analysis to enable timely risk detection and targeted cross‑selling.

Customer Segmentation · Operations · financial modeling
0 likes · 20 min read
IT Architects Alliance
Oct 14, 2021 · Operations

How to Build a TB‑Scale Log Monitoring System with ELK Stack

This article explains how to design and implement a TB‑level log monitoring platform for micro‑service environments using ELK Stack, Filebeat, Elastic APM, Kafka Streams, Prometheus, and Grafana, covering data collection, filtering, storage, and visualization while addressing cost and resource constraints.

ELK · Filebeat · Grafana
0 likes · 9 min read
DevOps
Oct 12, 2021 · Operations

Gray Release (Canary Deployment): Concepts, Benefits, and Implementation Guide

This article explains what gray release (canary deployment) is, why it is needed to reduce risk and improve product quality, and provides a step‑by‑step guide covering strategy, user targeting, data feedback, rollback, deployment architectures, and version management for modern software operations.

Gray Release · Operations · Version Control
0 likes · 13 min read
Open Source Linux
Oct 11, 2021 · Operations

10 Essential Ops Principles Every Engineer Should Follow

This article shares ten practical operations guidelines—from avoiding duplicated work and embracing mistakes to emphasizing monitoring, backup roles, clear division of labor, and continuous improvement—aimed at boosting reliability, efficiency, and team cohesion for both engineers and managers.

Operations · Reliability · best practices
0 likes · 10 min read
Open Source Linux
Oct 10, 2021 · Operations

Essential Linux Command-Line Tools to Boost Your Productivity

This article presents a curated list of powerful Linux command-line utilities—ranging from fast file searchers and interactive Git viewers to system monitors and multi‑threaded downloaders—each explained with concise descriptions and usage examples to help developers and sysadmins work more efficiently.

Operations · command-line · productivity
0 likes · 5 min read
HaoDF Tech Team
Oct 8, 2021 · Operations

Understanding SRE: Foundations, Metrics, and Tackling Technical Debt

This article introduces the fundamentals of Site Reliability Engineering (SRE), explains how to measure service stability with metrics like MTTR, MTBF, and availability, outlines the SRE workflow from prevention to post‑mortem, and discusses how to identify and reduce technical debt to improve system health.

Operations · Reliability · SRE
0 likes · 18 min read
dbaplus Community
Oct 7, 2021 · Databases

How to Measure and Eliminate Slow SQL in Large‑Scale MySQL Deployments

This article explains what MySQL slow queries are, why they cause system failures, proposes multi‑dimensional metrics to assess their severity, outlines concrete guidelines and change standards, and shares real‑world optimization cases and daily operational practices for eliminating slow SQL.

Database Performance · Metrics · MySQL
0 likes · 13 min read
Top Architect
Oct 7, 2021 · Backend Development

Essential Linux Commands and Java Debugging Tools for Backend Engineers

This article compiles a practical set of Linux command examples and Java debugging utilities—including tail, grep, awk, find, tsar, btrace, Greys, Arthas, JProfiler, and various JVM tools—to help backend developers quickly diagnose and resolve performance and stability issues in production environments.

Debugging · Operations · java
0 likes · 13 min read
IT Architects Alliance
Oct 1, 2021 · Operations

Understanding Service Degradation: Definitions, Levels, and Mitigation Strategies

The article explains service degradation concepts, defines SLA levels and the meaning of six nines, and details various degradation techniques such as fallback data, rate‑limiting, timeout, fault handling, read/write strategies, frontend safeguards, and the use of switches and pre‑embedding to maintain system availability during traffic spikes or failures.

Circuit Breaker · Fallback · Operations
0 likes · 12 min read
Continuous Delivery 2.0
Sep 30, 2021 · Operations

Key Findings from the 2021 DORA DevOps Report: SRE Practices, Documentation, Security, and Culture

The 2021 DORA DevOps Report reveals that elite teams outperform low‑performing teams by adopting SRE principles, high‑quality documentation, integrated security, modern technical practices such as loose coupling, continuous testing, CI/CD, and a performance‑driven culture that fosters belonging and inclusion.

Culture · Operations · SRE
0 likes · 19 min read
Liangxu Linux
Sep 28, 2021 · Operations

Top Linux CLI Tools for Real‑Time Network Bandwidth Monitoring

This article surveys a collection of Linux command‑line utilities that can monitor overall, per‑interface, per‑socket, and per‑process network bandwidth, explaining how each tool works, what data it reports, and how to install it on major distributions.

Network Monitoring · Operations · System Administration
0 likes · 10 min read
Open Source Linux
Sep 27, 2021 · Operations

Step-by-Step Guide to Installing Zabbix 5 on CentOS 7

This article provides a comprehensive, hands‑on tutorial for installing and configuring Zabbix 5 on CentOS 7, covering system overview, key terminology, disabling SELinux and firewalls, setting up repositories, installing server, agent, frontend, MariaDB, database initialization, configuration tweaks, and final web‑UI setup.

CentOS · Installation · Operations
0 likes · 9 min read
Programmer DD
Sep 27, 2021 · Operations

How a Rural County Built China’s Dominant Copy‑Printing Empire

This article traces the emergence and evolution of Xinhua County’s copy‑printing industry, from 1960s typewriter repairs to a nationwide network of repair shops, second‑hand markets, and equipment manufacturing, highlighting its social roots, ladder‑style development, research methods, key findings, and lasting impact on China’s office‑equipment sector.

China · Newhua · Operations
0 likes · 25 min read
Efficient Ops
Sep 23, 2021 · Operations

Why Did Our New Deployment Crash? Uncovering Metaspace‑Induced Full‑GC

The article recounts a staged rollout of the Maybach service on elastic cloud, details the timeline of successful and failing deployments, analyzes JVM metrics revealing excessive Metaspace usage that triggered continuous full garbage collections, and explains how this caused system‑wide timeouts and a half‑hour outage.

Full GC · Metaspace · Operations
0 likes · 10 min read
Efficient Ops
Sep 23, 2021 · Operations

How Leading Chinese Insurers Achieved DevOps Maturity: Case Studies and Insights

This article examines how three major Chinese insurance firms applied the CAICT DevOps Capability Maturity Model to improve IT efficiency, integrate teams, and accelerate continuous delivery, highlighting architectural innovations, cloud adoption, and measurable performance gains across distributed core systems, e‑commerce platforms, and agricultural claims solutions.

Continuous Delivery · DevOps · Insurance
0 likes · 9 min read
Liangxu Linux
Sep 22, 2021 · Cloud Native

Master Dockerfile: Complete Guide to All Instructions and Best Practices

This article provides a comprehensive, step‑by‑step explanation of every Dockerfile instruction—including variables, FROM, RUN, CMD, LABEL, EXPOSE, ENV, ARG, ADD, COPY, ENTRYPOINT, VOLUME, STOPSIGNAL, HEALTHCHECK, SHELL, WORKDIR, and USER—along with syntax details, usage tips, and practical code examples for building efficient container images.

Container · Docker · Dockerfile
0 likes · 12 min read
Efficient Ops
Sep 22, 2021 · Operations

Master Advanced kubectl Tricks: Debug, Filter, and Automate Kubernetes Pods

This article shares a collection of powerful kubectl commands and techniques—including API debugging, status‑based pod filtering and deletion, node‑specific pod listing, pod distribution statistics, and proxy usage—to help Kubernetes operators work more efficiently and avoid manual API scripting.

CLI · DevOps · Operations
0 likes · 7 min read
DevOps Cloud Academy
Sep 21, 2021 · Operations

Practical Elasticsearch Operations and Performance Tuning Guide

This article extends previous Elasticsearch cheat sheets with practical commands and step‑by‑step instructions for shard allocation, replica adjustment, cluster settings, slow‑log configuration, mapping routing, force merge, bulk writes, refresh intervals, translog durability, heap sizing, disk‑space monitoring, and troubleshooting strategies.

Cluster Management · Elasticsearch · Operations
0 likes · 7 min read
Efficient Ops
Sep 16, 2021 · Operations

How Chinese Banks Are Accelerating Digital Transformation with DevOps Maturity

This article reviews the China Academy of Information and Communications Technology's DevOps Capability Maturity Model, shows how major state‑owned banks have participated in 39 assessments, and presents detailed case studies illustrating each bank's DevOps adoption, challenges, and outcomes.

Capability Maturity Model · DevOps · Operations
0 likes · 11 min read
Efficient Ops
Sep 15, 2021 · Operations

How China’s Telecom Giants Accelerate Efficiency with the DevOps Maturity Model

This article details how leading Chinese telecom operators have adopted the CAICT‑led DevOps Capability Maturity Model, evaluating 17 projects across multiple companies to improve IT efficiency, integrate resources, and support business systems, showcasing concrete performance gains and best‑practice insights.

Continuous Delivery · DevOps · Maturity Model
0 likes · 15 min read
Java Architect Essentials
Sep 14, 2021 · Operations

Graceful Service Startup and Shutdown for Microservices with Spring Boot and Docker

This article explains how to implement graceful shutdown and startup for microservices using JVM shutdown hooks, Spring Boot's built‑in mechanisms, Docker stop signals, and external containers like Jetty, providing code examples and best‑practice recommendations for ensuring services deregister, reject traffic, and start only after health checks succeed.

Docker · GracefulShutdown · Microservices
0 likes · 10 min read
Efficient Ops
Sep 14, 2021 · Operations

How China’s Leading Banks Achieve DevOps Maturity: Real‑World Case Studies

This article examines how major Chinese state‑owned banks applied the CAICT DevOps Capability Maturity Model to improve IT efficiency, integrate resources, and support business systems, detailing assessment numbers, project implementations, challenges, and outcomes across continuous delivery, security, and toolchain standards.

Continuous Delivery · DevOps · Maturity Model
0 likes · 14 min read
Architect's Alchemy Furnace
Sep 11, 2021 · Operations

Mastering Arthas: A Practical Guide to Java Runtime Debugging and Monitoring

This article introduces Arthas, a Java online diagnostic tool, explains its instrumentation‑based runtime principle, guides installation on various platforms, and provides a comprehensive command reference—including basic, system, class, and enhancement commands—for effective debugging, monitoring, and performance analysis of Java applications.

Arthas · Instrumentation · Operations
0 likes · 10 min read
Alibaba Terminal Technology
Sep 10, 2021 · Mobile Development

How Taobao Overhauled Mobile Diagnostics to Achieve 5‑15‑60 SLA

Taobao redesigned its mobile client’s diagnostics and logging architecture—introducing scenario‑based monitoring, standardized log protocols, snapshot collection, and change‑tracking SDKs—to meet a 5‑minute response, 15‑minute identification, and 60‑minute recovery goal, dramatically improving issue detection, analysis, and resolution efficiency.

Operations · client-side · log system
0 likes · 17 min read
Efficient Ops
Sep 9, 2021 · Operations

How a Chinese Consumer Finance Firm Boosted Efficiency with DevOps – Level‑3 Assessment

In a detailed interview, Henan Zhongyuan Consumer Finance explains how its new generation consumer loan system achieved the industry‑first Level‑3 DevOps continuous delivery assessment, highlighting the standards, tools, performance metrics, challenges overcome, and future plans that together illustrate the transformative power of standardized DevOps practices.

Continuous Delivery · DevOps · Operations
0 likes · 12 min read
Efficient Ops
Sep 9, 2021 · Operations

How CITIC Securities Boosted Efficiency with DevOps: A Deep Dive into Their Level‑3 Assessment

CITIC Securities’ CIO Xiao Gang discusses how their outsourced service platform achieved Level‑3 DevOps continuous delivery assessment, detailing the motivations, implementation challenges, measurable improvements, and future plans, while highlighting the broader significance of the national DevOps maturity model for the financial sector.

Continuous Delivery · DevOps · Financial Services
0 likes · 11 min read
Efficient Ops
Sep 9, 2021 · Operations

How Haitong Securities Boosted Efficiency with DevOps Standard Evaluation

The interview reveals how Haitong Securities leveraged the national DevOps standard assessment to transform its software development, achieving level‑3 continuous delivery maturity, accelerating release cycles, improving quality, and outlining future DevSecOps and industry‑specific standardization plans.

Continuous Delivery · DevOps · Operations
0 likes · 11 min read
Efficient Ops
Sep 9, 2021 · Operations

How China Construction Bank’s FinTech Arm Earned Top Marks in the National DevOps Standard

The article details how JiAnXin FinTech’s YaoGuang Agile Development Platform achieved an excellent rating in China’s first national DevOps standard evaluation, sharing interview insights on platform architecture, the importance of end‑to‑end toolchains, future DevOps trends, and the tangible benefits realized after the assessment.

Continuous Delivery · DevOps · FinTech
0 likes · 12 min read
Open Source Linux
Sep 4, 2021 · Operations

How to Use nologin to Block User Logins on Linux

This guide explains how the Linux nologin command can politely deny user logins, log attempts, and provides multiple methods—including command-line usage, password locking, and /etc/passwd modifications—to restrict login access for specific or all users during system maintenance.

Operations · System Administration · linux
0 likes · 3 min read
HelloTech
Sep 2, 2021 · Operations

How Production Full‑Link Load Testing Guarantees High Availability at Scale

The article explains why large‑scale services must conduct production full‑link load testing, describes its evolution from ad‑hoc trials to standardized monthly practices, and details the technical and procedural steps—including traffic modeling, JMeter usage, middleware tagging, and responsibility mapping—that ensure reliable capacity planning and risk mitigation.

Microservices · Operations · capacity planning
0 likes · 13 min read
Liangxu Linux
Aug 29, 2021 · Operations

Boosting a Python Service to 50k QPS: My Step‑by‑Step Performance Tuning

Through a detailed case study, the author documents the process of optimizing a Python‑based web module—identifying bottlenecks, redesigning architecture with Redis queues, tuning MySQL, adjusting Linux TCP settings, and iteratively load‑testing until achieving 50,000 QPS with sub‑70 ms latency and zero errors.

Backend · Operations · Python
0 likes · 9 min read
JD Retail Technology
Aug 24, 2021 · Operations

Key Metrics and Process for Lean Value Stream Analysis

The article explains how lean value‑stream analysis uses meaningful metrics such as lead time, process time and percent complete & accurate, outlines a step‑by‑step workflow for mapping and evaluating value streams, and demonstrates the approach with a department‑level case study and radar‑chart analysis.

Lean · Operations · Process Improvement
0 likes · 6 min read
Efficient Ops
Aug 23, 2021 · Operations

Master HAProxy: Build High‑Performance L7/L4 Load Balancers & HA Clusters

This guide introduces HAProxy, an open‑source L4/L7 load balancer, and walks through its core features, performance and stability characteristics, step‑by‑step installation on CentOS 7, configuration of both L7 and L4 balancing, monitoring, and setting up high‑availability with Keepalived.

HAProxy · Operations · high availability
0 likes · 27 min read
IT Architects Alliance
Aug 21, 2021 · Operations

Mastering Nginx: From Basics to Advanced Load Balancing and Rate Limiting

This article explains what Nginx is, why it’s chosen for high‑performance reverse proxy and load balancing, walks through its event‑driven architecture, core configuration directives, virtual host setups, location regex rules, static‑dynamic separation, rate‑limiting techniques, load‑balancing algorithms, high‑availability settings and practical code examples.

Operations · Rate Limiting · Web server
0 likes · 19 min read
58UXD
Aug 20, 2021 · Operations

How the Ganjian Salary Wish Festival Boosted User Engagement

This article analyzes the Ganjian Salary Wish Festival as a case study of operational marketing, exploring industry insights, audience targeting, brand messaging, benefit‑driven conversion, interactive game design, and data results to reveal how such activities can sustainably retain users beyond simple incentives.

Marketing · Operations · case study
0 likes · 5 min read
Architects' Tech Alliance
Aug 16, 2021 · Operations

The Evolution, Types, and Pitfalls of Enterprise Mid‑Platform Architecture

This article traces the history of the Chinese "mid‑platform" concept, outlines how major tech firms implement various middle‑platform strategies, distinguishes front‑end, back‑end, and middle layers, categorizes platform types, and highlights common pitfalls and organizational challenges in building such platforms.

Business Architecture · Enterprise Architecture · Operations
0 likes · 12 min read
Efficient Ops
Aug 11, 2021 · Operations

Scaling Kubernetes Clusters: Node Quotas, Kernel Tweaks & Etcd Tips

This guide outlines how to prepare large‑scale Kubernetes clusters on public clouds by increasing node quotas, adjusting kernel parameters, configuring high‑availability etcd with the etcd‑operator, tuning kube‑apiserver settings, and applying pod‑level best practices for resource limits and affinity.

Kernel Tuning · Operations · cluster scaling
0 likes · 8 min read
DevOps
Aug 11, 2021 · Operations

Introduction to Chaos Engineering – Part 2: Four Steps for Disrupting Complex Systems

This article explains that chaos engineering is not a magic cure but a disciplined practice for testing distributed systems by designing and running controlled experiments, outlining four essential steps—observability, defining steady state, hypothesizing events, and executing experiments—to gain confidence in system resilience.

Observability · Operations · chaos engineering
0 likes · 11 min read
DevOps
Aug 9, 2021 · Operations

Microsoft Digital: Internal IT Transformation and Operational Excellence

Microsoft Digital describes how Microsoft’s internal IT organization, renamed from CSEO to Microsoft Digital, drove a comprehensive digital transformation by migrating operations to Azure, adopting cloud‑centric architecture, implementing DevOps, enhancing security, data, and AI capabilities, and aligning vision‑driven priorities to boost productivity, customer focus, and business outcomes.

Information Security · Operations · data analytics
0 likes · 20 min read
Wukong Talks Architecture
Aug 6, 2021 · Databases

Redis Operational Best Practices and Guidelines

This guide presents a comprehensive set of mandatory, reference, and recommended Redis usage standards—including command restrictions, key naming, data sizing, persistence configurations, monitoring, and deployment strategies—to improve performance, reliability, and operational efficiency for production environments.

Operations · Persistence · Redis
0 likes · 9 min read
Efficient Ops
Aug 2, 2021 · Operations

How Alibaba Scales Massive Big Data Engines with an SRE Framework

This article describes Alibaba’s comprehensive SRE system for managing ultra‑large‑scale big data engines, detailing stability metrics, resource cost management, and intelligent operation productization, and introduces speaker Fu Tianyuan, a senior operations expert leading the MaxCompute and DataWorks SRE team.

Alibaba · Big Data · Cloud Computing
0 likes · 3 min read
ByteDance SE Lab
Jul 30, 2021 · Operations

Inside Salesforce’s Global Outage: What Went Wrong and How to Prevent It

The article examines Salesforce’s five‑hour global outage caused by a shortcut DNS deployment and the subsequent recovery challenges, then explores a viral experiment where twenty smartphones generated artificial traffic congestion, illustrating how real‑time data feeds and operational safeguards can prevent large‑scale service disruptions.

Big Data · Cloud Computing · Incident Management
0 likes · 7 min read