DevOps Operations Practice
Author


We share professional insights on cloud-native, DevOps & operations, Kubernetes, observability & monitoring, and Linux systems.

164 Articles · 0 Likes · 661 Views · 0 Comments
Recent Articles

DevOps Operations Practice
Jan 2, 2025 · Operations

Career Paths for Operations Professionals After Age 35

The article compiles various Zhihu users' perspectives on how operations engineers can navigate career transitions after age 35, emphasizing the importance of aligning with larger companies, developing technical or managerial expertise, leveraging specialized infrastructure knowledge, and considering alternative paths such as product, project, or data roles.

Management · Operations · Skill Development
0 likes · 7 min read
DevOps Operations Practice
Dec 17, 2024 · Backend Development

From CPU Alert to Resolution: A Step‑by‑Step Backend Performance Debugging Guide

This article recounts a midnight CPU alert incident and walks through systematic backend troubleshooting—from initial system checks and JVM profiling to algorithm refactoring, database indexing, Docker‑based isolation, and proactive monitoring—demonstrating how to restore service performance and prevent future outages.

Database · Docker · JVM
0 likes · 7 min read
DevOps Operations Practice
Dec 16, 2024 · Cloud Native

Analysis of OpenAI's December 2024 Outage: Kubernetes Control Plane Overload and Mitigation

The December 11, 2024 OpenAI outage, caused by a misconfigured monitoring service that overloaded the Kubernetes control plane, led to a four‑hour service disruption and was resolved through cluster scaling, API blocking, and resource expansion, highlighting critical infrastructure risks for large‑scale cloud‑native operations.

Control Plane · Kubernetes · OpenAI
0 likes · 7 min read
DevOps Operations Practice
Oct 31, 2024 · Operations

Bilibili Data Center Migration: Planning, Execution, and Lessons Learned

This article details Bilibili’s 18‑month, multi‑region data‑center migration, covering background, project challenges, comprehensive planning, execution steps, risk management, automation, and post‑migration benefits, offering practical insights for large‑scale infrastructure relocation and operational optimization.

Bilibili · Data Center Migration · Project Management
0 likes · 21 min read
DevOps Operations Practice
Oct 22, 2024 · Cloud Native

How to Find the IP Address of a Docker Container

This guide explains how to quickly retrieve the IP address of a running Docker container using simple commands such as `docker ps`, `docker inspect`, and a formatted inspect query, with step‑by‑step instructions and example output for easy debugging and network configuration.

Container · Docker · IP address
0 likes · 3 min read
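As a rough illustration of the "formatted inspect query" idea: on the command line, `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container>` prints the container's IP. The sketch below does the same from Python by parsing the JSON that `docker inspect` emits. It is not taken from the article; the helper names `container_ips_from_inspect` and `container_ips` are hypothetical, and the sample JSON is a trimmed illustration rather than real output.

```python
import json
import subprocess
from typing import Dict


def container_ips_from_inspect(inspect_json: str) -> Dict[str, str]:
    """Map each network name to the container's IP on that network.

    `inspect_json` is the JSON array printed by `docker inspect <container>`;
    this sketch only reads the first element (a single container).
    """
    info = json.loads(inspect_json)[0]
    settings = info.get("NetworkSettings") or {}
    networks = settings.get("Networks") or {}
    # A container attached to several networks has one IP per network;
    # an empty string means no address is assigned on that network.
    return {name: (net or {}).get("IPAddress", "") for name, net in networks.items()}


def container_ips(container: str) -> Dict[str, str]:
    """Hypothetical wrapper: shells out to the docker CLI (needs a running daemon)."""
    out = subprocess.run(
        ["docker", "inspect", container],
        capture_output=True, text=True, check=True,
    ).stdout
    return container_ips_from_inspect(out)


if __name__ == "__main__":
    # Trimmed, illustrative sample of `docker inspect` output.
    sample = '[{"NetworkSettings": {"Networks": {"bridge": {"IPAddress": "172.17.0.2"}}}}]'
    print(container_ips_from_inspect(sample))  # {'bridge': '172.17.0.2'}
```

Returning a per-network mapping, instead of a single string, avoids the concatenated output that the one-line `-f` template produces when a container sits on more than one network.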
DevOps Operations Practice
Oct 10, 2024 · Operations

Seven Key Truths About Operations: Downtime, Automation, Prevention, Technology as a Tool, DevOps, Communication, and Security

Effective operations management acknowledges inevitable downtime, emphasizes automation, prioritizes proactive prevention, treats technology as a means rather than an end, integrates closely with development through DevOps, relies on strong communication, and continuously addresses pervasive security challenges to minimize business impact.

Monitoring · Operations · Automation
0 likes · 5 min read