Tagged articles
4063 articles
Alibaba Cloud Native
Feb 27, 2023 · Cloud Native

How CNStack 2.0 Enables Multi‑Cloud, Multi‑Cluster Management with OCM

CNStack 2.0 introduces a cloud‑native multi‑cluster service built on Open Cluster Management, offering unified registration, lifecycle management, resource distribution, multi‑tenant authentication, and high‑availability cross‑cluster communication for Kubernetes clusters across clouds.

Authentication · Cluster Registration · Multi‑Cluster
0 likes · 15 min read
Alibaba Cloud Native
Feb 27, 2023 · Cloud Native

What’s Next for Microservices? Highlights from the Beijing Cloud Native Meetup

The Beijing "Microservices x Container Open Source Developer Meetup" gathered over 100 developers and core maintainers of leading cloud‑native projects to discuss next‑generation microservice architectures, static compilation, service governance, multi‑cluster management, observability, and more, providing deep technical insights and real‑world examples.

Observability · cloud-native · kubernetes
0 likes · 11 min read
Top Architect
Feb 27, 2023 · Cloud Native

Deploying a K8s ChatGPT Bot with Robusta for Intelligent Alert Troubleshooting

This article guides readers through setting up a Kubernetes‑based ChatGPT bot using the open‑source Robusta platform, covering prerequisites, installation, Slack integration, configuration generation, Helm deployment, testing with crash pods, and interactive alert handling to streamline Prometheus alert resolution.

ChatGPT · Prometheus · Robusta
0 likes · 12 min read
Architect
Feb 25, 2023 · Cloud Native

Deploying a K8s ChatGPT Bot with Robusta: A Step‑by‑Step Guide

This article walks through installing Robusta, configuring Slack integration, adding Helm repositories, deploying the Robusta platform on a Kubernetes cluster, creating a crash‑loop pod to trigger alerts, and interacting with a ChatGPT bot to automatically troubleshoot Prometheus alerts, providing complete code snippets and screenshots for each step.

AI Ops · ChatGPT · Prometheus
0 likes · 12 min read
Baidu Geek Talk
Feb 24, 2023 · Cloud Native

Design and Resource Scheduling of Cloud‑Native AI and the PaddleFlow Workflow Engine

The article explains Baidu’s cloud‑native AI resource scheduling across single‑ and multi‑GPU nodes, describes the PaddleFlow Kubernetes‑based workflow engine with its hierarchical queues, advanced scheduling algorithms, unified storage, and how these technologies improve GPU utilization, reduce fragmentation, and simplify AI task orchestration.

AI · PaddleFlow · Workflow Engine
0 likes · 23 min read
NetEase Cloud Music Tech Team
Feb 24, 2023 · Cloud Native

NetEase Cloud Music Open-Sources Horizon: A Kubernetes-Based GitOps Continuous Deployment Platform

NetEase Cloud Music open-sourced Horizon, a Kubernetes-based GitOps continuous deployment platform, offering standardized Helm‑based templates, RBAC, multi‑cloud support, CI integration, and extensibility, built on Argo CD, Tekton, and other components, now used in large‑scale production across multiple regions.

Argo CD · GitOps · Horizon
0 likes · 9 min read
Alibaba Cloud Native
Feb 23, 2023 · Cloud Native

How OpenYurt Enables Large‑Scale Edge Computing for Longyuan Power

This article explains how OpenYurt, an unobtrusive cloud‑native edge platform, integrates with the CNStack technology hub to deliver high‑availability, offline‑autonomous, and programmable edge services for Longyuan Power’s massive multi‑province server fleet.

CNStack · Distributed Systems · OpenYurt
0 likes · 10 min read
Zhuanzhuan Tech
Feb 20, 2023 · Operations

Evolution of Zhuanzhuan's Test Environments: From Monolithic Setups to Docker‑Based Dynamic and Stable Platforms

This article details how Zhuanzhuan transformed its testing infrastructure from a handful of monolithic servers to a Docker‑driven, tag‑routed dynamic and stable environment, addressing resource shortages, waste, and stability issues while achieving significant reductions in deployment time, resource consumption, and user‑reported problems.

DevOps · Docker · Tag Routing
0 likes · 14 min read
Zhuanzhuan QA
Feb 17, 2023 · Operations

Evolution of Zhuanzhuan's Test Environments: From Monolithic Setups to Docker‑Based Dynamic and Stable Environments

This article details how Zhuanzhuan’s testing environment progressed from a handful of static machines to a Docker‑driven dynamic‑and‑stable architecture, addressing resource shortages, stability issues, and operational inefficiencies through IP routing, tag routing, and extensive automation, ultimately achieving significant reductions in resource usage, deployment time, and user‑reported problems.

DevOps · Docker · Environment
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Feb 16, 2023 · Cloud Native

How AppManager Enables Scalable Multi‑Cloud Application Deployment

This article explains how AppManager, built on OAM and Groovy plug‑ins, provides extensible multi‑cloud management by supporting dynamic component, trait, and workflow integration, automated build‑packaging, resource add‑ons, multi‑environment isolation, and built‑in state monitoring for reliable application delivery.

AppManager · DevOps · OAM
0 likes · 15 min read
JD Cloud Developers
Feb 14, 2023 · Cloud Native

Why Kubernetes Is the Backbone of Modern Cloud‑Native Architecture

This article explains the evolution from monolithic to microservice architectures, introduces Kubernetes as the core cloud‑native platform, and details its components, design principles, and resource management strategies for compute, networking, and storage within a cluster.

CSI · ingress · kubernetes
0 likes · 22 min read
Bilibili Tech
Feb 14, 2023 · Cloud Native

Bilibili's Vertical Pod Autoscaler (VPA) Practice and Cluster Resource Governance

Bilibili extended Kubernetes with a custom in‑place Vertical Pod Autoscaler framework (generator, recommender, updater, and webhook controllers, plus a management platform for strategy tuning, avoidance, analysis, and anomaly detection), reducing over‑provisioned resources across its ten‑thousand‑node private cloud and achieving up to 60% CPU and 30% memory savings.

SRE · kubernetes · vertical pod autoscaler
0 likes · 19 min read
JD Cloud Developers
Feb 13, 2023 · Cloud Native

Why Docker and Kubernetes Are Revolutionizing Cloud‑Native Development

This article explains Docker’s lightweight container engine, its goals, core concepts such as images, containers, and repositories, compares containers to virtual machines, introduces Dockerfile, cgroups, Docker Compose, Docker Machine, and provides an overview of Kubernetes architecture and components, highlighting their role in cloud‑native environments.

Containers · DevOps · Docker
0 likes · 13 min read
21CTO
Feb 10, 2023 · Cloud Native

Why Kubernetes Is So Hard to Master: A Beginner’s Q&A Walkthrough

This article introduces Kubernetes fundamentals through a series of questions and answers, covering its architecture, node communication, pod scheduling, data storage, external access, scaling mechanisms, and component coordination, all illustrated with clear diagrams.

Cluster Management · Containers · Pod Scheduling
0 likes · 9 min read
Top Architect
Feb 7, 2023 · Cloud Native

Understanding Kubernetes: Core Concepts and Architecture

This article provides a concise, question‑driven overview of Kubernetes, covering its architecture, node and master communication, pod fundamentals, scheduling, storage via etcd, service exposure, scaling mechanisms, and the roles of core components such as kube‑apiserver, kubelet, kube‑proxy and controllers.

Cluster Management · Containers · cloud-native
0 likes · 9 min read
Cloud Native Technology Community
Feb 7, 2023 · Cloud Native

Machine Learning‑Based Optimization of Kubernetes Resources

This article explains how machine learning can be applied to automatically optimize CPU and memory settings in Kubernetes clusters, covering both experiment‑driven and observation‑driven approaches, step‑by‑step procedures, best‑practice recommendations, and the benefits of combining both methods for efficient, scalable cloud‑native operations.

Autoscaling · kubernetes · machine learning
0 likes · 11 min read
IT Architects Alliance
Feb 6, 2023 · Cloud Native

What Is Kubernetes and Why Is It Hard to Get Started?

This article introduces Kubernetes as a Google‑originated container‑based distributed cluster management system, explaining its architecture, core components such as Master, Nodes, Pods, Services, etcd, and detailing how communication, scheduling, storage, external access, scaling, and controller coordination work together.

Distributed Systems · Pod · Service
0 likes · 8 min read
Ops Development Stories
Feb 6, 2023 · Cloud Native

How to Deploy Odigos for Zero‑Code Observability on Kubernetes

This guide walks you through installing and configuring the open‑source Odigos observability control plane on a Kubernetes cluster, showing how to automatically collect traces, metrics, and logs from applications without modifying code and how to visualize the data with Grafana.

Odigos · OpenTelemetry · cloud-native
0 likes · 11 min read
Efficient Ops
Feb 5, 2023 · Cloud Native

Unlock Hidden kubectl Tricks: Boost Your Kubernetes Workflow

This guide showcases advanced kubectl techniques—including printing API details, filtering and deleting pods by status, counting node‑wise pod distribution, and leveraging kubectl proxy—to help Kubernetes users streamline debugging and routine cluster management tasks.

command-line · kubectl · kubernetes
0 likes · 7 min read
Selected Java Interview Questions
Feb 1, 2023 · Cloud Native

Introduction to Rancher: Features, Installation, and Application Deployment

This article introduces Rancher as a comprehensive container management platform, explains its API server capabilities, monitoring and alerting features, provides step‑by‑step Docker‑based installation instructions, and demonstrates how to bind a Kubernetes cluster, deploy applications, and view pod logs through the Rancher UI.

Container Management · Docker · cloud-native
0 likes · 4 min read
Cloud Native Technology Community
Feb 1, 2023 · Cloud Native

Why Is Kubernetes So Hard to Master? A Step‑by‑Step Overview

This article breaks down the core concepts of Kubernetes—including its master‑worker architecture, pod scheduling, etcd storage, service exposure, scaling mechanisms, and controller interactions—through a series of clear questions and illustrated answers to help beginners grasp the platform’s complexity.

Pod Scheduling · Service Mesh · cloud-native
0 likes · 8 min read
MaGe Linux Operations
Jan 31, 2023 · Cloud Native

Mastering ulimit and cgroup: Limit Files & Threads in Docker/Kubernetes

This article explains how Linux's ulimit and cgroup mechanisms can be used to restrict file descriptors and thread counts in Docker and Kubernetes environments, compares configuration methods, presents experimental results, and offers practical recommendations for setting limits at the container, pod, and host levels.

Container · cgroup · kubernetes
0 likes · 17 min read
Qunar Tech Salon
Jan 31, 2023 · Operations

Root Cause Analysis and Mitigation of JVM GC‑Induced OOM and Memory Fragmentation in a Containerized Hotel Pricing Service

This article details how long JVM garbage‑collection pauses and glibc ptmalloc memory‑fragmentation caused container OOM kills in a hotel‑pricing system, and explains the step‑by‑step diagnosis, JVM tuning, Kubernetes health‑check adjustments, and the replacement of ptmalloc with jemalloc to eliminate the issue.

JVM · MemoryFragmentation · OOM
0 likes · 9 min read
Open Source Linux
Jan 28, 2023 · Cloud Native

Mastering Kubernetes Probes: Startup, Liveness, and Readiness Explained

This article explains why Kubernetes uses Startup, Liveness, and Readiness probes, describes common Pod states and restart policies, compares the probes, details their configuration fields, and provides practical YAML examples for each probe type to ensure reliable container health monitoring.

LivenessProbe · Pod · Probes
0 likes · 17 min read
DataFunTalk
Jan 18, 2023 · Big Data

Five Major Trends Shaping Big Data, AI, and Cloud Industries in 2023

The article forecasts five key trends for 2023—including cloud cost optimization, multi‑cloud freedom, rapid AI model adoption, expanding data‑sharing ecosystems, and the convergence of data warehouses and lakes—highlighting how they will reshape the big data, artificial intelligence, and cloud landscapes.

Alluxio · data sharing · kubernetes
0 likes · 6 min read
Alibaba Cloud Native
Jan 18, 2023 · Cloud Native

Decoding Terway ENI‑Trunking: Data‑Plane Paths and SOP Scenarios in Alibaba Cloud

This article provides a deep technical walkthrough of Alibaba Cloud's Terway ENI‑Trunking mode, explaining its architecture, pod‑level networking resources, VLAN‑based traffic steering, security‑group handling, and ten concrete SOP scenarios that illustrate how data packets travel between pods, services, and external clients.

Cloud Native Networking · ENI-Trunking · Security Groups
0 likes · 29 min read
Open Source Linux
Jan 17, 2023 · Backend Development

Why Your Java App Gets OOMKilled in Kubernetes and How to Fix It

This article explains why Java applications running in Kubernetes containers are often terminated with OOMKilled (exit code 137), analyzes the underlying JVM memory‑limit mismatches, and provides practical solutions using cgroup‑aware JVM flags and memory‑tuning techniques.

Docker · JVM · Java
0 likes · 14 min read
DevOps
Jan 17, 2023 · Operations

Building a DevOps CI/CD Pipeline: A Five‑Step Guide

This article walks beginners through the fundamentals of DevOps by outlining a practical five‑step process for creating a CI/CD pipeline, covering tools for continuous integration, source control, build automation, web server deployment, test coverage, and optional extensions such as containers and middleware automation.

Docker · Jenkins · ci/cd
0 likes · 15 min read
Efficient Ops
Jan 15, 2023 · Cloud Native

Understanding kubectl top: How Kubernetes Monitors Nodes and Pods

This article explains how the kubectl top command retrieves real‑time CPU and memory metrics for Kubernetes nodes and pods, details the underlying data flow and the metrics‑server and cAdvisor architecture, and addresses common issues and calculation differences compared to traditional system tools.

cAdvisor · kubectl top · kubernetes
0 likes · 15 min read
Alibaba Cloud Native
Jan 15, 2023 · Cloud Native

What Real‑World Cloud‑Native Metrics Reveal About JDK, Frameworks, and Resource Usage

Analyzing a year‑long EDAS report of tens of thousands of cloud‑native applications, this article uncovers trends in JDK version adoption, microservice framework choices, resource shape shifts, instance specifications, JVM heap settings, startup latency, elastic policy usage, and health indicators, offering actionable insights for architects.

JDK · kubernetes · microservices
0 likes · 13 min read
Alibaba Cloud Native
Jan 14, 2023 · Cloud Native

Why Java Apps OOM in Kubernetes Even Below Xmx and How to Fix It

This article explains why Java applications running in Kubernetes can encounter Out‑Of‑Memory errors despite heap usage staying under the Xmx limit, by examining container resource limits, JVM memory models, cgroup behavior, and provides practical configuration recommendations to prevent OOM.

JVM · Java · OOM
0 likes · 16 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Evolution of Ctrip's Log System: From Elasticsearch to ClickHouse and Log 3.0

This article details the evolution of Ctrip's log infrastructure, describing the shift from fragmented departmental logging to a unified Elasticsearch-based platform, the migration to ClickHouse for cost‑effective, high‑performance storage, and the subsequent Log 3.0 redesign that leverages Kubernetes, sharding, and a unified query governance layer to handle petabyte‑scale data.

Big Data · ClickHouse · ETL
0 likes · 16 min read
Cloud Native Technology Community
Jan 11, 2023 · Cloud Native

Key Kubernetes Trends in 2022: Mainstream Adoption, Edge Growth, Open‑Source Ecosystem, Stateful Deployments, and Ongoing Challenges

The 2022 Kubernetes landscape saw mainstream adoption with widespread managed services, increased edge usage, a thriving open‑source ecosystem, growing interest in stateful workloads, and persistent operational and security challenges, highlighting both the platform's maturity and the work still needed for broader enterprise confidence.

2022 trends · Governance · Stateful Workloads
0 likes · 7 min read
Open Source Linux
Jan 11, 2023 · Cloud Computing

How Docker’s Rise and Fall Shaped the Cloud Container Landscape

This article chronicles Docker’s rapid ascent, leadership turmoil, competition with Kubernetes, and the eventual sale of its enterprise business to Mirantis, illustrating how a pioneering container platform became both a catalyst for cloud innovation and a cautionary tale for open‑source startups.

Cloud Computing · Docker · Tech Business
0 likes · 14 min read
Alibaba Cloud Native
Jan 9, 2023 · Cloud Native

CNStack 2.0: Cloud‑Native Design for Agile, Secure Multi‑Cluster Ops

CNStack 2.0 is a cloud‑native PaaS platform built on Kubernetes that unifies resource and workload management, offering agile, open, and secure multi‑cluster capabilities through modular cloud services, a unified API gateway, and integration with open‑source projects such as Sealer, Emissary‑Ingress, cert‑manager, Velero, and OCM.

Multi-Cluster · cloud-native · kubernetes
0 likes · 24 min read
Alibaba Cloud Native
Jan 4, 2023 · Cloud Native

Explore Koordinator v1.1: Load‑Aware Scheduling, cgroup v2, and Descheduler Updates

Koordinator v1.1 introduces load‑aware scheduling with workload‑type awareness, percentile‑based resource aggregation, cgroup v2 support, a new LowNodeLoad descheduler plugin for load‑aware rebalancing, expanded performance collectors, ServiceMonitor integration, and detailed configuration examples, aiming to improve latency‑sensitive workloads and overall cluster resource efficiency.

CloudNative · Descheduler · LoadAware
0 likes · 25 min read
Alibaba Cloud Native
Jan 3, 2023 · Cloud Native

How KubeVela Workflow Transforms SAE’s Serverless Architecture for Faster Cloud‑Native Upgrades

This article explains how Alibaba Cloud's Serverless Application Engine (SAE) leverages the open‑source KubeVela Workflow to overcome operational, scaling, and integration challenges, detailing the workflow design, step definitions, and three real‑world use cases that illustrate automated ops, release optimization, and rapid feature rollout.

KubeVela · Serverless · automation
0 likes · 17 min read
DataFunSummit
Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk details the current storage architecture, Presto‑based acceleration with Alluxio caching, service‑oriented storage solutions using Alluxio Fuse and S3 APIs, and outlines future enhancements for Spark/Hive integration and CSI/Fuse optimizations, providing a comprehensive view of large‑scale big data storage engineering.

Alluxio · Cache Manager · Data Infrastructure
0 likes · 16 min read
DataFunTalk
Jan 1, 2023 · Big Data

Zhihu's Real-Time Computing Platform: From Skytree 1.0 to Mipha 2.0

Zhihu’s real‑time computing platform, initially built as Skytree 1.0 on Kubernetes and later re‑engineered as Mipha 2.0 with Flink SQL, unified metadata management, dynamic jar loading, UDF support, Protobuf format, CDC integration, and extensive operational optimizations, now processes petabyte‑scale data with high reliability.

Flink · Real-Time Computing · SQL Gateway
0 likes · 21 min read
Open Source Linux
Dec 30, 2022 · Operations

Top 7 Kubernetes Management Tools to Simplify Cluster Operations

This article introduces seven popular Kubernetes management solutions—including K9s, Rancher, the native Dashboard with Kubectl and Kubeadm, Helm, KubeSpray, Kontena Lens, and WKSctl—detailing their key features, usage scenarios, and how they help streamline cluster monitoring, deployment, scaling, and security across cloud‑native environments.

Cluster Management · DevOps · Operations
0 likes · 9 min read
Efficient Ops
Dec 29, 2022 · Operations

How eBay Scales Its Event Platform with ClickHouse and Kubernetes

This article details eBay's event platform architecture, explaining why a dedicated event system is needed, how ClickHouse provides high‑performance storage, the use of Kubernetes CRDs for cross‑region high availability, data routing, read/write separation, and query optimizations with LogQL.

ClickHouse · Event Platform · Observability
0 likes · 18 min read
MaGe Linux Operations
Dec 28, 2022 · Cloud Native

Master Essential kubectl Commands: A Practical Guide for Kubernetes Ops

This comprehensive guide covers kubectl autocomplete, context configuration, object creation, resource viewing, updating, patching, editing, scaling, deletion, pod and node interaction, as well as the versatile kubectl set commands, formatted output options, and visual references for effective Kubernetes cluster management.

Operations · cloud-native · command-line
0 likes · 15 min read
Ops Development Stories
Dec 28, 2022 · Operations

When a Massive File Transfer Crashed My K8s Master: A Real‑World Docker Recovery Tale

The author recounts a sudden overload caused by copying hundreds of gigabytes of small files to an Alibaba Cloud NAS, which crashed the master node of a Kubernetes cluster, leading to Docker failures, and describes step‑by‑step troubleshooting, configuration changes, and lessons learned about backups, cautious operations, and calm analysis.

Docker · cloud-native · kubernetes
0 likes · 5 min read
Past Memory Big Data
Dec 27, 2022 · Operations

How Volcano Engine DataTester Handles Private Deployment: Architecture, Challenges, and Business‑Driven Solutions

This article details Volcano Engine DataTester's private deployment architecture, the version‑management, performance, and stability challenges encountered, and the business‑oriented solutions—including branch strategies, pipeline automation, ClickHouse model optimizations, and multi‑level caching—that enable reliable, efficient A/B testing in on‑premise environments.

A/B testing · Ansible · ClickHouse
0 likes · 13 min read
dbaplus Community
Dec 26, 2022 · Cloud Native

How Bilibili Boosted Server Utilization with Kubernetes Co‑Location Strategies

This article explains how Bilibili’s large‑scale Kubernetes cloud platform reduces costs and improves machine utilization by applying co‑location (mixed‑tenant) techniques, including resource‑aware scheduling, dynamic isolation, and a dedicated management console across online, offline, and idle‑machine scenarios.

Co-location · Scheduling · cloud-native
0 likes · 17 min read
ITPUB
Dec 26, 2022 · Cloud Native

What Really Happens When You Deploy an App on Kubernetes?

This article walks through the complete lifecycle of a Kubernetes deployment, explaining how a manual upgrade request triggers API calls, creates Deployments, ReplicaSets, Pods, and how the scheduler, kubelet, and Docker work together, while also covering concepts like containers, labels, replication controllers, deployments, and autoscaling mechanisms.

Autoscaling · Containers · Pod
0 likes · 23 min read
Tencent Cloud Developer
Dec 26, 2022 · Cloud Native

Challenges and Optimization Strategies for Containerized Deployment of Online Services on Kubernetes

Tencent’s shift from VMs to Kubernetes for massive online services faces pod‑size rigidity, heterogeneous node balancing, elastic scaling, and massive cluster‑pool mapping, prompting optimizations such as dynamic CPU compression, custom load‑aware scheduling, collaborative HPA/VPA scaling, dynamic quota migration, unified routing‑sync, and an automated decision‑tree‑driven self‑healing workflow for container‑destruction failures.

Containerization · Dynamic Scheduling · kubernetes
0 likes · 12 min read
Open Source Linux
Dec 26, 2022 · Cloud Native

Why Does My Kubernetes Service Fail? 10 Common Issues and Quick Fixes

This guide walks through ten frequent Kubernetes problems—including service access failures, port mapping errors, certificate issues, pod image pull errors, init‑container hangs, and CrashLoopBackOff—explaining their causes and providing concise, step‑by‑step solutions to restore cluster functionality.

CrashLoopBackOff · InitContainer · NodePort
0 likes · 7 min read
HelloTech
Dec 23, 2022 · Cloud Native

Design Principles and Implementation Details of Kubernetes Horizontal Pod Autoscaler and Custom Water Pod Autoscaler

The article explains Kubernetes’ built‑in Horizontal Pod Autoscaler, then details the custom Water Pod Autoscaler (WPA) that extends HPA with dual‑signal (load and SOA registration) detection, dual‑threshold scaling, noise filtering, configurable cooldown, frequency limits, tolerance buffers, and integrated alerting for reliable elastic scaling.

Autoscaling · Metrics · WPA
0 likes · 13 min read
Alibaba Cloud Developer
Dec 23, 2022 · Cloud Native

What Happens When You Deploy an App on Kubernetes? A Deep Dive

This article walks through the entire lifecycle of deploying an application on Kubernetes, explaining how Docker containers differ from virtual machines, the role of Pods, ReplicationControllers, Deployments, and how automatic scaling with HPA and VPA keeps services reliable and efficient.

ReplicationController · cloud-native · deployment
0 likes · 21 min read
Code Ape Tech Column
Dec 23, 2022 · Cloud Native

Overview of Popular Microservice Technology Stack and Governance Frameworks

This article presents a comprehensive overview of widely adopted microservice technology stacks, including governance frameworks like Apache Dubbo and Spring Cloud Alibaba, CI/CD tools, container orchestration, and various application services, while also offering practical selection guidance for developers and product teams.

cloud-native · kubernetes · service governance
0 likes · 12 min read
ITPUB
Dec 22, 2022 · Cloud Native

How 58 Tongcheng Built a Cloud‑Native Deep Learning Inference Platform with Istio

This article details the evolution of 58 Tongcheng's deep learning inference platform—from the initial WPAI‑based architecture to a cloud‑native, Istio‑powered design—covering its background, technical challenges, architectural redesign, traffic‑management features, adaptive rate limiting, model warm‑up, and observability improvements.

AI inference · Istio · Service Mesh
0 likes · 24 min read
Ctrip Technology
Dec 22, 2022 · Cloud Native

Evolution and Cloud‑Native Architecture of Ctrip’s Microservice Products

The article outlines Ctrip’s microservice journey from its 2013 inception, detailing the evolution of its frameworks, the complexities of operating multiple stacks, the challenges faced, and the design of a progressive cloud‑native service‑mesh architecture built on Istio, Envoy, and custom operators.

Dubbo · Istio · Service Mesh
0 likes · 10 min read
58 Tech
Dec 22, 2022 · Artificial Intelligence

Implementing a Cloud-Native Istio Gateway for 58.com Deep Learning Inference Platform

This article details the evolution of 58.com’s deep learning inference platform, describing the transition from the original SCF‑based architecture to a cloud‑native Istio gateway (architecture 2.0), and explains design choices, traffic‑management, adaptive rate‑limiting, observability, model pre‑warming, and performance improvements.

AI, Deep Learning, Inference Platform
0 likes · 22 min read
Efficient Ops
Dec 20, 2022 · Cloud Native

Understanding Kubernetes Pods, Services, and Load Balancing Basics

This article explains Kubernetes pod architecture, networking, external exposure, and how Services use virtual IPs and selectors to provide load balancing and dynamic discovery of pod changes, including the role of kube-proxy and the limitations of using Nginx for pod-level balancing.

Pods, Service, cloud-native
0 likes · 8 min read
Volcano Engine Developer Services
Dec 15, 2022 · Cloud Native

How ByteDance Scaled Cloud‑Native Infrastructure: Lessons in Multi‑Cluster Scheduling

This article details ByteDance’s cloud‑native transformation: a layered technical system, a multi‑year Kubernetes‑based evolution, unified multi‑cluster resource management, and hierarchical scheduling. It illustrates how the company achieves high development speed and resource efficiency while preparing for next‑generation serverless infrastructure.

DevOps, Serverless, cloud-native
0 likes · 21 min read
Open Source Linux
Dec 15, 2022 · Cloud Native

Kubernetes 1.26 ‘Electrifying’: Key New Features, Deprecations, and Upgrades

Kubernetes 1.26, themed “Electrifying,” delivers 37 enhancements, including the move to the registry.k8s.io image registry, storage upgrades, signed release artifacts, Windows privileged (HostProcess) containers, and metrics and scheduling improvements. It promotes 11 features to stable, deprecates or removes 12 features, and emphasizes sustainability and carbon‑footprint awareness.

Metrics, cloud-native, container-runtime
0 likes · 10 min read
Efficient Ops
Dec 14, 2022 · Operations

How to Build a Scalable Container Log Collection System with S6 and Filebeat

This article explains Docker and Kubernetes container logging fundamentals, highlights the limitations of default json‑file logging, and presents a unified log‑collection architecture using S6‑based images, filebeat, logrotate, Kafka, and Elasticsearch, with practical steps for dynamic configuration and log rotation in a k8s cluster.

Docker, Filebeat, Logging
0 likes · 9 min read
vivo Internet Technology
Dec 14, 2022 · Cloud Native

Vivo’s Cloud‑Native Container Practices: High‑Availability, Automation, and Platform Evolution

Vivo’s cloud‑native journey, detailed from its 2018 machine‑learning pilot to a large‑scale container ecosystem, showcases how high‑availability design, automated multi‑cluster operations, CI/CD pipelines, and unified traffic ingress have dramatically improved efficiency, reduced costs, and enabled rapid, scalable AI‑driven services across the business.

Container, Platform Engineering, automation
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
Dec 14, 2022 · Artificial Intelligence

How Cloud‑Native AI Boosts Resource Efficiency with PaddleFlow

This article explains how cloud‑native AI uses container‑based architectures and advanced scheduling techniques (resource queues, gang scheduling, bin‑packing, and GPU topology‑aware and ToR‑aware (top‑of‑rack) dispatch) to improve resource and engineering efficiency, and introduces Baidu’s AI workflow engine PaddleFlow with its design, features, and deployment options.

AI workflow, Cloud Native AI, GPU virtualization
0 likes · 25 min read
Cloud Native Technology Community
Dec 14, 2022 · Cloud Native

Kubernetes v1.26 Release: New Features, Enhancements, and Deprecations

Kubernetes 1.26 has been officially released with 37 enhancements, of which 11 graduate to stable and 10 to beta, while 12 features are deprecated or removed. The release changes the container image registry, removes CRI v1alpha2, advances storage CSI migrations, enhances metrics, and adds support for Windows privileged containers and dynamic resource allocation.

CSI, Container Runtime Interface, Metrics
0 likes · 15 min read
Architect's Guide
Dec 14, 2022 · Cloud Native

Understanding Underlay and Overlay Network Models in Kubernetes

This article explains Kubernetes networking models, detailing the underlay network infrastructure, overlay techniques, and common CNI implementations such as Flannel, Calico, IPVLAN, and VxLAN, while comparing their architectures, protocols, and configuration considerations.

CNI, Calico, Flannel
0 likes · 12 min read
Efficient Ops
Dec 12, 2022 · Operations

How Bilibili Built a 5‑Year SRE Journey: High‑Availability, Multi‑Active, and Capacity Management

This article chronicles Bilibili's five‑year evolution of Site Reliability Engineering, detailing the introduction of SRE culture, the construction of high‑availability and multi‑active architectures, capacity management with Kubernetes, VPA/HPA, incident case studies, and the ongoing transformation of SRE practices across the organization.

Operations, SRE, high availability
0 likes · 24 min read
Alibaba Cloud Native
Dec 12, 2022 · Cloud Native

How ACK One Enables Multi‑Cluster GitOps and Unified Alert Management

ACK One is a distributed cloud‑native container platform that unifies management of Kubernetes clusters across hybrid‑cloud, edge, and on‑prem environments, offering GitOps‑based multi‑cluster application distribution with ArgoCD integration and a centralized alert‑management system.

Alert Management, ArgoCD, GitOps
0 likes · 9 min read
Huawei Cloud Developer Alliance
Dec 12, 2022 · Cloud Native

How Karmada Powers Multi‑Cloud, Multi‑Cluster Production at Cloud Native Days China 2022

The Karmada community's Cloud Native Days China 2022 session in Nanjing gathered over 30 enterprises and developers to share multi‑cloud, multi‑cluster production practices, large‑scale testing results, and real‑world implementations from Huawei Cloud, vivo, Hurricane Engine, China Mobile, DaoCloud, and Zhejiang University, highlighting Karmada's scalability and ecosystem growth.

Karmada, Multi-Cluster, kubernetes
0 likes · 9 min read
Top Architect
Dec 12, 2022 · Cloud Native

Building a Container Platform at Ximalaya: Practices, Principles, and Evolution

The article chronicles Ximalaya's journey from early Docker-based Java project templates to a mature Kubernetes-driven container platform, detailing development principles, health‑check strategies, deployment workflows, middleware integration, and lessons learned about scaling, automation, and collaborative engineering.

CloudNative, Containerization, DevOps
0 likes · 13 min read
DevOps Cloud Academy
Dec 11, 2022 · Cloud Native

GitOps: The Missing Link for CI/CD on Kubernetes

GitOps leverages Git as an immutable source of truth to streamline CI/CD pipelines for Kubernetes, enhancing productivity, security, and compliance by providing observable, auditable deployments, centralized control, and easy rollbacks, while requiring dedicated tools such as Flux or Weave GitOps Core for full implementation.

DevOps, Flux, GitOps
0 likes · 12 min read
Architect's Guide
Dec 11, 2022 · Cloud Native

The Journey of Containerization at Ximalaya: Practices, Principles, and Lessons Learned

This article recounts Ximalaya's multi‑year containerization effort, detailing the evolution from early Docker templates and Marathon to Kubernetes, the development of internal tools like barge and k8s‑sync, health‑check strategies, deployment patterns, and the practical lessons gained from integrating containers with existing middleware.

Containerization, DevOps, Java
0 likes · 12 min read
Full-Stack DevOps & Kubernetes
Dec 10, 2022 · Cloud Native

Why Kubernetes Pods Fail with “Too Many Open Files” and How to Fix It

The article explains the “Too many open files” error in Kubernetes, clarifies that it refers to exceeding system file‑handle limits, shows how to inspect current usage with ulimit and lsof, and provides step‑by‑step commands to temporarily or permanently raise the limits and troubleshoot the application code.

DevOps, Too many open files, kubernetes
0 likes · 5 min read