Tagged articles
4063 articles
Open Source Linux
Nov 20, 2023 · Cloud Native

Master Helm: One‑Click Kubernetes Deployments and Management Guide

This article explains how Helm, the Kubernetes package manager, simplifies deploying multiple micro‑service applications with a single command, covering its core features, workflow, key concepts, step‑by‑step usage, release management, installation order, and a comprehensive command reference.

DevOps · chart · cloud-native
0 likes · 14 min read
Alibaba Cloud Native
Nov 18, 2023 · Cloud Native

How eBPF Powers Next‑Gen Observability and Root‑Cause Analysis in Kubernetes

This talk explains the three major observability challenges in Kubernetes, demonstrates how eBPF enables comprehensive, low‑overhead data collection across all stack layers, and outlines a practical workflow that combines architecture awareness, application‑level metrics, and fault‑tree analysis to achieve automated root‑cause diagnosis.

Fault Diagnosis · eBPF · kubernetes
0 likes · 21 min read
NetEase Cloud Music Tech Team
Nov 17, 2023 · Cloud Native

Cloud Music FinOps Practice: Building Enterprise Cloud Cost Management Platform

NetEase Cloud Music’s self‑built FinOps platform tackles rising cloud spend by unifying cost data, visualizing and allocating expenses, rating resource utilization, and empowering platform providers, business units, and developers with data‑driven governance to curb the Andy‑Bill effect and enable scalable, long‑term cost control.

Cloud Cost Management · Container Governance · FinOps
0 likes · 8 min read
iQIYI Technical Product Team
Nov 17, 2023 · Big Data

Mixed Workload Co-location of Big Data and Online Services at iQIYI: Design, Implementation, and Results

iQIYI’s mixed‑workload system colocates Spark/Hive big‑data jobs with online video services by running YARN NodeManagers inside Kubernetes, using an Elastic YARN Operator, Koordinator‑driven CPU oversubscription, and remote shuffle, boosting online CPU utilization from ~9 % to over 40 % and saving tens of millions of RMB annually.

Big Data · Mixed Workload · YARN
0 likes · 19 min read
Open Source Linux
Nov 17, 2023 · Databases

How Large Linux Pages Boost Database Performance on Kubernetes

This article explains how using larger Linux page sizes—especially 2 MB hugepages—dramatically improves database throughput on Kubernetes nodes by reducing TLB cache misses, and provides practical guidance on configuring hugepages, disabling transparent hugepages, and sizing resources for optimal performance.

Database Performance · Linux · TLB
0 likes · 13 min read
dbaplus Community
Nov 16, 2023 · Cloud Native

Master Kubernetes Troubleshooting: From Pod Failures to DNS Issues

This guide walks through a systematic approach to diagnosing Kubernetes problems, covering pod startup failures, cluster health checks, event logs, pod status, network and storage verification, container logs, DNS service checks, and provides practical commands and tips for each step.

Cluster · DNS · Pod
0 likes · 10 min read
Alibaba Cloud Native
Nov 10, 2023 · Big Data

Scaling Spark on Kubernetes: Elastic Compute, Cost Savings, and Storage Decoupling

MiHoYo’s data platform team details their migration of Spark workloads to Alibaba Cloud’s ACK Kubernetes service, describing how the Spark‑on‑K8s + OSS‑HDFS architecture delivers elastic compute, up to 50% cost reduction, and true compute‑storage separation, while addressing operational challenges through custom operators, Celeborn, and robust monitoring.

Big Data · Spark · Storage Decoupling
0 likes · 24 min read
MaGe Linux Operations
Nov 9, 2023 · Cloud Native

Docker vs Kubernetes: Which Container Solution Fits Your Needs?

This article explains Docker and Kubernetes fundamentals, compares their architectures, highlights Docker's achievements, traces the evolution of container technology, and clarifies why Kubernetes is essential while addressing common misconceptions about Docker’s role in modern cloud‑native environments.

Containerization · DevOps · cloud-native
0 likes · 15 min read
AntTech
Nov 8, 2023 · Artificial Intelligence

Kapacity V0.2 Release: AI‑Driven Traffic‑Based Replica Prediction for Cloud‑Native Autoscaling

Kapacity V0.2 introduces an AI‑powered, traffic‑driven replica prediction algorithm for cloud‑native autoscaling, featuring a Linear‑Residual model, a lightweight Swish Net time‑series forecaster, custom metric support, and open‑source tools, aiming to improve resource efficiency and reduce operational risk.

AI · Predictive Autoscaling · capacity planning
0 likes · 9 min read
Open Source Linux
Nov 7, 2023 · Cloud Native

How to Deploy and Test Multus CNI for Multi‑Network Pods in Kubernetes

This guide explains the background, architecture, and step‑by‑step deployment of Multus CNI in a Kubernetes cluster, including configuring Calico and Flannel as primary and secondary networks, creating network attachment definitions, and testing pod connectivity across multiple interfaces.

Calico · Flannel · Multus CNI
0 likes · 21 min read
Alibaba Cloud Native
Nov 6, 2023 · Cloud Native

Mastering Loose‑Mode Traffic Swimlanes in Alibaba Cloud Service Mesh (ASM)

This guide walks you through configuring loose‑mode traffic swimlanes in Alibaba Cloud Service Mesh (ASM), covering prerequisites, sample service deployment, swimlane group and lane creation, automatic generation of DestinationRule and VirtualService resources, routing rule setup, and step‑by‑step verification of full‑link gray release.

ASM · Loose Mode · Swimlane
0 likes · 20 min read
MaGe Linux Operations
Nov 5, 2023 · Cloud Native

How to Deploy and Test Multus CNI for Multi‑Network Pods in Kubernetes

This guide explains why Multus CNI is needed for multi‑network pods in Kubernetes, describes its architecture, walks through installing Multus alongside Calico and Flannel, shows how to configure NetworkAttachmentDefinitions, adjust Calico’s NIC selection, and demonstrates testing pod connectivity and routing limitations.

Calico · Flannel · Multus CNI
0 likes · 22 min read
DataFunTalk
Nov 5, 2023 · Cloud Native

Cloud‑Native Storage Acceleration: Experience and Practices with CloudFS on Volcano Engine

This article presents the cloud‑native storage acceleration demands, evaluates what constitutes a good acceleration solution, and details the design, implementation, and real‑world practice of CloudFS—including metadata acceleration, data‑plane caching, FUSE enhancements, AI training and multi‑cloud data‑lake use cases—while outlining future roadmap plans.

AI · CloudFS · big-data
0 likes · 15 min read

How Cloud‑Native Transforms Big Data Platforms: Challenges, Solutions, and Future Trends

This article analyzes the rise of cloud‑native technologies in big data ecosystems, identifies key pain points such as resource scheduling, service capabilities, performance, and operations, and presents detailed technical explorations—including Volcano batch scheduling, Kyuubi serverless, vectorized computing, remote shuffle services, and storage‑compute separation—while outlining future development directions.

Serverless · cloud-native · kubernetes
0 likes · 23 min read
Tencent Music Tech Team
Oct 31, 2023 · Cloud Native

Advanced Istio Best Practices – Locality Routing and Service Mesh Optimization

The article by delphisfang offers a concise, step‑by‑step guide to mastering Istio’s locality‑aware routing, explaining the three‑evidence learning method, the priority algorithm, required DestinationRule and outlier detection settings, how Envoy discovers locality, and tips for simplifying the Pilot‑Envoy mesh architecture.

Best Practices · Envoy · Istio
0 likes · 17 min read
MaGe Linux Operations
Oct 27, 2023 · Cloud Native

Deploy Grafana and Prometheus on Kubernetes in Minutes

This guide walks you through preparing a Kubernetes cluster, creating deployment manifests, configuring Grafana and Prometheus, and verifying the monitoring setup, including code snippets and step‑by‑step commands for a seamless installation on a lightweight cloud server.

DevOps · Grafana · Prometheus
0 likes · 7 min read
Ops Development Stories
Oct 27, 2023 · Cloud Native

Collect Kubernetes Logs with OpenTelemetry and Loki Using Helm

This guide walks through deploying Loki via Helm, configuring the OpenTelemetry Collector to use a filelog receiver and Loki exporter, and enabling Kubernetes event collection, providing step‑by‑step commands and YAML snippets for a complete logging pipeline in a Kubernetes cluster.

Collector · Logging · Loki
0 likes · 17 min read
Cloud Native Technology Community
Oct 26, 2023 · Cloud Native

Understanding Kubernetes Validating Admission Policies with Practical Examples

This article explains Kubernetes Admission Controllers, distinguishes Mutating and Validating types, introduces the native Validating Admission Policies feature using CEL expressions, and provides a step‑by‑step demonstration with YAML manifests and kubectl commands to enforce replica limits on deployments.

Admission Controllers · CEL · Validating Admission Policies
0 likes · 11 min read
Sohu Tech Products
Oct 25, 2023 · Cloud Native

Strategies for Rolling Restart of Pods During Istio Service Mesh Upgrade

To upgrade an Istio service mesh without overloading the cluster or causing downtime, the author recommends using Kubernetes’s built‑in kubectl rollout restart for each deployment—scaling replicas up then deleting old pods or simply invoking the command in a scripted loop—to safely perform a rolling restart of all sidecar‑proxied pods.

DevOps · Istio · Pod Restart
0 likes · 8 min read
MaGe Linux Operations
Oct 25, 2023 · Cloud Native

Deploy a Typecho Blog on Kubernetes: Step‑by‑Step Guide with MySQL

This tutorial walks you through preparing a Kubernetes cluster, deploying MySQL and Typecho containers with detailed YAML configurations, creating the necessary services and ingress, testing the setup, and highlighting Kubernetes' high‑availability and auto‑scaling benefits for a reliable blog platform.

MySQL · Typecho · YAML
0 likes · 8 min read
Efficient Ops
Oct 24, 2023 · Operations

How to Monitor Business Metrics with Prometheus in Kubernetes

This article explains how to use Prometheus to monitor business‑level metrics in a Kubernetes environment, covering observability fundamentals, metric definitions, metric types, exposing metrics via a /metrics endpoint, and practical Go code examples for defining, recording, and scraping custom metrics.

Go · Metrics · Observability
0 likes · 11 min read
Alibaba Cloud Native
Oct 24, 2023 · Cloud Native

Boost Cluster Efficiency with Koordinator’s K8s‑YARN Co‑Location Solution

Koordinator extends its open‑source container scheduler to enable seamless co‑location of Kubernetes Pods and Hadoop YARN tasks, allowing over‑provisioned batch resources to be shared without modifying YARN, and has delivered up to 10 % CPU utilization gains and sub‑1 % eviction rates in Xiaohongshu’s production clusters.

Cluster Scheduling · kubernetes · resource management
0 likes · 9 min read
Huolala Tech
Oct 23, 2023 · Information Security

How Huolala Secures Kubernetes: Real-World Container Security Practices

This article details Huolala's end‑to‑end container security strategy—from Kubernetes component basics and a real unauthorized‑access incident to lifecycle‑based safeguards, threat‑matrix guidance, image/ecosystem/baseline/runtime protections, and a custom HIDS architecture—offering practical insights for cloud‑native environments.

DevSecOps · HIDS · Threat Modeling
0 likes · 14 min read
Efficient Ops
Oct 22, 2023 · Operations

Master Loki: Deploy, Configure, and Query Logs Efficiently

This guide explains Loki's core concepts, deployment steps for Promtail and Loki, Grafana integration, label‑based indexing, handling dynamic and high‑cardinality tags, and query optimization techniques, providing a complete roadmap for building a cost‑effective, scalable log aggregation system.

Grafana · Logging · Loki
0 likes · 15 min read
MaGe Linux Operations
Oct 22, 2023 · Cloud Native

Kubernetes Ingress vs OpenShift Route: Key Differences and How to Use Them

This article compares Kubernetes Ingress and OpenShift Route, outlining their similar functions for exposing services, detailing their architectures, configuration steps, and highlighting essential differences such as ecosystem integration, syntax, and security features, while providing practical examples and code snippets for implementation.

OpenShift · cloud-native · kubernetes
0 likes · 9 min read
MaGe Linux Operations
Oct 22, 2023 · Cloud Native

Automate Kubernetes Deployments: Step‑by‑Step Jenkins Pipeline Guide

Learn how to connect Jenkins Pipeline with Kubernetes to automate building, testing, and deploying containerized applications, covering prerequisite setup, detailed pipeline stages—including code checkout, Docker image creation, testing, registry push, and Kubernetes deployment—complete with code snippets and configuration tips.

CI/CD · Jenkins · automation
0 likes · 4 min read
Liangxu Linux
Oct 22, 2023 · Databases

How Huge Linux Pages Can Boost Database Throughput on Kubernetes by Up to 8×

This article explains how Linux page size—from the default 4 KB to 2 MB or 1 GB huge pages—affects database performance, details the role of TLB cache hits and misses, presents benchmark results showing up to an eight‑fold throughput increase, and offers practical guidance for configuring huge pages on Kubernetes nodes.

Database Performance · TLB · hugepages
0 likes · 14 min read
Alibaba Cloud Native
Oct 20, 2023 · Cloud Native

How Knative Cuts AI Service Costs by 60% and Halves Deployment Time

This article explains how Shuhe Tech combined Knative with AI workloads to achieve 60% resource cost savings and reduce model deployment cycles from one day to half a day, detailing Knative's architecture, request‑based autoscaling, multi‑version releases, and advanced scaling features.

AI · KPA · Knative
0 likes · 19 min read
Didi Tech
Oct 19, 2023 · Cloud Native

Design and Implementation of a New Tiered Resource Guarantee System for Elastic Cloud Containers

The new tiered resource‑guarantee system for Didi’s elastic cloud containers defines S, A, and B priority levels with explicit over‑commit rules, upgrades OS, Kubernetes, kube‑odin, service‑tree, and CMP components, and thereby cuts CPU contention by up to 80%, reduces latency, improves scaling reliability, and lowers operational costs.

Container Management · Overcommit · kubernetes
0 likes · 16 min read
Efficient Ops
Oct 18, 2023 · Cloud Native

Why Does Containerd’s PLEG Relisting Stall at Node Startup and How to Fix It

When replacing dockershim with containerd, we observed that pods take over a minute to start because the GenericPLEG Relisting operation stalls for more than 30 seconds during node boot, caused by containerd’s UpdateContainerResources holding a bbolt lock and intensive image pulls; the article explains the root cause and provides a fix using the overlay volatile mount option.

PLEG · container-runtime · containerd
0 likes · 16 min read
MaGe Linux Operations
Oct 17, 2023 · Databases

How Large Linux Pages Can Boost Database Throughput on Kubernetes by Up to 8×

This article explains how Linux page size, especially 2 MB or 1 GB huge pages, dramatically improves database throughput on Kubernetes nodes by reducing TLB misses and optimizing memory access, showing up to an eight‑fold increase over the default 4 KB pages, and provides practical guidance for configuring huge pages in various environments.

Database · Linux · hugepages
0 likes · 12 min read
DevOps Cloud Academy
Oct 14, 2023 · Cloud Native

Introducing Kargo: A Multi‑Stage Application Orchestration Platform for CI/CD on Kubernetes

The article explains how Kargo, an open‑source, GitOps‑based platform built on Argo CD experience, addresses the complexities of multi‑stage CI/CD pipelines in Kubernetes by providing declarative stage definitions, promotion workflows, and advanced delivery features such as canary releases and A/B testing.

Argo CD · Continuous Delivery · DevOps
0 likes · 12 min read
MaGe Linux Operations
Oct 13, 2023 · Cloud Native

How Kubernetes Transforms Cloud‑Native Application Deployment and Management

This article explains what Kubernetes (K8s) is, its core features such as portability, scalability and automation, explores enterprise use cases, resource estimation, service migration, deployment evolution, cloud‑native concepts, and details the master‑node architecture and components that enable efficient container orchestration.

DevOps · cloud-native · container orchestration
0 likes · 9 min read
DataFunSummit
Oct 13, 2023 · Big Data

Practical Experience of Flink on Kubernetes at Kuaishou

This article presents Kuaishou's comprehensive journey of adopting Flink on Kubernetes, covering its background, evolution, architecture, production migration, observability, testing, and future plans, and demonstrates how large‑scale streaming workloads are transformed to a cloud‑native environment.

Big Data · Flink · Migration
0 likes · 14 min read
Volcano Engine Developer Services
Oct 12, 2023 · Cloud Native

How ByteDance’s Katalyst Memory Advisor Boosts Kubernetes Memory Efficiency

This article explains the challenges of memory management in mixed workloads, outlines the limitations of native Linux and Kubernetes mechanisms, and details how ByteDance’s open‑source Katalyst Memory Advisor improves memory utilization, QoS, and eviction handling through user‑space policies, multi‑dimensional interference detection, and adaptive mitigation actions.

Katalyst · kubernetes · memory management
0 likes · 17 min read
Ops Development Stories
Oct 12, 2023 · Cloud Native

How to Monitor Kubernetes with OpenTelemetry Collector: Step‑by‑Step Helm Deployment

This guide walks through installing OpenTelemetry Collector on a Kubernetes cluster using Helm, configuring DaemonSet and Deployment collectors, integrating Prometheus for metrics, and customizing receivers, processors, and exporters to achieve comprehensive observability of nodes, pods, containers, and cluster resources.

Observability · OpenTelemetry · Prometheus
0 likes · 26 min read
ByteDance Cloud Native
Oct 11, 2023 · Cloud Native

How Katalyst Memory Advisor Optimizes Kubernetes Memory Management in Mixed Workloads

This article explains the challenges of memory management in mixed Kubernetes workloads, introduces ByteDance's open‑source Katalyst Memory Advisor, details native allocation and reclamation mechanisms, outlines its architecture and plugins, and describes interference detection and multi‑level mitigation strategies to improve memory utilization and service quality.

Katalyst · cloud-native · kubernetes
0 likes · 19 min read
DevOps Cloud Academy
Oct 11, 2023 · Cloud Native

A/B Testing with Argo Rollouts Experiments for Progressive Delivery

This article explains how to perform data‑driven A/B testing in progressive delivery using Argo Rollouts Experiments, covering the concepts of progressive delivery, A/B testing fundamentals, the Argo Rollouts architecture, required Kubernetes resources, and step‑by‑step commands and YAML manifests for a weather‑app example.

A/B testing · Argo Rollouts · Progressive Delivery
0 likes · 19 min read
DevOps
Oct 10, 2023 · Operations

Common Kubernetes Pod Issues and Troubleshooting Guide

This article outlines typical Kubernetes pod failure states such as ContainerCreating, ErrImagePull, Pending, CrashLoopBackOff, and UnexpectedAdmissionError, explains their common causes—including Docker service problems, storage mount errors, ConfigMap misconfigurations, and image issues—and provides practical troubleshooting steps and example manifests.

ConfigMap · Docker · NFS
0 likes · 10 min read
Efficient Ops
Oct 9, 2023 · Cloud Native

Why Do Kubernetes Pods Get Stuck? Decoding Common Pod Status Errors

Learn how to diagnose and resolve frequent Kubernetes pod status issues such as ContainerCreating, ErrImagePull, Pending, CrashLoopBackOff, and UnexpectedAdmissionError by examining Docker services, storage mounts, ConfigMaps, image repositories, and node resources, with practical examples and command‑line solutions.

ConfigMap · ContainerCreating · ErrImagePull
0 likes · 9 min read
Ximalaya Technology Team
Oct 9, 2023 · Artificial Intelligence

DeepRec-Based High-Dimensional Sparse Feature Support and Real-Time Model Training in Ximalaya AI Cloud

Ximalaya AI Cloud leverages DeepRec’s Embedding Variable to elastically manage high‑dimensional sparse features with low collision, supporting admission/eviction, multi‑level storage and minute‑level incremental model updates, which together boost GPU utilization, halve training time and improve recommendation CTR by 2‑3 % while maintaining latency.

AI cloud · DeepRec · TensorFlow
0 likes · 13 min read
Java Backend Technology
Oct 8, 2023 · Operations

How I Traced a Sudden CPU Spike to JVM GC Issues in a Container

After receiving an alarm that a production container’s CPU usage surged past 90%, I investigated the JVM metrics, discovered excessive young and full GCs in a single pod, and walked through the detailed troubleshooting steps—including top, thread analysis, jstack, and code fixes—that resolved the issue.

CPU Spike · JVM · Java
0 likes · 7 min read
Full-Stack DevOps & Kubernetes
Oct 7, 2023 · Cloud Native

How the US Air Force U‑2 Spy Plane Uses Jenkins & Kubernetes for Automated CI/CD

The US Air Force’s U‑2 reconnaissance aircraft has adopted a Jenkins‑driven CI/CD pipeline orchestrated by Kubernetes, enabling automated builds, repeatable deployments, rapid threat response, enhanced security, and resource savings, with a detailed step‑by‑step case study illustrating code management, pipeline configuration, container deployment, testing, and RBAC controls.

DevOps · Jenkins · automation
0 likes · 7 min read
DevOps Cloud Academy
Oct 5, 2023 · Cloud Native

Balancing Kubernetes Workloads with the Descheduler and Related Tools

This article explains why Kubernetes does not automatically rebalance pods, demonstrates how to use the Descheduler, Node Problem Detector, and Cluster Autoscaler together to detect node pressure, evict overloaded pods, and scale down underutilized nodes for improved cluster efficiency.

Cluster Autoscaler · Descheduler · Node Problem Detector
0 likes · 7 min read
21CTO
Oct 4, 2023 · Artificial Intelligence

How LangStream Merges Data Streams with Generative AI for Real‑Time LLM Apps

LangStream, the new open‑source framework from DataStax, combines event‑driven data streaming with generative AI, offering seamless integration with vector databases like Astra DB, Milvus, and Pinecone, and providing a Kubernetes‑based runtime that enables real‑time LLM applications without extensive coding.

Data Streaming · LLM · LangStream
0 likes · 7 min read
Tencent Cloud Developer
Sep 28, 2023 · Cloud Computing

Cloud Studio: Building Tencent Cloud's Cloud-Based Development Environment - A Self-Hosting Case Study

Tencent Cloud’s Cloud Studio team migrated its fragmented monorepo workflow to a self‑hosted, Kubernetes‑based cloud development environment, unifying code in a trunk‑based repository, delivering pre‑warmed IDE images, fast Git and startup performance, robust security, and laying groundwork for containerized debugging, vGPU AI training, and seamless cloud‑native development.

Cloud IDE · Cloud Studio · DevOps
0 likes · 19 min read
37 Interactive Technology Team
Sep 25, 2023 · Cloud Native

Investigation of Kubernetes Container Isolation Mechanism and Its Impact

The article investigates a cloud‑vendor Kubernetes isolation feature that inserts iptables DROP rules into a pod’s network namespace, demonstrating how it fully blocks traffic, triggers liveness‑probe restarts, and affects services differently depending on replica count and probe configuration; container state is preserved only when no probes are configured.

LivenessProbe · container security · iptables
0 likes · 7 min read
Alibaba Cloud Native
Sep 24, 2023 · Cloud Computing

Designing Highly Available Cloud‑Native Applications on Alibaba Cloud ACK

This article explains how to build robust, highly available cloud‑native applications on Alibaba Cloud Container Service for Kubernetes (ACK) by covering architecture principles, multi‑zone cluster design, Kubernetes HA features such as topology spread constraints and pod anti‑affinity, storage strategies, load‑balancing, virtual nodes, health probes, monitoring, and multi‑cluster deployment patterns.

ACK · Pod AntiAffinity · Topology Spread Constraints
0 likes · 35 min read
Alibaba Cloud Native
Sep 21, 2023 · Cloud Native

How Alibaba Cloud’s SAE Achieves High Stability with Diagnostic Engines and Probes

This article explains how Alibaba Cloud's Serverless Application Engine (SAE) builds end‑to‑end stability by dividing fault handling into prevention, detection, localization and recovery, using a Kubernetes‑based diagnostic engine, runtime availability probes, a unified alert center, and a plug‑in architecture for root‑cause analysis.

Observability · Serverless · Stability
0 likes · 28 min read
TAL Education Technology
Sep 21, 2023 · Cloud Native

Kubernetes Development Practice: Code Compilation and Image Building

This guide walks through preparing the hardware and software environment, cloning the Kubernetes source, checking out a specific tag, compiling the code, building release images with appropriate build parameters, extracting and loading the images, and updating static manifests for custom Kubernetes deployments.

Docker · Image Build · cloud-native
0 likes · 8 min read
MaGe Linux Operations
Sep 19, 2023 · Cloud Native

Container Runtimes Explained: low‑level vs high‑level (containerd, CRI‑O, Docker)

This article outlines the architecture and functions of container runtimes, detailing low‑level, high‑level, and sandbox types, and compares major implementations such as runC, containerd, CRI‑O, and Docker, highlighting their components, image handling, networking, storage, and integration with Kubernetes.

CRI-O · cloud-native · container runtimes
0 likes · 26 min read
Didi Tech
Sep 19, 2023 · Cloud Native

OrangeFS: A Cloud‑Native Multi‑Protocol Distributed Data Lake Storage System

OrangeFS is Didi’s cloud‑native, multi‑protocol distributed data‑lake storage system that unifies POSIX, S3 and HDFS access on a single logical hierarchy, integrates with Kubernetes via a CSI plugin, supports on‑premise and public‑cloud backends, provides multi‑tenant isolation, and dramatically improves elasticity, utilization and latency for petabyte‑scale workloads such as ride‑hailing logs, machine‑learning training, finance and analytics.

CSI · Cloud Native Storage · Data Lake
0 likes · 17 min read
Efficient Ops
Sep 17, 2023 · Cloud Native

Top 9 Essential Kubernetes Tools to Streamline Your Cloud‑Native Workflows

Explore nine indispensable Kubernetes tools—including Kubie, Kubespray, Helm, Minikube, K3s, Kustomize, KOps, Prometheus, and krew—that simplify cluster management, accelerate deployments, and enhance efficiency, helping you choose the right solution for smoother, more productive cloud‑native operations.

Cluster Management · Prometheus · cloud-native
0 likes · 6 min read
MaGe Linux Operations
Sep 16, 2023 · Cloud Native

How to Diagnose and Fix Pod Network Issues in Kubernetes Clusters

This article introduces a systematic approach for troubleshooting Kubernetes pod network anomalies, classifies common failure types, presents essential tools such as tcpdump, mtr, nsenter and paping, and walks through real‑world case studies to pinpoint and resolve connectivity problems.

CNI · kubernetes · mtr
0 likes · 26 min read
Alibaba Cloud Native
Sep 16, 2023 · Cloud Native

Decoding Istio Ambient Mesh: Full Pod‑to‑Pod Traffic Path Explained

This article provides a step‑by‑step technical walkthrough of Istio Ambient Mesh traffic flow, detailing how a curl request from a sleep pod on Node‑A reaches an httpbin pod on Node‑B via iptables, policy routing, ztunnel and waypoint components, complete with code snippets and diagrams.

Ambient Mesh · Istio · iptables
0 likes · 27 min read
dbaplus Community
Sep 14, 2023 · Cloud Native

Mastering Kubernetes: 30+ Essential Pod, Node, and Cluster Troubleshooting Techniques

This guide compiles over thirty practical Kubernetes troubleshooting steps, covering pod startup failures, networking issues, resource bottlenecks, node abnormalities, cluster‑wide service problems, and detailed explanations of common container exit codes to help operators quickly diagnose and resolve issues.

Container exit codes · Node diagnostics · Pod troubleshooting
0 likes · 22 min read
Efficient Ops
Sep 11, 2023 · Cloud Native

Why Multi-Cluster Kubernetes Matters and How Vivo Tackles It

This article examines the motivations, benefits, and existing solutions for Kubernetes multi‑cluster management, then details Vivo's non‑federated and federated approaches, application‑centric continuous delivery, elastic scaling, unified scheduling, gray‑release strategies, and summarizes the current state and challenges.

DevOps · Karmada · Multi-Cluster
0 likes · 22 min read
Architect
Sep 11, 2023 · Databases

How eBay Scaled ClickHouse with Read/Write Separation and Keeper

This article details eBay's event monitoring platform architecture, explains the challenges of high‑load OLAP workloads on ClickHouse clusters, describes the design and implementation of read/write separation and multi‑shard Keeper coordination, and shares concrete configuration snippets, performance observations, and production lessons learned.

ClickHouse · Distributed Systems · Keeper
0 likes · 20 min read
DataFunSummit
Sep 11, 2023 · Big Data

eBay's Cloud‑Native Kafka Big Data Platform: Disaster Recovery and High‑Availability Practices

This article details eBay's implementation of a cloud‑native Kafka platform on Kubernetes, covering operational challenges, K8s Operator deployment, single‑ and multi‑data‑center high‑availability designs, anti‑affinity strategies, automated failover components, and future work on remote storage for Kafka.

Big Data · Kafka · disaster recovery
0 likes · 14 min read
DataFunSummit
Sep 10, 2023 · Cloud Native

An Overview of Curve: High‑Performance Cloud‑Native Distributed Storage System

Curve is a high‑performance, easy‑to‑operate, cloud‑native open‑source distributed storage system (CNCF Sandbox) that provides block and file storage for OpenStack, Kubernetes, and PolarFS, featuring Raft‑based consistency, hybrid storage, high availability, and an ongoing roadmap for AI and other workloads.

Curve · Raft · block storage
0 likes · 16 min read
MaGe Linux Operations
Sep 8, 2023 · Cloud Native

Master Real-Time Kubernetes Log Viewing with Kubetail and Stern

Learn how to efficiently monitor multiple Kubernetes pods by installing and using two lightweight, real‑time log aggregation tools—Kubetail and Stern—including installation steps for Homebrew, Linux, and Zsh, command‑line options, color output, and practical usage examples.

Log Monitoring · Operations · cloud-native
0 likes · 12 min read
Ops Development Stories
Sep 8, 2023 · Cloud Native

Why Containerd Beats Docker: Understanding Container Runtimes in Kubernetes

Container runtimes are essential for managing containers in Kubernetes, and this article explains their core functions, compares Docker and containerd, details the CRI interface, explores supported backends, and provides practical command references to help you choose the optimal runtime for cloud‑native deployments.

CRI · Docker · container-runtime
0 likes · 11 min read
Architect
Sep 7, 2023 · Cloud Native

How Vivo Scaled Container Monitoring with Prometheus, Kafka, and VictoriaMetrics

This article details how Vivo's container platform faced exploding metric volumes, component overload, data gaps, and storage spikes, and explains the step‑by‑step architectural redesign, metric governance, performance tuning, cAdvisor redeployment, and VictoriaMetrics upgrade that restored high‑availability, low‑latency monitoring across a large Kubernetes fleet.

Observability · Prometheus · VictoriaMetrics
0 likes · 18 min read
Alibaba Cloud Native
Sep 7, 2023 · Cloud Native

Unlock Real‑Time Container Network Monitoring with KubeSkoop’s eBPF Probes

This article explains how KubeSkoop leverages eBPF to provide low‑overhead, pod‑level network monitoring and real‑time diagnostics for Kubernetes clusters, covering packet flow fundamentals, traditional troubleshooting tool limitations, the exporter’s probe architecture, daily monitoring practices, and future development plans.

Grafana · KubeSkoop · Network Monitoring
0 likes · 22 min read
Alibaba Cloud Native
Sep 7, 2023 · Cloud Native

Access On-Premises Data from Alibaba Cloud ECI with ACK Fluid & MinIO

This guide walks through using ACK Fluid to connect Alibaba Cloud Elastic Container Instance (ECI) pods with on‑premises MinIO storage, covering prerequisites, deployment of MinIO, building a custom ThinRuntime image, creating Fluid profiles and datasets, and accessing data via a PVC‑mounted pod.

ACK Fluid · ECI · Minio
0 likes · 17 min read
37 Interactive Technology Team
Sep 7, 2023 · Cloud Native

Design and Implementation of the kjob Asynchronous Task Scheduling Platform on Kubernetes

The 37Game team built the cloud‑native kjob platform to replace VM‑based schedulers, providing a unified, highly available Kubernetes solution that manages both CronJob‑style scheduled tasks and long‑running Deployments through a backend‑agent architecture, offering CRUD operations, rich configuration, real‑time monitoring, alerting, and seamless migration.

Asynchronous Jobs · Cloud-native · Go
0 likes · 15 min read
Huolala Safety Emergency Response Center
Sep 7, 2023 · Information Security

How Huolala Secured Its Kubernetes Workloads: A Deep Dive into Container Security Practices

This article details Huolala's comprehensive container‑security program, covering Kubernetes component basics, a real‑world unauthorized‑access incident, a lifecycle‑based security framework, the Microsoft threat matrix, and the design of a home‑grown HIDS architecture to protect cloud‑native workloads.

DevSecOps · HIDS · Threat Matrix
0 likes · 12 min read
Cloud Native Technology Community
Sep 7, 2023 · Information Security

Kubernetes Security Testing: Importance, Methods, and Best Practices

This article explains why security testing is critical for Kubernetes clusters, outlines key testing approaches such as SAST, DAST, container image scanning, configuration audits, and network policy testing, and provides practical steps for integrating these methods into CI/CD pipelines to ensure robust cloud‑native security.

Configuration Audit · Container Scanning · DAST
0 likes · 9 min read