Tag

Cluster Management

1 view collected around this technical thread.

DevOps Operations Practice
Jun 16, 2025 · Cloud Native

Mastering Kubernetes: 6 Essential Tools for Cluster Management

This article introduces six indispensable tools—kubectl, Helm, Prometheus + Grafana, Istio, Velero, and K9s—that simplify Kubernetes cluster management by covering resource handling, monitoring, networking, security, backup, and an interactive UI, helping readers efficiently operate production‑grade clusters.

Cluster Management · DevOps · Kubernetes
0 likes · 7 min read
Efficient Ops
May 12, 2025 · Cloud Native

Master Kubernetes Management with Kuboard: Visual UI Guide & Installation

Kuboard is a web‑based visual tool for managing Kubernetes clusters, offering multi‑auth, multi‑cluster support, micro‑service layering, and storage integration; the guide explains Docker installation, adding clusters via KubeConfig, workload inspection, and how the UI simplifies complex command‑line operations.

Cluster Management · Docker · Kubernetes
0 likes · 5 min read
Raymond Ops
Mar 30, 2025 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: 3 Strategies Explained

This article explains three Elasticsearch data‑synchronization methods, compares their pros and cons, and then dives into ES cluster structure, node roles, shard allocation, distributed queries, split‑brain handling, and fault‑tolerance mechanisms, providing a comprehensive guide for developers and ops engineers.

Cluster Management · Data Synchronization · distributed systems
0 likes · 9 min read
Cloud Native Technology Community
Mar 18, 2025 · Cloud Native

Best Practices for Managing Core Services in Large‑Scale Kubernetes Deployments

Scaling Kubernetes across dozens or hundreds of clusters requires standardized core services—networking, security, observability, and automation—so organizations should adopt templated configurations, GitOps tools, centralized monitoring, and automated certificate management to reduce complexity, improve security, and lower operational overhead.

Cluster Management · GitOps · Kubernetes
0 likes · 8 min read
Architect
Dec 27, 2024 · Big Data

Fault Self‑Healing System for Large‑Scale Big Data Clusters

This article describes the design, architecture, and technical implementation of BMR's fault self‑healing platform, which automatically collects data, analyzes failures, defines decision rules, and executes safe recovery workflows to improve reliability and efficiency of massive, heterogeneous big‑data environments.

Cluster Management · automation · big data
0 likes · 16 min read
Bilibili Tech
Dec 10, 2024 · Big Data

Fault Self‑Healing System for Bilibili's Large‑Scale Big Data Cluster (BMR)

Bilibili's fault‑self‑healing platform for its massive BMR big‑data cluster—over 10,000 machines and 1 EB storage—adds near‑real‑time fault discovery, intelligent diagnosis, and automated workflow handling, dramatically cutting resolution time, improving stability across services, and scaling to dozens of daily automated repairs.

BMR · Cluster Management · automation
0 likes · 16 min read
Bilibili Tech
Oct 29, 2024 · Big Data

Bilibili One‑Stop Big Data Cluster Management Platform (BMR): Architecture, Modules, and Future Outlook

Bilibili's One‑Stop Big Data Cluster Management Platform (BMR) unifies cluster, metadata, intelligent operations, and custom managers to oversee 50+ services, 10,000 machines, exabyte storage, and millions of cores, using cloud‑native containers, fault prediction, and resource‑sharing techniques to boost efficiency, stability, and cost savings.

BMR · Cluster Management · DevOps
0 likes · 17 min read
Efficient Ops
Oct 15, 2024 · Operations

Master 9 Essential kubectl Commands for Efficient Kubernetes Management

This guide introduces nine commonly used kubectl commands—get, create, edit, delete, apply, describe, logs, exec, and cp—explaining their purposes, providing practical examples, and offering tips to help system administrators streamline Kubernetes resource management and troubleshooting.

Cluster Management · DevOps · Kubernetes
0 likes · 10 min read
Rare Earth Juejin Tech Community
Aug 6, 2024 · Operations

ZooKeeper Core Concepts: Data Model, Node Types, Sessions, Cluster, Election, ZAB, Watch, ACL, and Distributed Lock Patterns

This article explains ZooKeeper's hierarchical data model, node types, session mechanism, cluster roles and election process, ZAB protocol, watch mechanism, ACL permissions, and common distributed lock implementations, providing a comprehensive overview of its core concepts and practical usage.

ACL · Cluster Management · Coordination Service
0 likes · 17 min read
Bilibili Tech
Jul 19, 2024 · Big Data

Bilibili's One-Stop Big Data Cluster Management Platform (BMR) - Architecture and Implementation

Bilibili’s one‑stop Big Data Cluster Management Platform (BMR) consolidates HDFS, Spark, Flink, ClickHouse, Kafka and other services into a unified system that evolved through four stages—standardization, metadata‑driven construction, containerization, and observability—addressing node consistency, scaling, fault self‑healing, and resource optimization while delivering elastic scaling, automated start/stop, and future cost‑saving and stability enhancements.

Cluster Management · Containerization · Observability
0 likes · 12 min read
DevOps Cloud Academy
Jun 18, 2024 · Operations

Essential kubectl Commands for DevOps Engineers

This guide presents a comprehensive collection of the most important and frequently used kubectl commands, explaining how to retrieve version information, manage clusters, list resources, manipulate contexts, create, update, patch, scale, expose, delete, and debug Kubernetes objects, as well as format output and control verbosity, enabling DevOps engineers to efficiently operate Kubernetes clusters.

Cluster Management · DevOps · Kubernetes
0 likes · 14 min read
Practical DevOps Architecture
Apr 18, 2024 · Cloud Native

Kubernetes Source Code Deep Dive and Secondary Development Course Outline

This curriculum provides a comprehensive, step‑by‑step exploration of Kubernetes internals—including kubeadm core source, Go module management, cobra libraries, kubeadm init/join processes, client‑go components, code generators, custom resources, operators, and practical deployment automation—aimed at mastering cluster setup, configuration, and advanced development.

Client-go · Cluster Management · Kubernetes
0 likes · 10 min read
Practical DevOps Architecture
Feb 26, 2024 · Big Data

Advanced ElasticStack Development and Architecture Course (P6)

This course provides comprehensive, hands‑on training on ElasticSearch, Logstash, Kibana, and the ElasticStack ecosystem, covering advanced development, cluster design, performance tuning, security, and real‑world integration techniques for large‑scale data processing.

Cluster Management · ElasticStack · Performance Optimization
0 likes · 6 min read
Didi Tech
Jan 9, 2024 · Big Data

Introducing Apache Pulsar: Technical Benefits and Solutions for Didi Big Data Messaging System

Apache Pulsar, a cloud‑native distributed messaging platform, solves Didi Big Data’s DKafka bottlenecks by separating compute and storage, using sequential log writes, heterogeneous disks, multi‑level caching, bundle‑based load balancing and automatic scaling, dramatically improving stability while introducing richer monitoring complexity.

Apache Pulsar · Cluster Management · DKafka
0 likes · 17 min read
Efficient Ops
Sep 17, 2023 · Cloud Native

Top 9 Essential Kubernetes Tools to Streamline Your Cloud‑Native Workflows

Explore nine indispensable Kubernetes tools—including Kubie, Kubespray, Helm, Minikube, K3s, Kustomize, KOps, Prometheus, and krew—that simplify cluster management, accelerate deployments, and enhance efficiency, helping you choose the right solution for smoother, more productive cloud‑native operations.

Cluster Management · Helm · Kubernetes
0 likes · 6 min read
Aikesheng Open Source Community
Jul 3, 2023 · Databases

Replacing OCP Nodes Using the ANTMAN Tool in OceanBase Cloud Platform

This article provides a step‑by‑step guide on how to replace OceanBase Cloud Platform (OCP) nodes using the ANTMAN tool, covering environment preparation, configuration adjustments, execution of management scripts, tenant migration, cleanup of old services, and troubleshooting tips for a seamless database cluster upgrade.

ANTMAN · Cluster Management · Docker
0 likes · 25 min read
Test Development Learning Exchange
Jun 29, 2023 · Cloud Native

Essential Kubernetes Commands for Testers: 50 Commands with Practical Examples

This article presents a comprehensive collection of 50 essential kubectl commands covering cluster, namespace, pod, deployment, service, ConfigMap, secret, volume, logging, debugging, scaling, configuration, and cleanup operations, providing testers with practical examples to efficiently manage and troubleshoot Kubernetes environments.

Cluster Management · DevOps · Kubernetes
0 likes · 9 min read
High Availability Architecture
May 26, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster Resource Scheduling

This article introduces Amiya, a self‑developed overcommit component that dynamically increases YARN memory and vCore capacity on Bilibili's offline big‑data clusters. It describes the component's architecture, explains how its overcommit, eviction, and mixed‑deployment strategies are implemented, and evaluates its impact on resource utilization.

Cluster Management · Overcommit · Resource Scheduling
0 likes · 22 min read
Bilibili Tech
May 23, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster

Amiya, a self‑developed dynamic over‑commit component for Bilibili’s offline big‑data cluster, inflates reported resources on under‑utilized nodes and adjusts them when load rises, adding roughly 683 TB of memory and 137 k vCores, boosting per‑node memory by 15 % and CPU usage by over 20 % while keeping eviction rates below 3 %.

Amiya · Bilibili · Cluster Management
0 likes · 22 min read