Tagged articles
8 articles
Raymond Ops
Dec 17, 2025 · Operations

Build a Production‑Ready Prometheus HA Architecture with Federation and Remote Storage

Learn how to design and implement a robust, production‑grade Prometheus high‑availability solution using a federated global cluster, multiple business‑level instances, remote storage with Thanos or VictoriaMetrics, Docker‑Compose deployment, health‑check scripts, performance metrics, alerting rules, and best‑practice operational guidelines.

Docker-Compose · Federation · Remote Storage
17 min read
dbaplus Community
Dec 10, 2023 · Big Data

How Bilibili Built a Remote State Backend for Flink Using Taishan KV Store

This article explains Bilibili's design and implementation of a remote state backend for Flink, detailing the motivations, pain points of the existing RocksDBStateBackend, the architecture of TaishanStateBackend, and the performance optimizations applied to achieve storage‑compute separation and faster rescaling.

Big Data · Flink · Remote Storage
21 min read
Big Data Technology & Architecture
May 11, 2023 · Big Data

Remote State Backend for Flink: Design, Optimization, and Deployment with Taishan KV Store

This article describes the motivation, challenges, design, and performance optimizations of a remote state backend for Flink that leverages Bilibili's Taishan distributed KV store to achieve storage‑compute separation, lighter checkpoints, faster rescaling, and improved resource utilization in large‑scale streaming jobs.

Big Data · Flink · Performance Optimization
20 min read
Open Source Linux
Jan 5, 2022 · Operations

Designing Scalable High‑Availability Prometheus Architectures

This article explains how to build both small‑scale and large‑scale high‑availability Prometheus setups using local and remote storage, federation, keepalived, and PostgreSQL + TimescaleDB adapters to ensure reliable monitoring and alerting across growing infrastructures.

Federation · Prometheus · Remote Storage
6 min read
MaGe Linux Operations
Dec 1, 2021 · Operations

Scalable High‑Availability Prometheus: Small‑Scale to Massive Deployments

This article explains how Prometheus’s local storage limits scalability, and how remote storage, federation, and high‑availability setups (dual instances, keepalived, and adapters backed by PostgreSQL + TimescaleDB) can overcome data‑persistence and performance challenges in both small‑scale and large‑scale monitoring environments.

Federation · Prometheus · Remote Storage
5 min read
dbaplus Community
Jul 23, 2019 · Cloud Native

How Xiaomi Scaled Kubernetes Monitoring with Prometheus and Open‑Falcon

This article details the challenges Xiaomi's Ocean elastic scheduling platform faced in monitoring massive Kubernetes clusters, the move from Open‑Falcon to a Prometheus‑based solution with remote storage, partitioned deployment strategies, performance testing, and future plans for automated scaling and data analytics.

Cloud Native · Kubernetes · Prometheus
16 min read