Tag

Scaling

2 views collected around this technical thread.

macrozheng
Apr 29, 2025 · Backend Development

How to Tame a 100× Traffic Surge: Practical Strategies for Backend Engineers

This guide walks backend developers through a step‑by‑step approach to handling sudden 100‑fold traffic spikes, covering emergency response, traffic analysis, robust system design, scaling techniques, circuit breaking, message queuing, and stress testing to keep services resilient and performant.

Rate Limiting · Scaling · backend performance
0 likes · 12 min read
IT Services Circle
Apr 23, 2025 · Backend Development

Handling Sudden Traffic Spikes in Backend Systems

The article outlines a comprehensive approach for backend engineers to manage a sudden 100‑fold increase in traffic, covering emergency response, traffic analysis, robust system design, rate limiting, circuit breaking, scaling, sharding, pooling, caching, asynchronous processing, and stress testing to ensure system stability and performance.

Rate Limiting · Scaling · asynchronous processing
0 likes · 13 min read
Tencent Cloud Developer
Apr 23, 2025 · Cloud Native

Microservices Architecture: Principles, Modeling, Integration, and Scaling

Microservices are small, autonomous services that replace monolithic codebases. The article covers loose coupling, high cohesion, and bounded contexts; technology-agnostic integration via REST, RPC, or events; disciplined code governance and semantic versioning; local transactions with eventual consistency; and scaling patterns such as timeouts, circuit breakers, and auto-scaling. It also notes that service boundaries tend to reflect organizational structure and warns against premature complexity.

DevOps · Integration · Scaling
0 likes · 19 min read
vivo Internet Technology
Mar 5, 2025 · Cloud Native

Beidou Container Operations Management Platform: Architecture, Automation, and Capabilities

The Beidou Operations Management Platform, created by vivo’s Internet Server team, unifies management of over twenty Kubernetes clusters and tens of thousands of nodes, automates scaling, inspections, event collection, and Helm‑based application deployment, achieving more than 90% UI‑driven operations and dramatically improving stability and operational efficiency.

Container Management · DevOps · Kubernetes
0 likes · 20 min read
Xiaolei Talks DB
Dec 27, 2024 · Databases

Mastering Production TiDB Cluster Management: Access, Scaling, and Upgrades

This guide walks through accessing a production TiDB cluster via pod IP, Service ClusterIP, or DNS, initializing users and databases, and performing scaling and version upgrades by editing the cluster's YAML configuration in Kubernetes.

Database Operations · Kubernetes · Scaling
0 likes · 9 min read
Raymond Ops
Dec 19, 2024 · Operations

How to Auto‑Scale Non‑CPU Apps with cAdvisor Network Metrics in Kubernetes

This guide explains how to use cAdvisor‑provided container network traffic counters as custom metrics for Kubernetes HPA, covering metric collection, Prometheus‑adapter configuration, verification, and a complete HPA testing workflow for elastic scaling of non‑CPU‑intensive workloads.

Custom Metrics · HPA · Kubernetes
0 likes · 7 min read
macrozheng
Aug 2, 2024 · Backend Development

How to Quickly Resolve Massive Kafka Message Backlog in Production

This guide explains why Kafka message backlogs occur, how to diagnose bugs, optimize consumer logic, and use temporary topics for emergency scaling, while emphasizing monitoring, alerts, and proper offset handling to keep your streaming system healthy.

Backlog · Consumer · Java
0 likes · 5 min read
DevOps Cloud Academy
May 31, 2024 · Cloud Native

Optimizing RabbitMQ Performance on Kubernetes

This guide explains how to deploy RabbitMQ on Kubernetes and improve its performance through Helm installation, resource tuning, monitoring, scaling, security hardening, and advanced configuration techniques, providing practical code examples for each step.

Helm · Kubernetes · RabbitMQ
0 likes · 9 min read
DevOps Cloud Academy
May 6, 2024 · Cloud Native

How to Deploy a Highly Available Application on Kubernetes

This article explains key Kubernetes configurations—such as pod replicas, pod anti‑affinity, deployment strategies, graceful termination, probes, resource allocation, scaling, and disruption budgets—to achieve high availability and zero‑downtime deployments for containerized applications in production.

High Availability · Kubernetes · Pod Disruption Budget
0 likes · 20 min read
Full-Stack Internet Architecture
Apr 25, 2024 · Databases

Redis Cluster: Architecture, Setup, Testing, and High Availability

This article explains Redis Cluster's sharding architecture, demonstrates how to configure multiple Redis nodes on different ports, shows commands for creating and testing the cluster, and illustrates failover behavior, highlighting its scalability and high‑availability advantages over Sentinel mode for large‑scale data workloads.

Cluster · Database · High Availability
0 likes · 11 min read
DevOps Operations Practice
Mar 14, 2024 · Operations

Resolving Frequent Crashes of a Single-Node Prometheus Deployment: Analysis and Solutions

This article analyzes why a single Prometheus instance repeatedly runs out of memory and crashes, explains the underlying storage mechanisms, and presents practical solutions such as metric reduction, retention tuning, federation architecture, and remote storage integration to improve stability and scalability.

Federation · Prometheus · Scaling
0 likes · 6 min read
Qunar Tech Salon
Feb 20, 2024 · Databases

Qunar.com Redis Automation Operations System: Architecture, Deployment, Migration, Scaling, and Inspection

This article details Qunar.com's Redis automation operations system, covering background challenges, the high‑availability cluster architecture, resource management, automated deployment, various migration strategies, scaling mechanisms with RedisGate, inspection processes, and future AI‑driven enhancements.

AI · Database Operations · Migration
0 likes · 14 min read
Mike Chen's Internet Architecture
Feb 7, 2024 · Operations

Understanding Load Balancing: Principles, Types, and Application Scenarios

This article explains the fundamentals of load balancing, covering its principles, classifications from layer 2 to layer 7, common software implementations, and typical application scenarios such as high traffic handling, horizontal scaling, fault tolerance, and multi‑zone disaster recovery.

High Availability · Load Balancing · Scaling
0 likes · 8 min read
Selected Java Interview Questions
Nov 26, 2023 · Databases

Understanding and Solving Hot Key Issues in Redis

Hot keys in Redis (keys accessed at very high frequency) can overload the cache and downstream databases and cause crashes. This article explains what hot keys are, why they arise, the risks they pose, how to detect them through monitoring commands and traffic analysis, and practical mitigation strategies such as scaling the cluster and adding a secondary cache.

Cache · Redis · Scaling
0 likes · 6 min read
Laravel Tech Community
Oct 26, 2023 · Cloud Native

How Kuaishou Scales Live E‑commerce Flash Sales with an Elastic Container Cloud and Hybrid Cloud Architecture

To handle billions of daily users and massive flash‑sale spikes in its live‑ecommerce streams, Kuaishou built a large‑scale elastic container cloud, integrated with Alibaba Cloud in a hybrid‑cloud setup, employing load balancing, caching, message queues, rate‑limiting, and intelligent resource scheduling to achieve million‑request‑per‑second throughput and high availability.

Kuaishou · Live E‑commerce · Scaling
0 likes · 8 min read
Continuous Delivery 2.0
Sep 21, 2023 · Operations

Scaling DevOps in Large Organizations: Normalization, Standardization, and Platformization

The article outlines how organizations with more than a hundred engineers must go beyond merely copying DevOps practices and adopt three progressive steps (normalization, standardization, and platformization) to achieve measurable, scalable efficiency. It concludes with a promotional notice for a Python‑based continuous deployment training course.

DevOps · Scaling · Standardization
0 likes · 8 min read
vivo Internet Technology
Sep 6, 2023 · Cloud Native

Multi-Cluster Management in Kubernetes: Concepts, Practices, and Karmada Exploration

The article explains why enterprises adopt multi‑cluster Kubernetes architectures, reviews community solutions such as Karmada, Clusternet and OCM, and details vivo’s hybrid strategy that combines a unified UI for independent clusters with Karmada‑based federation for resource distribution, elastic scaling, cross‑cluster scheduling, and gray‑release migration.

Cloud-Native · Karmada · Kubernetes
0 likes · 20 min read
Code Ape Tech Column
Aug 15, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: Dual‑Center ES, Redis, and MySQL Solutions

This article details the design and implementation of a highly available, high‑performance membership system serving over a billion users, covering dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis dual‑center caching, MySQL partitioned clusters, migration strategies, and refined flow‑control and degradation mechanisms.

Elasticsearch · High Availability · MySQL
0 likes · 20 min read
Architect
Aug 10, 2023 · Operations

Capacity Management: Goals, Stages, Optimization Techniques, and Scaling Practices

The article explains how capacity management balances cost control and service quality through defined goals, three development stages, detailed resource optimization methods, stress‑testing metrics and standards, and automated scaling to achieve significant cost reductions while maintaining system stability.

Scaling · capacity-management · operations
0 likes · 10 min read
Top Architect
Jul 6, 2023 · Databases

Understanding HikariCP Connection Pool Sizing: Principles, Experiments, and Practical Guidelines

This article translates and expands on HikariCP's pool‑sizing guidance, explaining why smaller database connection pools often yield better performance, presenting real‑world benchmark data for various pool sizes, and offering a simple formula to calculate an optimal pool size based on CPU cores and effective disks.

Connection Pool · HikariCP · PostgreSQL
0 likes · 10 min read