Tagged articles

236 articles

Machine Learning Algorithms & Natural Language Processing

May 28, 2026 · Artificial Intelligence

A New Paradigm for GUI Agent Trajectory Generation: FSM‑Synthesized Data at $0.04 per Trajectory

AutoWebWorld introduces a finite‑state‑machine‑driven pipeline that synthesizes verified web‑GUI trajectories at an average cost of only $0.04 each, producing longer interaction sequences, scaling efficiently, and demonstrably improving large‑language‑model agents on WebVoyager and grounding benchmarks.

AutoWebWorld · Data Generation · Finite State Machine

0 likes · 13 min read

Architects' Tech Alliance

May 25, 2026 · Industry Insights

Huawei’s τ (Tao) Scaling Theory: How Time‑Based Chip Design Breaks Performance Limits

Huawei’s new τ (Tao) scaling theory shifts chip optimisation from shrinking dimensions to compressing time, offering a post‑Moore roadmap that raises mobile SoC density by 55%, improves AI data‑center latency by 500×, and promises continued performance growth without relying on advanced EUV lithography.

AI data center · Huawei · LogicFolding

0 likes · 8 min read

Xiaomi Tech

May 14, 2026 · Artificial Intelligence

500 M Videos Yield the Largest Open‑Source GUI Dataset; 3B Model Cuts Inference Tokens 71% and Beats Larger Models (Xiaomi AI at ICML 2026)

Xiaomi’s AI team extracted 5 billion video frames to create the world’s largest open‑source GUI dataset, demonstrated that a 3 B‑parameter model can reduce inference tokens by 71% while surpassing larger models, and presented a suite of ICML 2026 papers covering data scaling, benchmarking, reasoning, multimodal perception, and training stability for GUI agents and other AI tasks.

GUI Agent · Large Language Model · Multimodal

0 likes · 21 min read

AI Info Trend

May 12, 2026 · Industry Insights

McKinsey 2026 Report Uncovers 3 Golden Rules to Scale Cleantech in Its Golden Growth Phase

The McKinsey 2026 cleantech report shows that despite recent funding slow‑downs, investment remains robust, outlines three development phases, and identifies three core rules—cheaper, faster, better—while warning of common pitfalls and highlighting AI, standardisation and low‑cost financing as future growth drivers.

AI · Scaling · cleantech

0 likes · 6 min read

Machine Heart

Mar 31, 2026 · Artificial Intelligence

ProMoE: Explicit Routing Breaks the Scaling Bottleneck of Diffusion‑Transformer MoE (ICLR 2026)

ProMoE introduces a two‑step routing MoE framework with explicit semantic guidance that tackles the high spatial redundancy and functional heterogeneity of visual tokens, enabling diffusion transformers to scale efficiently and outperform dense models and prior MoE approaches across generation, convergence, and scaling benchmarks.

Diffusion Transformer · Explicit Routing · Mixture of Experts

0 likes · 9 min read

MaGe Linux Operations

Mar 30, 2026 · Cloud Native

How to Scale Prometheus to Thousands of Nodes with Thanos: A Deep Dive

This article examines the storage, query performance, high‑availability, and high‑cardinality challenges of running Prometheus on a thousand‑node Kubernetes cluster and presents a complete, step‑by‑step Thanos‑based architecture, capacity‑planning models, configuration examples, and operational best practices for reliable horizontal scaling.

Kubernetes · Monitoring · Observability

0 likes · 34 min read

JD Retail Technology

Mar 25, 2026 · Databases

How JD.com Scaled POP Order Elasticsearch to Handle Billions of Orders

This article analyzes the challenges of JD.com's POP order Elasticsearch storage—including data skew, oversized shards, frequent updates, and high maintenance costs—and details the multi‑layered architectural redesign that introduced tenant isolation, dual‑hash routing, differentiated shard strategies, and a dual‑active physical foundation to achieve high performance, scalability, and availability.

Data Partitioning · Elasticsearch · Order Management

0 likes · 16 min read

Architect Chen

Mar 24, 2026 · Databases

How High Can Redis Really Scale? Real-World QPS Limits Explained

This article breaks down Redis performance limits, showing that a single node can handle roughly 100‑200k simple GET/SET QPS and up to 500‑700k with multithreaded I/O, while sharded clusters can theoretically reach millions of QPS, though practical factors affect the actual throughput.

Cluster · Database · Performance

0 likes · 6 min read

PMTalk Product Manager Community

Mar 18, 2026 · Product Management

When Your Team Is All Agents: How Product Management Must Evolve

The article analyses why using instant‑messaging groups to orchestrate multiple AI agents cannot scale to dozens or hundreds of agents, proposes a four‑layer ICSE architecture, compares three agent‑to‑agent communication models, and outlines the new governance, design, and roadmap responsibilities that product managers will need to master.

AI agents · Governance · ICSE architecture

0 likes · 14 min read

MaGe Linux Operations

Feb 27, 2026 · Artificial Intelligence

How to Deploy Scalable LLM Inference with vLLM on Kubernetes and GPU Scheduling

This guide explains how to deploy vLLM for large‑language‑model serving on Kubernetes, covering GPU resource management, tensor‑parallel configuration, continuous batching, quantization choices, autoscaling with HPA and KEDA, multi‑model routing, and best‑practice recommendations for performance, cost control, and high availability.

GPU · Kubernetes · LLM inference

0 likes · 48 min read

Architect Chen

Feb 13, 2026 · Databases

Boost MySQL Performance: Proven Tuning, Indexing, and Scaling Strategies

This guide presents practical MySQL optimization techniques—including SQL and index refinement, InnoDB and connection parameter tuning, cache layer integration, and architectural scaling with read‑write splitting and sharding—to dramatically increase query throughput and reduce latency.

Index Optimization · InnoDB · Performance tuning

0 likes · 6 min read

ITPUB

Jan 31, 2026 · Databases

How OpenAI Scaled PostgreSQL to Support 800 Million Users and Millions of QPS

OpenAI’s engineering team expanded a single‑primary PostgreSQL cluster with nearly 50 read‑only replicas, migrated write‑heavy workloads to Azure Cosmos DB, and applied extensive optimizations to reliably serve the global traffic of ChatGPT and the OpenAI API for 800 million users at multi‑million queries per second.

Azure · PostgreSQL · Read Replicas

0 likes · 24 min read

Radish, Keep Going!

Jan 23, 2026 · Databases

How OpenAI Really Scaled PostgreSQL for Hundreds of Millions of Users

The article debunks OpenAI's sensational claim of handling 800 million ChatGPT users with a single PostgreSQL instance, revealing a pragmatic hybrid architecture that combines many read replicas, Azure CosmosDB for write‑heavy workloads, and top‑tier hardware, while highlighting cost and complexity considerations.

Azure CosmosDB · Database Architecture · PostgreSQL

0 likes · 6 min read

DevOps Coach

Jan 20, 2026 · Cloud Native

How to Scale Kubernetes to Hundreds of Clusters: A Practical Enterprise Guide

This article walks you through the complete journey from a single Kubernetes cluster to a production‑grade, multi‑cluster platform, covering managed services, capacity planning, GitOps pipelines, networking, observability, cost optimisation, upgrade strategies, and the people and processes needed for sustainable large‑scale operations.

Cloud Native · Cost Management · Infrastructure

0 likes · 27 min read

Mike Chen's Internet Architecture

Jan 5, 2026 · Databases

Scaling Redis QPS from 50k to 500k: Sharding, Replication, and Performance Tweaks

This guide explains how to boost Redis queries per second (QPS) from 50,000 to 500,000 by applying horizontal sharding, read‑write separation with master‑slave replication, memory and data‑structure optimizations, and network‑level tuning, while providing concrete examples and configuration tips.

Redis · Scaling · Sharding

0 likes · 5 min read

Tencent Cloud Developer

Dec 30, 2025 · Backend Development

Mastering Microservices: Design Principles, Service Modeling, Integration, and Scaling Strategies

This comprehensive guide explains microservice fundamentals, when to adopt them, key design principles, service modeling techniques, integration patterns, versioning, data handling, monolith decomposition, Conway's law, scaling tactics, and the situations where microservices may not be the right choice, providing actionable insights for building resilient backend systems.

Microservices · Scaling · architecture

0 likes · 23 min read

Alibaba Cloud Observability

Dec 29, 2025 · Cloud Native

How Alibaba Cloud Log Service Supercharges Dify’s Scaling and Cuts DB Costs

This article examines Dify’s production‑scale bottlenecks caused by heavy PostgreSQL logging, explains why a cloud‑native log service (SLS) better matches the append‑only, high‑throughput nature of workflow logs, and provides a step‑by‑step migration guide that dramatically reduces database pressure and storage cost while unlocking advanced analytics.

Alibaba Cloud Log Service · Cloud Native · Dify

0 likes · 17 min read

MaGe Linux Operations

Dec 22, 2025 · Big Data

How to Quickly Resolve Kafka Consumer Lag: Scaling, Partitioning, and Tuning Strategies

This guide walks you through diagnosing Kafka consumer lag, from monitoring the current backlog and identifying root causes to applying scaling, partition adjustments, configuration tweaks, and temporary offset resets, while providing scripts, code samples, and best‑practice recommendations for reliable recovery.

Consumer Lag · Kafka · Kubernetes

0 likes · 29 min read

Full-Stack DevOps & Kubernetes

Dec 17, 2025 · Databases

10 Essential Steps to Optimize Your Database for High‑Performance E‑Commerce

This article shares practical, step‑by‑step guidance from a 15‑year e‑commerce veteran on why, when, and how to optimize databases—including segregation, archiving, query tuning, replication lag detection, parameter tweaks, partitioning, ProxySQL, caching, vertical scaling, and monitoring—to achieve faster, more reliable services.

Database Optimization · Performance tuning · Scaling

0 likes · 10 min read

Ray's Galactic Tech

Dec 16, 2025 · Operations

How to Eliminate Kafka Consumer Lag: 4 Proven Strategies and Advanced Tips

This guide explains why Kafka consumer lag occurs, presents four classic solutions—including horizontal scaling, performance tuning, multi‑group consumption, and offset reset—plus advanced practices like dead‑letter queues, partition design, rebalance mitigation, and monitoring to help engineers quickly diagnose and resolve backlog issues.

Consumer Lag · Scaling · best-practices

0 likes · 8 min read

AntTech

Oct 9, 2025 · Artificial Intelligence

Ling-1T: The Trillion‑Parameter AI Model Redefining Efficient Reasoning

Ling-1T, a trillion‑parameter flagship non‑thinking model, combines 50 billion active parameters per token, a 128 K context window, Evo‑CoT reasoning, and FP8 mixed‑precision training to achieve state‑of‑the‑art performance on complex reasoning, code generation, and multimodal tasks; the article also outlines its architecture, benchmarks, limitations, and future roadmap.

AI · FP8 · LLM

0 likes · 11 min read

DevOps Coach

Oct 5, 2025 · Cloud Native

How Medium Scales Microservices with Kubernetes: Architecture, Tools, and Tuning

Medium explains why it chose Kubernetes for microservice management, describes its multi‑cluster deployment across four availability zones, details configuration tooling with Terraform, and shares scaling optimizations using a cluster over‑provisioner and pod preemption to achieve smoother node utilization.

Cloud Native · Cluster Overprovisioner · Kubernetes

0 likes · 7 min read

MaGe Linux Operations

Aug 19, 2025 · Big Data

Master Kafka High Availability: Replica Sync & Disaster Recovery Strategies

This article provides a comprehensive guide to building enterprise‑grade, highly available Kafka clusters, covering architecture design, hardware planning, production‑level broker configurations, ISR management, monitoring, fault‑tolerance procedures, rolling upgrades, capacity planning, and automation scripts for seamless operations.

Kafka · Monitoring · Operations

0 likes · 16 min read

Data Party THU

Aug 19, 2025 · Artificial Intelligence

Why RL Fine‑Tuning Fails to Extend LLM Reasoning Limits: Entropy Collapse Explained

This article examines how reinforcement learning fine‑tuning influences large language model reasoning, revealing that RL primarily amplifies pre‑trained capabilities, suffers from entropy collapse, and fails to push the model’s reasoning boundary, supported by extensive experiments on scaling laws, entropy analysis, and mitigation techniques.

LLM · RL · RLVR

0 likes · 24 min read

Ops Community

Jul 24, 2025 · Operations

How a Small E‑commerce Site Scaled to 10 Million Daily Visits: Real‑World Architecture Lessons

This article details a small‑to‑mid‑size e‑commerce platform’s journey from a few thousand daily page views to ten million, covering business challenges, three architecture evolution stages, key technical solutions, performance optimizations, cost‑control strategies, and practical automation tips.

Monitoring · Operations · Performance Optimization

0 likes · 14 min read

Tech Freedom Circle

Jul 22, 2025 · Backend Development

How I Resolved an 8‑Million‑Message MQ Backlog at 2 AM: A Proven Generic Solution

At 2 AM an alert triggered when a RocketMQ queue surged from 500 K to 10 M messages, causing severe latency; the article walks through root‑cause analysis, a five‑step emergency fix, long‑term architectural upgrades, monitoring, and scripts to reliably eliminate such MQ backlogs.

Backlog · Message Queue · Monitoring

0 likes · 26 min read

dbaplus Community

Jun 26, 2025 · Operations

How AI Can Transform Kubernetes Operations: 10 Smart Use Cases

This article explores ten practical AI‑driven scenarios for Kubernetes operations—including intelligent monitoring, automated scaling, log analysis, fault repair, resource optimization, CI/CD automation, security checks, knowledge‑base assistance, capacity planning, and an ops assistant—detailing methods, tools, and implementation tips.

AI Ops · Kubernetes · Scaling

0 likes · 12 min read

IT Services Circle

Jun 21, 2025 · Backend Development

How Instagram Scaled to 14 Million Users: Inside Its Backend Architecture

This article recounts a 2009 photo‑sharing startup idea, then dives into Instagram’s backend design principles, cloud infrastructure, request flow, data storage, sharding, caching, background jobs, and monitoring, illustrating how disciplined engineering enabled rapid scaling to millions of users.

Scaling · backend · cloud

0 likes · 9 min read

AI Large Model Application Practice

May 30, 2025 · Artificial Intelligence

Why Layer Normalization Stabilizes Transformers: A Deep Dive

This article explains the mathematical foundation of layer normalization, why it is needed for deep neural networks like Transformers, how scaling (γ) and bias (β) parameters restore important signal variations, and practical placement tips for stable training.

Bias · Layer Normalization · Scaling

0 likes · 8 min read

macrozheng

Apr 29, 2025 · Backend Development

How to Tame a 100× Traffic Surge: Practical Strategies for Backend Engineers

This guide walks backend developers through a step‑by‑step approach to handle sudden 100‑fold traffic spikes, covering emergency response, traffic analysis, robust system design, scaling techniques, circuit breaking, message queuing, and stress testing to keep services resilient and performant.

Backend Performance · Circuit Breaking · Rate Limiting

0 likes · 12 min read

IT Services Circle

Apr 23, 2025 · Backend Development

Handling Sudden Traffic Spikes in Backend Systems

The article outlines a comprehensive approach for backend engineers to manage a sudden 100‑fold increase in traffic, covering emergency response, traffic analysis, robust system design, rate limiting, circuit breaking, scaling, sharding, pooling, caching, asynchronous processing, and stress testing to ensure system stability and performance.

Caching · Circuit Breaking · Rate Limiting

0 likes · 13 min read

Tencent Cloud Developer

Apr 23, 2025 · Cloud Native

Microservices Architecture: Principles, Modeling, Integration, and Scaling

Microservices are small, autonomous services that replace monolithic codebases by emphasizing loose coupling, high cohesion, bounded contexts, technology-agnostic integration via REST, RPC, or events, disciplined code governance, semantic versioning, local transactions with eventual consistency, and robust scaling patterns such as timeouts, circuit breakers, and auto-scaling, while reflecting organizational structure and avoiding premature complexity.

Distributed Systems · Scaling · architecture

0 likes · 19 min read

ITPUB

Apr 13, 2025 · Operations

How Cursor Scaled Its AI Code Editor: Lessons from Indexing to Object Storage

Cursor, the AI‑powered code editor, grew to handle billions of document queries and over a hundred‑million model calls daily, prompting a multi‑stage infrastructure overhaul that moved from a failing YugaByte setup to PostgreSQL RDS, then to object‑storage‑backed databases, while tackling indexing, inference scaling, and cold‑start challenges.

AI · Infrastructure · Scaling

0 likes · 11 min read

Baobao Algorithm Notes

Mar 30, 2025 · Artificial Intelligence

Why Scaling, Data, and Infra Matter More Than Reward Design in R1 Replication

The article analyses two months of community attempts to reproduce DeepSeek R1, highlighting that model scaling, high‑quality data, robust training infrastructure, and careful hyper‑parameter tuning outweigh pure reward‑based tricks, and it outlines common pitfalls and future research directions.

DeepSeek · Infrastructure · LLM

0 likes · 13 min read

Xiaolei Talks DB

Dec 27, 2024 · Databases

Mastering Production TiDB Cluster Management: Access, Scaling, and Upgrades

This guide walks through accessing a production TiDB cluster via pod IP, Service ClusterIP, or DNS, initializing users and databases, and performing scaling and version upgrades by editing the cluster's YAML configuration in Kubernetes.

Database operations · Kubernetes · Scaling

0 likes · 9 min read

Raymond Ops

Dec 19, 2024 · Operations

How to Auto‑Scale Non‑CPU Apps with cAdvisor Network Metrics in Kubernetes

This guide explains how to use cAdvisor‑provided container network traffic counters as custom metrics for Kubernetes HPA, covering metric collection, Prometheus‑adapter configuration, verification, and a complete HPA testing workflow for elastic scaling of non‑CPU‑intensive workloads.

Kubernetes · Prometheus · Scaling

0 likes · 7 min read

Alibaba Cloud Developer

Nov 19, 2024 · Databases

What’s New in PolarDB‑X 2.4.1? Cloud Backup, Online DDL, Physical Scaling & TTL Explained

The article introduces PolarDB‑X version 2.4.1, detailing its enterprise‑grade operational enhancements such as cloud backup set restore, native online DDL with OMC, physical file‑based scaling, flexible data TTL, the overall architecture, and various deployment options for multi‑cloud environments.

Online DDL · PolarDB-X · Scaling

0 likes · 19 min read

Goodme Frontend Team

Nov 18, 2024 · Frontend Development

Add Rotation and Scaling to Video Previews with React and Vime

This article explains how to implement video rotation, fullscreen handling, and proportional scaling in a React application using the Vime library and CSS transforms, covering container setup, control customization, and code examples for a seamless user experience.

CSS transform · Frontend · Scaling

0 likes · 10 min read

NewBeeNLP

Oct 16, 2024 · Artificial Intelligence

Unlocking Long-Sequence LLMs: Position Embeddings, Scaling, and Efficient Attention

This article reviews recent advances in training and inference for long‑sequence large language models, comparing ALIBI and RoPE position embeddings, exploring RoPE scaling techniques, analyzing attention optimizations, and outlining practical data, evaluation, and system frameworks for scalable LLM deployment.

Flash Attention · LLM · RoPE

0 likes · 14 min read

macrozheng

Aug 2, 2024 · Backend Development

How to Quickly Resolve Massive Kafka Message Backlog in Production

This guide explains why Kafka message backlogs occur, how to diagnose bugs, optimize consumer logic, and use temporary topics for emergency scaling, while emphasizing monitoring, alerts, and proper offset handling to keep your streaming system healthy.

Backlog · Consumer · Java

0 likes · 5 min read

DevOps Cloud Academy

May 31, 2024 · Cloud Native

Optimizing RabbitMQ Performance on Kubernetes

This guide explains how to deploy RabbitMQ on Kubernetes and improve its performance through Helm installation, resource tuning, monitoring, scaling, security hardening, and advanced configuration techniques, providing practical code examples for each step.

Kubernetes · Monitoring · Performance Optimization

0 likes · 9 min read

MaGe Linux Operations

May 25, 2024 · Databases

Redis Cluster Mastery: Step‑by‑Step Setup, Scaling, and Management Guide

This tutorial explains how Redis Cluster automatically shards data across multiple nodes, covering required TCP ports, hash‑slot sharding, master‑slave replication, consistency trade‑offs, essential configuration parameters, and step‑by‑step commands for creating, expanding, resharding, and managing a production‑grade Redis cluster.

Cluster · Database · Redis

0 likes · 18 min read

JavaEdge

May 18, 2024 · Cloud Native

Why We Abandoned Microservices: Lessons from Scaling a High‑Throughput Event Pipeline

The article recounts how a fast‑growing event‑processing platform initially embraced microservices, then faced queue bottlenecks, test‑suite overload, and operational complexity, leading the team to consolidate over 140 services into a single, shared‑queue architecture, and shares the practical insights and trade‑offs learned from this transition.

Microservices · Scaling · Service Architecture

0 likes · 12 min read

DevOps Cloud Academy

May 6, 2024 · Cloud Native

How to Deploy a Highly Available Application on Kubernetes

This article explains key Kubernetes configurations—such as pod replicas, pod anti‑affinity, deployment strategies, graceful termination, probes, resource allocation, scaling, and disruption budgets—to achieve high availability and zero‑downtime deployments for containerized applications in production.

Cloud Native · Kubernetes · Probes

0 likes · 20 min read

Full-Stack Internet Architecture

Apr 25, 2024 · Databases

Redis Cluster: Architecture, Setup, Testing, and High Availability

This article explains Redis Cluster's sharding architecture, demonstrates how to configure multiple Redis nodes on different ports, shows commands for creating and testing the cluster, and illustrates failover behavior, highlighting its scalability and high‑availability advantages over Sentinel mode for large‑scale data workloads.

Cluster · Database · Redis

0 likes · 11 min read

ITPUB

Mar 27, 2024 · Backend Development

How Instagram Scaled to 14 Million Users with Just Three Engineers

This article details how Instagram grew from zero to 14 million users in just over a year with only three engineers, applying three core principles and a reliable AWS‑based tech stack that covers frontend, load balancing, backend, PostgreSQL sharding, S3 storage, Redis caching, asynchronous task queues, and comprehensive monitoring.

Monitoring · PostgreSQL · Redis

0 likes · 9 min read

Ops Development Stories

Mar 18, 2024 · Cloud Native

13 Essential Kubernetes Tips to Boost Scalability, Security, and Management

Discover 13 practical Kubernetes techniques—including PreStop hooks, automatic secret rotation, ephemeral containers, custom metric autoscaling, init containers, node affinity, taints and tolerations, pod priority, ConfigMaps, debugging tools, resource requests, CRDs, and API automation—to enhance application reliability, scalability, and security in cloud‑native environments.

Kubernetes · Scaling · pod management

0 likes · 21 min read

DevOps Operations Practice

Mar 14, 2024 · Operations

Resolving Frequent Crashes of a Single-Node Prometheus Deployment: Analysis and Solutions

This article analyzes why a single Prometheus instance repeatedly runs out of memory and crashes, explains the underlying storage mechanisms, and presents practical solutions such as metric reduction, retention tuning, federation architecture, and remote storage integration to improve stability and scalability.

Federation · Monitoring · Performance

0 likes · 6 min read

Qunar Tech Salon

Feb 20, 2024 · Databases

Qunar.com Redis Automation Operations System: Architecture, Deployment, Migration, Scaling, and Inspection

This article details Qunar.com's Redis automation operations system, covering background challenges, the high‑availability cluster architecture, resource management, automated deployment, various migration strategies, scaling mechanisms with RedisGate, inspection processes, and future AI‑driven enhancements.

AI · Database operations · Migration

0 likes · 14 min read

MaGe Linux Operations

Feb 15, 2024 · Cloud Native

Mastering Kubernetes StatefulSet: Architecture, Access, and Lifecycle Management

This article explains Kubernetes StatefulSet fundamentals, its headless service networking, access patterns, creation workflow, controller mechanics, and detailed procedures for updating, scaling, and deleting stateful pods with illustrative code examples.

Cloud Native · Kubernetes · Scaling

0 likes · 11 min read

ITPUB

Feb 13, 2024 · Databases

Achieve Seamless Second‑Level Database Scaling for High‑Throughput Microservices

This guide explains how to design a high‑concurrency, high‑throughput internet architecture that ensures database high availability with double‑master sync and virtual IPs, and how to horizontally shard and smoothly expand the cluster in seconds using configuration changes, reloads, and cleanup steps.

Microservices · Scaling · Sharding

0 likes · 8 min read

Mike Chen's Internet Architecture

Feb 7, 2024 · Operations

Understanding Load Balancing: Principles, Types, and Application Scenarios

This article explains the fundamentals of load balancing, covering its principles, classifications from layer 2 to layer 7, common software implementations, and typical application scenarios such as high traffic handling, horizontal scaling, fault tolerance, and multi‑zone disaster recovery.

Scaling · load balancing · network operations

0 likes · 8 min read

Java High-Performance Architecture

Jan 2, 2024 · Databases

How GitHub Upgraded 1,200 MySQL Servers to 8.0 Without Downtime

GitHub migrated over 1,200 MySQL hosts to version 8.0 through a staged, zero‑downtime process, detailing infrastructure scale, tooling, and five upgrade steps, while also highlighting new Java architecture video courses for developers.

GitHub · Scaling · database migration

0 likes · 5 min read

dbaplus Community

Dec 18, 2023 · Cloud Native

Mastering Secure and Scalable Kubernetes Deployments: Essential Best Practices

This guide outlines practical Kubernetes best practices—including health checks, graceful shutdown, resource limits, security policies, network policies, RBAC, autoscaling, and logging—to help you build secure, resilient, and efficiently managed services on a cloud‑native platform.

Cloud Native · DevOps · Kubernetes

0 likes · 22 min read

Programmer DD

Dec 14, 2023 · Databases

How GitHub Upgraded Its 1200‑Node MySQL Cluster to 8.0 Without Downtime

GitHub detailed its year‑long, multi‑team effort to seamlessly upgrade over 1,200 MySQL servers—supporting more than 300 TB of data and 5.5 million queries per second—from 5.7 to 8.0, outlining the infrastructure, tools, and step‑by‑step migration strategy used to maintain service reliability.

GitHub · Scaling · database migration

0 likes · 5 min read

Selected Java Interview Questions

Nov 26, 2023 · Databases

Understanding and Solving Hot Key Issues in Redis

Hot keys in Redis (keys accessed at very high frequency) can overload the cache and downstream databases, causing crashes; this article explains what hot keys are, why they arise, the risks they pose, how to detect them with monitoring commands and traffic analysis, and practical mitigation strategies such as scaling clusters and using secondary caches.

Cache · Database Performance · Hot Key

0 likes · 6 min read

Su San Talks Tech

Nov 20, 2023 · Databases

Mastering Redis Cluster: Architecture, Sharding, and Scaling Explained

This article explains Redis Cluster’s decentralized architecture, slot‑based sharding, node communication, data migration, and client redirection mechanisms, showing how to scale Redis horizontally while maintaining high availability and fault‑tolerance for large‑scale applications.

Cluster · Scaling

0 likes · 13 min read

Laravel Tech Community

Oct 26, 2023 · Cloud Native

How Kuaishou Scales Live E‑commerce Flash Sales with an Elastic Container Cloud and Hybrid Cloud Architecture

To handle billions of daily users and massive flash‑sale spikes in its live‑ecommerce streams, Kuaishou built a large‑scale elastic container cloud, integrated with Alibaba Cloud in a hybrid‑cloud setup, employing load balancing, caching, message queues, rate‑limiting, and intelligent resource scheduling to achieve million‑request‑per‑second throughput and high availability.

Kuaishou · Live E‑commerce · Scaling

0 likes · 8 min read

dbaplus Community

Oct 25, 2023 · Databases

ByConity vs ClickHouse: Deep Dive into Architecture, Features, and Performance

This article compares ByConity and ClickHouse from a usage perspective, detailing their architectural differences, core components, basic operations such as table creation, data import and query, distributed transaction support, special table engines, scaling strategies, and deployment requirements.

ByConity · ClickHouse · Scaling

0 likes · 26 min read

Continuous Delivery 2.0

Sep 21, 2023 · Operations

Scaling DevOps in Large Organizations: Normalization, Standardization, and Platformization

The article outlines how organizations with more than a hundred engineers must go beyond merely copying DevOps practices by adopting three progressive steps (normalization, standardization, and platformization) to achieve measurable, scalable efficiency, and concludes with a promotional notice for a Python‑based continuous deployment training course.

Operations · Platformization · Scaling

0 likes · 8 min read

Senior Tony

Sep 12, 2023 · Backend Development

What Really Powers High‑Concurrency Systems? Practical Solutions Explained

This article breaks down real‑world high‑concurrency strategies (horizontal scaling, caching, Elasticsearch, sharding, message‑queue smoothing, and cell‑based architecture), explaining when each applies, their trade‑offs, and practical tips for building scalable, reliable backend services.

Caching · Message Queue · Scaling

0 likes · 9 min read

21CTO

Aug 26, 2023 · R&D Management

From Developer to CTO: Building Tech Ops and Scalable Teams

This guide shares a developer’s personal journey to becoming a CTO, covering the shift from coding to technical operations, building scalable team structures, adopting agile practices, and managing growth in a software startup.

CTO · R&D Management · Scaling

0 likes · 21 min read

Code Ape Tech Column

Aug 15, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: Dual‑Center ES, Redis, and MySQL Solutions

This article details the design and implementation of a highly available, high‑performance membership system serving over a billion users, covering dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis dual‑center caching, MySQL partitioned clusters, migration strategies, and refined flow‑control and degradation mechanisms.

Distributed Systems · Elasticsearch · Scaling

0 likes · 20 min read

Architect

Aug 10, 2023 · Operations

Capacity Management: Goals, Stages, Optimization Techniques, and Scaling Practices

The article explains how capacity management balances cost control and service quality through defined goals, three development stages, detailed resource optimization methods, stress‑testing metrics and standards, and automated scaling to achieve significant cost reductions while maintaining system stability.

Operations · Resource Optimization · Scaling

0 likes · 10 min read

Meituan Technology Team

Aug 3, 2023 · Frontend Development

Rome: Enhancing Front‑end Development Collaboration and Efficiency at Meituan

The article details Meituan’s Rome front‑end framework, covering its business and technical background, the engineering ecosystem and evolution path, large‑scale upgrades, IDE‑based development assistance, efficiency and quality improvements, metric collection, real‑world adoption across 1,400+ projects, and future trends such as deeper dev‑chain integration and AI‑assisted coding.

Build Optimization · Framework · Frontend

0 likes · 29 min read

Top Architect

Jul 6, 2023 · Databases

Understanding HikariCP Connection Pool Sizing: Principles, Experiments, and Practical Guidelines

This article translates and expands on HikariCP's pool‑sizing guidance, explaining why smaller database connection pools often yield better performance, presenting real‑world benchmark data for various pool sizes, and offering a simple formula to calculate an optimal pool size based on CPU cores and effective disks.

Connection Pool · HikariCP · PostgreSQL

0 likes · 10 min read

Architecture & Thinking

Jun 9, 2023 · Backend Development

Why Do Message Queues Get Backlogged and How to Fix It Fast?

This article examines why message queues become backlogged—covering producer overload, broker persistence failures, and consumer bottlenecks—and outlines a step‑by‑step scaling and remediation strategy to restore smooth processing, including temporary queue expansion, load‑balanced forwarding, and post‑recovery cleanup.

Backlog · Operations · Scaling

0 likes · 6 min read

Full-Stack DevOps & Kubernetes

Apr 11, 2023 · Cloud Native

Master Kubernetes Basics: Deploy, Scale, and Update Apps with Simple Commands

This article introduces Kubernetes as an open‑source container orchestration platform, explains its core objects like Pods, Services, ReplicaSets, and Deployments, clarifies its relationship with Docker, and provides a step‑by‑step example covering deployment, exposure, scaling, rolling updates, and rollback using kubectl commands.

Deployment · DevOps · Kubernetes

0 likes · 5 min read

DataFunTalk

Apr 6, 2023 · Artificial Intelligence

A Comprehensive Survey of Large Language Models: Background, Capabilities, Key Technologies, and Future Directions

This article reviews the rapid progress of large language models (LLMs), covering their historical development, scaling laws, emergent abilities, core technologies such as training and alignment, resource ecosystems, evaluation methods, safety concerns, and prospective research challenges.

AI research · LLM · Scaling

0 likes · 21 min read

dbaplus Community

Mar 23, 2023 · Operations

How Qunar Scaled Container Monitoring with VictoriaMetrics: Lessons from Replacing Prometheus

This article details Qunar's migration from Prometheus to VictoriaMetrics for large‑scale container monitoring, covering the shortcomings of Prometheus at massive data volumes, the architectural choices made, performance improvements achieved, and future optimization plans.

Kubernetes · Monitoring · Prometheus

0 likes · 13 min read

21CTO

Feb 10, 2023 · Cloud Native

Why Kubernetes Is So Hard to Master: A Beginner’s Q&A Walkthrough

This article introduces Kubernetes fundamentals through a series of questions and answers, covering its architecture, node communication, pod scheduling, data storage, external access, scaling mechanisms, and component coordination, all illustrated with clear diagrams.

Cluster Management · Containers · Kubernetes

0 likes · 9 min read

Architects Research Society

Feb 9, 2023 · Fundamentals

Agile Architecture Strategies for Scaling Agile Development

This article explains how architecture remains a vital part of agile software development, covering agile‑first approaches, lifecycle‑wide modeling, ownership roles, scaling strategies, demand‑driven design, multi‑view modeling, and practical tips for communicating and evolving architecture without over‑building.

Scaling · agile · modeling

0 likes · 40 min read

Zhuanzhuan Tech

Feb 8, 2023 · Operations

Capacity Management: Goals, Practices, and Optimization at Zhuanzhuan

This article outlines Zhuanzhuan’s capacity management approach, covering its objectives, development stages, water‑level metrics, resource optimization techniques, cluster capacity assessment, stress‑test indicators and standards, as well as scaling strategies, demonstrating how systematic capacity management reduces costs while ensuring service stability.

Resource Optimization · Scaling · capacity management

0 likes · 12 min read

Tencent Tech

Jan 16, 2023 · Operations

How a Mini-Game Scaled to 100M DAU: Architecture, Ops, and Security Lessons

This article examines how the viral mini‑game "Sheep..." overcame its initial 5,000‑QPS bottleneck and scaled to over 100 million daily active users by redesigning its architecture, implementing cloud‑native auto‑scaling, enhancing operational monitoring with CLS, and fortifying security with WAF.

Scaling · cloud-native · game-development

0 likes · 11 min read

MaGe Linux Operations

Jan 10, 2023 · Cloud Native

When Microservices Backfire: Lessons from Scaling a Data Service Platform

This case study examines S Company's transition to a microservice architecture for its data‑service platform, highlighting initial gains in visibility and deployment cost, the subsequent explosion of complexity, and the eventual rollback to a monolith with insights on trade‑offs, scaling, and operational overhead.

Operational Challenges · Scaling · architecture

0 likes · 12 min read

MaGe Linux Operations

Dec 21, 2022 · Operations

Mastering Elasticsearch Nodes: Types, Roles, and Scaling Strategies

This guide explains the different Elasticsearch node types, their default roles, how to configure master‑eligible, data, ingest, and coordinating‑only nodes, and provides best‑practice recommendations for planning and scaling large clusters to ensure stability and performance.

Cluster Configuration · Coordinating Node · Data Node

0 likes · 12 min read

Top Architect

Dec 16, 2022 · Databases

Comprehensive Guide to Database Horizontal Scaling, Sharding, and High Availability with MariaDB and Keepalived

This article presents a detailed analysis and step‑by‑step implementation of horizontal database scaling, including sharding strategies, shutdown and stop‑write plans, log‑based migration, dual‑write approaches, and a smooth 2N expansion method, while also covering MariaDB master‑master configuration, dynamic data source addition, and Keepalived high‑availability setup.

MariaDB · Scaling · high-availability

0 likes · 37 min read

dbaplus Community

Nov 29, 2022 · Backend Development

How a Mistaken Delete in Elasticsearch Nearly Erased 17 Million Products – Key Lessons

A senior engineer accidentally issued a DELETE request on an Elasticsearch index holding 17 million product records, triggering a massive data‑loss incident; the article details the team’s recovery strategies, scaling challenges, and process improvements as guidance for backend developers.

Incident Response · Microservices · Scaling

0 likes · 14 min read

vivo Internet Technology

Nov 16, 2022 · Operations

Understanding and Mitigating Bigkey Issues in Redis Operations

Bigkeys—Redis values over 1 MB or structures with more than 2,000 elements—cause memory imbalance, command blocking, network overload, and migration failures, so DBAs must detect them using built‑in commands or RDB analysis, split or partition oversized keys, and tune migration settings to preserve performance and availability.

BigKey · Database operations · Performance

0 likes · 14 min read

dbaplus Community

Sep 24, 2022 · Backend Development

Beyond Adding Servers: Mastering the AKF Scale Cube for Efficient Microservice Scaling

When service load spikes, instead of merely adding machines, this article explains how the AKF Scale Cube model—covering X‑axis horizontal scaling, Y‑axis functional or business splitting, and Z‑axis data partitioning—offers elegant, fine‑grained strategies to boost microservice performance and reliability.

AKF Scale Cube · Data Partitioning · Microservices

0 likes · 10 min read

DevOps

Sep 12, 2022 · Cloud Native

How Slack Designs, Operates, and Scales Its Remote Development Environments

The article explains Slack's cloud‑native development environment—a full, isolated copy of the Slack system running on AWS EC2—detailing why remote environments are used, how they are managed with custom tooling, and how dynamic provisioning enables massive scaling while controlling costs.

Cloud Native · Development Environment · Scaling

0 likes · 9 min read

DevOps

Sep 2, 2022 · Operations

Seven Lessons Learned When Growing Your Configuration Management

Scaling a configuration management team from a small startup to a large enterprise reveals seven key lessons about managing tool costs, customization control, infrastructure scalability, build environment governance, early adoption of third‑party solutions, ensuring traceability with many developers, and continuously evaluating tool costs versus market alternatives.

Build Automation · Configuration Management · DevOps

0 likes · 10 min read

Zhuanzhuan Tech

Jul 20, 2022 · Backend Development

Design and Evolution of the Price‑Increase Coupon Service for a C2B Recycling Platform

This article details the design, evolution, and scaling strategies of a price‑increase coupon system for a C2B digital product recycling platform, covering its initial experimental phase, platformization, Sharding‑JDBC implementation, intelligent coupon recommendation, Elasticsearch integration, and operational optimizations for high‑throughput stability.

Coupon · Microservices · Scaling

0 likes · 11 min read

Practical DevOps Architecture

Jul 18, 2022 · Cloud Computing

Deploying and Exposing an Nginx Service on Kubernetes with YAML Generation and Scaling

This guide walks through creating an Nginx deployment on Kubernetes, exposing it via a NodePort service, generating the corresponding YAML configuration, performing dynamic scaling tests, and using various kubectl commands to manage resources and namespaces.

Deployment · Scaling · Service

0 likes · 4 min read

DevOps

Jul 18, 2022 · R&D Management

Practical Strategies for Scaling Lean‑Agile Transformation in Large Development Teams

The article examines the challenges of moving large, multi‑team software organizations from waterfall to lean‑agile practices, offering concrete tactics for product planning, cross‑team coordination, integration, testing, and release, and concludes with a note on an upcoming DevOps hackathon.

Scaling · Team Collaboration

0 likes · 10 min read

IT Architects Alliance

Jul 17, 2022 · Industry Insights

How Meituan Scaled Instant Delivery with Distributed Architecture and AI

This article examines Meituan's five‑year evolution of instant logistics, detailing the distributed, high‑concurrency architecture, AI‑driven optimization, scalability techniques, fault‑tolerance mechanisms, and future challenges faced by its real‑time delivery platform.

AI · Distributed Systems · Microservices

0 likes · 11 min read

Cloud Native Technology Community

Jul 12, 2022 · Cloud Native

How Tencent Cut Kubernetes CPU Costs by 70%: A Full‑Scale Cloud‑Native Optimization Journey

This article presents a comprehensive, data‑driven case study of how Tencent’s internal Kubernetes/TKE platform reduced monthly CPU usage by up to 70% and memory usage by 50% through systematic cost data collection, VPA/HPA enhancements, custom scheduling, node‑level over‑commit, and safe node decommissioning, while maintaining zero‑incident reliability.

Cloud Native · Kubernetes · Operations

0 likes · 28 min read

21CTO

Jun 21, 2022 · R&D Management

From Farm State to GitHub CTO: Jason Warner’s Journey and Lessons on Scaling Tech Platforms

In a Stack Overflow Podcast interview, former GitHub CTO Jason Warner recounts his unconventional path from a Connecticut farm to leading massive platform scaling, shares insights on engineering leadership, product strategy, venture investing, and the future of cloud, data, and blockchain technologies.

CTO · Cloud Computing · GitHub

0 likes · 9 min read

Selected Java Interview Questions

May 21, 2022 · Databases

Understanding Database Connection Pool Sizing: Lessons from HikariCP and Real‑World Performance Tests

This article translates and expands on a HikariCP wiki post, explaining why a small database connection pool often yields better performance than a large one, presenting benchmark data, a practical sizing formula, and guidance for tuning pools in various environments.

Connection Pool · HikariCP · Performance

0 likes · 9 min read

G7 EasyFlow Tech Circle

May 20, 2022 · Backend Development

Securing Public‑Facing Kafka: Authentication, Configuration, and Scaling Strategies

This article shares G7 Tech’s practical experience of exposing Kafka to the public internet, covering encryption, AAA, three authentication schemes, listener configuration, scaling for massive topics with Kubernetes, storage optimization, and integration with the gmq management platform and Kafka‑REST.

Authentication · Kafka · Kubernetes

0 likes · 10 min read

Architecture Digest

May 19, 2022 · Operations

Designing High‑Availability Stateless Services: Redundancy, Load Balancing, Scaling, and Monitoring

The article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, appropriate load‑balancing algorithms, monitoring, and automated recovery, and also discusses high‑concurrency identification, CDN/OSS usage, and practical recommendations for cloud‑native environments.

Monitoring · Scaling · Vertical Scaling

0 likes · 11 min read

Cloud Native Technology Community

May 10, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,100 Nodes and 200k Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,100 nodes and 200,000 Pods, describing cluster topology, workload generation, API server bottlenecks, controller manager and scheduler tuning, extensive etcd optimizations, and the resulting performance gains that met Kubernetes SLOs.

Cloud Native · Kubernetes · PayPal

0 likes · 13 min read

Top Architect

May 8, 2022 · Databases

Redis Replication, Sentinel, and Cluster: Mechanisms, Configuration, and Best Practices

This article provides a comprehensive technical guide on Redis performance, covering single‑threaded characteristics, master‑slave replication, Sentinel high‑availability mechanisms, Redis Cluster architecture, configuration steps, code examples, and alternative middleware solutions for scaling and fault tolerance.

Cluster · Database · Redis

0 likes · 24 min read

HomeTech

Apr 27, 2022 · Big Data

AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices

This article details the evolution of Autohome's AutoStream platform from Storm to Flink‑based versions, covering real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, lake‑house architecture with Iceberg, PyFlink integration, and future plans for resource optimization and batch‑stream unification.

AutoStream · Flink · Lakehouse

0 likes · 15 min read

IT Architects Alliance

Apr 27, 2022 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: ES Dual‑Center, Redis Caching, MySQL Migration, and Flow‑Control Strategies

This article details how a membership system serving billions of users achieves high performance and high availability through a dual‑center Elasticsearch cluster, traffic‑isolated ES clusters, Redis cache with distributed locks, MySQL dual‑center partitioning, and fine‑grained flow‑control and degradation mechanisms, all while ensuring zero‑downtime migrations and consistent data.

Flow Control · Scaling · distributed-systems

0 likes · 20 min read

Top Architect

Apr 3, 2022 · Databases

Designing Data Architecture for Microservices: Database Choices, Decoupling, and Scaling

This article explains how to design data architecture for microservice systems, covering the advantages of microservices, decoupling principles, lightweight APIs, DevOps integration, database per service versus shared databases, polyglot persistence, and why MongoDB is a suitable choice for scalable, dynamic, and sharded data storage.

Database Design · MongoDB · Scaling

0 likes · 17 min read

Open Source Linux

Mar 17, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,000 Nodes and 200,000 Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,000 nodes and 200,000 pods, describing the cluster topology, workload generation, bottlenecks in the API server, controller manager, scheduler, and etcd, and the optimizations that enabled stable performance at massive scale.

Cloud Native · Kubernetes · PayPal

0 likes · 12 min read

21CTO

Mar 13, 2022 · Backend Development

How Meituan Built a Fault‑Tolerant Instant Logistics Platform at Scale

Meituan’s instant logistics platform evolved from vertical services to a micro‑service, distributed architecture that handles massive order‑rider matching, ultra‑low latency, and high availability, leveraging AI for pricing, ETA, scheduling, and employing robust scaling, consistency, and disaster‑recovery techniques.

AI · Distributed Systems · Logistics

0 likes · 10 min read

Architecture Digest

Jan 13, 2022 · Backend Development

Scaling RabbitMQ to Million‑Message Throughput: Architecture, Sharding, Federation, and High‑Availability Practices

This article explains how to horizontally scale RabbitMQ clusters to handle millions of messages per second by leveraging cluster modes, mirror queues, sharding plugins, consistent‑hash exchanges, federation, and high‑availability configurations, while also covering practical scenarios such as retries, delayed tasks, and Spring AMQP integration.

Federation · Message Queue · RabbitMQ

0 likes · 22 min read

21CTO

Dec 24, 2021 · Operations

Why Xi'an’s One‑Code Pass Crashed: Analyzing System Overload and Scaling Fixes

On December 20 the Xi'an health‑code app "One‑Code Pass" suffered a massive outage as a sudden traffic surge overwhelmed its query‑heavy backend, exposing network bottlenecks and a lack of scaling mechanisms, prompting a detailed technical analysis and proposed architectural remedies.

Rate Limiting · Scaling · system overload

0 likes · 9 min read
