Key Elements and Evolution of Large‑Scale Website Architecture
This article summarizes the evolution, patterns, and five core factors—performance, availability, scalability, extensibility, and security—of large‑scale website architecture, covering server tiers, caching, clustering, load balancing, data redundancy, and security measures.
Preface
This article is a textual mind‑map of the book Large‑Scale Website Architecture Design by Li Zhihui, focusing on the five essential factors: performance, availability, scalability, extensibility, and security.
The discussion revolves around application, cache, and storage servers.
Overview
Three dimensions: evolution, pattern, element.
Five elements: performance, availability, scalability, extensibility, security.
Evolution Timeline
Reference: Large‑Scale Website Architecture Evolution
Initial stage : a single server hosts application, database, files, etc. (e.g., LAMP).
Separation of application and data services : three servers for application, file, and database.
Cache introduction : local cache on application servers and distributed cache servers.
Application server clustering : load balancer distributes requests across the cluster.
Database read/write separation : master‑slave replication, with a data access module that keeps the split transparent to application code.
Reverse proxy and CDN : both act as caches, deployed in central and ISP data centers.
Distributed file and database systems : sharding and business‑level database partitioning.
NoSQL and search engines : better support for scalable distribution.
Business splitting : independent deployment of each business module, communication via hyperlinks, message queues, or shared storage.
Distributed services : extraction of common services for independent deployment.
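The read/write separation step in the timeline above can be sketched as a small routing layer. This is a minimal illustration, not the book's implementation: the class name `ReadWriteRouter`, the server names, and the rule of classifying statements by their first SQL keyword are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical data access helper: writes go to the master,
    reads are spread round-robin across the replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._replica_cycle = itertools.cycle(replicas or [master])

    def route(self, sql):
        # Classify the statement by its first keyword (assumes non-empty SQL).
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._replica_cycle)


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
target = router.route("SELECT * FROM orders")  # served by a replica
```

Because replication is usually asynchronous, a read routed to a replica right after a write may return stale data; a real data access layer would likely need a way to pin such reads to the master.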
The core value of large‑scale website architecture is the flexibility to respond to changing business needs; its evolution is driven primarily by business growth rather than by technology.
Avoid blindly copying solutions from big companies.
Don’t pursue technology for its own sake.
Don’t expect technology to solve every problem.
Architecture Patterns
Patterns are valuable because they are repeatable.
Layered : horizontal segmentation.
Partitioned : vertical segmentation.
Distributed : enables distributed deployment of modules; common solutions include distributed applications, static resources, data/storage, computing, configuration, locks, files, etc.
Cluster : multiple identical servers behind a load balancer.
Cache : place data close to computation; two prerequisites: hotspot access and time‑bounded validity. Common forms: CDN, reverse proxy, local cache, distributed cache.
Asynchronous : decouples systems; typical producer‑consumer model improves availability, speed, and smooths traffic spikes.
Redundancy : high availability via cold/hot backups.
Automation : covers CI/CD, testing, security scanning, deployment, monitoring, alerting, failover, recovery, degradation, and resource allocation.
Security : passwords, mobile verification codes, encryption, CAPTCHAs, filtering, risk control.
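The two cache prerequisites listed above (hot keys are read repeatedly, and cached data is only valid for a bounded time) can be illustrated with a small local cache. This is a rough sketch under assumptions: `TTLCache`, `get_or_load`, and the injectable `clock` parameter are hypothetical names chosen for the example, and there is no eviction or thread safety.

```python
import time


class TTLCache:
    """Hypothetical local cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: still within its validity window
        value = loader(key)  # miss or expired: go to the slower source
        self._store[key] = (value, now + self.ttl)
        return value


cache = TTLCache(ttl_seconds=30)
profile = cache.get_or_load("user:42", lambda key: {"id": key})
```

A cache like this only pays off when the same keys are requested many times within the expiry window; for evenly spread access the hit rate stays low and the cache mostly adds memory overhead.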
Core Elements
Architecture is a high‑level, hard‑to‑change plan focusing on five elements:
Performance
Availability
Scalability
Extensibility
Security
Architecture Details
The following sections elaborate on each element.
High Performance
Key performance metrics:
Response time
Concurrency
Throughput
Performance counters
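The first three metrics are related. In a steady state, Little's law ties them together: concurrency equals throughput multiplied by average response time. The helper names below are illustrative only, and the numbers are made up for the example rather than taken from the book.

```python
def throughput(concurrency, avg_response_time_s):
    """Requests per second in steady state, from Little's law."""
    if avg_response_time_s <= 0:
        raise ValueError("response time must be positive")
    return concurrency / avg_response_time_s


def required_concurrency(target_throughput, avg_response_time_s):
    """In-flight requests needed to sustain the target throughput."""
    return target_throughput * avg_response_time_s


# 100 requests in flight, each taking 0.25 s on average.
rps = throughput(100, 0.25)
in_flight = required_concurrency(400, 0.25)
```

The relation holds only while the system is not saturated; past that point, adding concurrency mainly increases response time instead of throughput.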
Testing methods include performance, load, stress, and stability testing.
Performance optimization can be divided into three layers:
Web front‑end optimization : browser access tuning (reducing HTTP requests, browser caching, compression, CSS/JS placement, cookie reduction), CDN, and reverse proxy (which also contributes security, caching, and load balancing).
Application server optimization : caching, clustering, asynchronous processing, multithreading, object pooling, data structures, garbage collection, distributed cache, message queues, code optimization.
Storage server optimization : HDD vs SSD, B+ tree vs LSM tree, RAID vs HDFS.
High Availability
Goal: use redundancy and failover to keep services and data accessible despite hardware failures.
Application strategies: stateless application design, session replication, load‑balanced failover, session servers.
Service strategies: hierarchical management, timeout settings, asynchronous calls, degradation, idempotent design.
Data strategies: backup, failover, cold/hot backup, CAP theorem, data recovery.
Software quality: trunk‑branch development, automated testing, pre‑release verification, CI/CD, gray release.
Monitoring: alert systems, graceful degradation, user behavior logging, server performance monitoring, reporting, data collection, management.
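Idempotent design, listed under service strategies above, matters because failover and timeouts cause retries, and a retried call must not repeat its side effect. The sketch below is one possible shape, assuming the caller supplies a unique request ID; `PaymentService`, `charge`, and the in-memory dictionary are hypothetical and stand in for a shared store.

```python
class PaymentService:
    """Hypothetical service whose charge() is safe to retry."""

    def __init__(self):
        self._results = {}  # request_id -> result of the first attempt
        self.charges_applied = 0

    def charge(self, request_id, amount):
        # A repeated request ID returns the stored result
        # instead of applying the charge a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.charges_applied += 1
        result = {"request_id": request_id, "amount": amount, "status": "ok"}
        self._results[request_id] = result
        return result


service = PaymentService()
first = service.charge("req-1", 50)
retry = service.charge("req-1", 50)  # e.g. the client timed out and retried
```

In a clustered deployment the lookup and the write would need to be atomic in storage shared by all instances; the single-process dictionary here only shows the idea.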
Scalability
Large‑scale sites must handle very large user bases, complex functionality, and large server fleets.
Architectural scalability: vertical separation (layering), horizontal separation (business splitting), physical separation of functions, scaling via cluster size.
Application server cluster scaling: load balancing can be implemented via HTTP redirect, DNS, reverse proxy, IP‑level balancing, LVS, etc.; common scheduling algorithms include Round Robin, Weighted Round Robin, Random, Least Connections, and Source Hashing.
Distributed cache scaling: Memcached client, server cluster, access model, consistency‑hash challenges.
Data storage scaling: relational DB clusters, NoSQL clusters.
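The consistent-hash issue mentioned for distributed cache scaling is that simple modulo hashing remaps most keys whenever a server is added, which invalidates most of the cache. A consistent hash ring with virtual nodes limits the remapping to roughly the new server's share. The following is a minimal sketch, not Memcached client code; the class name, the choice of SHA-256, and the default of 100 virtual nodes are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical hash ring that maps keys to cache servers."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each server appears at many points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
server = ring.get_node("user:42")
```

When a fourth server joins, the only keys that change owner are those now claimed by the new server; every other key keeps hitting the server that already holds its cached value.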
Extensibility
Apply the open‑closed principle at the system‑architecture level.
Build extensible architectures using distributed message queues (e.g., Event‑Driven Architecture).
Leverage distributed services (Web Service, Thrift, Dubbo) for reusable business platforms.
Design extensible data structures such as ColumnFamily.
Use open platforms to build an ecosystem.
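The link between message queues and the open‑closed principle can be shown with a toy event bus: new behavior is added by subscribing a new handler, while the publishing code stays unchanged. This sketch is an in-process, synchronous stand-in for a distributed message queue, and `EventBus`, the topic name, and the handlers are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for a message queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher knows nothing about who consumes the event.
        for handler in list(self._subscribers.get(topic, ())):
            handler(event)


bus = EventBus()
audit_log = []

# Existing consumer.
bus.subscribe("order.created", lambda e: audit_log.append(("inventory", e["id"])))
bus.publish("order.created", {"id": 1})

# Later extension: a new consumer is added without editing the publisher.
bus.subscribe("order.created", lambda e: audit_log.append(("email", e["id"])))
bus.publish("order.created", {"id": 2})
```

A real queue would also deliver asynchronously and persist messages, which is where the availability and peak-smoothing benefits come from; the sketch only demonstrates the decoupling.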
Website Security Architecture
Common attacks: XSS, SQL injection, CSRF, session hijacking, etc.
Defense techniques: hiding detailed error codes, stripping HTML comments from responses, file upload validation, path traversal protection, form tokens, CAPTCHAs, Referer checks, database schema obfuscation, input sanitization, parameter binding, HttpOnly cookies, XSS filters, injection defenses, CSRF token verification, WAF (ModSecurity), vulnerability scanning.
Encryption and key management: separate key server, asymmetric/symmetric encryption, digital signatures, salting, one‑way hashing.
Content filtering and anti‑spam: text matching, classification algorithms, blacklists.
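Salting and one‑way hashing, mentioned under encryption above, are typically combined for password storage: a random per-user salt means identical passwords produce different digests, and only the salt and digest are stored. The sketch below uses PBKDF2 from Python's standard library as one possible choice; the function names and the iteration count are assumptions, not recommendations from the book.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored)
```

Because the hash is one-way, a leaked table does not directly reveal passwords, and the per-user salt prevents an attacker from reusing one precomputed lookup table against every account.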