
Scalable Web Architecture and Distributed Systems

This article explains the key design principles, components, and techniques—such as availability, performance, reliability, scalability, cost, redundancy, partitioning, caching, proxies, indexing, load balancing, and queuing—required to build large‑scale, high‑performance, and fault‑tolerant web and distributed systems, illustrated with an image‑hosting example.

Qunar Tech Salon

1.1 Design Principles of Web Distributed Systems

Building and operating a scalable web site means distributing resources across many servers so that users can access them over the Internet.

Key principles that influence large‑scale web system design include:

Availability – continuous operation and graceful degradation are essential for reputation and revenue.

Performance – low latency and fast response improve user satisfaction and search rankings.

Reliability – consistent responses and durable data storage are required.

Scalability – the ability to handle increased traffic, storage, or transaction volume.

Manageability – easy operation, diagnostics, and upgrades.

Cost – hardware, software, deployment, maintenance, and developer time must be considered.

These principles often trade off against each other; for example, adding servers improves scalability but raises cost and management complexity.

1.2 Foundations

When designing an architecture, you must choose the right components, integrate them well, and make sensible compromises. Investing heavily in scalability before it is needed is rarely wise, but some forethought in the design can save substantial time and resources later.

The core concerns of most large web applications are services, redundancy, partitioning, and error handling.

Example: Image‑Hosting Application

An image‑hosting service must store unlimited images, provide low‑latency downloads, ensure data durability, be easy to manage, and remain cost‑effective.

Figure 1.1 shows a simplified functional diagram of such a system.

Services

Decoupling functionality into separate services (SOA) with clear interfaces improves scalability and fault isolation. In the example, upload and retrieval can be split into distinct services.
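As a minimal sketch of this split (all class and method names here are hypothetical, not from the article), the write path and the read path can be separate services that share storage but scale independently:

```python
# Hypothetical SOA split for the image-hosting example: uploads and
# retrievals get independent services with clear interfaces.
class UploadService:
    """Handles slow, connection-heavy writes."""
    def __init__(self, store):
        self.store = store

    def upload(self, key, blob):
        self.store[key] = blob


class RetrievalService:
    """Handles fast, high-volume reads."""
    def __init__(self, store):
        self.store = store

    def fetch(self, key):
        return self.store.get(key)


shared = {}  # stand-in for a real storage backend
UploadService(shared).upload("a.jpg", b"...")
print(RetrievalService(shared).fetch("a.jpg"))  # b'...'
```

Because the two services no longer share a process, read nodes and write nodes can be provisioned and tuned separately.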

Redundancy

Redundant services and data eliminate single points of failure. Replicating data across geographically separated nodes and running multiple service instances enable failover and high availability.

Figures 1.3 and 1.4 illustrate redundant storage and a partitioned image‑storage design.
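The core idea of redundant storage can be sketched in a few lines, assuming a simplified synchronous replication scheme (the `ReplicatedStore` name and dict-backed "nodes" are illustrative, not from the article):

```python
# Sketch: write each image to several replicas so a single node's
# failure does not lose data; reads fall back to any live copy.
class ReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas    # list of dict-backed "nodes"

    def put(self, key, blob):
        for node in self.replicas:  # synchronous write to every replica
            node[key] = blob

    def get(self, key):
        for node in self.replicas:  # first replica holding the key wins
            if key in node:
                return node[key]
        return None


store = ReplicatedStore([{}, {}, {}])
store.put("cat.jpg", b"\x89PNG")
store.replicas[0].clear()           # simulate one node failing
print(store.get("cat.jpg") is not None)  # True: data survives
```

Real systems replace the synchronous loop with quorum or asynchronous replication, trading write latency against durability.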

Partitioning

Horizontal scaling adds nodes; vertical scaling adds resources to a single node. Partitioning (sharding) distributes data or functionality across multiple servers, allowing independent scaling of hot and cold workloads.
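A common way to implement such partitioning is to route each key to a shard by hashing it. The sketch below assumes a fixed shard count and modulo placement, which is the simplest scheme (rebalancing on shard-count changes is ignored here):

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed shard count

def shard_for(key: str) -> int:
    """Map a key to a shard index via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every request for the same image deterministically hits the same shard.
print(shard_for("cat.jpg"))
```

Because modulo placement reshuffles most keys when `NUM_SHARDS` changes, production systems usually prefer consistent hashing (discussed under distributed caches below) for the same routing problem.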

1.3 Building Efficient and Scalable Data Access Modules

Fast data access for large data sets requires caching, proxies, indexes, and load balancing.

Cache

Caches exploit locality by keeping recently accessed data in fast memory. Caches can be placed at the request layer, globally shared, or distributed across nodes.

Figures 1.8, 1.9, 1.10, and 1.11 illustrate various cache placements and designs.
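A request-layer cache is often bounded by evicting the least recently used entry. Here is a minimal LRU sketch (the `LRUCache` class is illustrative; production systems would use memcached, Redis, or `functools.lru_cache`):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny request-node cache: evicts the least recently used entry."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                  # miss: caller fetches from origin
        self.data.move_to_end(key)       # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```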

Global Cache

A single cache service serves all nodes, simplifying cache coherence but potentially becoming a bottleneck.

Distributed Cache

Each node holds a portion of the cache, using consistent hashing to locate data. Adding nodes increases total cache capacity.

Figure 1.12 shows a distributed cache topology.
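Consistent hashing is what makes "adding nodes increases capacity" cheap: each node owns arcs of a hash ring, so adding a node moves only a fraction of the keys. A minimal sketch (node names and the virtual-node count are assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Each node appears at many points ('virtual nodes') on a hash ring;
    a key belongs to the first node clockwise from its hash."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    def _hash(self, key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("image-42.jpg"))  # always the same node for this key
```

Virtual nodes smooth out the key distribution; without them, a ring of three nodes could assign one node a disproportionately large arc.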

Proxy

Proxies sit between clients and servers, performing request filtering, logging, compression, or request collapsing to reduce duplicate work.

Figures 1.13, 1.14, and 1.15 demonstrate proxy placement and request collapsing.
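Request collapsing can be sketched as follows: when several concurrent requests ask for the same key, one "leader" performs the real backend fetch and the rest wait on its result. The class and names below are illustrative, not from the article:

```python
import threading

class CollapsingFetcher:
    """Concurrent requests for the same key share one backend fetch."""
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn   # the (slow) origin call
        self.lock = threading.Lock()
        self.in_flight = {}        # key -> {"event": Event, "result": ...}

    def get(self, key):
        with self.lock:
            entry = self.in_flight.get(key)
            leader = entry is None
            if leader:
                entry = {"event": threading.Event(), "result": None}
                self.in_flight[key] = entry
        if leader:
            entry["result"] = self.fetch_fn(key)  # the one real fetch
            with self.lock:
                del self.in_flight[key]
            entry["event"].set()
        else:
            entry["event"].wait()  # piggyback on the leader's fetch
        return entry["result"]


calls = []
def slow_fetch(key):
    calls.append(key)              # count origin hits
    return f"data:{key}"

f = CollapsingFetcher(slow_fetch)
print(f.get("img"))  # data:img
```

Under concurrent load, N simultaneous `get("img")` calls would trigger a single `slow_fetch("img")` instead of N.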

Index

Indexes map logical queries to physical storage locations, enabling fast look‑ups in massive data sets. Multi‑level indexes and inverted indexes are common for search‑type workloads.

Figures 1.16, 1.17 illustrate index structures.
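An inverted index can be sketched in a few lines: it maps each term to the set of documents containing it, so a query only touches matching documents instead of scanning the whole corpus (function names and the toy corpus below are illustrative):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """Return doc ids containing all terms (AND query)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


docs = {1: "scalable web systems", 2: "distributed web caching", 3: "queue basics"}
idx = build_inverted_index(docs)
print(search(idx, "web"))             # {1, 2}
print(search(idx, "web", "systems"))  # {1}
```

Search engines layer this with multi-level indexes, term positions, and ranking, but the look-up structure is the same.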

Load Balancer

Load balancers distribute incoming traffic across a pool of service nodes, supporting algorithms such as round‑robin, random, or resource‑aware selection. They also perform health checks and can act as reverse proxies.

Figures 1.18 and 1.19 show single and multi‑layer load‑balancing setups.
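Round-robin with health checks, the simplest of these algorithms, can be sketched as follows (class and backend names are assumptions for illustration):

```python
import itertools

class RoundRobinBalancer:
    """Rotate requests across backends, skipping unhealthy ones."""
    def __init__(self, backends):
        self.backends = backends
        self.healthy = set(backends)
        self._cycle = itertools.cycle(backends)

    def mark_down(self, backend):
        """Called by a health checker when a node stops responding."""
        self.healthy.discard(backend)

    def pick(self):
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
```

Resource-aware variants replace the fixed rotation with a choice based on each backend's current connection count or load.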

Queue

Queues decouple request submission from processing, allowing asynchronous handling of long‑running tasks and improving reliability through retry mechanisms.

Figures 1.20 and 1.21 contrast synchronous request handling with queue‑based asynchronous processing.
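The asynchronous pattern can be sketched with the standard library: the request path only enqueues a task and returns, while a background worker drains the queue (the "resize" work here is a hypothetical stand-in):

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    """Background consumer: drains the queue until a None sentinel."""
    while True:
        item = tasks.get()
        if item is None:                    # sentinel: shut down
            break
        results.append(f"resized:{item}")   # stand-in for real work
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for name in ["a.jpg", "b.jpg"]:
    tasks.put(name)   # the "request" returns as soon as the task is queued

tasks.join()          # demo only: wait for the worker to finish
tasks.put(None)
t.join()
print(results)        # ['resized:a.jpg', 'resized:b.jpg']
```

In production the in-process queue becomes a durable broker (e.g. RabbitMQ or Kafka), which adds the retry and persistence guarantees mentioned above.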

1.4 Conclusion

Designing systems for fast, large‑scale data access is challenging but supported by many proven tools and patterns. This article covered only a subset of techniques; the field continues to evolve with new innovations.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
