Key Characteristics and Design Principles of Distributed Systems

The article explains the historical background, core characteristics such as scalability, cost‑efficiency, fault tolerance, and elasticity, and outlines essential design principles for distributed systems, emphasizing low hardware requirements, horizontal scaling, avoidance of single points of failure, reduced communication overhead, and stateless services.

Qunar Tech Salon

Aug 16, 2015

Distributed systems are not a new concept. They emerged in the 1970s and 1980s but rose to prominence in the Internet era, especially through Google's extensive use of systems such as Borg, MapReduce, and BigTable. Open-source projects such as Hadoop, Spark, and Mesos have since made large-scale data processing accessible to many enterprises.

1. Characteristics of Distributed Systems

The most important characteristic is scalability: the ability to grow by adding servers as business demand increases. High-concurrency mobile Internet applications are a typical example, since they must handle massive numbers of user requests and large data volumes.

The core idea is to have multiple servers cooperate on tasks that no single server could handle, especially high-concurrency or large-data workloads. A distributed system consists of loosely coupled, independent servers connected by a fast internal network. Because network communication dominates performance, designs aim to minimize inter-node traffic, and the individual nodes can be ordinary PCs with modest specifications.
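To make "cooperating servers" concrete, here is a minimal Python sketch, assuming the work can be split by key. Keys are hash-partitioned across a hypothetical set of nodes, each node processes only its own shard, and only the small per-shard results are combined. Threads stand in for servers, and the function names (`partition`, `node_task`, `cluster_run`) are illustrative, not taken from any framework.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(keys, num_nodes):
    """Assign each key to one node by hashing it.

    zlib.crc32 is used instead of the built-in hash(), whose value for strings
    changes between Python processes, so the key-to-node mapping stays stable.
    """
    shards = [[] for _ in range(num_nodes)]
    for key in keys:
        shards[zlib.crc32(key.encode("utf-8")) % num_nodes].append(key)
    return shards


def node_task(shard):
    # Stand-in for the work one server performs on its local slice of the data.
    return sum(len(key) for key in shard)


def cluster_run(keys, num_nodes):
    shards = partition(keys, num_nodes)
    # Threads simulate independent servers; only the small per-shard results
    # are sent back and combined, not the raw data.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_task, shards))
    return sum(partials)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    print(cluster_run(keys, num_nodes=4))
```

The result does not depend on the number of nodes. Only the degree of parallelism changes, which is what lets capacity grow by adding machines.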

They are cost-effective: clusters of inexpensive PCs can match or exceed mainframe performance at a fraction of the cost, with software-level fault tolerance compensating for the lower reliability of commodity hardware.

They enable elastic scaling at the application-service level: the number of service instances can grow or shrink dynamically as the workload fluctuates. Pure IaaS cannot fully provide this, because it allocates virtual machines rather than scaling the services that run on them.
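Service-level elasticity usually comes down to a rule that derives the instance count from observed load. The sketch below is one such rule under stated assumptions: the per-instance capacity is known from measurement, the 70% target utilization and the minimum of two instances are illustrative choices, and `ScalingPolicy` is a hypothetical name, not an existing API.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    capacity_per_instance: float     # requests/s one instance can serve (assumed measured)
    target_utilization: float = 0.7  # aim below 100% so instances keep some headroom
    min_instances: int = 2           # one instance would be a single point of failure
    max_instances: int = 100         # upper bound to cap cost

    def desired_instances(self, current_load: float) -> int:
        """Instances needed so each runs at or below the target utilization."""
        usable = self.capacity_per_instance * self.target_utilization
        needed = math.ceil(current_load / usable) if current_load > 0 else 0
        return max(self.min_instances, min(self.max_instances, needed))


if __name__ == "__main__":
    policy = ScalingPolicy(capacity_per_instance=500)
    for load in (100, 10_000, 1_000_000):  # requests/s
        print(load, policy.desired_instances(load))
```

With 500 requests/s per instance, a load of 10,000 requests/s yields 29 instances. Light load is held at the floor of 2 and extreme load at the ceiling of 100. A controller would re-evaluate this periodically and start or stop service instances accordingly.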

2. Design Principles of Distributed Systems

Below are several design principles:

1. Low hardware requirements

Two aspects:

Hardware reliability is not required; failures are tolerated by software.

High‑performance hardware is unnecessary because the bottleneck is network communication, not CPU or memory speed.
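"Failures are tolerated by software" often means a client that fails over to another replica when one node does not answer. Below is a minimal sketch, assuming replicas are modeled as plain callables and a failed node raises a hypothetical `NodeDown` exception. Real RPC clients add timeouts, backoff, and health checks on top of this.

```python
class NodeDown(Exception):
    """Raised by a (simulated) node that is unreachable."""


def call_with_failover(replicas, request, attempts_per_replica=1):
    """Send `request` to each replica in turn until one succeeds.

    The caller sees an error only if every replica fails, so one broken
    machine does not make the service unavailable.
    """
    errors = []
    for replica in replicas:
        for _ in range(attempts_per_replica):
            try:
                return replica(request)
            except NodeDown as exc:
                errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def broken_node(request):
    raise NodeDown("node-a is down")


def healthy_node(request):
    return f"handled {request!r}"


if __name__ == "__main__":
    print(call_with_failover([broken_node, healthy_node], "GET /item/42"))
```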

Consequently, large data centers use many cheap PCs rather than a few high‑end servers, as exemplified by Google’s custom‑designed low‑cost servers.

2. Emphasis on horizontal scalability (Scale‑Out)

Horizontal scaling adds servers to increase overall capacity. Its ceiling is far higher than that of vertical scaling (Scale-Up), which is bounded by the capacity of a single machine.

For example, scaling a 10-node cluster to 100 nodes can yield roughly a ten-fold performance improvement, provided the workload partitions cleanly and coordination overhead stays small. Google's data-center cells contain around twenty thousand servers each.
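Scaling out only pays off if adding a node does not force most of the data to move. The article does not say how this is handled. One common technique, swapped in here for illustration, is consistent hashing. The sketch assumes MD5-based ring positions and 100 virtual points per node, both arbitrary choices, and `HashRing` is a hypothetical class.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    # Position on the ring; MD5 is stable across processes (unlike hash()).
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns many virtual points so load spreads evenly.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first virtual point clockwise from its hash.
        idx = bisect.bisect(self._points, _point(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, old_nodes, new_nodes):
    """Share of keys whose owning node changes when the cluster changes."""
    old, new = HashRing(old_nodes), HashRing(new_nodes)
    return sum(old.node_for(k) != new.node_for(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10_000)]
    ten = [f"node{i}" for i in range(10)]
    print(moved_fraction(keys, ten, ten + ["node10"]))
```

Growing from 10 to 11 nodes should move only about 1/11 of the keys, all of them onto the new node. By contrast, naive `hash(key) % N` placement would remap about 10/11 of them.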

3. No Single Point of Failure

Services must run multiple instances across different nodes, and data must be replicated, so that the failure of a single server does not render the service or data unavailable.
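The benefit of replication can be estimated with a standard back-of-envelope formula: if each replica is up with probability a, at least one of n replicas is up with probability 1 - (1 - a)^n. This sketch assumes replica failures are independent, which real clusters only approximate. The 99% per-node figure is purely illustrative.

```python
def availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up.

    Assumes replica failures are independent, which real deployments only
    approximate (shared racks, power, network, or software bugs break it).
    """
    return 1 - (1 - node_availability) ** replicas


MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes per year with no replica available."""
    return (1 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = availability(0.99, n)  # 0.99 per node is an illustrative figure
        print(n, f"{a:.6f}", f"{downtime_minutes_per_year(a):.1f} min/yr")
```

Under these assumptions, a single node at 99% is down about 87.6 hours a year, while three replicas bring the expected outage down to roughly half a minute.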

Keeping each server's load moderate, rather than running it at full capacity, also improves overall cluster stability, because the surviving servers then have headroom to absorb the work of a failed one.
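The headroom argument is simple arithmetic: when f of n nodes fail, the survivors' utilization rises from u to u * n / (n - f). A minimal sketch, assuming the failed nodes' work is spread evenly over the survivors (the function names are hypothetical):

```python
def load_after_failures(utilization: float, nodes: int, failed: int = 1) -> float:
    """Per-node utilization after `failed` nodes drop out.

    Assumes the failed nodes' work is spread evenly over the survivors.
    """
    if failed >= nodes:
        raise ValueError("at least one node must survive")
    return utilization * nodes / (nodes - failed)


def tolerates(utilization: float, nodes: int, failed: int = 1, limit: float = 1.0) -> bool:
    """True if survivors stay at or below `limit` (1.0 = fully saturated)."""
    return load_after_failures(utilization, nodes, failed) <= limit


if __name__ == "__main__":
    print(tolerates(0.6, 10))  # moderate load: survivors rise to about 67%
    print(tolerates(0.9, 4))   # near-full load: survivors would need 120%
```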

4. Minimize inter‑node communication overhead

Since network communication is the main performance bottleneck, moving computation to the data rather than data to the computation (as Hadoop MapReduce does) reduces data transfer and improves efficiency.
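Data locality can be illustrated with a greedy scheduler that prefers a node already storing a task's input block. This is a simplified sketch, not Hadoop's actual scheduler, which weighs more factors. The data structures and the `schedule` function are hypothetical.

```python
def schedule(tasks, block_locations, free_slots):
    """Greedy locality-aware placement (a simplified sketch, not Hadoop's scheduler).

    tasks:           list of (task_id, block_id) pairs
    block_locations: block_id -> set of nodes that store a replica of that block
    free_slots:      node -> number of additional tasks it can run
    Returns task_id -> (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    plan = {}
    for task_id, block_id in tasks:
        # Prefer a node that already holds the input: no bulk data crosses the network.
        local = [n for n in sorted(block_locations.get(block_id, ())) if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with capacity; the block must then be fetched remotely.
            remote = [n for n in sorted(slots) if slots[n] > 0]
            if not remote:
                raise RuntimeError(f"no free slot for task {task_id}")
            node, is_local = remote[0], False
        slots[node] -= 1
        plan[task_id] = (node, is_local)
    return plan


if __name__ == "__main__":
    blocks = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
    capacity = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    print(schedule([("t1", "b1"), ("t2", "b2"), ("t3", "b3")], blocks, capacity))
```

Here t1 and t2 run where their blocks live. Only t3, whose block sits on a node with no free slot, pays for a network transfer.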

5. Prefer stateless application services

Stateless services keep their state in an external store (e.g., Redis or Memcached) rather than in process memory, so any instance can be restarted or replaced without losing data. This improves reliability and simplifies recovery after server failures.
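A minimal sketch of a stateless handler, assuming the external store only needs `get` and `set`. The `InMemoryStore` below is a single-process stand-in for testing. A redis-py client created with `decode_responses=True` should expose compatible `get`/`set` calls, but treat that as an assumption to verify. `CartService` and its cart format are hypothetical.

```python
import json
from typing import List, Optional, Protocol


class KeyValueStore(Protocol):
    """The two operations the service needs from its external state store."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> object: ...


class InMemoryStore:
    """Single-process stand-in for Redis/Memcached, for local testing only."""
    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> bool:
        self._data[key] = value
        return True


class CartService:
    """Stateless request handler: it keeps no per-user data between requests."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store  # the only thing an instance holds is a handle to the store

    def add_item(self, user_id: str, item: str) -> List[str]:
        key = f"cart:{user_id}"
        raw = self._store.get(key)
        items = json.loads(raw) if raw else []
        items.append(item)
        # This read-modify-write is not atomic; concurrent requests for the
        # same user would need an atomic store operation or optimistic locking.
        self._store.set(key, json.dumps(items))
        return items


if __name__ == "__main__":
    store = InMemoryStore()
    instance_a = CartService(store)
    instance_b = CartService(store)  # e.g. a replacement after instance_a crashed
    instance_a.add_item("u1", "book")
    print(instance_b.add_item("u1", "pen"))
```

Because `instance_b` reads the cart from the store instead of from its own memory, it continues exactly where `instance_a` left off. This is the property that makes restarts and failover cheap.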

In summary, distributed systems are the preferred platform for enterprise applications in the big‑data era, offering excellent horizontal scalability, low hardware requirements, and true elastic scaling at the service level.

Author Introduction

Wang Pu: B.Sc. in Mechanics (Beihang University, 2002), M.Sc. in Computer Science (Peking University, 2007), Ph.D. in Computer Science (George Mason University, 2011). Research interests include machine learning and data mining; former engineer at StumbleUpon, Groupon, and Google. Founder of Shuren Technology (2014), focusing on end-to-end big-data analytics solutions.

Source: InfoQ

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
