Inside Twitter’s Manhattan: How a Massive Distributed Database Powers Real‑Time Ads

The article explores Twitter’s Manhattan storage system, detailing its architecture, CAP trade‑offs across various database types, the design of its modular storage engines, high‑performance operations, and the DevOps practices that enable reliable, low‑latency handling of billions of requests in a massive distributed environment.

Efficient Ops

Dec 11, 2017

Various Types of Databases

Twitter’s overall storage architecture consists of four systems: NoSQL for user information and small numbers, a large file system for images and videos, Hadoop for backend data processing, and MySQL for relational queries.

CAP Theorem of Different Databases

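The CAP theorem states that a distributed system can guarantee at most two of consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts.

Different databases make different trade‑offs: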

Relational databases prioritize strong consistency, sacrificing availability.

NoSQL offers high concurrency with eventual consistency.

NewSQL systems such as Google Spanner aim to deliver all three properties in practice by making partitions very unlikely, but they do not actually break the CAP theorem.

Specialized databases include time‑series stores (OpenTSDB), document stores (MongoDB), and graph databases (Neo4j).

Manhattan’s Evolution

Twitter originally used Cassandra, which suffered from scaling limits and gossip‑protocol inconsistencies at large node counts. To overcome these issues, Twitter built its own NoSQL system, Manhattan, focusing on reliability, availability, operability, low latency, and efficiency.

Manhattan now runs on over 30,000 nodes, each handling tens of thousands of QPS with a 99th‑percentile latency of 10 ms.

Manhattan Storage Engine Design

Manhattan uses a modular design with interchangeable storage nodes and engines, tailored to different workloads.

Common Data Engines

SeaDB – read‑optimized for batch updates and read‑heavy workloads.

SSTable – write‑optimized, similar to Google’s Bigtable.

RDBMS – traditional relational engine supporting indexes and real‑time updates.

Special‑Purpose Engines

Strong consistency.

Time‑series services.

Secondary indexes.

Manhattan Architecture

The coordinator module batches requests, distributes them across backend nodes, and reassembles the responses.
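
The scatter and gather role of a coordinator can be sketched as follows. This is a minimal illustration, not Manhattan's implementation: the in-memory NODES list, the hash-based node_for placement, and the multi_put and multi_get helpers are all hypothetical names invented for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical stand-ins for storage nodes; a real node would be a remote server.
NODES = [dict() for _ in range(3)]


def node_for(key: str) -> int:
    """Map a key to a node index with a stable hash (illustrative placement only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % len(NODES)


def multi_put(items: dict) -> None:
    """Store each item on the node that owns its key."""
    for key, value in items.items():
        NODES[node_for(key)][key] = value


def multi_get(keys: list) -> dict:
    """Scatter a multi-key read across nodes and gather one merged response."""
    # Scatter: group the requested keys by owning node.
    batches = defaultdict(list)
    for key in keys:
        batches[node_for(key)].append(key)

    # Gather: one batched lookup per node, merged into a single result.
    response = {}
    for node_id, batch in batches.items():
        store = NODES[node_id]
        response.update({k: store[k] for k in batch if k in store})
    return response
```

Grouping keys per node means each backend is contacted once per request, however many keys the client asks for.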

Bloom filters quickly eliminate non‑existent keys, improving query efficiency.
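
A Bloom filter can be sketched in a few lines. The class below is a generic textbook version under assumed parameters (num_bits, num_hashes, and SHA-256 based hashing are choices made for this example), not the filter Manhattan ships.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added, so the disk read can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))
```

A False answer is definitive, which is what lets a storage node skip files that cannot hold the key; a True answer still requires the real lookup.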

Reads from SSTable‑based storage are expensive, because a lookup may have to check several files; optimizations include compaction, two‑level indexing, and Bloom filters.
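
One common reading of two-level indexing is a sparse top-level index that points at sorted blocks, so a lookup touches a single block. The sketch below assumes that interpretation; IndexedTable and block_size are hypothetical names, and the source does not describe Manhattan's actual index layout.

```python
from bisect import bisect_right


class IndexedTable:
    """Sorted key/value pairs split into blocks, with a sparse top-level index."""

    def __init__(self, items: dict, block_size: int = 2):
        pairs = sorted(items.items())
        self.blocks = [
            pairs[i:i + block_size] for i in range(0, len(pairs), block_size)
        ]
        # Level 1: the first key of every block.
        self.first_keys = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Level 1: binary search for the only block that could hold the key.
        idx = bisect_right(self.first_keys, key) - 1
        if idx < 0:
            return None  # key sorts before the first block
        # Level 2: scan inside that single block.
        for k, v in self.blocks[idx]:
            if k == key:
                return v
        return None
```

Only the small first_keys list needs to stay in memory; the blocks themselves could live on disk and be read one at a time.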

Writes are recorded in a fast commit log and applied directly to memory.
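
A write path of this kind is often built as an append-only log plus an in-memory table. The following sketch assumes that pattern; the WritePath class, its JSON line format, and the recover method are illustrative inventions rather than Manhattan internals.

```python
import json


class WritePath:
    """Append each write to a log file, then apply it to an in-memory table."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.memtable = {}
        self.log = open(log_path, "a", encoding="utf-8")

    def put(self, key: str, value) -> None:
        # Sequential append: cheap compared with updating data files in place.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        # flush() hands the record to the OS; a durable system would also fsync.
        self.log.flush()
        self.memtable[key] = value

    def close(self) -> None:
        self.log.close()

    @classmethod
    def recover(cls, log_path: str) -> "WritePath":
        """Rebuild the in-memory table by replaying the log in order."""
        store = cls(log_path)
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                store.memtable[record["k"]] = record["v"]
        return store
```

Replaying the log in order means the last write for each key wins, so the recovered table matches the state before the restart.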

Reconciliation propagates written data across replicas and periodically resolves inconsistencies, so that replicas converge to eventual consistency.
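
Periodic reconciliation can be illustrated with a last-write-wins merge. This is a sketch under a stated assumption: each value carries a timestamp and the newest one wins. The source does not say which conflict rule Manhattan uses, and the reconcile function is hypothetical.

```python
def reconcile(replicas: list) -> dict:
    """Bring every replica to the newest (timestamp, value) seen for each key.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    newest = {}
    # Pass 1: find the most recent version of every key across all replicas.
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)

    # Pass 2: push the winning versions back to every replica.
    for replica in replicas:
        replica.update(newest)
    return newest
```

After one pass all replicas hold identical data, which is the convergence that eventual consistency promises once writes stop.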

Large‑Scale Operations Practices

Twitter handles tens of millions of requests per second across dozens of clusters, making operations challenging.

Introducing DevOps improves efficiency through organizational structure, processes, and tools.

Organization

Integrating development and operations ensures shared responsibility: operations must sign off on designs, and developers run their own services for a month after launch.

Process

Newcomers receive training, shadow experienced engineers, then handle incidents independently.

Tools

Self‑service UI reduces communication overhead and provides debugging interfaces.

Deployment pipelines support canary testing, rollback, and staged rollouts to handle data format changes safely.
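
The canary, staged rollout, and rollback flow can be sketched as below. The staged_rollout function, its stage fractions, and the deploy, healthy, and rollback callbacks are assumptions made for illustration; they do not describe Twitter's actual tooling.

```python
def staged_rollout(hosts, deploy, healthy, rollback, stages=(0.01, 0.1, 1.0)):
    """Deploy in growing stages; undo everything on the first unhealthy host.

    The first stage acts as the canary. Returns True if the full rollout succeeds.
    """
    done = []
    for fraction in stages:
        # Always include at least one host so the canary stage is never empty.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(done):target]:
            deploy(host)
            done.append(host)
            if not healthy(host):
                # Roll back in reverse order of deployment.
                for deployed in reversed(done):
                    rollback(deployed)
                return False
    return True
```

Because a bad build is caught on the canary host, most of the fleet never sees a data format it cannot read.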

Topology transition mechanisms enable incremental scaling without full‑stop migrations.

Challenges and Opportunities in Operations

Manual fault diagnosis is painful given billions of metrics; automated anomaly detection and log aggregation help pinpoint issues.
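
As a rough illustration of automated anomaly detection, the sketch below flags points that deviate strongly from a trailing window. The find_anomalies function, its window size, and the three-sigma threshold are assumptions for this example; the source does not say which detection method Twitter uses.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

Running a check like this per metric turns a wall of dashboards into a short list of timestamps worth a human look.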

Intelligent operations use machine learning to correlate front‑end and back‑end failures, generate flexible alerts, and build a knowledge base that continuously learns from incidents.
