Snapdeal Ads: Architecture and Lessons for Building a Scalable Web System Handling 5 Billion Daily Requests
The article details Snapdeal's Ads platform architecture, key strategies, infrastructure, requirements, and technologies that enable a highly available, low‑latency backend capable of processing billions of daily requests with a small engineering team.
Snapdeal, India’s largest e‑commerce platform, built an Ads system that supports up to 5 billion requests per day, demonstrating how a small team can create a web‑scale service by choosing the right technologies and design principles.
Key Strategies:
- Scale horizontally and vertically.
- Prioritize availability and partition tolerance (AP) over consistency.
- Avoid vendor lock‑in by using open‑source software.
- Apply mechanical sympathy to maximize hardware efficiency.
- Limit latency to under 1 ms using RocksDB and in‑memory stores.
- Use SSDs, avoid virtualization, and leverage large RAM/CPU resources.
- Fine‑tune Nginx and Netty for massive concurrency.
- Keep critical ad data in memory for microsecond‑level access.
- Adopt a share‑nothing architecture.
- Replicate data and retain raw backups for several days.
- Tolerate stale or inconsistent data.
- Design a fault‑tolerant messaging system that never loses data.
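The Nginx tuning mentioned above typically means raising worker, connection, and file-descriptor limits and pooling upstream connections. A minimal sketch follows; the directive names are standard Nginx, but every value and address here is illustrative, not Snapdeal's production configuration:

```nginx
# Illustrative concurrency tuning; values are assumptions, not Snapdeal's.
worker_processes auto;           # one worker per CPU core
worker_rlimit_nofile 100000;     # raise the per-worker file-descriptor ceiling

events {
    worker_connections 65535;    # simultaneous connections per worker
    multi_accept on;             # accept all pending connections per wakeup
}

http {
    keepalive_timeout 30;        # recycle idle client connections quickly
    upstream ad_servers {
        server 10.0.0.1:8080;    # hypothetical Netty backend
        keepalive 256;           # pooled upstream connections
    }
}
```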
Current Infrastructure:
- Three data centers with 40–50 nodes in total.
- About 30 high‑compute machines: 128–256 GB RAM, 24 cores, SSD‑backed.
- Remaining nodes: under 32 GB RAM, quad‑core CPUs.
- 10 Gbps private and public networks.
- Small Cassandra, HBase, and Spark clusters.
Critical Requirements:
- Support HTTP/REST RTB 2.0 requests from multiple bidders.
- Provide Yes/No pricing responses.
- Handle hundreds of thousands of QPS and billions of daily events.
- Process data as close to real time as possible.
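The Yes/No pricing response amounts to a predicate over each incoming bid request. A hedged Java sketch of that decision follows; the record fields and threshold are illustrative stand-ins, not the actual OpenRTB 2.0 schema or Snapdeal's pricing logic:

```java
// Minimal yes/no bid decision; BidRequest fields are hypothetical,
// not the real RTB 2.0 request structure.
record BidRequest(String placementId, double floorCpm) {}
record BidResponse(boolean bid, double priceCpm) {}

public class PricingEngine {
    private final double maxCpm; // the most we are willing to pay

    public PricingEngine(double maxCpm) { this.maxCpm = maxCpm; }

    // Bid ("Yes") only when our ceiling clears the floor; otherwise "No".
    public BidResponse decide(BidRequest req) {
        if (req.floorCpm() <= maxCpm) {
            return new BidResponse(true, maxCpm);
        }
        return new BidResponse(false, 0.0);
    }

    public static void main(String[] args) {
        PricingEngine engine = new PricingEngine(2.0);
        System.out.println(engine.decide(new BidRequest("slot-1", 1.5)).bid()); // true
        System.out.println(engine.decide(new BidRequest("slot-2", 3.0)).bid()); // false
    }
}
```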
Key Technologies:
- HBase and Cassandra for high‑write, near‑real‑time data storage.
- Java as the primary backend language (with legacy C++/Erlang experience).
- Google Protobuf for data serialization.
- Netty for high‑performance non‑blocking servers.
- RocksDB as an embedded read/write store, synchronized across nodes via Apache Kafka.
- Kafka for durable messaging and stream processing.
- CQEngine for fast in‑memory queries.
- Nginx as the reverse proxy.
- Apache Spark for ML workloads.
- Jenkins for CI; Nagios and New Relic for monitoring.
- ZooKeeper for distributed coordination.
- BitTorrent Sync for cross‑node data replication.
- A custom quota manager based on Yahoo's white paper.
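CQEngine answers equality queries against indexed in-memory collections, which is what keeps ad lookups off the network and disk. The idea can be sketched with plain JDK collections; this is a stand-in for the pattern, not the CQEngine API, and the `Ad` record is hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical ad record; a real system would index several such fields.
record Ad(String id, String category, double bidCpm) {}

public class AdIndex {
    // Hash index on category, analogous to an index in CQEngine:
    // lookups become a single in-memory map probe.
    private final Map<String, List<Ad>> byCategory = new ConcurrentHashMap<>();

    public void add(Ad ad) {
        byCategory.computeIfAbsent(ad.category(), k -> new CopyOnWriteArrayList<>())
                  .add(ad);
    }

    // Equality query served entirely from memory: no network or disk hop.
    public List<Ad> findByCategory(String category) {
        return byCategory.getOrDefault(category, List.of());
    }

    public static void main(String[] args) {
        AdIndex index = new AdIndex();
        index.add(new Ad("a1", "electronics", 1.2));
        index.add(new Ad("a2", "fashion", 0.8));
        index.add(new Ad("a3", "electronics", 2.5));
        System.out.println(index.findByCategory("electronics").size()); // 2
    }
}
```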
System Design and Results: the ad server, built on non‑blocking Netty, processes each HTTP request by querying in‑memory structures via CQEngine, avoiding network and disk latency. The server operates in a share‑nothing fashion, communicating asynchronously with other components. It returns results with 5–15 ms latency and writes raw data asynchronously to Kafka, which feeds HBase, Cassandra, and Spark for further processing and budget management.
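The request path described here, answer from memory and push raw data downstream asynchronously, can be sketched with a JDK queue standing in for the Kafka producer. The class and method names are illustrative assumptions, not Snapdeal's code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AdRequestPipeline {
    // Stand-in for an async Kafka producer: a bounded in-process queue
    // that a background writer thread would drain toward Kafka.
    private final BlockingQueue<String> rawEvents = new ArrayBlockingQueue<>(10_000);

    // Hot path: compute the response from memory, then enqueue the raw
    // event without blocking (offer drops on overflow rather than stalling
    // the response, favoring availability over completeness).
    public String handle(String requestId) {
        String response = "ad-for-" + requestId;     // in-memory lookup goes here
        rawEvents.offer(requestId + "|" + response); // fire-and-forget logging
        return response;
    }

    public int pendingEvents() { return rawEvents.size(); }

    public static void main(String[] args) {
        AdRequestPipeline pipeline = new AdRequestPipeline();
        System.out.println(pipeline.handle("req-42"));    // ad-for-req-42
        System.out.println(pipeline.pendingEvents());     // 1
    }
}
```

The key design choice mirrored here is that the response never waits on the logging path: if the downstream buffer is full, the event is dropped rather than the bid delayed, consistent with the article's stated tolerance for stale or lost non-critical data.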
Qunar Tech Salon