
Snowball Architecture Overview and Optimization Practices

This article presents a detailed overview of Snowball's backend architecture, technology stack, service decomposition, and the series of performance and scalability optimizations—including quote server, IM service, database caching, monitoring, and deployment strategies—implemented to support rapid growth and market volatility.


Snowball is a fintech platform with fewer than 100 employees (about half engineers) that provides news, stock quotes, and a range of financial products to over one million daily active users, handling roughly 4 billion API calls per day.

The overall architecture follows a typical mobile-internet startup design: three client tiers (web, Android, iOS) feed into an internal data center built on Java, Scala, Akka, Finagle, Node.js, Docker, and Hadoop. Services are exposed through an Nginx-fronted API layer and include a legacy monolith called snowball, a high-performance quote server, an IM service built on Netty and Akka, and various business services. Infrastructure components include Redis, MySQL, MQ, ZooKeeper, HDFS, and Docker containers.

During periods of market turbulence and large promotional events, Snowball faced spikes such as API traffic at more than 30 times the normal level, 50,000+ QPS on the quote server, and 50,000 push messages per second from IM. To cope, the team isolated the quote server, moved data into in-memory stores, introduced Hazelcast replication, tuned JVM parameters, and shifted heavy DB writes to asynchronous pipelines.
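The last of these techniques can be sketched in plain Java: request threads enqueue write events and return immediately, while a background worker drains the queue in batches, turning many small synchronous writes into a few large batched ones. The class name, batch size, and string-based events below are illustrative assumptions, not Snowball's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of shifting heavy DB writes off the request path: a bounded
 * in-memory buffer plus a single background writer thread.
 */
public class AsyncWritePipeline {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
    private final List<String> flushed = new ArrayList<>();

    public AsyncWritePipeline() {
        Thread worker = new Thread(this::drainLoop, "db-writer");
        worker.setDaemon(true);
        worker.start();
    }

    /** Hot path: O(1), never blocks on the database. */
    public boolean submit(String writeEvent) {
        return queue.offer(writeEvent); // drop or spill when the buffer is full
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        while (true) {
            try {
                String first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch, 499); // flush up to 500 events at a time
                flushBatch(batch);
                batch.clear();
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    /** Stand-in for one multi-row INSERT against MySQL. */
    private synchronized void flushBatch(List<String> batch) {
        flushed.addAll(batch);
    }

    public synchronized int flushedCount() {
        return flushed.size();
    }

    /** Spin until at least n events have been flushed (for verification). */
    public boolean awaitFlushed(int n, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (System.currentTimeMillis() < deadline && flushedCount() < n) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flushedCount() >= n;
    }
}
```

The bounded queue is the key design choice: under a 30x traffic spike it gives the system a defined overload behavior (drop or spill) instead of letting database back-pressure stall request threads.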

Key optimizations include a hybrid native‑H5 app framework for rapid releases, end‑to‑end service‑quality monitoring that attaches lightweight request metadata, redesign of the quote service to use a dedicated server with local memory, an IM architecture where each online client has a dedicated Akka actor, reduction of log overhead by bypassing Akka’s log adapter, migration from Zabbix to OpenFalcon monitoring, and a gradual move from RabbitMQ to Kafka for event handling.
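The "lightweight request metadata" idea can be illustrated with a minimal trace context. In production this metadata would propagate across hops via RPC headers (e.g. Finagle/Thrift contexts); here a ThreadLocal stands in for that plumbing, and every name and field below is an illustrative assumption rather than Snowball's actual format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Sketch of end-to-end service-quality monitoring: attach a small
 * metadata map to each request at the entry point, let downstream
 * components add tags, and emit one metric line on exit.
 */
public class RequestTrace {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);
    private static final ThreadLocal<Long> START = new ThreadLocal<>();

    /** Called at the service entry point, e.g. an API-layer filter. */
    public static void begin(String endpoint) {
        START.set(System.nanoTime());
        Map<String, String> ctx = CTX.get();
        ctx.clear();
        ctx.put("request_id", UUID.randomUUID().toString());
        ctx.put("endpoint", endpoint);
    }

    /** Downstream hops attach their own lightweight tags. */
    public static void tag(String key, String value) {
        CTX.get().put(key, value);
    }

    /** Called on exit; returns the line a metrics collector would scrape. */
    public static String end(boolean ok) {
        long micros = (System.nanoTime() - START.get()) / 1_000;
        Map<String, String> ctx = CTX.get();
        return String.format("endpoint=%s status=%s latency_us=%d request_id=%s",
                ctx.get("endpoint"), ok ? "ok" : "error", micros,
                ctx.get("request_id"));
    }
}
```

Because the metadata is just a handful of key-value pairs created once per request, the monitoring cost stays negligible even at billions of calls per day.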

The team emphasizes solving bottlenecks with minimal changes, enforcing core metrics (QPS, p99 latency, error rate), keeping the tech stack consistent and simple, preferring cache over DB and async over sync, and conducting thorough code reviews that stress scalability, fault‑tolerance, and operational ownership.
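The core metrics the team enforces are simple to compute from a window of samples. The following sketch uses the nearest-rank definition of a percentile; the class and method names are illustrative, not taken from Snowball's codebase.

```java
import java.util.Arrays;

/**
 * Sketch of per-window service metrics: QPS and p99 latency.
 */
public class WindowStats {

    /** p99 latency over a window of samples (nearest-rank method). */
    public static long p99(long[] latenciesMicros) {
        return percentile(latenciesMicros, 99.0);
    }

    /** Smallest sample such that at least p percent of samples are <= it. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    /** Requests served divided by window length in seconds. */
    public static double qps(long requests, double windowSeconds) {
        return requests / windowSeconds;
    }
}
```

Tracking p99 rather than the average matters here: under spiky load the mean can look healthy while the slowest one percent of requests, often the ones hitting the database instead of the cache, degrade badly.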

The Q&A section covers details such as IM protocol design, idempotent failure‑retry strategies, config‑center implementation, why Finagle (Thrift) was chosen over other RPC frameworks, the shift from RabbitMQ to Kafka, and deployment practices for high‑availability services.

Tags: backend, performance, architecture, microservices, scalability, Akka, Finagle
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
