
Real-Time Computing with Apache Storm: Architecture, Code Samples, and Fault Tolerance

This article explains the principles of real-time computing, compares it with offline batch processing, and demonstrates a practical solution using Kafka for ingestion, Apache Storm for continuous computation, and various storage options, while also covering streaming concepts and Storm's high‑availability mechanisms.

Qunar Tech Salon

Continuing from the previous discussion on offline computation, this article introduces real‑time computing, which processes data as soon as it is generated and delivers results within seconds, enabling use cases such as instant hot‑topic detection on large websites.

The proposed solution consists of three layers: (1) data collection via a message queue (e.g., Kafka) that continuously streams search logs; (2) a persistent computation framework (e.g., Apache Storm) that consumes the queue, applies business logic, and emits results; (3) real‑time presentation and storage using in‑memory stores (Redis, MongoDB) or disk‑based databases (HBase, MySQL, SQL Server).
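To make the three layers concrete, here is a minimal, self-contained sketch in plain Java. It uses stand-ins rather than the real systems: a BlockingQueue plays the role of Kafka, a consume loop plays the role of the Storm topology, and an in-memory map plays the role of Redis. All class and method names here are illustrative, not Kafka or Storm APIs.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy stand-in for the three-layer pipeline:
 *  queue (Kafka) -> continuous consumer (Storm) -> in-memory store (Redis). */
public class MiniPipeline {
    // Layer 3 stand-in: in-memory store of search-term counts
    static Map<String, Integer> counts = new ConcurrentHashMap<>();

    // Layer 2 stand-in: the "bolt" logic, applied to each message as it arrives
    static void process(String term) {
        counts.merge(term, 1, Integer::sum);
    }

    public static void main(String[] args) throws Exception {
        // Layer 1 stand-in: a queue continuously fed with search logs
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (String t : new String[] {"hotel", "flight", "hotel"}) {
            queue.put(t);
        }
        // Consume the queue message by message, like a Storm spout/bolt loop
        while (!queue.isEmpty()) {
            process(queue.take());
        }
        System.out.println(counts); // "hotel" counted twice, "flight" once
    }
}
```

In a real deployment each layer is a separate distributed system, but the shape is the same: messages flow in continuously, logic is applied per message, and results are written to a store for immediate presentation.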

Storm Overview

Storm is a distributed, fault‑tolerant real‑time computation system. Its core abstraction is a topology, analogous to a MapReduce job, which defines the complete data‑flow graph. Unlike Hadoop, Storm’s Spout component can ingest data from any source (Kafka, Redis, databases, etc.), and multiple spouts can be used to separate streams.

Example Spout implementation:

public class CalcPriceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void nextTuple() {
        // Read from various sources such as Redis, message queues, databases
        collector.emit(new Values("message"));
    }
}

Storm invokes the nextTuple method repeatedly; in this example it would read messages from Kafka and emit them to downstream bolts for immediate processing.

Stream Processing

Stream processing treats data like a flowing liquid: each node (Bolt) receives a tuple, performs computation, and forwards the result to the next node, forming a directed acyclic graph. This model is more flexible than the two‑stage MapReduce paradigm.

Example Bolt implementation:

public class CalcProductPriceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void execute(Tuple input) {
        // Result = compute something from the input tuple
        collector.emit(input, new Values("Result")); // anchor to input and send to next node
        collector.ack(input); // tell the acker this tuple is done
    }
}

Images illustrate the spout‑bolt data flow and the overall topology.
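Assuming spout and bolt classes named as above, the data‑flow graph is declared with Storm's TopologyBuilder and submitted as a single unit. A minimal wiring sketch (the component names and parallelism hints here are illustrative choices, not from the original article):

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

public class PriceTopology {
    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        // One spout feeding one bolt; shuffleGrouping distributes
        // tuples randomly across the bolt's instances
        builder.setSpout("price-spout", new CalcPriceSpout(), 2);
        builder.setBolt("price-bolt", new CalcProductPriceBolt(), 4)
               .shuffleGrouping("price-spout");
        return builder.createTopology();
    }
}
```

Chaining further setBolt calls extends the graph into the directed acyclic graph described above; this wiring fragment only declares the topology and requires a Storm cluster (or LocalCluster) to actually run.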

Fault Tolerance

Storm ensures reliability through an Acker component that tracks the completion of each tuple using unique IDs. If a bolt fails, the originating spout can re‑emit the tuple after a timeout. Similar recovery mechanisms exist for failed ackers, spouts, or entire Storm components, with the Nimbus master handling node restarts.
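Storm's acker tracks an entire tuple tree with a single 64‑bit value: every anchor ID is XORed in once when a tuple is emitted and once when it is acked, so the value returns to zero exactly when every tuple in the tree has been processed. A minimal sketch of that bookkeeping in plain Java (a simplified illustration, not Storm's actual acker code):

```java
/** Sketch of Storm's acker bookkeeping: XOR every anchor ID on emit
 *  and again on ack; the tuple tree is complete when the value is 0. */
public class AckerDemo {
    static long ackVal = 0;

    static void xor(long anchorId) {
        ackVal ^= anchorId;
    }

    public static void main(String[] args) {
        long t1 = 0x1234L, t2 = 0x5678L;
        xor(t1); xor(t2); // spout emits two anchored tuples
        xor(t1); xor(t2); // downstream bolts ack both
        // x ^ x == 0, so a fully acked tree always returns to zero
        System.out.println(ackVal == 0); // prints true
    }
}
```

If a tuple is never acked, the value stays non-zero and the spout re-emits the tuple after the timeout, which is why this scheme gives at-least-once processing.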

In summary, unlike Hadoop’s batch‑oriented model, where jobs run over static data, Storm keeps a fixed topology running continuously while data streams through it, enabling low‑latency computation on data from any source with robust fault tolerance.

Tags: Big Data, Stream Processing, Kafka, Fault Tolerance, Real-Time Computing, Apache Storm
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
