Performance Comparison of Apache Flink and Apache Storm for Real-Time Stream Processing
This article presents a comprehensive performance evaluation of Apache Flink versus Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, and the impact of different configurations and delivery semantics to guide framework selection and optimization.
Background
Apache Flink and Apache Storm are two widely used distributed real‑time computation frameworks. Storm has been deployed in production at Meituan‑Dianping for years, while Flink has recently attracted attention for its high throughput, low latency, strong reliability, and precise windowing support.
Test Objectives
The goal is to assess Flink and Storm performance under varying data pressure and across scenarios, identify the limits of each framework, evaluate the impact of configuration choices, and provide tuning recommendations for resource planning and defining SLAs.
Test Scenarios
Input‑Output Simple Processing: isolates framework overhead using a minimal processing pipeline.
Long‑Running User Jobs: simulates complex user logic that accesses external components, increasing per‑record execution time.
Windowed Statistics: measures performance of time‑based or count‑based window aggregations, a common use case.
Exactly‑Once Computation: compares at‑least‑once (Storm) with exactly‑once (Flink) delivery semantics.
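The practical difference between the two delivery semantics can be illustrated with a framework‑free sketch: when records are replayed after a failure, at‑least‑once processing counts the replays again, while exactly‑once processing affects state only once per record. The record stream and the id‑based deduplication below are purely illustrative assumptions, not either framework's actual mechanism.

```python
# Illustrative only: contrasts the effect of at-least-once vs. exactly-once
# semantics on a word count when records are replayed after a failure.

def count_at_least_once(records):
    """Replayed records are counted again, so totals can overcount."""
    counts = {}
    for _, word in records:
        counts[word] = counts.get(word, 0) + 1
    return counts

def count_exactly_once(records):
    """Deduplicate by record id so each record updates state exactly once."""
    counts, seen = {}, set()
    for rec_id, word in records:
        if rec_id in seen:
            continue
        seen.add(rec_id)
        counts[word] = counts.get(word, 0) + 1
    return counts

# Records are (id, word); ids 2 and 3 are replayed after a simulated failure.
stream = [(1, "a"), (2, "b"), (3, "a"), (2, "b"), (3, "a")]
print(count_at_least_once(stream))  # {'a': 3, 'b': 2}
print(count_exactly_once(stream))   # {'a': 2, 'b': 1}
```

Flink achieves exactly‑once through checkpointing rather than per‑record deduplication, but the observable outcome on state is the same: no record is counted twice.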
Performance Metrics
Throughput (records/second) and latency (milliseconds) are collected.
Test Environment
Standalone clusters with one master and two workers were built for each framework; some Flink tests also ran on YARN. Cluster and framework parameters are shown in the original diagrams.
Test Methodology
A Data Generator writes timestamped messages to a Kafka topic. The Storm and Flink jobs consume from the same offset, record each message's in‑time and out‑time, and write results to separate Kafka topics. A Metrics Collector aggregates the results over five‑minute windows and stores the average, median, and 99th‑percentile values in MySQL for analysis.
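A minimal sketch of the Metrics Collector's per‑window aggregation: each record's latency is its out‑time minus its in‑time, and the average, median, and 99th percentile are computed over each window. The window grouping and the MySQL write are omitted, and the nearest‑rank percentile method is an assumption.

```python
# Sketch of per-window latency aggregation (assumed nearest-rank p99).
import statistics

def window_stats(latencies_ms):
    ordered = sorted(latencies_ms)
    # Nearest-rank 99th percentile: the value at ceil(0.99 * n), 1-indexed.
    p99 = ordered[max(0, int(round(0.99 * len(ordered))) - 1)]
    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": p99,
    }

print(window_stats([40, 50, 50, 60, 300]))
# {'avg': 100, 'median': 50, 'p99': 300}
```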
Test Cases
Identity: simple pass‑through processing, measuring raw framework throughput.
Sleep: adds a 1 ms pause after reading each record to emulate long‑running user logic.
Windowed Word Count: parses JSON sentences, splits them into words, counts occurrences within a window, and records timestamps; also used to test exactly‑once semantics.
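The Windowed Word Count case can be sketched without either framework: parse each JSON message, split the sentence into words, and count per tumbling window keyed by timestamp. The message shape and the 10‑second window size are assumptions for illustration only.

```python
# Framework-free sketch of windowed word count over timestamped JSON messages.
import json
from collections import defaultdict

def windowed_word_count(messages, window_ms=10_000):
    # windows maps window start time -> {word: count}
    windows = defaultdict(lambda: defaultdict(int))
    for raw in messages:
        msg = json.loads(raw)
        window_start = msg["ts"] // window_ms * window_ms
        for word in msg["sentence"].split():
            windows[window_start][word] += 1
    return {start: dict(counts) for start, counts in windows.items()}

msgs = [
    '{"ts": 1000, "sentence": "to be or not to be"}',
    '{"ts": 12000, "sentence": "to be"}',
]
print(windowed_word_count(msgs))
# {0: {'to': 2, 'be': 2, 'or': 1, 'not': 1}, 10000: {'to': 1, 'be': 1}}
```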
Results
Identity Throughput
Flink achieves ~350 k records/s versus Storm’s ~87 k records/s (3–5× higher).
Identity Latency
At full throughput, Flink’s median latency is ~50 ms (99th‑percentile ~300 ms) versus Storm’s ~100 ms (99th‑percentile ~700 ms).
Sleep Throughput & Latency
Both frameworks reach ~900 records/s per thread; Flink still shows lower latency.
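This result matches a back‑of‑the‑envelope check: a 1 ms pause caps a single thread at 1,000 records/s, so ~900 records/s implies roughly 0.11 ms of per‑record framework overhead, which is why the two frameworks converge when user logic dominates.

```python
# Back-of-the-envelope check of the Sleep result.
sleep_ms = 1.0
observed_rps = 900

ceiling_rps = 1000 / sleep_ms             # hard ceiling: 1000 records/s/thread
overhead_ms = 1000 / observed_rps - sleep_ms  # implied per-record overhead

print(ceiling_rps, round(overhead_ms, 2))  # 1000.0 0.11
```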
Windowed Word Count
Flink’s standalone throughput ~43 k records/s, over three times Storm’s ~12 k. Exactly‑once reduces Flink throughput by ~6.3% with negligible latency impact.
State Backend Comparison
Memory and FileSystem backends deliver similar throughput; RocksDB reduces throughput to about one‑tenth and slightly increases latency.
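The backend is switched via Flink's configuration; a sketch of the relevant `flink-conf.yaml` entries is shown below (key names as used in older Flink releases; the HDFS path is a placeholder):

```yaml
# Select the state backend: jobmanager (memory), filesystem, or rocksdb.
state.backend: rocksdb
# Durable location for checkpoint data (placeholder path).
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
```

RocksDB trades throughput for state sizes larger than memory, which is consistent with the roughly ten‑fold throughput reduction observed here.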
Conclusions & Recommendations
Flink outperforms Storm in raw throughput and latency, especially for high‑volume, exactly‑once, and windowed workloads. When user logic dominates (e.g., 1 ms sleep), the performance gap narrows. For scenarios requiring exactly‑once guarantees, high throughput, or complex stateful/windowed processing, Flink is the preferred choice.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.