Performance Comparison of Apache Flink and Apache Storm for Real‑Time Stream Processing
This report presents a systematic performance evaluation of Apache Flink and Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, message‑delivery semantics, and state‑backend effects. Based on the observed results, it recommends the most suitable engine for each class of workload.
1. Background
Apache Flink and Apache Storm are two widely used distributed real‑time computation frameworks. Storm has a mature deployment in Meituan‑Dianping with management platforms, APIs and documentation, while Flink has recently attracted attention for its high throughput, low latency, strong reliability and precise window support.
2. Test Objectives
The goal is to evaluate the performance of Flink and Storm under different scenarios and data pressures, identify performance limits, analyze the impact of configuration on Flink, and provide tuning recommendations for real‑time platforms.
2.1 Test Scenarios
Simple Input‑Output: Minimal processing to isolate framework overhead.
Long‑Running User Job: Simulates complex user logic that increases per‑record execution time.
Windowed Statistics: Counts words in time or count windows to assess window support.
Exactly‑Once Computation: Tests the cost of Flink’s exactly‑once delivery semantics.
2.2 Performance Metrics
Throughput (records/second): measures data processing capacity.
Latency (milliseconds): time from data entry to output.
3. Test Environment
Standalone clusters with one master and two workers were built for both Storm and Flink; some Flink tests also ran on YARN.
3.1 Cluster Parameters
Parameter
Value
CPU
QEMU Virtual CPU version 1.1.2 2.6GHz
Core
8
Memory
16GB
Disk
500G
OS
CentOS release 6.5 (Final)
3.2 Framework Parameters
Parameter
Storm Config
Flink Config
Version
Storm 1.1.0‑mt002
Flink 1.3.0
Master Memory
2600M
2600M
Slave Memory
1600M × 16
12800M × 2
Parallelism
2 supervisors, 16 workers
2 TaskManagers, 16 slots
4. Test Methodology
4.1 Test Flow
Data Generator produces records with incremental IDs and event timestamps into a Kafka topic. Storm and Flink tasks consume from the same offset, process the data, and write results with in‑time and out‑time timestamps to separate Kafka topics. Metrics Collector aggregates throughput and latency every five minutes and stores them in MySQL for analysis.
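The collector’s per‑interval aggregation can be sketched as follows. This is a minimal illustration, not the actual Metrics Collector: it assumes each output record carries the event, in, and out timestamps in epoch milliseconds, and all names are hypothetical.

```python
def aggregate(records, interval_ms):
    """Compute throughput and latency statistics for one collection interval.

    records: list of (msg_id, event_time, in_time, out_time) tuples,
    with all timestamps in epoch milliseconds.
    """
    if not records:
        return {"throughput": 0.0, "median_ms": None, "p99_ms": None}
    # End-to-end latency: output time minus the original event timestamp.
    latencies = sorted(out - event for _, event, _, out in records)
    n = len(latencies)
    return {
        # Records processed per second over the interval.
        "throughput": n / (interval_ms / 1000.0),
        "median_ms": latencies[n // 2],
        "p99_ms": latencies[min(n - 1, int(n * 0.99))],
    }
```

In the methodology above this would run once every five minutes (interval_ms = 300000) before the resulting row is written to MySQL.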
4.2 Default Parameters
Both frameworks use At‑Least‑Once semantics by default.
Storm enables ACK with a single ACKer.
Flink checkpoint interval is 30 seconds; StateBackend defaults to Memory.
Kafka is sized to avoid being a bottleneck.
4.3 Test Cases
Identity: Simple pass‑through (msgId, eventTime) → (msgId, eventTime, inTime, outTime).
Sleep: Adds a 1 ms sleep after reading each record to simulate long‑running user logic.
Windowed Word Count: Parses JSON sentences, splits them into words, applies count windows, and records window statistics.
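The Windowed Word Count case can be approximated in plain Python. This is a simplified sketch of a per‑key tumbling count window, not the actual Flink or Storm job; the `"sentence"` field name is an assumption.

```python
import json
from collections import defaultdict

def count_window_word_count(lines, window_size):
    """Split JSON sentences into words and emit (word, total) each time a
    word completes another count window of `window_size` occurrences."""
    counts = defaultdict(int)  # running count per word (the window state)
    results = []               # emissions at window boundaries
    for line in lines:
        sentence = json.loads(line)["sentence"]
        for word in sentence.split():
            counts[word] += 1
            # A per-key tumbling count window fires every window_size records.
            if counts[word] % window_size == 0:
                results.append((word, counts[word]))
    return results
```

In the real jobs this state lives in the framework (Storm bolts or Flink keyed windows backed by a StateBackend); the dictionary here only stands in for that state.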
5. Test Results
5.1 Identity Single‑Thread Throughput
Flink achieves ~350 k records/s versus Storm’s ~87 k records/s, roughly a 4× advantage.
5.2 Identity Single‑Thread Latency
At full throughput, Flink’s median latency is ~50 ms (99th percentile ~300 ms) versus Storm’s ~100 ms (99th ~700 ms).
5.3 Sleep Throughput
With a 1 ms sleep per record, both frameworks plateau at ~900 records/s per thread: the sleep alone caps a single thread at 1,000 records/s, so framework overhead becomes a negligible share of per‑record cost.
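A per‑record 1 ms sleep caps single‑thread throughput near 1,000 records/s regardless of engine; a small simulation illustrates the cap (a hypothetical sanity check, not part of the benchmark harness):

```python
import time

def sleep_throughput(n_records, sleep_s=0.001):
    """Process n_records with a fixed per-record sleep and return records/s."""
    start = time.perf_counter()
    for _ in range(n_records):
        time.sleep(sleep_s)  # stand-in for 1 ms of user logic
    elapsed = time.perf_counter() - start
    return n_records / elapsed
```

On typical hardware this reports somewhat under 1,000 records/s, since sleeps rarely wake in exactly 1 ms, which is consistent with the ~900 records/s observed for both frameworks.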
5.4 Sleep Single‑Thread Latency
Flink still exhibits lower latency than Storm under the Sleep scenario.
5.5 Windowed Word Count Throughput
Flink Standalone processes ~43 k records/s, over three times Storm’s ~12 k records/s.
5.6 Exactly‑Once vs At‑Least‑Once (Flink)
Exactly‑Once reduces throughput by ~6.3 % with negligible latency impact.
5.7 At‑Most‑Once vs At‑Least‑Once (Storm)
At‑Most‑Once improves Storm’s throughput by ~16.8 % with slight latency reduction.
5.8 Windowed Word Count Latency
Flink maintains lower latency than Storm across both out‑time‑minus‑event‑time and out‑time‑minus‑in‑time measurements.
5.9 StateBackend Throughput (Flink)
Memory and FileSystem backends deliver high throughput (3–5× Storm); RocksDB drops to 0.3–0.5× Storm.
5.10 StateBackend Latency (Flink)
Memory and FileSystem show low, similar latency; RocksDB incurs higher latency, especially under YARN.
6. Conclusions and Recommendations
Flink outperforms Storm in both throughput (3–5×) and latency (about half at full load).
When user logic is heavy (e.g., 1 ms sleep), the performance gap narrows.
Exactly‑Once adds modest overhead for Flink, while Storm’s At‑Most‑Once improves throughput but still lags behind Flink.
For stateful or windowed workloads, Flink’s Memory or FileSystem StateBackends are recommended; RocksDB is suited for very large state.
Recommended scenarios for Flink: exactly‑once requirements, high‑volume low‑latency streams, and workloads needing state management or window aggregation.
7. Future Work
Investigate how Exactly‑Once scales with higher parallelism.
Determine the user‑logic latency range where Flink’s advantage remains significant.
Include reliability and scalability metrics in future benchmarks.
Explore performance of Flink’s Table API, SQL, and CEP.
8. References
Distributed Stream Processing Frameworks – Feature Comparison and Performance Evaluation.
Intel‑Hadoop/HiBench: A Big Data Benchmark Suite.
Yahoo! Streaming Benchmark and its extensions.
Source: https://tech.meituan.com/Flink_Benchmark.html
Republished by Architecture Digest.