
Optimizing Flink‑Storm for Large‑Scale Storm Task Migration on the 58 Real‑Time Computing Platform

This article describes how the 58 real‑time computing platform optimized the Flink‑Storm beta tool and implemented large‑scale, smooth migration of Storm jobs to Flink, covering background, architecture, platform‑level enhancements, YARN runtime support, deployment, and user‑side integration.

58 Tech

Introduction

Flink‑Storm is a beta tool provided by Apache Flink for running Storm programs on Flink; it was removed after Flink 1.8. This article explains how the 58 real‑time computing platform optimized Flink‑Storm and used it to smoothly migrate a large number of Storm tasks to Flink in production.

Background

The 58 real‑time computing platform supplies stable, high‑performance streaming services for various business units, originally built on Storm and Spark Streaming. Storm suffered from insufficient throughput and multi‑cluster operational overhead, while Spark Streaming could not meet low‑latency requirements. After evaluating Flink’s architectural, performance, and stability advantages, the platform adopted Flink as the next‑generation engine and built a one‑stop high‑performance platform (Wstream) supporting Flink JARs, Stream SQL, and Flink‑Storm.

Alongside the platform upgrade, a migration plan was launched to move existing Storm jobs to Flink, aiming to improve overall efficiency, reduce hardware costs, and lower operational burden.

Storm vs. Flink

Although Flink‑Storm lets Flink run Storm topologies, migration still faced three main challenges: (1) the learning curve for users, (2) the development effort of rewriting jobs on Flink, and (3) the limitations of Stream SQL in complex scenarios. The team therefore chose Flink‑Storm, which preserves existing Storm code while ensuring a stable migration.

Flink‑Storm Principle

1. Build the original Storm topology using TopologyBuilder.
2. Convert the Storm topology to a Flink streaming dataflow via FlinkTopology.createTopology(builder).
3. SpoutWrapper turns a Storm spout into a RichParallelSourceFunction, mapping output fields to Flink TypeInformation.
4. BoltWrapper converts bolts into Flink operators, with grouping logic translated to the corresponding DataStream operations.
5. After the FlinkTopology is constructed, a StreamExecutionEnvironment generates a StreamGraph, which is turned into a JobGraph and submitted to the Flink runtime.
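At its core, SpoutWrapper is an adapter: it drives Storm's pull-style nextTuple() loop from inside Flink's push-style source contract. A minimal stdlib-only sketch of that idea, using toy stand-in interfaces rather than the real Storm/Flink types (all names here are illustrative, not the actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-ins for the real Storm/Flink contracts (illustrative names only).
interface Spout {
    // Storm side: emit at most one tuple per call; false when exhausted.
    boolean nextTuple(Consumer<String> collector);
}

interface SourceFunction {
    // Flink side: run until the source is done, pushing records to ctx.
    void run(Consumer<String> ctx);
}

// What SpoutWrapper does conceptually: loop over the spout's pull-style
// nextTuple() from inside Flink's push-style run() method.
class SpoutWrapperSketch implements SourceFunction {
    private final Spout spout;
    SpoutWrapperSketch(Spout spout) { this.spout = spout; }

    @Override
    public void run(Consumer<String> ctx) {
        while (spout.nextTuple(ctx)) { /* keep pulling until the spout is done */ }
    }
}

public class WrapperDemo {
    // A finite "spout" that emits n tuples.
    static Spout countingSpout(int n) {
        return new Spout() {
            private int i = 0;
            public boolean nextTuple(Consumer<String> collector) {
                if (i >= n) return false;
                collector.accept("tuple-" + i++);
                return true;
            }
        };
    }

    // Run the wrapped spout as a "source" and gather its output.
    static List<String> collect(Spout spout) {
        List<String> out = new ArrayList<>();
        new SpoutWrapperSketch(spout).run(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(countingSpout(3))); // [tuple-0, tuple-1, tuple-2]
    }
}
```

BoltWrapper follows the same adapter pattern in the other direction, invoking the bolt's execute() from inside a Flink operator.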

Practice

Platform Layer

Version Compatibility

The production environment runs Apache Flink 1.6. The Flink‑Storm module was built against Storm 1.0, while the platform operates Storm 0.9.5 and 1.2. For Storm 1.2 jobs, the Storm 1.0 API is fully compatible, so only the storm‑core dependency needs to be upgraded to 1.2. For Storm 0.9.5 jobs, the API is incompatible, so the storm‑core dependency must be downgraded to 0.9.5 and package paths adjusted throughout the module.
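For the Storm 1.2 path, the change amounts to a version bump of the storm‑core dependency in the flink‑storm module's pom.xml, roughly as follows (the patch version shown is illustrative; use the release matching your cluster):

```xml
<!-- flink-storm module pom.xml: point the Storm dependency at the 1.2 line -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <!-- illustrative patch version; match your cluster's Storm release -->
    <version>1.2.2</version>
</dependency>
```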

Rebuilding the Flink‑Storm package is done with:

mvn clean package -Dmaven.test.skip=true -Dcheckstyle.skip=true

Feature Enhancements

1. Semantic Guarantees – Storm’s ack mechanism provides at‑least‑once guarantees, which were not originally ported to Flink‑Storm. By leveraging Flink checkpoints, the team implemented at‑least‑once semantics for Flink‑Storm jobs while preserving at‑most‑once behavior where needed.

2. Tick Tuple Support – Tick tuples enable periodic actions such as time‑outs and bolt timers. The platform added the same configuration option (topology.tick.tuple.freq.secs) to Flink‑Storm, allowing users to enable tick tuples just as in native Storm.

3. AllGrouping (Broadcast) under Multiple Inputs – Previously, Flink‑Storm treated multiple‑input AllGrouping as a rebalance operation. The optimization now correctly maps it to a broadcast, matching Storm’s semantics.
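Enhancement 1, the checkpoint-based at‑least‑once guarantee, can be pictured with a toy replay model in plain Java (this is not Flink's actual checkpointing API): the source periodically checkpoints its read offset and, on failure, rewinds to the last completed checkpoint, so records after that offset are emitted again, i.e. delivered at least once, possibly more than once.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of checkpoint-and-replay (not Flink's real API): a source that
// checkpoints its offset every `checkpointEvery` records and, on a single
// simulated failure just before emitting record `failAt`, rewinds to the
// last checkpoint and continues.
public class ReplayDemo {
    static List<Integer> runWithFailure(int total, int checkpointEvery, int failAt) {
        List<Integer> delivered = new ArrayList<>();
        int checkpointedOffset = 0;
        boolean failed = false;
        int offset = 0;
        while (offset < total) {
            if (!failed && offset == failAt) {
                // Crash once, then restart from the last completed checkpoint.
                failed = true;
                offset = checkpointedOffset;
                continue;
            }
            delivered.add(offset);           // emit the record at this offset
            offset++;
            if (offset % checkpointEvery == 0) {
                checkpointedOffset = offset; // checkpoint completed
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // 10 records, checkpoint every 4, crash just before record 6:
        // records 4 and 5 are replayed -> at-least-once, with duplicates.
        System.out.println(runWithFailure(10, 4, 6));
        // [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
    }
}
```

The duplicates after a restart are exactly why this is at‑least‑once rather than exactly‑once: nothing is lost, but downstream logic must tolerate replays.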
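For enhancement 2, the user-facing side is just the familiar Storm configuration key, and the bolt-side check is the usual test against Storm's system component and tick stream ids ("__system" and "__tick"). A plain-Java sketch with a toy tuple type (the Tuple class here is illustrative, not Storm's):

```java
import java.util.HashMap;
import java.util.Map;

public class TickDemo {
    // Toy tuple carrying just the metadata needed to recognize tick tuples.
    static class Tuple {
        final String sourceComponent, sourceStream;
        Tuple(String component, String stream) {
            sourceComponent = component;
            sourceStream = stream;
        }
    }

    // The check a Storm bolt performs in execute() to branch on tick tuples;
    // "__system" / "__tick" are Storm's system component and tick stream ids.
    static boolean isTickTuple(Tuple t) {
        return "__system".equals(t.sourceComponent) && "__tick".equals(t.sourceStream);
    }

    // What a user sets, identically in native Storm and in Flink-Storm
    // after the enhancement.
    static Map<String, Object> tickConf(int seconds) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.tick.tuple.freq.secs", seconds);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(isTickTuple(new Tuple("__system", "__tick"))); // true
        System.out.println(isTickTuple(new Tuple("spout-1", "default"))); // false
        System.out.println(tickConf(10));
    }
}
```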
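The semantic gap fixed by enhancement 3 is easy to see in a plain-Java simulation of the two routing strategies (a toy model, not the DataStream API): rebalance delivers each tuple to exactly one downstream task in round-robin order, while broadcast, which is what Storm's AllGrouping means, delivers every tuple to every task.

```java
import java.util.ArrayList;
import java.util.List;

// Toy comparison of the two routing strategies for one input stream
// fanned out to `tasks` parallel downstream instances.
public class GroupingDemo {
    // Rebalance: round-robin, each tuple reaches exactly one task.
    static List<List<String>> rebalance(List<String> tuples, int tasks) {
        List<List<String>> inbox = newInbox(tasks);
        for (int i = 0; i < tuples.size(); i++) {
            inbox.get(i % tasks).add(tuples.get(i));
        }
        return inbox;
    }

    // Broadcast (Storm's AllGrouping): every tuple reaches every task.
    static List<List<String>> broadcast(List<String> tuples, int tasks) {
        List<List<String>> inbox = newInbox(tasks);
        for (String t : tuples) {
            for (List<String> box : inbox) box.add(t);
        }
        return inbox;
    }

    private static List<List<String>> newInbox(int tasks) {
        List<List<String>> inbox = new ArrayList<>();
        for (int i = 0; i < tasks; i++) inbox.add(new ArrayList<>());
        return inbox;
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "c");
        System.out.println(rebalance(in, 2)); // [[a, c], [b]]
        System.out.println(broadcast(in, 2)); // [[a, b, c], [a, b, c]]
    }
}
```

A bolt expecting AllGrouping under the rebalance translation would silently see only a fraction of the stream, which is why the fix matters for correctness, not just performance.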

Runtime

Flink‑Storm originally supported only local and standalone modes, with no YARN submission. Since the platform’s Flink cluster runs on YARN for unified resource management, a custom YARN client was implemented. The client builds the Flink JobGraph, configures YARN resources, and interacts with the ResourceManager and ApplicationMaster to submit and monitor jobs.

Task Deployment

To simplify job submission and integration with the Wstream platform, a command‑line style interface similar to native Flink was provided.

User Layer

1. Maven Dependency – Compiled Flink‑Storm packages are uploaded to the company’s private Maven repository for users to pull the appropriate version.

2. Code Changes – Users only need to switch the submission command from Storm to Flink; no other code modifications are required.

Conclusion

By optimizing and employing Flink‑Storm, the platform successfully migrated multiple Storm clusters, achieving over 40% reduction in compute resources while maintaining real‑time performance and throughput. Consolidating workloads on a YARN‑managed Flink cluster eliminated the need for separate Storm clusters, improving overall resource utilization and reducing operational effort.
