Real-Time Computation Platform Based on Storm and StreamCQL: Architecture, CQL Integration, and Development Guide
This article introduces a real‑time computation platform built on Apache Storm, explains its low‑latency, high‑throughput design, details the integration of Continuous Query Language (CQL) via StreamCQL, showcases development workflows, code examples, and two typical business use cases, and outlines future directions.
0x01 Business Pain Points When data cleaning, log analysis, user tracking, real‑time reporting, monitoring, and price calculation involve massive data volumes with strict latency requirements, developers face difficulties such as high‑throughput, low‑latency tuning and long development cycles. To address this, a real‑time computation platform was created to simplify models and lower integration barriers.
0x02 Platform Model The platform is built on Apache Storm, chosen for its low latency, high performance, distributed scalability, fault tolerance, and at‑least‑once message guarantees. Storm provides a simple programming model similar to MapReduce, automatic fault handling, horizontal scaling across threads, processes, and servers, fast processing via ØMQ, and a local mode for rapid development and testing.
Storm processes data as an infinite stream of tuples. Sources (spouts) emit tuples that flow through bolts for transformation. Input and output sources are unrestricted; any data source (Kafka, ActiveMQ, databases, files, web services) can be used by implementing the appropriate adapters.
0x03 Integrating CQL The platform’s core uses a graphical drag‑and‑drop interface combined with Continuous Query Language (CQL) to describe real‑time computation tasks, which are then compiled into Storm topologies. Unlike traditional SQL, CQL adds window semantics, allowing data to remain in memory for fast, large‑scale calculations. Open‑source CQL implementations (Esper, Siddhi, Squall) require additional client code, making them cumbersome; StreamCQL, an open‑source project from Huawei, provides a more complete solution.
Key StreamCQL concepts:
Stream : an unbounded collection of elements (tuples) each associated with a logical timestamp.
Window : a bounded view of a stream (time‑based or row‑based) that can slide or tumble, enabling relational‑style queries.
Expression : a combination of symbols, operators, constants, variables, or functions that evaluates to a value.
An example CQL task reads from a Kafka input stream s1 , writes to a Kafka output stream s2 , and finally outputs to a console stream s4 . The example notes that the Kafka input uses the high‑level consumer API, with the fromBeginning parameter controlling offset reset behavior.
StreamCQL’s architecture consists of three components: cql , streaming , and adapter . The execution pipeline includes syntax analysis, CQL parsing for input/output streams, lazy task handling, INSERT processing, SET/SUBMIT handling, operator construction, JAR merging, semantic analysis, physical plan generation, and submission to Storm.
0x04 Platform Development Features implemented include:
Graphical drag‑and‑drop configuration for real‑time tasks.
Support for CQL‑based and code‑based task submission.
Storm Metrics + Graphite for task‑level monitoring.
Alerts for data backlog and failures.
JsonSerDe for serialization (StreamCQL).
Sinks for HDFS, Elasticsearch, HTTP, MySQL/SQL Server (StreamCQL).
Custom SimpleSerDe with dynamic column support.
Compatibility with Storm 1.0.1.
Fixes for case‑insensitive field handling.
Example: developing a new Elasticsearch sink requires extending InnerOutputSourceOperator :
public class EsOutputOperator extends InnerOutputSourceOperator {
@ConfigAnnotation(StreamingConfig.OPERATOR_ES_NODES)
private String nodes;
@ConfigAnnotation(StreamingConfig.OPERATOR_ES_CLUSTERNAME)
private String clustername;
}In PhysicalPlanLoader.setInputOutputStreamAlias() the following alias mappings are added:
ALIAS_MAPPING.put("InputOperator", InputStreamOperator.class);
ALIAS_MAPPING.put("OutputOperator", OutputStreamOperator.class);
ALIAS_MAPPING.put("KafkaInput", KafkaInputOperator.class);
ALIAS_MAPPING.put("KafkaOutput", KafkaOutputOperator.class);
ALIAS_MAPPING.put("RandomGenInput", RandomGenInputOperator.class);
ALIAS_MAPPING.put("ConsoleOutput", ConsoleOutputOperator.class);
ALIAS_MAPPING.put("HttpOutput", HttpOutputOperator.class);
ALIAS_MAPPING.put("HdfsOutput", HdfsOutputOperator.class);
ALIAS_MAPPING.put("EsOutput", EsOutputOperator.class);Similarly, operator mappings are added to CQLSimpleLexerMapping :
MAPPING.put("SimpleSerDe", SimpleSerDe.class.getName());
MAPPING.put("JsonSerDe", JsonSerDe.class.getName());
MAPPING.put("CsvSerDe", CSVSerDe.class.getName());0x06 Business Integration Two typical scenarios demonstrate the platform’s capabilities:
Scenario 1 – ETL via Drag‑and‑Drop : Consume data from a Kafka topic, filter based on conditions, and persist results to HDFS and Elasticsearch.
Scenario 2 – Real‑Time Monitoring via CQL : Consume monitoring data from Kafka, forward filtered records to a downstream topic, store raw data in HDFS, and compute per‑minute aggregates that trigger SMS alerts when thresholds are exceeded.
Configuration screenshots and generated topologies illustrate the end‑to‑end workflow.
The platform provides comprehensive metrics (latency, throughput, GC, memory usage) for precise task monitoring and debugging.
0x07 Future Outlook
The release event was a success, with enthusiastic participation and follow‑up discussions. The team, part of the Big Data R&D department since 2012, has evolved from early HBase recommendation systems to the comprehensive "Data Galaxy" one‑stop big‑data service platform, offering data ingestion, cleaning, computation, analysis, and machine‑learning capabilities. Their goal is to commercialize this platform.
Author: Data Platform Technical Representative – Li Su Xing
Tongcheng Travel Technology Center
Pursue excellence, start again with Tongcheng! More technical insights to help you along your journey and make development enjoyable.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.