
Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

JD Tech Talk

Abstract – As internet companies shift from coarse‑grained to fine‑grained operations, real‑time access to business data has become a frequent demand. The platform "ShuJuLi" adopts Apache Flink as a high‑throughput, low‑latency, reliable stream‑processing engine.

Statistics Flow – The overall stream‑processing pipeline consists of three steps: data source ingestion, multiple transformation operations (filter, split, aggregation, etc.), and storage of the computation results. Multiple sources can feed into a node, and a node can forward its output to one or many downstream nodes.

Flink Data Source – Implement a custom source by extending RichParallelSourceFunction and overriding its lifecycle methods: open(Configuration parameters) is called once during initialization, run(SourceContext<T> ctx) is the emit loop that publishes records via ctx.collect(), and cancel() signals that loop to stop. In this example the source emits an Order object every 20 seconds.
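As a sketch of that contract, the following self-contained Java mirrors the lifecycle locally so it compiles without the Flink dependency; a real source would extend Flink's RichParallelSourceFunction<Order> rather than declare its own SourceContext, and the Order fields and configurable interval here are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderSourceSketch {
    /** Illustrative order record; field names are assumptions, not the platform's schema. */
    public static class Order {
        public final String biz;      // business line the order belongs to
        public final long eventTime;  // business occurrence time, in millis
        public Order(String biz, long eventTime) { this.biz = biz; this.eventTime = eventTime; }
    }

    /** Mirrors Flink's SourceContext<T>: a source emits records via collect(). */
    public interface SourceContext<T> { void collect(T element); }

    private volatile boolean running = true;

    /** open(): called once during initialization (e.g. open connections). */
    public void open() { running = true; }

    /** run(): the emit loop; the article's source sleeps 20_000 ms between Orders. */
    public void run(SourceContext<Order> ctx, int maxElements, long intervalMillis)
            throws InterruptedException {
        int emitted = 0;
        while (running && emitted < maxElements) {
            ctx.collect(new Order("order-biz", System.currentTimeMillis()));
            emitted++;
            if (emitted < maxElements) Thread.sleep(intervalMillis);
        }
    }

    /** cancel(): asks the run() loop to stop. */
    public void cancel() { running = false; }

    public static void main(String[] args) throws InterruptedException {
        OrderSourceSketch source = new OrderSourceSketch();
        List<Order> out = new ArrayList<>();
        source.open();
        source.run(out::add, 3, 1); // tiny interval for the demo instead of 20 s
        System.out.println(out.size()); // prints 3
    }
}
```

Bounding the loop with maxElements is only for demonstration; a real Flink source loops until cancel() flips the flag.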

Data Transformation Operations – Flink provides a rich set of functions such as map (one‑to‑one transformation of each record), flatMap (split one record into multiple records), filter (drop records that fail a predicate), keyBy (SQL‑like GROUP BY), aggregate (count, sum, avg), reduce, join, and connect. These are illustrated in the original diagrams.
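To make those semantics concrete, here is a plain-Java analogue (java.util.stream over a bounded batch, not Flink's DataStream API) of what map, flatMap, filter, and keyBy-plus-count compute; the sample strings are assumptions:

```java
import java.util.*;
import java.util.stream.*;

public class TransformSemantics {
    /** flatMap semantics: one record in, many records out (split lines into words). */
    static List<String> splitWords(List<String> lines) {
        return lines.stream().flatMap(l -> Arrays.stream(l.split(" "))).collect(Collectors.toList());
    }

    /** keyBy + aggregate(count) semantics: SQL-like GROUP BY with a count per key. */
    static Map<String, Long> countByWord(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "a c", "b c");
        List<String> words = splitWords(lines);

        // map semantics: one record in, exactly one record out (tag each word)
        List<String> tagged = words.stream().map(w -> "word:" + w).collect(Collectors.toList());

        // filter semantics: keep only records matching a predicate
        List<String> onlyA = words.stream().filter("a"::equals).collect(Collectors.toList());

        System.out.println(words);              // [a, b, a, c, b, c]
        System.out.println(countByWord(words)); // {a=2, b=2, c=2}
        System.out.println(tagged.size() + " " + onlyA.size()); // 6 2
    }
}
```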

Window Concepts – Flink supports various window types to segment continuous streams: Time Window (based on elapsed time), Count Window (based on element count), Tumbling Windows (non‑overlapping fixed size), Sliding Windows (overlapping with a slide interval), and Session Windows (gap‑based, merged when activity resumes). Images in the source show the visual differences.
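The window-assignment arithmetic behind tumbling and sliding windows can be sketched in a few lines. The formulas below match Flink's zero-offset behavior for non-negative timestamps and are meant only to illustrate how the window types partition time:

```java
public class WindowAssignment {
    /** Tumbling window: non-overlapping; start = timestamp - (timestamp % size). */
    static long tumblingStart(long ts, long size) { return ts - (ts % size); }

    /** Sliding window (size, slide): each element falls into size/slide windows;
     *  the most recent window containing ts starts at ts - (ts % slide). */
    static long lastSlidingStart(long ts, long slide) { return ts - (ts % slide); }

    public static void main(String[] args) {
        long size = 60_000, slide = 20_000;
        long ts = 125_000; // 2 min 5 s into the stream

        System.out.println(tumblingStart(ts, size));     // 120000 -> window [120s, 180s)
        System.out.println(lastSlidingStart(ts, slide)); // 120000

        // With size 60s and slide 20s, ts belongs to 60/20 = 3 overlapping windows:
        // [80s, 140s), [100s, 160s), [120s, 180s)
    }
}
```

Count windows fire on element count rather than these time bounds, and session windows have no fixed start at all: a window closes only after a configured gap of inactivity.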

Time Concepts – Flink distinguishes three notions of time: Processing Time (the system time of the machine executing the transformation), Event Time (the business occurrence time embedded in each record), and Ingestion Time (the system time when the record enters Flink). When using Event Time, Watermarks are required to bound out‑of‑order data; the example uses a 1‑minute watermark delay, meaning records may arrive up to one minute late and still be assigned to the correct window.
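A minimal sketch of that bounded-out-of-orderness idea (the strategy behind Flink's bounded-out-of-orderness watermark generators), assuming the article's 1-minute bound: the watermark trails the largest event time seen so far by the allowed lateness, and never moves backwards.

```java
public class BoundedOutOfOrdernessSketch {
    private final long maxOutOfOrderness; // 60_000 ms = the article's 1-minute bound
    private long maxSeenTimestamp = Long.MIN_VALUE;

    public BoundedOutOfOrdernessSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Record an element's event time and return the current watermark. */
    public long onEvent(long eventTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestamp);
        return maxSeenTimestamp - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(60_000);
        System.out.println(wm.onEvent(100_000)); // 40000
        System.out.println(wm.onEvent(90_000));  // still 40000: a late record never moves the watermark back
        System.out.println(wm.onEvent(170_000)); // 110000
    }
}
```

When the watermark passes a window's end timestamp, Flink considers that window complete and fires it.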

Order‑Statistics Example – The article walks through six steps: (1) build the execution environment, (2) add a custom data source that emits an Order every 20 seconds with parallelism = 1, (3) validate data, (4) assign timestamps and Watermarks (Event Time + 1 minute watermark), (5) key the stream by the biz field, and (6) define a 1‑minute tumbling Event‑Time window and apply an OrderSumAggregator. Images illustrate each step.
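The steps above can be simulated over a small bounded batch. This plain-Java sketch (no Flink dependency; the sample biz keys and timestamps are assumptions) shows what the keyed 1-minute tumbling window of steps (5) and (6) computes:

```java
import java.util.*;

public class TumblingCountSimulation {
    /** Key by biz, assign each event time to its tumbling window, count per (key, window). */
    static Map<String, Map<Long, Long>> countPerWindow(long[][] events, String[] biz, long windowSize) {
        Map<String, Map<Long, Long>> counts = new TreeMap<>();
        for (long[] e : events) {
            String key = biz[(int) e[0]];                     // step (5): key the stream by biz
            long windowStart = e[1] - (e[1] % windowSize);    // step (6): tumbling window assignment
            counts.computeIfAbsent(key, k -> new TreeMap<>())
                  .merge(windowStart, 1L, Long::sum);         // count one order per element
        }
        return counts;
    }

    public static void main(String[] args) {
        long windowSize = 60_000; // the 1-minute tumbling Event-Time window of step (6)
        String[] biz = {"app", "web"};
        // (biz index, eventTime) pairs standing in for the Orders emitted in step (2)
        long[][] events = { {0, 5_000}, {0, 30_000}, {1, 45_000}, {0, 65_000} };

        System.out.println(countPerWindow(events, biz, windowSize));
        // prints {app={0=2, 60000=1}, web={0=1}}
    }
}
```

In the real pipeline the same result per key and window is produced incrementally by the OrderSumAggregator when each window fires.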

Aggregation Logic – The core aggregator implements Flink's AggregateFunction interface with four methods: ACC createAccumulator() creates the container that holds the statistics; ACC add(IN in, ACC acc) is called for each element entering the window; OUT getResult(ACC acc) produces the final result when the window fires; and ACC merge(ACC acc1, ACC acc2) merges two accumulators when windows are combined. The container can reside in memory or external storage.
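A self-contained sketch of a counting aggregator follows. The interface below mirrors the four-method contract locally so the example compiles without the Flink dependency (a real job implements Flink's org.apache.flink.api.common.functions.AggregateFunction instead), and the Order type and long[] in-memory container are illustrative assumptions:

```java
public class OrderSumAggregatorSketch {
    /** Local mirror of Flink's AggregateFunction<IN, ACC, OUT> contract. */
    interface AggregateFunction<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN in, ACC acc);
        OUT getResult(ACC acc);
        ACC merge(ACC acc1, ACC acc2);
    }

    static class Order { final String biz; Order(String biz) { this.biz = biz; } }

    /** Counts the orders that fall into one window. */
    static class OrderSumAggregator implements AggregateFunction<Order, long[], Long> {
        public long[] createAccumulator() { return new long[]{0}; }        // container for the count
        public long[] add(Order in, long[] acc) { acc[0]++; return acc; }  // per element entering the window
        public Long getResult(long[] acc) { return acc[0]; }               // emitted when the window fires
        public long[] merge(long[] a, long[] b) { return new long[]{a[0] + b[0]}; } // combine accumulators
    }

    public static void main(String[] args) {
        OrderSumAggregator agg = new OrderSumAggregator();
        long[] acc = agg.createAccumulator();
        acc = agg.add(new Order("app"), acc);
        acc = agg.add(new Order("app"), acc);
        System.out.println(agg.getResult(acc)); // 2
    }
}
```

Because add() folds each element into the accumulator immediately, the framework keeps only one small container per open window rather than buffering every element.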

Conclusion – Flink offers a powerful and flexible window API that covers most daily data‑transformation needs, though documentation is still limited and learning relies heavily on the official API and source code. The platform thanks its users and commits to further improving its data tools.

Tags: Big Data, Flink, Stream Processing, Apache Flink, Data Aggregation, Windowing, Event Time
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.
