Big Data · 8 min read

Streaming Prediction System Construction and Real‑time Feature Templatization

This article describes how a Flink‑based streaming prediction platform was built to flatten peak request loads, reduce latency and memory use, and improve stability through deduplicated SDK calls, incremental loading of Hive features, partitioned caching, and comprehensive monitoring. A companion templating system automates feature definition, SQL generation, and stress testing, enabling real‑time supply‑demand forecasting that outperforms offline methods.

HelloTech

Why Build Streaming Prediction

New decision‑making scenarios are increasingly invoked via message middleware such as Kafka, generating massive request volumes (thousands to tens of thousands of QPS). Streaming prediction can flatten peak QPS, reduce machine cost while preserving real‑time guarantees, simplify costly decision‑service deployments, and avoid the cumbersome code‑centric integration process.

Key Challenges of Streaming Prediction

1. Prediction at scale: large data streams make per‑record decision‑service calls expensive.
2. Feature retrieval: high‑frequency feature queries (e.g., against HBase) require concurrent access to meet latency requirements.
3. Stability: data delay, task failures, and call metrics all need monitoring.

Flink‑Based Local Decision Invocation

The team wrapped the decision SDK into a Flink UDF so the SDK could run locally within Flink jobs. Initial tests showed low QPS capacity because Flink's generated code invoked the SDK repeatedly for each record. Two solutions were explored: (1) modify Flink's code generation to reuse SDK results, which would affect other jobs, and (2) add a lightweight pass‑through UDTF that deduplicates calls, solving the duplication issue at a negligible performance cost.
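The deduplication idea can be illustrated outside Flink. The sketch below (plain Python, with `decision_sdk_call` as a hypothetical stand‑in for the wrapped decision SDK) shows a pass‑through layer that collapses repeated evaluations for the same record into a single underlying SDK invocation:

```python
from functools import lru_cache

# Hypothetical stand-in for the decision SDK; in the real system this is
# the vendor SDK wrapped as a Flink UDF.
def decision_sdk_call(record_key: str) -> str:
    decision_sdk_call.invocations += 1  # count how often the SDK really runs
    return f"decision-for-{record_key}"

decision_sdk_call.invocations = 0

@lru_cache(maxsize=10_000)
def deduped_call(record_key: str) -> str:
    """Pass-through layer: repeated calls for the same record key reuse
    the first result instead of re-invoking the SDK."""
    return decision_sdk_call(record_key)

# Generated code may evaluate the same UDF expression several times per
# record; the pass-through layer turns them into one SDK invocation.
for _ in range(3):  # three duplicated evaluations of the same record
    result = deduped_call("order-42")

print(decision_sdk_call.invocations)  # prints 1: the SDK ran once, not three times
```

In the actual platform this role is played by a pass‑through UDTF in the plan rather than an in‑process cache, but the effect is the same: downstream expressions see the value without triggering another SDK call.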

Local Feature Query

Real‑time features from Kafka are complemented by offline features cached via a Hive connector. The original approach pulled full Hive tables at fixed intervals, hurting real‑time performance. The connector was rewritten to pull incremental data based on processing time (PT) and update asynchronously, preserving low latency.
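A minimal sketch of the incremental approach, assuming a `fetch_partition` callable that stands in for the rewritten Hive connector: the cache tracks which processing‑time (PT) partitions it has already loaded and pulls only new ones, so a refresh never repeats a full‑table scan.

```python
import threading

class IncrementalHiveCache:
    """Dimension-table cache that loads only unseen PT partitions.

    `fetch_partition(pt)` is a hypothetical stand-in for the Hive
    connector; it returns {key: row} for one partition.
    """

    def __init__(self, fetch_partition):
        self._fetch_partition = fetch_partition
        self._cache = {}
        self._loaded_pts = set()
        self._lock = threading.Lock()
        self.pulls = 0  # how many times we actually hit Hive

    def refresh(self, latest_pt: str) -> None:
        """Pull only partitions we have not seen yet."""
        if latest_pt in self._loaded_pts:
            return  # nothing new; skip the expensive pull
        rows = self._fetch_partition(latest_pt)
        self.pulls += 1
        with self._lock:
            self._cache.update(rows)
            self._loaded_pts.add(latest_pt)

    def lookup(self, key):
        with self._lock:
            return self._cache.get(key)

# Usage: simulate two daily partitions arriving over time.
partitions = {
    "20240101": {"user_1": {"city": "Shanghai"}},
    "20240102": {"user_2": {"city": "Beijing"}},
}
cache = IncrementalHiveCache(lambda pt: partitions[pt])
cache.refresh("20240101")
cache.refresh("20240102")
cache.refresh("20240102")       # already loaded; no second pull
print(cache.pulls)              # prints 2
print(cache.lookup("user_2"))   # prints {'city': 'Beijing'}
```

In the real connector the refresh runs asynchronously so feature lookups never block on loading; here the lock merely keeps lookups consistent with concurrent updates.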

Memory usage was further reduced by switching from full‑table caching to a partitioned cache: each Flink subtask loads only the required dimension tables, and large dimension tables are shuffled by join key. Storage format was also changed from RowData to raw byte arrays, cutting memory consumption to roughly one‑third for 100 k records.
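Both ideas can be sketched together. The example below (assumed parallelism and a toy dimension table, not the production schema) routes each dimension row to exactly one subtask's cache by hashing the join key, and stores rows as serialized bytes instead of structured objects:

```python
import json
import zlib

PARALLELISM = 4  # assumed number of Flink subtasks

def subtask_for(join_key: str) -> int:
    # Stable hash so a given key always routes to the same subtask,
    # mirroring a shuffle on the join key.
    return zlib.crc32(join_key.encode()) % PARALLELISM

# Toy dimension table standing in for a Hive-backed one.
dimension_table = {f"sku_{i}": {"price": i, "cat": "bike"} for i in range(100)}

# Each subtask caches only its slice, and rows are kept as raw bytes
# rather than structured RowData-like objects to cut per-row overhead.
caches = [dict() for _ in range(PARALLELISM)]
for key, row in dimension_table.items():
    caches[subtask_for(key)][key] = json.dumps(row).encode()  # bytes, not dicts

def lookup(key: str):
    raw = caches[subtask_for(key)].get(key)
    return json.loads(raw) if raw is not None else None

total_cached = sum(len(c) for c in caches)
print(total_cached)     # prints 100: each row cached exactly once across subtasks
print(lookup("sku_7"))  # prints {'price': 7, 'cat': 'bike'}
```

The memory win comes from two directions: no subtask holds the full table, and byte storage avoids per-field object headers, which is what drove the roughly 3x reduction reported above.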

Stability Construction

Existing AI platform monitoring (OpenTSDB + Elasticsearch) was reused. All request traces are logged to OpenTSDB for alerting and to Elasticsearch for replay and detailed inspection of input/output parameters.

Real‑time Feature Templatization

Three motivations drive templating: (1) 76 % of real‑time feature logic can be expressed via templates, reducing configuration effort and speeding up deployment; (2) it lowers engineering overhead by providing ready‑made templates and automated pressure testing; (3) underlying SQL is auto‑generated per template, optimizing Flink execution.

The previous workflow required manual data‑source definition, feature development, storage definition, and publishing. After templating, users configure features via a template UI, and the system automatically generates the necessary SQL and performs automated stress testing.

Template‑Based Optimizations

Examples include: (1) aggregating daily product exposure counts via a timed SQL template, reducing write pressure on downstream databases; (2) handling distinct‑count aggregation for daily unique logins by splitting the distinct operation, mitigating data skew and resource waste.
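The split‑distinct trick in (2) can be shown with a small simulation (bucket count and data are illustrative; the real system expresses this as generated two‑stage SQL): stage one computes distinct users within hash buckets so no single task holds the whole day's distinct set, and stage two sums the bucket counts.

```python
import zlib

BUCKETS = 8  # first-stage fan-out (assumed)

def bucket_of(user_id: str) -> int:
    # Stable hash: each user lands in exactly one bucket.
    return zlib.crc32(user_id.encode()) % BUCKETS

# Stage 1: distinct users per bucket. Each bucket's set is small, which
# avoids the skew of one task aggregating every distinct key.
logins = ["u1", "u2", "u1", "u3", "u2", "u4", "u1"]
stage1 = [set() for _ in range(BUCKETS)]
for uid in logins:
    stage1[bucket_of(uid)].add(uid)

# Stage 2: sum the bucket-level distinct counts. Because the buckets
# partition the key space, the sum equals the global distinct count.
daily_unique_logins = sum(len(s) for s in stage1)
print(daily_unique_logins)  # prints 4
```

The correctness hinges on each key mapping to exactly one bucket, so no user is counted twice across buckets.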

Use‑Case: Supply‑Demand Forecasting

Traditional offline forecasting (T+1, 24‑hour horizon) caused negative‑revenue dispatches due to volatile site traffic. By switching to online streaming prediction, the system can continuously update net outflow forecasts, improving dispatch decisions.

Tags: Big Data, Flink, AI, Kafka, Real-time Features, Streaming Prediction, Template Optimization
Written by

HelloTech

Official Hello technology account, sharing tech insights and developments.
