
Optimizations and Extensions for Flink SQL in Tencent Real‑Time Computing Platform

This article, presented by Tencent senior engineer Du Li, describes the current state of Flink SQL, compares the Jar, Canvas, and SQL job modes, introduces window-function extensions and retract-stream optimizations, and outlines the platform's roadmap for cost-based optimization and new features.

DataFunTalk

Feb 7, 2021


The article begins with an overview of Flink job creation methods—Jar mode, Canvas mode, and SQL mode—highlighting their respective advantages, disadvantages, and target users.

It then examines the limitations of traditional SQL windowing and introduces Tencent's extensions, including a new Windowing Table‑Valued Function that enables window‑aligned joins and set operations across one or two streams.
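To make "window-aligned join" concrete, here is a minimal Python sketch, assuming tumbling windows, integer millisecond timestamps, and in-memory lists standing in for the two streams. The names `tumble_start` and `window_join` are hypothetical and are not Tencent's or Flink's API; the point is only that two rows match when they share a key and fall into the same window.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A record is (key, event_time_ms, value); this layout is an illustrative assumption.
Record = Tuple[str, int, str]


def tumble_start(ts: int, size: int, offset: int = 0) -> int:
    """Start of the tumbling window of length `size` that contains `ts`."""
    return ts - ((ts - offset) % size)


def window_join(left: List[Record], right: List[Record], size: int) -> List[Tuple[str, int, str, str]]:
    """Join rows that share a key AND fall into the same tumbling window."""
    # Index the right stream by (key, window_start).
    buckets: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for key, ts, val in right:
        buckets[(key, tumble_start(ts, size))].append(val)

    # Probe with the left stream; only rows in the same window can match.
    out = []
    for key, ts, val in left:
        w = tumble_start(ts, size)
        for rval in buckets.get((key, w), []):
            out.append((key, w, val, rval))
    return out


left = [("a", 1000, "L1"), ("a", 6000, "L2")]
right = [("a", 2000, "R1"), ("a", 7000, "R2"), ("b", 1500, "R3")]
print(window_join(left, right, 5000))
```

With a 5-second window, `L1` pairs only with `R1` (window 0) and `L2` only with `R2` (window 5000), even though all four rows share the key `a`; a regular stream join would have produced four pairs.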

Implementation details cover window propagation, decoupling event-time fields from table sources, watermark handling, and usage constraints such as the requirement that both streams use the same window type.
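The watermark constraint for a two-stream window operator can be illustrated with a simplified model. This sketch assumes each input's watermark is its maximum observed event time minus a fixed delay, and that the operator's watermark is the minimum across its two inputs (the general rule Flink applies to multi-input operators). The class name and the `ready` check (window fires once the watermark reaches the window end) are hypothetical simplifications, not the platform's code.

```python
from typing import List, Optional


class TwoInputWatermark:
    """Simplified watermark tracking for an operator with two inputs (hypothetical)."""

    def __init__(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms
        # Maximum event time seen on each input; None until that input emits data.
        self.max_ts: List[Optional[int]] = [None, None]

    def observe(self, side: int, ts: int) -> None:
        cur = self.max_ts[side]
        self.max_ts[side] = ts if cur is None else max(cur, ts)

    def watermark(self) -> Optional[int]:
        # The slower input holds back the operator's watermark.
        if self.max_ts[0] is None or self.max_ts[1] is None:
            return None
        return min(self.max_ts[0], self.max_ts[1]) - self.delay_ms

    def ready(self, window_end: int) -> bool:
        """True once a window ending at `window_end` may fire on both streams."""
        wm = self.watermark()
        return wm is not None and wm >= window_end


wm = TwoInputWatermark(delay_ms=1000)
wm.observe(0, 12000)
print(wm.ready(10000))  # the right input has no data yet
wm.observe(1, 9000)
print(wm.watermark(), wm.ready(10000))
wm.observe(1, 11500)
print(wm.watermark(), wm.ready(10000))
```

Taking the minimum is what forces the window types to line up: the operator can only close window `[start, end)` once both inputs agree that time has passed `end`.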

Two novel window types are described: Incremental Window, which triggers multiple times within a window using a custom trigger, and Enhanced Tumble Window, which retains late data and re‑triggers based on the original window size.
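A rough picture of the Incremental Window's trigger behavior: the window keeps its full size, but a running result is emitted at several points inside it. This Python sketch assumes a SUM aggregate, a fixed `step` that evenly divides `size`, and a result that resets at each window boundary; the function name and parameters are hypothetical illustrations, not the platform's syntax or trigger implementation.

```python
from typing import Dict, List, Tuple


def incremental_window_sum(events: List[Tuple[int, int]], size: int, step: int) -> List[Tuple[int, int, int]]:
    """Fire a running SUM every `step` ms inside each tumbling window of `size` ms.

    `events` are (event_time_ms, value) pairs. Returns
    (window_start, fire_time, running_sum) for each trigger point of every
    window that received at least one event.
    """
    if size % step != 0:
        raise ValueError("step must evenly divide the window size")

    # Group events by the tumbling window that contains them.
    by_window: Dict[int, List[Tuple[int, int]]] = {}
    for ts, v in events:
        start = ts - ts % size
        by_window.setdefault(start, []).append((ts, v))

    out = []
    for start in sorted(by_window):
        rows = by_window[start]
        # Trigger at start+step, start+2*step, ..., start+size.
        for fire in range(start + step, start + size + 1, step):
            total = sum(v for ts, v in rows if ts < fire)
            out.append((start, fire, total))
    return out


events = [(500, 1), (1500, 2), (2500, 3), (4200, 10)]
for row in incremental_window_sum(events, size=2000, step=1000):
    print(row)
```

Each window fires twice here: an early partial result halfway through and the final result at its end, which is the "multiple triggers within one window" behavior the article attributes to the custom trigger.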

The article also covers retract-stream concepts, the operators that generate retract messages (e.g., outer joins, rank, and over windows), and how Flink propagates these messages through intermediate nodes to the different sink types (AppendStreamTableSink, RetractStreamTableSink, UpsertStreamTableSink).
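To see why an updating query produces retract messages, consider a continuous `GROUP BY key` count. This Python sketch loosely mirrors the `(Boolean, Row)` pairs consumed by a RetractStreamTableSink, where `True` adds a row and `False` retracts a previously emitted one. The functions `group_count` and `apply_to_retract_sink` are hypothetical helpers for illustration only.

```python
from typing import Dict, List, Tuple

# (is_add, key, count): True adds a result row, False retracts an earlier one.
Msg = Tuple[bool, str, int]


def group_count(keys: List[str]) -> List[Msg]:
    """Continuous COUNT(*) per key over an append-only input."""
    counts: Dict[str, int] = {}
    out: List[Msg] = []
    for k in keys:
        if k in counts:
            # The old result is no longer valid downstream: retract it first.
            out.append((False, k, counts[k]))
        counts[k] = counts.get(k, 0) + 1
        out.append((True, k, counts[k]))
    return out


def apply_to_retract_sink(msgs: List[Msg]) -> Dict[Tuple[str, int], int]:
    """Materialize messages as a multiset of rows: +1 on add, -1 on retract."""
    table: Dict[Tuple[str, int], int] = {}
    for is_add, k, c in msgs:
        row = (k, c)
        table[row] = table.get(row, 0) + (1 if is_add else -1)
        if table[row] == 0:
            del table[row]
    return table


msgs = group_count(["a", "b", "a"])
print(msgs)
print(apply_to_retract_sink(msgs))
```

The second `a` produces two messages: a retraction of `(a, 1)` followed by the new `(a, 2)`. An append-only sink has no way to apply the retraction, which is why such queries require a retract or upsert sink.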

Several performance optimizations are presented, such as caching intermediate results to reduce update frequency, bucket‑join strategies to limit memory consumption, and sink‑side caching to lower downstream pressure.
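The idea behind caching intermediate results (and, similarly, sink-side caching) is that several updates to the same key can be folded into one before being sent downstream. This Python sketch assumes a simple count-based flush; the class name and `flush_every` parameter are hypothetical and are not options of Flink or of Tencent's platform. The approach is similar in spirit to Flink's mini-batch aggregation, but it is not that implementation.

```python
from typing import Dict, List, Tuple


class UpdateFoldingCache:
    """Buffer per-key updates and forward only the latest value per key on flush."""

    def __init__(self, flush_every: int) -> None:
        self.flush_every = flush_every          # hypothetical flush threshold
        self.pending: Dict[str, int] = {}       # latest unflushed value per key
        self.received = 0                       # total updates seen
        self.emitted: List[Tuple[str, int]] = []  # what reached "downstream"

    def update(self, key: str, value: int) -> None:
        self.pending[key] = value  # a newer update overwrites the older one
        self.received += 1
        if self.received % self.flush_every == 0:
            self.flush()

    def flush(self) -> None:
        self.emitted.extend(self.pending.items())
        self.pending.clear()


cache = UpdateFoldingCache(flush_every=4)
for key, value in [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("b", 2)]:
    cache.update(key, value)
cache.flush()  # e.g. on a timer or checkpoint
print(cache.received, cache.emitted)
```

Five updates arrive but only three reach downstream, because the intermediate values `a=1` and `a=2` are overwritten before the first flush. The trade-off is added latency and a small amount of buffered state in exchange for fewer downstream writes.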

Finally, the future roadmap includes cost‑based optimization with richer statistics, new features like CEP syntax, fine‑grained join and shuffle improvements, and enhanced debugging support via trace logs and metrics.

Example of the extended window syntax:

SELECT * FROM TABLE(TUMBLE(TABLE <data>, DESCRIPTOR(<timecol>), <size> [, <offset>]))
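The table-valued form keeps every input row and appends the bounds of the window that contains it. A minimal Python sketch of that semantics, assuming integer millisecond timestamps and dictionaries as rows; `tumble_tvf` is a hypothetical helper, and Flink's built-in TUMBLE also appends a `window_time` column, which is omitted here.

```python
from typing import Any, Dict, List


def tumble_tvf(rows: List[Dict[str, Any]], timecol: str, size: int, offset: int = 0) -> List[Dict[str, Any]]:
    """Mimic TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset]) on in-memory rows.

    Each input row is kept and extended with the bounds of the single
    tumbling window that contains its time attribute.
    """
    out = []
    for row in rows:
        ts = row[timecol]
        start = ts - ((ts - offset) % size)
        out.append({**row, "window_start": start, "window_end": start + size})
    return out


data = [{"id": 1, "ts": 1200}, {"id": 2, "ts": 4800}]
print(tumble_tvf(data, "ts", size=3000))
print(tumble_tvf(data, "ts", size=3000, offset=1000))
```

Because the window bounds become ordinary columns, later operators can group, join, or apply set operations on `window_start`/`window_end`, which is what makes window-aligned operations across streams expressible in plain SQL.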
