Optimizations and Extensions for Flink SQL in Tencent Real‑Time Computing Platform
This article, presented by Tencent senior engineer Du Li, describes the current state of Flink SQL on Tencent's real-time computing platform, compares the Jar, Canvas, and SQL modes of job creation, introduces window-function extensions and retract-stream optimizations, and outlines roadmap plans for cost-based optimization and new features.
The article begins with an overview of Flink job creation methods—Jar mode, Canvas mode, and SQL mode—highlighting their respective advantages, disadvantages, and target users.
It then examines the limitations of traditional SQL windowing and introduces Tencent's extensions, including a new Windowing Table‑Valued Function that enables window‑aligned joins and set operations across one or two streams.
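To make the idea of a window-aligned join concrete, here is a minimal batch simulation in Python. It is only a sketch under stated assumptions: the row layout (timestamp_ms, key, value), the function names, and the sample data are hypothetical and do not come from Tencent's implementation. Each row is first assigned to a tumbling window, and two rows join only when they share both the key and the window.

```python
from collections import defaultdict


def tumble_window(ts_ms, size_ms):
    """Return (window_start, window_end) of the tumbling window containing ts_ms."""
    start = ts_ms - ts_ms % size_ms
    return start, start + size_ms


def window_aligned_join(left, right, size_ms):
    """Join rows that share both a key and a tumbling window.

    Rows are hypothetical (timestamp_ms, key, value) tuples. This is a
    batch simulation for illustration, not a streaming operator.
    """
    # Index the right side by (window, key) so lookups are direct.
    right_index = defaultdict(list)
    for ts, key, value in right:
        right_index[(tumble_window(ts, size_ms), key)].append(value)

    joined = []
    for ts, key, value in left:
        window = tumble_window(ts, size_ms)
        for other in right_index.get((window, key), []):
            joined.append((window[0], window[1], key, value, other))
    return joined


# Hypothetical sample data: two streams keyed by user id.
clicks = [(1_000, "u1", "click_a"), (12_000, "u1", "click_b")]
orders = [(4_000, "u1", "order_x"), (21_000, "u1", "order_y")]
result = window_aligned_join(clicks, orders, size_ms=10_000)
print(result)
```

Only the first click and the first order fall into the same 10-second window, so they are the only pair that joins. Because both inputs are cut by the same window function, the window bounds can be carried through to the output row, which is the property the table-valued function form relies on.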
Implementation details cover window propagation, decoupling of event‑time fields from table sources, watermark handling, and usage constraints such as identical window types for both streams.
Two novel window types are described: Incremental Window, which triggers multiple times within a window using a custom trigger, and Enhanced Tumble Window, which retains late data and re‑triggers based on the original window size.
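The Incremental Window behaviour can be illustrated with a small Python simulation. This is a sketch only: the function name, the (timestamp, amount) row layout, and the assumption that the trigger interval evenly divides the window size are all hypothetical, and the real trigger is a custom Flink trigger rather than a batch loop. The sketch fires once per interval inside each window and emits the sum accumulated so far.

```python
def incremental_window_sums(events, size, interval):
    """Simulate a window that fires every `interval` within each `size` window.

    events: hypothetical (timestamp, amount) tuples in any order.
    Returns (window_start, fire_time, sum_so_far) for every firing.
    Assumes `interval` evenly divides `size`.
    """
    by_window = {}
    for ts, amount in events:
        start = ts - ts % size
        by_window.setdefault(start, []).append((ts, amount))

    firings = []
    for start in sorted(by_window):
        rows = by_window[start]
        fire_time = start + interval
        while fire_time <= start + size:
            # Each firing reports everything seen in this window so far.
            total = sum(amount for ts, amount in rows if ts < fire_time)
            firings.append((start, fire_time, total))
            fire_time += interval
    return firings


events = [(1, 10), (4, 5), (7, 2), (11, 1)]
firings = incremental_window_sums(events, size=10, interval=5)
print(firings)
```

With a window size of 10 and an interval of 5, the first window fires twice, first with the partial sum and then with the final one. This is what lets a long window (for example, a full day) publish intermediate results instead of staying silent until it closes.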
The article also discusses retract-stream concepts, the scenarios that generate retract messages (e.g., outer joins, rank, and over windows), and how Flink processes these messages through intermediate nodes and the various sink types (AppendStreamTableSink, RetractStreamTableSink, UpsertStreamTableSink).
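The retract mechanism is easier to follow with a toy example. The Python sketch below is an illustration under assumptions, not Flink code: the message encoding ("+" for add, "-" for retract) and both function names are hypothetical. A non-windowed count per key must withdraw its previous result before publishing the updated one, and a retract-style sink applies those messages in order.

```python
def count_with_retraction(keys):
    """Emit add/retract messages for a running count per key.

    Messages are hypothetical (op, key, count) tuples, where "+" adds
    a row and "-" retracts a previously emitted row.
    """
    counts = {}
    messages = []
    for key in keys:
        old = counts.get(key)
        if old is not None:
            # The earlier result is no longer valid, so withdraw it first.
            messages.append(("-", key, old))
        counts[key] = (old or 0) + 1
        messages.append(("+", key, counts[key]))
    return messages


def apply_to_retract_sink(messages):
    """Materialize the final rows by applying adds and retractions in order."""
    rows = []
    for op, key, count in messages:
        if op == "+":
            rows.append((key, count))
        else:
            rows.remove((key, count))
    return rows


messages = count_with_retraction(["a", "b", "a"])
final_rows = apply_to_retract_sink(messages)
print(messages)
print(final_rows)
```

The second occurrence of key "a" produces two messages, a retraction of the old count and an addition of the new one, which is why update-heavy queries can multiply downstream traffic.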
Several performance optimizations are presented, such as caching intermediate results to reduce update frequency, bucket‑join strategies to limit memory consumption, and sink‑side caching to lower downstream pressure.
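The effect of caching intermediate results can be sketched as follows. The article does not spell out the exact mechanism, so this Python simulation is an assumption in the spirit of mini-batch aggregation: inputs are buffered in fixed-size batches (the `batch_size` parameter and function name are hypothetical), and each key emits at most one retract/add pair per batch instead of one per input row.

```python
def count_with_batch_cache(keys, batch_size):
    """Running count per key that emits updates once per batch, not per row.

    Messages are hypothetical (op, key, count) tuples, where "+" adds
    a row and "-" retracts a previously emitted row.
    """
    counts = {}
    messages = []
    for i in range(0, len(keys), batch_size):
        # Pre-aggregate the batch in a local cache before touching state.
        pending = {}
        for key in keys[i:i + batch_size]:
            pending[key] = pending.get(key, 0) + 1

        # Flush: one retract/add pair per key per batch.
        for key, delta in pending.items():
            old = counts.get(key)
            if old is not None:
                messages.append(("-", key, old))
            counts[key] = (old or 0) + delta
            messages.append(("+", key, counts[key]))
    return messages


keys = ["a", "a", "a", "b", "a", "a"]
unbatched = count_with_batch_cache(keys, batch_size=1)
batched = count_with_batch_cache(keys, batch_size=3)
print(len(unbatched), len(batched))
```

Both runs end with the same final counts, but the batched run emits far fewer messages. The trade-off is latency: results are delayed until the batch is flushed.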
Finally, the roadmap includes cost-based optimization with richer statistics, new features such as CEP syntax, fine-grained join and shuffle improvements, and better debugging support through trace logs and metrics.
Example of the extended window syntax:
SELECT * FROM TABLE(TUMBLE(TABLE <data>, DESCRIPTOR(<timecol>), <size> [, <offset>]))
DataFunTalk