Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article describes how Dada Group developed the Dada Flink SQL engine: its background, architecture, parser design, and dimension-table join strategies; enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, and BINLOG simplification; and plans to migrate to Flink 1.10.

Dada Group Technology

Apr 15, 2020

In 2018, Dada Group had a mature offline data platform but lacked real-time computation. To fill that gap, the team adopted the open-source Flink Stream SQL (FSL) and built its own engine, Dada Flink SQL (DFL), so that stream-processing jobs can be written in SQL without any Java or Scala code.

The DFL architecture consists of a launcher, a core module, source and sink plugins, a Flink Siddhi rule engine, and side plugins. The launcher submits jobs to a Flink cluster in either session or single-job mode. The core module parses the SQL into a SqlTree, loads the plugins the statements require, and registers them with the Flink TableEnvironment. Both dimension-table JOIN and INTERVAL JOIN are supported.

The parser implements a flexible IParser interface with methods such as match, verifySyntax, and parserSql, allowing easy extension for new SQL syntax, including Flink Siddhi support.
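The extension mechanism described above can be illustrated with a minimal Python sketch. It is not DFL's actual code: only the three method names (match, verifySyntax, parserSql, rendered here in snake_case) and the SqlTree name come from the text. The concrete parsers, the fields of SqlTree, the regular expressions, and the parse driver are assumptions made for illustration.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SqlTree:
    """Hypothetical container for parsed statements (fields are assumed)."""
    tables: dict = field(default_factory=dict)   # table name -> WITH properties
    inserts: list = field(default_factory=list)  # raw INSERT statements


class IParser(ABC):
    """One parser per statement type; new syntax means one new subclass."""

    @abstractmethod
    def match(self, sql: str) -> bool: ...

    @abstractmethod
    def verify_syntax(self, sql: str) -> bool: ...

    @abstractmethod
    def parser_sql(self, sql: str, tree: SqlTree) -> None: ...


class CreateTableParser(IParser):
    _FULL = re.compile(
        r"^CREATE\s+TABLE\s+(\w+)\s*\((.*)\)\s*WITH\s*\((.*)\)$",
        re.IGNORECASE | re.DOTALL,
    )
    _PROP = re.compile(r"(\w[\w.\-]*)\s*=\s*'([^']*)'")

    def match(self, sql):
        return re.match(r"^CREATE\s+TABLE\b", sql, re.IGNORECASE) is not None

    def verify_syntax(self, sql):
        return self._FULL.match(sql) is not None

    def parser_sql(self, sql, tree):
        name, _columns, props = self._FULL.match(sql).groups()
        tree.tables[name] = dict(self._PROP.findall(props))


class InsertParser(IParser):
    def match(self, sql):
        return re.match(r"^INSERT\s+INTO\b", sql, re.IGNORECASE) is not None

    def verify_syntax(self, sql):
        return re.match(r"^INSERT\s+INTO\s+\w+\s+SELECT\b", sql,
                        re.IGNORECASE) is not None

    def parser_sql(self, sql, tree):
        tree.inserts.append(sql)


def parse(script: str, parsers) -> SqlTree:
    """Dispatch each statement to the first parser whose match() accepts it."""
    tree = SqlTree()
    # Naive split: breaks on semicolons inside string literals.
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        parser = next((p for p in parsers if p.match(stmt)), None)
        if parser is None:
            raise ValueError(f"no parser matches: {stmt[:30]!r}")
        if not parser.verify_syntax(stmt):
            raise ValueError(f"syntax error in: {stmt[:30]!r}")
        parser.parser_sql(stmt, tree)
    return tree
```

Under this design, supporting a new statement type (for example, a Siddhi rule definition) would mean adding one more IParser subclass to the list passed to parse, leaving the existing parsers untouched.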

DFL provides two dimension‑table JOIN implementations: ALL (eagerly loads data into memory with optional refresh) and SIDE (lazy loading with optional caching), realized through the abstract classes AllReqRow and AsyncReqRow that share the ISideReqRow interface.
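The difference between the two strategies can be sketched as follows. This is a simplified, synchronous Python illustration rather than DFL's implementation: the class names AllLookup and SideLookup, the LRU eviction policy, and the injected load_all and fetch_one callables are all assumptions, and the asynchronous I/O that a class named AsyncReqRow implies is deliberately left out.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Optional

Row = Dict[str, object]


class AllLookup:
    """ALL strategy: load the whole dimension table eagerly, reload on a timer."""

    def __init__(self, load_all: Callable[[], Dict[object, Row]],
                 refresh_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self._load_all = load_all
        self._refresh = refresh_seconds  # None disables refreshing
        self._clock = clock
        self._data = load_all()
        self._loaded_at = clock()

    def get(self, key) -> Optional[Row]:
        expired = (self._refresh is not None
                   and self._clock() - self._loaded_at >= self._refresh)
        if expired:
            self._data = self._load_all()
            self._loaded_at = self._clock()
        return self._data.get(key)


class SideLookup:
    """SIDE strategy: fetch one key on demand, with an optional LRU cache."""

    def __init__(self, fetch_one: Callable[[object], Optional[Row]],
                 cache_size: int = 0):
        self._fetch_one = fetch_one
        self._cache_size = cache_size  # 0 disables caching
        self._cache: "OrderedDict[object, Optional[Row]]" = OrderedDict()

    def get(self, key) -> Optional[Row]:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        row = self._fetch_one(key)
        if self._cache_size > 0:
            self._cache[key] = row        # misses (None) are cached as well
            if len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return row


def join(stream_row: Row, key_field: str, lookup) -> Row:
    """Left-join one stream row with its dimension row, if one exists."""
    dim = lookup.get(stream_row[key_field])
    return {**stream_row, **(dim or {})}
```

The trade-off this sketch tries to show: ALL answers every lookup from memory but holds the full table and can serve data up to one refresh interval old, while SIDE keeps memory bounded at the cost of a round trip to the external store on every cache miss.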

Key enhancements include:

HA support for Flink session mode on YARN, achieved by explicitly specifying high-availability.cluster-id.

A workaround for a Flink 1.6.2 bug that disallows SQL keywords as JSON field names in Kafka sources: an optional sourceName metadata attribute decouples the source field name from the column name.

A metadata management system inspired by Hive, enabling registration of heterogeneous sources (Kafka, HBase, Elasticsearch, Redis, MySQL, ClickHouse) and simplifying CREATE TABLE statements through USE TABLE … AS … WITH ().

Support for Redis hash and set data types, adding setnx, hsetnx, and TTL handling.

A ClickHouse sink implementation using RichSinkFunction and CheckpointedFunction, with a ClickhouseMapper<IN> interface to map input records to Row objects and schema discovery via DESC queries.

BINLOG simplification via a new USE BINLOG TABLE syntax that expands to the necessary Flink SQL for MySQL binlog processing.
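Of the enhancements above, the Kafka keyword workaround is perhaps the easiest to illustrate. The Python sketch below shows only the general idea of decoupling a JSON field name from its SQL column name. The Column class, the decode function, and the example field names are hypothetical; only the sourceName concept comes from the text, and DFL's real implementation may differ.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Column:
    name: str                          # column name used in SQL
    source_name: Optional[str] = None  # JSON field name, when it differs

    @property
    def json_key(self) -> str:
        # Fall back to the column name when no sourceName is declared.
        return self.source_name or self.name


def decode(message: bytes, columns: Iterable[Column]) -> Dict[str, object]:
    """Map one Kafka JSON message onto the declared SQL columns."""
    payload = json.loads(message)
    # Fields absent from the message become None (NULL in SQL terms).
    return {c.name: payload.get(c.json_key) for c in columns}


# Assumed example: "timestamp" and "order" are SQL keywords, so the table
# exposes them under safe column names and reads them via source_name.
COLUMNS = [
    Column("order_id"),
    Column("event_time", source_name="timestamp"),
    Column("order_status", source_name="order"),
]
```

With this mapping, queries refer only to event_time and order_status, so the keyword never appears as an identifier in the SQL text, while the upstream JSON schema stays unchanged.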

Since its launch, DFL has been adopted across Dada Group’s data applications, running over 70 real‑time tasks covering delivery, e‑commerce, and traffic modules, with the proportion of SQL‑based tasks continuously increasing.

Future plans involve migrating to Flink 1.10 to leverage native Table/SQL features, reducing custom component maintenance, and deploying the upgraded Flink on the company’s private cloud to align with the broader shift toward cloud‑native infrastructure.
