
Zero‑Code Flink: Build StreamGraph, JobGraph & ExecutionGraph via Canvas DAG

This article explains how Flink applications are transformed through StreamGraph, JobGraph, and ExecutionGraph stages, and presents a low‑code canvas approach that lets users assemble DAGs, persist them in a MySQL adjacency list, and generate zero‑code Flink programs using BFS traversal.


Introduction

Submitting a DataStream Flink application goes through the StreamGraph, JobGraph, and ExecutionGraph stages to produce an executable DAG, which runs on a Flink cluster. Submitting a Flink SQL application follows a similar flow, with an additional step in which flink-table-planner converts the SQL into a StreamGraph. The following describes a low-code method to generate the StreamGraph and achieve zero-code Flink development.

1. Flink Concepts

In a Flink program, each processing step is an Operator; chaining these operators together yields the desired transformed data. The example program below includes Source, Filter, Map, and Sink operators.

<code>StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// kafkaProps holds the Kafka connection settings (bootstrap servers, group id, ...).
DataStream<String> dataStream = env.addSource(
        new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), kafkaProps));
DataStream<String> filteredStream = dataStream.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception { return true; }
});
DataStream<String> mappedStream = filteredStream.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception { return value; }
});
mappedStream.addSink(new DiscardingSink<>());
env.execute("test-job");
</code>

StreamGraph

StreamGraph is Flink's logical execution graph: it describes the topology of a streaming job, including sources, transformation operators, and sinks, along with their dependencies and data-transfer rules. It is built as a DAG via the Flink APIs or DSLs and corresponds one-to-one with the JobGraph generated from it. Vertices (StreamNode) represent operators and carry the UID, parallelism, slot-sharing group, and so on; edges (StreamEdge) represent the data streams between them. StreamingJobGraphGenerator generates the JobGraph from the StreamGraph.

StreamGraph diagram

JobGraph

During JobGraph generation (handled by StreamingJobGraphGenerator), the StreamGraph is optimized: compatible adjacent operators are chained into a single JobVertex, which corresponds to one Task at runtime, reducing serialization and thread hand-offs between operators. Tasks are the basic execution units scheduled onto TaskManagers.
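The chaining decision can be illustrated with a much-simplified sketch. The classes and the rule below are stand-ins, not Flink's real `StreamNode`/`StreamEdge` API: roughly, two adjacent operators chain when the edge between them is a forward (non-shuffling) edge, both operators allow chaining, and their parallelism matches.

```java
public class ChainingSketch {
    // Simplified stand-ins for Flink's StreamNode/StreamEdge; illustrative only.
    public record Node(String name, int parallelism, boolean chainable) {}
    public record Edge(String from, String to, boolean forward) {}

    // A much-simplified version of the chaining rule: chain two adjacent
    // operators when the edge does not shuffle data, both operators permit
    // chaining, and their parallelism is identical.
    public static boolean canChain(Node up, Node down, Edge e) {
        return e.forward()
                && up.chainable() && down.chainable()
                && up.parallelism() == down.parallelism();
    }

    public static void main(String[] args) {
        Node source = new Node("Source", 2, true);
        Node filter = new Node("Filter", 2, true);
        Node sink = new Node("Sink", 1, true); // different parallelism

        System.out.println(canChain(source, filter, new Edge("Source", "Filter", true))); // true
        System.out.println(canChain(filter, sink, new Edge("Filter", "Sink", true)));     // false
    }
}
```

In the example program above, Source, Filter, and Map could all end up in one chain, while a Sink with different parallelism would start a new Task.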

JobGraph diagram

ExecutionGraph

The ExecutionGraph is the physical execution graph derived from the JobGraph; it details task scheduling, execution order, data transfer, and state management. Each JobVertex becomes an ExecutionJobVertex, which is split into as many ExecutionVertex instances (SubTasks) as its parallelism.
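The expansion from one logical vertex into parallelism-many subtasks can be sketched as follows; the naming format mirrors the `Name (i/n)` labels Flink shows in its UI, but the method itself is illustrative, not a Flink API.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpandSketch {
    // Expand a logical vertex (e.g. the chained "Source -> Filter -> Map" Task)
    // into one subtask label per parallel instance, mirroring how an
    // ExecutionJobVertex holds one ExecutionVertex per subtask.
    public static List<String> expand(String vertexName, int parallelism) {
        List<String> subTasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subTasks.add(vertexName + " (" + (i + 1) + "/" + parallelism + ")");
        }
        return subTasks;
    }

    public static void main(String[] args) {
        System.out.println(expand("Map", 3)); // [Map (1/3), Map (2/3), Map (3/3)]
    }
}
```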

ExecutionGraph diagram

Physical Graph

The physical graph is the runtime view of the ExecutionGraph after scheduling: each ExecutionJobVertex holds one or more ExecutionVertex nodes, and each ExecutionVertex is deployed as a concrete Task (SubTask) running on a TaskManager.

2. Canvas‑Based Implementation Approach

Implementation Process

We adopt a canvas (drag‑and‑drop) mode to assemble Flink programs, enabling reuse of built‑in operators and achieving zero‑code development. Operators are drawn onto the canvas as shown below.

Canvas mode illustration

Construct a DAG and persist it. Using the canvas, users assemble the Flink application; the backend stores the nodes (operators: Source, Sink, and intermediate operators) in a MySQL table flink_node and the edges in flink_relation, an adjacency-list representation of the DAG.
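The article does not give the actual table definitions, so the following DDL is only a plausible shape for the two tables; every column name here is an assumption.

```sql
-- Illustrative schema only; the real column set is not given in the article.
CREATE TABLE flink_node (
    id        BIGINT PRIMARY KEY AUTO_INCREMENT,
    job_id    BIGINT NOT NULL,        -- which canvas job this node belongs to
    node_type VARCHAR(32) NOT NULL,   -- SOURCE / FILTER / MAP / SINK / ...
    config    JSON,                   -- operator settings (topic, parallelism, UDF, ...)
    INDEX idx_node_job (job_id)
);

CREATE TABLE flink_relation (
    id           BIGINT PRIMARY KEY AUTO_INCREMENT,
    job_id       BIGINT NOT NULL,
    from_node_id BIGINT NOT NULL,     -- edge source (parent operator)
    to_node_id   BIGINT NOT NULL,     -- edge target (child operator)
    INDEX idx_rel_job (job_id)
);
```

Each flink_relation row is one directed edge, so reading all rows for a job_id reconstructs the adjacency list of the DAG.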

Reassemble the Flink job. Initialize a StreamExecutionEnvironment, then convert the stored flink_node and flink_relation records into DataStreams and stitch them together into a DataStream API program. For graph traversal we use BFS (level-order traversal), which conveniently attaches each visited node onto its already-built parent.
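The level-order assembly can be sketched in plain Java. In the real job, the map below would hold DataStreams and each visit would apply the child operator's transformation to the parent's stream; here strings stand in for streams so the sketch stays self-contained, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BfsAssembly {
    // BFS over the adjacency list loaded from flink_relation. Because a parent
    // is always dequeued before its children, the child's "stream" can be built
    // from the parent's already-built one at visit time.
    public static Map<String, String> assemble(Map<String, List<String>> edges, String sourceId) {
        Map<String, String> built = new HashMap<>();
        built.put(sourceId, sourceId);                    // the source stream exists first
        Deque<String> queue = new ArrayDeque<>(List.of(sourceId));
        while (!queue.isEmpty()) {
            String parent = queue.poll();
            for (String child : edges.getOrDefault(parent, List.of())) {
                if (!built.containsKey(child)) {          // visit each node once
                    built.put(child, built.get(parent) + "->" + child);
                    queue.add(child);
                }
            }
        }
        return built;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "source", List.of("filter"),
                "filter", List.of("map"),
                "map", List.of("sink"));
        System.out.println(assemble(edges, "source").get("sink")); // source->filter->map->sink
    }
}
```

Note that the visit-once guard assumes each node has a single parent; operators with multiple inputs (joins, unions) would need to wait until all parent streams are built before being attached.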

Conclusion

In practice, additional complexities arise, such as storing parallelism, schema information, and shuffle details in nodes and edges, and supporting custom user‑defined functions (UDFs) when built‑in operators are insufficient.

Tags: big data, Flink, stream processing, DAG, low-code development
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
