
Zero‑Code Flink: Build StreamGraph, JobGraph & ExecutionGraph via Canvas DAG

This article explains how Flink applications are transformed through StreamGraph, JobGraph, and ExecutionGraph stages, and presents a low‑code canvas approach that lets users assemble DAGs, persist them in a MySQL adjacency list, and generate zero‑code Flink programs using BFS traversal.


Introduction

Submitting a DataStream Flink application goes through the StreamGraph, JobGraph, and ExecutionGraph stages to produce an executable DAG, which runs on a Flink cluster. Submitting a Flink SQL application follows a similar flow, with an additional step in which flink-table-planner converts the SQL into a StreamGraph. The following describes a low-code method to generate the StreamGraph and achieve zero-code Flink development.

1. Flink Concepts

In a Flink program, each processing step is an Operator; chaining these operators together yields the desired transformed data. The example program below includes Source, Filter, Map, and Sink operators.

<code>StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// kafkaProps holds the Kafka connection settings (bootstrap servers, group id, ...).
DataStream<String> dataStream = env.addSource(
        new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), kafkaProps));
DataStream<String> filteredStream = dataStream.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception { return true; }
});
DataStream<String> mappedStream = filteredStream.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception { return value; }
});
mappedStream.addSink(new DiscardingSink<>());
env.execute("test-job");
</code>

StreamGraph

StreamGraph is Flink's logical execution graph: it describes the topology of a streaming job, including sources, transformation operators, and sinks, along with their dependencies and data-transfer rules. It is built as a DAG via the Flink APIs or DSLs and corresponds one-to-one with the JobGraph generated from it. Vertices (StreamNode) represent operators and carry the UID, parallelism, slot-sharing group, and so on; edges (StreamEdge) represent the data streams between them. StreamingJobGraphGenerator generates the JobGraph from the StreamGraph.

StreamGraph diagram

JobGraph

During JobGraph generation (handled by StreamingJobGraphGenerator), the StreamGraph is optimized: compatible adjacent operators are chained into a single JobVertex, which corresponds to one Task at runtime, reducing serialization and thread hand-offs between operators. Tasks are the basic execution units scheduled onto TaskManagers.
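The chaining decision can be illustrated with a much-simplified sketch. The classes and the rule below are stand-ins, not Flink's real `StreamNode`/`StreamEdge` API: roughly, two adjacent operators chain when the edge between them is a forward (non-shuffling) edge, both operators allow chaining, and their parallelism matches.

```java
public class ChainingSketch {
    // Simplified stand-ins for Flink's StreamNode/StreamEdge; illustrative only.
    public record Node(String name, int parallelism, boolean chainable) {}
    public record Edge(String from, String to, boolean forward) {}

    // A much-simplified version of the chaining rule: chain two adjacent
    // operators when the edge does not shuffle data, both operators permit
    // chaining, and their parallelism is identical.
    public static boolean canChain(Node up, Node down, Edge e) {
        return e.forward()
                && up.chainable() && down.chainable()
                && up.parallelism() == down.parallelism();
    }

    public static void main(String[] args) {
        Node source = new Node("Source", 2, true);
        Node filter = new Node("Filter", 2, true);
        Node sink = new Node("Sink", 1, true); // different parallelism

        System.out.println(canChain(source, filter, new Edge("Source", "Filter", true))); // true
        System.out.println(canChain(filter, sink, new Edge("Filter", "Sink", true)));     // false
    }
}
```

In the example program above, Source, Filter, and Map could all end up in one chain, while a Sink with different parallelism would start a new Task.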

JobGraph diagram

ExecutionGraph

The ExecutionGraph is the physical execution graph derived from the JobGraph; it details task scheduling, execution order, data transfer, and state management. Each JobVertex becomes an ExecutionJobVertex, which is split into as many ExecutionVertex instances (SubTasks) as its parallelism.
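The expansion from one logical vertex into parallelism-many subtasks can be sketched as follows; the naming format mirrors the `Name (i/n)` labels Flink shows in its UI, but the method itself is illustrative, not a Flink API.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpandSketch {
    // Expand a logical vertex (e.g. the chained "Source -> Filter -> Map" Task)
    // into one subtask label per parallel instance, mirroring how an
    // ExecutionJobVertex holds one ExecutionVertex per subtask.
    public static List<String> expand(String vertexName, int parallelism) {
        List<String> subTasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subTasks.add(vertexName + " (" + (i + 1) + "/" + parallelism + ")");
        }
        return subTasks;
    }

    public static void main(String[] args) {
        System.out.println(expand("Map", 3)); // [Map (1/3), Map (2/3), Map (3/3)]
    }
}
```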

ExecutionGraph diagram

Physical Graph

The physical graph is the runtime view of the ExecutionGraph after scheduling: each ExecutionJobVertex holds one or more ExecutionVertex nodes, and each ExecutionVertex is deployed as a concrete Task (SubTask) running on a TaskManager.

2. Canvas‑Based Implementation Approach

Implementation Process

We adopt a canvas (drag‑and‑drop) mode to assemble Flink programs, enabling reuse of built‑in operators and achieving zero‑code development. Operators are drawn onto the canvas as shown below.

Canvas mode illustration

Construct a DAG and persist it. Using the canvas, users assemble the Flink application; the backend stores the nodes (operators: Source, Sink, and intermediate operators) in a MySQL table flink_node and the edges in flink_relation, an adjacency-list representation of the DAG.
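The article does not give the actual table definitions, so the following DDL is only a plausible shape for the two tables; every column name here is an assumption.

```sql
-- Illustrative schema only; the real column set is not given in the article.
CREATE TABLE flink_node (
    id        BIGINT PRIMARY KEY AUTO_INCREMENT,
    job_id    BIGINT NOT NULL,        -- which canvas job this node belongs to
    node_type VARCHAR(32) NOT NULL,   -- SOURCE / FILTER / MAP / SINK / ...
    config    JSON,                   -- operator settings (topic, parallelism, UDF, ...)
    INDEX idx_node_job (job_id)
);

CREATE TABLE flink_relation (
    id           BIGINT PRIMARY KEY AUTO_INCREMENT,
    job_id       BIGINT NOT NULL,
    from_node_id BIGINT NOT NULL,     -- edge source (parent operator)
    to_node_id   BIGINT NOT NULL,     -- edge target (child operator)
    INDEX idx_rel_job (job_id)
);
```

Each flink_relation row is one directed edge, so reading all rows for a job_id reconstructs the adjacency list of the DAG.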

Reassemble the Flink job. Initialize a StreamExecutionEnvironment, then convert the stored flink_node and flink_relation records into DataStreams and stitch them together into a DataStream API program. For graph traversal we use BFS (level-order traversal), which conveniently attaches each visited node onto its already-built parent.
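The level-order assembly can be sketched in plain Java. In the real job, the map below would hold DataStreams and each visit would apply the child operator's transformation to the parent's stream; here strings stand in for streams so the sketch stays self-contained, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BfsAssembly {
    // BFS over the adjacency list loaded from flink_relation. Because a parent
    // is always dequeued before its children, the child's "stream" can be built
    // from the parent's already-built one at visit time.
    public static Map<String, String> assemble(Map<String, List<String>> edges, String sourceId) {
        Map<String, String> built = new HashMap<>();
        built.put(sourceId, sourceId);                    // the source stream exists first
        Deque<String> queue = new ArrayDeque<>(List.of(sourceId));
        while (!queue.isEmpty()) {
            String parent = queue.poll();
            for (String child : edges.getOrDefault(parent, List.of())) {
                if (!built.containsKey(child)) {          // visit each node once
                    built.put(child, built.get(parent) + "->" + child);
                    queue.add(child);
                }
            }
        }
        return built;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "source", List.of("filter"),
                "filter", List.of("map"),
                "map", List.of("sink"));
        System.out.println(assemble(edges, "source").get("sink")); // source->filter->map->sink
    }
}
```

Note that the visit-once guard assumes each node has a single parent; operators with multiple inputs (joins, unions) would need to wait until all parent streams are built before being attached.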

Conclusion

In practice, additional complexities arise, such as storing parallelism, schema information, and shuffle details in nodes and edges, and supporting custom user‑defined functions (UDFs) when built‑in operators are insufficient.

Tags: big data, Flink, stream processing, DAG, low-code development
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
