Low-Code Generation of Flink StreamGraph, JobGraph, and ExecutionGraph
This article explains how to generate Flink's StreamGraph, JobGraph, and ExecutionGraph through a low-code canvas approach: it covers the underlying concepts, walks through the transformation pipeline from DataStream to DAG, and provides Java code examples for building and assembling operators via drag-and-drop.
A submitted DataStream application passes through the StreamGraph, JobGraph, and ExecutionGraph stages to produce an executable DAG; a Flink SQL application adds one extra step, using the flink-table-planner module to translate the SQL into a StreamGraph first.
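To make the SQL path concrete, here is a minimal sketch (the table name src and the datagen connector are illustrative choices, not from the article): the planner sits behind the StreamTableEnvironment and turns the query into the same DataStream-level transformations before a StreamGraph is built.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class SqlPathDemo {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // A bounded datagen table so the example terminates on its own
        tableEnv.executeSql(
            "CREATE TABLE src (word STRING) WITH ('connector' = 'datagen', 'number-of-rows' = '5')");

        // flink-table-planner translates this query into DataStream transformations,
        // after which the usual StreamGraph -> JobGraph -> ExecutionGraph pipeline applies.
        tableEnv.executeSql("SELECT UPPER(word) AS word FROM src").print();
    }
}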
In Flink, each operator (Source, Filter, Map, Sink) becomes a node in the logical execution graph (StreamGraph). The article shows a Java example that creates a StreamExecutionEnvironment, adds a Kafka source, applies filter and map functions, and finally adds a sink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Source: the Kafka consumer needs a deserialization schema and connection properties
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
DataStream<String> dataStream = env.addSource(
        new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), props));

// Filter: placeholder predicate that keeps every record
DataStream<String> filteredStream = dataStream.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception { return true; }
});

// Map: identity mapping as a placeholder transformation
DataStream<String> mappedStream = filteredStream.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception { return value; }
});

// Sink: discard every record
mappedStream.addSink(new DiscardingSink<>());
env.execute("test-job");

StreamGraph is the logical DAG describing sources, transformations, and sinks. The JobGraph is generated from it after optimization (most notably operator chaining, which merges chainable operators), and each resulting vertex (JobVertex) corresponds to a Task that runs on a TaskManager.
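The effect of chaining can be observed directly. The sketch below is illustrative and relies on getStreamGraph() and getJobGraph(), which are internal APIs whose exact signatures vary across Flink versions:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class ChainingDemo {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
           .filter(s -> !s.isEmpty())
           .map(String::trim).returns(Types.STRING)
           .disableChaining() // force this operator into its own JobVertex
           .print();

        // The StreamGraph keeps one node per operator...
        StreamGraph streamGraph = env.getStreamGraph();
        System.out.println("stream nodes: " + streamGraph.getStreamNodes().size());

        // ...while the JobGraph merges chainable neighbors into fewer JobVertex instances.
        JobGraph jobGraph = streamGraph.getJobGraph();
        System.out.println("job vertices: " + jobGraph.getNumberOfVertices());
    }
}

Without the disableChaining() call, chainable neighbors would collapse into even fewer vertices.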
ExecutionGraph is the physical execution plan derived from JobGraph, detailing task scheduling, execution order, data transfer, and state management; each Task may be split into multiple SubTasks according to parallelism.
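A minimal illustration of this expansion (the parallelism values here are arbitrary):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // job default: each task runs as 4 parallel subtasks

        env.fromElements("a", "b", "c")           // fromElements is a non-parallel source (1 subtask)
           .map(String::toUpperCase).returns(Types.STRING)
           .setParallelism(2)                     // this operator's vertex expands to 2 subtasks
           .print();                              // sink falls back to the job default of 4 subtasks

        env.execute("parallelism-demo");
    }
}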
The proposed canvas (drag‑and‑drop) mode stores nodes (operators) and edges in MySQL tables, uses a BFS‑based traversal to assemble the DataStream API program, and aims to enable zero‑code Flink application development.
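The article's table layout and assembly code are not reproduced in this summary, so the following is a minimal sketch of the idea: the hypothetical DagNode and DagEdge classes stand in for rows loaded from the MySQL tables, every stream is simplified to DataStream<String>, and each node is assumed to have at most one input.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class CanvasJobAssembler {

    // Hypothetical shape of a row in the canvas node table
    static class DagNode {
        long id;
        String type; // e.g. "SOCKET_SOURCE", "FILTER", "MAP", "DISCARD_SINK"
    }

    // Hypothetical shape of a row in the canvas edge table: fromId -> toId
    static class DagEdge {
        long fromId;
        long toId;
    }

    /** Walks the stored DAG breadth-first and chains one DataStream operator per node. */
    public static void assemble(StreamExecutionEnvironment env,
                                List<DagNode> nodes, List<DagEdge> edges) {
        Map<Long, DagNode> nodeById = new HashMap<>();
        Map<Long, List<Long>> downstream = new HashMap<>();
        Map<Long, Integer> inDegree = new HashMap<>();
        for (DagNode n : nodes) {
            nodeById.put(n.id, n);
            inDegree.put(n.id, 0);
        }
        for (DagEdge e : edges) {
            downstream.computeIfAbsent(e.fromId, k -> new ArrayList<>()).add(e.toId);
            inDegree.merge(e.toId, 1, Integer::sum);
        }

        Map<Long, DataStream<String>> upstreamOf = new HashMap<>();
        Deque<Long> queue = new ArrayDeque<>();
        inDegree.forEach((id, deg) -> { if (deg == 0) queue.add(id); }); // start from sources

        while (!queue.isEmpty()) {
            long id = queue.poll();
            DataStream<String> result = applyOperator(env, nodeById.get(id), upstreamOf.get(id));
            for (long next : downstream.getOrDefault(id, List.of())) {
                upstreamOf.put(next, result); // single-input simplification
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    queue.add(next);
                }
            }
        }
    }

    // Translates one canvas node into a DataStream call; real operators would be
    // instantiated from the node's stored configuration instead of hard-coded.
    private static DataStream<String> applyOperator(StreamExecutionEnvironment env,
                                                    DagNode node,
                                                    DataStream<String> upstream) {
        switch (node.type) {
            case "SOCKET_SOURCE":
                return env.socketTextStream("localhost", 9999);
            case "FILTER":
                return upstream.filter(value -> !value.isEmpty());
            case "MAP":
                return upstream.map(String::trim).returns(Types.STRING);
            case "DISCARD_SINK":
                upstream.addSink(new DiscardingSink<>());
                return upstream;
            default:
                throw new IllegalArgumentException("Unknown operator type: " + node.type);
        }
    }
}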
In practice, additional metadata such as operator parallelism, schema, key‑by fields, and custom UDF support must be handled, making the real implementation more complex than the simplified example.
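For instance, per-node metadata might be applied like this (the field names and values are hypothetical):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NodeMetadataDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Values that would come from the node's metadata row in MySQL
        int configuredParallelism = 2;
        String nodeId = "canvas-node-42";

        env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3))
           .keyBy(t -> t.f0)                      // key-by field taken from node metadata
           .sum(1)
           .setParallelism(configuredParallelism) // per-operator parallelism from metadata
           .uid(nodeId)                           // stable uid so state maps back to the canvas node
           .print();

        env.execute("metadata-demo");
    }
}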
JD Tech Talk
The official JD Tech public account, sharing best practices and technology innovation.