Apache Flink 1.11 New Features Overview

The article provides a comprehensive overview of Apache Flink 1.11, detailing enhancements in cluster deployment, resource management, source/sink APIs, state backends, Table & SQL improvements, DataStream extensions, PyFlink/ML support, and runtime optimizations, along with relevant code examples and references.

Big Data Technology Architecture

May 22, 2020

Introduction

Flink 1.11 brings numerous improvements over 1.10 that aim to increase usability, performance, and feature completeness across the entire stack.

Cluster Deployment and Resource Management

The new Application mode moves JobGraph generation and job submission from the client to the Master (JobManager); jobs are launched with bin/flink run-application on YARN or Kubernetes. On YARN, Application mode ships dependencies through YARN local resources. On Kubernetes, the user builds an image that contains the user JARs, and TaskManagers are created automatically for the job.

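Remote lib JAR caching on YARN reduces bandwidth and storage usage: Flink libraries that were pre-uploaded to HDFS are referenced through yarn.provided.lib.dirs, and in Application mode the user JAR itself can also live on HDFS. The first command below submits in per-job mode; the second uses Application mode: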

./bin/flink run -m yarn-cluster -d \
  -yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
  examples/streaming/WindowJoin.jar

./bin/flink run-application -p 10 -t yarn-application \
  -Dyarn.provided.lib.dirs="hdfs://myhdfs/flink/lib" \
  hdfs://myhdfs/jars/WindowJoin.jar

K8s enhancements include Node Selector, Labels, Annotations, and Tolerations, plus automatic Hadoop configuration mounting.

Separate configuration of local and external network addresses/ports for JobManager and TaskManager is now supported via parameters such as jobmanager.rpc.address, taskmanager.host, and their corresponding bind‑host options.

Resource Management Enhancements

Unified JM memory configuration aligns JM and TM memory models, and Flink 1.11 adds experimental support for GPU resources through an extensible resource management framework.
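To make the unified JobManager memory model more concrete, the sketch below derives the JVM heap from a total process size, following the breakdown used by the memory setup documentation (process size = heap + off-heap + metaspace + JVM overhead). The class and method names are hypothetical, and the default values (256 MB metaspace, 128 MB off-heap, overhead of 10% clamped to 192 MB..1 GB) are my understanding of the 1.11 defaults; they should be checked against the configuration reference before relying on them.

```java
/** Hypothetical helper that mirrors the JobManager memory breakdown; not Flink's own code. */
final class JobManagerMemorySketch {
    // Assumed Flink 1.11 defaults, in MB; verify against the configuration reference.
    static final long METASPACE_MB = 256;     // jobmanager.memory.jvm-metaspace.size
    static final long OFF_HEAP_MB = 128;      // jobmanager.memory.off-heap.size
    static final long OVERHEAD_PERCENT = 10;  // jobmanager.memory.jvm-overhead.fraction = 0.1
    static final long OVERHEAD_MIN_MB = 192;  // jobmanager.memory.jvm-overhead.min
    static final long OVERHEAD_MAX_MB = 1024; // jobmanager.memory.jvm-overhead.max

    /** JVM overhead is a fraction of the process size, clamped to [min, max]. */
    static long jvmOverheadMb(long processMb) {
        long raw = processMb * OVERHEAD_PERCENT / 100;
        return Math.max(OVERHEAD_MIN_MB, Math.min(OVERHEAD_MAX_MB, raw));
    }

    /** Total Flink memory is what remains after metaspace and JVM overhead. */
    static long totalFlinkMb(long processMb) {
        return processMb - METASPACE_MB - jvmOverheadMb(processMb);
    }

    /** JVM heap is total Flink memory minus the off-heap budget. */
    static long jvmHeapMb(long processMb) {
        long heap = totalFlinkMb(processMb) - OFF_HEAP_MB;
        if (heap <= 0) {
            throw new IllegalArgumentException("process size too small: " + processMb + " MB");
        }
        return heap;
    }

    public static void main(String[] args) {
        long processMb = 1600; // e.g. jobmanager.memory.process.size: 1600m
        System.out.println("overhead=" + jvmOverheadMb(processMb)
                + " MB, flink=" + totalFlinkMb(processMb)
                + " MB, heap=" + jvmHeapMb(processMb) + " MB");
    }
}
```

Under these assumed defaults, a 1600 MB process size leaves 1152 MB of total Flink memory and a 1024 MB heap, because the 10% overhead (160 MB) is raised to the 192 MB minimum.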

Web UI Improvements

The UI now displays all JM/TM logs, supports log reload, download, and full‑screen view, and paginates historical failover exceptions. A new Thread‑Dump tab enables retrieving TM thread dumps directly from the UI.

Source & Sink

FLIP‑27 introduces a unified Source API that abstracts partition discovery and threading, simplifying source development. The StreamingFileSink now supports Avro and ORC formats:

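// Write Avro files for the generated SpecificRecord class Address (requires flink-avro).
stream.addSink(StreamingFileSink.forBulkFormat(
    Path.fromLocalFile(folder),
    AvroWriters.forSpecificRecord(Address.class)).build());

// Write ORC files (requires flink-orc). RecordVectorizer is assumed to be a user-defined
// Vectorizer<Record>; Configuration is org.apache.hadoop.conf.Configuration.
OrcBulkWriterFactory<Record> factory = new OrcBulkWriterFactory<>(
    new RecordVectorizer(schema), writerProps, new Configuration());
stream.addSink(StreamingFileSink.forBulkFormat(new Path(outDir.toURI()), factory).build());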
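Returning to the unified Source API: the key idea in FLIP‑27 is to separate split discovery (done once, by a coordinator) from reading (done in parallel, by workers). In Flink these roles are played by SplitEnumerator and SourceReader; the plain-Java types below are simplified, hypothetical stand-ins meant only to show the division of labor, not the real interfaces or their threading model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

/** Simplified, hypothetical model of an enumerator/reader split; not the Flink API. */
final class SplitSourceSketch {

    /** A unit of work, for example one file or one partition. */
    static final class Split {
        final String id;
        final List<String> records;

        Split(String id, List<String> records) {
            this.id = id;
            this.records = records;
        }
    }

    /** Coordinator side: discovers splits and hands them out on request. */
    static final class Enumerator {
        private final Queue<Split> pending = new ArrayDeque<>();

        void discover(Split split) {
            pending.add(split);
        }

        Optional<Split> nextSplit() {
            return Optional.ofNullable(pending.poll());
        }
    }

    /** Worker side: only knows how to read the splits it was assigned. */
    static final class Reader {
        final List<String> output = new ArrayList<>();

        /** Keeps requesting splits until the enumerator has none left. */
        void run(Enumerator enumerator) {
            Optional<Split> split;
            while ((split = enumerator.nextSplit()).isPresent()) {
                output.addAll(split.get().records);
            }
        }
    }

    /** Wires one enumerator to one reader and returns everything that was read. */
    static List<String> readAll(List<Split> splits) {
        Enumerator enumerator = new Enumerator();
        for (Split s : splits) {
            enumerator.discover(s);
        }
        Reader reader = new Reader();
        reader.run(enumerator);
        return reader.output;
    }
}
```

Because the reader never discovers work itself, a connector author in this model only writes "how to list splits" and "how to read one split"; scheduling and threading stay in the framework.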

State Management

Savepoints now reference their state files by relative path, so a savepoint directory can be relocated more easily. Checkpoint failure callbacks notify TaskManagers when a checkpoint fails, and the SpillableKeyedStateBackend (available via Flink Packages) allows state to spill to disk. RocksDB StateBackend enables local recovery by default, and the FsStateBackend memory threshold (state.backend.fs.memory-threshold) is increased to 20 KB.
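The benefit of relative savepoint paths can be illustrated with plain java.nio paths. This is a minimal sketch of the idea only: the directory layout and the helper names are hypothetical, and Flink's actual metadata format is not modeled here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of why relative references survive relocation. */
final class RelativeSavepointPathSketch {

    /** What gets stored: the state file location relative to the savepoint directory. */
    static Path toRelative(Path savepointDir, Path stateFile) {
        return savepointDir.relativize(stateFile);
    }

    /** What restore does: resolve the stored reference against the current directory. */
    static Path resolve(Path currentSavepointDir, Path relativeRef) {
        return currentSavepointDir.resolve(relativeRef).normalize();
    }

    public static void main(String[] args) {
        Path original = Paths.get("/savepoints/sp-1");
        Path stored = toRelative(original, Paths.get("/savepoints/sp-1/op-a/state-0"));

        // The savepoint directory was copied elsewhere; the stored reference still works.
        Path moved = Paths.get("/backup/sp-1");
        System.out.println(resolve(moved, stored));
    }
}
```

With absolute references, the same move would leave the metadata pointing at the old location, which is the problem the relative layout avoids.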

Table & SQL

Improvements include better UDF type inference, a richer TableEnvironment API, JDBC/Postgres catalog support, ChangeLog source support, new TableSource/TableSink interfaces, unified connector property keys, dynamic table options, and Hive integration (DDL/DML compatibility and Hive Streaming sink).

DataStream API

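The periodic and punctuated watermark assigner interfaces are consolidated into a single WatermarkGenerator, configured through WatermarkStrategy, and multi‑input operators are supported via the MultipleInputTransformation and MultipleConnectedStreams constructs.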
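As an illustration of what a watermark generator computes, here is a minimal plain-Java sketch of the common bounded-out-of-orderness rule: the watermark trails the largest timestamp seen so far by a fixed bound. The class is a hypothetical stand-in written without Flink dependencies; it is not the framework's interface, and the millisecond units are an assumption.

```java
/** Hypothetical stand-in for a bounded-out-of-orderness watermark generator. */
final class BoundedOutOfOrdernessSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampMs;

    BoundedOutOfOrdernessSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start low enough that the first watermark computation cannot underflow.
        this.maxTimestampMs = Long.MIN_VALUE + maxOutOfOrdernessMs + 1;
    }

    /** Called per event: track the largest event timestamp seen so far. */
    void onEvent(long eventTimestampMs) {
        maxTimestampMs = Math.max(maxTimestampMs, eventTimestampMs);
    }

    /** Called periodically: everything at or before this time is considered complete. */
    long currentWatermark() {
        return maxTimestampMs - maxOutOfOrdernessMs - 1;
    }

    /** An event is late if its timestamp is not after the current watermark. */
    boolean isLate(long eventTimestampMs) {
        return eventTimestampMs <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(5_000);
        wm.onEvent(10_000);
        wm.onEvent(8_000); // out of order, but within the bound
        System.out.println(wm.currentWatermark());
    }
}
```

An out-of-order event never moves the watermark backwards; it only counts as late once the watermark has already passed its timestamp.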

PyFlink & Machine Learning

Python UDFs, UDTFs, and metrics are now supported in both batch and streaming planners. Pandas UDFs enable vectorized processing, and Table‑to‑Pandas conversions are provided. A Python Pipeline API extends Flink ML capabilities.

Runtime Optimizations

Unaligned Checkpoints reduce checkpoint latency under back‑pressure. Flink now integrates with ZooKeeper 3.5, caches Slot‑level ClassLoaders to avoid OOM, upgrades to Log4j 2, and reduces data copy overhead in TaskManager receivers.
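The latency effect of Unaligned Checkpoints can be shown with a deliberately simplified timing model. In this sketch, an operator has several input channels and the checkpoint barrier arrives on each at a different time. With alignment, the operator can snapshot only once the barrier has arrived on every channel; without alignment, it can snapshot at the first barrier and persist the data still in flight on the other channels. The class, its methods, and the millisecond arrival times are all hypothetical; real behavior also depends on buffer sizes and on how the in-flight data is written.

```java
/** Hypothetical timing model for aligned vs. unaligned checkpoints; not Flink code. */
final class CheckpointAlignmentSketch {

    /** Aligned: the snapshot starts when the last input channel has delivered the barrier. */
    static long alignedSnapshotTimeMs(long[] barrierArrivalMs) {
        requireNonEmpty(barrierArrivalMs);
        long latest = barrierArrivalMs[0];
        for (long t : barrierArrivalMs) {
            latest = Math.max(latest, t);
        }
        return latest;
    }

    /** Unaligned: the snapshot starts as soon as the first barrier arrives. */
    static long unalignedSnapshotTimeMs(long[] barrierArrivalMs) {
        requireNonEmpty(barrierArrivalMs);
        long earliest = barrierArrivalMs[0];
        for (long t : barrierArrivalMs) {
            earliest = Math.min(earliest, t);
        }
        return earliest;
    }

    /** Time the fast channels spend blocked while alignment waits for the slowest one. */
    static long alignmentDelayMs(long[] barrierArrivalMs) {
        return alignedSnapshotTimeMs(barrierArrivalMs) - unalignedSnapshotTimeMs(barrierArrivalMs);
    }

    private static void requireNonEmpty(long[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("at least one input channel is required");
        }
    }

    public static void main(String[] args) {
        // One back-pressured channel delivers its barrier much later than the others.
        long[] arrivals = {100, 120, 9_000};
        System.out.println("alignment delay: " + alignmentDelayMs(arrivals) + " ms");
    }
}
```

In this model the delay saved is exactly the gap between the fastest and slowest channel, which is why the gain is largest when one channel is back-pressured.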
