
Flink RocksDB State Backend: Practical Tuning Guide for Large Jobs

This article explains how to optimize Flink’s RocksDB state backend for large‑scale streaming jobs, covering state types, enabling latency tracking, incremental checkpoints, predefined options, and advanced memory and thread settings, with practical configuration examples and performance comparisons.

WeiLi Technology Team

Background

In production environments, Flink tasks often rely on state backends that store intermediate state. Flink offers three state backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend. RocksDB is the only option for large state sizes (GB to TB) and its performance heavily depends on proper tuning.

State Backend Overview

MemoryStateBackend : Stores state in JVM heap memory; suitable for local development and debugging.

FsStateBackend : Persists state snapshots to a file system (local or distributed) during checkpoints, while keeping working state in TaskManager memory.

RocksDBStateBackend : Uses the embedded RocksDB key‑value store for state; snapshots are written to a file system at checkpoints. It offers faster reads than pure file‑system storage and larger capacity than in‑memory storage.

For large state sizes, RocksDB is recommended and can be further optimized.
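For reference, the backend is usually selected in flink-conf.yaml; a minimal sketch (the checkpoint path and host are illustrative, not from this deployment):

```yaml
# Select RocksDB as the state backend (alternatives: jobmanager, filesystem)
state.backend: rocksdb
# Where checkpoint snapshots are persisted; illustrative HDFS path
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
```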

RocksDB Large‑State Tuning

3.1 Enable State Access Latency Tracking

Flink 1.13 introduced latency tracking for state access, applicable to any state backend.

<code>state.backend.latency-track.keyed-state-enabled: true    # Enable latency tracking
state.backend.latency-track.sample-interval: 100          # Sample every 100 accesses
state.backend.latency-track.history-size: 128             # Keep 128 samples for accuracy
state.backend.latency-track.state-name-as-variable: true</code>

Usually only the first parameter needs to be set.

3.2 Incremental Checkpoints and Local Recovery

3.2.1 Enable Incremental Checkpoints

RocksDB supports incremental checkpoints, which store only the changes since the previous checkpoint.

<code>state.backend.incremental: true
// or in code:
new EmbeddedRocksDBStateBackend(true)</code>

Incremental checkpoints dramatically reduce backup time for large state (e.g., uploading only the ~100 MB that changed since the last checkpoint instead of the full 1 GB of state).

3.2.2 Enable Local Recovery

When a Flink job fails, local recovery can restore from the local RocksDB state without fetching data from HDFS.

<code>state.backend.local-recovery: true</code>

3.3 Adjust Predefined Options

Flink provides predefined RocksDB option sets: DEFAULT, SPINNING_DISK_OPTIMIZED, SPINNING_DISK_OPTIMIZED_HIGH_MEM, and FLASH_SSD_OPTIMIZED. The high‑memory option is generally sufficient; use the SSD‑optimized set when the hardware permits.

<code>// In code:
env.setStateBackend(
    new EmbeddedRocksDBStateBackend(true)
        .setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM));
// Or on the command line:
-Dstate.backend.rocksdb.predefined-options=SPINNING_DISK_OPTIMIZED_HIGH_MEM</code>

3.4 Other Advanced Configurations

3.4.1 Increase Block Cache

Block cache size directly impacts read efficiency. The default is 8 MB; a size of 64–256 MB is recommended.

<code>state.backend.rocksdb.block.cache-size: 64m</code>

3.4.2 Increase Write Buffer and Level Size

Each column family uses a separate write buffer (default 64 MB). When increasing it, also raise the level‑1 size threshold (max-size-level-base) proportionally; otherwise larger memtable flushes will outgrow the small base level and trigger compactions more often.

<code>state.backend.rocksdb.writebuffer.size: 128m
state.backend.rocksdb.compaction.level.max-size-level-base: 320m</code>

3.4.3 Increase Write Buffer Count

Increase the maximum number of write buffers (default 2) to about 5 for machines with ample memory.

<code>state.backend.rocksdb.writebuffer.count: 5</code>
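Keep in mind that write‑buffer memory multiplies: each column family (one per registered state) can hold up to writebuffer.count memtables of writebuffer.size. A quick back‑of‑the‑envelope check, with the figures above and a hypothetical job that registers three states:

```python
# Rough memtable memory estimate; buffers are allocated per column family,
# and Flink's RocksDB backend creates one column family per registered state.
writebuffer_size_mb = 128   # state.backend.rocksdb.writebuffer.size
writebuffer_count = 5       # state.backend.rocksdb.writebuffer.count
num_states = 3              # hypothetical: three registered states in the job

per_family_mb = writebuffer_size_mb * writebuffer_count
total_mb = per_family_mb * num_states
print(per_family_mb, total_mb)  # 640 1920
```

This is why the count should only be raised on machines with ample memory: the worst case grows linearly with both the buffer count and the number of states.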

3.4.4 Increase Background Threads and Merge Count

Raise the number of background threads for flushing and compaction, and set the minimum number of write buffers to merge.

<code>state.backend.rocksdb.thread.num: 4
state.backend.rocksdb.writebuffer.number-to-merge: 3</code>

3.4.5 Enable Partitioned Index

Partitioned index reduces memory pressure and can improve performance up to tenfold in low‑memory scenarios.

<code>state.backend.rocksdb.memory.partitioned-index-filters: true</code>

3.5 Parameter Setting Example

<code>nohup ./bin/yarn-session.sh -nm flink14-pv-event-nginx-parse -s 6 -tm 15360 -jm 2048 \
  -Dstate.backend.rocksdb.predefined-options=SPINNING_DISK_OPTIMIZED \
  -Dstate.backend.rocksdb.writebuffer.size=256m \
  -Dstate.backend.rocksdb.writebuffer.count=5 \
  -Dstate.backend.rocksdb.compaction.level.max-size-level-base=320m \
  -Dstate.backend.rocksdb.writebuffer.number-to-merge=3 \
  -Dstate.backend.rocksdb.memory.partitioned-index-filters=true -d >/dev/null 2>&1 &</code>

Checkpoint Settings

Configure the checkpoint interval based on business latency requirements, and make sure the interval is longer than a checkpoint's end‑to‑end duration; otherwise checkpoints queue up back to back.

<code>// Use RocksDBStateBackend with incremental checkpoints
RocksDBStateBackend rocksDBStateBackend =
    new RocksDBStateBackend("hdfs://hadoop01:8020/flink/checkpoints", true);
env.setStateBackend(rocksDBStateBackend);
env.enableCheckpointing(TimeUnit.MINUTES.toMillis(1)); // 1-minute interval
CheckpointConfig checkpointConf = env.getCheckpointConfig();
checkpointConf.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
checkpointConf.setMinPauseBetweenCheckpoints(TimeUnit.MINUTES.toMillis(2)); // at least 2 minutes between checkpoints
checkpointConf.setCheckpointTimeout(TimeUnit.MINUTES.toMillis(10)); // 10-minute timeout
checkpointConf.enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);</code>

Tuning Practice Summary

For tasks with small state (below roughly 1 GB), the default RocksDB settings (with managed memory enabled) are sufficient.

For large‑state tasks, disable RocksDB memory management (state.backend.rocksdb.memory.managed: false) and manually tune the parameters shown above.
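Putting the individual settings together, a manually tuned starting point for a large‑state job might look like the following (the sizes are illustrative and should be scaled to the TaskManager's memory budget):

```yaml
state.backend.rocksdb.memory.managed: false               # take over sizing manually
state.backend.rocksdb.block.cache-size: 256m              # read path
state.backend.rocksdb.writebuffer.size: 128m              # write path, per column family
state.backend.rocksdb.writebuffer.count: 5
state.backend.rocksdb.writebuffer.number-to-merge: 3
state.backend.rocksdb.compaction.level.max-size-level-base: 320m
state.backend.rocksdb.thread.num: 4
state.backend.rocksdb.memory.partitioned-index-filters: true
```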

Beyond parameter tuning, reducing state size (e.g., state TTL, cleanup policies) remains the most effective optimization.
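As one example of trimming state, Table/SQL jobs can cap state retention with a single configuration key (DataStream jobs achieve the same via StateTtlConfig on the state descriptor); the 1‑hour value below is illustrative, not a recommendation:

```yaml
table.exec.state.ttl: 1h   # idle state is eligible for cleanup after one hour
```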

