
How to Build a Flink Monitoring System with Prometheus, Pushgateway, and Grafana

This guide walks you through configuring Flink metrics, installing and linking Pushgateway, node_exporter, Prometheus, and Grafana, and finally visualizing and alerting on Flink metrics, providing a complete end-to-end monitoring solution for Flink clusters.

37 Mobile Game Tech Team

The previous conceptual and source-code articles introduced the idea of Flink Metrics and showed how to add, delete, and pull them. This part demonstrates, step by step, how to build a monitoring system for Flink using Prometheus + Pushgateway + Grafana.

Because Flink pushes its metrics to the Pushgateway, Prometheus does not need to scrape the JobManager and TaskManagers directly, even when they sit on different subnets or behind firewalls. Flink uses PrometheusPushGatewayReporter to push metrics to the Pushgateway, and Prometheus then scrapes the Pushgateway for a unified view.
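To make the push model concrete, the sketch below talks to a Pushgateway directly with curl, sending the same kind of HTTP request a push-based reporter sends. It assumes Pushgateway listens on its default port 9091; the helper names pushgateway_url and push_sample, the label pair host/tm1, and the metric demo_up are hypothetical examples.

```shell
#!/usr/bin/env bash
# Sketch of the Pushgateway push API (assumes the default port 9091).

# Build a push URL: /metrics/job/<job> followed by optional
# <label>/<value> pairs that form the rest of the grouping key.
pushgateway_url() {
  local base="$1" job="$2"
  shift 2
  local url="${base}/metrics/job/${job}"
  while [ "$#" -ge 2 ]; do
    url="${url}/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
}

# POST one sample in the Prometheus text format; needs a running Pushgateway.
# POST replaces only the metric names it sends within that group.
push_sample() {
  local url="$1" name="$2" value="$3"
  printf '%s %s\n' "$name" "$value" |
    curl --silent --show-error --fail --data-binary @- "$url"
}

# Hypothetical usage against a local Pushgateway:
#   push_sample "$(pushgateway_url http://localhost:9091 flink-metrics host tm1)" demo_up 1
#   curl -s http://localhost:9091/metrics | grep demo_up
```

The grouping key in the URL is what keeps pushes from different senders apart; a pushed group stays on the Pushgateway until it is overwritten or deleted.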

1. Configure Flink

Edit conf/flink-conf.yaml and add:

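<code># The name listed here must match the "promgateway" segment of the keys below
metrics.reporters: promgateway
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
# Pushgateway host and port (9091 is the Pushgateway default; 9100 is used by node_exporter)
metrics.reporter.promgateway.host: datanode01
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-metrics</code>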
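The reporter name listed under metrics.reporters acts as an include list: it must match the <name> segment of the metrics.reporter.<name>.* keys, otherwise the listed reporter has no configuration, the configured one is not started, and nothing is pushed. The hypothetical helper check_reporters below is a rough consistency check, assuming simple one-line "key: value" entries; it accepts both the class key used here and the factory.class key used by newer Flink releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each name in `metrics.reporters` has a
# matching metrics.reporter.<name>.class (or .factory.class) entry.
# If `metrics.reporters` is absent, nothing is checked and it returns 0.
check_reporters() {
  local conf="$1" list name status=0
  # Take the value after "metrics.reporters:", e.g. "promgateway".
  list=$(sed -n 's/^metrics\.reporters:[[:space:]]*//p' "$conf")
  # Names are comma-separated; turn commas into spaces for the loop.
  for name in $(printf '%s' "$list" | tr ',' ' '); do
    if grep -Eq "^metrics\.reporter\.${name}\.(factory\.)?class:" "$conf"; then
      echo "ok: ${name}"
    else
      echo "missing: metrics.reporter.${name}.class"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_reporters conf/flink-conf.yaml
```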

2. Install Pushgateway

Download the appropriate version from https://prometheus.io/download/, extract it, and start it:

<code>tar zxvf pushgateway-1.4.1.linux-amd64.tar.gz
cd pushgateway-1.4.1.linux-amd64
# Listens on port 9091 by default
./pushgateway &</code>
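Services started in the background with & may take a moment to bind their ports. A small polling helper such as the hypothetical wait_for_http below can confirm that the Pushgateway answers on its default port 9091 before Flink starts pushing. It assumes curl is installed, and the same helper works for the other HTTP endpoints in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until curl gets a successful response.
# Arguments: URL, number of attempts (default 10), delay in seconds (default 1).
wait_for_http() {
  local url="$1" attempts="${2:-10}" delay="${3:-1}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "not reachable after ${attempts} attempts: $url" >&2
  return 1
}

# Usage: the Pushgateway serves its own metrics on /metrics.
#   wait_for_http http://localhost:9091/metrics
```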

3. Install node_exporter

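Download node_exporter from the same download page, extract it, change into its directory, and start the service:

<code># Listens on port 9100 by default
./node_exporter &</code>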

Access http://localhost:9100/metrics to see host metrics.
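The /metrics page uses the Prometheus text exposition format: "# HELP" and "# TYPE" comment lines followed by samples of the form name{labels} value. The hypothetical helpers below check for, or list, metric names in that format, for example to confirm that node_cpu_seconds_total (exported by recent node_exporter releases) is present.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Prometheus text-format input read from stdin.

# Succeed if a sample line for metric $1 exists ("name value" or "name{...} value").
has_metric() {
  grep -Eq "^$1(\{| )"
}

# Print each distinct metric name, skipping # HELP / # TYPE comment lines.
list_metric_names() {
  grep -v '^#' | sed -E 's/[{ ].*//' | grep -v '^$' | sort -u
}

# Usage against a local node_exporter (default port 9100):
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo found
#   curl -s http://localhost:9100/metrics | list_metric_names | head
```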

4. Install Prometheus

Download and extract Prometheus, then edit prometheus.yml and add the following jobs under the existing scrape_configs: section:

<code>  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'node_exporter'

  - job_name: 'pushgateway'
    # Keep the labels Flink pushed (e.g. job) instead of renaming them to exported_*
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']
        labels:
          instance: 'pushgateway'</code>

Start Prometheus:

<code>./prometheus --config.file=prometheus.yml</code>

Visit http://localhost:9090/ to verify the service.
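Besides the web UI, the Prometheus HTTP API reports scrape status: the /api/v1/targets endpoint lists each active target with a health field of up, down, or unknown. The hypothetical helper count_healthy_targets below counts healthy targets in that JSON, assuming the compact JSON Prometheus returns; with the two scrape jobs above, a working setup should report 2. The promtool binary shipped in the Prometheus archive can also validate prometheus.yml before (re)starting the server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count targets reported as healthy in the JSON
# returned by Prometheus's /api/v1/targets endpoint (read from stdin).
count_healthy_targets() {
  # gsub returns how many matches it replaced on each line; sum them up.
  awk '{ n += gsub(/"health":"up"/, "") } END { print n + 0 }'
}

# Usage (Prometheus on its default port 9090):
#   ./promtool check config prometheus.yml
#   curl -s http://localhost:9090/api/v1/targets | count_healthy_targets
```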

5. Install Grafana

Download, extract, and start Grafana:

<code>tar -zxvf grafana-8.0.3.linux-amd64.tar.gz
cd grafana-8.0.3
./bin/grafana-server</code>

Open http://localhost:3000 to access the Grafana UI.
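Grafana also exposes an unauthenticated /api/health endpoint whose JSON includes a database field that reads ok when the server can reach its database. The hypothetical check grafana_db_ok below tolerates both compact and pretty-printed JSON and assumes Grafana's default port 3000. On a fresh install the default login is admin / admin, and Grafana asks for a new password on first login.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if Grafana's /api/health JSON (on stdin)
# reports "database": "ok", with or without whitespace around the colon.
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage (Grafana on its default port 3000):
#   curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is healthy"
```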

6. Metrics Visualization

Start a Flink cluster:

<code>./bin/start-cluster.sh</code>

Start the Flink SQL client (bin/sql-client.sh) and create a data-generation table:

<code>CREATE TABLE prometheusdatagen (
  f_sequence INT,
  f_random INT,
  f_random_str STRING,
  ts AS localtimestamp,
  WATERMARK FOR ts AS ts
) WITH (
  'connector' = 'datagen',
  'rows-per-second'='5',
  'fields.f_sequence.kind'='sequence',
  'fields.f_sequence.start'='1',
  'fields.f_sequence.end'='100000',
  'fields.f_random.min'='1',
  'fields.f_random.max'='1000',
  'fields.f_random_str.length'='10'
);</code>

Query the table (for example, SELECT * FROM prometheusdatagen;) and observe the running job in the Flink web UI. While the job runs, Flink pushes its metrics to the Pushgateway, which Prometheus then scrapes.
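To confirm that Flink's pushes arrive, the Pushgateway's own /metrics page can be inspected. The Prometheus reporters prefix Flink metric names with flink_ followed by the scope, for example flink_jobmanager_numRegisteredTaskManagers, while the Pushgateway adds bookkeeping series such as push_time_seconds. The hypothetical helper flink_metric_summary below counts distinct Flink metric names per top-level component (jobmanager, taskmanager); it assumes the Pushgateway is reachable at datanode01 on its default port 9091.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text format on stdin and count the
# distinct flink_* metric names per component (second "_"-separated field).
flink_metric_summary() {
  # 1) keep flink_* sample lines, 2) strip labels and value to get the name,
  # 3) de-duplicate names, 4) take the component after "flink_", 5) count.
  grep '^flink_' | sed -E 's/[{ ].*//' | sort -u | cut -d_ -f2 | sort | uniq -c |
    awk '{ print $2, $1 }'
}

# Usage (assumes the Pushgateway at datanode01 on its default port 9091):
#   curl -s http://datanode01:9091/metrics | flink_metric_summary
```

If only a jobmanager line appears, the TaskManagers are not pushing yet; if nothing appears, the reporter configuration from step 1 is the first place to check.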

7. Add Prometheus Data Source in Grafana

In Grafana, add a new data source, select Prometheus as its type, and set the URL to the Prometheus address and port (http://localhost:9090 in this setup).
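The same data source can also be created through Grafana's HTTP API (POST /api/datasources), which is handy for scripted setups. The sketch below assumes Grafana on localhost:3000 with the default admin / admin credentials and Prometheus on localhost:9090; the helper names and the data source name Prometheus are arbitrary choices.

```shell
#!/usr/bin/env bash
# Build the JSON body for a Prometheus data source (values are not JSON-escaped,
# so keep the name and URL simple).
# "access": "proxy" makes the Grafana server, not the browser, query Prometheus.
prometheus_datasource_json() {
  local name="$1" url="$2"
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' "$name" "$url"
}

# Create the data source (needs a running Grafana; credentials are assumptions).
create_prometheus_datasource() {
  local grafana="$1" user="$2" pass="$3" prom_url="$4"
  prometheus_datasource_json "Prometheus" "$prom_url" |
    curl --silent --show-error --fail -u "${user}:${pass}" \
      -H 'Content-Type: application/json' \
      -X POST --data-binary @- "${grafana}/api/datasources"
}

# Usage:
#   create_prometheus_datasource http://localhost:3000 admin admin http://localhost:9090
```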

8. Visualize node_exporter Metrics

Import the dashboard template with ID 12884 to display node_exporter metrics.

9. Flink Metrics Dashboard

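Create a dashboard, add a panel that queries the Prometheus data source, select the Flink metrics to plot in the panel's Metrics field (their names start with flink_), and save the dashboard.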
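Before building a panel, it can help to run the panel's PromQL expression against the Prometheus HTTP API (/api/v1/query) to confirm it returns data. In an instant-query response each result carries a value pair of [<unix time>, "<value as string>"]; the hypothetical helper prom_values below extracts those values, assuming Prometheus's compact JSON output. flink_jobmanager_numRegisteredTaskManagers is used only as an example; the metric names actually available depend on the Flink version and the running job.

```shell
#!/usr/bin/env bash
# Run an instant query; curl URL-encodes the PromQL expression.
prom_query() {
  local prom="$1" expr="$2"
  curl --silent --show-error --fail -G "${prom}/api/v1/query" --data-urlencode "query=${expr}"
}

# Print the sample value of each result in an instant-query JSON response (stdin).
prom_values() {
  grep -o '"value":\[[^]]*]' | sed -E 's/.*,"([^"]*)"]$/\1/'
}

# Usage (Prometheus on its default port 9090):
#   prom_query http://localhost:9090 flink_jobmanager_numRegisteredTaskManagers | prom_values
```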

10. Alert Configuration

The entire monitoring pipeline, covering data collection, storage, visualization, and alerting, is now set up. For custom metrics, define them in the Flink job and follow the same steps to collect, store, display, and alert on them.
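As one possible shape for the alerting stage, the sketch below writes a Prometheus rule file with two hypothetical alerts: one fires when the JobManager reports no registered TaskManagers, and one fires when a Pushgateway group has not been updated for two minutes. The second rule matters because the Pushgateway keeps serving the last pushed values after Flink stops pushing, so value-based alerts alone cannot detect a silent reporter. It assumes only Flink pushes to this Pushgateway; the file name flink-alerts.yml, the rule names, and the thresholds are placeholders. The file still has to be listed under rule_files: in prometheus.yml, and notifications additionally need an Alertmanager (or Grafana alerting) to be configured.

```shell
#!/usr/bin/env bash
# Write an example Prometheus alerting-rule file (names and thresholds are placeholders).
write_flink_alert_rules() {
  local path="$1"
  cat > "$path" <<'EOF'
groups:
  - name: flink
    rules:
      - alert: FlinkNoTaskManagers
        expr: flink_jobmanager_numRegisteredTaskManagers < 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Flink JobManager reports no registered TaskManagers"
      - alert: FlinkMetricsStale
        # push_time_seconds is added by the Pushgateway for every pushed group.
        expr: time() - push_time_seconds > 120
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "No metrics pushed to the Pushgateway for over 2 minutes"
EOF
}

# Usage: write and validate the file, then add
#   rule_files: ["flink-alerts.yml"]
# to prometheus.yml and restart Prometheus.
#   write_flink_alert_rules flink-alerts.yml
#   ./promtool check rules flink-alerts.yml
```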
