E-MapReduce: Configure and view monitoring and alerts for Flink jobs

Last Updated: Mar 26, 2026

Use Application Real-Time Monitoring Service (ARMS) to collect Flink job metrics and set up alert rules based on those metrics.

Prerequisites

Before you begin, ensure that you have:

How it works

Flink job metrics flow through two layers of configuration:

  • `flinkConfiguration` — tells Flink to expose metrics on port 9249 via the Prometheus Metric Reporter.

  • `podTemplate.metadata.annotations` — tells ARMS Prometheus which port to scrape and at which path.

Both layers must be in place before metrics can be collected.
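
As a rough sketch, the two layers line up as follows in a FlinkDeployment spec. The `metrics.reporter.prom.port` line is an addition that does not appear in the sample on this page: it is the standard Flink option for the reporter port, which defaults to 9249, and it is included here only to make the pairing with the annotation explicit.

```yaml
# Fragment of a FlinkDeployment spec (not a complete manifest).
spec:
  flinkConfiguration:
    # Layer 1: Flink starts the Prometheus reporter and serves metrics over HTTP.
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    # Optional, assumed here for illustration: 9249 is already the default.
    metrics.reporter.prom.port: "9249"
  podTemplate:
    metadata:
      annotations:
        # Layer 2: ARMS Prometheus reads these annotations to find the endpoint.
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "9249"   # must match the reporter port above
```

If the reporter port is ever changed, the annotation and the container port would need to change with it, otherwise the scrape target cannot be reached.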

Configure Prometheus metrics

Step 1: Go to Prometheus Monitoring

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. On the EMR on ACK page, find the cluster you want to manage and click the link in the ACK Cluster column.

  3. In the left-side navigation pane, choose Operations > Prometheus Monitoring.

  4. On the Prometheus Monitoring page, wait for the system to automatically install the component and verify that the dashboards appear. Click each tab to view the corresponding metrics.

  5. Click Go to ARMS Prometheus in the upper-right corner.

Step 2: Enable service discovery

  1. In the left-side navigation pane, click Service Discovery.

  2. On the Service Discovery page, click Configure.

  3. On the Default Service Discovery tab, turn on the switch in the Actions column for kubernetes-pods.

  4. In the dialog box that appears, click Enable.
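
For orientation, the kubernetes-pods job corresponds roughly to the annotation-driven scrape job commonly used with open source Prometheus. The configuration that ARMS actually applies is not shown in this topic, so treat the following as an assumed equivalent rather than the real definition.

```yaml
# Conventional open source Prometheus scrape job, shown as an assumed
# equivalent of the kubernetes-pods default service discovery.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to use the port from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Under this model, a pod without the scrape annotation is dropped before it ever becomes a target, which is why the annotations in the next step matter.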

Step 3: Submit a Flink job with Prometheus annotations

When you submit a Flink job, the FlinkDeployment YAML must include two Prometheus-specific sections:

  • In flinkConfiguration: the metric reporter class that makes Flink expose a metrics endpoint.

  • In podTemplate.metadata.annotations: the annotations that tell Prometheus to scrape that endpoint.

The following sample YAML includes both sections:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-emr-example
spec:
  flinkVersion: v1_13
  flinkConfiguration:
    state.savepoints.dir: file:///flink-data/flink-savepoints
    state.checkpoints.dir: file:///flink-data/flink-checkpoints
    # Enable the Prometheus Metric Reporter so Flink exposes metrics on port 9249
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  serviceAccount: flink
  podTemplate:
    metadata:
      annotations:
        # Tell Prometheus to scrape this pod, and specify the path and port
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "9249"
    spec:
      serviceAccount: flink
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
          ports:
            - containerPort: 9249
              name: metrics
              protocol: TCP
      volumes:
        - name: flink-volume
          emptyDir: {}
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
```

For more information about submitting a Flink job, see Submit a Flink job.

Step 4: Verify that metrics are being collected

After the job starts, go to the Targets tab on the Service Discovery page. The pods associated with your Flink job's JobManager and TaskManager should appear with a status of UP.

If a target is missing or shows DOWN, check that:

  • The prometheus.io/scrape: "true" annotation is present in your YAML.

  • Port 9249 is exposed on the container.

Step 5: (Optional) Configure a Grafana dashboard

Important

Adding a dashboard requires Grafana Expert Edition.

  1. On the Prometheus Monitoring page, click the Others tab, then click the Prometheus tab.

  2. Click Open in New Window.

  3. In the left-side navigation pane, choose the add icon (+) > Create.

  4. Click Add new panel.

  5. In the Query section, select your cluster. In the A section, select a metric from the Metrics drop-down list. For example, you can select flink_jobmanager_job_lastCheckpointDuration.

  6. Enter the panel title, configure any additional parameters, and click Save in the upper-right corner.

  7. In the dialog box that appears, enter a dashboard name, select your ACK cluster, and click Save.

Configure and view alerts

Go to alert rules

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. On the EMR on ACK page, find the cluster and click the link in the ACK Cluster column.

  3. In the left-side navigation pane, choose Operations > Prometheus Monitoring.

  4. On the Prometheus Monitoring page, wait for the system to automatically install the component and check the dashboards.

  5. Click Go to ARMS Prometheus in the upper-right corner.

  6. In the left-side navigation pane, click Alert Rules.

Create an alert rule

On the Prometheus Alert Rules page, click Create Prometheus Alert Rule in the upper-right corner and complete the form.
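
The form fields are not listed in this topic. To illustrate the kind of condition a rule can express, here is a sketch in the open source Prometheus rule-file format, built on the flink_jobmanager_job_lastCheckpointDuration metric mentioned earlier. The group name, alert name, threshold, duration, and severity are all hypothetical, and the ARMS form may ask for these values in a different shape.

```yaml
# Illustrative only: open source Prometheus rule-file syntax, not the ARMS form.
groups:
  - name: flink-checkpoint-alerts        # hypothetical group name
    rules:
      - alert: FlinkCheckpointSlow       # hypothetical alert name
        # Flink reports this duration in milliseconds; 60000 is an example threshold.
        expr: flink_jobmanager_job_lastCheckpointDuration > 60000
        # Fire only if the condition holds continuously for five minutes.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Last checkpoint took longer than 60 seconds"
```

The `expr` line is the part most likely to carry over directly wherever a custom PromQL statement is accepted.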

View alert event history

On the Prometheus Alert Rules page, find the alert rule you want to review and click Alert Event History in the Actions column. Events appear in the history only after the condition specified in the rule has been met and the alert has been triggered.

What's next