Configure and view metrics and alerts of Flink jobs

This topic describes how to use Application Real-Time Monitoring Service (ARMS) to collect and view metrics of Flink jobs and how to configure alert rules based on the metrics.

Prerequisites

A Flink cluster is created on the EMR on ACK page in the new Alibaba Cloud E-MapReduce (EMR) console. For more information, see Getting started.
ARMS is activated. For more information, see Create a Prometheus instance to monitor an ACK cluster.

Configure Prometheus Service

Go to the Prometheus Monitoring page.
1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
2. On the EMR on ACK page, find the cluster that you want to manage and click the link in the ACK Cluster column.
3. In the left-side navigation pane, choose Operations > Prometheus Monitoring.
4. On the Prometheus Monitoring page, wait for the system to automatically install the component and check the dashboards.
  After the installation is complete, you can click each tab to view the corresponding metrics.
5. On the Prometheus Monitoring page, click Go to ARMS Prometheus in the upper-right corner.
Enable service discovery.
1. In the left-side navigation pane, click Service Discovery.
2. On the Service Discovery page, click Configure.
3. On the Default Service Discovery tab, turn on the switch in the Actions column of kubernetes-pods.
4. In the dialog box that appears, click Enable.
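
For orientation, kubernetes-pods is also the conventional name of the annotation-based pod discovery job in open source Prometheus. The following is a minimal sketch of such a scrape job in open source Prometheus configuration syntax. It is an assumption about how this kind of discovery typically works, not the configuration that ARMS installs. ARMS manages its own configuration, which may differ.

```yaml
# Sketch only: open source Prometheus syntax, not the actual ARMS configuration.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                     # discover every pod in the cluster
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Use prometheus.io/path as the metrics path if it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the scrape address to use the port from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
```

Under this convention, a pod is scraped only if it carries the prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations, which is why the Flink job in the next step must declare them in its podTemplate.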

Submit a Flink job. For more information, see Submit a Flink job.

Important

You must specify the annotations for Prometheus Metric Reporter in the podTemplate parameter of the YAML file of the Flink job.

Sample YAML file:


apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-emr-example
spec:
  flinkVersion: v1_13
  flinkConfiguration:
    state.savepoints.dir: file:///flink-data/flink-savepoints
    state.checkpoints.dir: file:///flink-data/flink-checkpoints
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  serviceAccount: flink
  podTemplate:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "9249"
        prometheus.io/scrape: "true"
    spec:
      serviceAccount: flink
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
          ports:
            - containerPort: 9249
              name: metrics
              protocol: TCP
      volumes:
        - name: flink-volume
          emptyDir: {}

  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1

  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
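
The annotations take effect only if they match the endpoint that the reporter exposes. Port 9249 is the default port of the Flink Prometheus reporter, so the sample above does not set it. As a sketch, assuming you want to set the port explicitly so that the reporter, the annotation, and the container port stay in sync, the relevant fields fit together as follows. metrics.reporter.prom.port is a standard Flink configuration option. The other fields are copied from the sample above. This is a fragment to merge into the full file, not a complete deployment.

```yaml
# Fragment only: merge into the FlinkDeployment shown above.
spec:
  flinkConfiguration:
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    # Explicit reporter port (9249 is already the Flink default).
    metrics.reporter.prom.port: "9249"
  podTemplate:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "9249"    # must equal metrics.reporter.prom.port
        prometheus.io/scrape: "true"
    spec:
      containers:
        - name: flink-main-container
          ports:
            - containerPort: 9249     # must equal metrics.reporter.prom.port
              name: metrics
              protocol: TCP
```

If you change the reporter port, change all three values together. Otherwise the target is discovered but the scrape fails.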

After the job starts, go to the Targets tab of the Service Discovery page. On this tab, you can check the status of the pods and confirm that metrics are being collected from the JobManager and TaskManager of the Flink job.
Optional: Configure a Grafana dashboard to view the metrics.
1. On the Prometheus Monitoring page, click the Others tab.
2. Click the Prometheus tab.
3. Click Open in New Window.
4. In the left-side navigation pane, choose the + icon > Create.
  Important
  You can add a dashboard only if you use Grafana Expert Edition.
5. Click Add new panel.
6. In the Query section, select a cluster. In the A section, select the metric that you want to view from the Metrics drop-down list. For example, you can select flink_jobmanager_job_lastCheckpointDuration.
7. Enter the panel title and configure other parameters as required.
8. Click Save in the upper-right corner. In the dialog box that appears, enter a dashboard name, select your ACK cluster, and then click Save.

Configure and view alerts

Go to the Alerts Rules page.
1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
2. On the EMR on ACK page, find the cluster whose alert rules you want to view and click the link in the ACK Cluster column.
3. In the left-side navigation pane, choose Operations > Prometheus Monitoring.
4. On the Prometheus Monitoring page, wait for the system to automatically install the component and check the dashboards.
  After the installation is complete, you can click each tab to view the corresponding metrics.
5. On the Prometheus Monitoring page, click Go to ARMS Prometheus in the upper-right corner.
6. In the left-side navigation pane, click Alerts Rules.
Configure alert rules.
1. In the upper-right corner of the Prometheus Alert Rules page, click Create Prometheus Alert Rule.
2. Create an alert rule.
View alerts.
On the Prometheus Alert Rules page, find the alert rule whose alerts you want to view and click Alert Event History in the Actions column. An alert is triggered each time the condition specified in the alert rule is met.
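
To illustrate the kind of condition an alert rule can express, the following sketch uses the open source Prometheus rule file format. It is an illustration only: in ARMS you create the rule in the console, and the group name, alert name, threshold, duration, and label values here are hypothetical. The metric flink_jobmanager_job_lastCheckpointDuration is the one used in the Grafana example in this topic, and Flink reports it in milliseconds.

```yaml
# Sketch only: open source Prometheus rule syntax with hypothetical names and thresholds.
groups:
  - name: flink-job-alerts                  # hypothetical group name
    rules:
      - alert: FlinkCheckpointSlow          # hypothetical alert name
        # Condition: the last checkpoint took longer than 60 seconds (60000 ms).
        expr: flink_jobmanager_job_lastCheckpointDuration > 60000
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Flink checkpoint duration is high"
          description: "The last checkpoint took {{ $value }} ms."
```

If the console lets you define the rule with a custom PromQL expression, the expr line is the part that carries over. Choose the threshold and duration based on the checkpoint interval and timeout of your job.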