Use Application Real-Time Monitoring Service (ARMS) to collect Flink job metrics and set up alert rules based on those metrics.
Prerequisites
Before you begin, ensure that you have:
- A Flink cluster created on the EMR on ACK page in the E-MapReduce (EMR) console. For more information, see Getting started.
- ARMS activated. For more information, see Create a Prometheus instance to monitor an ACK cluster.
How it works
Flink job metrics flow through two layers of configuration:
- `flinkConfiguration`: tells Flink to expose metrics on port 9249 through the Prometheus Metric Reporter.
- `podTemplate.metadata.annotations`: tells ARMS Prometheus which port and path to scrape.
Both layers must be in place before metrics can be collected.
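To make the link between the two layers concrete, the following Python sketch shows how an annotation-based scraper, such as the commonly used kubernetes-pods discovery job, might turn a pod's annotations into a scrape URL. It assumes the open-source convention that pods without `prometheus.io/scrape: "true"` are skipped and that the path defaults to `/metrics`. The function name `scrape_url` is hypothetical, and the exact relabeling rules that ARMS applies may differ.

```python
from __future__ import annotations

from typing import Optional


def scrape_url(pod_ip: str, annotations: dict[str, str]) -> Optional[str]:
    """Hypothetical helper: build the URL an annotation-based scraper would use.

    Returns None when the pod would not be scraped at all.
    """
    # Pods that do not opt in are dropped, so they never show up as targets.
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    # Assumed default path, matching the Prometheus default metrics path.
    path = annotations.get("prometheus.io/path", "/metrics")
    # Simplification: require an explicit port annotation instead of
    # falling back to a port declared on the container.
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None
    # The port annotation must point at the port Flink's reporter listens on.
    return f"http://{pod_ip}:{port}{path}"
```

With the annotations from the sample YAML in Step 3 and an illustrative pod IP of 10.0.0.5, this returns `http://10.0.0.5:9249/metrics`.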
Configure Prometheus metrics
Step 1: Go to Prometheus Monitoring
- Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
- On the EMR on ACK page, find the cluster you want to manage and click the link in the ACK Cluster column.
- In the left-side navigation pane, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, wait for the system to automatically install the component and verify that the dashboards appear. Click each tab to view the corresponding metrics.
- Click Go to ARMS Prometheus in the upper-right corner.
Step 2: Enable service discovery
- In the left-side navigation pane, click Service Discovery.
- On the Service Discovery page, click Configure.
- On the Default Service Discovery tab, find kubernetes-pods and turn on the switch in the Actions column.
- In the dialog box that appears, click Enable.
Step 3: Submit a Flink job with Prometheus annotations
When you submit a Flink job, the FlinkDeployment YAML must include two Prometheus-specific sections:
- In `flinkConfiguration`: the metric reporter class that makes Flink expose a metrics endpoint.
- In `podTemplate.metadata.annotations`: the annotations that tell Prometheus to scrape that endpoint.
The following sample YAML includes both sections:
```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-emr-example
spec:
  flinkVersion: v1_13
  flinkConfiguration:
    state.savepoints.dir: file:///flink-data/flink-savepoints
    state.checkpoints.dir: file:///flink-data/flink-checkpoints
    # Enable the Prometheus Metric Reporter so Flink exposes metrics on port 9249 (the reporter's default port)
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
  serviceAccount: flink
  podTemplate:
    metadata:
      annotations:
        # Tell Prometheus to scrape this pod, and specify the path and port
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "9249"
    spec:
      serviceAccount: flink
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
          ports:
            - containerPort: 9249
              name: metrics
              protocol: TCP
      volumes:
        - name: flink-volume
          emptyDir: {}
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
```
For more information about submitting a Flink job, see Submit a Flink job.
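Before you submit, you can check a manifest for both Prometheus sections. The following Python sketch assumes the YAML has already been parsed into a dictionary, for example with a YAML library of your choice; the function and constant names are hypothetical. It compares annotation values as strings because Kubernetes annotation values must be strings, which is why the sample quotes `"true"` and `"9249"`.

```python
from __future__ import annotations

# Hypothetical constants mirroring the sample YAML above.
REPORTER_KEY = "metrics.reporter.prom.factory.class"
REPORTER_FACTORY = "org.apache.flink.metrics.prometheus.PrometheusReporterFactory"
REQUIRED_ANNOTATIONS = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/metrics",
    "prometheus.io/port": "9249",
}


def missing_prometheus_settings(deployment: dict) -> list[str]:
    """Return the paths of Prometheus settings that are missing or wrong."""
    problems = []
    spec = deployment.get("spec", {})

    # Layer 1: the reporter that makes Flink serve metrics on port 9249.
    flink_conf = spec.get("flinkConfiguration", {})
    if flink_conf.get(REPORTER_KEY) != REPORTER_FACTORY:
        problems.append(f"spec.flinkConfiguration.{REPORTER_KEY}")

    # Layer 2: the annotations that point the scraper at that endpoint.
    annotations = (
        spec.get("podTemplate", {}).get("metadata", {}).get("annotations", {})
    )
    for key, expected in REQUIRED_ANNOTATIONS.items():
        # Compare as strings: an unquoted 9249 parses as an int and is flagged here.
        if annotations.get(key) != expected:
            problems.append(f"spec.podTemplate.metadata.annotations.{key}")
    return problems
```

An empty list means both layers are present; each returned path names a setting to add or fix.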
Step 4: Verify that metrics are being collected
After the job starts, go to the Targets tab on the Service Discovery page. The pods associated with your Flink job's JobManager and TaskManager should appear with a status of UP.
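If a target is missing or shows DOWN, check that:
- The `prometheus.io/scrape: "true"` annotation is present in your YAML.
- The Prometheus Metric Reporter is configured in `flinkConfiguration`.
- Port 9249 is exposed on the container and matches the `prometheus.io/port` annotation.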
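If you prefer to check target health from a script, the following Python sketch filters a response in the format returned by the open-source Prometheus HTTP API endpoint `/api/v1/targets`, whose `health` field is `up`, `down`, or `unknown`. Whether and how your ARMS Prometheus instance exposes this endpoint is an assumption, as is the `pod` label name, which depends on your relabeling configuration. `unhealthy_flink_targets` is a hypothetical helper.

```python
from __future__ import annotations


def unhealthy_flink_targets(
    targets_response: dict, deployment_name: str, pod_label: str = "pod"
) -> list[tuple[str, str, str]]:
    """Return (pod, health, last_error) for the job's targets that are not up.

    Assumes pod names start with the FlinkDeployment name and that the pod
    name is carried in the label named by pod_label.
    """
    unhealthy = []
    for target in targets_response.get("data", {}).get("activeTargets", []):
        pod = target.get("labels", {}).get(pod_label, "")
        if not pod.startswith(deployment_name):
            continue  # target belongs to some other workload
        health = target.get("health", "unknown")
        if health != "up":
            unhealthy.append((pod, health, target.get("lastError", "")))
    return unhealthy
```

An empty result means every matching target reports `up`; for the others, `lastError` usually indicates which of the checks above to revisit.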
Step 5: (Optional) Configure a Grafana dashboard
Adding a dashboard requires Grafana Expert Edition.
- On the Prometheus Monitoring page, click the Others tab, then click the Prometheus tab.
- Click Open in New Window.
- In the left-side navigation pane, choose + > Create.
- Click Add new panel.
- In the Query section, select your cluster. In the A section, select a metric from the Metrics drop-down list, for example, `flink_jobmanager_job_lastCheckpointDuration`.
- Enter the panel title, configure any additional parameters, and click Save in the upper-right corner.
- In the dialog box that appears, enter a dashboard name, select your ACK cluster, and click Save.
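To see which metric names you can choose from the Metrics drop-down list, you can read the text that Flink's reporter serves at `/metrics` on port 9249. The following Python sketch, with the hypothetical function name `flink_metric_names`, extracts metric names from text in the Prometheus exposition format and keeps the ones that start with `flink_`. How you fetch the text, for example from inside the cluster, is left out.

```python
from __future__ import annotations

import re

# A sample line starts with a metric name ([a-zA-Z_:][a-zA-Z0-9_:]*),
# followed by either a label set in braces or whitespace and the value.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?=[{\s])")


def flink_metric_names(exposition: str) -> list[str]:
    """Return the sorted, de-duplicated Flink metric names in the text."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _METRIC_NAME.match(line)
        if match and match.group(1).startswith("flink_"):
            names.add(match.group(1))
    return sorted(names)
```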
Configure and view alerts
Go to alert rules
- Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
- On the EMR on ACK page, find the cluster and click the link in the ACK Cluster column.
- In the left-side navigation pane, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, wait for the system to automatically install the component and check the dashboards.
- Click Go to ARMS Prometheus in the upper-right corner.
- In the left-side navigation pane, click Alert Rules.
Create an alert rule
On the Prometheus Alert Rules page, click Create Prometheus Alert Rule in the upper-right corner and complete the form.
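The condition of a Prometheus alert rule is a PromQL expression. As an illustrative example with a hypothetical 60-second threshold, `flink_jobmanager_job_lastCheckpointDuration > 60000` returns a series whenever the last checkpoint took longer than 60 seconds, because Flink reports this metric in milliseconds. The following Python sketch mimics that comparison on a response in the open-source Prometheus `/api/v1/query` instant-vector format, which can help you sanity-check a threshold before you save the rule. `breaching_series` is a hypothetical helper, and querying your ARMS instance this way is an assumption.

```python
from __future__ import annotations


def breaching_series(query_response: dict, threshold_ms: float) -> list[dict]:
    """Return the label sets of series whose value exceeds threshold_ms.

    Mirrors how a PromQL comparison such as `metric > 60000` filters an
    instant vector: only samples strictly above the threshold survive.
    """
    data = query_response.get("data", {})
    if data.get("resultType") != "vector":
        return []
    breaching = []
    for series in data.get("result", []):
        # Each sample is [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = series["value"]
        if float(raw_value) > threshold_ms:
            breaching.append(series["metric"])
    return breaching
```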
View alert event history
On the Prometheus Alert Rules page, find the alert rule you want to review and click Alert Event History in the Actions column. The history lists the alert events that were generated each time the rule's condition was met.