Use Application Real-Time Monitoring Service (ARMS) to collect Spark job metrics, create Prometheus alert rules, and review alert event history.
Prerequisites
Before you begin, ensure that you have:
A Spark cluster created on the EMR on ACK page of the EMR console. See Get started with EMR on ACK.
ARMS activated with a Prometheus instance created. See Create a Prometheus instance to monitor an ACK cluster.
Configure Prometheus service
Step 1: Enable PodMonitor
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab, click the Container Service tab, find your ACK environment instance, and click Metric Scraping in the Actions column.
On the Metric Scraping tab, click Pod Monitor.
On the Pod Monitor tab, find each of the following monitors and click Enable in the Actions column:
sparkoperator-podmonitor
sparkoperator-spark-podmonitor
shuffleservice-master-podmonitor
shuffleservice-worker-podmonitor
If no Shuffle Service cluster is associated with your Spark cluster, skip shuffleservice-master-podmonitor and shuffleservice-worker-podmonitor.
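The selection rule above can be written down as a short Python sketch. The monitor names come from this step; the function name monitors_to_enable and its has_shuffle_service parameter are illustrative assumptions, not part of ARMS or EMR.

```python
# PodMonitor names as listed in this step.
SPARK_MONITORS = [
    "sparkoperator-podmonitor",
    "sparkoperator-spark-podmonitor",
]
SHUFFLE_MONITORS = [
    "shuffleservice-master-podmonitor",
    "shuffleservice-worker-podmonitor",
]


def monitors_to_enable(has_shuffle_service: bool) -> list:
    """Return the PodMonitors to enable (hypothetical helper for illustration)."""
    monitors = list(SPARK_MONITORS)
    # The Shuffle Service monitors only apply when a Shuffle Service
    # cluster is associated with the Spark cluster.
    if has_shuffle_service:
        monitors.extend(SHUFFLE_MONITORS)
    return monitors


if __name__ == "__main__":
    for name in monitors_to_enable(has_shuffle_service=False):
        print(name)
```

This only produces a checklist; enabling each monitor is still done in the ARMS console as described above.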
Step 2: Submit a Spark job
Submit a Spark job to the cluster. For instructions, see Submit a Spark job.
Step 3: Verify metrics on the Grafana dashboard (optional)
After submitting a job, confirm that metrics are being collected by querying them in Grafana.
Log on to Grafana.
In the left-side navigation pane, click the Explore icon.
On the Explore page, select your ACK cluster from the drop-down list, specify a metric name, and click Run Query in the upper-right corner. Use the following table to identify the metric prefix for what you want to monitor:
What you want to monitor | Metric prefix
Spark pod metrics (driver, executor, or JVM) | spark_driver_, spark_executor_, or jvm_
Spark application-level metrics (via Spark Operator) | spark_app
Shuffle Service metrics | metrics_
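If you are unsure of the exact metric name, one option is to list every series whose name starts with a given prefix. The following Python sketch builds a PromQL selector on the built-in __name__ label for that purpose. The category keys and the name_selector function are hypothetical names introduced for illustration, and a prefix match of this kind can return many series on a busy cluster.

```python
# Metric prefixes from the table above, keyed by an illustrative category name.
PREFIXES = {
    "spark_pod": ["spark_driver_", "spark_executor_", "jvm_"],
    "spark_app": ["spark_app"],
    "shuffle_service": ["metrics_"],
}


def name_selector(category: str) -> str:
    """Build a PromQL selector matching all metric names with the category's prefixes.

    The prefixes contain only letters and underscores, so they need no
    regex escaping. PromQL regex matches are fully anchored, hence the
    trailing ".*" to match the rest of the metric name.
    """
    pattern = "|".join(PREFIXES[category])
    return '{__name__=~"(' + pattern + ').*"}'


if __name__ == "__main__":
    # Paste the printed selector into the Explore query field.
    print(name_selector("spark_pod"))
```

Once the selector shows which metrics exist, query a single metric by its full name to keep the result set small.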
Configure and view alert rules
Step 1: Go to the Alert Rules page
Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
On the EMR on ACK page, find your cluster and click the link in the ACK Cluster column.
In the left-side navigation pane, choose Operations > Prometheus Monitoring.
On the Prometheus Monitoring page, wait for the system to automatically install the required component and load the dashboards. After installation completes, click each tab to review the corresponding metrics.
Click Go to ARMS Prometheus in the upper-right corner.
In the left-side navigation pane, click Alert Rules.
Step 2: Create a Prometheus alert rule
On the Prometheus Alert Rules page, click Create Prometheus Alert Rule.
On the Create Prometheus Alert Rule page, configure the alert parameters.
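As an illustration of what an alert condition can look like, the following Python sketch composes a simple threshold expression in PromQL. It assumes the rule is defined with a custom PromQL expression. The metric name spark_executor_example_metric is a placeholder, not a real metric, and build_alert_expr is a hypothetical helper; substitute a metric that you verified in Grafana.

```python
# Comparison operators supported by PromQL.
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "==", "!="}


def build_alert_expr(metric: str, operator: str, threshold: float) -> str:
    """Compose a threshold expression such as 'metric > 100' (illustrative helper)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported comparison operator: " + operator)
    return f"{metric} {operator} {threshold}"


if __name__ == "__main__":
    # Placeholder metric name and threshold, for illustration only.
    print(build_alert_expr("spark_executor_example_metric", ">", 100))
```

The expression returns only the series that satisfy the comparison, which is what causes the alert to fire for those series.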
Step 3: View alert event history
On the Prometheus Alert Rules page, find the alert rule you want to review and click Alert Event History in the Actions column.
When the configured alert conditions are met, alerts are triggered and listed on this page.